sambhav37/sparkDetective

SparkDetective

SparkDetective is a tool for performance troubleshooting of Apache Spark jobs

SparkDetective simplifies the collection and analysis of Spark performance metrics.
Use it to troubleshoot interactive and batch Spark workloads.
It can also collect metrics for long-term retention or as part of a CI/CD pipeline.
SparkDetective is also intended as a working example of how to use Spark Listeners to collect Spark task metrics.
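
To illustrate the Spark Listener mechanism mentioned above, here is a minimal sketch of a listener that sums a few task metrics. It assumes it is pasted into spark-shell (use :paste), where `spark` is the predefined SparkSession. The class name `TaskTimeListener` is hypothetical and is not part of sparkDetective; the sketch uses only the public Spark listener API and does not show how sparkDetective itself is implemented.

```scala
import java.util.concurrent.atomic.LongAdder
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener for illustration only; not a class shipped by sparkDetective.
class TaskTimeListener extends SparkListener {
  val taskCount = new LongAdder
  val executorRunTimeMs = new LongAdder  // executorRunTime is reported in milliseconds
  val executorCpuTimeNs = new LongAdder  // executorCpuTime is reported in nanoseconds

  // Called on the driver each time a task finishes.
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    // taskMetrics may be null, for example when a task fails before reporting metrics.
    if (metrics != null) {
      taskCount.increment()
      executorRunTimeMs.add(metrics.executorRunTime)
      executorCpuTimeNs.add(metrics.executorCpuTime)
    }
  }
}

// Register the listener on the running SparkContext.
val listener = new TaskTimeListener
spark.sparkContext.addSparkListener(listener)

// Run any workload; this small aggregation is only a placeholder.
spark.range(0, 1000000).selectExpr("sum(id)").collect()

// Listener events are delivered asynchronously, so the totals may lag slightly
// behind the job that has just completed.
println(s"tasks=${listener.taskCount.sum} " +
  s"runTimeMs=${listener.executorRunTimeMs.sum} " +
  s"cpuTimeNs=${listener.executorCpuTimeNs.sum}")

// Unregister the listener when it is no longer needed.
spark.sparkContext.removeSparkListener(listener)
```

The counters use LongAdder because listener callbacks run on a different thread from the one that reads the totals.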

  • Main author and contact:
    • Kumar Sambhav
  • Supports Spark 2.x and 3.x
    • Tested on Spark 2.4, 3.0, and 3.1
    • For Spark 2.1 and 2.2, use sparkDetective version 0.16

Getting started with sparkDetective

  • Note: sparkDetective is available on Maven
  • Spark 3.x and 2.4 with Scala 2.12:
    • Scala: bin/spark-shell --packages io.github.sambhav37:sparkDetective_2.12:0.17
    • Python: bin/pyspark --packages io.github.sambhav37:sparkDetective_2.12:0.17
      • note: pip install sparkmeasure to get the Python wrapper API.
  • Spark 2.x with Scala 2.11:
    • Scala: bin/spark-shell --packages io.github.sambhav37:sparkDetective_2.11:0.17
    • Python: bin/pyspark --packages io.github.sambhav37:sparkDetective_2.11:0.17
      • note: pip install sparkmeasure to get the Python wrapper API.
  • Bleeding edge: build the sparkDetective jar with sbt (sbt +package), then pass the jar you just built with --jars instead of using --packages.
