Utilities for writing tests that use Apache Spark.

SparkSuite: a SparkContext for each test suite

Add configuration options in subclasses using sparkConf(…), cf. KryoSparkSuite:

sparkConf(
  // Register this class as its own KryoRegistrator
  "spark.kryo.registrator" → getClass.getCanonicalName,
  "spark.serializer" → "org.apache.spark.serializer.KryoSerializer",
  "spark.kryo.referenceTracking" → referenceTracking.toString,
  "spark.kryo.registrationRequired" → registrationRequired.toString
)
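
For orientation, here is a minimal usage sketch. It assumes SparkSuite exposes the shared SparkContext as sc and already mixes in a ScalaTest suite providing test(...) and matchers; the import path below is likewise an assumption and may differ in your version.

// Hypothetical usage sketch; the import path and the `sc` field name are assumptions.
import org.hammerlab.spark.test.suite.SparkSuite

class WordCountSuite extends SparkSuite {
  test("count words with the shared SparkContext") {
    val counts =
      sc
        .parallelize(Seq("a", "b", "a"))
        .map(_ → 1)
        .reduceByKey(_ + _)
        .collectAsMap()

    counts should be(Map("a" → 2, "b" → 1))
  }
}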

PerCaseSuite: a SparkContext for each test case
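
This variant is useful when test cases must not share state: each test body gets a fresh SparkContext, so cached RDDs, accumulators, or listeners from one case cannot leak into the next. A hedged sketch, under the same assumptions as above (import path and an sc field):

// Hypothetical usage sketch; import path and `sc` field name are assumptions.
import org.hammerlab.spark.test.suite.PerCaseSuite

class IsolationSuite extends PerCaseSuite {
  test("first case caches an RDD") {
    sc.parallelize(1 to 10).cache().count() should be(10)
  }

  test("second case sees a brand-new SparkContext") {
    // Nothing cached by the previous case is visible here.
    sc.getPersistentRDDs should be(empty)
  }
}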

KryoSparkSuite: a SparkSuite implementation that provides hooks for Kryo registration:

register(
  classOf[Foo],
  "org.foo.Bar",
  classOf[Bar] → new BarSerializer
)

Also useful for subclassing once per project and filling in that project's default Kryo registrar, then having concrete tests subclass that; cf. hammerlab/guacamole and hammerlab/pageant for examples. A sketch of the pattern follows.
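
A minimal sketch of that pattern, assuming register(...) can be called from the body of a KryoSparkSuite subclass as in the example above; MyProjectKryoSuite, Foo, Bar, and BarSerializer are hypothetical placeholder names.

// Hypothetical project-level base suite: registrations declared once here are
// inherited by every concrete test suite in the project.
abstract class MyProjectKryoSuite extends KryoSparkSuite {
  register(
    classOf[Foo],
    "org.foo.Bar",
    classOf[Bar] → new BarSerializer
  )
}

// Concrete tests just extend the project base class.
class MyAlgorithmSuite extends MyProjectKryoSuite {
  test("runs with the project's Kryo registrations in place") {
    sc.parallelize(1 to 4).count() should be(4)
  }
}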

- rdd.Util: make an RDD with specific elements in specific partitions (see the plain-Spark sketch after this list).
- NumJobsUtil: verify the number of Spark jobs that have been run.
- RDDSerialization: interface for verifying that a serialization+deserialization round-trip on an RDD results in the same RDD.
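
The per-partition placement idea behind rdd.Util can be illustrated with plain Spark. The sketch below is not this library's API, only the underlying technique: one inner Seq per desired partition, verified with glom().

import org.apache.spark.{ SparkConf, SparkContext }

// Plain-Spark illustration (not this library's API): put chosen elements in
// chosen partitions, then assert on the resulting layout.
object PartitionLayoutSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sketch"))

    // One inner Seq per desired partition: (1, 2) in partition 0, (3) in partition 1.
    val partitions = Seq(Seq(1, 2), Seq(3))
    val rdd = sc.parallelize(partitions, partitions.size).flatMap(identity)

    // glom() exposes each partition as an array, so the layout can be checked directly.
    val layout = rdd.glom().collect().map(_.toSeq).toSeq
    assert(layout == Seq(Seq(1, 2), Seq(3)))

    sc.stop()
  }
}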