Some Java examples for Apache Spark
To learn more and be guided through using Spark with Java, I recommend my book, Spark with Java, published by Manning. You can find out more about it on the Manning website. The book contains more examples and more explanations, and it is professionally written and edited.
Chapter 1 is an introduction to Spark and deals with basic ingestion examples.
Chapter 2 helps you build a mental model around Spark.
Chapter 3 is a work in progress.
These labs rely on:
- Apache Spark 2.3.0 (based on Scala 2.11)
- Java 8
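For reference, a dependency declaration matching these versions might look like the sketch below. It assumes a Maven build and that a lab only needs the Spark SQL module; the project's own build file may declare different or additional modules.

```xml
<!-- Sketch only: assumes a Maven build; adjust to the project's actual pom.xml. -->
<properties>
  <!-- Compile for Java 8 -->
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>
  <!-- Spark SQL 2.3.0 built against Scala 2.11 (pulls in spark-core transitively) -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
```

The `_2.11` suffix on the artifact name is the Scala version the Spark binaries were built with, which is why it has to match the Scala version listed above.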
Notes on Branches
The master branch always targets the latest version of Spark, currently v2.3.0.
A few labs around Apache Spark, exclusively in Java.
The labs are now organized in subpackages:
- l000_ingestion: Data ingestion from various sources.
- l020_streaming: Data ingestion via streaming. Special note on Streaming.
- l050_connection: Connect to Spark.
- l100_checkpoint: Checkpoint introduced in v2.1.0.
- l150_udf: UDF (User Defined Functions).
- l200_join: join examples.
- l250_map: map (in the context of mapping, not always linked to map/reduce).
- l300_reduce: reduce.
- l400_industry_formats: working with industry formats, limited, for now, to HL7 and FHIR.
- l500_misc: other examples.
- l600_ml: ML (Machine Learning).
- l700_save: saving your results.
- l800_concurrency: labs around concurrent access (work in progress).
- l900_analytics: More complex examples of using Spark for Analytics.
If you would like to see more labs, send your request to email@example.com or @jgperrin on Twitter.