Apache Spark Java Cookbook

Some Java examples for Apache Spark

If you want to know more and be guided through working with Java and Spark, I recommend my book from Manning: Spark with Java. Find out more about Spark with Java on the Manning website. The book contains more examples and more explanation, and it is professionally written and edited.

Chapter 1 is an introduction to Spark and deals with basic ingestion examples.

Chapter 2 helps you build a mental model around Spark.

Chapter 3 is a work in progress.

Environment

These labs rely on:

  • Apache Spark 2.3.0 (based on Scala 2.11)
  • Java 8
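
As a rough sketch of what these requirements look like in a Maven build, the fragment below declares Spark SQL 2.3.0 built for Scala 2.11 and sets the compiler level to Java 8. This is an assumption about a typical setup, not a copy of this repository's pom.xml, which may declare additional or different Spark modules.

```xml
<!-- Illustrative fragment only: the repository's actual pom.xml may differ. -->
<properties>
  <!-- Compile for Java 8 -->
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>
  <!-- Spark SQL 2.3.0; the _2.11 suffix is the Scala version Spark was built with -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
```

The Scala suffix in the artifact ID has to match across all Spark modules in a project. Mixing `_2.11` and another Scala version on the same classpath typically fails at runtime.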

Notes on Branches

The master branch always targets the latest version of Spark, currently v2.3.0.

Labs

A few labs around Apache Spark, exclusively in Java.

The labs are now organized in subpackages:

  • l000_ingestion: Data ingestion from various sources.
  • l020_streaming: Data ingestion via streaming. Special note on Streaming.
  • l050_connection: Connect to Spark.
  • l100_checkpoint: Checkpoint, introduced in v2.1.0.
  • l150_udf: UDF (User Defined Functions).
  • l200_join: Join examples.
  • l250_map: Map (in the context of mapping, not always linked to map/reduce).
  • l300_reduce: Reduce.
  • l400_industry_formats: Working with industry formats, limited for now to HL7 and FHIR.
  • l500_misc: Other examples.
  • l600_ml: ML (Machine Learning).
  • l700_save: Saving your results.
  • l800_concurrency: Labs around concurrent access (work in progress).
  • l900_analytics: More complex examples of using Spark for analytics.

If you would like to see more labs, send your request to jgp@jgp.net or @jgperrin on Twitter.