Spark Example

Spark examples repo for getting started


Insights from the following course

Note: the code examples are based on those provided in the Pluralsight course above

Spark Documentation

Setting it up (Pre-reqs)

Or start with a Docker image with Spark preinstalled


Starting Spark with Scala

Open a command prompt or shell and run:
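A minimal way to launch the shell, assuming spark-shell is on your PATH (the master option shown is illustrative):

```shell
# Launch the interactive Spark shell (a Scala REPL with a SparkContext pre-created)
spark-shell

# Optionally pin it to a local master with two worker threads
spark-shell --master "local[2]"
```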


All Spark jobs begin with sc (the SparkContext), which is supplied by the spark-shell

The shell creates two contexts:

  • sc - a special interpreter-aware SparkContext is already created for you in spark shell
  • sqlContext - entry point for working with structured data (rows and columns) in Spark

Using Spark shell

:help for help
press Tab twice after a partially typed expression to see method definitions, rather than evaluating it with Return (e.g. type sc.parallelize and hit Tab twice)

Simple Example

Read in a text file and print its first line

val textFile = sc.textFile("file:///<SPARK_HOME>/")
textFile.first()

res0: String = # Apache Spark

Tokenize the file data on spaces

 val tokenizedFileData = textFile.flatMap(line => line.split(" "))

This is the Map in Map/Reduce

Count the instances of each Word

Here the word is the key and the value is the count

 val countPrep = => (word, 1))
 val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)

This is the Reduce in Map/Reduce
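The same tokenize/map/reduce pipeline can be sketched with plain Scala collections, with no cluster needed, to see the logic in isolation. WordCount and wordCounts are illustrative names, not part of the Spark API:

```scala
object WordCount {
  // Mirrors the Spark pipeline: flatMap (the Map step), pair each word
  // with 1, then combine counts per key (the Reduce step)
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))           // tokenize, as flatMap did on the RDD
      .map(word => (word, 1))                     // key each word with a count of 1
      .groupBy { case (word, _) => word }         // gather equal keys, like reduceByKey's shuffle
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word
}
```

On an RDD, reduceByKey does this per partition first and then across partitions, so the counts never have to be collected in one place the way groupBy does here.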

Sort in descending order (_2 refers to the second element of the tuple)

 val sortedCounts = counts.sortBy(kvPair => kvPair._2, false)

Write the results to a file and check the output parts
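A sketch of writing the result out; saveAsTextFile is the standard RDD call, but the output path here is illustrative. Note that Spark writes a directory, not a single file, with one part file per partition:

```scala
// Writes a directory containing part-00000-style files, one per partition
sortedCounts.saveAsTextFile("file:///tmp/ReadMeWordCount")
```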


An even simpler way, using the API
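For instance, the RDD API's countByValue collapses the map/pair/reduce steps into one call, returning the counts to the driver as a Scala Map. This sketch assumes the tokenizedFileData RDD from above:

```scala
// One-step word count; returns scala.collection.Map[String, Long] on the driver
val wordCounts = tokenizedFileData.countByValue()
```

Because the result is materialized on the driver, this is convenient for small vocabularies but the explicit reduceByKey pipeline scales better when the set of distinct keys is large.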