Sample Spark program to run in a Docker setup
output
src/main/scala/com/pavanpkulkarni/dockerwordcount
.gitignore
README.md
build.gradle
data.txt
settings.gradle


Word Count in Spark - Write output to file

This repo contains a Spark word count program that writes its output to a file.
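For orientation, here is a minimal sketch of what such a word count job can look like. It is not the repository's DockerWordCount source: the object name `WordCountSketch`, the helper `countWords`, the whitespace tokenization, and the CSV output format are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical sketch, not the repository's actual DockerWordCount class.
object WordCountSketch {

  // Splits each line on whitespace and counts how often each word occurs.
  def countWords(spark: SparkSession, inputFile: String): Dataset[(String, Long)] = {
    import spark.implicits._
    spark.read.textFile(inputFile)
      .flatMap(line => line.split("\\s+"))
      .filter(word => word.nonEmpty)
      .groupByKey(identity)
      .count()
  }

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: <input_filename> <output_directory>")
    val Array(inputFile, outputDir) = args

    // No master is set here; it is expected to come from spark-submit --master.
    val spark = SparkSession.builder().appName("WordCountSketch").getOrCreate()

    // coalesce(1) yields a single part file. With the default save mode,
    // the write fails if the output directory already exists.
    countWords(spark, inputFile).coalesce(1).write.csv(outputDir)
    spark.stop()
  }
}
```

The two program arguments correspond to the `<input_filename>` and `<output_directory>` placeholders used in the run instructions.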

To Run:

  1. Clone this repo

  2. Uncomment line 14 when running locally. This line is commented out so that the Docker master can be used instead.

    //.master("local") //uncomment this line when running on local

  3. Build the project by running: gradle clean build

  4. Run the spark-submit command as follows:

    spark-submit --master local[4] --verbose --class com.pavanpkulkarni.dockerwordcount.DockerWordCount build/libs/Docker_WordCount_Spark-1.0.jar <input_filename> <output_directory>

    For example:

    spark-submit --master local[4] --verbose --class com.pavanpkulkarni.dockerwordcount.DockerWordCount build/libs/Docker_WordCount_Spark-1.0.jar "data.txt" "output"
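Step 2 matters because a master set in code takes precedence over the `--master` flag passed to spark-submit, so a hardcoded `.master("local")` would override the Docker master. One possible alternative to commenting and uncommenting the line is to set a local master only when none was supplied. The sketch below is an assumption about how that could be done, not what this repo does; `MasterFallbackSketch` and `buildSession` are hypothetical names.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Hypothetical helper, not part of this repository.
object MasterFallbackSketch {

  def buildSession(appName: String): SparkSession = {
    // SparkConf picks up spark.* system properties, which is where
    // spark-submit places the value of --master for the driver.
    val conf = new SparkConf().setAppName(appName)

    // Fall back to a local master only if none was supplied externally.
    conf.setIfMissing("spark.master", "local[*]")

    SparkSession.builder().config(conf).getOrCreate()
  }
}
```

With this approach the same jar could be submitted to the Docker master or run locally without editing the source.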

Output

The output will be available under output/part-00000-xxxxx
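To inspect the result without opening the part file by hand, a small helper can print every part file in the output directory. This is a sketch that assumes the job writes plain text lines (for example CSV); `ReadOutputSketch` and `readParts` are hypothetical names, not part of this repo.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.JavaConverters._

// Hypothetical helper, not part of this repository.
object ReadOutputSketch {

  // Collects the lines of every part-* file in the directory, in file-name order.
  // Marker files such as _SUCCESS and hidden .crc files are skipped.
  def readParts(outputDir: String): Seq[String] = {
    val stream = Files.list(Paths.get(outputDir))
    try {
      stream.iterator().asScala
        .filter(p => p.getFileName.toString.startsWith("part-"))
        .toList
        .sortBy(_.getFileName.toString)
        .flatMap(p => Files.readAllLines(p).asScala)
    } finally stream.close()
  }

  def main(args: Array[String]): Unit =
    readParts(args.headOption.getOrElse("output")).foreach(println)
}
```

Run without arguments, it reads the `output` directory used in the example command above.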