Skip to content

mkwhitacre/simplesparkapp

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple Spark Application

A simple Spark application that counts the occurrence of each word in a corpus and then counts the occurrence of each character in the most popular words. Includes the same program implemented in Java and Scala.

To make a jar:

mvn package

To run from a gateway node in a CDH5 cluster:

spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local \
  target/sparkwordcount-0.0.1-SNAPSHOT.jar <input file> 2

This will run the application in a single local process. If the cluster is running a Spark standalone cluster manager, you can replace "--master local" with "--master spark://<master host>:<master port>".

If the cluster is running YARN, you can replace "--master local" with "--master yarn".

About

Simple Spark Application

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 81.0%
  • Scala 19.0%