Spark Streaming

Integrate Spark with Kafka and perform simple processing on streaming data.

Requirements

Install Kafka and Spark. You don't have to set up a cluster. Deploying them on your local machine is fine.
Feed real-time data to Kafka, then use Spark Streaming to fetch the data, count the frequency of each word, and print the complete results to the console (use the "foreachRDD" method).
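The word-count requirement above could be sketched roughly as follows. This is a minimal illustration, not the repository's SparkstreamConsumer.py. It assumes Spark 2.x with the spark-streaming-kafka-0-8 package available (the `pyspark.streaming.kafka` module was removed in Spark 3.0), a broker at `localhost:9092`, and a topic named `words`. The topic name, broker address, checkpoint directory, and all function names are hypothetical. It also reads "complete results" as running totals across batches, which is an interpretation and not something the assignment states.

```python
def format_counts(pairs):
    """Turn (word, freq) pairs into printable lines, most frequent first."""
    ordered = sorted(pairs, key=lambda p: (-p[1], p[0]))
    return ["%s\t%d" % (word, freq) for word, freq in ordered]


def update_total(new_values, running_total):
    """State update for updateStateByKey: add this batch's counts to the total."""
    return sum(new_values) + (running_total or 0)


def print_counts(batch_time, rdd):
    """Handler passed to foreachRDD: collect the batch and print every pair."""
    print("----- %s -----" % batch_time)
    for line in format_counts(rdd.collect()):
        print(line)


def main(topic="words", brokers="localhost:9092"):
    # Imports are local so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 2.x only (assumption)

    sc = SparkContext(appName="KafkaWordCount")
    ssc = StreamingContext(sc, 5)      # 5-second batches (arbitrary choice)
    ssc.checkpoint("checkpoint")       # required by updateStateByKey

    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers})

    counts = (stream.map(lambda kv: kv[1])            # keep the message value
                    .flatMap(lambda line: line.split())
                    .map(lambda word: (word, 1))
                    .updateStateByKey(update_total))  # running total per word

    counts.foreachRDD(print_counts)
    ssc.start()
    ssc.awaitTermination()
```

To run it, add a call to `main()` at the bottom of the script and submit it with spark-submit. Unlike `pprint()`, which shows only the first few elements of each batch, the `foreachRDD` handler prints every (word, freq) pair.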

In general, start Kafka first, submit your job to Spark, and then put data into Kafka. The results (word, freq) should then appear in your console.
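For the final step of that workflow, putting data into Kafka, one possible producer is sketched below. It is an illustration only and not the repository's producer.py. It assumes the kafka-python package is installed, a broker at `localhost:9092`, and a topic named `words`; the topic, broker address, and function names are hypothetical.

```python
import sys
import time


def send_lines(producer, topic, lines, delay=0.0):
    """Send each non-empty line to the topic as UTF-8 bytes; return the count sent."""
    sent = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue
        producer.send(topic, text.encode("utf-8"))
        sent += 1
        if delay:
            time.sleep(delay)  # optional pacing between messages
    producer.flush()           # block until buffered messages are delivered
    return sent


def main(topic="words", brokers="localhost:9092"):
    # Imported here so send_lines can be exercised without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=brokers)
    count = send_lines(producer, topic, sys.stdin)
    producer.close()
    print("sent %d lines" % count)
```

With Kafka running and the Spark job already submitted, calling `main()` and typing lines on standard input feeds them to the topic, and the word counts should show up in the console of the Spark job after the next batch.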

Files

README.md
SparkstreamConsumer.py
Sparkstreaming.pdf
keys.txt
producer.py

nehgu/cloud-mini-hw4
