SparkTwitterPopularHashTags

A Spark Streaming project that analyzes popular hashtags from live Twitter data streams. Data is ingested from different input sources (the Twitter source, Flume and Kafka) and processed downstream using Spark Streaming.

Requirements

- IDE
- Apache Maven 3.x
- JVM 6 or 7

General Info

The source folder is organized into two packages, Kafka and Streaming. Each class in the Streaming package explores a different approach to consuming data from the Twitter source. The classes are listed below:

com/stdatalabs/Kafka
- KafkaTwitterProducer.java -- A Kafka producer that publishes Twitter data to a Kafka broker
com/stdatalabs/Streaming
- SparkPopularHashTags.scala -- Receives data from the Twitter data source
- FlumeSparkPopularHashTags.scala -- Receives data from the Flume Twitter producer
- KafkaSparkPopularHashTags.scala -- Receives data from the Kafka producer
- RecoverableKafkaPopularHashTags.scala -- Spark-Kafka receiver-based approach. Ensures at-least-once semantics
- KafkaDirectPopularHashTags.scala -- Spark-Kafka direct approach. Ensures exactly-once semantics
TwitterAvroSource.conf -- Flume configuration for running the Twitter Avro source

Description

A Spark Streaming application that receives tweets matching certain keywords from the Twitter data source and finds the popular hashtags.

Discussed in blog -- Spark Streaming part 1: Real time twitter sentiment analysis
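
For illustration, the core counting step can be modelled in plain Java without Spark. This is a minimal sketch, not the code in SparkPopularHashTags.scala: it assumes hashtags are space-separated tokens starting with '#', and the class name WindowedHashTagCounter, the window length and the top-N cutoff are hypothetical. Each call to addBatch plays the role of one micro-batch, and the deque of per-batch counts stands in for a sliding window such as the one reduceByKeyAndWindow maintains in Spark Streaming. It needs Java 9 or later (List.of), independently of the project's JVM requirement.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of this repository: windowed hashtag counting without Spark
public class WindowedHashTagCounter {
    private final int windowBatches;  // window length, measured in micro-batches
    private final Deque<Map<String, Integer>> window = new ArrayDeque<>();

    public WindowedHashTagCounter(int windowBatches) {
        this.windowBatches = windowBatches;
    }

    // Simple extraction: split on spaces and keep tokens that start with '#'
    public static List<String> hashTags(String tweet) {
        return Arrays.stream(tweet.split(" "))
                .filter(t -> t.startsWith("#") && t.length() > 1)
                .collect(Collectors.toList());
    }

    // Count the hashtags of one micro-batch and slide the window forward by one batch
    public void addBatch(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String tag : hashTags(tweet)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        window.addLast(counts);
        if (window.size() > windowBatches) {
            window.removeFirst();  // evict the oldest batch once the window is full
        }
    }

    // Sum counts across the window; return the topN tags, most frequent first, ties by name
    public List<Map.Entry<String, Integer>> top(int topN) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> batch : window) {
            batch.forEach((tag, c) -> total.merge(tag, c, Integer::sum));
        }
        return total.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        WindowedHashTagCounter counter = new WindowedHashTagCounter(2);
        counter.addBatch(List.of("#java old news"));
        counter.addBatch(List.of("loving #spark and #kafka", "#spark streaming"));
        counter.addBatch(List.of("#flume to #spark"));
        // The first batch has left the window, so #java is no longer counted
        System.out.println(counter.top(2));  // prints [#spark=3, #flume=1]
    }
}
```

Keeping one count map per batch makes evicting the oldest batch cheap. Spark's reduceByKeyAndWindow also has a variant that takes an inverse reduce function, which subtracts the expiring batch instead of re-summing the whole window.
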

A Spark Streaming-Flume integration that finds popular hashtags from Twitter. It receives events from a Flume source that connects to Twitter and pushes tweets as Avro events to a sink.

Discussed in blog -- Spark streaming part 2: Real time twitter sentiment analysis using Flume

A Spark Streaming-Kafka integration that receives Twitter data from a Kafka producer and finds the popular hashtags.

Discussed in blog -- Spark streaming part 3: Real time twitter sentiment analysis using kafka

A Spark Streaming-Kafka integration (receiver-based approach) that ensures at-least-once semantics.

Discussed in blog -- Data guarantees in Spark Streaming with kafka integration
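
To see why the receiver-based approach is at-least-once rather than exactly-once, it helps to model the failure case. The Spark Streaming + Kafka integration guide notes that, even with write-ahead logs enabled, some records may be consumed twice after certain failures, because the data Spark has reliably received and the offsets tracked in ZooKeeper can get out of step. The plain-Java sketch below is a simplified model of that effect, not code from RecoverableKafkaPopularHashTags.scala: the class name and the single simulated crash are hypothetical, and it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation (no Spark or Kafka involved): output is written first and the
// consumer offset is committed afterwards, so a crash in between causes a replay.
public class AtLeastOnceDemo {

    // Processes every record of `log`; crashes once, right after writing record `crashAt`
    public static List<String> run(List<String> log, int crashAt) {
        List<String> sink = new ArrayList<>();  // external output, e.g. a results table
        int committed = 0;                      // committed offset, survives the crash
        boolean crashed = false;
        while (committed < log.size()) {
            int offset = committed;             // (re)start from the last committed offset
            sink.add(log.get(offset));          // 1. write the result for this record
            if (!crashed && offset == crashAt) {
                crashed = true;                 // 2. crash before the offset is committed
                continue;                       //    the restart re-reads the same record
            }
            committed = offset + 1;             // 3. commit the offset after the write
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("#a", "#b", "#c"), 1));  // prints [#a, #b, #b, #c]
    }
}
```

Every record reaches the sink at least once, but "#b" is written twice. For hashtag counts, a replayed batch like this can inflate the counts unless the output is deduplicated.
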

A Spark Streaming-Kafka integration (direct approach) that ensures exactly-once semantics.

Discussed in blog -- Data guarantees in Spark Streaming with kafka integration
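
The direct approach tracks Kafka offset ranges in Spark itself instead of in ZooKeeper, so each batch is defined by exactly which offsets it covers. According to the Spark Streaming + Kafka integration guide, exactly-once output additionally requires the output operation to be idempotent, or to save results and offsets in one atomic transaction. The sketch below models the second option in plain Java. It is not code from KafkaDirectPopularHashTags.scala; the Store class stands in for a transactional data store and, like the other names, is hypothetical. It needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the "save results and offsets together" pattern
public class ExactlyOnceDemo {

    // Stand-in for an external store: results and the next offset live side by side
    static final class Store {
        final List<String> results = new ArrayList<>();
        int nextOffset = 0;

        // Models one transaction: both fields are updated together, or neither is
        void commit(String result, int offset) {
            results.add(result);
            nextOffset = offset + 1;
        }
    }

    // Processes every record of `log`; crashes once, before committing record `crashAt`
    public static List<String> run(List<String> log, int crashAt) {
        Store store = new Store();
        boolean crashed = false;
        while (store.nextOffset < log.size()) {
            int offset = store.nextOffset;      // resume from the offset saved with the results
            String result = log.get(offset);    // process the record
            if (!crashed && offset == crashAt) {
                crashed = true;                 // crash: the transaction never happens
                continue;
            }
            store.commit(result, offset);       // result and offset saved atomically
        }
        return store.results;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("#a", "#b", "#c"), 1));  // prints [#a, #b, #c]
    }
}
```

Because the result and the next offset are saved together, a crash can only happen before or after that update, never between the two, so the replayed record is written exactly once. With a real database, commit would correspond to a single transaction.
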

More articles on the Hadoop technology stack at stdatalabs