Twitter sentiment analysis using Spark and Stanford CoreNLP and visualization using elasticsearch and kibana

stdatalabs/sparkNLP-elasticsearch

SparkTwitterPopularHashTags

A Spark Streaming project that finds popular hashtags in live Twitter data streams. Data is ingested from several input sources (the Twitter source, Flume, and Kafka) and processed downstream with Spark Streaming.
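The core computation, counting hashtags over a batch of tweets and ranking them, can be illustrated without Spark. The following is a minimal plain-Java sketch, not code from this repository: the class name `HashtagCounter`, its methods, and the sample tweets are hypothetical, and it assumes hashtags are whitespace-separated tokens that start with `#` (trailing punctuation is not stripped). It avoids lambdas so that it compiles on the Java 7 listed under Requirements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the repository.
public class HashtagCounter {

    /** Counts tokens that start with '#', ignoring case. */
    public static Map<String, Integer> countHashtags(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String tweet : tweets) {
            for (String token : tweet.trim().split("\\s+")) {
                // Skip ordinary words and a bare "#".
                if (token.length() > 1 && token.startsWith("#")) {
                    String tag = token.toLowerCase();
                    Integer current = counts.get(tag);
                    counts.put(tag, current == null ? 1 : current + 1);
                }
            }
        }
        return counts;
    }

    /** Returns up to n hashtags, most frequent first; ties are ordered alphabetically. */
    public static List<Map.Entry<String, Integer>> topHashtags(List<String> tweets, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(countHashtags(tweets).entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Stand-in for one batch (window) of tweets from a stream.
        List<String> window = Arrays.asList(
                "Learning #Spark streaming with #Kafka",
                "#spark makes stream processing easier",
                "Ingesting tweets through #Flume and #kafka",
                "#Spark rocks");
        for (Map.Entry<String, Integer> entry : topHashtags(window, 3)) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

In a streaming job the same split, filter, count, and sort steps would typically run once per batch or sliding window rather than over a fixed list.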

Requirements

  • IDE
  • Apache Maven 3.x
  • JVM 6 or 7

General Info

The source folder is organized into two packages, Kafka and Streaming. Each class in the Streaming package explores a different approach to consuming data from the Twitter source. The classes are listed below:

  • com/stdatalabs/Kafka
    • KafkaTwitterProducer.java -- A Kafka producer that publishes Twitter data to a Kafka broker
  • com/stdatalabs/Streaming
    • SparkPopularHashTags.scala -- Receives data from the Twitter data source
    • FlumeSparkPopularHashTags.scala -- Receives data from the Flume Twitter producer
    • KafkaSparkPopularHashTags.scala -- Receives data from the Kafka producer
    • RecoverableKafkaPopularHashTags.scala -- Spark-Kafka receiver-based approach. Ensures at-least-once semantics
    • KafkaDirectPopularHashTags.scala -- Spark-Kafka direct approach. Ensures exactly-once semantics
  • TwitterAvroSource.conf -- Flume configuration for running the Twitter Avro source
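The difference between at-least-once and exactly-once semantics mentioned in the list above can be shown with a small simulation. This is a hedged plain-Java sketch, not the repository's code and not the Spark or Kafka API: `DeliverySemanticsSketch`, `NaiveSink`, and `OffsetTrackingSink` are hypothetical names, and it assumes a single partition whose records carry increasing offsets. After a simulated restart some records are delivered a second time; the naive sink counts them again, while the offset-tracking sink skips anything it has already applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the repository.
public class DeliverySemanticsSketch {

    private static void increment(Map<String, Integer> counts, String key) {
        Integer current = counts.get(key);
        counts.put(key, current == null ? 1 : current + 1);
    }

    /** Applies every delivered record, so replayed records are counted twice. */
    public static class NaiveSink {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();

        public void apply(long offset, String hashtag) {
            increment(counts, hashtag);
        }

        public int count(String hashtag) {
            Integer value = counts.get(hashtag);
            return value == null ? 0 : value;
        }
    }

    /** Remembers the highest applied offset and ignores records at or below it. */
    public static class OffsetTrackingSink {
        private final Map<String, Integer> counts = new HashMap<String, Integer>();
        private long lastApplied = -1L;

        public void apply(long offset, String hashtag) {
            if (offset <= lastApplied) {
                return; // already applied before the restart
            }
            increment(counts, hashtag);
            lastApplied = offset;
        }

        public int count(String hashtag) {
            Integer value = counts.get(hashtag);
            return value == null ? 0 : value;
        }
    }

    public static void main(String[] args) {
        String[] tags = {"#spark", "#kafka", "#spark"};
        NaiveSink naive = new NaiveSink();
        OffsetTrackingSink tracked = new OffsetTrackingSink();

        // First delivery of offsets 0..2.
        for (int offset = 0; offset < tags.length; offset++) {
            naive.apply(offset, tags[offset]);
            tracked.apply(offset, tags[offset]);
        }
        // Simulated restart: offsets 1..2 are delivered again.
        for (int offset = 1; offset < tags.length; offset++) {
            naive.apply(offset, tags[offset]);
            tracked.apply(offset, tags[offset]);
        }

        System.out.println("naive #spark = " + naive.count("#spark"));
        System.out.println("tracked #spark = " + tracked.count("#spark"));
    }
}
```

In a real pipeline the results and the last applied offset would need to be stored together atomically (or the output made idempotent) for the end-to-end result to be exactly-once; the in-memory field here only stands in for that.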

Description

More articles on the Hadoop technology stack are available at stdatalabs.
