Streaming Twitter data in near real-time using Apache Spark Streaming API

martinywwan/spark-twitter-streaming

Streaming Twitter data using Apache Spark


Synopsis


A simple Spark application that connects to Twitter and prints tweets, optionally restricted by a filter.
The application can be run as a standalone application or on Hadoop.

Motivation


This project aims to help developers and researchers connect to Twitter using Apache Spark.

Execution


Prerequisites:
1) If you are running on Hadoop, ensure that ${HADOOP_CONF_DIR} and ${HADOOP_HOME} are set
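
The prerequisite above can be checked before launching. The following is a minimal sketch, not part of this project: the class name HadoopEnvCheck and the method missingVariables are hypothetical, and the check only confirms that the two variables are non-empty, not that they point to a valid Hadoop installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of this repository).
final class HadoopEnvCheck {
    // The two variables named in the prerequisite.
    static final String[] REQUIRED = {"HADOOP_CONF_DIR", "HADOOP_HOME"};

    // Returns the names of required variables that are absent or blank in env.
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the environment of the current process.
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("Hadoop environment variables are set.");
        } else {
            System.out.println("Missing: " + String.join(", ", missing));
        }
    }
}
```

Passing the environment in as a map, rather than calling System.getenv() inside the helper, keeps the check easy to exercise with made-up values.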

Instructions to run the application using an IDE:
1) Edit the run configuration to include the following arguments: [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]
2) Run the SparkApplication class, which contains the main method (optional: edit the FILTERS array to filter the tweets received)
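
To illustrate how the four arguments above might be consumed, here is a minimal sketch. It assumes the application authenticates through Twitter4J, which reads OAuth credentials from the twitter4j.oauth.* system properties. The class name TwitterCredentials and the method configure are hypothetical; the actual SparkApplication class may handle its arguments differently.

```java
// Hypothetical helper (not part of this repository).
final class TwitterCredentials {
    // System property names read by Twitter4J's default configuration,
    // listed in the same order as the program arguments.
    static final String[] OAUTH_PROPERTIES = {
        "twitter4j.oauth.consumerKey",
        "twitter4j.oauth.consumerSecret",
        "twitter4j.oauth.accessToken",
        "twitter4j.oauth.accessTokenSecret"
    };

    // Copies args[0..3] into the matching system properties.
    static void configure(String[] args) {
        if (args.length < OAUTH_PROPERTIES.length) {
            throw new IllegalArgumentException(
                "Usage: <consumerKey> <consumerSecret> <accessToken> <accessTokenSecret>");
        }
        for (int i = 0; i < OAUTH_PROPERTIES.length; i++) {
            System.setProperty(OAUTH_PROPERTIES[i], args[i]);
        }
    }

    public static void main(String[] args) {
        configure(args);
        System.out.println("Twitter OAuth properties set.");
    }
}
```

Failing fast on a missing argument avoids a less obvious authentication error later, once the stream tries to connect.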

Instructions to run the application on the command line:
1) Ensure Maven is installed, then run "mvn clean package"
2) The target folder should now contain a jar file with dependencies. Run "java -jar [generated_jar].jar [args0 - consumerKey] [args1 - consumerSecret] [args2 - accessToken] [args3 - accessTokenSecret]"

