Twitter clustering as a showcase for Apache Storm vs. Heron (Big Data Frameworks)

Twitter Analysis

This is an example project for my blog.

It compares Apache Storm with Twitter's Heron by analyzing a bunch of structured tweets, filtering them by "Americans", and trying to figure out whether they are Republican, Democrat, or Undecided. The classification uses a prototype-based scoring algorithm that learns by analyzing Hillary Clinton's and Donald Trump's recent tweets.
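To make the idea of prototype-based scoring concrete, here is a minimal sketch. It is not the code from this repository: the class name `PrototypeScorer`, the bag-of-words prototypes, the cosine similarity measure, and the `margin` parameter are all assumptions chosen for illustration, and the example strings in the usage note are invented toy inputs rather than real tweets.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of prototype-based scoring (not the project's actual implementation).
final class PrototypeScorer {

    // Lowercase the text and split it into simple word tokens.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9#@ ]", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Build a term-frequency vector (the "prototype") from a list of example tweets.
    static Map<String, Double> prototype(List<String> tweets) {
        Map<String, Double> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String token : tokenize(tweet)) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Score a tweet against both prototypes; scores closer than `margin` count as "Undecided".
    static String classify(String tweet, Map<String, Double> republican,
                           Map<String, Double> democrat, double margin) {
        Map<String, Double> vector = prototype(Collections.singletonList(tweet));
        double r = cosine(vector, republican);
        double d = cosine(vector, democrat);
        if (Math.abs(r - d) < margin) {
            return "Undecided";
        }
        return r > d ? "Republican" : "Democrat";
    }
}
```

In this sketch one prototype would be built from each candidate's recent tweets, for example `PrototypeScorer.prototype(trumpTweets)` and `PrototypeScorer.prototype(clintonTweets)`, and every incoming tweet would then be passed to `classify` with a small margin such as `0.05`. The margin value is a guess and would need tuning.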

The analysis is configured through the "config.properties" file in the resources folder.
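For readers unfamiliar with `.properties` files, the sketch below shows one way such a file can be read with the standard `java.util.Properties` class. The class name `ConfigSketch`, the helper `intOrDefault`, and the key `example.limit` are hypothetical; the actual keys used by this project are defined in config-example.properties and are not reproduced here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for reading a .properties file (not the project's actual loader).
final class ConfigSketch {

    // Parse key=value pairs from any Reader, e.g. a FileReader or a StringReader.
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Read an integer setting, falling back to a default when the key is absent or blank.
    static int intOrDefault(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        return Integer.parseInt(raw.trim());
    }
}
```

A call such as `ConfigSketch.load(new java.io.FileReader(path))` would read the file from disk, where `path` stands for wherever the configuration file has been copied.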

Install

Copy the example files, fill in auth.properties with your Twitter API credentials, adjust config.properties as needed, and build:

$ cp src/main/resources/auth-example.properties ~/auth.properties
$ vi ~/auth.properties
$ cp src/main/resources/config-example.properties ~/config.properties
$ vi ~/config.properties
$ mvn clean install -Dmaven.test.skip=true -Pprod

Storm

$ storm jar target/TwitterAnalysis-1.0-SNAPSHOT.jar com.otterinasuit.twitter.Main ~/auth.properties

Heron

Follow this. Remove or comment out the Storm dependency in pom.xml.

$ heron submit local TwitterAnalysis-1.0-SNAPSHOT.jar com.otterinasuit.twitter.Main TwitterAnalysis --topology-args ~/auth.properties

Disclaimer

This is a WORK IN PROGRESS AND WILL PROBABLY NOT RUN YET, OR WILL PRODUCE MEDIOCRE RESULTS.