Twitter Stream Classifier

A Big Data pipeline (set of applications) that pulls tweets related to some movies, drops the tweets to Kafka, classifies them as positive/negative and, finally, exports counters and the tweets to Cassandra. Simple Web UI application to view the results is also included.

The purpose of this repository is to give a baseline to the community in development of In-Stream Big Data Pipelines. It can be taken as-is a prototype and filled in with custom logic. If you faced issues or feel that something can be improved - contributions are welcomed.

Modules

test-utils

Embedded Redis, Spark, Cassandra and some other testing stuff which is reused in tests across other modules.

twitter-consumer

A tool which searches tweets related to specified movie and keeps tracking them in real time. The tweets are pushed to Kafka. For more details, see module documentation

dictionary-populator

A tool and DAO to work with classification model dictionary and coefficients in Cassandra. The tool is used to import the model from TSV files. Find more details in module documentation

streaming

Spark Streaming application which classifies tweets from Kafka and sends results to Cassandra. Find more details, including Cassandra schemas in module documentation

Commands

How to Build the Project

The project does not have explicit build environment limitations. But it is confirmed to work with:

Linux, Windows + Cygwin, MacOS
Maven 3+
Java 1.7+

# the build process is very simple
mvn clean install
# or without tests
mvn clean install -DskipTests -DskipITs

Note, that clean is mandatory if tests are run due to peculiarities of Cassandra Maven Plugin and random ports reservation.

Work with license headers

There is an automatic check that files have proper license headers. You can either add the header manually, or use following command:

mvn license:format

License

Apache License Version 2.0. See LICENSE.txt

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
cassandra-schema		cassandra-schema
dictionary-populator		dictionary-populator
streaming		streaming
test-utils		test-utils
twitter-consumer		twitter-consumer
webclient		webclient
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cassandra-schema

cassandra-schema

dictionary-populator

dictionary-populator

streaming

streaming

test-utils

test-utils

twitter-consumer

twitter-consumer

webclient

webclient

.gitignore

.gitignore

LICENSE.txt

LICENSE.txt

README.md

README.md

pom.xml

pom.xml

Repository files navigation

Twitter Stream Classifier

Modules

test-utils

twitter-consumer

dictionary-populator

streaming

Commands

How to Build the Project

Work with license headers

License

About

Releases

Packages

Contributors 2

Languages

License

griddynamics/realtime-twitter-sentiment-analysis-example

Folders and files

Latest commit

History

Repository files navigation

Twitter Stream Classifier

Modules

test-utils

twitter-consumer

dictionary-populator

streaming

Commands

How to Build the Project

Work with license headers

License

About

Resources

License

Stars

Watchers

Forks

Languages