The purpose of this project is to get a global view of how Apache Storm works.
The structure of the topology created is as follows:
- TwitterSpout receives a stream of tweets that match the configured pattern, using the Twitter4j library.
- AnalyzeTweetsBolt receives the tweets from the spout and filters them, keeping tweets with more than 140 characters and tweets written by verified accounts. In the first case the results are emitted on a stream called status, and in the second case on a stream called user.
- ExtractMentionsBolt receives the status stream from AnalyzeTweetsBolt and prints the id and the number of mentions for each tweet.
- ExtractLocationBolt receives the user stream from AnalyzeTweetsBolt and, if the location contains a predefined pattern, prints it to the console.
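The per-tweet decisions described above can be sketched in plain Java. This is only an illustration and not the project's actual code: it uses no Storm or Twitter4j types, the class and method names (TweetRouting, streamsFor, countMentions, locationMatches) are hypothetical, and it assumes that a tweet meeting both conditions goes to both streams, that a mention is an "@" followed by word characters, and that the location match is case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the bolt logic without depending on Storm or Twitter4j.
final class TweetRouting {
    // Stream names taken from the topology description.
    static final String STATUS_STREAM = "status";
    static final String USER_STREAM = "user";

    // Assumption: a mention is '@' followed by letters, digits or underscores.
    private static final Pattern MENTION = Pattern.compile("@\\w+");

    // AnalyzeTweetsBolt step: decide which streams a tweet would be emitted on.
    // Assumption: a long tweet from a verified account is emitted on both streams.
    static List<String> streamsFor(String text, boolean verified) {
        List<String> streams = new ArrayList<>();
        if (text.length() > 140) {
            streams.add(STATUS_STREAM);
        }
        if (verified) {
            streams.add(USER_STREAM);
        }
        return streams;
    }

    // ExtractMentionsBolt step: count the mentions in the tweet text.
    static int countMentions(String text) {
        Matcher matcher = MENTION.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // ExtractLocationBolt step: check whether the location contains the pattern.
    // Assumption: the comparison ignores case; a missing location never matches.
    static boolean locationMatches(String location, String pattern) {
        return location != null
                && location.toLowerCase(Locale.ROOT).contains(pattern.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String text = "Hello @storm and @twitter4j";
        System.out.println(streamsFor(text, true));
        System.out.println(countMentions(text));
        System.out.println(locationMatches("Madrid, Spain", "madrid"));
    }
}
```

In a real topology each method would sit inside the corresponding bolt, with the returned stream names used to choose the stream a tuple is emitted on.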
To run the project, follow these steps (the easiest way):
- Clone this repo.
- Open the project with your favorite IDE.
- Rename/copy src/main/resources/twitter4j.properties.example to twitter4j.properties and set your Twitter keys. You can create an app for Twitter here.
- In the run configuration, set the word or words you want to use as a filter. Try popular words (e.g. "Donald Trump") or trending-topic hashtags to watch the real-time effect.
- Run and enjoy! You should see something like this:
Ángel Francisco Sánchez Granados. Data Engineering - Big Data. Master's Degree in Computer Engineering - URJC - 2017/18