Storm Demo - St. Louis Hadoop User Group

Last updated October 2017.

A word count topology originally forked from the storm-starter project and modified for a St. Louis Hadoop User Group presentation.

Outline:

RandomSentenceSpout: emits random sentence tuples
SplitSentenceBolt: splits each sentence into word tuples
WordCountBolt: keeps track of counts for each word and emits (word, count) tuples
OutputBolt: logs the current word and its count
SolrIndexerBolt: indexes the current word and its count in Solr
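
The split, count, and log steps in the outline above can be sketched in plain Java. This is only an illustration of what the data flow computes, not the project's actual bolt code: it uses no Storm classes, and the class name WordCountSketch, its methods, and the sample sentence are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the word count pipeline; no Storm classes are used.
// All names here are hypothetical and chosen for illustration only.
class WordCountSketch {
    // Running totals, playing the role of WordCountBolt's in-memory state.
    private final Map<String, Integer> counts = new HashMap<>();

    // SplitSentenceBolt step: break a sentence into words on whitespace.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // WordCountBolt step: increment the word's count and return the new total.
    int count(String word) {
        return counts.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        for (String word : split("the cow jumped over the moon")) {
            // OutputBolt step: log the current word and its count.
            System.out.println(word + " " + sketch.count(word));
        }
    }
}
```

Running main prints one line per word, and the second occurrence of "the" is reported with a count of 2. In the real topology each step runs in its own bolt, so the counting state lives inside WordCountBolt rather than in a shared object like this one.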

Environment

I developed this topology on Windows 10 using the following:

Start Solr: .\bin\solr.cmd start
Open the Solr admin UI: http://localhost:8983/solr/#/
Open Banana: http://localhost:8983/solr/banana/#/dashboard/solr/Word%20Dashboard?server=%2Fsolr%2F
Set auto-refresh to 3s and the time range to the past 5 min
Clean up word_count_collection: .\bin\solr.cmd delete -c word_count_collection
Recreate word_count_collection: .\bin\solr.cmd create -c word_count_collection -d word_count_configs
Open the project in IntelliJ and debug WordCountTopology
Demo!
Stop Solr: .\bin\solr.cmd stop -all

Clone the project and open it in Eclipse. Make sure you can run a Maven build without errors.

Build the project using mvn clean package
Upload the jar to the cluster
Submit the topology: storm jar storm-stlhug-demo-0.0.1-SNAPSHOT.jar com.kitmenke.storm.WordCountTopology WordCountTopology

.\bin\solr.cmd start

Validate you can get to the Solr UI: http://localhost:8983/solr/

.\bin\solr.cmd create -c banana-int
.\bin\solr.cmd create -c word_count_collection -d word_count_configs

Import Word Dashboard.json into Banana to create the dashboard.

{
  "id": "zombie",
  "count": 24,
  "updated": "2015-09-05T21:28:00Z"
}
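
The JSON above has three fields: id, count, and updated. As a rough sketch, a document of that shape could be assembled in plain Java as shown here. This assumes that id holds the word and that updated is an ISO-8601 UTC timestamp; the class WordCountDocSketch and its helpers are hypothetical and are not taken from SolrIndexerBolt.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mirrors the sample document's three fields.
class WordCountDocSketch {
    // Build the field map for one word; insertion order is preserved.
    // Assumption: the word itself is used as the document id.
    static Map<String, Object> toDoc(String word, int count, Instant updated) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", word);
        doc.put("count", count);
        doc.put("updated", updated.toString()); // ISO-8601 in UTC
        return doc;
    }

    // Render the map as a JSON object. Numbers are written bare and all other
    // values are quoted; this assumes values contain nothing that needs escaping.
    static String toJson(Map<String, Object> doc) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append('"').append(e.getKey()).append("\": ");
            Object v = e.getValue();
            if (v instanceof Number) {
                sb.append(v);
            } else {
                sb.append('"').append(v).append('"');
            }
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Instant when = Instant.parse("2015-09-05T21:28:00Z");
        System.out.println(toJson(toDoc("zombie", 24, when)));
    }
}
```

Using the word as the id would mean each new count overwrites the previous document for that word, since Solr replaces documents that share a unique key. Whether the demo relies on that behavior is not stated here.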

Delete a collection and all data in it

.\bin\solr.cmd delete -c word_count_collection