Last updated October 2017.
A word count topology originally forked from the storm-starter project and modified for a St. Louis Hadoop User Group presentation.
Outline:
- RandomSentenceSpout: emits random sentence tuples
- SplitSentenceBolt: splits each sentence into word tuples
- WordCountBolt: keeps track of counts for each word and emits (word, count) tuples
- OutputBolt: logs the current word and its count
- SolrIndexerBolt: indexes the current word and its count in Solr
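The split and count steps in the outline can be sketched in plain Java, without any Storm classes. This is an illustration only, not the project's bolt code: the class name `WordCountSketch`, its method names, and the whitespace-based splitting rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SplitSentenceBolt -> WordCountBolt stages.
// No Storm dependency: sentences go in, running (word, count) totals come out.
class WordCountSketch {

    // Split step: break a sentence into words on whitespace (assumed rule).
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Count step: keep a running total per word, as a counting bolt would.
    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "the cow jumped over the moon",
                "the moon is full");
        // Output step: print each word with its current count.
        countWords(sentences).forEach((word, count) ->
                System.out.println(word + "\t" + count));
    }
}
```

In a Storm topology the counting stage is typically fed with a fields grouping on the word, so that a given word always reaches the same task. The in-memory map here plays the role of that per-task state.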
I developed this topology on Windows 10 using the following:
- IntelliJ Community Edition 2017.2
- Storm 1.0.5 (local mode)
- Solr 7.0.1 + Banana ?.?.? (installed separately)
- Start Solr with .\bin\solr.cmd start
- Bring up the Solr admin UI: http://localhost:8983/solr/#/
- Bring up Banana: http://localhost:8983/solr/banana/#/dashboard/solr/Word%20Dashboard?server=%2Fsolr%2F
- Set auto-refresh to 3s and the time window to the past 5 min
- Clean up word_count_collection: .\bin\solr.cmd delete -c word_count_collection
- Recreate word_count_collection: .\bin\solr.cmd create -c word_count_collection -d word_count_configs
- Open up project in IntelliJ and debug WordCountTopology
- Demo!
- Stop Solr: .\bin\solr.cmd stop -all
Clone the project and open it in Eclipse. Make sure you can run a Maven build without errors.
- Open com.kitmenke.storm.WordCountTopology
- Right click on the class, Debug As -> Java Application
- Build the project using:
mvn clean package
- Upload the jar to the cluster
- Submit the topology:
storm jar storm-stlhug-demo-0.0.1-SNAPSHOT.jar com.kitmenke.storm.WordCountTopology WordCountTopology
Download Solr from https://lucene.apache.org/solr/ and install it by following https://lucene.apache.org/solr/guide/7_0/installing-solr.html#installing-solr
.\bin\solr.cmd start
Validate you can get to the Solr UI: http://localhost:8983/solr/
Copy word_count_configs to solr-7.0.1\server\solr\configsets. For an example schema, see https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.0.0/solr/server/solr/configsets/basic_configs/conf/managed-schema
.\bin\solr.cmd create -c banana-int
.\bin\solr.cmd create -c word_count_collection -d word_count_configs
Install Banana from https://github.com/LucidWorks/banana/
Browse to Banana: http://localhost:8983/solr/banana/#/dashboard
Import Word Dashboard.json into Banana to create the dashboard.
Try creating an example document using the Solr admin UI: http://127.0.0.1:8983/solr/#/word_count_collection_shard1_replica1/documents
{
"id": "zombie",
"count": 24,
"updated": "2015-09-05T21:28:00Z"
}
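A document with the same three fields can also be assembled in code before it is sent to Solr. The sketch below only builds the JSON string. It does not call Solr or SolrJ, and the names `WordDocumentSketch`, `escape`, and `toJson` are hypothetical. It assumes the `updated` field takes an ISO-8601 UTC timestamp truncated to seconds, as in the example above.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper that formats a (word, count) pair as a JSON document
// with the id / count / updated fields used in the example above.
class WordDocumentSketch {

    // Minimal escaping for backslashes and double quotes only;
    // control characters are not handled in this sketch.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String toJson(String word, long count, Instant updated) {
        // Instant.toString() yields ISO-8601 in UTC, e.g. 2015-09-05T21:28:00Z.
        String timestamp = updated.truncatedTo(ChronoUnit.SECONDS).toString();
        return "{\"id\": \"" + escape(word) + "\", "
                + "\"count\": " + count + ", "
                + "\"updated\": \"" + timestamp + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(
                toJson("zombie", 24, Instant.parse("2015-09-05T21:28:00Z")));
    }
}
```

Using the word as the `id` means that re-indexing a word overwrites its previous document, assuming `id` is the collection's unique key. That keeps one document per word with its latest count.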
Delete a collection and all the data in it:
.\bin\solr.cmd delete -c word_count_collection