Storm Demo - St. Louis Hadoop User Group

A word count topology originally forked from the storm-starter project and modified for a St. Louis Hadoop User Group presentation. The topology is made up of the following components:


  1. RandomSentenceSpout: emits random sentence tuples
  2. SplitSentenceBolt: splits each sentence into word tuples
  3. WordCountBolt: keeps track of counts for each word and emits (word, count) tuples
  4. OutputBolt: logs the current word and its count
  5. SolrIndexerBolt: indexes the current word and its count in Solr
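The five components form a linear pipeline. As an illustration only, the sketch below simulates the first four stages in plain Java with no Storm dependency. The sentence pool, the whitespace split, and the single in-memory map are assumptions, not this project's bolt code, and the Solr indexing stage is left out. In a real Storm topology the counting bolt typically receives words through a fields grouping on the word, so each word always reaches the same task; the single map plays that role here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Plain-Java simulation of the word count data flow; no Storm classes are used.
final class WordCountPipelineSketch {

    // Hypothetical sentence pool standing in for RandomSentenceSpout's sentences.
    private static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago"
    };

    // Stage 1 (spout): pick a random sentence.
    static String nextSentence(Random random) {
        return SENTENCES[random.nextInt(SENTENCES.length)];
    }

    // Stage 2 (split bolt): break a sentence into words on whitespace, skipping empties.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Stage 3 (count bolt): increment the running count and return the new value,
    // i.e. the count that would be emitted in a (word, count) tuple.
    static int count(Map<String, Integer> counts, String word) {
        return counts.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            for (String word : split(nextSentence(random))) {
                int current = count(counts, word);
                // Stage 4 (output bolt): log the current word and its count.
                System.out.println(word + " -> " + current);
            }
        }
    }
}
```

Each stage is a small static method so it can be exercised on its own, much as each bolt can be unit tested separately (for example with the TestNG plugin listed below).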


I developed this topology on Windows 10 using the following:

  • Eclipse Mars IDE for Java Developers (includes Maven and Git)
  • TestNG plugin
  • Oracle VirtualBox
  • Hortonworks HDP 2.3 Sandbox
  • Solr 5.2.1 (already installed on the HDP 2.3 sandbox)
  • Storm (already installed on the HDP 2.3 sandbox)
  • Banana 1.5.0 (install instructions below)

Running the topology in local mode

Clone the project and open it in Eclipse. Make sure you can run a Maven build without errors.

  1. Open com.kitmenke.storm.WordCountTopology
  2. Right-click the class and choose Debug As -> Java Application

Running the topology on a cluster

For testing I'm using the Hortonworks HDP 2.3 Sandbox.

  1. Build the project using mvn clean package
  2. Upload the jar to the cluster
  3. Submit the topology: storm jar storm-stlhug-demo-0.0.1-SNAPSHOT.jar com.kitmenke.storm.WordCountTopology WordCountTopology
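Step 3 passes WordCountTopology after the class name, while the Eclipse launch in local mode passes no arguments. storm-starter topologies commonly use that difference to choose between Storm's LocalCluster and StormSubmitter. Assuming this project keeps that convention (its main method is not shown here), the decision can be sketched in plain Java, with the Storm calls replaced by descriptive strings:

```java
// Hypothetical sketch of storm-starter style argument handling; no Storm calls are made.
final class SubmitModeSketch {

    // Describes what the topology's main method would do for the given arguments.
    static String chooseMode(String[] args) {
        if (args != null && args.length > 0) {
            // A topology name was supplied (e.g. "WordCountTopology" from storm jar):
            // a real topology would call StormSubmitter with that name here.
            return "cluster:" + args[0];
        }
        // No arguments, as when launched from Eclipse: a real topology would
        // start an in-process LocalCluster here.
        return "local";
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(args));
    }
}
```

Keying off the argument count lets the same class serve both the Eclipse debug launch and the storm jar submission without code changes.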

Solr and Banana

As part of the demo, we'll show how to index data in Solr. The HDP 2.3 Sandbox comes with Solr installed in /opt/lucidworks-hdpsearch/solr. We will need to install Banana ourselves.

Switch to the solr user and change to the Solr install directory:

su - solr
cd /opt/lucidworks-hdpsearch/solr

Install Banana version 1.5.0:

mkdir /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/
unzip -d /opt/lucidworks-hdpsearch/solr/server/solr-webapp/webapp/banana/

Start Solr in "cloud mode" using the local ZooKeeper instance on port 2181:

bin/solr start -c -z localhost:2181

Browse to the Solr UI:

Using your favorite SCP tool (such as WinSCP), copy the banana-int and word_count_collection folders (found in multilang) to /opt/lucidworks-hdpsearch/solr on the server. Then run the script to upload the configs to ZooKeeper:

server/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd upconfig -confname banana-int -confdir banana-int
server/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd upconfig -confname word_count_collection -confdir word_count_collection

Confirm that the configs uploaded correctly using the tree view: you should see two new folders under the /configs directory.

Create the Banana Dashboards collection in Solr:

Note: The solrconfig.xml included with Banana was broken. I used the schema.xml from banana-int-solr-4.5/banana-int/conf and the default solrconfig.xml.

Create the Word Count collection in Solr:
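The create commands are not reproduced in this copy. As a hedged sketch, SolrCloud collections can be created through the Collections API with action=CREATE, pointing collection.configName at a config set already uploaded to ZooKeeper. The helper below only builds the request URL; the base URL http://localhost:8983/solr, the collection names (assumed equal to the config names banana-int and word_count_collection), and the single shard and replica are all assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds (but does not send) a Collections API CREATE request URL.
final class CollectionsApiSketch {

    static String createUrl(String solrBase, String name, String configName,
                            int numShards, int replicationFactor) {
        // LinkedHashMap keeps the parameters in a predictable order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", Integer.toString(numShards));
        params.put("replicationFactor", Integer.toString(replicationFactor));
        // Links the new collection to a config set already uploaded to ZooKeeper.
        params.put("collection.configName", configName);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return solrBase + "/admin/collections?" + query;
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed sandbox Solr URL
        System.out.println(createUrl(base, "banana-int", "banana-int", 1, 1));
        System.out.println(createUrl(base, "word_count_collection", "word_count_collection", 1, 1));
    }
}
```

The printed URLs could then be opened in a browser or passed to curl on the sandbox.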

Browse to Banana:

Import Word Dashboard.json into Banana to create the dashboard.

Other useful commands / notes

Try creating an example document using the Solr admin UI:

{
  "word": "zombie",
  "count": 24,
  "updated": "2015-09-05T21:28:00Z"
}
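The example document pairs a (word, count) tuple from WordCountBolt with a timestamp. As a sketch only, assuming a document with exactly these three fields (whether SolrIndexerBolt builds its documents this way is not stated here), the JSON can be produced from a word, a count, and an Instant. The JSON is built by hand to stay dependency-free; a real indexer would more likely use SolrJ's SolrInputDocument.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders one word count document as JSON.
final class WordCountDocSketch {

    static String toJson(String word, int count, Instant updated) {
        return "{\"word\": \"" + escape(word) + "\", "
                + "\"count\": " + count + ", "
                // ISO_INSTANT gives the UTC "Z" form Solr date fields expect.
                + "\"updated\": \"" + DateTimeFormatter.ISO_INSTANT.format(updated) + "\"}";
    }

    // Minimal JSON string escaping: backslashes first, then double quotes.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(toJson("zombie", 24, Instant.parse("2015-09-05T21:28:00Z")));
    }
}
```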

If you change the configs or want to start over, use the clear command:

server/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd clear /configs/word_count_collection

Delete a collection (and all the data in it!):
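The command for this step is not included in this copy. One option, sketched under assumptions, is the Collections API's action=DELETE request. The helper below only builds the URL (the base URL http://localhost:8983/solr and the collection name word_count_collection are assumptions). Unlike the ZooKeeper clear command above, which removes an uploaded config, this removes the collection and its index data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds (but does not send) a Collections API DELETE request URL.
final class DeleteCollectionSketch {

    static String deleteUrl(String solrBase, String collection) {
        try {
            return solrBase + "/admin/collections?action=DELETE&name="
                    + URLEncoder.encode(collection, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM supports UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL and collection name; adjust to your sandbox.
        System.out.println(deleteUrl("http://localhost:8983/solr", "word_count_collection"));
    }
}
```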