qiozas/sentiment-analysis-storm

Simple Sentiment Analysis using Apache Storm

A very simple project that demonstrates how to run sentiment analysis with Apache Storm 1.0.x.

Workflow processing steps:

  • Random sentences are emitted by a dummy Spout.
  • The words of each sentence are split (by space) and stemmed against a dummy collection of words (which could be replaced with a real DB).
  • The new sentence, without the "useless" words, is emitted and processed by PositiveBolt, which calculates and emits a positive score.
  • The new sentence is then processed by NegativeBolt, which calculates a negative score and emits it in addition to the positive score.
  • ScoreBolt compares the two scores and decides whether the sentence is positive or negative.
  • The final result (original and modified sentences, plus the score) is logged by LoggingBolt and can also be persisted to HBase or to Kafka (topic "sentimentOut").
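
The per-sentence logic of the steps above can be sketched in plain Java, without any Storm classes. This is only an illustration and not the project's actual code: the class name SentimentSketch, the three word lists, and the scoring rule (fraction of remaining words found in a word list) are all assumptions, since this README does not say how the bolts compute their scores. The "stemming" step is modelled here as simple removal of "useless" words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of the per-sentence logic; no Storm classes involved.
class SentimentSketch {
    // Hypothetical word lists standing in for the project's dummy collections.
    static final Set<String> USELESS = new HashSet<>(Arrays.asList("the", "a", "is", "this", "was"));
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "great", "happy"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "awful", "sad"));

    // "Stemming" step: split by space and drop the "useless" words.
    static List<String> stem(String sentence) {
        List<String> kept = new ArrayList<>();
        for (String word : sentence.toLowerCase().split(" ")) {
            if (!word.isEmpty() && !USELESS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    // Assumed scoring rule: fraction of the remaining words found in the given list.
    static double score(List<String> words, Set<String> lexicon) {
        if (words.isEmpty()) {
            return 0.0;
        }
        long hits = words.stream().filter(lexicon::contains).count();
        return (double) hits / words.size();
    }

    // Comparison step; ties are reported as negative in this sketch.
    static String classify(String sentence) {
        List<String> words = stem(sentence);
        double positive = score(words, POSITIVE);
        double negative = score(words, NEGATIVE);
        return positive > negative ? "positive" : "negative";
    }

    public static void main(String[] args) {
        String sentence = "This movie is great";
        System.out.println(stem(sentence) + " -> " + classify(sentence));
    }
}
```

In the real topology each of these methods would correspond to a separate bolt, with the intermediate sentence and scores passed along as tuple fields rather than as return values.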

The Storm external module Flux is used to define and deploy the topology in Storm.

For extra details and comments, check the blog post.

The application has been tested with:

  • Apache HBase 1.1.2 and HBase provided by Cloudera 5.4.x/5.5.x
  • Apache Kafka 0.9.0.1

Prerequisites

If you only need to LOG the result, there are no dependencies.

If you need to persist the result, you need one of the following:

  • Download HBase 1.1.x and extract the tgz.
    • Run single node of HBase
      • $> cd bin
      • $> ./start-hbase.sh
    • Create required table
      • $> hbase shell
      • hbase(main):001:0> create 'SentimentAnalysisStorm', 'cf'
  • Download Kafka 0.9.0.x and extract the tgz.
    • Start the internal ZooKeeper and then the Kafka broker (detailed steps).
    • Create Topic
      • bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic sentimentOut
    • Consume data from topic via CLI
      • bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic sentimentOut --from-beginning

Build

mvn clean package -DskipTests

Run Topology

Running in local mode with Flux

  • Just LOG:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --local -s 10000 src/test/resources/flux/topology.yaml
  • HBase persistence:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --local -s 10000 src/test/resources/flux/topology_hbase.yaml
  • Kafka persistence:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --local -s 10000 src/test/resources/flux/topology_kafka.yaml

Running in cluster mode with Flux

  • Just LOG:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --remote -c nimbus.host=localhost src/test/resources/flux/topology.yaml
  • HBase persistence:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --remote -c nimbus.host=localhost src/test/resources/flux/topology_hbase.yaml
  • Kafka persistence:
    • storm jar target/sentiment-analysis-storm-0.0.1-SNAPSHOT.jar org.apache.storm.flux.Flux --remote -c nimbus.host=localhost src/test/resources/flux/topology_kafka.yaml
