Storm is a real-time computing system that supports fault tolerance, horizontal scalability and guaranteed message processing with high performance. This repository is a library of sample projects that expose reusable bolts for real-time computation.

README

Technical Details:

This is a fun project I created to use the newest real-time computing platforms to process data generated by sensor devices. It builds a library that continuously listens to and analyzes streams of data generated by various sensor devices. The library is built on Storm, a distributed stream computing platform (http://www.slideshare.net/nathanmarz/storm-distributed-and-faulttolerant-realtime-computation). The platform's basic architecture continuously processes streaming data to produce another stream of results, and such stream-processing stages can be chained into pipelines indefinitely.
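The chaining idea can be illustrated without Storm itself. The sketch below is a hypothetical plain-Java model and not the Storm API. Each stage plays the role of a bolt: it consumes one stream and emits a new stream, and the next stage consumes that output. The `Stage` interface, the parsing stage and the threshold of 600 are illustrative assumptions. The code also targets a modern JDK, because it uses lambdas and `List.of`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of a Storm-style pipeline (NOT the Storm API):
// each stage consumes one stream and emits a new stream for the next stage.
public class PipelineSketch {

    // A stage maps one input tuple to zero or more output tuples.
    interface Stage<I, O> {
        List<O> process(I input);
    }

    // Runs every element of the input stream through the stage and collects the new stream.
    static <I, O> List<O> apply(Stage<I, O> stage, List<I> stream) {
        List<O> out = new ArrayList<>();
        for (I item : stream) {
            out.addAll(stage.process(item));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("510", "512", "abc", "900");

        // Stage 1: parse raw serial lines into integers, dropping malformed lines.
        Stage<String, Integer> parse = line -> {
            try {
                return List.of(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {
                return List.<Integer>of();
            }
        };

        // Stage 2: keep only readings above a hypothetical brightness threshold.
        Stage<Integer, Integer> bright = v -> v > 600 ? List.of(v) : List.<Integer>of();

        List<Integer> parsed = apply(parse, raw);
        List<Integer> alerts = apply(bright, parsed);
        System.out.println(parsed + " -> " + alerts);
    }
}
```

Running it prints `[510, 512, 900] -> [900]`. The output of the first stage becomes the input stream of the second, which is the same shape as a spout feeding a chain of bolts.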

I have written a library that computes moving averages and detects spikes in continuous streams of data. It can be applied to finance or to any other stream. My goal is a library of bolts that each perform a specific operation, so that they can be reused instead of starting over from scratch.
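The two computations can be sketched in plain Java. This is a hypothetical example and not the project's bolt code. It keeps a simple moving average over a fixed window. A reading counts as a spike when it exceeds the average of the previous full window by more than a threshold fraction. The window size of 3, the 10% threshold and the upward-only spike rule are all assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: simple moving average over a fixed window plus a
// relative-deviation spike test. Window size and threshold are assumptions.
public class MovingAverageSpikeSketch {
    private final int windowSize;
    private final double threshold;               // e.g. 0.10 = 10% above the average
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    public MovingAverageSpikeSketch(int windowSize, double threshold) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    // Average of the values currently in the window (0.0 if empty).
    public double average() {
        return window.isEmpty() ? 0.0 : sum / window.size();
    }

    // Flags the value as a spike if it exceeds the average of the previous full
    // window by more than the threshold fraction, then slides it into the window.
    public boolean offer(double value) {
        boolean spike = window.size() == windowSize
                && value > average() * (1.0 + threshold);
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();          // drop the oldest value
        }
        return spike;
    }

    public static void main(String[] args) {
        MovingAverageSpikeSketch detector = new MovingAverageSpikeSketch(3, 0.10);
        double[] readings = {100, 102, 98, 101, 150, 99};
        for (double r : readings) {
            boolean spike = detector.offer(r);
            System.out.printf("value=%.0f avg=%.2f spike=%b%n", r, detector.average(), spike);
        }
    }
}
```

With the sample readings, only 150 is flagged. The spiking value still enters the window, so the average stays higher for the next few readings. A real bolt could instead exclude flagged values from the average.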

Let us start by deploying the project on a single node, which is very simple. For reference, I benchmarked this project on a cluster of 3 machines: it processed 96,000 sensor values per second and detected spikes within a few milliseconds.

Hardware Requirements (optional: you can instead generate simulated sensor input with inputStreamSpout):

1.	An Arduino kit with a photoresistor circuit.
2.	Connect the kit to your laptop through a serial port and run the light program I submitted to generate light-intensity events.
3.	Note the serial port the kit is using on your machine, as listed when you connect it; you will need it later.
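If no Arduino is available, you can simulate the readings. The class below is a hypothetical stand-in and not the project's inputStreamSpout. It emits values in the 0 to 1023 range that a 10-bit Arduino `analogRead()` returns, adds small noise around a baseline, and injects a spike every few readings. The baseline, noise range, spike size and seed are arbitrary assumptions.

```java
import java.util.Random;

// Hypothetical stand-in for the photoresistor circuit (not the project's
// inputStreamSpout): noisy readings around a baseline with periodic spikes.
public class SimulatedLightSensor {
    private final Random random;
    private final int baseline;
    private final int spikeEvery;
    private long count = 0;

    public SimulatedLightSensor(long seed, int baseline, int spikeEvery) {
        this.random = new Random(seed);           // seeded so runs are reproducible
        this.baseline = baseline;
        this.spikeEvery = spikeEvery;
    }

    // Produces the next reading, clamped to the 0..1023 range of analogRead().
    public int next() {
        count++;
        int value = baseline + random.nextInt(21) - 10;   // noise in [-10, +10]
        if (spikeEvery > 0 && count % spikeEvery == 0) {
            value += 300;                                  // injected spike
        }
        return Math.max(0, Math.min(1023, value));
    }

    public static void main(String[] args) {
        SimulatedLightSensor sensor = new SimulatedLightSensor(42L, 500, 5);
        for (int i = 0; i < 10; i++) {
            System.out.println(sensor.next());
        }
    }
}
```

Every fifth value lands roughly 300 above the others. That gives a spike detector something obvious to catch without any hardware.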

Software Requirements:

1.	Download Storm version 0.6.2 from https://github.com/nathanmarz/storm/downloads
2.	Install Maven. This guide assumes Java 1.6 is already installed.


Running my project:
1.	Download the storm-starter project from https://github.com/nathanmarz/storm-starter/downloads, unzip it, and rename m2-pom.xml to pom.xml.
2.	Copy my project archive movingAverageWithSpikeDetection.tar.gz into the storm-starter/src/jvm directory and extract it with “tar -zxf movingAverageWithSpikeDetection.tar.gz”.
3.	Build the project with Maven by running “mvn clean install” in the storm-starter folder. This downloads and installs the serialization libraries and other dependencies the distributed system needs.
4.	Run “mvn eclipse:eclipse” to generate the Eclipse .project file.
5.	Open Eclipse and import the movingAverageWithSpikeDetection project.
6.	Open LightEventSpout.java and change the PORT_NAMES[] entry to the serial port your Arduino kit is using. The baud rate is set to 9600 in LightEventSpout.java; if you change it while experimenting, make sure the rate in the program matches the rate on the device.
7.	Upload the light program to the Arduino kit.
8.	Run SpikeDetectionTopology.java. This automatically starts ZooKeeper for cluster coordination and runs the program on a single node. If you get a PortInUse exception, create the folder /var/lock/ and give it 775 permissions (for example, “sudo mkdir -p /var/lock” followed by “sudo chmod 775 /var/lock”); this is a known issue with the Arduino-to-Java serial interface.
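Step 6 comes down to matching a configured port name against the ports that actually exist on the machine. The sketch below is hypothetical and models only that matching in plain Java. Enumerating the real serial ports requires a serial library, which is not shown. The candidate names in `PORT_NAMES` are examples only. `BAUD_RATE` mirrors the 9600 value described above and must equal the rate passed to `Serial.begin(...)` in the Arduino program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the PORT_NAMES[] idea: choose the first available serial
// port whose name is one of the configured candidates. Port names are examples.
public class PortSelectionSketch {
    static final String[] PORT_NAMES = {
            "/dev/ttyUSB0",   // Linux, USB-serial adapter (example)
            "/dev/ttyACM0",   // Linux, newer boards (example)
            "COM3"            // Windows (example)
    };

    // Must match Serial.begin(...) in the Arduino program.
    static final int BAUD_RATE = 9600;

    static Optional<String> choosePort(List<String> availablePorts) {
        List<String> candidates = Arrays.asList(PORT_NAMES);
        for (String available : availablePorts) {
            if (candidates.contains(available)) {
                return Optional.of(available);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // In a real spout this list would come from the serial library.
        List<String> available = List.of("/dev/ttyS0", "/dev/ttyACM0");
        System.out.println(choosePort(available).orElse("no matching port") + " @ " + BAUD_RATE);
    }
}
```

This prints `/dev/ttyACM0 @ 9600`. If nothing matches, the spout has no port to open. Editing `PORT_NAMES` to your machine's port is what step 6 asks for.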


Creating a cluster of machines and running my project on it:

1.	Download ZooKeeper from http://download.filehat.com/apache/zookeeper/zookeeper-3.3.3/
2.	Unzip ZooKeeper and, in its conf folder, rename zoo_sample.cfg to zoo.cfg.
3.	Go to ZooKeeper's bin folder and start an instance with “./zkServer.sh start”.
4.	Unzip the storm-0.6.2 folder.
5.	Copy storm.yaml.example to storm.yaml and list every machine name (or IP address) under storm.zookeeper.servers, since ZooKeeper runs on every machine to coordinate the distributed system.
6.	Set nimbus.host to the name of the master machine, the one interfaced with the Arduino.
(Remember that all of these steps must be done on every machine.)
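The same storm.yaml settings have to be repeated on every machine, so it can help to generate the file once and copy it. This hypothetical helper renders only the two settings described above, `storm.zookeeper.servers` and `nimbus.host`, in Storm's YAML list format. It leaves out all other settings, and the IP addresses are placeholders.

```java
import java.util.List;

// Hypothetical helper: renders the storm.zookeeper.servers and nimbus.host
// settings so one identical storm.yaml fragment can be copied to every machine.
public class StormYamlSketch {
    static String render(List<String> zookeeperHosts, String nimbusHost) {
        StringBuilder sb = new StringBuilder();
        sb.append("storm.zookeeper.servers:\n");
        for (String host : zookeeperHosts) {
            sb.append("    - \"").append(host).append("\"\n");   // one YAML list entry per host
        }
        sb.append("nimbus.host: \"").append(nimbusHost).append("\"\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("192.168.1.10", "192.168.1.11", "192.168.1.12");
        System.out.print(render(hosts, "192.168.1.10"));   // first host acts as the master
    }
}
```

The output is a `storm.zookeeper.servers:` list with one quoted entry per machine, followed by a single `nimbus.host` line naming the master.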


Now your cluster is set up. Do the following to run the project on the cluster:

1.	On every machine, go to the bin directory inside the storm-0.6.2 folder and run “./storm supervisor”.
2.	On the master machine, run “./storm nimbus” to start the master process, called Nimbus, which dynamically distributes the runnable code across the cluster.
3.	Submit the project from the master node with “./storm jar movingAverageSpikeDetection.jar movingAverage.SpikeDetectionTopology”. Nimbus uses ZooKeeper to coordinate the supervisors and runs the topology across the cluster. If you get a PortInUse exception, create the folder /var/lock/ and give it 775 permissions; this is a known issue with the Arduino-to-Java serial interface.
