Log processing system using Flume and Cassandra
Switch branches/tags
Nothing to show
Clone or download
Pull request Compare This branch is even with cloudian:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


CDR Logprocessing plugin for Flume

Source organization

flume-plugin - source of flume plugin that writes CDR logs to cassandra.
scripts - simple perl script that generates sample CDR logs for testing. 

Getting Flume & Thrift

https://github.com/cloudera/flume (master was used to test sample)

Thrift was compiled with the following option:

./configure --enable-gen-java=yes --enable-gen-cpp=yes --enable-gen-erlang=no --enable-gen-perl=no --enable-gen-py=no --enable-gen-php=no --with-boost=no; make

Assuming flume was installed under $HOME/flume, create a symlink to thrift-0.5.0/compiler/cpp/thrift, under $HOME/flume

Under $HOME/flume, ant


This plugin allows you to use Cassandra as a Flume sink for CDR logs. 

Getting Started

1) This plugin was built using flume-0.9.3-core.jar, which is delivered as part of package. 

2) cd cassandra; ant release;

3) Copy cdr_logprocessing-0.1.tar.gz to $HOME/flume directory and uncompress it.

4) Add the following to your .bashrc file

export FLUME_HOME=$HOME/flume
export FLUME_LOG_DIR=/tmp
export FLUME_PID_DIR=/tmp
export FLUME_CONF_DIR=$HOME/flume/conf
export FLUME_CLASSPATH=$HOME/flume/cdrplugin/lib/apache-cassandra-0.7.0.jar:$HOME/flume/cdrplugin/lib/avro-1.4.0-rc4.jar:$HOME/flume/cdrplugin/lib/cdr_logprocessing-0.1.jar:$HOME/flume/cdrplugin/lib/commons-lang-2.4.jar:$HOME/flume/cdrplugin/lib/hector-core-0.7.0-22.jar:$HOME/flume/cdrplugin/lib/high-scale-lib-1.1.1.jar:$HOME/flume/cdrplugin/lib/jug-asl-2.0.0.jar:$HOME/flume/cdrplugin/lib/log4j-1.2.14.jar:$HOME/flume/cdrplugin/lib/perf4j-0.9.13.jar:$HOME/flume/cdrplugin/lib/slf4j-api-1.5.11.jar:$HOME/flume/cdrplugin/lib/slf4j-log4j12-1.5.8.jar

4. Modify flume-site.xml (you may start out by copying
flume-site.xml.template and removing the body of the file) to include:

        <description>Comma separated list of plugin classes</description>


loggen.pl will write sample CDR entries to /tmp/cdr.log. We can use this script for testing our setup. 


This plugin primarily targets CDR log storage right now.

1) The following needs to be installed in cassandra using cli

connect <hostname>/9160;
create keyspace CDRLogs with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use CDRLogs;
create column family MSISDNTimeline with column_type = 'Standard' and comparator = 'BytesType';
create column family CDREntry with column_type = 'Standard' and comparator = 'BytesType';
create column family HourlyTimeline with column_type = 'Standard' and comparator = 'BytesType';

2) In flume config you call this sink as



cassandra_host:cassandra_port - cassandra host/port combination
ColumnFamilyforRawCDR - CF where raw cdr entries for this market are to be stored. 

3) In our test environment, we had NodeM - running flume master, NodeA -  running flume agent and NodeC - running flume collector & cassandra-0.7.2
    3.1) On NodeM
	3.1.1) Export all environment variables. 
	3.1.2) cd $FLUME_HOME; bin/flume master
	3.1.3) http://NodeM:35871/flumemaster.jsp will all active nodes and their configuration.
    3.2) On NodeA
	3.2.1) Edit flume-site.xml and add NodeM as master
	3.2.2) cd $FLUME_HOME; bin/flume node_nowatch
	3.2.3) http://NodeA:35862/flumeagent.jsp will display statistics.
    3.3) On NodeC
	3.3.1) Edit flume-site.xml and add NodeM as master
	3.3.2) cd $FLUME_HOME; bin/flume node_nowatch -n collector
	3.3.3) http://NodeC:35862/flumeagent.jsp will display statistics.

4) Go to http://NodeM:35871/flumeconfig.jsp and configure the nodes. 
    4.1) For NodeA - Source is tail("/tmp/cdr.log") and Sink is agentSink("NodeC",35853)
    4.2) For NodeC - Source is collectorSource(35853) and Sink is CDRCassandraSink("NodeC:9160", "CDRRaw_market1")
5) Go to http://NodeM:35871/flumemaster.jsp and if nodes were configured correctly, all nodes should show up as 'ACTIVE'
6) On NodeA - run the script perl loggen.pl (NOTE: This script will write to log file in a for(;;) loop)
7) Verify data in cassandra using cassandra-cli;


1) CDR format currently supported is of form