Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

121 lines (77 sloc) 4.635 kB
CDR Logprocessing plugin for Flume
==================================
Source organization
flume-plugin - source of flume plugin that writes CDR logs to cassandra.
scripts - simple perl script that generates sample CDR logs for testing.
Getting Flume & Thrift
======================
https://github.com/cloudera/flume (master was used to test sample)
http://incubator.apache.org/thrift/download/
Thrift was compiled with the following option:
./configure --enable-gen-java=yes --enable-gen-cpp=yes --enable-gen-erlang=no --enable-gen-perl=no --enable-gen-py=no --enable-gen-php=no --with-boost=no; make
Assuming flume was installed under $HOME/flume, create a symlink to thrift-0.5.0/compiler/cpp/thrift, under $HOME/flume
Under $HOME/flume, ant
flume-plugin
============
This plugin allows you to use Cassandra as a Flume sink for CDR logs.
Getting Started
---------------
1) This plugin was built using flume-0.9.3-core.jar, which is delivered as part of package.
2) cd cassandra; ant release;
3) Copy cdr_logprocessing-0.1.tar.gz to $HOME/flume directory and uncompress it.
4) Add the following to your .bashrc file
export FLUME_HOME=$HOME/flume
export FLUME_LOG_DIR=/tmp
export FLUME_PID_DIR=/tmp
export FLUME_CONF_DIR=$HOME/flume/conf
export FLUME_CLASSPATH=$HOME/flume/cdrplugin/lib/apache-cassandra-0.7.0.jar:$HOME/flume/cdrplugin/lib/avro-1.4.0-rc4.jar:$HOME/flume/cdrplugin/lib/cdr_logprocessing-0.1.jar:$HOME/flume/cdrplugin/lib/commons-lang-2.4.jar:$HOME/flume/cdrplugin/lib/hector-core-0.7.0-22.jar:$HOME/flume/cdrplugin/lib/high-scale-lib-1.1.1.jar:$HOME/flume/cdrplugin/lib/jug-asl-2.0.0.jar:$HOME/flume/cdrplugin/lib/log4j-1.2.14.jar:$HOME/flume/cdrplugin/lib/perf4j-0.9.13.jar:$HOME/flume/cdrplugin/lib/slf4j-api-1.5.11.jar:$HOME/flume/cdrplugin/lib/slf4j-log4j12-1.5.8.jar
4. Modify flume-site.xml (you may start out by copying
flume-site.xml.template and removing the body of the file) to include:
<configuration>
<property>
<name>flume.plugin.classes</name>
<value>com.gemini.logprocessing.cassandra.CDRCassandraSink</value>
<description>Comma separated list of plugin classes</description>
</property>
</configuration>
scripts
=======
loggen.pl will write sample CDR entries to /tmp/cdr.log. We can use this script for testing our setup.
Usage
-----
This plugin primarily targets CDR log storage right now.
1) The following needs to be installed in cassandra using cli
connect <hostname>/9160;
create keyspace CDRLogs with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy';
use CDRLogs;
create column family MSISDNTimeline with column_type = 'Standard' and comparator = 'BytesType';
create column family CDREntry with column_type = 'Standard' and comparator = 'BytesType';
create column family HourlyTimeline with column_type = 'Standard' and comparator = 'BytesType';
2) In flume config you call this sink as
CDRCassandraSink("cassandra_host:cassandra_port",ColumnFamilyForRawCDR);
where
cassandra_host:cassandra_port - cassandra host/port combination
ColumnFamilyforRawCDR - CF where raw cdr entries for this market are to be stored.
3) In our test environment, we had NodeM - running flume master, NodeA - running flume agent and NodeC - running flume collector & cassandra-0.7.2
3.1) On NodeM
3.1.1) Export all environment variables.
3.1.2) cd $FLUME_HOME; bin/flume master
3.1.3) http://NodeM:35871/flumemaster.jsp will all active nodes and their configuration.
3.2) On NodeA
3.2.1) Edit flume-site.xml and add NodeM as master
3.2.2) cd $FLUME_HOME; bin/flume node_nowatch
3.2.3) http://NodeA:35862/flumeagent.jsp will display statistics.
3.3) On NodeC
3.3.1) Edit flume-site.xml and add NodeM as master
3.3.2) cd $FLUME_HOME; bin/flume node_nowatch -n collector
3.3.3) http://NodeC:35862/flumeagent.jsp will display statistics.
4) Go to http://NodeM:35871/flumeconfig.jsp and configure the nodes.
4.1) For NodeA - Source is tail("/tmp/cdr.log") and Sink is agentSink("NodeC",35853)
4.2) For NodeC - Source is collectorSource(35853) and Sink is CDRCassandraSink("NodeC:9160", "CDRRaw_market1")
5) Go to http://NodeM:35871/flumemaster.jsp and if nodes were configured correctly, all nodes should show up as 'ACTIVE'
6) On NodeA - run the script perl loggen.pl (NOTE: This script will write to log file in a for(;;) loop)
7) Verify data in cassandra using cassandra-cli;
Issues
------
1) CDR format currently supported is of form
operatorId,operatorMarket,transactionId,cdrType,messageTimestamp,moIMSI,moIP,mtIP,PTN,msgType,moDomain,mtDomain
Jump to Line
Something went wrong with that request. Please try again.