Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
CDR Logprocessing plugin for Flume ================================== Source organization flume-plugin - source of flume plugin that writes CDR logs to cassandra. scripts - simple perl script that generates sample CDR logs for testing. Getting Flume & Thrift ====================== https://github.com/cloudera/flume (master was used to test sample) http://incubator.apache.org/thrift/download/ Thrift was compiled with the following option: ./configure --enable-gen-java=yes --enable-gen-cpp=yes --enable-gen-erlang=no --enable-gen-perl=no --enable-gen-py=no --enable-gen-php=no --with-boost=no; make Assuming flume was installed under $HOME/flume, create a symlink to thrift-0.5.0/compiler/cpp/thrift, under $HOME/flume Under $HOME/flume, ant flume-plugin ============ This plugin allows you to use Cassandra as a Flume sink for CDR logs. Getting Started --------------- 1) This plugin was built using flume-0.9.3-core.jar, which is delivered as part of package. 2) cd cassandra; ant release; 3) Copy cdr_logprocessing-0.1.tar.gz to $HOME/flume directory and uncompress it. 4) Add the following to your .bashrc file export FLUME_HOME=$HOME/flume export FLUME_LOG_DIR=/tmp export FLUME_PID_DIR=/tmp export FLUME_CONF_DIR=$HOME/flume/conf export FLUME_CLASSPATH=$HOME/flume/cdrplugin/lib/apache-cassandra-0.7.0.jar:$HOME/flume/cdrplugin/lib/avro-1.4.0-rc4.jar:$HOME/flume/cdrplugin/lib/cdr_logprocessing-0.1.jar:$HOME/flume/cdrplugin/lib/commons-lang-2.4.jar:$HOME/flume/cdrplugin/lib/hector-core-0.7.0-22.jar:$HOME/flume/cdrplugin/lib/high-scale-lib-1.1.1.jar:$HOME/flume/cdrplugin/lib/jug-asl-2.0.0.jar:$HOME/flume/cdrplugin/lib/log4j-1.2.14.jar:$HOME/flume/cdrplugin/lib/perf4j-0.9.13.jar:$HOME/flume/cdrplugin/lib/slf4j-api-1.5.11.jar:$HOME/flume/cdrplugin/lib/slf4j-log4j12-1.5.8.jar 4. Modify flume-site.xml (you may start out by copying flume-site.xml.template and removing the body of the file) to include: <configuration> <property> <name>flume.plugin.classes</name> <value>com.gemini.logprocessing.cassandra.CDRCassandraSink</value> <description>Comma separated list of plugin classes</description> </property> </configuration> scripts ======= loggen.pl will write sample CDR entries to /tmp/cdr.log. We can use this script for testing our setup. Usage ----- This plugin primarily targets CDR log storage right now. 1) The following needs to be installed in cassandra using cli connect <hostname>/9160; create keyspace CDRLogs with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'; use CDRLogs; create column family MSISDNTimeline with column_type = 'Standard' and comparator = 'BytesType'; create column family CDREntry with column_type = 'Standard' and comparator = 'BytesType'; create column family HourlyTimeline with column_type = 'Standard' and comparator = 'BytesType'; 2) In flume config you call this sink as CDRCassandraSink("cassandra_host:cassandra_port",ColumnFamilyForRawCDR); where cassandra_host:cassandra_port - cassandra host/port combination ColumnFamilyforRawCDR - CF where raw cdr entries for this market are to be stored. 3) In our test environment, we had NodeM - running flume master, NodeA - running flume agent and NodeC - running flume collector & cassandra-0.7.2 3.1) On NodeM 3.1.1) Export all environment variables. 3.1.2) cd $FLUME_HOME; bin/flume master 3.1.3) http://NodeM:35871/flumemaster.jsp will all active nodes and their configuration. 3.2) On NodeA 3.2.1) Edit flume-site.xml and add NodeM as master 3.2.2) cd $FLUME_HOME; bin/flume node_nowatch 3.2.3) http://NodeA:35862/flumeagent.jsp will display statistics. 3.3) On NodeC 3.3.1) Edit flume-site.xml and add NodeM as master 3.3.2) cd $FLUME_HOME; bin/flume node_nowatch -n collector 3.3.3) http://NodeC:35862/flumeagent.jsp will display statistics. 4) Go to http://NodeM:35871/flumeconfig.jsp and configure the nodes. 4.1) For NodeA - Source is tail("/tmp/cdr.log") and Sink is agentSink("NodeC",35853) 4.2) For NodeC - Source is collectorSource(35853) and Sink is CDRCassandraSink("NodeC:9160", "CDRRaw_market1") 5) Go to http://NodeM:35871/flumemaster.jsp and if nodes were configured correctly, all nodes should show up as 'ACTIVE' 6) On NodeA - run the script perl loggen.pl (NOTE: This script will write to log file in a for(;;) loop) 7) Verify data in cassandra using cassandra-cli; Issues ------ 1) CDR format currently supported is of form operatorId,operatorMarket,transactionId,cdrType,messageTimestamp,moIMSI,moIP,mtIP,PTN,msgType,moDomain,mtDomain