Integration between Cloudera's Flume and ElasticSearch


Using ElasticSearch Flume integration

Pre-Conditions:
* Have Flume installed, or at least cloned from the Flume git repo.
    If not, go to http://github.com/cloudera/flume and build it (currently built with 'ant', but follow their docs).

    From here on, this Flume directory will be referred to as FLUME_HOME.

* Have a local ElasticSearch server running. For this Getting Started walkthrough we assume one is running on
  localhost; if not, go to http://github.com/elasticsearch/elasticsearch (see the sketch after this list for starting one).
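
  If you have cloned or downloaded ElasticSearch, bringing up a local node for testing looks roughly like
  this (the path is whatever your own checkout/install is; depending on the release the script may daemonize
  by default, so check its help output):

    cd <path to your ElasticSearch checkout/install>
    bin/elasticsearch
    curl 'http://localhost:9200/'    # should return a small JSON status block once the node is up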


Getting Started with elasticflume

0. First, set up some environment variables pointing to your local paths, to make the following steps simpler:
    export FLUME_HOME=<path to where you have Flume checkedout/installed>
    export ELASTICSEARCH_HOME=<path to where you have ElasticSearch checked out>

    export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>

            (Be careful with these last 2 env vars because they are deceptively similar)
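
    For example (these exact paths are just an illustration; substitute your own):

    export FLUME_HOME=$HOME/src/flume
    export ELASTICSEARCH_HOME=$HOME/src/elasticsearch
    export ELASTICFLUME_HOME=$HOME/src/elasticflume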

1. Build it using Maven:

    1.1 Install the Flume library into your local Maven repo (because it's not available in central)
        Note: the below assumes you have done a 'git clone' of the Flume source, and have built it.

        mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar
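
        To confirm the install worked, you can check that the jar landed in your local repository (the path
        below is derived from the groupId/artifactId/version used above):

        ls ~/.m2/repository/com/cloudera/flume/0.9.1-dev/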


    1.2 Build elasticflume
    cd $ELASTICFLUME_HOME
    mvn package
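
    A successful build should leave the bundled jar under target/ (the same file referenced in step 2 below):

    ls $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar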

2. Now add the elasticflume jar to the Flume classpath as well. I personally do this with a symlink for testing, but copying (shown just after the symlink) is probably a better idea:

    ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
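
    Or, to copy instead of symlinking:

    cp $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/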

3. Ensure your Flume config is correct: check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master. You
    may have to copy the template file in that directory to 'flume-conf.xml' and then add the following:

  <property>
    <name>flume.master.servers</name>
    <value>localhost</value>
    <description>A comma-separated list of hostnames, one for each
      machine in the Flume Master.
    </description>
  </property>

  ... (The above may not be necessary, since 'localhost' is the default, but I had to set it explicitly for some reason.)

  You will also need to register the elasticflume plugin by adding a new property block:

  <property>
      <name>flume.plugin.classes</name>
      <value>org.elasticsearch.flume.ElasticSearchSink</value>
      <description>Comma separated list of plugins</description>
  </property>
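
  Put together, a minimal flume-conf.xml carrying just these two settings would look roughly like this (a
  sketch only; keep whatever other properties your copy of the template already defines):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>flume.master.servers</name>
      <value>localhost</value>
    </property>
    <property>
      <name>flume.plugin.classes</name>
      <value>org.elasticsearch.flume.ElasticSearchSink</value>
    </property>
  </configuration>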


4. Start up the Flume master and a Flume node; you will need 2 different shells here.
    cd $FLUME_HOME
    bin/flume master

        VERIFY that the master's startup log contains the following line; if you don't see it, you've missed at least Step 3:

        2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink


    bin/flume node_nowatch
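
    Once the node is up, an optional sanity check (the same shell command used in the NOTE under step 5) is
    to ask the master which nodes it knows about:

    bin/flume shell -c localhost -e "getnodestatus"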


5. Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message)
    cd $FLUME_HOME
    bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"

    NOTE: For some reason my local test Flume installation used my IP address as its default node name, rather
        than 'localhost' as is often the case.  If things are not working properly, you should check by:

        bin/flume shell -c localhost -e "getnodestatus"

       If you see a node listed under an IP address, you may need to map that IP to the 'localhost' logical
       name inside Flume by doing this:

       bin/flume shell -c localhost -e "map <IP ADDRESS> localhost"
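
       For example, using the IP address that shows up in the sample search results below as a stand-in for
       your own node's address:

       bin/flume shell -c localhost -e "map 192.168.1.170 localhost"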


6. NOW FOR THE TEST! :)  In the console window where you started "node_nowatch" above,
   type the following (and yes, straight after all those log messages, just start typing, trust me..):

    hello world
    hello there good sir

    (i.e. type the 2 lines, pressing return after each)

7. Verify that you can search for your "hello world" log. In another console, use curl to search your local ElasticSearch node:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
        "query" : {
            "term" : { "message" : "hello" }
        }
    }
    '

    You should get pretty-printed, JSON-formatted search results, something like:


  {
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.1976817,
    "hits" : [ {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
      "_score" : 1.1976817, "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
    }, {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
      "_score" : 0.8465736, "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
    } ]
  }
  }
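
    If the term query returns no hits, a quick way to confirm that documents are being indexed at all is a
    match_all query against the same index and endpoint as above:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
        "query" : { "match_all" : {} }
    }
    '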


8. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.
