Integration between Cloudera's Flume and ElasticSearch


Using ElasticSearch Flume integration

* Have Flume installed, or at least cloned from the Flume git repo. If you don't have it yet,
    get it from the Flume project and build it (currently using 'ant', but follow their docs).

    From here on, this Flume directory will be referred to as FLUME_HOME

* Have ElasticSearch installed. From a Getting Started point of view we'll assume you have an
  ElasticSearch server running locally; if not, see the ElasticSearch site to install and start one.

Getting Started with elasticflume

0. First, set up some environment variables pointing at your local paths, to make the following steps simpler:
    export FLUME_HOME=<path to where you have Flume checked out/installed>
    export ELASTICSEARCH_HOME=<path to where you have ElasticSearch checked out>

    export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>

            (Be careful with these last 2 env vars because they are deceptively similar)
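
    As a quick sanity check on the variables above, something like the following could be run
    before moving on. It is only a sketch: check_home is a hypothetical helper name (not part of
    Flume or elasticflume), and it merely reports whether each variable is set and points at a directory.

```shell
# check_home is a hypothetical helper (not part of Flume or elasticflume).
# $1 = variable name, $2 = its current value
check_home() {
    if [ -z "$2" ]; then
        echo "$1 is not set"
    elif [ ! -d "$2" ]; then
        echo "$1 is not a directory: $2"
    else
        echo "$1 ok: $2"
    fi
}

check_home FLUME_HOME "${FLUME_HOME:-}"
check_home ELASTICSEARCH_HOME "${ELASTICSEARCH_HOME:-}"
check_home ELASTICFLUME_HOME "${ELASTICFLUME_HOME:-}"

# The last two names are easy to mix up, so flag the case where both hold the same path.
if [ -n "${ELASTICSEARCH_HOME:-}" ] && [ "${ELASTICSEARCH_HOME:-}" = "${ELASTICFLUME_HOME:-}" ]; then
    echo "warning: ELASTICSEARCH_HOME and ELASTICFLUME_HOME are identical"
fi
```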

1. Build it using Maven:

    1.1 Install the Flume library into your local Maven repo (because it's not available in central)
        Note: the below assumes you have done a 'git clone' of the Flume source, and have built it.

        mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar

    1.2 Build elasticflume (run this from $ELASTICFLUME_HOME)

        mvn package

2. Add the elasticflume jar to Flume's classpath. I personally do this with a symlink for testing, but copying is probably a better idea.. :)

    ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
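
    If you would rather copy than symlink, a minimal sketch could look like the following.
    install_jar is a hypothetical helper name; the jar file name is the one used in the symlink
    command above, and the helper assumes the Flume directory already has a lib/ subdirectory.

```shell
# install_jar is a hypothetical helper: copy the built elasticflume jar into Flume's lib directory.
# $1 = elasticflume checkout (ELASTICFLUME_HOME), $2 = Flume directory (FLUME_HOME)
install_jar() {
    jar="$1/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar"
    if [ ! -f "$jar" ]; then
        # Nothing to copy yet: the jar only exists after the Maven build in step 1.2.
        echo "jar not found, run 'mvn package' first: $jar"
        return 1
    fi
    cp "$jar" "$2/lib/" && echo "copied to $2/lib/"
}

# Usage (after step 1 has completed):
#   install_jar "$ELASTICFLUME_HOME" "$FLUME_HOME"
```

    Unlike the symlink, a copy has to be repeated each time elasticflume is rebuilt.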

3. Ensure your Flume config is correct. Check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master; you
    may have to copy the template file that's in that directory to 'flume-conf.xml' and then add the following:

    <property>
      <name>flume.master.servers</name>
      <value>localhost</value>
      <description>A comma-separated list of hostnames, one for each
        machine in the Flume Master.
      </description>
    </property>

  (The above may not be necessary, because it's the default, but I had to do it for some reason.)

  You will also need to register the elasticflume plugin by adding a new property block:

    <property>
      <name>flume.plugin.classes</name>
      <value>org.elasticsearch.flume.ElasticSearchSink</value>
      <description>Comma separated list of plugins</description>
    </property>

4. Start up the Flume Master and a Flume node. You will need 2 different shells here.

    In the first shell:

    cd $FLUME_HOME
    bin/flume master

        VERIFY that you see the following line in the master's startup log. If you don't see it, you've missed at least Step 3:

        2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink

    In the second shell:

    cd $FLUME_HOME
    bin/flume node_nowatch
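
    Rather than eyeballing the master's startup output, the check for the sink builder line can be
    scripted. The sketch below is an illustration only: has_sink_builder is a hypothetical helper
    that reads log text on stdin, and the demonstration feeds it the sample log line from this step.

```shell
# has_sink_builder is a hypothetical helper: it reads log text on stdin and succeeds
# only if the elasticSearchSink registration line is present.
has_sink_builder() {
    grep -qF "Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink"
}

# Demonstration using the sample log line from this step.
sample='2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink'
if echo "$sample" | has_sink_builder; then
    echo "plugin registered"
else
    echo "plugin NOT registered, recheck step 3"
fi
```

    Against a real master you would first need its output in a file, for example by starting it with
    'bin/flume master 2>&1 | tee /tmp/flume-master.log' (the file name is an arbitrary choice), and
    then running 'has_sink_builder < /tmp/flume-master.log'.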

5. Set up a basic console-based source so you can type in data manually and have it indexed (pretending to be a log message):
    cd $FLUME_HOME
    bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"

    NOTE: For some reason my local testing Flume installation used my IP address as the default node name, and not
        'localhost' as it often is.  If things are not working properly, check the node list with:

        bin/flume shell -c localhost -e "getnodestatus"

        If you see a node listed by its IP address, you may need to map it to the logical name 'localhost'
        inside Flume by doing this:

        bin/flume shell -c localhost -e "map <IP ADDRESS> localhost"

6. NOW FOR THE TEST! :)  In the console window where you started "node_nowatch" above,
   type the following (and yes, straight after all those log messages, just start typing, trust me..):

    hello world
    hello there good sir

    (that is, type the 2 lines, pressing return after each)

7. Verify you can search for your "hello world" log. In another console, use curl to search your local ElasticSearch node:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
        "query" : {
            "term" : { "message" : "hello" }
        }
    }'

    You should get pretty-printed JSON search results, something like:

    {
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.1976817,
        "hits" : [ {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
          "_score" : 1.1976817, "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"","priority":"INFO"}
        }, {
          "_index" : "flume",
          "_type" : "LOG",
          "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
          "_score" : 0.8465736, "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"","priority":"INFO"}
        } ]
      }
    }
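
    To try other search terms without retyping the JSON body, the query from this step can be
    wrapped in a couple of small shell functions. This is a sketch under assumptions: term_query and
    search_flume are hypothetical helper names, and the URL and 'flume' index are the ones used in
    the curl command of this step.

```shell
# term_query is a hypothetical helper: print a term query body for one field and one term.
# $1 = field name, $2 = term. Values are inserted as-is, so avoid double quotes in them.
term_query() {
    printf '{ "query" : { "term" : { "%s" : "%s" } } }\n' "$1" "$2"
}

# search_flume is a hypothetical helper: send the query to the local 'flume' index.
# It needs the ElasticSearch server from the prerequisites to be running.
search_flume() {
    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d "$(term_query "$1" "$2")"
}

# Show the body that would be sent for the search used in this step.
term_query message hello
```

    With ElasticSearch running, 'search_flume message sir' should then return only the second of the
    two test messages. A term query matches the indexed token as-is, so lowercase terms are the
    safer choice if the message field went through ElasticSearch's default analysis (an assumption
    about how the flume index is mapped).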

8. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.