GitHub - jmhsieh/elasticflume: Integration between Cloudera's Flume and ElasticSearch


Using ElasticSearch Flume integration

Pre-Conditions:
* Have Flume installed, or at least cloned from the Flume git repo and built.
    If not, get it from http://github.com/cloudera/flume and build it (currently with 'ant', but follow their docs).

    From here on, this Flume directory will be referred to as FLUME_HOME.

* Have ElasticSearch installed. For this Getting Started guide we assume you have an ElasticSearch server
  running locally; if not, get it from http://github.com/elasticsearch/elasticsearch


Getting Started with elasticflume

0. First, set up some environment variables pointing to your local paths, to make the following steps simpler:
    export FLUME_HOME=<path to where you have Flume checked out/installed>
    export ELASTICSEARCH_HOME=<path to where you have ElasticSearch checked out>

    export ELASTICFLUME_HOME=<path to where you have elasticflume checked out>

            (Be careful with these last 2 env vars because they are deceptively similar)
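Since two of these names are so easy to mix up, it can help to confirm that every variable is set and points at a real directory before going further. A minimal sketch, assuming a bash shell; `check_home_vars` is a hypothetical helper, not something shipped with elasticflume or Flume:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of elasticflume or Flume): check that each
# named environment variable is set and points at an existing directory.
check_home_vars() {
  local name value status=0
  for name in "$@"; do
    value="${!name}"   # bash indirect expansion: value of the variable named $name
    if [ -z "$value" ]; then
      echo "ERROR: $name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "ERROR: $name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_home_vars FLUME_HOME ELASTICSEARCH_HOME ELASTICFLUME_HOME && echo "environment OK"`. The helper checks every variable rather than stopping at the first failure, so all mistakes are reported at once.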

1. Build it using Maven:

    1.1 Install the Flume library into your local Maven repo (it is not available in Maven Central).
        Note: the command below assumes you have cloned the Flume source with 'git clone' and built it.

        mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume -Dversion=0.9.1-dev -Dclassifier=core -Dfile=$FLUME_HOME/build/flume-0.9.1-dev-core.jar -Dpackaging=jar


    1.2 Build elasticflume:

        cd $ELASTICFLUME_HOME
        mvn package
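The install in step 1.1 only works if the Flume build has actually produced the core jar. A sketch of a guarded version, assuming bash and the 0.9.1-dev version used above; `flume_core_jar`, `install_flume_jar` and the `FLUME_VERSION` variable are hypothetical, and the mvn arguments are exactly those from step 1.1:

```shell
#!/usr/bin/env bash
# Flume version used in step 1.1; export FLUME_VERSION to override it.
FLUME_VERSION="${FLUME_VERSION:-0.9.1-dev}"

# Hypothetical helper: print the path of the Flume core jar under
# $FLUME_HOME/build, or fail with a message if it has not been built yet.
flume_core_jar() {
  local jar="$FLUME_HOME/build/flume-${FLUME_VERSION}-core.jar"
  if [ ! -f "$jar" ]; then
    echo "ERROR: $jar not found; build Flume first" >&2
    return 1
  fi
  printf '%s\n' "$jar"
}

# Hypothetical helper: run the step 1.1 install only once the jar exists.
install_flume_jar() {
  local jar
  jar="$(flume_core_jar)" || return 1   # separate 'local' so the status is kept
  mvn install:install-file -DgroupId=com.cloudera -DartifactId=flume \
    -Dversion="$FLUME_VERSION" -Dclassifier=core \
    -Dfile="$jar" -Dpackaging=jar
}
```

Usage: `install_flume_jar && (cd "$ELASTICFLUME_HOME" && mvn package)` runs both build steps, stopping early if the Flume jar is missing.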

2. Now add the elasticflume jar to Flume's classpath by putting it in $FLUME_HOME/lib. I personally use a symlink for testing, but copying is probably a better idea :)

    ln -s $ELASTICFLUME_HOME/target/elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar $FLUME_HOME/lib/
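If you prefer copying, a small wrapper can also catch a missing build or a wrong FLUME_HOME. A sketch assuming bash; `copy_elasticflume_jar` is a hypothetical helper, and the jar name is the one from the symlink command above:

```shell
#!/usr/bin/env bash
# Jar produced by `mvn package` in step 1.2 (name as in the symlink command).
ELASTICFLUME_JAR="elasticflume-1.0.0-SNAPSHOT-jar-with-dependencies.jar"

# Hypothetical helper: copy (rather than symlink) the elasticflume jar into
# Flume's lib directory, failing if either the jar or lib directory is missing.
copy_elasticflume_jar() {
  local jar="$ELASTICFLUME_HOME/target/$ELASTICFLUME_JAR"
  local lib_dir="$FLUME_HOME/lib"
  if [ ! -f "$jar" ]; then
    echo "ERROR: $jar not found; run 'mvn package' in \$ELASTICFLUME_HOME" >&2
    return 1
  fi
  if [ ! -d "$lib_dir" ]; then
    echo "ERROR: $lib_dir is not a directory; check FLUME_HOME" >&2
    return 1
  fi
  cp "$jar" "$lib_dir/"
}
```

The trade-off: a copy has to be repeated after every rebuild, while a symlink picks up the freshly built jar automatically, which is probably why a symlink is handy while testing.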

3. Ensure your Flume config is correct: check that $FLUME_HOME/conf/flume-conf.xml correctly identifies your local master.
    You may have to copy the template file in that directory to 'flume-conf.xml' and then add the following:

  <property>
    <name>flume.master.servers</name>
    <value>localhost</value>
    <description>A comma-separated list of hostnames, one for each
      machine in the Flume Master.
    </description>
  </property>

  (This may not be necessary, because localhost is the default, but I had to add it for some reason.)

  You will also need to register the elasticflume plugin by adding a new property block:

  <property>
    <name>flume.plugin.classes</name>
    <value>org.elasticsearch.flume.ElasticSearchSink</value>
    <description>Comma separated list of plugins</description>
  </property>
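A quick way to confirm the plugin block actually landed in the file is a plain text search. A rough sketch assuming bash; `check_flume_conf` is a hypothetical helper, and because it uses grep rather than an XML parser it will not notice, for example, the class name sitting inside a comment or a different property:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough, grep-based (not XML-aware) check that a Flume
# config declares flume.plugin.classes and names the elasticflume sink.
# Defaults to $FLUME_HOME/conf/flume-conf.xml when no path is given.
check_flume_conf() {
  local conf="${1:-$FLUME_HOME/conf/flume-conf.xml}"
  if [ ! -f "$conf" ]; then
    echo "ERROR: $conf not found" >&2
    return 1
  fi
  if ! grep -qF '<name>flume.plugin.classes</name>' "$conf"; then
    echo "ERROR: no flume.plugin.classes property in $conf" >&2
    return 1
  fi
  if ! grep -qF 'org.elasticsearch.flume.ElasticSearchSink' "$conf"; then
    echo "ERROR: ElasticSearchSink is not registered in $conf" >&2
    return 1
  fi
}
```

Usage: run `check_flume_conf` with no arguments to check the default config file before starting the master in step 4.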


4. Start up the Flume master and a Flume node; you will need 2 different shells here.
    cd $FLUME_HOME
    bin/flume master

        VERIFY that the master's startup log contains the following line; if you don't see it, you've missed at least Step 3:

        2010-09-14 14:20:53,861 [main] INFO conf.SinkFactoryImpl: Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink


    Then, in the second shell:

    cd $FLUME_HOME
    bin/flume node_nowatch
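If the master logs to the console and you capture that output to a file (for example `bin/flume master 2>&1 | tee master.log`; the file name is only an example), the VERIFY check above can be scripted. A sketch assuming bash; `check_sink_registered` is a hypothetical helper that searches for the log line quoted in this step, ignoring its timestamp and thread name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a captured Flume master log contains the
# "Found sink builder elasticSearchSink ..." line shown in step 4.
check_sink_registered() {
  local log_file="$1"
  if grep -qF 'Found sink builder elasticSearchSink in org.elasticsearch.flume.ElasticSearchSink' "$log_file"; then
    echo "elasticSearchSink registered"
  else
    echo "ERROR: elasticSearchSink not found in $log_file; re-check step 3" >&2
    return 1
  fi
}
```

Usage: `check_sink_registered master.log` once the master has finished starting up.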


5. Set up a basic console-based source so you can type in data manually and have it indexed (as if it were a log message):
    cd $FLUME_HOME
    bin/flume shell -c localhost -e "exec config localhost 'console' 'elasticSearchSink'"

    NOTE: For some reason, my local test Flume installation used my IP address as the default node name
        instead of 'localhost', which it often is. If things are not working properly, check the node status with:

        bin/flume shell -c localhost -e "getnodestatus"

       If you see a node listed by its IP address, you may need to map it to the logical name 'localhost'
       inside Flume:

       bin/flume shell -c localhost -e "map <IP ADDRESS> localhost"
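Because these steps all repeat the same `bin/flume shell -c localhost -e ...` prefix, a pair of small functions can save some typing. A sketch assuming bash; `flume_shell` and `configure_console_node` are hypothetical helpers that only issue the shell commands shown in this step:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one Flume shell command against the local master.
flume_shell() {
  "$FLUME_HOME/bin/flume" shell -c localhost -e "$1"
}

# Hypothetical helper: configure the given logical node (default 'localhost')
# with a console source feeding the elasticSearchSink, as in step 5.
configure_console_node() {
  local node="${1:-localhost}"
  flume_shell "exec config $node 'console' 'elasticSearchSink'"
}
```

Usage: `flume_shell getnodestatus` to list the nodes, then `configure_console_node` (or pass another logical node name if you mapped one).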


6. NOW FOR THE TEST! :)  In the console window where you started "node_nowatch" above,
   type the following (and yes, right after all those log messages, just start typing, trust me):

    hello world
    hello there good sir

    (i.e. type the 2 lines, pressing Return after each one)

7. Verify that you can search for your "hello world" log: in another console, use curl to search your local ElasticSearch node:

    curl -XGET 'http://localhost:9200/flume/_search?pretty=true' -d '
    {
        "query" : {
            "term" : { "message" : "hello" }
        }
    }
    '

    You should get pretty-printed JSON search results, something like:


  {
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.1976817,
    "hits" : [ {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "4e5a6f5b-1dd3-4bb6-9fd9-c8d785f39680",
      "_score" : 1.1976817, "_source" : {"message":"hello world","timestamp":"2010-09-14T03:19:36.857Z","host":"192.168.1.170","priority":"INFO"}
    }, {
      "_index" : "flume",
      "_type" : "LOG",
      "_id" : "c77c18cc-af40-4362-b20b-193e5a3f6ff5",
      "_score" : 0.8465736, "_source" : {"message":"hello there good sir","timestamp":"2010-09-14T03:28:04.168Z","host":"192.168.1.170","priority":"INFO"}
    } ]
  }
  }
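The step 7 search can be parameterised to look for other words, and the matching messages pulled out of the response without extra tools. A rough sketch assuming bash, curl, and the `flume` index and `message` field seen above; `term_query_body`, `search_flume`, `extract_messages` and `ES_URL` are hypothetical names. The value is inserted into the JSON without escaping (so avoid double quotes), and the grep/sed extraction relies on the compact `_source` layout shown above rather than a real JSON parser. Also note that a `term` query is not analyzed, while the indexed text is typically lower-cased by the default analyzer, which is why the query searches for "hello" rather than "Hello".

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the step 7 term-query body for FIELD and VALUE.
# VALUE is inserted verbatim, so it must not contain double quotes.
term_query_body() {
  printf '{ "query" : { "term" : { "%s" : "%s" } } }' "$1" "$2"
}

# Hypothetical helper: run the step 7 search for one word in the message
# field; ES_URL defaults to the local node used throughout this guide.
search_flume() {
  local es_url="${ES_URL:-http://localhost:9200}"
  curl -s -XGET "$es_url/flume/_search" -d "$(term_query_body message "$1")"
}

# Hypothetical helper: print each "message" value found on stdin, one per
# line, assuming the compact "_source" layout shown in the sample output.
extract_messages() {
  grep -o '"message":"[^"]*"' | sed -e 's/^"message":"//' -e 's/"$//'
}
```

Usage: `search_flume hello | extract_messages` should list the lines you typed in step 6 that contain "hello".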


8. Go to the ElasticSearch website and learn all about the REST and other APIs for searching an ElasticSearch index.