Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Hoop (HDFS over HTTP) Plugin for Fluentd data collector
Ruby
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib/fluent/plugin fix to use fluent-mixin-plaintextparser
test
vendor dirty fluentd position
.document Initial commit to fluent-plugin-hoop.
.gitignore update
.gitmodules add fluentd as submodule
AUTHORS update
Gemfile fix to use fluent-mixin-plaintextparser
LICENSE.txt update
README.rdoc update for CDH3u4
Rakefile fix to use fluent-mixin-plaintextparser
VERSION
fluent-plugin-hoop.gemspec v0.1.5: fix dependency

README.rdoc

Hoop plugin for Fluentd

Component

HoopOutput

Store fluent-event as plain text to HDFS, over Hoop (HTTP REST API for HDFS).

Hoop is originally written in Cloudera, and merged on Apache Hadoop 0.23 tree, but there is no API compatibility. See:

Cloudera Hoop doc (obsolete)

cloudera.github.com/hoop/docs/latest/index.html

Apache Hadoop dev doc

github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm

If you want to use HttpFs (or WebHDFS), use 'fluent-plugin-webhdfs'.

HoopOutput slices data by time (for specified units), and store these data as plain text on hdfs. You can specify to:

  • format whole data as serialized JSON, single attribute or separated multi attributes

  • include time as line header, or not

  • include tag as line header, or not

  • change field separator (default: TAB)

On your hadoop cluster (especially on CDH3u4), you MUST configure 'append' operation enabled by 'dfs.support.broken.append' property. See: groups.google.com/a/cloudera.org/d/topic/cdh-user/eJY6R4ii2X0/discussion

Configuration

HoopOutput

Minimal configuration (output: TAB separated time,tag,json-serialized-data and terminated with newline):

<match hoop.**>
  type hoop
  hoop_server hoop-server.local:14000

  # %Y %m %d %H %M %S are available as conversion specifications in path on hdfs
  # If '%Y%m%d' specified, logs are sliced into per-day files automatically.
  path /hoop/log-%Y%m/log-%Y%m%d.log

  # 'username' is used pseudo authentication, see http://cloudera.github.com/hoop/docs/latest/HttpRestApi.html
  username hoopuser
</match>

You will get output like below in hdfs file such as '/hoop/log-201112/log-20111231.log'

2011-12-31T13:14:15Z [TAB] hoop.foo.bar [TAB] {"field1":12345,"field2":"one two three four five","field3":"OK"} [terminated by newline]
2011-12-31T21:22:23Z [TAB] hoop.foo.val [TAB] {"field1":23456,"field2":"two three four five six","field3":"BAD"} [terminated by newline]

Single attribute with tag (removed prefix 'hoop.'), without time, separated by SPACE and NOT to terminate by newline ('message' data will be terminated with newline).

<match hoop.**>
  type hoop
  hoop_server hoop-server.local:14000
  path /hoop/log-%Y%m/log-%Y%m%d-%H.log
  username hoopuser

  output_include_time false
  output_include_tag true

  # If you want multiple attribute, specify like 'attr:field1,field2,field3'
  output_data_type attr:message

  # field_separator allows 'SPACE', 'COMMA' and 'TAB'(default)
  field_separator SPACE

  # add_newline 's default is true
  add_newline false

  # tag 'hoop.foo.bar' is shrinked as 'foo.bar'
  remove_prefix hoop

  # used for tags only remove_prefix string, like 'hoop'
  default_tag unknown
</match>

TODO

  • consider what to do next

  • patches welcome!

Copyright

Copyright

Copyright © 2011- TAGOMORI Satoshi (tagomoris)

License

Apache License, Version 2.0

Something went wrong with that request. Please try again.