New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Webhdfs output plugin #1

Merged
merged 31 commits into from Jul 28, 2015

Conversation

Projects
None yet
5 participants
@dstore-dbap
Collaborator

dstore-dbap commented May 28, 2015

PR as discussed in elastic/logstash#3274.

For more information about contributing, see the [CONTRIBUTING](https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md) file.

This comment has been minimized.

@purbon

purbon May 28, 2015

Contributor

I can not see the modification in here, can you point me it? is the end of line?

@purbon

purbon May 28, 2015

Contributor

I can not see the modification in here, can you point me it? is the end of line?

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

It has to be a newline. Commandline diff does not show any difference...

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

It has to be a newline. Commandline diff does not show any difference...

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
config_name "webhdfs"
milestone 1
if RUBY_VERSION[0..2] == '1.8'

This comment has been minimized.

@purbon

purbon May 28, 2015

Contributor

To do this you can actually use Logstash::Environment.ruby_abi_version.

@purbon

purbon May 28, 2015

Contributor

To do this you can actually use Logstash::Environment.ruby_abi_version.

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Logstash requires Ruby 1.9, so we don't need to check for 1.8

@jordansissel

jordansissel May 28, 2015

Contributor

Logstash requires Ruby 1.9, so we don't need to check for 1.8

#
# If this setting is omitted, the full json representation of the
# event will be written as a single line.
config :message_format, :validate => :string

This comment has been minimized.

@purbon

purbon May 28, 2015

Contributor

Does this variable modify somehow the event? I don't actually see the hole picture of it, can you educate me?

@purbon

purbon May 28, 2015

Contributor

Does this variable modify somehow the event? I don't actually see the hole picture of it, can you educate me?

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

The var message_format can be used to convert the event data via sprintf to a string, containing only selected fields, prior to be written to disk. E.g.:

message_format => '%{source} - %{message}'

It will not modify the event. Hope this clarifies this a bit.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

The var message_format can be used to convert the event data via sprintf to a string, containing only selected fields, prior to be written to disk. E.g.:

message_format => '%{source} - %{message}'

It will not modify the event. Hope this clarifies this a bit.

This comment has been minimized.

@andrewvc

andrewvc Jun 29, 2015

Is there a reason this isn't implemented via a codec? The Line codec might work here: https://github.com/logstash-plugins/logstash-codec-line

@andrewvc

andrewvc Jun 29, 2015

Is there a reason this isn't implemented via a codec? The Line codec might work here: https://github.com/logstash-plugins/logstash-codec-line

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
begin
require 'webhdfs'
rescue LoadError
@logger.error("Module webhdfs could not be loaded.")

This comment has been minimized.

@purbon

purbon May 28, 2015

Contributor

This apply to the other require too, can the plugin work if some of this modules are not available? The way this is done here it would catch the error, report it, but got the error masqueraded.

@purbon

purbon May 28, 2015

Contributor

This apply to the other require too, can the plugin work if some of this modules are not available? The way this is done here it would catch the error, report it, but got the error masqueraded.

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

This rescue also swallows the error and continues; should we reraise?

@jordansissel

jordansissel May 28, 2015

Contributor

This rescue also swallows the error and continues; should we reraise?

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
rescue => e
@logger.error("Webhdfs check request failed. (namenode: #{@client.host}:#{@client.port}, Exception: #{e.message})")
end
buffer_initialize(

This comment has been minimized.

@purbon

purbon May 28, 2015

Contributor

Indentation in here should be in one line, as it makes the hole function more clear.

@purbon

purbon May 28, 2015

Contributor

Indentation in here should be in one line, as it makes the hole function more clear.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
#
# [source,ruby]
# ----------------------------------
# webhdfs {

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

This should be a complete example, including the 'output' stanza as well.

@jordansissel

jordansissel May 28, 2015

Contributor

This should be a complete example, including the 'output' stanza as well.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log" # (required)
# user => "hue" # (optional)
# message_format => "%{@source_host}" # (optional)
# idle_flush_time => 10 # (optional)

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

If these settings are optional and have good defaults, I recommend not showing them in the example text.

@jordansissel

jordansissel May 28, 2015

Contributor

If these settings are optional and have good defaults, I recommend not showing them in the example text.

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

Removed optional fields from example in above mentioned commit.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

Removed optional fields from example in above mentioned commit.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
MINIMUM_COMPATIBLE_VERSION = 1
# The server name and port for webhdfs/httpfs connections.
config :server, :validate => :string, :required => true

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Let's make these settings separate, "host" and "port"

config :host, :validate => :string, :required => true
config :port, :validate => :number, :default => ....

(The default for port assumes there's a default port, otherwise make it required)

@jordansissel

jordansissel May 28, 2015

Contributor

Let's make these settings separate, "host" and "port"

config :host, :validate => :string, :required => true
config :port, :validate => :number, :default => ....

(The default for port assumes there's a default port, otherwise make it required)

This comment has been minimized.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
config :server, :validate => :string, :required => true
# The Username for webhdfs.
config :user, :validate => :string, :required => false

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

:required => false is not necessary.

@jordansissel

jordansissel May 28, 2015

Contributor

:required => false is not necessary.

This comment has been minimized.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# Sending data to webhdfs if event count is above, even if store_interval_in_secs is not reached.
config :flush_size, :validate => :number, :default => 500
# WebHdfs open timeout, default 30s (in ruby net/http).

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Don't mention (in ruby net/http) because users are not needing to know this level of implementation detail.

@jordansissel

jordansissel May 28, 2015

Contributor

Don't mention (in ruby net/http) because users are not needing to know this level of implementation detail.

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# WebHdfs open timeout, default 30s (in ruby net/http).
config :open_timeout, :validate => :number, :default => 30
# The WebHdfs read timeout, default 30s (in ruby net/http).

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Don't mention (in ruby net/http) because users are not needing to know this level of implementation detail.

@jordansissel

jordansissel May 28, 2015

Contributor

Don't mention (in ruby net/http) because users are not needing to know this level of implementation detail.

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

see above ;)

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

see above ;)

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# How long should we wait between retries.
config :retry_interval, :validate => :number, :default => 0.5
# How many times should we retry.

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Can you document the behavior if the retry_times count is exceeded?

@jordansissel

jordansissel May 28, 2015

Contributor

Can you document the behavior if the retry_times count is exceeded?

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

Done in https://github.com/dstore-dbap/logstash-output-webhdfs-1/commit/ff2ea845c2ec1a1f12018c65b0ce6165ae3e23c2
Although I'm not sure if discarding the event after retry_times is a good solution. Should we exit? What is the preferred way of coping with failing storage backends?

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

Done in https://github.com/dstore-dbap/logstash-output-webhdfs-1/commit/ff2ea845c2ec1a1f12018c65b0ce6165ae3e23c2
Although I'm not sure if discarding the event after retry_times is a good solution. Should we exit? What is the preferred way of coping with failing storage backends?

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
config :snappy_format, :validate => ["stream", "file"], :default => "stream"
# Remove @timestamp field. Hive does not like a leading "@", but we need @timestamp for path calculation.
config :remove_at_timestamp, :validate => :boolean, :default => true

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Seems like an alternate codec could be chosen instead of this remove_at_timestamp setting?

@jordansissel

jordansissel May 28, 2015

Contributor

Seems like an alternate codec could be chosen instead of this remove_at_timestamp setting?

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

The problem arises when hive is used to query fields from a file in hdfs. It seems to have a problem with the leading at-sign. Instead of dropping the field completely, the at-sign could be removed/replaced. What do you think?

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

The problem arises when hive is used to query fields from a file in hdfs. It seems to have a problem with the leading at-sign. Instead of dropping the field completely, the at-sign could be removed/replaced. What do you think?

This comment has been minimized.

@purbon

purbon Jun 3, 2015

Contributor

I think what jordan says is to use a codec, so to do the changes, so people using hive can decide to use this coded and rename the @ in a more clean/organized way in sync with other plugins. Codecs are thinks like: https://github.com/logstash-plugins/logstash-codec-json

I like the idea of having a codec in here so everything is more like in every plugin. Makes sense?

@purbon

purbon Jun 3, 2015

Contributor

I think what jordan says is to use a codec, so to do the changes, so people using hive can decide to use this coded and rename the @ in a more clean/organized way in sync with other plugins. Codecs are thinks like: https://github.com/logstash-plugins/logstash-codec-json

I like the idea of having a codec in here so everything is more like in every plugin. Makes sense?

This comment has been minimized.

@dstore-dbap

dstore-dbap Jun 12, 2015

Collaborator

@purbon Sorry to take so long getting back to you. We had some major issues this week which took up all my time... Just to make sure if I got you right: For a new codec, I would need to create a new repo with the codec code and issue a pull for this as well?

@dstore-dbap

dstore-dbap Jun 12, 2015

Collaborator

@purbon Sorry to take so long getting back to you. We had some major issues this week which took up all my time... Just to make sure if I got you right: For a new codec, I would need to create a new repo with the codec code and issue a pull for this as well?

This comment has been minimized.

@purbon

purbon Jun 12, 2015

Contributor

Yes, basically a codec is another pluguin. You can start doing that in your
personal account and then next week we can move it here as we did for this
one. Makes sense?

Thanks a lot for your time man!

On Friday, June 12, 2015, Björn notifications@github.com wrote:

In lib/logstash/outputs/webhdfs.rb
#1 (comment)
:

  • How many times should we retry.

  • config :retry_times, :validate => :number, :default => 5
  • Compress output. One of ['none', 'snappy', 'gzip']

  • config :compression, :validate => ["none", "snappy", "gzip"], :default => "none"
  • Set snappy format. One of "stream", "file". Set to stream to be hive compatible.

  • config :snappy_format, :validate => ["stream", "file"], :default => "stream"
  • Remove @timestamp field. Hive does not like a leading "@", but we need @timestamp for path calculation.

  • config :remove_at_timestamp, :validate => :boolean, :default => true

@purbon https://github.com/purbon Sorry to take so long getting back to
you. We had some major issues this week which took up all my time... Just
to make sure if I got you right: For a new codec, I would need to create a
new repo with the codec code and issue a pull for this as well?


Reply to this email directly or view it on GitHub
https://github.com/logstash-plugins/logstash-output-webhdfs/pull/1/files#r32305065
.

Pere Urbon-Bayes
Software Engineer
https://www.elastic.co
https://twitter.com/purbon

@purbon

purbon Jun 12, 2015

Contributor

Yes, basically a codec is another pluguin. You can start doing that in your
personal account and then next week we can move it here as we did for this
one. Makes sense?

Thanks a lot for your time man!

On Friday, June 12, 2015, Björn notifications@github.com wrote:

In lib/logstash/outputs/webhdfs.rb
#1 (comment)
:

  • How many times should we retry.

  • config :retry_times, :validate => :number, :default => 5
  • Compress output. One of ['none', 'snappy', 'gzip']

  • config :compression, :validate => ["none", "snappy", "gzip"], :default => "none"
  • Set snappy format. One of "stream", "file". Set to stream to be hive compatible.

  • config :snappy_format, :validate => ["stream", "file"], :default => "stream"
  • Remove @timestamp field. Hive does not like a leading "@", but we need @timestamp for path calculation.

  • config :remove_at_timestamp, :validate => :boolean, :default => true

@purbon https://github.com/purbon Sorry to take so long getting back to
you. We had some major issues this week which took up all my time... Just
to make sure if I got you right: For a new codec, I would need to create a
new repo with the codec code and issue a pull for this as well?


Reply to this email directly or view it on GitHub
https://github.com/logstash-plugins/logstash-output-webhdfs/pull/1/files#r32305065
.

Pere Urbon-Bayes
Software Engineer
https://www.elastic.co
https://twitter.com/purbon

This comment has been minimized.

@dstore-dbap

dstore-dbap Jun 16, 2015

Collaborator

@purbon Got some time today to look into this. I think a separate codec for this functionality is a bit over the top. The problem will only arise when this plugin is used to write the event data as json to hdfs and hive is used to analyze it via the org.apache.hcatalog.data.JsonSerDe. In this scenario, the @ prefix will be treated as a sql variable, which will cause hive to bail out on this field. Still, this can also be solved by configuring message_format correctly.
So I would recommend just to drop this configuration from the plugin. What do you think?

@dstore-dbap

dstore-dbap Jun 16, 2015

Collaborator

@purbon Got some time today to look into this. I think a separate codec for this functionality is a bit over the top. The problem will only arise when this plugin is used to write the event data as json to hdfs and hive is used to analyze it via the org.apache.hcatalog.data.JsonSerDe. In this scenario, the @ prefix will be treated as a sql variable, which will cause hive to bail out on this field. Still, this can also be solved by configuring message_format correctly.
So I would recommend just to drop this configuration from the plugin. What do you think?

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
def prepare_client(host, port, username)
client = WebHDFS::Client.new(host, port, username)
if @use_httpfs
client.httpfs_mode = true

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

Can just write this as: client.httpfs_mode = @use_httpfs

@jordansissel

jordansissel May 28, 2015

Contributor

Can just write this as: client.httpfs_mode = @use_httpfs

@jordansissel

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
client.open_timeout = @open_timeout
client.read_timeout = @read_timeout
if @retry_known_errors
client.retry_known_errors = true

This comment has been minimized.

@jordansissel

jordansissel May 28, 2015

Contributor

You can avoid the 'if ...' by doing:

client.retry_known_errors = @retry_known_errors
@jordansissel

jordansissel May 28, 2015

Contributor

You can avoid the 'if ...' by doing:

client.retry_known_errors = @retry_known_errors

This comment has been minimized.

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

See above

@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

See above

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap May 29, 2015

Collaborator

@purbon @jordansissel Thank you both for your quick and detailed feedback! Will try to implement the requested changes today.

Collaborator

dstore-dbap commented May 29, 2015

@purbon @jordansissel Thank you both for your quick and detailed feedback! Will try to implement the requested changes today.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jun 1, 2015

Collaborator

@purbon @jordansissel Implemented the requested changes. When your time allows it, please let me know if you have more feedback/changes you wish to see in the code.

Collaborator

dstore-dbap commented Jun 1, 2015

@purbon @jordansissel Implemented the requested changes. When your time allows it, please let me know if you have more feedback/changes you wish to see in the code.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
raise
end
if @compression == "gzip"
begin

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

I see a common pattern on this loading routines, might be a good idea to have them in their own function, isn't?

@purbon

purbon Jun 1, 2015

Contributor

I see a common pattern on this loading routines, might be a good idea to have them in their own function, isn't?

This comment has been minimized.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
def flush(events=nil, teardown=false)
return if not events
# Avoid creating a new string for newline every time
newline = "\n".freeze

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

why do you want to freeze this?

@purbon

purbon Jun 1, 2015

Contributor

why do you want to freeze this?

This comment has been minimized.

@purbon purbon self-assigned this Jun 1, 2015

@purbon purbon added the enhancement label Jun 1, 2015

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
end
end
write_tries = 0
while write_tries < @retry_times do

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

You can implement all this into the write_data function by leveraging the retry notation of ruby, mode intel on http://www.tutorialspoint.com/ruby/ruby_loops.htm. This would avoid actually the break and next usage, plus simplifying the code.

@purbon

purbon Jun 1, 2015

Contributor

You can implement all this into the write_data function by leveraging the retry notation of ruby, mode intel on http://www.tutorialspoint.com/ruby/ruby_loops.htm. This would avoid actually the break and next usage, plus simplifying the code.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# Encode data to ASCII_8BIT (binary)
data= data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
buffer = StringIO.new('', 'w')
buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

as @jordansissel said, you can avoid checking for 1.8 ruby version.

@purbon

purbon Jun 1, 2015

Contributor

as @jordansissel said, you can avoid checking for 1.8 ruby version.

This comment has been minimized.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# Encode data to ASCII_8BIT (binary)
data= data.encode(Encoding::ASCII_8BIT, "binary", :undef => :replace)
buffer = StringIO.new
buffer.set_encoding Encoding::ASCII_8BIT unless RUBY_VERSION =~ /^1\.8/

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

same thing here about 1.8 version.

@purbon

purbon Jun 1, 2015

Contributor

same thing here about 1.8 version.

This comment has been minimized.

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

See above

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

See above

@purbon

View changes

Show outdated Hide outdated spec/outputs/webhdfs_spec.rb
it 'should have default config values' do
subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
insist { subject.port } == 50070

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

We're trying to not use insist anymore, you can use expect(....).to eq(...) in exchange, this is standard rspec3 usage.

@purbon

purbon Jun 1, 2015

Contributor

We're trying to not use insist anymore, you can use expect(....).to eq(...) in exchange, this is standard rspec3 usage.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
if @message_format
event_as_string = event.sprintf(@message_format)
else
event_as_string = event.to_json

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

To transform to json you should use the code provided in https://github.com/elastic/logstash/blob/master/lib/logstash/json.rb, that leverages jackson to be faster.

@purbon

purbon Jun 1, 2015

Contributor

To transform to json you should use the code provided in https://github.com/elastic/logstash/blob/master/lib/logstash/json.rb, that leverages jackson to be faster.

This comment has been minimized.

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

Please correct me if I'm wrong here but should'nt event.to_json call LogStash::Json.dump(@DaTa)?

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

Please correct me if I'm wrong here but should'nt event.to_json call LogStash::Json.dump(@DaTa)?

This comment has been minimized.

@purbon

purbon Jun 3, 2015

Contributor

makes sense, stupid me! 👍 you are 100% right.

@purbon

purbon Jun 3, 2015

Contributor

makes sense, stupid me! 👍 you are 100% right.

@purbon

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
buffer = StringIO.new('','w')
compressor = Zlib::GzipWriter.new(buffer)
begin
compressor.write data

This comment has been minimized.

@purbon

purbon Jun 1, 2015

Contributor

Speaking about parentheses, I know this is controversial in ruby, but we try to have them always as it use to make the code clear.

@purbon

purbon Jun 1, 2015

Contributor

Speaking about parentheses, I know this is controversial in ruby, but we try to have them always as it use to make the code clear.

This comment has been minimized.

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

Done

@dstore-dbap

dstore-dbap Jun 2, 2015

Collaborator

Done

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jun 1, 2015

Contributor

@dstore-dbap I just made another round of comments, this is starting to get shape!! I will do some test runs locally as soon as possible.

Contributor

purbon commented Jun 1, 2015

@dstore-dbap I just made another round of comments, this is starting to get shape!! I will do some test runs locally as soon as possible.

@suyograo

This comment has been minimized.

Show comment
Hide comment
@suyograo

suyograo Jul 22, 2015

Contributor

@dstore-dbap can you sign the CLA? can you please perform step 2 of https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md#contribution-steps

Also @purbon is this ready to merge?

Contributor

suyograo commented Jul 22, 2015

@dstore-dbap can you sign the CLA? can you please perform step 2 of https://github.com/elasticsearch/logstash/blob/master/CONTRIBUTING.md#contribution-steps

Also @purbon is this ready to merge?

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 23, 2015

Collaborator

Just signed the CLA. The transaction id is: XQCFBXNJ2IXYXGR.

Collaborator

dstore-dbap commented Jul 23, 2015

Just signed the CLA. The transaction id is: XQCFBXNJ2IXYXGR.

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

@dstore-dbap speaking about the CLA, I tried to find you, but didn't succeed (not by name of family name). Which email did you use? the thing is we use the commit email to do the validation, so they mostly need to be the same. Lets try to find this out and finally get this merged.

Contributor

purbon commented Jul 24, 2015

@dstore-dbap speaking about the CLA, I tried to find you, but didn't succeed (not by name of family name). Which email did you use? the thing is we use the commit email to do the validation, so they mostly need to be the same. Lets try to find this out and finally get this merged.

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

@dstore-dbap

I tired running the test with the code of you last PR and found this issue:

  1) outputs/webhdfs when initializing should have default config values
     Failure/Error: subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
     LogStash::PluginLoadingError:
       Couldn't find any codec plugin named 'line'. Are you sure this is correct? Trying to load the line codec plugin resulted in this error: no such file to load -- logstash/codecs/line
     # ./spec/outputs/webhdfs_spec.rb:30:in `(root)'

  2) outputs/webhdfs when initializing should register with default values
     Failure/Error: subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
     LogStash::PluginLoadingError:
       Couldn't find any codec plugin named 'line'. Are you sure this is correct? Trying to load the line codec plugin resulted in this error: no such file to load -- logstash/codecs/line
     # ./spec/outputs/webhdfs_spec.rb:25:in `(root)'

Finished in 0.061 seconds (files took 1.66 seconds to load)
3 examples, 2 failures

You need to add the logstash-codec-line as a runtime dependency of the plugin.

Contributor

purbon commented Jul 24, 2015

@dstore-dbap

I tired running the test with the code of you last PR and found this issue:

  1) outputs/webhdfs when initializing should have default config values
     Failure/Error: subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
     LogStash::PluginLoadingError:
       Couldn't find any codec plugin named 'line'. Are you sure this is correct? Trying to load the line codec plugin resulted in this error: no such file to load -- logstash/codecs/line
     # ./spec/outputs/webhdfs_spec.rb:30:in `(root)'

  2) outputs/webhdfs when initializing should register with default values
     Failure/Error: subject = LogStash::Plugin.lookup("output", "webhdfs").new(default_config)
     LogStash::PluginLoadingError:
       Couldn't find any codec plugin named 'line'. Are you sure this is correct? Trying to load the line codec plugin resulted in this error: no such file to load -- logstash/codecs/line
     # ./spec/outputs/webhdfs_spec.rb:25:in `(root)'

Finished in 0.061 seconds (files took 1.66 seconds to load)
3 examples, 2 failures

You need to add the logstash-codec-line as a runtime dependency of the plugin.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 24, 2015

Collaborator

@purbon: Oh, sorry - I did not know that the email address needs to be the same as the commit email.
I used b.puttmann@dbap.de for the CLA. Can you find it by this email or should I sign it again with the commit email?

Collaborator

dstore-dbap commented Jul 24, 2015

@purbon: Oh, sorry - I did not know that the email address needs to be the same as the commit email.
I used b.puttmann@dbap.de for the CLA. Can you find it by this email or should I sign it again with the commit email?

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

@dstore-dbap I'm sorry man, can not find it either. I'm sorry for the headache, but I guess the easiest would be to do it again. Thanks for taking care of that. And for fixing the dependencies.

Contributor

purbon commented Jul 24, 2015

@dstore-dbap I'm sorry man, can not find it either. I'm sorry for the headache, but I guess the easiest would be to do it again. Thanks for taking care of that. And for fixing the dependencies.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 24, 2015

Collaborator

@purbon just checked our github account settings. The email address was also set to b.puttmann@dbap.de. As point of contact I used my full name, Björn Puttmann. I could email you the document, that was mailed to me if that would be of any help...

Collaborator

dstore-dbap commented Jul 24, 2015

@purbon just checked our github account settings. The email address was also set to b.puttmann@dbap.de. As point of contact I used my full name, Björn Puttmann. I could email you the document, that was mailed to me if that would be of any help...

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

@dstore-dbap did you sign the corporate one? or the personal one? I will do more checking, sorry for the inconveniences.

Contributor

purbon commented Jul 24, 2015

@dstore-dbap did you sign the corporate one? or the personal one? I will do more checking, sorry for the inconveniences.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 24, 2015

Collaborator

Corporate one ;)

Collaborator

dstore-dbap commented Jul 24, 2015

Corporate one ;)

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

There it is. 😄 we finally figure it out.

Contributor

purbon commented Jul 24, 2015

There it is. 😄 we finally figure it out.

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 24, 2015

Contributor

For now, no need to check it again, searching for the signature, will report as soon as possible. Thanks for the patience.

Contributor

purbon commented Jul 24, 2015

For now, no need to check it again, searching for the signature, will report as soon as possible. Thanks for the patience.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 24, 2015

Collaborator

No problem, weekends coming - have a nice one ;)

Collaborator

dstore-dbap commented Jul 24, 2015

No problem, weekends coming - have a nice one ;)

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 27, 2015

Contributor

@dstore-dbap checkup positively your signature, thanks for the patience. Nowadays corporate licenses are more tricky to check for us fast enough as individual ones.

Contributor

purbon commented Jul 27, 2015

@dstore-dbap checkup positively your signature, thanks for the patience. Nowadays corporate licenses are more tricky to check for us fast enough as individual ones.

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 27, 2015

Contributor

@dstore-dbap sorry to keep bothering you, I've been all morning trying to run the test/debugging your PR with a real webhdfs server, I might be doing something wrong, because when I run the test I got a lot of errors. Basically the test failing to flush events, can not create the test files, etc.

Is there any special configuration to take into account while setting up hadoop? and the webhdfs part? please feel free to educate me. Happy to take a look at the test and work them out, but would like before to be sure the hdfs setup I have is a good one 😄 !

Contributor

purbon commented Jul 27, 2015

@dstore-dbap sorry to keep bothering you, I've been all morning trying to run the test/debugging your PR with a real webhdfs server, I might be doing something wrong, because when I run the test I got a lot of errors. Basically the test failing to flush events, can not create the test files, etc.

Is there any special configuration to take into account while setting up hadoop? and the webhdfs part? please feel free to educate me. Happy to take a look at the test and work them out, but would like before to be sure the hdfs setup I have is a good one 😄 !

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 27, 2015

Collaborator

@purbon No problem! To test, if webhdfs is basically functional, you could fire up an irb console on the machine the tests should run on and try the following:

require 'webhdfs'
client = WebHDFS::Client.new("ipaddress of server", port_number, "hdfs_user")
client.create('/user/hdfs_user/test.log', 'test')
client.delete('/user/hdfs_user/test.log')

port_number normally is 50070
A default installation of hdfs on CentOS normally has a hdfs user "hadoop" preconfigured.
Maybe the hdfs user is missing? Maybe you could try this to see if the setup has a problem or if the tests are buggy...

Collaborator

dstore-dbap commented Jul 27, 2015

@purbon No problem! To test, if webhdfs is basically functional, you could fire up an irb console on the machine the tests should run on and try the following:

require 'webhdfs'
client = WebHDFS::Client.new("ipaddress of server", port_number, "hdfs_user")
client.create('/user/hdfs_user/test.log', 'test')
client.delete('/user/hdfs_user/test.log')

port_number normally is 50070
A default installation of hdfs on CentOS normally has a hdfs user "hadoop" preconfigured.
Maybe the hdfs user is missing? Maybe you could try this to see if the setup has a problem or if the tests are buggy...

@suyograo

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
require "logstash/outputs/base"
require "stud/buffer"
# Summary: Plugin to send logstash events to to files in HDFS via webhdfs

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

Typo: 2 to

@suyograo

suyograo Jul 28, 2015

Contributor

Typo: 2 to

@suyograo

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
require "stud/buffer"
# Summary: Plugin to send logstash events to to files in HDFS via webhdfs
# restapi.

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

can you capitalize to REST API?

@suyograo

suyograo Jul 28, 2015

Contributor

can you capitalize to REST API?

@suyograo

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# make sure, that the hostname of your namenode is resolvable on the host running logstash. When creating/appending
# to a file, webhdfs somtime sends a 307 TEMPORARY_REDIRECT with the HOSTNAME of the machine its running on.
#
# USAGE:

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

We use asciidoc (guide here) for rendering docs, can you make this ==== Usage

@suyograo

suyograo Jul 28, 2015

Contributor

We use asciidoc (guide here) for rendering docs, can you make this ==== Usage

@suyograo

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
# }
# ----------------------------------
#
# Author: Bjoern Puttmann <b.puttmann@dbap.de> - dbap GmbH, Münster, Germany.

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

Is it possible to remove this line since you are named as a contributor already in the gemspec?

@suyograo

suyograo Jul 28, 2015

Contributor

Is it possible to remove this line since you are named as a contributor already in the gemspec?

@suyograo

View changes

Show outdated Hide outdated lib/logstash/outputs/webhdfs.rb
include Stud::Buffer
config_name "webhdfs"
milestone 1

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

We don't use milestones anymore, just versions in gemspecs. Can you remove this?

@suyograo

suyograo Jul 28, 2015

Contributor

We don't use milestones anymore, just versions in gemspecs. Can you remove this?

# The path to the file to write to. Event fields can be used here,
# as well as date fields in the joda time format, e.g.:
# ....

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

Can you use in-place code highlighting for this? like: /user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log. Backticks work here

@suyograo

suyograo Jul 28, 2015

Contributor

Can you use in-place code highlighting for this? like: /user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log. Backticks work here

@suyograo

View changes

Show outdated Hide outdated logstash-output-webhdfs.gemspec
Gem::Specification.new do |s|
s.name = 'logstash-output-webhdfs'
s.version = '0.0.1'

This comment has been minimized.

@suyograo

suyograo Jul 28, 2015

Contributor

can you make this 0.1.0?

@suyograo

suyograo Jul 28, 2015

Contributor

can you make this 0.1.0?

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 28, 2015

Collaborator

@suyograo Thanks for the feedback and corrections!

Collaborator

dstore-dbap commented Jul 28, 2015

@suyograo Thanks for the feedback and corrections!

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 28, 2015

Contributor

@dstore-dbap about my issues yesterday with checking this PR,
dns

interesting how the create request does two, one with localhost and another with the name of the virtual machine, jajajja ...computers!! moving forward on running the tests 😄

Contributor

purbon commented Jul 28, 2015

@dstore-dbap about my issues yesterday with checking this PR,
dns

interesting how the create request does two, one with localhost and another with the name of the virtual machine, jajajja ...computers!! moving forward on running the tests 😄

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 28, 2015

Contributor

LGTM

Contributor

purbon commented Jul 28, 2015

LGTM

purbon added a commit that referenced this pull request Jul 28, 2015

Merge pull request #1 from dstore-dbap/master
First webhdfs output plugin merge.

@purbon purbon merged commit 4be9fed into logstash-plugins:master Jul 28, 2015

@purbon

This comment has been minimized.

Show comment
Hide comment
@purbon

purbon Jul 28, 2015

Contributor

@dstore-dbap just merge this now, including some small refactoring (to be more rspec friendly). Happy about it, now we can for sure iterate it, to for example add more nice unit test.

Contributor

purbon commented Jul 28, 2015

@dstore-dbap just merge this now, including some small refactoring (to be more rspec friendly). Happy about it, now we can for sure iterate it, to for example add more nice unit test.

@dstore-dbap

This comment has been minimized.

Show comment
Hide comment
@dstore-dbap

dstore-dbap Jul 29, 2015

Collaborator

@purbon You're the man ;) Thanks for merging and once again, thanks for the friendly atmosphere here ;)

Collaborator

dstore-dbap commented Jul 29, 2015

@purbon You're the man ;) Thanks for merging and once again, thanks for the friendly atmosphere here ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment