Upgrade to aws-sdk v2 to get feature improvements #25

Merged Jun 24, 2016 (3 commits)

Conversation

DanielRedOak
Contributor

Upgraded the spec tests and code to use aws-sdk v2. This improves processing by a large margin, because v1 made a separate API call for every object whenever .last_modified was used. v2 needs only a single call per 1000 objects, so on a bucket with 1000 objects the API calls drop from 1001 to 1 for the last_modified and bucket.objects calls alone. Pagination is built into the SDK as well, so this plugin now supports buckets with more than 1000 objects.
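
To make the call-count arithmetic above concrete, here is a rough sketch in plain Ruby. It does not touch the AWS SDK; `PAGE_SIZE`, `listing_calls`, `v1_calls`, and `v2_calls` are illustrative names, and the v1 model (one listing call per page plus one extra call per object for `.last_modified`) is an assumption taken from the description above.

```ruby
# Illustrative model only: no AWS SDK calls are made here.
PAGE_SIZE = 1000 # objects returned per listing call, per the description above

# Number of listing calls needed to page through a bucket.
def listing_calls(object_count)
  (object_count + PAGE_SIZE - 1) / PAGE_SIZE
end

# Assumed v1 behaviour: the listing plus one extra call per object
# to fetch last_modified.
def v1_calls(object_count)
  listing_calls(object_count) + object_count
end

# Assumed v2 behaviour: last_modified arrives with each listing page.
def v2_calls(object_count)
  listing_calls(object_count)
end

puts v1_calls(1000) # 1001
puts v2_calls(1000) # 1
```

Under these assumptions the saving grows linearly with bucket size, since the per-object term disappears entirely.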

Also added a debug message that shows the length of objects[] as entries are added.

There was also a bug where the 'message' field was assumed to be present and was used in the checks that decide whether an event is metadata ('event_is_metadata'). Since this is not a required field per https://logstash.jira.com/browse/LOGSTASH-675, we should check for nil before assuming it's present. This showed up specifically when using the cloudtrail codec, as the resulting events contain no message.
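
A minimal sketch of the nil guard described above, not the plugin's actual code. A plain Hash stands in for a Logstash event, and `METADATA_PREFIXES` with its two header prefixes is a hypothetical example of what a metadata line might look like.

```ruby
# Hypothetical stand-in: a plain Hash plays the role of a Logstash event.
# The two prefixes are assumed examples of metadata header lines.
METADATA_PREFIXES = ["#Version: ", "#Fields: "].freeze

def event_is_metadata?(event)
  line = event["message"]
  # Events from codecs such as cloudtrail may carry no message at all.
  return false if line.nil?

  METADATA_PREFIXES.any? { |prefix| line.start_with?(prefix) }
end

puts event_is_metadata?("message" => "#Version: 1.0")   # true
puts event_is_metadata?("eventName" => "ConsoleLogin")  # false
```

Returning early on nil means an event without a message is simply treated as a normal event instead of raising on the string check.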

This was referenced Mar 11, 2015
@ph
Contributor

ph commented Mar 16, 2015

Will try to test asap, with the elastic{on} I am a bit behind.

@shaharmor

Any estimate on when this will be merged?

@DanielRedOak
Contributor Author

Just fyi, I've been using this code + the extras for threading for our cloudtrail and other logs over the past week. Seems to be chugging along nicely. If interested: https://github.com/DanielRedOak/logstash-input-s3/tree/multithread

Looking forward to hopefully getting this merged, then submitting a PR for the threaded version!

@rabidscorpio

+1

@netoneko

Any progress on merging this?

@ph
Contributor

ph commented May 12, 2015

With 1.5 release just around the corner I am looking at this :)
@DanielRedOak Awesome work!

@sgzijl

sgzijl commented Jun 10, 2015

+1 (will this be merged? it works perfectly here and makes me very very happy!)

@ph
Contributor

ph commented Jun 11, 2015

@DanielRedOak Looks great, let's get this thing in :)

Would you mind rebasing your PR? Also, can you test it with logstash-plugins/logstash-mixin-aws#13? That PR adds the aws-sdk v2 API and remains compatible with plugins that use v1. Small changes are necessary in your PR to make it work with it:

  • remove the dependencies from your gemspec.
  • remove the aws_service_endpoint method.
  • include the LogStash::PluginMixins::AwsConfig::V2 instead of LogStash::PluginMixins::AwsConfig

@nikolay

nikolay commented Jul 22, 2015

@DanielRedOak Are you gonna wrap this up? We need it badly!

@xelibrion

+1, would be great to have it merged!

@DanielRedOak
Contributor Author

@ph yea, I really want to finish this up. Been busy with a new job, so it's definitely been on the back burner. I'll try for this week or early next.

@ph
Contributor

ph commented Aug 27, 2015

@DanielRedOak I'll give a hand; I've been busy with logstash-core issues :(

@DanielRedOak
Contributor Author

For real this time, I have it checked out and will work on it! Shouldn't take much work, I just have to squeeze it in

@jmccarty3

any update on this?

@elasticsearch-release

Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'.

@DanielRedOak
Contributor Author

I've been working with the aws sdk again for some work related items, so it's turned out to be a good time to finish this up for folks. I'll work through the needed changes then rebase for the PR. Sorry I've been slacking.

@KptnKMan

KptnKMan commented Nov 6, 2015

@DanielRedOak This bug is killing my ELK project on several levels. I'm seeing the same issues even in Logstash 2.0.
Can anyone give any indication of when this can be implemented, or whether there is some way I can test your WIP v2 implementation?

@sstarcher

what is this waiting for?

@smashew

smashew commented Dec 17, 2015

pwease? I have a similar issue.

@sstarcher

@smashew if you're interested, I rebased @DanielRedOak's aws-sdk v2 changes and his multithreading changes (https://github.com/DanielRedOak/logstash-input-s3/tree/multithread) onto logstash-input-s3 HEAD. The result is at https://github.com/sstarcher/logstash-input-s3/

I have been running it for a few days now. It is performing much, much better. I can't seem to get it to run on more than 3 cores though, so it's still limited. Although that may be user error.

@KptnKMan

Omg @sstarcher, thank you! Is there any special compiling required for this, or can I just clone the repository and use the files?

I'm happy to test!

@sstarcher

I ran everything inside a Docker container. The following should install everything for you:

    curl -SL https://github.com/sstarcher/logstash-input-s3/archive/master.tar.gz \
    | tar xzC / &&\
    cd /logstash-input-s3-master/ &&\
    gem build logstash-input-s3.gemspec &&\
    /opt/logstash/bin/plugin install /logstash-input-s3-master/logstash-input-s3-*.gem &&\
    rm -rf /logstash-input-s3-master

@KptnKMan

KptnKMan commented Jan 5, 2016

@sstarcher @DanielRedOak I've finally had a chance to test this (S3 bucket with 105000+ items) and it appears to have worked. Churned through the whole bucket in a matter of hours, where it never finished before.

Setup is fresh install of LS2.1.1, and ES2.1.1.
Plugins installed: logstash-codec-cloudtrail
No other modifications.

It's worth noting that logstash crashed overnight, but I have not looked into exactly why that was.

@DanielRedOak
Contributor Author

Worked on this some more and was rebasing onto current master. Unfortunately I am now seeing issues with the spec tests reading in 0-line files, so I need to troubleshoot more. I'm going to be on vacation for a bit, so I will follow up after!

@DanielRedOak
Contributor Author

Think this should be good now. Rebasing with a year of changes was a bit of fun :)

If there is still interest, I can also submit another PR for multithreading when this one is merged in, or alternatively merge the changes here since people seem to be using it. Not much changes other than adding Ruby threads and plopping the files list onto the work_q.
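
For readers curious what "Ruby threads plus a work queue" can look like, here is a minimal sketch under stated assumptions. It is not the code from the multithread branch; `process_keys`, `worker_count`, and the handler block are hypothetical names, and `Queue#close` requires Ruby 2.3 or later.

```ruby
# Hypothetical sketch: fan a list of file keys out to worker threads
# through a shared queue. Not taken from the multithread branch.
def process_keys(keys, worker_count: 4, &handler)
  work_q = Queue.new
  keys.each { |key| work_q << key }
  work_q.close # once drained, pop returns nil instead of blocking

  results = Queue.new
  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker pulls keys until the closed queue is empty.
      while (key = work_q.pop)
        results << handler.call(key)
      end
    end
  end
  workers.each(&:join)

  # Completion order is not guaranteed, so callers should not rely on it.
  Array.new(results.size) { results.pop }
end

processed = process_keys(%w[a.log b.log c.log]) { |key| key.upcase }
puts processed.sort.inspect
```

Closing the queue up front avoids sentinel values: workers exit on their own once every key has been handed out.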

@sstarcher

@DanielRedOak I'm certainly interested in the multithreading, but you may want to have that in another PR so as not to slow this one down ;)

@suyograo
Contributor

suyograo commented Mar 3, 2016

@DanielRedOak I ran the tests and they fail. Can you please take a look:

    suyog@machine:~/ws/elastic/ls_plugins/logstash-input-s3 (pr/25)$ bundle exec rspec
    ....

    Failed examples:

    rspec ./spec/inputs/s3_spec.rb:66 # LogStash::Inputs::S3#get_s3object with deprecated credentials option should instantiate AWS::S3 clients with a proxy set
    rspec ./spec/inputs/s3_spec.rb:88 # LogStash::Inputs::S3#get_s3object with modern access key options should instantiate AWS::S3 clients with a proxy set
    rspec ./spec/inputs/s3_spec.rb:245 # LogStash::Inputs::S3 when working with logs when event doesn't have a `message` field should process events
    rspec ./spec/inputs/s3_spec.rb:245 # LogStash::Inputs::S3 when working with logs plain text should process events
    rspec ./spec/inputs/s3_spec.rb:245 # LogStash::Inputs::S3 when working with logs encoded should process events
    rspec ./spec/inputs/s3_spec.rb:245 # LogStash::Inputs::S3 when working with logs compressed should process events
    rspec ./spec/inputs/s3_spec.rb:250 # LogStash::Inputs::S3 when working with logs compressed deletes the temporary file
    rspec ./spec/inputs/s3_spec.rb:245 # LogStash::Inputs::S3 when working with logs cloudfront should process events

@suyograo
Contributor

suyograo commented Mar 3, 2016

Tests on master run successfully, no failures.

@suyograo suyograo self-assigned this Mar 3, 2016
@DanielRedOak
Contributor Author

Yea, I mentioned above that they fail, probably due to issues stubbing the get on the object. @ph wanted me to rebase anyways since many people have used it successfully. I'll still give it another look when I get some more time as it bothers me :)

@suyograo
Contributor

suyograo commented Mar 4, 2016

Thanks @DanielRedOak would love to get this merged and new version published asap :)

@ph
Contributor

ph commented Mar 4, 2016

Thanks @DanielRedOak

@RichardN

Hello. I was looking at what is required to get the tests from this pull request working. One issue is that the "credentials" config setting in this plugin conflicts with a private method called "credentials" in the logstash-mixin-aws gem. With that fixed, my fork of this repo, https://github.com/RichardN/logstash-input-s3, has tests that pass. It also makes full use of the fixed logstash-mixin-aws; I've created a pull request for the mixin: logstash-plugins/logstash-mixin-aws#20.

@PhaedrusTheGreek

+1 for the merge

@ryantanner

Is this going to get merged any time soon? We were about to put Logstash in production when we found it just can't handle our volume.

@jsvd
Member

jsvd commented Apr 26, 2016

The current state is that the PR had failing tests, so it couldn't be merged, and by now it has obviously diverged from master again :(

@0x4D31

0x4D31 commented Jun 8, 2016

Any update?
I tested @sstarcher's branch for a few hours, and it seems to work great!

@DanielRedOak
Contributor Author

I'll see if I can pull things in again and rebase to get everything aligned and include fixes for the spec tests so we can hopefully close this out. Unfortunately since we don't use Logstash at my current job I don't get any time during the workdays to get things cleaned up. I'll shoot to get something done and merged in again by next week for the sdk upgrade.

@energycoaching

+1

# Conflicts:
#	lib/logstash/inputs/s3.rb
#	logstash-input-s3.gemspec
#	spec/inputs/s3_spec.rb

Corrected rspec tests.
@DanielRedOak
Contributor Author

Should be good to go now. Hopefully it will be merged before we pile up some more conflicts :)

@suyograo
Contributor

@DanielRedOak thanks for your efforts here. Will merge this.

@suyograo suyograo merged commit 9597ada into logstash-plugins:master Jun 24, 2016
@emckean

emckean commented Jun 25, 2016

dumb q: to get these changes should I be cloning master and installing it (as in sstarcher's instructions above) or will basic logstash-plugin install work?

@sstarcher

@emckean looks like the changes are now in release 3.1.1
