-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to aws-sdk v2 to get feature improvements #25
Conversation
Will try to test asap, with the elastic{on} I am a bit behind. |
Any estimate on when this will be merged? |
Just fyi, I've been using this code + the extras for threading for our cloudtrail and other logs over the past week. Seems to be chugging along nicely. If interested: https://github.com/DanielRedOak/logstash-input-s3/tree/multithread Looking forward to hopefully getting this merged, then submitting a PR for the threaded version! |
+1 |
Any progress on merging this? |
With 1.5 release just around the corner I am looking at this :) |
+1 (will this be merged? it works perfectly here and makes me very very happy!) |
@DanielRedOak Look great, lets get this thing in :) Would you mind rebasing your PR? Also can you test it with logstash-plugins/logstash-mixin-aws#13 ? This PR add the aws-sdk v2 api and remain compatible with plugins that use v1. Small changes are necessary in your PR to make it work with it:
|
@DanielRedOak Are you gonna wrap this up? We need it badly! |
+1, would be great to have it merged! |
@ph yea, I really want to finish this up. Been busy with a new job so its definitely been on the back burner. I'll try for this week or early next. |
@DanielRedOak I'll give a hand I've been busy with logstash-core issues :( |
For real this time, I have it checked out and will work on it! Shouldn't take much work, I just have to squeeze it in |
any update on this? |
Jenkins standing by to test this. If you aren't a maintainer, you can ignore this comment. Someone with commit access, please review this and clear it for Jenkins to run; then say 'jenkins, test it'. |
I've been working with the aws sdk again for some work related items, so it's turned out to be a good time to finish this up for folks. I'll work through the needed changes then rebase for the PR. Sorry I've been slacking. |
@DanielRedOak This bug is killing my ELK project on different levels. I'm seeing the same issues in Logstash2.0 even. |
what is this waiting for? |
pwease? I have a similar issue. |
@smashew if your interested I rebased @DanielRedOak https://github.com/DanielRedOak/logstash-input-s3/tree/multithread changes for aws-sdk v2 and his multithreading changes onto logstash-input-s3 HEAD onto https://github.com/sstarcher/logstash-input-s3/ I have been running it for a few days now. It is performing much much better. I can't seem to get it to run more than on 3 cores though so it's still limited. Although that may be a user error. |
Omg @sstarcher, thank you! Is there any special compiling required for this I'm happy to test!
|
I ran everything inside of a docker container. The following should install everything for you
|
@sstarcher @DanielRedOak I've finally had a chance to test this (S3 bucket with 105000+ items) and it appears to have worked. Churned through the whole bucket in a matter of hours, where it never finished before. Setup is fresh install of LS2.1.1, and ES2.1.1. Its worth noting that logstash crashed overnight, but I have not looked into exactly why that was. |
Worked on this some more and was rebasing with current master. Unfortunately I am seeing issues now with spec tests reading in 0 line files so I need to troubleshoot more. I'm going to be on vacation for a bit, so I will follow up after! |
Think this should be good now, rebasing with a year of changes was a bit of fun :) If there is still interest, I can also submit another PR for multithreading when this one is merged in or alternatively merge the changes here since people seem to be using it. Nothing much changes other than adding ruby threads and plopping the files list onto the work_q. |
@DanielRedOak I'm certainly interested in the multithreading, but you may want to have that in another PR as to not slow this one down ;) |
@DanielRedOak I ran the tests and they fail. Can you please take a look:
|
Tests on master run successfully, no failures. |
Yea, I mentioned above that they fail probably due to issues stubbing the
|
Thanks @DanielRedOak would love to get this merged and new version published asap :) |
Thanks @DanielRedOak |
Hello. I was looking at what is required to get the tests from this pull request working. One issue is that the "credentials" config setting in this plugin has a conflict with a private method called "credentials" in the logstash-mixin-aws gem. If this is fixed, my fork of this repo https://github.com/RichardN/logstash-input-s3 has tests that pass (it also fully makes use of the fixed logstash-mixin-aws (I've created a pull request for the mixin - logstash-plugins/logstash-mixin-aws#20) |
+1 for the merge |
Is this going to get merged any time soon? We were about to put Logstash in production when we found it just can't handle our volume. |
Current state of this was that the PR had tests failing, so it couldn't be merged, and obviously by now has again diverged from master :( |
any update? |
I'll see if I can pull things in again and rebase to get everything aligned and include fixes for the spec tests so we can hopefully close this out. Unfortunately since we don't use Logstash at my current job I don't get any time during the workdays to get things cleaned up. I'll shoot to get something done and merged in again by next week for the sdk upgrade. |
+1 |
# Conflicts: # lib/logstash/inputs/s3.rb # logstash-input-s3.gemspec # spec/inputs/s3_spec.rb Corrected rspec tests.
Should be good to go now. Hopefully it will be merged before we pile up some more conflicts :) |
@DanielRedOak thanks for your efforts here. Will merge this. |
dumb q: to get these changes should I be cloning master and installing it (as in sstarcher's instructions above) or will basic logstash-plugin install work? |
@emckean looks like the changes are now in release 3.1.1 |
Upgraded spec tests + code to use aws-sdk v2. This then improves processing by a large margin as v1 repeated api calls for every object when .last_modified was used. v2 requires only a single call for every 1000 objects so on a bucket with 1000 objects, api calls are down to 1 from 1001 just for the last_modified and bucket.objects calls. Pagination is included in the sdk as well so this plugin now supports buckets with over 1000 objects.
Additionally added a debug message to show the length of objects[] as things are being added.
There was also a bug where the 'message' field was assumed present and used for checks to determine if the 'event_is_metadata'. Since this is not a required field per https://logstash.jira.com/browse/LOGSTASH-675 we should check for nil before assuming its present. This manifested itself specifically when using the cloudtrail codec as there is no message included in resulting events.