
Add FOG-based S3 input plugin #786

Closed · wants to merge 2 commits into base: master

20goto10 commented Nov 19, 2013

This is essentially a clone of LogStash's AWS S3 input plugin, modified to use the FOG library instead of the AWS-SDK, which in my case was dying with mysterious timeout exceptions.

It requires the fog and unf gems.

Sample input section:

    input {
      s3fog {
        bucket => 's3-bucket-name'
        credentials => "credential_file.conf"  # see S3 plugin for more info
        region => 'us-east-1'
        region_endpoint => 'us-east-1'
        backup_to_bucket => 's3-backup-bucket-name'
        backup_endpoint => 'us-east-1'
        prefix => 'foo'  # logfile path/prefix
        interval => 60
        delete => false
        debug => false
      }
    }
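
For reference, here is a minimal sketch of the fog calls a plugin like this presumably builds on; the credentials, bucket name, and prefix are placeholders:

    require 'fog'

    # Connect to S3 through fog rather than the aws-sdk gem.
    storage = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     'KEY',     # placeholder
      aws_secret_access_key: 'SECRET',  # placeholder
      region:                'us-east-1'
    )

    # List objects under the configured prefix and read each one;
    # fog's collection iterators handle paging behind the scenes.
    bucket = storage.directories.get('s3-bucket-name', prefix: 'foo')
    bucket.files.each do |file|
      puts "#{file.key}: #{file.content_length} bytes"
    end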

mveitas commented Nov 19, 2013

Take a look at #753, which contains some modifications to how credentials are collected, using the AwsConfigMixin to make them common across all AWS components in logstash.

jordansissel commented Nov 20, 2013

Thanks for helping improve logstash!

I'm hoping a small bug in the current s3 plugin doesn't require a totally new plugin just to work around a bug in another library. Let's fix that bug instead of adding a duplicate plugin! What do you think? :)

mveitas commented Nov 20, 2013

Unless we know for sure that the AWS-SDK is the issue, it would not be worth introducing an additional library and a new plugin. Could you write a small sample script to isolate and reproduce the issue with the AWS-SDK?
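
For anyone attempting that, a minimal repro script against the aws-sdk v1 gem (the one logstash bundled at the time) might look like the sketch below; the credentials, bucket, and prefix are placeholders. A hang or Timeout::Error here would implicate the SDK or the network rather than logstash:

    require 'aws-sdk' # the v1 gem

    s3 = AWS::S3.new(
      access_key_id:     'KEY',     # placeholder
      secret_access_key: 'SECRET',  # placeholder
      region:            'us-east-1'
    )

    # Walk the same keys the s3 input would read.
    s3.buckets['s3-bucket-name'].objects.with_prefix('foo').each do |object|
      puts "#{object.key} (#{object.content_length} bytes)"
    end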

20goto10 commented Nov 20, 2013

After much effort, I never figured out what was causing my timeouts. I tried PR #753 and got hanging instead of actual timeout exceptions; either way, it doesn't work for me. I wouldn't be surprised if the problem is unrelated to LogStash altogether.

I think it would make sense to change this into a general Fog library plugin for handling all kinds of cloud storage. (I can't currently test anything besides AWS S3, however.)

20goto10 commented Nov 20, 2013

Anyone who has the original s3 plugin working should verify that it works on buckets containing more than 1000 files. I didn't see any code for handling that, but AWS only returns up to 1000 keys at once.

(http://docs.aws.amazon.com/AmazonS3/latest/dev/ReturningAdditionalObjectVersionsAfterExceedingMaxKeys.html)
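
To illustrate the concern: an S3 ListObjects response is capped at 1000 keys, so a client has to re-request with a marker until the response is no longer truncated. A sketch of the manual loop using the aws-sdk v1 low-level client (bucket name is a placeholder):

    require 'aws-sdk' # the v1 gem

    client = AWS::S3.new.client # credentials from the usual config or environment
    opts = { bucket_name: 's3-bucket-name' }

    # Each response carries at most 1000 keys; keep feeding the last key
    # back as the marker until the listing is no longer truncated.
    loop do
      resp = client.list_objects(opts)
      resp[:contents].each { |obj| puts obj[:key] }
      break unless resp[:truncated]
      opts[:marker] = resp[:contents].last[:key]
    end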

piavlo commented Nov 20, 2013

I think the best approach would be to merge all these cloud storage input plugins, like #772, into one base cloud storage input plugin. That base plugin could use fog, since it supports most of the commonly used cloud providers.
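
For context, fog already exposes a provider-agnostic Storage interface; a minimal sketch, with credentials as placeholders and provider names as fog spells them:

    require 'fog'

    # Constructor arguments are provider-specific, but the resulting
    # objects share the same directories/files API.
    aws = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     'KEY',    # placeholder
      aws_secret_access_key: 'SECRET'  # placeholder
    )
    rackspace = Fog::Storage.new(
      provider:           'Rackspace',
      rackspace_username: 'USER',      # placeholder
      rackspace_api_key:  'APIKEY'     # placeholder
    )

    [aws, rackspace].each do |storage|
      storage.directories.each { |dir| puts dir.key }
    end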

jordansissel commented Nov 22, 2013

@piavlo The problem with a single base storage plugin is that so many storage APIs have different primitives. Some are objects, some are files, etc. I'm not sure it's best to abstract a single interface (as a 'general cloud storage' plugin), because you'd lose many of the features of each respective provider.

jordansissel commented Nov 22, 2013

I'm looking at the discussion here as a bug discussion, fwiw. We should identify the problems with the current plugin and fix them. If there are bugs in the upstream library, then we can fix those, too :)

20goto10 commented Nov 22, 2013

I think it would be possible to make a fairly abstract cloud storage plugin, provided it didn't get too bogged down in the details of each provider. In other words, we could trade some of the specific features/advantages of each provider for the convenience of configuration, or at least start that way. That's what Fog does already, after all. LogStash inputs should really just be concerned with reading the relevant files; anything else is beside the point. I'm not yet familiar enough with Fog to say whether it provides enough abstraction to make this simple, but I would assume that's why it exists...

I'm not sure I can provide a reproduction scenario for the bug I'm getting in aws-sdk. I do think the trouble is probably upstream from LogStash itself.

jordansissel added the O(2) label Aug 19, 2014

elasticsearch-release commented Aug 26, 2014

Can one of the admins verify this patch?

ph commented Oct 28, 2014

@jordansissel Do you want to close this PR too? By the way, I checked whether the aws-sdk gem we are using has an issue with the 1000-file limit (see #786 (comment)), and it does not.
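
That matches the v1 SDK's documented behavior: its high-level collections issue repeated ListObjects calls internally, so iteration is not capped at the first 1000 keys. A quick way to check against a large bucket (name is a placeholder):

    require 'aws-sdk' # the v1 gem

    s3 = AWS::S3.new # credentials from the usual config or environment

    # ObjectCollection#each pages through the bucket on its own.
    total = 0
    s3.buckets['s3-bucket-name'].objects.each { total += 1 }
    puts "saw #{total} keys"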

jordansissel commented Oct 28, 2014

Yeah. I don't want to confuse users with multiple plugins that can do the same thing over different implementations. We've learned so far that having multiple protocols to connect to Elasticsearch is confusing, and I don't want to spread that to other plugins.

If there's a bug in the S3 plugin or its implementation/libraries (aws-sdk, etc.), we should fix those.

lexelby commented Mar 9, 2015

Likely the original reason the OP got timeouts is that their network setup requires an HTTP proxy. The s3 input doesn't have any way to specify an HTTP proxy, and it ignores the standard http[s]_proxy environment variables. Fog might have magically picked up and used the proxy environment variables.

I, on the other hand, am considering using this input because of the issue outlined here: logstash-plugins/logstash-input-s3#14 (comment)

That issue makes the s3 input essentially unusable for my use case, which is actually the same as the OP's: retrieving cloudwatch logs.
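
For what it's worth, fog's HTTP layer (Excon) accepts an explicit proxy, and fog forwards connection options down to it, so a fog-based input could be pointed at a proxy even where the environment variables are ignored. A hedged sketch; the credentials, bucket, and proxy URL are placeholders:

    require 'fog'

    storage = Fog::Storage.new(
      provider:              'AWS',
      aws_access_key_id:     'KEY',     # placeholder
      aws_secret_access_key: 'SECRET',  # placeholder
      region:                'us-east-1',
      # Passed through to Excon; an explicit :proxy sidesteps any env vars.
      connection_options:    { proxy: 'http://proxy.example.com:3128' }
    )

    storage.directories.get('s3-bucket-name').files.each { |f| puts f.key }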
