
S3 Output Enhancements: Track S3 feature requests here #2700

Closed · PettitWesley opened this issue Oct 19, 2020 · 42 comments
Labels: AWS (Issues with AWS plugins or experienced by users running on AWS), enhancement

@PettitWesley (Contributor)

S3 support was released in 1.6; however, there are a number of outstanding requests for improvements in the original ticket: #1004

Please comment with new S3 feature requests here.

@PettitWesley (Contributor, Author)

@elrob 's requests: #1004 (comment)

  • gzip compression support (I know this is mentioned above but this is currently a blocker for migrating to using fluent-bit with S3 for me so I want to express my desire for this)
  • make the position of the automatically added object key suffix configurable, so it is possible to have the key end with a fixed extension (e.g. -objectqZ7jv9Qt.jsonl)
  • make it possible to disable the date injection into the output JSON. I can't seem to disable it, although I've tried, and I even hunted through the source code to work out how it might be disabled.

My response: #1004 (comment)

make the position of the automatically added object key suffix configurable, so it is possible to have the key end with a fixed extension (e.g. -objectqZ7jv9Qt.jsonl)

For this one, I'm considering adding another special format string in the S3 key, $UUID (or maybe $RANDOM), which will give you some number of random characters. If you enable use_put_object, then having $UUID in the S3 key would be required.

That's not a perfect solution though...

The PutObject API is called under two circumstances:

  1. Normal uploads when you explicitly enable it with use_put_object.
  2. When Fluent Bit is stopped/restarted and there is leftover data to send.

In both cases I want to force some sort of UUID interpolation to ensure the key is unique. I suppose one thing I could do is split the S3 Key on . and then add the UUID before the last piece (if there were dots in the key). That way if you have an S3 key in the form of something.extension the UUID will come before the extension.

Another option would just be to include the $UUID special format string and require that it is always used.
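
For illustration, here is a sketch of what the $UUID approach could look like in an output configuration. Note that $UUID here is the proposal above, not a shipped feature, and the bucket/region values are placeholders:

[OUTPUT]
    name            s3
    match           *
    bucket          my-bucket
    region          us-east-1
    use_put_object  true
    s3_key_format   /logs/$TAG/%Y/%m/%d/$UUID.jsonl

With this layout the random characters land before the .jsonl extension, so tools that key off the extension keep working.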

Thoughts?

@PettitWesley (Contributor, Author)

@shailegu requests pre-signed URLs: #1004 (comment)

I am very doubtful about the use case, though; I think pre-signed URLs are one-time use only, which does not really fit a project like Fluent Bit that is meant to continually upload data.

@PettitWesley (Contributor, Author)

Supporting parquet as an output format was requested as well: #1004 (comment)

@elrob commented Oct 19, 2020

@PettitWesley Thank you. I think adding the UUID part before the last . is a reasonable solution if it is documented so it doesn't surprise people, or ideally if it can be toggled. Alternatively, having e.g. a $UUID part within the key format and making it mandatory would work fine for me; that is probably the most flexible solution with the least surprises.

@bksteiny

Hi @PettitWesley, I came across an issue when configuring the S3 output to use an Object Lock-enabled S3 bucket. Would it be possible to include the Content-MD5 header with requests?

From the AWS Object Lock doc...

If you configure a default retention period on a bucket, requests to upload objects in such a bucket must include the Content-MD5 header

The logging from Fluent Bit confirms the documentation:

[2020/10/27 18:46:22] [debug] [output:s3:s3.0] Running upload timer callback..
[2020/10/27 18:46:22] [debug] [aws_credentials] Requesting credentials from the env provider..
[2020/10/27 18:46:23] [debug] [http_client] server s3.us-west-2.amazonaws.com:443 will close connection #37
[2020/10/27 18:46:23] [debug] [aws_client] s3.us-west-2.amazonaws.com: http_do=0, HTTP Status: 400
[2020/10/27 18:46:23] [debug] [aws_client] Unable to parse API response- response is not valid JSON.
[2020/10/27 18:46:23] [debug] [output:s3:s3.0] PutObject http status=400
[2020/10/27 18:46:23] [error] [output:s3:s3.0] PutObject API responded with error='InvalidRequest', message='Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters'
[2020/10/27 18:46:23] [error] [output:s3:s3.0] Raw PutObject response: HTTP/1.1 400 Bad Request
x-amz-request-id: xxx
x-amz-id-2: xxx
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Tue, 27 Oct 2020 18:46:22 GMT
Connection: close
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>InvalidRequest</Code><Message>Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters</Message><RequestId>xxx</RequestId><HostId>xxx</HostId></Error>
[2020/10/27 18:46:23] [error] [output:s3:s3.0] PutObject request failed
[2020/10/27 18:46:23] [error] [output:s3:s3.0] Could not send chunk with tag syslog.0

Thanks!

@lifttocode

  • gzip compression support (I know this is mentioned above but this is currently a blocker for migrating to using fluent-bit with S3 for me so I want to express my desire for this)

@PettitWesley Can we expect gzip compression support for the S3 output plugin to be added anytime soon? It's the only impediment to our team migrating to fluent-bit with S3.

@diranged

@PettitWesley Per our discussion on Slack: it's pretty important that the S3 plugin be able to set the ACL on the files it uploads to S3. Without that, you cannot do cross-account writing safely, even with the https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-object-ownership.html feature. At a minimum, the plugin should apply the canned "bucket-owner-full-control" ACL by default. Better would be for us to be able to configure the ACL applied to the files. I think this should be a pretty simple change overall.

PettitWesley self-assigned this Nov 12, 2020
@PettitWesley (Contributor, Author) commented Nov 12, 2020

General note: I cannot make any definite promises on the timeline, but we are watching this issue, and my team and I will be making our way through these requests over the next few weeks and months.

@zhonghui12 (Contributor)

@diranged Hi, I am from Wesley's team and am working on the issue you mentioned above, supporting ACLs in S3. Do you think canned ACLs are good enough (https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html)? That is, you would request the canned policy you need and we would apply it to your objects. Or are canned ACLs insufficient because you want to grant permissions to specific users or AWS accounts?
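
For reference, a sketch of what a canned-ACL setting could look like in the output configuration. The canned_acl option name is illustrative of the proposal being discussed here, and the bucket/region values are placeholders:

[OUTPUT]
    name            s3
    match           *
    bucket          cross-account-bucket
    region          us-east-1
    canned_acl      bucket-owner-full-control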

@hawkesn commented Jan 5, 2021

Hi @PettitWesley , thanks so much for your work on the s3 plugin. Just wondering if the compression is still being worked on? 😄

@PettitWesley (Contributor, Author)

@hawkesn it has been merged but not released, IIRC.

CC @zhonghui12

@PettitWesley (Contributor, Author)

Some discussion on s3_key_format in #2905

@fvasco commented Jan 18, 2021

Documentation: we could not find how to configure IAM; which permissions are needed?

Compression +1, it reduces overall costs.

@zhonghui12 (Contributor)

Hi @fvasco, s3:PutObject is the only permission we need; you can check here for more details.
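
For example, a minimal IAM policy granting that permission might look like the following sketch (my-bucket is a placeholder):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*"
        }
    ]
}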

@zhonghui12 (Contributor)

Gzip compression is available in S3, and you can set compression gzip to enable it in the configuration file.
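
As a minimal sketch (bucket and region are placeholders; note that, per the discussion further down, compression currently also requires use_put_object):

[OUTPUT]
    name            s3
    match           *
    bucket          my-bucket
    region          us-east-1
    use_put_object  true
    compression     gzip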

@ephemeralsnow

I'm hoping for json_date_key false to be implemented, as in out_stdout.
https://docs.fluentbit.io/manual/pipeline/outputs/standard-output#configuration-parameters

@fvasco commented Jan 20, 2021

Hi @zhonghui12,
thank you for your response.

IAM permissions aren't predictable; the S3 plugin page is missing this information.

I looked for the compression option in https://docs.fluentbit.io/manual/pipeline/outputs/s3, but it is missing.

Edit: compression works

@zhonghui12 (Contributor)

Hi @fvasco, thanks for the suggestions. We've submitted the PRs and they are ready to be merged. The documentation will be updated soon.

@PettitWesley (Contributor, Author)

@fvasco Use v1.6.10 or the code in the 1.6 branch for compression support.

@gregmankes

Hi @PettitWesley 👋 I wrote up this feature request

TL;DR: It would be great if we could configure the AWS S3 credentials via the output configuration 😄

@harrish81286

I have explained my issue in #2962

The last chunk of the log (which does not meet the minimum size for a multipart upload) should be included in the multipart upload before the log router exits. This would save us from having to stitch the logs together later to maintain chronological order. Here we don't expect the container to restart, hence the request.

@tchen commented Jan 26, 2021

I found that if compression: gzip is set, use_put_object: true is also needed. Hope this gets documented. Or point me to the documentation and I can make a PR for it.

I plan to query the logs via AWS Athena. One issue I ran into with compression is that Athena does not read the logs as gzip unless the extension is .gz, but the unique suffix added to the file name breaks that. Compression is pretty important because I routinely get a 10:1 improvement in storage, which also translates to savings when querying in Athena as well as in S3 storage costs.

@PettitWesley (Contributor, Author)

@tchen I believe we already put up a PR for those docs. CC @zhonghui12

gzip format unless the extension is .gz.

Interesting. We are planning on fixing this in the next few months. I think I made a comment earlier in this issue on the plan. You'll have the option of configuring where the randomness gets added to the file name, which will let you set any extension which is needed.
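
Once that lands, a sketch of what a .gz-friendly key could look like (this assumes the planned ability to place the random portion anywhere in the key; bucket and region are placeholders):

[OUTPUT]
    name            s3
    match           *
    bucket          my-bucket
    region          us-east-1
    use_put_object  true
    compression     gzip
    s3_key_format   /logs/$TAG/%Y/%m/%d/$UUID.gz

With the random characters placed before the .gz extension, Athena would recognize the objects as gzip.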

PettitWesley added the AWS (Issues with AWS plugins or experienced by users running on AWS) label Mar 12, 2021
@zhonghui12 (Contributor)

We will track the request from @bksteiny (#2700 (comment)) and brunosimsenhor (#3035) for AWS Object Lock here.

github-actions bot commented May 4, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot commented Jun 4, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@PettitWesley (Contributor, Author)

Those stale comments are annoying 😐

What are folks looking for us to improve in the S3 output?

@justchris1

I am not sure if I am just missing it, but I can't find anywhere to supply an access key/secret access key for the plugin to use. It seems the plugin handles the more sophisticated ARN -> STS -> temporary access key/secret access key lookups for machines running as AWS workloads, but offers no way to bypass that with a provided access key/secret access key.

Background: I have a developing use case where I would like to store logs from a device I control that is temporarily on someone else's network. My log server isn't accessible there, and I am prohibited at the network level from opening a VPN session or IPsec tunnel. However, I can reach AWS S3, so I was thinking of using Fluent Bit to upload the logs to S3 and then have my local log server pull them down (likely with Fluentd, since Fluent Bit doesn't support S3 input). I can't figure out how to tell Fluent Bit to use an access key/secret access key to upload the logs.

@elrob commented Jul 19, 2021

@PettitWesley
Thank you for the changes made by you and your team to support UUID within the key (not suffix) and gzip support.

I have another request: 🙏
Make the injected date key configurable.
Currently it is possible to configure the value format e.g. epoch with json_date_format.
I'd like to be able to configure the key e.g. to use timestamp as the key rather than date.
e.g. by using a parameter json_date_key or similar

@PettitWesley (Contributor, Author)

@elrob I believe that already exists; S3 supports json_date_key.
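
For illustration, a minimal sketch (bucket and region are placeholders; this assumes json_date_key takes the desired key name, with false disabling the injected date, mirroring the stdout plugin):

[OUTPUT]
    name             s3
    match            *
    bucket           my-bucket
    region           us-east-1
    json_date_key    timestamp
    json_date_format epoch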

@PettitWesley (Contributor, Author) commented Jul 19, 2021

@justchris1 Fluent Bit supports all standard AWS credential sources, including environment variables and a local credentials file via AWS_SHARED_CREDENTIALS_FILE.

https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/configuring-sdk.html
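
For example, static credentials can be supplied through the standard AWS environment variables (the values below are placeholders):

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
# or point Fluent Bit at a shared credentials file instead:
export AWS_SHARED_CREDENTIALS_FILE=/path/to/credentials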

@elrob commented Jul 20, 2021

Thanks @PettitWesley
Yes, it's already supported. I didn't realize because it's not included in the docs.
I've created a PR to update the docs:
fluent/fluent-bit-docs#574

@chitralverma commented Sep 15, 2021

@PettitWesley Here's a feature request for S3 output - Ability to configure the content formatting of uploaded S3 objects.

The uploaded objects are always newline-delimited JSON files; it would be great to allow a new key called record_format, which could be configured with values like CSV and TSV and would default to JSON.

I am not familiar with the internal workings of this plugin, but if this feature holds weight for the community, I can look into it and possibly raise a PR.
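
For illustration, a sketch of how the proposed option could look (record_format is the requested option, not an existing one; bucket and region are placeholders):

[OUTPUT]
    name            s3
    match           *
    bucket          my-bucket
    region          us-east-1
    record_format   csv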

github-actions bot commented Jan 4, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Jan 4, 2022
github-actions bot commented Jan 9, 2022

This issue was closed because it has been stalled for 5 days with no activity.

@PettitWesley (Contributor, Author)

We're not actively planning any more S3 enhancements right now, but we're keeping this open for new requests.

@pranavmarla commented Jan 24, 2022

@PettitWesley

Request:
Right now, I believe that a single S3 output is only capable of sending logs to 1 S3 bucket in 1 region, because both bucket and region are static values.
It would be great if the S3 plugin supported Fluent Bit's record accessor syntax, so that it could dynamically extract the bucket and region from certain log fields, thus enabling a single Fluent Bit S3 output to send different logs to different buckets.

i.e. To send different logs to different buckets, instead of:

[OUTPUT]
    name            s3
    match           <TAG1>
    bucket          <BUCKET1>
    region          <REGION1>

[OUTPUT]
    name            s3
    match           <TAG2>
    bucket          <BUCKET2>
    region          <REGION2>

[OUTPUT]
    name            s3
    match           <TAG3>
    bucket          <BUCKET3>
    region          <REGION3>

...

it would be great if I could just do:

[OUTPUT]
    name            s3
    match           *
    bucket          $log['bucket']
    region          $log['region']

This seems like a fairly obvious use case so if I'm mistaken and this is already possible, or if there is some technical reason why this will NEVER be possible, please feel free to let me know!

@PettitWesley (Contributor, Author)

@pranavmarla this would be possible to implement. I think we will prioritize record accessor support for the cloudwatch_logs plugin first, though.

@pranavmarla

Sure, thanks @PettitWesley

@raxidex commented May 27, 2022

Plus one here; I would like the bucket value to accept record accessor syntax, as @pranavmarla said above, or to accept tags.
