
S3 Output Compression not working #3676

Closed
justchris1 opened this issue Jun 23, 2021 · 31 comments

@justchris1

Bug Report

Describe the bug
Using td-agent-bit version 1.7.8 with the S3 output, the compression setting seems to be ignored, even with use_put_object true.

To Reproduce
Here is my configuration of the output s3 block.

[OUTPUT]
    name s3
    match *
    region us-east-2
    bucket my-bucket-name
    s3_key_format /fluent-bit-logs/$TAG/%Y/%m/%d/%H/%M/%S/$UUID.gz
    use_put_object On
    total_file_size 40M
    upload_timeout 1m
    compression gzip

Regardless of whether the compression setting is missing (implying none) or set to gzip, the uploaded files are always cleartext / uncompressed.

Expected behavior
Logs uploaded would be compressed with gzip before upload.

Your Environment

  • Version used: 1.7.8
  • Configuration: (See above)
  • Environment name and version (e.g. Kubernetes? What version?): RPM install
  • Server type and version: AWS t3a instance
  • Operating System and version: Centos 8, fully patched as of 2021-06-23
  • Filters and plugins: none

I can find nothing in the error logs about a failed compression. On every upload I get a 'happy' message: Successfully uploaded object. However, the file is still cleartext. I saw references in @PettitWesley's thread in #2700 that this was working, so I am unsure whether this is a regression or something else.
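
A quick way to check what is actually stored in S3, independent of any browser-side decoding (a sketch; the bucket and key below are placeholders):

$ aws s3 cp s3://my-bucket-name/fluent-bit-logs/example/ABCDEFGH.gz ./check.gz
$ file check.gz     # a gzipped upload should report "gzip compressed data"
$ gzip -t check.gz  # exits 0 only if the file is valid gzip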

@mtparet

mtparet commented Jun 24, 2021

I have the same issue. I am wondering whether setting the content encoding to gzip is the problem. Does S3 automatically decompress the file on its side?

@justchris1
Author

> I have the same issue. I am wondering whether setting the content encoding to gzip is the problem. Does S3 automatically decompress the file on its side?

I know of no AWS S3 function that would be capable of doing that. S3 is just an object store. I verified this issue by downloading the S3 object directly after it was uploaded to eliminate the fluentd input that was pulling it down as the source of the problem.

@justchris1
Author

justchris1 commented Jun 24, 2021 via email

@canidam

canidam commented Jul 6, 2021

I have the same issue using fluent/fluent-bit:1.7.9. Any idea if the configuration is wrong, or this is an actual bug?

Is it possible there's a threshold for compression? For example, if the file is less than 1 KB, does it skip the compression step?

@justchris1
Author

> I have the same issue using fluent/fluent-bit:1.7.9. Any idea if the configuration is wrong, or this is an actual bug?

Nope. I haven't seen anyone from the project even acknowledge the issue.

@PettitWesley
Contributor

@DrewZhang13 @zhonghui12

@DrewZhang13
Contributor

ACK, the issue is reproduced with the same config and Fluent Bit version.
The uploaded file is always cleartext / uncompressed.

@DrewZhang13
Contributor

DrewZhang13 commented Jul 9, 2021

@justchris1 @mtparet After more testing with the same config provided in this issue, the file is automatically decompressed when downloaded on a MacBook, but the file uploaded to S3 is compressed, based on a size comparison.
So I did not see an actual uncompressed-file issue on my testing machine.

Could you compare the size of the file in S3 with the size of the local file before upload, to confirm whether it is really uncompressed?
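
For a size comparison that is independent of any client-side decoding, aws s3api head-object reports the object exactly as stored (a sketch; bucket and key are placeholders):

$ aws s3api head-object --bucket my-bucket-name --key fluent-bit-logs/example/ABCDEFGH.gz
# "ContentLength" is the stored size; for a gzipped upload it should be much
# smaller than the decompressed size of the same data on the sending host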

@justchris1
Author

> @justchris1 @mtparet After more testing with the same config provided in this issue, the file is automatically decompressed when downloaded on a MacBook, but the file uploaded to S3 is compressed, based on a size comparison.
> So I did not see an actual uncompressed-file issue on my testing machine.
>
> Could you compare the size of the file in S3 with the size of the local file before upload, to confirm whether it is really uncompressed?

@DrewZhang13 - When I was debugging this, I eliminated the automated ingestion of the file into fluentd on the other side. To confirm it was uncompressed, I downloaded the file directly from S3 after it was uploaded by fluent-bit. When I inspect the file stored in S3, it is uncompressed. S3 has no 'auto-compress' or 'uncompress' functions, so downloading it represents what was stored in S3. The content is plaintext & readable.

@DrewZhang13
Contributor

@justchris1 I have verified from both a MacBook and Linux, and I don't see a similar uncompressed situation on my side.
I used the same configuration you provided and compared the file I downloaded from S3.
The downloaded file is 679 B before decompression and 31 KB after decompression.
I can use vi to see cleartext even when the file is compressed. I wonder if that is what you mean by cleartext?

@justchris1
Author

> I can use vi to see cleartext even when the file is compressed. I wonder if that is what you mean by cleartext?

No, I meant a 'dumb' text editor like Windows Notepad. When I download, from the AWS console, the file that fluent-bit uploaded with the configuration shown in this issue, I can open it in Notepad immediately and see clear text.

@canidam

canidam commented Jul 18, 2021

@DrewZhang13
There's some weird behavior here. I have a file on S3 with a size of 2.3 KB.
When I download it on my Mac, the size grows to 17 KB and the file type is JSON data:

➜ file 0N285h0f.gz
0N285h0f.gz: JSON data

I did another test. I use fluentd to consume these log files: when I use the text type it prints binary data, and when I use the gzip type it works. So I guess compression works, but something weird is going on when downloading the objects from S3 on a Mac.

@DrewZhang13
Contributor

@canidam Yeah, the Mac will automatically decompress the file when you download it from S3. I think this is why you are seeing the weird behavior.

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 19, 2021
@justchris1
Author

I still see this behavior. Please do not close.

@github-actions github-actions bot removed the Stale label Aug 21, 2021
@github-actions
Contributor

github-actions bot commented Oct 1, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 1, 2021
@ssc-ksaitou

ssc-ksaitou commented Oct 1, 2021

This is caused by the Content-Encoding: gzip attribute that fluent-bit sets on the uploaded log .gz file.

[screenshot: the S3 object metadata showing Content-Encoding: gzip]

When you download a file tagged with Content-Encoding: gzip, the user agent (e.g. Chrome, or curl with --compressed) will automatically decode the content, just as it does for any gzipped HTTP stream, because the Content-Encoding: gzip header is returned in the response.
Yes, the file has actually been compressed on S3.

An easy workaround is to simply remove the .gz extension from s3_key_format.

There seems to be no way to turn off Content-Encoding: gzip.
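
One way to see this metadata without a browser (a sketch; bucket and key are placeholders):

$ aws s3api head-object --bucket my-bucket --key fluent-bit-logs/example/ABCDEFGH.gz
# the output should include "ContentEncoding": "gzip", which is what triggers
# the automatic decoding in browsers; the stored bytes themselves remain gzipped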

@github-actions github-actions bot removed the Stale label Oct 5, 2021
@github-actions
Contributor

github-actions bot commented Nov 5, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 5, 2021
@justchris1
Author

This would not explain why I get parsing errors in fluentd when compression is turned on (with the fluentd side configured to expect compressed input), yet everything works immediately after changing only the fluentd side to expect uncompressed input. I still see this behavior. Please do not close.

@github-actions github-actions bot removed the Stale label Nov 9, 2021
@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 10, 2021
@justchris1
Author

I still see this behavior. Issue is not resolved.

@gjirm

gjirm commented Dec 13, 2021

I see this behavior on multiple systems too with Fluent Bit 1.8.10 (standalone and in a Docker container). I also experienced it on previous 1.8.x versions.

This is my config:

[OUTPUT]
    Name                            s3
    Match                           auth
    bucket                          server-logs
    region                          eu-west-1
    tls                             On
    s3_key_format                   /auth-logs/$TAG/%Y/%m/%d/h%H/%M-$UUID.gz
    s3_key_format_tag_delimiters    .-_
    compression                     gzip
    use_put_object                  On
    total_file_size                 50M
    upload_timeout                  10m

@github-actions github-actions bot removed the Stale label Dec 15, 2021
@Spritekin

Spritekin commented Feb 23, 2022

I don't think this is working.
I have a similar configuration to the ones reported before:

        [OUTPUT]
            Name s3
            Match *
            bucket mybucket
            region ap-southeast-2
            store_dir /home/ec2-user/buffer
            s3_key_format /fluentbit/$TAG[2]/$TAG[0]/%Y/%m/%d/%H/%M/%S/$UUID.gz
            s3_key_format_tag_delimiters .-
            compression gzip
            use_put_object On
            total_file_size 50M

And I get my files in S3, e.g.:
s3://mybucket/fluentbit/log/kube/2022/02/23/01/35/48/15BRQR03.gz

Then I select the file in S3 and in the object actions I select "Query with S3 Select"
[screenshot: the "Query with S3 Select" option in the S3 object actions menu]

In the S3 Select dialog I configure it like this:

[screenshots: S3 Select input settings with JSON (one record per line) and GZIP compression selected]

Notice that I select one JSON record per line and GZIP compression, since that is the expected format; however, it returns an error saying GZIP is not applicable.

However, if I change the compression to None, I get a proper response on the same query:
[screenshot: the same S3 Select query succeeding with compression set to None]

Although I am on a Mac, these queries run inside AWS and the files never touch my laptop, so I can say with some certainty that the files are not being gzipped.
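
A rough CLI equivalent of that S3 Select test (a sketch; bucket and key are the example values above), which also avoids the console and any local machine:

$ aws s3api select-object-content \
    --bucket mybucket \
    --key fluentbit/log/kube/2022/02/23/01/35/48/15BRQR03.gz \
    --expression "SELECT * FROM S3Object s LIMIT 5" \
    --expression-type SQL \
    --input-serialization '{"JSON": {"Type": "LINES"}, "CompressionType": "GZIP"}' \
    --output-serialization '{"JSON": {"RecordDelimiter": "\n"}}' \
    records.json
# if the object really is gzipped JSON lines, records.json gets data back;
# if it is plaintext, the call fails just like the console query did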

@marcosdiez
Contributor

marcosdiez commented Feb 28, 2022

It works for me (i.e. I checked on S3 and the results are gzipped; Athena can read them because the files end with .gz).
I am using Ubuntu 20.04 and I got Fluent Bit v1.8.12 from the official deb package (https://packages.fluentbit.io/ubuntu/focal).

Here are my settings:

[OUTPUT]
    name s3
    match *
    bucket XXXXXXXXXXXXX
    region us-east-1
    s3_key_format /prod-sslv-nginx/$TAG/%Y/%m/%d/%H/%M/%S-$UUID.gz
    total_file_size 1M
    upload_timeout 1m
    compression gzip

@Spritekin

Spritekin commented Feb 28, 2022

@marcosdiez

Sure, I tried that; please see my test configuration above. Maybe it has been fixed, but I was using a recent Helm installation.

  repository = "https://fluent.github.io/helm-charts"
  chart      = "fluent-bit"
  version    = "0.19.19"

One thing: I'm not sure the configuration you use works, as I'm quite sure you need to set the "use_put_object On" option (when I omitted it, I got an error saying I had to turn it on and the container wouldn't start). If you don't get that error, then it's another sign the version you are testing might have been updated.

@PettitWesley
Contributor

@Spritekin Compression has only ever worked with use_put_object On.
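
For reference, the minimal pairing looks like this (a sketch; region and bucket are placeholders, the other values mirror the configs earlier in this thread):

[OUTPUT]
    name            s3
    match           *
    region          us-east-1
    bucket          my-bucket
    use_put_object  On
    compression     gzip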

@Spritekin

@PettitWesley
I'm not claiming otherwise; as you can see in my analysis above, the flag is configured. My comment was because Marcos submitted a configuration with gzip compression enabled but no "use_put_object On" option and said it worked fine. I was just pointing out that his config would be wrong because the use_put_object flag was not set.

@logston

logston commented Mar 16, 2022

$ aws --profile 1234567890 s3 cp s3://mybucket/path/to/file/ItWLhdDe.log.gz ~/Downloads/ItWLhdDe.log.gz
$ ls -la ~/Downloads/ItWLhdDe.*
-rw-r--r--  1 paul  staff  2554 Mar 15 17:00 /Users/paul/Downloads/ItWLhdDe.log.gz
$ gunzip ~/Downloads/ItWLhdDe.log.gz
$ ls -la ~/Downloads/ItWLhdDe.*     
-rw-r--r--  1 paul  staff  22264 Mar 15 17:00 /Users/paul/Downloads/ItWLhdDe.log

WHY CHROME, WHY!?

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Jun 14, 2022
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@xposionn

For anyone still facing this issue:
After a deep dive, it seems that compression does work, but looking at the response headers, my .log file (which was compressed into .gz) has a content type of application/octet-stream.
After adding content_type text/plain to the s3 output plugin config, the downloaded file ends in .txt instead of .gz (it was compressed on S3 but decompressed after downloading, with the wrong file extension).
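
For reference, that change applied to the earlier configs looks like this (a sketch; region and bucket are placeholders, and content_type is the option this comment refers to):

[OUTPUT]
    name            s3
    match           *
    region          us-east-1
    bucket          my-bucket
    use_put_object  On
    compression     gzip
    content_type    text/plain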
