
Fluentbit S3 PlugIn created the Files with Strange Character Names #2905

Closed
VF-mbrauer opened this issue Jan 5, 2021 · 19 comments

Assignees: PettitWesley
Labels: AWS (Issues with AWS plugins or experienced by users running on AWS)

Comments

@VF-mbrauer

Bug Report

Describe the bug
When the logs have been delivered to S3 storage, the filenames look strange compared to what I remember from Fluentd, where the filenames had meaningful names.
As an example, a name looks like this: 00-objecthy79ndzk

Is there any chance to get this customized, or is this something that will be corrected at some point?

To Reproduce

  • Rubular link if applicable:
  • Example log message if applicable:
Name              | Type | Last modified                            | Size     | Storage class
00-objectHfqpBisk |  -   | December 16, 2020, 01:28:24 (UTC+01:00) | 218.0 KB | Standard
00-objecthy79ndzk |  -   | December 16, 2020, 01:28:24 (UTC+01:00) | 156.4 KB | Standard
00-objecthYVixAQc |  -   | December 16, 2020, 01:38:24 (UTC+01:00) | 209.9 KB | Standard
00-objectiexMUMuZ |  -   | December 16, 2020, 01:43:24 (UTC+01:00) | 156.4 KB | Standard
00-objectjpyqHunt |  -   | December 16, 2020, 01:43:24 (UTC+01:00) | 216.0 KB | Standard
00-objectK4GBr4gA |  -   | December 16, 2020, 01:58:24 (UTC+01:00) | 209.9 KB | Standard
00-objectKcXoISsk |  -   | December 16, 2020, 01:48:24 (UTC+01:00) | 156.4 KB | Standard
00-objectlHBPIPDw |  -   | December 16, 2020, 01:18:24 (UTC+01:00) | 217.6 KB | Standard
00-objectlnjHyipg |  -   | December 16, 2020, 01:08:24 (UTC+01:00) | 209.9 KB | Standard
00-objectOcejgB8C |  -   | December 16, 2020, 01:13:24 (UTC+01:00)


  • Steps to reproduce the problem:

Expected behavior
nginx_logs-2020121601-0.json
nginx_logs-2020121602-0.json
nginx_logs-2020121603-0.json
nginx_logs-2020121604-0.json

We run the LTS version of Fluent Bit.

@VF-mbrauer
Author

@PettitWesley: Can you please have a look?

@edsiper: FYI.

@PettitWesley
Contributor

Can you attach your Fluent Bit config?

The -objectHfqpBisk is expected behavior. When Fluent Bit uses the PutObject API, it automatically appends a random string to the object name to ensure that the object name is unique; if the name were not unique, it would overwrite the old object with the same name.

In your case, it looks like somehow your S3 key is set so that the name is "00" (and then the other bit is appended for randomness).

In the future, we are planning to make the randomness customizable, but I think it is needed in the default experience to ensure users don't accidentally overwrite existing files in their bucket.

@PettitWesley PettitWesley self-assigned this Jan 5, 2021
@PettitWesley PettitWesley added the AWS Issues with AWS plugins or experienced by users running on AWS label Jan 5, 2021
@VF-mbrauer
Author

Hi @PettitWesley, please find attached an extract of the fluentbit config: fluentbit.txt

@VF-mbrauer
Author

Hi @PettitWesley,

I also tried to make the name more unique by adding minutes to the config. But still no change, even though the names are different:

2021-01-06 14:51:10   28623441 apiserver_logs/2021/01/06/1351.json
2021-01-06 15:01:17   15048952 apiserver_logs/2021/01/06/1356.json-objectpAHsm4Bj
2021-01-06 15:06:20   14100871 apiserver_logs/2021/01/06/1401.json-object7Vrz6rMv
2021-01-06 15:11:24   14434855 apiserver_logs/2021/01/06/1406.json-objectSLWK3DdP
2021-01-06 15:16:28   14210086 apiserver_logs/2021/01/06/1411.json-objectBlqLYtOx
2021-01-06 15:21:32   14122207 apiserver_logs/2021/01/06/1416.json-objectm2CyEAwh
2021-01-06 15:26:36   14450949 apiserver_logs/2021/01/06/1421.json-objectKuZnQwZ7
2021-01-06 15:31:40   14000509 apiserver_logs/2021/01/06/1426.json-objecta4SNOeaF

Now I would expect that the random suffix is not added, since the names already differ by default.

@PettitWesley
Contributor

Hey @VF-mbrauer I played with your config... I didn't get any names that looked like what you showed in your original comment "00-objectHfqpBisk". I am not sure how that happened.

Now I would expect that the random suffix is not added, since the names already differ by default.

What you show in that latest comment is expected behavior. It will always add the randomness and there is no way to turn it off.

As I mentioned, we are planning to change how this works in the future; S3 enhancements are tracked here: #2700

@PettitWesley
Contributor

I should note: the plugin only adds this randomness when it uses the PutObject API.

Looking at your config, you have it set to use multipart uploads when it can (that's the default). However, it can only use the multipart API when there is enough log data, because each part in a multipart upload must be at least 5 MiB. I see you have an upload timeout of 5 minutes; if you do not accumulate at least 5 MiB of logs in that time, it will use the PutObject API and will create files with randomness appended.

The logic behind this feature was that the PutObject API tends to be used when new files are being created more quickly, whereas the multipart API tends to create larger files more slowly. Since files are being created quickly in that case, we need to make sure their names are random.

I don't regret building it that way and I do not plan to remove this feature, but in the future we will make it more configurable.
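
To make the interplay concrete, here is a minimal sketch of an [OUTPUT] stanza (region, bucket, and the exact values are illustrative placeholders, not taken from the attached config). If fewer than roughly 5 MiB of logs accumulate before upload_timeout fires, that flush goes out via the PutObject API and gets the random suffix:

[OUTPUT]
    Name              s3
    Match             *
    # region and bucket are placeholders
    region            us-east-1
    bucket            example-bucket
    # multipart parts must be at least ~5 MiB, so this is the effective
    # threshold for starting a multipart upload
    upload_chunk_size 5M
    # if less than ~5 MiB of logs arrive within this window, the upload
    # falls back to PutObject and a random suffix is appended
    upload_timeout    5m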

@VF-mbrauer
Author

Hi @PettitWesley,

Hey @VF-mbrauer I played with your config... I didn't get any names that looked like what you showed in your original comment "00-objectHfqpBisk". I am not sure how that happened.

I think I know why we get this name. It is because of the hour (%H):
s3_key_format /k8s_logs/%Y/%m/%d/%H

So the 00 is the hour; I also have 01 and 02, up to 23, so basically the whole day. I just did not want to paste the logs from the complete day here. Maybe my fault; otherwise you would probably have seen that this is a result of the hour format %H.
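
For context, here is the relevant key line again as a minimal sketch; only the key format quoted above is used, nothing else is assumed:

# With the key ending in %H, every object uploaded during the same hour
# targets the same key, e.g. /k8s_logs/2021/01/06/00, so the random
# PutObject suffix is the only thing telling those objects apart.
s3_key_format /k8s_logs/%Y/%m/%d/%H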

But one last thing is still strange: the first entry of an hour is always fine.
Here is an example:

2021-01-06 11:16:40   28553728 apiserver_logs/2021/01/06/10
2021-01-06 11:06:35   14317385 apiserver_logs/2021/01/06/10-object0zFYOqVD
...
2021-01-06 12:32:39   14007158 apiserver_logs/2021/01/06/11
2021-01-06 12:37:43   14341676 apiserver_logs/2021/01/06/11-object5pEt65Bl
...
2021-01-06 13:28:20   28408077 apiserver_logs/2021/01/06/12
2021-01-06 13:58:42   14468679 apiserver_logs/2021/01/06/12-object0PZbe70u

OR

2021-01-06 19:09:30   28191815 apiserver_logs/2021/01/06/1809.json
2021-01-06 19:19:37   14377651 apiserver_logs/2021/01/06/1814.json-objectW59DtvnL
2021-01-06 19:24:41   13923075 apiserver_logs/2021/01/06/1819.json-object2rCGpXuq
...
2021-01-06 14:51:10   28623441 apiserver_logs/2021/01/06/1351.json
2021-01-06 15:01:17   15048952 apiserver_logs/2021/01/06/1356.json-objectpAHsm4Bj
2021-01-06 15:06:20   14100871 apiserver_logs/2021/01/06/1401.json-object7Vrz6rMv

If I understand you correctly, it should always append the randomized suffix to the object keys?

I don't regret building it that way and I do not plan to remove this feature, but in the future we will make it more configurable.

I can understand that this is part of the design and ensures that overwriting will not happen, but it looks odd for the customer to have those kinds of files instead of cleanly and properly named ones. So I would appreciate it if customers could adapt this more to their needs and get a cleaner view of the files, without cryptic characters in the names.
So if you plan something like this, as also mentioned in #2700, that would be great.

@PettitWesley
Contributor

I agree. So what's happening in your case is probably:

  • Multipart Upload (because enough data was collected before timeout): 28191815 apiserver_logs/2021/01/06/1809.json
  • PutObject Upload (because too little data was collected in time): 14377651 apiserver_logs/2021/01/06/1814.json-objectW59DtvnL
  • PutObject Upload (because too little data was collected in time): 13923075 apiserver_logs/2021/01/06/1819.json-object2rCGpXuq

PutObject uploads add the randomness.

I think the main reason your uploads are primarily being sent as PutObject uploads is that you have a very large upload_chunk_size of 50M. Normally folks use the default, which is close to 5M.

This means that Fluent Bit will not try to do a multipart upload until it has 50M of data. If it has less than that when the timeout passes, you will end up with PutObject uploads. It does this because a multipart upload requires at least 3 API requests; it's more expensive in terms of the work that needs to be done.

I would not necessarily recommend changing your config though; I am guessing you set a large chunk size to save S3 upload costs.

The solution in the future will be what is mentioned in that issue: we will find some way to make the filenames more configurable. We can also make sure that the file names are the same regardless of which API is used.

I think we will always require some randomness in the file name just for safety. However, we may allow you to configure where the randomness is placed in the file name, and how much randomness.

So imagine in the future all your names might look like: apiserver_logs/2021/01/06/1351-pAHsm4Bj.json

And you might set your S3 key as: /apiserver_logs/%Y/%m/%d/%H-${uuid}.json. That is the proposal in the issue: we require randomness for all files but let you specify where the randomness is placed using a special character sequence ${uuid}.
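
A sketch of what such a future configuration might look like, using the proposed ${uuid} placeholder (not released at this point; region and bucket are placeholders):

[OUTPUT]
    Name          s3
    Match         *
    # region and bucket are placeholders
    region        us-east-1
    bucket        example-bucket
    # proposed syntax: the randomness is placed explicitly inside the key
    s3_key_format /apiserver_logs/%Y/%m/%d/%H-${uuid}.json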

@github-actions
Contributor

github-actions bot commented Mar 6, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@PettitWesley
Contributor

@zhonghui12

@zhonghui12
Contributor

Hi @VF-mbrauer, we recently merged a PR for S3 key format enhancements. With this feature, a UUID and key extension can be specified, so you can customize the S3 key name further. More explanation can be found in this PR, and we will merge this one soon.

Thanks

@atitan

atitan commented Mar 23, 2021

Hi @zhonghui12, we updated our aws-for-fluent-bit to 2.12.0, which includes this extension enhancement.

And with a config that looks like this:

[OUTPUT]
    Name                          s3
    Match                         host.*
    region                        ${AWS_REGION}
    bucket                        ${S3_BUCKET}
    total_file_size               1M
    upload_timeout                10s
    use_put_object                On
    s3_key_format                 /${CLUSTER_NAME}_host/%Y/%m/%d/%H/$TAG-$UUID.gz
    s3_key_format_tag_delimiters  .-_
    compression                   gzip

But we are still getting random extension names:
[screenshot: S3 object listing showing keys with random suffixes]

Any ideas?

@zhonghui12
Contributor

Hi @atitan, the extension enhancement code hasn't been released by Fluent Bit yet: https://fluentbit.io/announcements/v1.7.2/. It should be included in the next release, so this feature is not supported for now.

@macropin

macropin commented Mar 24, 2021

@zhonghui12 are you sure? The documentation https://docs.fluentbit.io/manual/pipeline/outputs/s3 says that $UUID is supported.

Add $UUID in the format string to insert a random string.

@zhonghui12
Contributor

Hi @atitan @macropin, I am sorry for the mistake. The doc was updated before the feature was released. I have submitted a PR to revert the docs update: fluent/fluent-bit-docs#497. We will update the docs as soon as $UUID is available, and I will let you know.

Thanks again for your understanding.

@bgweber

bgweber commented Apr 2, 2021

Hi @zhonghui12, is there an ETA for the release? We are using Fluent Bit to write gzip-compressed files to S3 and plan on processing the files using PySpark (Databricks). Spark currently requires .gz to be specified as the file extension: https://issues.apache.org/jira/browse/SPARK-29280

Looking forward to this feature! In the meantime, we're able to use uncompressed files for our workflow.
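
Once the $UUID token ships, a sketch along the lines of the config @atitan posted above should keep .gz as the final extension that Spark sees (bucket, region, and cluster name are placeholders supplied via environment variables):

[OUTPUT]
    Name                          s3
    Match                         host.*
    region                        ${AWS_REGION}
    bucket                        ${S3_BUCKET}
    use_put_object                On
    compression                   gzip
    s3_key_format_tag_delimiters  .-_
    # $UUID before the extension keeps ".gz" as the final suffix
    s3_key_format                 /${CLUSTER_NAME}_host/%Y/%m/%d/%H/$TAG-$UUID.gz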

@zhonghui12
Contributor

Hello @bgweber, I have emailed @edsiper and I think the release should come by the end of this week.

@zhonghui12
Contributor

zhonghui12 commented Apr 6, 2021

Hello @bgweber @atitan @VF-mbrauer @macropin, the feature is released in 1.7.3: https://fluentbit.io/announcements/v1.7.3/. We have also updated the S3 docs page: https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/s3.md. It may take some time for it to show up in the rendered documentation, but the feature is already available.

Thanks.

@agup006
Member

agup006 commented Apr 28, 2021

Closing as this is part of the 1.7.3 release

@agup006 agup006 closed this as completed Apr 28, 2021