
S3 Multi part upload is not uploading the final chunk correctly #2962

Closed
harrish81286 opened this issue Jan 21, 2021 · 8 comments
Labels
AWS Issues with AWS plugins or experienced by users running on AWS enhancement Stale

Comments

@harrish81286

Bug Report

Describe the bug
When the S3 plugin is set up to use multipart upload to S3 with a chunk size, the final chunk is not uploaded correctly: it is written as a separate file. In the example below, the chunk size was set to 5M and the overall log size is about 17M, but I see 2 files with 11.7

To Reproduce
```
[SERVICE]
    Flush 1
    Grace 120

[OUTPUT]
    Name file
    Match *
    Path /tmp
    File ${TaskId}.log
    Format plain

#[OUTPUT]
#    Name stdout
#    Match *

[OUTPUT]
    Name s3
    Match *
    region us-west-2
    bucket raviteja-test
    s3_key_format /fluent-bit-logs/${TaskId}.log
```

Expected behavior
Expected a single file 12.log

Screenshots

See the final chunk not being part of the multipart upload:
[image]

The S3 bucket looks like this:
[image]

@harrish81286
Author

harrish81286 commented Jan 22, 2021

I believe this is explained better in #2905 (comment)

But I think this behavior needs to be enhanced, because we lose the continuity of the log and clients are expected to stitch the files together to recreate the chronological order of the logs.
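To illustrate the stitching burden this puts on clients, here is a minimal Python sketch, assuming the two objects a task leaves behind have already been downloaded. The key names and the `stitch` helper are hypothetical illustrations, not part of Fluent Bit:

```python
# Sketch: reassemble one task's log from the two objects the plugin
# currently leaves behind (the multipart object plus the separately
# uploaded final chunk). Key names below are hypothetical examples
# in the style of s3_key_format.

def stitch(objects):
    """objects: dict mapping S3 key -> bytes.

    Concatenate the objects in lexicographic key order, which here puts
    the main multipart object ("12.log") before the extra final-chunk
    object ("12.log" plus a suffix)."""
    ordered = sorted(objects)
    return b"".join(objects[k] for k in ordered)

parts = {
    "fluent-bit-logs/12.log": b"first ~15M of log...\n",
    "fluent-bit-logs/12.logSUFFIX": b"final ~2M chunk...\n",
}
combined = stitch(parts)
```

Ordering by key only works if the final-chunk object sorts after the main object; in general a client would also have to check timestamps, which is exactly the fragility being complained about.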

@PettitWesley
Contributor

> are somehow expected to stitch the logs together to recreate the chronological order of the logs.

I don't understand this... if you are uploading logs over time you will always end up with multiple log files.

Are you collecting a fixed amount of logs and desire them to all be in the same file in S3?

Also, the logs shouldn't be out of order in this case (though the plugin also cannot guarantee that logs will necessarily be in order across files).

The reason I chose to implement it this way is because I felt it was safer. Basically, on shutdown the plugin wants to:

  1. Complete any in-progress multipart uploads ASAP (this is done through a call to CompleteMultipartUpload). This ensures that the data already sent to S3 is available; if the CompleteMultipartUpload call is not made, the data will not appear in the bucket and is "lost" (if you don't know to look for it).
  2. Send any locally buffered chunks ASAP. The fastest way to send is to make a PutObject call, since that upload requires only 1 call. Uploading new data with multipart requires 2 calls for a complete upload: UploadPart and CompleteMultipartUpload.
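The two shutdown steps described above can be sketched as follows. This is a simplified Python model with a fake client that only records calls; the real plugin is written in C, and the helper names and the `-final` key suffix here are hypothetical:

```python
class FakeS3Client:
    """Records the S3 API calls that would be made on shutdown."""
    def __init__(self):
        self.calls = []

    def complete_multipart_upload(self, key, upload_id, parts):
        self.calls.append(("CompleteMultipartUpload", key))

    def put_object(self, key, body):
        self.calls.append(("PutObject", key))

def shutdown_flush(client, in_progress_upload, buffered_chunk):
    # Step 1: finish the in-progress multipart upload first, so the parts
    # already sent to S3 become visible in the bucket and are not "lost".
    if in_progress_upload:
        key, upload_id, parts = in_progress_upload
        client.complete_multipart_upload(key, upload_id, parts)
    # Step 2: send locally buffered data with a single PutObject call --
    # one call, instead of the two (UploadPart + CompleteMultipartUpload)
    # that extending the multipart upload would require. This is why the
    # final chunk lands as a separate object.
    if buffered_chunk:
        key, body = buffered_chunk
        client.put_object(key + "-final", body)  # separate object; name is illustrative

client = FakeS3Client()
shutdown_flush(client,
               ("fluent-bit-logs/12.log", "upload-id",
                [{"PartNumber": 1, "ETag": "etag-1"}]),
               ("fluent-bit-logs/12.log", b"final ~2M of buffered data"))
```

Each step is independent: if the PutObject fails, the already-completed multipart data is still safe in the bucket.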

Essentially, you're requesting that the plugin chain together two calls to upload the remaining data, which I think is slightly riskier than having the calls be independent.

I could change the code to do what you are asking for, though. But the question for me still is: why? I understand it's a bit weird to have two files, but it's not clear to me why that is a real problem.
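For comparison, the behavior being requested would chain the final chunk into the existing multipart upload before completing it, roughly as below (again a simplified, hypothetical Python model, not the plugin's actual code). S3 itself permits this: only the non-final parts of a multipart upload must be at least 5 MiB, so the last buffered chunk can be appended as a final, smaller part:

```python
class FakeS3Client:
    """Records the S3 API calls that would be made."""
    def __init__(self):
        self.calls = []

    def upload_part(self, key, upload_id, part_number, body):
        self.calls.append(("UploadPart", part_number))
        return {"PartNumber": part_number, "ETag": "etag-%d" % part_number}

    def complete_multipart_upload(self, key, upload_id, parts):
        self.calls.append(("CompleteMultipartUpload", key))

def shutdown_flush_single_file(client, key, upload_id, parts, buffered_chunk):
    # Append the remaining buffered data as one more part of the SAME
    # multipart upload, then complete it: two chained calls, but the
    # result is a single object in the bucket instead of two.
    if buffered_chunk:
        part = client.upload_part(key, upload_id, len(parts) + 1, buffered_chunk)
        parts.append(part)
    client.complete_multipart_upload(key, upload_id, parts)

client = FakeS3Client()
shutdown_flush_single_file(client, "fluent-bit-logs/12.log", "upload-id",
                           [{"PartNumber": 1, "ETag": "etag-1"}],
                           b"final ~2M of buffered data")
```

The trade-off described in the comment above is visible here: the CompleteMultipartUpload now depends on the UploadPart succeeding first, so a failure between the two calls leaves the whole upload incomplete.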

@harrish81286
Author

Hi,

Thanks for the reply.

The use case here is that we have a long-running job (for example, a build command that runs for 45 minutes to 2 hours) via an ECS task. Once the job is complete, the same task will not run again, and the EC2 machines will be used for some other build at a later point in time. So we would like to capture all the logs belonging to that particular task, which we can then use for debugging purposes.

Perhaps you could make it configurable, so that the current functionality is retained for other use cases.

@harrish81286
Author

> Are you collecting a fixed amount of logs and desire them to all be in the same file in S3?

Yes, this is correct. Since we know a task runs for a given amount of time, we would like to collect all of its logs into a single file.

Thanks

@github-actions
Contributor

github-actions bot commented Mar 5, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Mar 5, 2021
@PettitWesley PettitWesley added AWS Issues with AWS plugins or experienced by users running on AWS enhancement help wanted and removed Stale help wanted labels Mar 5, 2021
@harrish81286
Author

Hello @PettitWesley ,

Any update on this request? By the way, we also spoke to our AWS TAM, who said they will contact you to get priority for this particular use case.

Thanks

@github-actions
Contributor

github-actions bot commented Apr 9, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 9, 2021
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.
