avoid using background goroutine for caching http ranges #10724
Conversation
Two different readers are being used here - one for the cache, and a different one returned to the client, so an early finish of the s3 client should not impact the cache fill operation. Could you give more specifics on the cache settings in use here and the minio release you are on? Tested with MINIO_CACHE_RANGE=on and could not repro an issue.
@poornas I added my cache settings to the PR description in the section for testing. You are right, I overlooked that there are indeed two different Readers created. Looking deeper, I found that they share a common resource. I was able to prove this hypothesis by modifying the background goroutine. However, this would not be a perfect fix, because the object is still downloaded twice from the backend - once to serve it to the S3 clients, and a second time to fill the cache. This hurts not only performance but also causes unwanted egress consumption from the backend.
@kaloyan-raev, the context being used in the background is the global context and not the request context - this was fixed quite a while ago, please upgrade your minio version.
LGTM
Mint Automation
10724-9616297/mint-gateway-azure.sh.log:
Description
For some reason, caching HTTP ranges was implemented to download the data range in parallel in two separate goroutines:
In contrast, downloading a complete file uses a TeeReader in a single goroutine. It downloads the data once from the backend and writes it to the cache mount while serving it to the S3 client.
This change modifies the cache code to use the TeeReader in a single goroutine for downloading HTTP ranges.
The additional background goroutine remains for the case when MINIO_CACHE_RANGE=off. I suspect this case was the main motivation for the additional goroutine, as the complete file should be downloaded to the cache while only a range of it is served to the S3 client.
Motivation and Context
We have trouble with caching a larger file with our MinIO gateway for Tardigrade. The two goroutines are reading from the same Reader in parallel. The one serving the data range to the S3 client finishes first and closes the Reader. This causes the goroutine that is downloading the data range to the cache to fail.
This failure is caught at minio/cmd/disk-cache-backend.go, line 715 in 5cc23ae, which forces the removal of all data for this file from the cache.
As a result, our gateway:
How to test this PR?
Use AWS CLI to download a larger file (tens or hundreds of MB) several times. Watch if the cache works as expected.
Cache settings:
Types of changes
Checklist: