
avoid using background goroutine for caching http ranges #10724

Merged · 3 commits · Oct 27, 2020

Conversation

@kaloyan-raev (Contributor) commented Oct 21, 2020

Description

For some reason, caching HTTP ranges was implemented to download the data range in parallel in two separate goroutines:

  • One that downloads the data range from the backend and serves it to the S3 client, and
  • A second one that downloads the data from the backend and writes it to the cache mount.

In contrast, downloading a complete file uses a TeeReader in a single goroutine. It downloads the data once from the backend and writes it to the cache mount while serving it to the S3 client.

This change modifies the cache code to use the same TeeReader approach in a single goroutine for downloading HTTP ranges.

The additional background goroutine remains for the case when MINIO_CACHE_RANGE=off. I suspect this case was the main motivation for the additional goroutine, as the complete file must be downloaded to the cache while only a range of it is served to the S3 client.

Motivation and Context

We have trouble caching larger files with our MinIO gateway for Tardigrade. The two goroutines read from the same Reader in parallel. The one serving the data range to the S3 client finishes first and closes the Reader, which causes the goroutine that is downloading the data range to the cache to fail.

This failure is caught at:

removeAll(cachePath)

This forces the removal of all data for this file from the cache.

As a result, our gateway:

  • Doesn't have a working cache for larger files.
  • Wastes more egress traffic from the backend than required.

How to test this PR?

Use the AWS CLI to download a larger file (tens or hundreds of MB) several times and verify that the cache works as expected.

Cache settings:

MINIO_CACHE=on
MINIO_CACHE_DRIVES=/var/cache/storj
MINIO_CACHE_EXPIRY=180
MINIO_CACHE_QUOTA=80
MINIO_CACHE_WATERMARK_LOW=80
MINIO_CACHE_WATERMARK_HIGH=95
MINIO_CACHE_AFTER=0

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation needed
  • Unit tests needed

@kaloyan-raev kaloyan-raev changed the title avoid using background goroutines for caching http ranges avoid using background goroutine for caching http ranges Oct 21, 2020
@poornas (Contributor) commented Oct 23, 2020

The one serving the data range to the S3 client finishes first and closes the Reader. This causes the goroutine that is downloading the data range to the cache to fail.

Two different readers are being used here - one for the cache, and a different one returned to the client, so early finish of the s3 client should not impact cache fill operation.

Could you give more specifics on the cache settings in use here and the minio release you are on? Tested with MINIO_CACHE_RANGE=on and could not repro an issue.
Also possible that the object was garbage collected - how much space is left on the cache drive?

@kaloyan-raev (Contributor, Author) commented Oct 23, 2020

@poornas I added my cache settings to the PR description in the section for testing.

You are right. I overlooked that there are indeed two different Readers created.

Looking deeper, I found that they share a common resource: the Context. If the main goroutine of the GET request finishes first, the Context is canceled and the background goroutine continues with a canceled context. When the context is canceled, the client library of our backend service releases its connection resources, which explains why reading from the second reader fails prematurely.

I was able to prove this hypothesis by modifying the background goroutine to use context.Background() instead of ctx. In this case, I don't reproduce the failure.

However, this would not be the perfect fix, because the object is still downloaded twice from the backend: once to serve it to the S3 client, and a second time to fill the cache. This hurts not only performance but also causes unwanted egress consumption from the backend.

@poornas (Contributor) commented Oct 23, 2020

@kaloyan-raev, the context being used in the background is the global context, not the request context - this was fixed quite a while ago, so please upgrade your minio version.
This patch should fix the issue - it would be nice if you could validate it or incorporate it into this patch.

@poornas left a comment

LGTM

@minio minio deleted a comment from minio-trusted Oct 27, 2020
@minio-trusted (Mint Automation) commented

Test Result
mint-large-bucket.sh ✔️
mint-fs.sh ✔️
mint-gateway-s3.sh ✔️
mint-erasure.sh ✔️
mint-dist-erasure.sh ✔️
mint-zoned.sh ✔️
mint-gateway-nas.sh ✔️
mint-gateway-azure.sh more...

10724-9616297/mint-gateway-azure.sh.log:

Running with
SERVER_ENDPOINT:      minio-c3.minio.io:32009
ACCESS_KEY:           minioazure
SECRET_KEY:           ***REDACTED***
ENABLE_HTTPS:         0
SERVER_REGION:        us-east-1
MINT_DATA_DIR:        /mint/data
MINT_MODE:            full
ENABLE_VIRTUAL_STYLE: 0

To get logs, run 'docker cp 0a3fd75c1532:/mint/log /tmp/mint-logs'

(1/15) Running aws-sdk-go tests ... done in 9 seconds
(2/15) Running aws-sdk-java tests ... done in 1 seconds
(3/15) Running aws-sdk-php tests ... done in 2 minutes and 38 seconds
(4/15) Running aws-sdk-ruby tests ... done in 21 seconds
(5/15) Running awscli tests ... done in 2 minutes and 52 seconds
(6/15) Running healthcheck tests ... done in 1 seconds
(7/15) Running mc tests ... done in 3 minutes and 22 seconds
(8/15) Running minio-dotnet tests ... done in 1 minutes and 38 seconds
(9/15) Running minio-go tests ... FAILED in 2 minutes and 49 seconds
{
  "args": {
    "bucketName": "minio-go-test-2pk4meyu36wqjk5m",
    "objectName": "test-object",
    "opts": "",
    "size": -1
  },
  "duration": 1161,
  "function": "PutObject(bucketName, objectName, reader,size,opts)",
  "message": "Unexpected size",
  "name": "minio-go: testPutObjectStreaming",
  "status": "FAIL"
}
(9/15) Running minio-java tests ... done in 10 minutes and 45 seconds
(10/15) Running minio-js tests ... FAILED in 47 seconds
{
  "name": "minio-js",
  "function": "\"after all\" hook in \"functional tests\"",
  "duration": 79,
  "status": "FAIL",
  "error": "S3Error: The bucket you tried to delete is not empty at Object.parseError (node_modules/minio/dist/main/xml-parsers.js:86:11) at /mint/run/core/minio-js/node_modules/minio/dist/main/transformers.js:156:22 at DestroyableTransform._flush (node_modules/minio/dist/main/transformers.js:80:10) at DestroyableTransform.prefinish (node_modules/readable-stream/lib/_stream_transform.js:129:10) at prefinish (node_modules/readable-stream/lib/_stream_writable.js:611:14) at finishMaybe (node_modules/readable-stream/lib/_stream_writable.js:620:5) at endWritable (node_modules/readable-stream/lib/_stream_writable.js:643:3) at DestroyableTransform.Writable.end (node_modules/readable-stream/lib/_stream_writable.js:571:22) at IncomingMessage.onend (_stream_readable.js:682:10) at endReadableNT (_stream_readable.js:1252:12) at processTicksAndRejections (internal/process/task_queues.js:80:21)"
}
(10/15) Running minio-py tests ... done in 18 minutes and 28 seconds
(11/15) Running s3cmd tests ... done in 2 minutes and 25 seconds
(12/15) Running s3select tests ... done in 38 seconds
(13/15) Running security tests ... done in 0 seconds

Executed 13 out of 15 tests successfully.

Deleting image on docker hub
Deleting image locally
