s3cache: Fix data race in readerAtCloser#6675

Merged
tonistiigi merged 1 commit into moby:master from x-qdo:fix/s3-readerat-race on Apr 10, 2026
Conversation

@kuzaxak (Contributor) commented Apr 9, 2026

readerAtCloser implements io.ReaderAt, whose documented contract explicitly allows concurrent callers:

Clients of ReadAt can execute parallel ReadAt calls on the same
input source.

However, the type has no synchronization protecting its mutable state (rc, ra, offset, closed). When used as the body of an S3 upload via manager.Uploader, the AWS SDK v2 upload manager spawns DefaultUploadConcurrency=5 worker goroutines, each given an io.SectionReader (via io.NewSectionReader) over the same underlying body. Each worker's Read eventually becomes a concurrent ReadAt on the same *readerAtCloser at a different offset, racing on the close-then-reopen path.
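The fan-out described above can be sketched in isolation. This is not SDK or BuildKit code; `countingReaderAt` and `demo` are illustrative names, and `bytes.Reader` (which is safe for parallel ReadAt) stands in for the shared upload body. It only shows how section readers over one body turn sequential Reads into concurrent ReadAt calls on the shared source:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
	"sync/atomic"
)

// countingReaderAt wraps an io.ReaderAt and records how many ReadAt
// calls are in flight at once. Unlike the unsynchronized
// readerAtCloser this PR fixes, the wrapped bytes.Reader is safe
// for parallel ReadAt.
type countingReaderAt struct {
	src      io.ReaderAt
	inFlight atomic.Int32
	maxSeen  atomic.Int32
}

func (c *countingReaderAt) ReadAt(p []byte, off int64) (int, error) {
	n := c.inFlight.Add(1)
	defer c.inFlight.Add(-1)
	for { // record the peak number of concurrent callers
		m := c.maxSeen.Load()
		if n <= m || c.maxSeen.CompareAndSwap(m, n) {
			break
		}
	}
	return c.src.ReadAt(p, off)
}

// demo mimics the upload manager's fan-out: each worker goroutine gets
// an io.SectionReader for its own 1 MiB part, and every sequential Read
// on a section becomes a ReadAt on the shared body at that part's offset.
func demo(workers int) int32 {
	body := &countingReaderAt{src: bytes.NewReader(make([]byte, workers<<20))}
	var wg sync.WaitGroup
	for part := 0; part < workers; part++ {
		wg.Add(1)
		go func(part int) {
			defer wg.Done()
			sec := io.NewSectionReader(body, int64(part)<<20, 1<<20)
			io.Copy(io.Discard, sec)
		}(part)
	}
	wg.Wait()
	return body.maxSeen.Load()
}

func main() {
	fmt.Println("peak concurrent ReadAt calls:", demo(5))
}
```

With a stateful, lock-free body in place of bytes.Reader, any peak above 1 is a window for the close-then-reopen race.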

Under the race detector this produces DATA RACE warnings on rc, offset and ra. Without the race detector it intermittently crashes buildkitd with a nil pointer dereference at the hrs.rc.Read(p) call in the io.ReaderAt fallback branch, when one goroutine nils rc between a peer's nil-check and Read dispatch. The crash kills the buildkit daemon and cannot be recovered by the buildctl client.

The bug reliably reproduces whenever a cache layer exceeds the default 5 MiB part size, i.e. essentially every real-world Docker image build that exports cache to S3.

PR #5597 added an offset parameter to s3Client.getReader and reduced the frequency of close-reopen on the common sequential-read path, but did not address the underlying thread-safety violation. This commit takes the minimum-risk correctness fix: serialize ReadAt and Close with a sync.Mutex so the io.ReaderAt contract is honoured and the existing close-reopen logic remains correct under concurrent callers.

Under the mutex this becomes serialised per readerAtCloser: correct, but potentially slower for a single large blob that is being re-exported from S3 back to S3. This is not a global BuildKit lock: different cache layers can still upload in parallel, but one blob backed by this reader loses multipart read parallelism and still pays the reopen churn. A proper follow-up would also eliminate the shared-state optimisation that causes the thrashing, for example by having ReaderAt open an independent reader per call (stateless), or by keeping a small pool of per-offset readers. That optimisation can be follow-up work after this correctness fix, or part of the #3993 refactor (which would also need a mutex added to the replacement contentutil.readerAt).

Closes #6674

@tonistiigi (Member) left a comment

Fix looks good but I don't think we need such a test


Signed-off-by: Vladimir Kuznichenkov <vova@kuzaxak.dev>
@kuzaxak force-pushed the fix/s3-readerat-race branch from 9c7b120 to b0e7d80 on April 10, 2026 06:02
@kuzaxak (Contributor, Author) commented Apr 10, 2026

> Fix looks good but I don't think we need such a test

@tonistiigi Thanks for the review, dropped the test.

@kuzaxak kuzaxak requested a review from tonistiigi April 10, 2026 06:03
@crazy-max crazy-max added this to the v0.30.0 milestone Apr 10, 2026
@tonistiigi tonistiigi merged commit 082e8d8 into moby:master Apr 10, 2026
274 of 277 checks passed


Development

Successfully merging this pull request may close these issues.

s3 remote cache: data race / nil pointer panic in cache/remotecache/s3.readerAtCloser under concurrent multipart upload

3 participants