daemon: retry secrets dir remount on transient EBUSY #52235
Merged
thaJeztah merged 1 commit into moby:master on Apr 16, 2026
vvoland (Contributor) reviewed on Apr 8, 2026:
Thanks!
Looks good, some input on the timing though.
Contributor
Thanks @firecow! Looks good, one thing remaining though: could you please rebase your branch on top of master and squash your commits into one?
On busy Docker hosts, the read-only remount of the secrets/configs tmpfs can fail with EBUSY when the mount is momentarily held by another process immediately after writing secret/config files. This is particularly common with Swarm services using start-first update order, where old and new containers coexist on the same node.

Retry the remount up to 5 times with progressive backoff (10ms + retry * 25ms). Only EBUSY triggers a retry; other errors fail immediately.

Fixes moby#48783

Signed-off-by: Mads Jon Nielsen <madsjon@gmail.com>
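To make the backoff formula in the commit message concrete, here is a small sketch of the schedule it implies. It is not the merged code: `backoffDelay` and `worstCaseWait` are hypothetical names, and two details are assumptions the commit message does not settle, namely that `retry` is zero-based and that no wait follows the final failed attempt.

```go
package main

import (
	"fmt"
	"time"
)

const maxAttempts = 5 // "up to 5 times" from the commit message

// backoffDelay applies the stated formula: 10ms + retry*25ms.
// Assumption: retry is zero-based, so the first wait is 10ms. The commit
// message does not say which base the real code uses.
func backoffDelay(retry int) time.Duration {
	return 10*time.Millisecond + time.Duration(retry)*25*time.Millisecond
}

// worstCaseWait sums the waits between attempts, assuming there is no sleep
// after the final failed attempt (so attempts-1 waits in total).
func worstCaseWait(attempts int) time.Duration {
	var total time.Duration
	for retry := 0; retry < attempts-1; retry++ {
		total += backoffDelay(retry)
	}
	return total
}

func main() {
	for retry := 0; retry < maxAttempts-1; retry++ {
		fmt.Printf("wait after failed attempt %d: %v\n", retry+1, backoffDelay(retry))
	}
	fmt.Println("worst-case total wait:", worstCaseWait(maxAttempts))
}
```

Under those assumptions the waits are 10ms, 35ms, 60ms and 85ms, so a remount that stays busy for all five attempts delays container start by roughly 190ms before the error is returned.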
5677a8d to 3b22c50
Contributor
Author
Done
Member
Windows failures are unrelated.
- What I did

Added retry logic to `remountSecretDir` for transient `EBUSY` errors when remounting the secrets/configs tmpfs as read-only.

- How I did it

Wrapped the `mount.Mount("tmpfs", dir, "tmpfs", "remount,ro,...")` call in a retry loop (up to 5 attempts, 100ms apart). Only `EBUSY` triggers a retry; other errors fail immediately. A debug-level log is emitted on each retry for observability.

This follows the existing retry-on-EBUSY pattern from `EnsureRemoveAll` in `daemon/internal/containerfs/rm.go`.

- How to verify it

  * `update_config: { order: start-first }` and Docker configs
  * `EBUSY` race during secrets remount should now be retried instead of failing the container start

The issue is a race condition that occurs intermittently on busy hosts, so verification requires a loaded environment. See #48783 for reproduction details.
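Because the race is hard to reproduce on demand, the retry rule described under "How I did it" can also be exercised in isolation. The following is a minimal sketch, not the code from this PR: `retryOnBusy`, `flakyOp` and `demo` are hypothetical names, the real `mount.Mount` call is replaced by a fake operation, and the interval is shortened to 1ms so the example runs quickly.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// retryOnBusy runs op up to attempts times (assumed >= 1), waiting interval
// between tries. Only errors that are, or wrap, EBUSY are retried.
func retryOnBusy(attempts int, interval time.Duration, op func() error) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, syscall.EBUSY) {
			return err // not transient: fail immediately
		}
		if i < attempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyOp is a hypothetical stand-in for the remount call: it fails with
// failWith (wrapped) for the first n calls, then succeeds. calls counts
// how many times the operation was invoked.
func flakyOp(n int, failWith error, calls *int) func() error {
	return func() error {
		*calls++
		if *calls <= n {
			return fmt.Errorf("remount: %w", failWith)
		}
		return nil
	}
}

// demo runs three scenarios and reports how many calls each one made.
func demo() (busyThenOK, alwaysBusy, otherErr int) {
	_ = retryOnBusy(5, time.Millisecond, flakyOp(2, syscall.EBUSY, &busyThenOK))
	_ = retryOnBusy(5, time.Millisecond, flakyOp(99, syscall.EBUSY, &alwaysBusy))
	_ = retryOnBusy(5, time.Millisecond, flakyOp(99, syscall.EPERM, &otherErr))
	return
}

func main() {
	a, b, c := demo()
	fmt.Println("busy twice, then ok: calls =", a)
	fmt.Println("always busy:         calls =", b)
	fmt.Println("non-EBUSY error:     calls =", c)
}
```

Using `errors.Is` rather than a direct comparison means the check still matches when the errno is wrapped by another error type, which is an assumption about how the mount error reaches the caller.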
- Description

The race condition in `remountSecretDir` has existed for a long time, but became noticeably worse after the containerd v1.7 → v2.x bump (Docker 28.x/29.x). The major containerd upgrade appears to have changed container lifecycle timing enough to widen the window where the tmpfs mount is held during secret/config setup, making the `EBUSY` collision significantly more likely on busy hosts. We observe this consistently on Docker 29.3.0 (containerd v2.2.1) but not on Docker 27.5.0 (containerd v1.7.25) with comparable workloads (~100 containers).

- Human readable description for the release notes
Fixes #48783
Signed-off-by: Mads Jon Nielsen <madsjon@gmail.com>