daemon: retry secrets dir remount on transient EBUSY by firecow · Pull Request #52235 · moby/moby

firecow · 2026-03-26T11:06:41Z

- What I did

Added retry logic to remountSecretDir for transient EBUSY errors when remounting the secrets/configs tmpfs as read-only.

- How I did it

Wrapped the mount.Mount("tmpfs", dir, "tmpfs", "remount,ro,...") call in a retry loop (up to 5 attempts, 100ms apart). Only EBUSY triggers a retry; other errors fail immediately. A debug-level log is emitted on each retry for observability.

This follows the existing retry-on-EBUSY pattern from EnsureRemoveAll in daemon/internal/containerfs/rm.go.

- How to verify it

1. Set up a Docker Swarm node running many services that use configs or secrets.
2. Deploy a service with update_config: { order: start-first } and Docker configs.
3. Trigger repeated service updates. The EBUSY race during the secrets remount should now be retried instead of failing the container start.

The issue is a race condition that occurs intermittently on busy hosts, so verification requires a loaded environment. See #48783 for reproduction details.

- Description

The race condition in remountSecretDir has existed for a long time, but became noticeably worse after the containerd v1.7 → v2.x bump (Docker 28.x/29.x). The major containerd upgrade appears to have changed container lifecycle timing enough to widen the window where the tmpfs mount is held during secret/config setup, making the EBUSY collision significantly more likely on busy hosts. We observe this consistently on Docker 29.3.0 (containerd v2.2.1) but not on Docker 27.5.0 (containerd v1.7.25) with comparable workloads (~100 containers).

- Human readable description for the release notes

Fix intermittent container start failures (`EBUSY` on secrets/configs remount) on busy Swarm nodes by retrying the read-only remount.

Fixes #48783

Signed-off-by: Mads Jon Nielsen <madsjon@gmail.com>

vvoland

Thanks!
Looks good, some input on the timing though

vvoland · 2026-04-10T15:37:11Z

Thanks @firecow!

Looks good, one thing remaining though - could you please rebase your branch on top of master and squash your commit into one?

On busy Docker hosts, the read-only remount of the secrets/configs tmpfs can fail with EBUSY when the mount is momentarily held by another process immediately after writing secret/config files. This is particularly common with Swarm services using start-first update order, where old and new containers coexist on the same node.

Retry the remount up to 5 times with progressive backoff (10ms + retry * 25ms). Only EBUSY triggers a retry; other errors fail immediately.

Fixes moby#48783

Signed-off-by: Mads Jon Nielsen <madsjon@gmail.com>

firecow · 2026-04-12T18:07:03Z

> Thanks @firecow!
>
> Looks good, one thing remaining though - could you please rebase your branch on top of master and squash your commit into one?

Done

vvoland

LGTM, thanks!

thaJeztah

LGTM

thaJeztah · 2026-04-16T12:12:48Z

Windows failures are unrelated.

github-actions Bot added the area/daemon (Core Engine) label Mar 26, 2026

thaJeztah added the status/2-code-review, impact/changelog, and kind/bugfix (PR's that fix bugs) labels Apr 8, 2026

thaJeztah added this to the 29.4.1 milestone Apr 8, 2026

vvoland reviewed Apr 8, 2026

Comment thread daemon/container_operations_unix.go Outdated

firecow force-pushed the fix-secrets-remount-ebusy branch from 5677a8d to 3b22c50 Compare April 12, 2026 18:06

vvoland approved these changes Apr 16, 2026

thaJeztah approved these changes Apr 16, 2026

thaJeztah merged commit 48924f0 into moby:master Apr 16, 2026
339 of 350 checks passed
