
Calling ContainerExecAttach and ContainerExecStart will lead to unnoticed race condition #42408

Open
KnisterPeter opened this issue May 23, 2021 · 6 comments

Description

There is a race condition when calling the client API functions ContainerExecAttach and ContainerExecStart.
Depending on the exec state, a call to these APIs fails in the docker daemon, but the failure is not reflected on the docker client side.

If welcome, I can create a PR that changes this behavior.

Steps to reproduce the issue:

  1. Use the Docker Go client API
  2. Create an exec instance with ContainerExecCreate
  3. Run ContainerExecAttach
  4. Run ContainerExecStart (a minimal sketch of this sequence follows the list)
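
A minimal Go sketch of steps 2–4, assuming the 20.10-era Docker SDK; the container name and command are placeholders, and error handling is reduced to panics for brevity:

package main

import (
	"context"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	// Step 2: create the exec instance in an already-running container.
	execResp, err := cli.ContainerExecCreate(ctx, "my-container", types.ExecConfig{
		Cmd:          []string{"echo", "hello"},
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		panic(err)
	}

	// Step 3: attach. On the daemon side this also starts the exec.
	hijacked, err := cli.ContainerExecAttach(ctx, execResp.ID, types.ExecStartCheck{})
	if err != nil {
		panic(err)
	}
	defer hijacked.Close()

	// Step 4: start. This races with the implicit start from step 3; the
	// call reports success even when the daemon answers "already running".
	if err := cli.ContainerExecStart(ctx, execResp.ID, types.ExecStartCheck{}); err != nil {
		panic(err)
	}
}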

Describe the results you received:

The above call order is invalid: ContainerExecAttach implicitly starts the execution, yet calling ContainerExecStart afterwards still reports success.
In some cases, however, the docker daemon responds with Error: Exec command xxx is already running.
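
For comparison, the order that avoids the race entirely is to attach once and never call ContainerExecStart, since the attach already starts the exec. A fragment continuing the sketch above (same cli, ctx, and execResp):

// Attach starts the exec implicitly, so ContainerExecStart is never called.
hijacked, err := cli.ContainerExecAttach(ctx, execResp.ID, types.ExecStartCheck{})
if err != nil {
	panic(err)
}
defer hijacked.Close()
// The exec is now running; read its output from hijacked.Reader.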

Describe the results you expected:

The client should check whether the command is already running before starting it, and fail if the call would be invalid because the execution is already in progress.

Additional information you deem important (e.g. issue happens only occasionally):

This may sound like a nitpick about the API, but I think it would be quite helpful to allow the API to be used in the expected way.
Since the docker daemon returns an error, the client should return an error as well, either by checking upfront whether the call is valid at all or by forwarding the error from the daemon.

Since this is a race condition (it only happens sometimes), it was a hard-to-diagnose issue in act.

Output of docker version:

Docker version 20.10.5+dfsg1, build 55c4c88

Output of docker info:

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 4
  Running: 4
  Paused: 0
  Stopped: 0
 Images: 44
 Server Version: 20.10.5+dfsg1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1.4.4~ds1-2
 runc version: 1.0.0~rc93+ds1-3
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-6-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 30.99GiB
 Name: lapi
 ID: 3OSF:JB54:EK2M:IUK2:PNQT:6HWC:2ULI:CWZM:BFNU:I5FV:6OTN:UTD3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: Support for cgroup v2 is experimental

Additional environment details (AWS, VirtualBox, physical, etc.):

Using Debian Linux.

@mieubrisse commented Jul 20, 2021

Huge +1 to this. I've been doing the same create/attach/start call order, had no idea that Attach actually starts the command, and had my Execs fail very occasionally (maybe once every 20 runs), and only on CI, where there's more latency between the client and the Docker engine.

Extra weirdness about this: when this bug shows up, the ContainerExecStart call doesn't return an error. Instead, "Exec command XXXXXX is already running" shows up in the STDOUT of the exec command.
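
For anyone trying to observe this: a sketch of demultiplexing the attach stream where the message surfaces (hijacked is the types.HijackedResponse returned by ContainerExecAttach; non-TTY execs multiplex stdout and stderr, hence stdcopy):

import (
	"bytes"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/pkg/stdcopy"
)

// readExecOutput splits a non-TTY exec's multiplexed attach stream.
func readExecOutput(hijacked types.HijackedResponse) (stdout, stderr string, err error) {
	var out, errBuf bytes.Buffer
	if _, err = stdcopy.StdCopy(&out, &errBuf, hijacked.Reader); err != nil {
		return "", "", err
	}
	// When the race fires, "Exec command XXXXXX is already running"
	// appears in this output rather than as an error from ContainerExecStart.
	return out.String(), errBuf.String(), nil
}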

@thaJeztah (Member)

What's proposed here would be the same as doing a ContainerExecInspect before calling ContainerExecStart. In the situation described above (ContainerExecAttach and ContainerExecStart called one after the other), I don't think that would help:

  1. ContainerExecAttach is called, which starts and attaches.
  2. ContainerExecInspect is called, which has the same race condition as ContainerExecStart in the existing situation, so it would randomly return "exec is not started" or "exec started".
  3. In case 2. returned "exec is not started", ContainerExecStart is executed and (again) results in either success or "exec already started" (sketched below).
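
In code, that check-then-act gap looks like the following fragment (cli, ctx, and execID as in the earlier sketches, with execID standing for the created exec's ID):

// The window between Inspect and Start is exactly where the implicit
// start from ContainerExecAttach can land.
inspect, err := cli.ContainerExecInspect(ctx, execID)
if err != nil {
	panic(err)
}
if !inspect.Running {
	// By the time this call reaches the daemon, the exec may already be
	// running, so the same race reappears one call later.
	if err := cli.ContainerExecStart(ctx, execID, types.ExecStartCheck{}); err != nil {
		panic(err)
	}
}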

The only possible alternative would be to make ContainerExecStart idempotent: if the exec was already started, it would just "attach" instead of returning an error. But I'm not sure that's worth changing.

@mieubrisse

Making ContainerExecStart idempotent would make a lot of sense IMO. Either that, or making ContainerExecAttach not start the exec (though I'm guessing that would break everything for everybody, so it's not an option).

@thaJeztah (Member)

IIRC, ContainerExecAttach also performs the start to prevent the initial output of the process from being missed, but I'd have to dig into the git history to confirm that.

@mieubrisse

Yep, it does perform the start (that's what causes this race condition)! It makes sense why it's done that way, so making Start idempotent sounds like the best option IMO.

@iojcde commented Nov 12, 2023

Would just removing these lines be enough to fix the issue? I'd like to make sure there wouldn't be other problems if I did that in a PR.

moby/daemon/exec.go

Lines 169 to 172 in 3b423ea

if ec.Running {
	ec.Unlock()
	return errdefs.Conflict(fmt.Errorf("Error: Exec command %s is already running", ec.ID))
}

In my case, I would like to attach to a single exec instance multiple times, but currently there doesn't seem to be a way to do that. I'm only calling ContainerExecAttach, never ContainerExecStart; but since the docker daemon internally triggers the same start path for both, I get the same error when I try to attach to the exec a second time.

IIRC, ContainerExecAttach also performs the start, to prevent the initial output from the process from being missed, but I'd have to dig in git history to confirm that

If @thaJeztah's comment above is accurate, could someone point me to what I would need to do to resolve this properly (by making start idempotent, I'd guess)?

I'm more than glad to submit a PR to fix this.
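
For discussion only, a hypothetical sketch of the idempotent-start direction against the daemon/exec.go lines quoted above (not a vetted patch; the already-running case would presumably still need the caller's attach streams wired up):

if ec.Running {
	ec.Unlock()
	// Hypothetical change: treat a second start as a no-op (or attach to
	// the already-running exec) instead of returning errdefs.Conflict.
	return nil
}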
