
Calling ContainerExecAttach and ContainerExecStart will lead to unnoticed race condition #42408

Open
KnisterPeter opened this issue May 23, 2021 · 6 comments

Description

There is a race condition when calling the client API functions ContainerExecAttach and ContainerExecStart.
Depending on the exec state, a call to these APIs fails in the docker daemon, but the failure is not reflected on the docker client side.

If welcome, I can create a PR that changes this behavior.

Steps to reproduce the issue:

  1. Use the Docker Go client API
  2. Create an exec instance with ContainerExecCreate
  3. Run ContainerExecAttach
  4. Run ContainerExecStart (a minimal sketch of this sequence follows the list)
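
A minimal Go sketch of steps 2–4, assuming the 20.10-era Docker SDK; the container name and command are placeholders, and error handling is reduced to panics for brevity:

package main

import (
	"context"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		panic(err)
	}

	// Step 2: create the exec instance in an already-running container.
	execResp, err := cli.ContainerExecCreate(ctx, "my-container", types.ExecConfig{
		Cmd:          []string{"echo", "hello"},
		AttachStdout: true,
		AttachStderr: true,
	})
	if err != nil {
		panic(err)
	}

	// Step 3: attach. On the daemon side this also starts the exec.
	hijacked, err := cli.ContainerExecAttach(ctx, execResp.ID, types.ExecStartCheck{})
	if err != nil {
		panic(err)
	}
	defer hijacked.Close()

	// Step 4: start. This races with the implicit start from step 3; the
	// call reports success even when the daemon answers "already running".
	if err := cli.ContainerExecStart(ctx, execResp.ID, types.ExecStartCheck{}); err != nil {
		panic(err)
	}
}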

Describe the results you received:

The above call order is invalid: ContainerExecAttach implicitly starts the execution, yet calling ContainerExecStart afterwards still reports success.
In some cases, however, the docker daemon responds with Error: Exec command xxx is already running.
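
For comparison, the order that avoids the race entirely is to attach once and never call ContainerExecStart, since the attach already starts the exec. A fragment continuing the sketch above (same cli, ctx, and execResp):

// Attach starts the exec implicitly, so ContainerExecStart is never called.
hijacked, err := cli.ContainerExecAttach(ctx, execResp.ID, types.ExecStartCheck{})
if err != nil {
	panic(err)
}
defer hijacked.Close()
// The exec is now running; read its output from hijacked.Reader.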

Describe the results you expected:

The client should check whether the command is already running before starting it, and fail if the call would be invalid because the execution is already in progress.

Additional information you deem important (e.g. issue happens only occasionally):

This may sound like a nitpick about the API, but I think it would be quite helpful to allow the API to be used in the expected way.
Since the docker daemon returns an error, the client should return an error as well, either by checking upfront whether the call is valid at all or by forwarding the error from the daemon.

Since this is a race condition (it only happens sometimes), it was a hard-to-diagnose issue in act.

Output of docker version:

Docker version 20.10.5+dfsg1, build 55c4c88

Output of docker info:

Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 4
  Running: 4
  Paused: 0
  Stopped: 0
 Images: 44
 Server Version: 20.10.5+dfsg1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1.4.4~ds1-2
 runc version: 1.0.0~rc93+ds1-3
 init version: 
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-6-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 30.99GiB
 Name: lapi
 ID: 3OSF:JB54:EK2M:IUK2:PNQT:6HWC:2ULI:CWZM:BFNU:I5FV:6OTN:UTD3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: Support for cgroup v2 is experimental

Additional environment details (AWS, VirtualBox, physical, etc.):

Using Debian Linux.

@mieubrisse commented Jul 20, 2021

Huge +1 to this. I've been doing the same create/attach/start call order, had no idea that Attach actually starts the command, and had my Execs fail very occasionally (maybe once every 20 runs), and only on CI, where there's more latency between the client and the Docker engine.

Extra weirdness about this: when this bug shows up, the ContainerExecStart call doesn't return an error. Instead, "Exec command XXXXXX is already running" shows up in the STDOUT of the exec command.
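
For anyone trying to observe this: a sketch of demultiplexing the attach stream where the message surfaces (hijacked is the types.HijackedResponse returned by ContainerExecAttach; non-TTY execs multiplex stdout and stderr, hence stdcopy):

import (
	"bytes"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/pkg/stdcopy"
)

// readExecOutput splits a non-TTY exec's multiplexed attach stream.
func readExecOutput(hijacked types.HijackedResponse) (stdout, stderr string, err error) {
	var out, errBuf bytes.Buffer
	if _, err = stdcopy.StdCopy(&out, &errBuf, hijacked.Reader); err != nil {
		return "", "", err
	}
	// When the race fires, "Exec command XXXXXX is already running"
	// appears in this output rather than as an error from ContainerExecStart.
	return out.String(), errBuf.String(), nil
}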

@thaJeztah (Member)

What's proposed here would be the same as doing a ContainerExecInspect before calling ContainerExecStart. In the situation described above (ContainerExecAttach and ContainerExecStart called one after the other), I don't think that would help:

  1. ContainerExecAttach is called, which starts and attaches.
  2. ContainerExecInspect is called, which has the same race condition as ContainerExecStart in the existing situation, so it would randomly return "exec is not started" or "exec started".
  3. In case 2. returned "exec is not started", ContainerExecStart is executed and (again) results in either success or "exec already started" (sketched below).
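
In code, that check-then-act gap looks like the following fragment (cli, ctx, and execID as in the earlier sketches, with execID standing for the created exec's ID):

// The window between Inspect and Start is exactly where the implicit
// start from ContainerExecAttach can land.
inspect, err := cli.ContainerExecInspect(ctx, execID)
if err != nil {
	panic(err)
}
if !inspect.Running {
	// By the time this call reaches the daemon, the exec may already be
	// running, so the same race reappears one call later.
	if err := cli.ContainerExecStart(ctx, execID, types.ExecStartCheck{}); err != nil {
		panic(err)
	}
}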

The only possible alternative would be to make ContainerExecStart idempotent: if the exec was already started, it would just "attach" instead of returning an error. But I'm not sure that's worth changing.

@mieubrisse

Making ContainerExecStart idempotent would make a lot of sense IMO. Either that, or making ContainerExecAttach not start the exec (though I'm guessing that would break everything for everybody, so it's not an option).

@thaJeztah (Member)

IIRC, ContainerExecAttach also performs the start to prevent the initial output of the process from being missed, but I'd have to dig into the git history to confirm that.

@mieubrisse

Yep, it does perform the start (that's what causes this race condition)! It makes sense why it's done that way, so making Start idempotent sounds like the best option IMO.

@iojcde commented Nov 12, 2023

Would just removing these lines be enough to fix the issue? I'd like to make sure there wouldn't be other problems if I did that in a PR.

moby/daemon/exec.go

Lines 169 to 172 in 3b423ea

if ec.Running {
	ec.Unlock()
	return errdefs.Conflict(fmt.Errorf("Error: Exec command %s is already running", ec.ID))
}

In my case, I would like to attach to a single exec instance multiple times, but currently there doesn't seem to be a way to do that. I'm only calling ContainerExecAttach, never ContainerExecStart; but since the docker daemon internally triggers the same start path for both, I get the same error when I try to attach to the exec a second time.

IIRC, ContainerExecAttach also performs the start, to prevent the initial output from the process from being missed, but I'd have to dig in git history to confirm that

If @thaJeztah's comment above is accurate, could someone point me to what I would need to do to resolve this properly (by making start idempotent, I'd guess)?

I'm more than glad to submit a PR to fix this.
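
For discussion only, a hypothetical sketch of the idempotent-start direction against the daemon/exec.go lines quoted above (not a vetted patch; the already-running case would presumably still need the caller's attach streams wired up):

if ec.Running {
	ec.Unlock()
	// Hypothetical change: treat a second start as a no-op (or attach to
	// the already-running exec) instead of returning errdefs.Conflict.
	return nil
}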
