dockerd segfaults on rollback after a constraint has been set #42783

Open
akiuni opened this issue Aug 25, 2021 · 3 comments
Labels
area/swarm, kind/bug, version/20.10


akiuni commented Aug 25, 2021


BUG REPORT INFORMATION

In swarm mode, dockerd segfaults when a service update request with "rollback=previous" (a POST to the service's update endpoint) is sent to roll back a constraint.
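
For reference, the request that triggers the crash can be reconstructed from the debug log quoted later in this thread. The sketch below only builds the method, path, and JSON body with the standard library; it does not talk to a Docker daemon. The helper name `rollback_request` is made up for illustration, and the service ID, version, and spec are copied from that log.

```python
import json
from urllib.parse import urlencode


def rollback_request(service_id, version, spec, api_version="v1.41"):
    """Build (method, path, body) for a service update with rollback=previous.

    Hypothetical helper: it mirrors the request line seen in the dockerd
    debug log and is not part of aiodocker or any Docker SDK.
    """
    query = urlencode({"version": version, "rollback": "previous"})
    path = f"/{api_version}/services/{service_id}/update?{query}"
    return "POST", path, json.dumps(spec)


# Service spec as logged by dockerd ("form data") right before the panic.
spec = {
    "Labels": {},
    "Mode": {"Replicated": {"Replicas": 1}},
    "Name": "testcrash",
    "TaskTemplate": {
        "ContainerSpec": {
            "Args": ["ping", "localhost"],
            "Image": "alpine",
            "Isolation": "default",
        },
        "ForceUpdate": 0,
        "Placement": {"Constraints": ["node.role==manager"]},
        "Runtime": "container",
    },
}

method, path, body = rollback_request("3dng8kmyrtsolnwselk1xkc2x", 24, spec)
print(method, path)
```

Note that the body still carries the `node.role==manager` constraint while the query string asks for a rollback to the previous spec.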

Steps to reproduce the issue:
I've written a pytest test based on the aiodocker library to reproduce the bug:

git clone https://github.com/aio-libs/aiodocker.git
python -m venv testenv
source testenv/bin/activate
pip install -r requirements.txt  # ( you can remove pkg_resource from requirements if needed )
cp test_crash.py aiodocker/tests
cd aiodocker
python -m pytest -svv tests/test_crash.py
# ( patch in case you get an error about IP addresses ) :
# sed -i 's/swarm\.init()/swarm.init(advertise_addr="127.0.0.1")/g' tests/conftest.py

Describe the results you received:

dockerd crashes and corrupts the swarm.

Describe the results you expected:

In the test, the constraint should have been removed properly.

Additional information you deem important (e.g. issue happens only occasionally):

If I run the actions manually, dockerd doesn't crash.
In the test, if I add labels before setting the constraints, the test passes.

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:17:50 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       363e9a8
  Built:            Tue Mar  2 20:15:47 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 46
  Running: 0
  Paused: 0
  Stopped: 46
 Images: 220
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.19.0-16-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 9.76GiB
 Name: rs
 ID: D7Y5:ZP6N:XKNZ:FUVT:QMB2:YNPL:F4NF:K2JJ:HKXY:PBVI:BLQT:4Z3C
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

I'm running a single-node swarm bound to localhost (unix://). The OS is Debian 10.9 running in VirtualBox 6.1.
Also reproduced on another computer: Docker version 20.10.7, build 20.10.7-0, Ubuntu 20.04.1, without VirtualBox.

The bug appears when dockerd is started via systemd, and also when it is started manually with:
/usr/bin/dockerd -D -H unix:// --containerd=/run/containerd/containerd.sock
I've attached the output log:

20210817_crash_dockerd.zip

@cpuguy83
Member

Can you paste just the stack where the panic is? I'm not inclined to open a random zip.

@akiuni
Author

akiuni commented Aug 26, 2021

Yes, here it is:

DEBU[2021-08-25T09:31:06.772883836+02:00] update of service 3dng8kmyrtsolnwselk1xkc2x complete  module=node node.id=3ktr2ump2nz880uidwd6378mj
DEBU[2021-08-25T09:31:06.923060082+02:00] Calling GET /v1.41/services/3dng8kmyrtsolnwselk1xkc2x 
DEBU[2021-08-25T09:31:06.924018122+02:00] Calling GET /v1.41/services/3dng8kmyrtsolnwselk1xkc2x 
DEBU[2021-08-25T09:31:06.925093622+02:00] Calling POST /v1.41/services/3dng8kmyrtsolnwselk1xkc2x/update?version=24&rollback=previous 
DEBU[2021-08-25T09:31:06.925202085+02:00] form data: {"Labels":{},"Mode":{"Replicated":{"Replicas":1}},"Name":"testcrash","TaskTemplate":{"ContainerSpec":{"Args":["ping","localhost"],"Image":"alpine","Isolation":"default"},"ForceUpdate":0,"Placement":{"Constraints":["node.role==manager"]},"Runtime":"container"}} 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x559d96dd26f6]
goroutine 845 [running]:
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator.nodeMatches(0xc002150b40, 0xc002150c80, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/task.go:122 +0x36
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator.IsTaskDirty(0xc002150b40, 0xc0017f70e0, 0xc002150c80, 0x559d9547116a)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/task.go:86 +0x3f7
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update.(*Updater).isTaskDirty(0xc00213b900, 0xc0017f70e0, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update/updater.go:530 +0x9e
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update.(*Updater).isSlotDirty(...)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update/updater.go:534
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update.(*Updater).Run(0xc00213b900, 0x559d9811d840, 0xc000eec5c0, 0xc002139fe0, 0x1, 0x1)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update/updater.go:138 +0xd99
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update.(*Supervisor).Update.func1(0xc00213b900, 0x559d9811d840, 0xc000eec5c0, 0xc002139fe0, 0x1, 0x1, 0xc000c90100, 0xc001234b01, 0x19)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update/updater.go:66 +0x63
created by github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update.(*Supervisor).Update
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/update/updater.go:65 +0x200
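
For anyone who needs to pull the same kind of excerpt out of a long dockerd debug log, here is a minimal sketch using only the standard library. It assumes the panic starts at a line beginning with `panic:` and that regular daemon lines use the `DEBU[`/`INFO[`/`WARN[`/`ERRO[` prefixes seen above. The function names `extract_panic` and `frame_locations` are made up for illustration, and the sample is a shortened copy of the trace in this comment.

```python
import re

# Regular daemon log lines start with a level prefix such as "DEBU[".
LOG_LINE = re.compile(r"^(DEBU|INFO|WARN|ERRO)\[")
# Source locations in a Go trace are indented: "<path>.go:<line> +0x..."
FRAME_LOCATION = re.compile(r"^\s+(\S+\.go):(\d+)")


def extract_panic(log_text):
    """Return the lines from the first 'panic:' line to the end of the trace."""
    block = []
    in_panic = False
    for line in log_text.splitlines():
        if not in_panic:
            in_panic = line.startswith("panic:")
        elif LOG_LINE.match(line):
            break  # a regular log line ends the trace
        if in_panic:
            block.append(line)
    return block


def frame_locations(block):
    """Return (file name, line number) for each stack frame, innermost first."""
    matches = (FRAME_LOCATION.match(line) for line in block)
    return [(m.group(1).rsplit("/", 1)[-1], int(m.group(2))) for m in matches if m]


SAMPLE = """\
DEBU[2021-08-25T09:31:06.925093622+02:00] Calling POST /v1.41/services/3dng8kmyrtsolnwselk1xkc2x/update?version=24&rollback=previous
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x559d96dd26f6]
goroutine 845 [running]:
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator.nodeMatches(0xc002150b40, 0xc002150c80, 0x0)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/task.go:122 +0x36
github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator.IsTaskDirty(0xc002150b40, 0xc0017f70e0, 0xc002150c80, 0x559d9547116a)
        /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/manager/orchestrator/task.go:86 +0x3f7
"""

block = extract_panic(SAMPLE)
print("\n".join(block))
print(frame_locations(block))
```

The first location returned is the innermost frame, which here points at `task.go:122` inside `nodeMatches`.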

akiuni changed the title from "dockerd segfaults when on query POST rollback" to "dockerd segfaults on rollback after a constraint has been set" on Aug 26, 2021
cpuguy83 added the area/swarm, kind/bug, and version/20.10 labels on Aug 31, 2021
@akerouanton
Member

This is a duplicate of #37883. It has been reported in the swarmkit repo and a PR is ready to be merged; see moby/swarmkit#2947 and moby/swarmkit#2950.
