failed to deactivate service binding for container #42975

Open
rightisleft opened this issue Oct 28, 2021 · 7 comments

@rightisleft

rightisleft commented Oct 28, 2021

Description

When running Docker Swarm with a generic resource declared, random hosts start logging the following errors:

$ journalctl -r -u docker.service
-- Logs begin at Tue 2021-10-26 20:40:20 UTC, end at Thu 2021-10-28 19:14:32 UTC. --
Oct 28 19:14:32 nightly-gpu-worker-2 dockerd[3323]: time="2021-10-28T19:14:32.980490822Z" level=warning msg="failed to deactivate service binding for container nightly_core_metamorph.1.7syc32uyege6dbkx
Oct 28 19:14:15 nightly-gpu-worker-2 dockerd[3323]: time="2021-10-28T19:14:15.886736675Z" level=info msg="initialized VXLAN UDP port to 4789 "
Oct 28 19:14:15 nightly-gpu-worker-2 dockerd[3323]: time="2021-10-28T19:14:15.792480608Z" level=info msg="API listen on /var/run/docker.sock"
Oct 28 19:14:15 nightly-gpu-worker-2 systemd[1]: Started Docker Application Container Engine.
Oct 28 20:40:26 nightly-gpu-worker-2 dockerd[4327]: time="2021-10-28T20:40:26.391046254Z" level=error msg="fatal task error" error="node is missing network attachments, ip addresses may be exhausted" module=node/age
Oct 28 20:40:21 nightly-gpu-worker-2 dockerd[4327]: time="2021-10-28T20:40:21.631935527Z" level=warning msg="failed to deactivate service binding for container nightly_core_metamorph.6.k2s1g1t7hf3dgo9rpqfjjsmb5" err

Steps to reproduce the issue:

  1. Declare a generic resource inside docker daemon.json across multiple nodes
  2. Deploy a service using that resource as a constraint
  3. Wait

Eventually, services will start failing to place with the error "assigned node no longer meets constraints".
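
For readers trying to reproduce this, steps 1 and 2 can be sketched as follows. This is only a sketch under assumptions: the issue does not show the actual `daemon.json`, so the use of the dockerd `node-generic-resources` setting with the value `LocalGpu=1` is inferred from the `LocalGpu` resource reported in this thread, and the service name `metamorph-repro` is hypothetical. The script only builds the config fragment and the command line; it does not touch the Docker daemon.

```python
import json
import shlex

# Assumption: the node advertises the resource through the dockerd
# "node-generic-resources" setting. The real daemon.json is not shown
# in the issue, so this fragment is illustrative only.
daemon_config = {"node-generic-resources": ["LocalGpu=1"]}
daemon_json = json.dumps(daemon_config, indent=2)

# Hypothetical service name; the image name is the one that appears in
# the task listing posted in this thread.
service_cmd = [
    "docker", "service", "create",
    "--name", "metamorph-repro",
    "--generic-resource", "LocalGpu=1",
    "scuba/core-metamorph:dev-latest",
]

print(daemon_json)              # fragment to merge into /etc/docker/daemon.json
print(shlex.join(service_cmd))  # command to run on a manager node
```

The daemon has to be restarted after `daemon.json` changes before the node advertises the resource.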

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.10
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        b485636
 Built:             Mon Oct 25 07:42:57 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.10
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       e2f740d
  Built:            Mon Oct 25 07:41:06 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.11
  GitCommit:        5b46e404f6b9f661a205e28d59c982d3634148f8
 nvidia:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

 $ docker info
 Client:
  Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.9.0)
Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 20.10.10
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: krr8taoni6onndoople1byahy
  Is Manager: false
  Node Address: 10.125.15.245
  Manager Addresses:
   10.125.0.49:2377
   10.125.0.51:2377
   10.125.0.52:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 5b46e404f6b9f661a205e28d59c982d3634148f8
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-1055-gcp
 Operating System: Ubuntu 18.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 14.65GiB
 Name: nightly-gpu-worker-2
 ID: X6R7:BCWS:YC3P:FGMR:S7SO:3JZ2:5WNQ:3FWO:56EZ:AP7A:WFJN:SU6F
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: *********
 Registry: https://index.docker.io/v1/
 Labels:
  LocalType=gpu_worker
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Ubuntu 18.04 on GCP

Looks to be related to this code:

// waitNodeAttachments validates that NetworkAttachments exist on this node

@rightisleft
Author

The services in question have the following resource requirement:

                "Resources": {
                    "Reservations": {
                        "GenericResources": [
                            {
                                "DiscreteResourceSpec": {
                                    "Kind": "LocalGpu",
                                    "Value": 1
                                }
                            }
                        ]
                    }
                },

The node in question has the following:

            "Resources": {
                "NanoCPUs": 4000000000,
                "MemoryBytes": 15762878464,
                "GenericResources": [
                    {
                        "DiscreteResourceSpec": {
                            "Kind": "LocalGpu",
                            "Value": 1
                        }
                    }
                ]
            },

Removing the LocalGpu resource requirement from the service bypasses the issue.
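
To compare the two fragments above mechanically, here is a minimal Python sketch. It is a simplified model, not swarmkit's actual scheduling logic: it only sums `DiscreteResourceSpec` values per `Kind` and reports any shortfall. The function names `discrete_totals` and `unmet_reservations` are hypothetical.

```python
def discrete_totals(generic_resources):
    """Sum DiscreteResourceSpec values per Kind; other spec types are ignored."""
    totals = {}
    for entry in generic_resources:
        spec = entry.get("DiscreteResourceSpec")
        if spec is None:
            continue  # named resources are out of scope for this sketch
        totals[spec["Kind"]] = totals.get(spec["Kind"], 0) + spec["Value"]
    return totals


def unmet_reservations(node, service):
    """Return {kind: shortfall} for every reservation the node cannot cover."""
    advertised = discrete_totals(node.get("GenericResources", []))
    wanted = discrete_totals(
        service.get("Reservations", {}).get("GenericResources", [])
    )
    return {
        kind: need - advertised.get(kind, 0)
        for kind, need in wanted.items()
        if need > advertised.get(kind, 0)
    }


# Values copied from the service and node fragments posted above.
service_resources = {
    "Reservations": {
        "GenericResources": [
            {"DiscreteResourceSpec": {"Kind": "LocalGpu", "Value": 1}}
        ]
    }
}
node_resources = {
    "NanoCPUs": 4000000000,
    "MemoryBytes": 15762878464,
    "GenericResources": [
        {"DiscreteResourceSpec": {"Kind": "LocalGpu", "Value": 1}}
    ],
}

print(unmet_reservations(node_resources, service_resources))
```

With the values posted above the result is empty, so the node advertises enough `LocalGpu` on paper. This says nothing about what is still available at runtime once other tasks hold the resource.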

@rightisleft
Author

The other odd thing is that this is happening only on a subset of nodes:

$ docker service ps --no-trunc nightly_core_metamorph
ID                          NAME                           IMAGE                                                                                                      NODE                    DESIRED STATE   CURRENT STATE                     ERROR                                                                                       PORTS
g6g1cr3f4lfgu2n8uth7099at   nightly_core_metamorph.1       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-15   Running         Running 16 hours ago
iqs3g3gomtuhaz27170uafsru    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-4    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
iqpa5ybxgfge5o6rdvv4sfnky    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
py6d8zsruof2r3hn9rx05r8m1    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
y6nzg1huvyn2127ihu9o99zzt    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
mfwlk38a9l4mzmt3mcm8hy0pf   nightly_core_metamorph.2       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-12   Running         Running 16 hours ago
hzsl6qrfq8ujamdeuphf36nij    \_ nightly_core_metamorph.2   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-4    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
6pmpgl2g4acsubne9d43mdigv    \_ nightly_core_metamorph.2   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-11   Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
qhz59h9gmu5k9684b0td45f11    \_ nightly_core_metamorph.2   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-10   Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
q07ap6yh8cep8y8hg3706u548    \_ nightly_core_metamorph.2   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-8    Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
tai10i3lbs7y5ufwk3jkidax1   nightly_core_metamorph.3       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-11   Running         Running 16 hours ago
wth2ugabc140mjeumof657ezz    \_ nightly_core_metamorph.3   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-0    Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
oyh63uu324efy6ra4z63xczg2    \_ nightly_core_metamorph.3   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-11   Shutdown        Rejected 16 hours ago             "error while removing network: unknown network core_network id yqjtim870rr6pq9cwkfvuwuk8"
v356iarq8ufxrytsf83dra731    \_ nightly_core_metamorph.3   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-9    Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
be33qk3hu50hgru4nyg9me52w    \_ nightly_core_metamorph.3   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-1    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
wrluqantr0jvv1duiel7k8vel   nightly_core_metamorph.4       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-7    Running         Running 16 hours ago
xprh6rybmy6oyexpda9net0mv    \_ nightly_core_metamorph.4   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-4    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
b0lw1h9x157czj3ln54llqpph    \_ nightly_core_metamorph.4   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-8    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
uqlzx5cicvlaxx1mjdake0fxk    \_ nightly_core_metamorph.4   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-1    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
4j73woqv3mhh3jr6gde94lnbz    \_ nightly_core_metamorph.4   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
l4w9n4g5y2t8jji7dmguotgzv   nightly_core_metamorph.5       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-10   Running         Running 16 hours ago
kmx4vifgsvshn7lapyuve0gu9    \_ nightly_core_metamorph.5   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-15   Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
ofcu36kfqtv4aeig15t034ilc    \_ nightly_core_metamorph.5   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-9    Shutdown        Rejected 16 hours ago             "Failed to find a load balancer IP to use for network: yqjtim870rr6pq9cwkfvuwuk8"
odti8egocdq9iuvqp60v2c5vr    \_ nightly_core_metamorph.5   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-3    Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
xqfglki6kqglh0to58xro8e9y    \_ nightly_core_metamorph.5   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-7    Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
6yhkg3p56gdwqgfxwuhe8u0lj   nightly_core_metamorph.6       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-9    Ready           Rejected less than a second ago   "assigned node no longer meets constraints"
7jplm9qp0682rx2jo80qhgd5l    \_ nightly_core_metamorph.6   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-9    Shutdown        Rejected 5 seconds ago            "assigned node no longer meets constraints"
ihsuozzawdhm6y134yv4xwtph    \_ nightly_core_metamorph.6   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-9    Shutdown        Rejected 5 seconds ago            "node is missing network attachments, ip addresses may be exhausted"
gkt5nt9ozy842wega09bo4w0e    \_ nightly_core_metamorph.6   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-2    Shutdown        Rejected 10 seconds ago           "node is missing network attachments, ip addresses may be exhausted"
i9ba3x7kv6csb0ip1272p9o6m    \_ nightly_core_metamorph.6   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-8    Shutdown        Rejected 15 seconds ago           "node is missing network attachments, ip addresses may be exhausted"
7vy4rob57oo7cja9h3wga5bej    \_ nightly_core_metamorph.6   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-1    Shutdown        Rejected 20 seconds ago           "node is missing network attachments, ip addresses may be exhausted"
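
To see which errors and nodes dominate a listing like the one above, the rejected rows can be tallied with a short Python sketch. It assumes the column layout shown here (fields separated by two or more spaces, the error in the seventh column); `tally_rejections` is a hypothetical helper, and `SAMPLE` holds only the first four rows of the listing.

```python
import re
from collections import Counter


def tally_rejections(ps_output):
    """Count rejected tasks per error message and per node."""
    by_error = Counter()
    by_node = Counter()
    for line in ps_output.splitlines():
        # Columns are padded with at least two spaces in this layout.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 7 or not fields[5].startswith("Rejected"):
            continue  # header, running tasks, or rows without an error
        node, error = fields[3], fields[6].strip('"')
        by_error[error] += 1
        by_node[node] += 1
    return by_error, by_node


# Raw string so that the "\_" history marker is kept literally.
SAMPLE = r"""
g6g1cr3f4lfgu2n8uth7099at   nightly_core_metamorph.1       scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-15   Running         Running 16 hours ago
iqs3g3gomtuhaz27170uafsru    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-4    Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
iqpa5ybxgfge5o6rdvv4sfnky    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "assigned node no longer meets constraints"
py6d8zsruof2r3hn9rx05r8m1    \_ nightly_core_metamorph.1   scuba/core-metamorph:dev-latest@sha256:XXX   nightly-gpu-worker-13   Shutdown        Rejected 16 hours ago             "node is missing network attachments, ip addresses may be exhausted"
"""

by_error, by_node = tally_rejections(SAMPLE)
for error, count in by_error.most_common():
    print(count, error)
for node, count in by_node.most_common():
    print(count, node)
```

Feeding it the full output of `docker service ps --no-trunc nightly_core_metamorph` gives the per-node breakdown for the whole service.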

@thaJeztah
Member

@dperny ptal

@rightisleft
Author

Is there any additional information that I can provide? This is a critical blocker for our company. Happy to provide whatever details I can.

@ffrommelt

Hi @rightisleft,
did you find a solution?
I am getting a similar error with a piece of "purchased" software and cannot find a clue to fix it.
Frank

@mvaljento

Our Docker networking is experiencing similar critical issues with the same warning messages after multiple stack re-deployments. Any information on this?

@jeanz6

jeanz6 commented May 15, 2023

Same problem here, but with rootless Docker. Any updates on this?
