Failed to create a netlink handle: failed to set into network namespace xx while creating netlink socket #33656

Closed
scott-wood-vgh opened this issue Jun 13, 2017 · 18 comments

@scott-wood-vgh
commented Jun 13, 2017



Description
No container will start due to a failure to initialize the socket.

When a docker image is pulled, attempting to run/start it yields:

docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:334: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: time=\"2017-06-13T13:05:42Z\" level=fatal msg=\"failed to create a netlink handle: failed to set into network namespace 21 while creating netlink socket: invalid argument\" \n"".
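To make the nested error easier to read, the sketch below pulls the innermost failure out of the hook's stderr. The regular expression and the function name `parse_netlink_failure` are illustrative assumptions, not part of Docker. The number in the message is most likely a file descriptor rather than a namespace ID: in the vishvananda/netns package, a namespace handle is an integer that can be used as an fd.

```python
import re

# Error text as reported in this issue (abbreviated to the part that matters).
ERROR = (
    'stderr: time=\\"2017-06-13T13:05:42Z\\" level=fatal '
    'msg=\\"failed to create a netlink handle: failed to set into network '
    'namespace 21 while creating netlink socket: invalid argument\\" \\n'
)

# Hypothetical pattern: captures the handle number and the errno text.
PATTERN = re.compile(
    r"failed to set into network namespace (?P<handle>\d+) "
    r"while creating netlink socket: (?P<reason>[a-z ]+)"
)


def parse_netlink_failure(text):
    """Return (handle, reason) from the hook's fatal message, or None if absent."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return int(match.group("handle")), match.group("reason").strip()


print(parse_netlink_failure(ERROR))  # (21, 'invalid argument')
```

"invalid argument" corresponds to EINVAL, which here means the call that switches into the namespace rejected the handle it was given.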

Steps to reproduce the issue:

  1. Download image
  2. Attempt to start any container with sudo docker run, even hello-world

Describe the results you received:

Container will not start

Describe the results you expected:

Container should start with no issues.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:14:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 13
 Running: 0
 Paused: 0
 Stopped: 13
Images: 2
Server Version: 17.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-78-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.795 GiB
Name: ip-10-11-5-206
ID: VRSZ:FZGN:B2CZ:7CGR:36PM:RDXC:KB3I:ET4N:ZPX6:4BOH:7S7O:RFRU
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://awsprodproxy.company.net:3128/
Https Proxy: http://awsprodproxy.company.net:3128/
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
Running on AWS EC2 and pulling image from AWS ECR, Ubuntu 16.04 image.

@thaJeztah

Member

commented Jun 13, 2017

Are you having this issue with any image, or just that particular image? If the latter, could you provide a minimal image (or Dockerfile) to reproduce the issue?

@scott-wood-vgh

Author

commented Jun 13, 2017

@thaJeztah Any image. It seems to be related to Docker starting containers in general. Even the hello-world container fails to run.

@thaJeztah

Member

commented Jun 13, 2017

Can you try running the check-config script to see if there's anything standing out in the host configuration and kernel? https://github.com/moby/moby/blob/master/contrib/check-config.sh

@scott-wood-vgh

Author

commented Jun 13, 2017

Here are the results of the config check. Most of it looks typical:

warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.4.0-78-generic ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
    (cgroup swap accounting is currently not enabled, you can enable it by setting boot option "swapaccount=1")
- CONFIG_LEGACY_VSYSCALL_EMULATE: enabled
- CONFIG_MEMCG_KMEM: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_FTP: enabled (as module)
    - CONFIG_NF_NAT_TFTP: enabled (as module)
    - CONFIG_NF_CONNTRACK_TFTP: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
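
With output this long, it can help to filter for the entries reported as missing. A minimal sketch, assuming the report has been saved as plain text in the `- NAME: status` layout shown above; the function name `missing_items` and the `SAMPLE` excerpt are illustrative.

```python
def missing_items(report):
    """Return the names of entries that the report marks as 'missing'."""
    missing = []
    for line in report.splitlines():
        # Drop indentation and the leading "- " bullet.
        entry = line.strip().lstrip("- ").strip()
        name, sep, status = entry.rpartition(": ")
        if sep and status.strip() == "missing":
            missing.append(name)
    return missing


# A few lines copied from the report above.
SAMPLE = """\
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
    - /dev/zfs: missing
"""

print(missing_items(SAMPLE))
# ['CONFIG_MEMCG_SWAP_ENABLED', 'CONFIG_RT_GROUP_SCHED', '/dev/zfs']
```

In this report the missing entries are all optional features or unused storage drivers, and none of them touch network namespaces.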

@thaJeztah

Member

commented Jun 13, 2017

Yup, nothing really standing out at a glance; did log files provide any more information?

@scott-wood-vgh

Author

commented Jun 13, 2017

Here's the log output from startup. After the image is fetched successfully, the same error repeats on every attempt to start a container.

Jun 13 12:29:53 ip-10-11-5-206 systemd[1]: Starting Docker Application Container Engine...
Jun 13 12:30:16 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:16.402747180Z" level=info msg="libcontainerd: new containerd process, pid: 1222"
Jun 13 12:30:17 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:17.906426753Z" level=info msg="libcontainerd: new containerd process, pid: 1251"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.163718568Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.986590347Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.986822180Z" level=warning msg="Your kernel does not support swap memory limit"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.986892724Z" level=warning msg="Your kernel does not support cgroup rt period"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.986908217Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Jun 13 12:30:21 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:21.987716965Z" level=info msg="Loading containers: start."
Jun 13 12:30:23 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:23.835525019Z" level=info msg="Firewalld running: false"
Jun 13 12:30:25 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:25.558884423Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon
Jun 13 12:30:25 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:25.604576628Z" level=info msg="Loading containers: done."
Jun 13 12:30:28 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:28.935787857Z" level=info msg="Daemon has completed initialization"
Jun 13 12:30:28 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:28.935827245Z" level=info msg="Docker daemon" commit=c6d412e graphdriver=overlay2 version=17.03.1-ce
Jun 13 12:30:28 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:30:28.943127184Z" level=info msg="API listen on /var/run/docker.sock"
Jun 13 12:30:28 ip-10-11-5-206 systemd[1]: Started Docker Application Container Engine.
Jun 13 12:46:50 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:46:50.651623340Z" level=error msg="Handler for POST /v1.27/containers/create returned error: No such image: 6466
Jun 13 12:47:25 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:47:25.790633358Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247
Jun 13 12:47:25 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:47:25.795562137Z" level=error msg="Create container failed with error: oci runtime error: container_linux.go:247
Jun 13 12:47:25 ip-10-11-5-206 dockerd[985]: time="2017-06-13T12:47:25.913058721Z" level=error msg="Handler for POST /v1.27/containers/4326ea64258a621a09c05b59828d730dcd50d3853b
Jun 13 13:02:48 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:48.043987085Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247
Jun 13 13:02:48 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:48.050033353Z" level=error msg="Create container failed with error: oci runtime error: container_linux.go:247
Jun 13 13:02:48 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:48.161058818Z" level=error msg="Handler for POST /v1.27/containers/4a7f355ea445c15352924a5a3c5bc50a426d9996a7
Jun 13 13:02:49 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:49.851664678Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247
Jun 13 13:02:49 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:49.857030670Z" level=error msg="Create container failed with error: oci runtime error: container_linux.go:247
Jun 13 13:02:49 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:49.961077047Z" level=error msg="Handler for POST /v1.27/containers/f9072a7f989f87074d8b98a1ebe70c54227615a4f7
Jun 13 13:02:51 ip-10-11-5-206 dockerd[985]: time="2017-06-13T13:02:51.140658532Z" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247

@thaJeztah

Member

commented Jun 13, 2017

The error is emitted from here: https://github.com/moby/moby/blob/v17.03.1-ce/vendor/github.com/vishvananda/netlink/nl/nl_linux.go#L496-L532

I do see that an update to that package was recently merged: 5fd65a6 (#33555), which was brought in upstream through vishvananda/netlink#234.

Not sure if it's related; let me ping @fcrisciani, who may be able to tell.

@scott-wood-vgh

Author

commented Jun 13, 2017

Thanks for the extra info. It seems to me that that network namespace number is quite low - is there a chance it is already reserved by the OS for some other process?

@vincepii

commented Jun 19, 2017

We're hitting the same issue.

@scott-wood-vgh any progress in determining the cause?

@scott-wood-vgh

Author

commented Jun 19, 2017

@vincepii Unfortunately not. I had previously had success re-imaging a new Amazon AMI with Docker, but that may have been a one-off success. What's odd is that sometimes it does succeed in launching a container, but increasingly rarely.

@vincepii

commented Jun 21, 2017

Our issue was actually not related to Docker; it was an external component interfering with Docker's ability to use the network stack properly on Ubuntu. Unfortunately, I don't have more detailed information than this.

@rommik

commented Jul 5, 2017

I just hit the same issue. Suddenly a newly attached worker started to fail with the same error as in the original post. Only one of 16 nodes.

I believe the issue is in the overlay network. All my containers share one network called proxy (I use Docker-flow-proxy). By default, the overlay network's subnet mask was /24, which allows 256 (I think) hosts. With 16 servers, I outgrew that limit a long time ago. I had to extend the network, but I made a mistake and set the new subnet to /20 (1046 hosts). Tonight, I reset the network again, this time to /16 (64k hosts). This fixed my issue.

PS: I just ran a count and I have 593 services in my Docker Swarm. Each service requires at least 2 IPs (one service IP and one for each container in that service), so yeah, I was over the /20 limit today.
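
For reference, the address counts behind these prefix lengths can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the `10.0.0.0` base address and the helper name `address_count` are assumptions chosen for illustration, and only the 593-service count and the two-IPs-per-service minimum come from the comment above. Note that a /20 works out to 4096 addresses, which is larger than the figure quoted above.

```python
import ipaddress


def address_count(prefix_len, base="10.0.0.0"):
    """Total number of addresses in an IPv4 network with the given prefix length."""
    return ipaddress.ip_network(f"{base}/{prefix_len}").num_addresses


for prefix_len in (24, 20, 16):
    print(f"/{prefix_len}: {address_count(prefix_len)} addresses")
# /24: 256 addresses
# /20: 4096 addresses
# /16: 65536 addresses

# Lower bound on IPs needed: one service IP plus at least one container IP each.
services = 593
min_ips = services * 2
print(min_ips)  # 1186
```

The lower bound only counts one container per service, so the real demand grows with the number of replicas.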

@mrchoke

commented Oct 5, 2017

I just hit the same issue.

Ubuntu 16.04.3 LTS (VMware VM)

docker version

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:18 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

When I run a container, it always fails with this error:

docker: Error response from daemon: invalid header field value "oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:334: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: time=\\\\\\\"2017-10-05T18:14:49+07:00\\\\\\\" level=fatal msg=\\\\\\\"failed to create a netlink handle: failed to set into network namespace 20 while creating netlink socket: invalid argument\\\\\\\" \\\\n\\\"\"\n".

My IT support suggested that I stop the ds_agent service.

I ran:

$ sudo service ds_agent stop

and then ran the container again. It works for me now.
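
Before retrying, it may be worth confirming whether the agent is actually running on the host. The sketch below is one hedged way to check on Linux by reading `/proc/<pid>/comm`; it assumes the process name starts with `ds_agent` (the service name mentioned above), which may not hold for every installation. The helper names are hypothetical.

```python
from pathlib import Path

SUSPECT = "ds_agent"  # assumed process name, taken from the service name above


def running_process_names(proc_root="/proc"):
    """Collect process names from /proc/<pid>/comm (Linux only)."""
    names = set()
    for comm in Path(proc_root).glob("[0-9]*/comm"):
        try:
            names.add(comm.read_text().strip())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return names


def is_suspect_running(names, suspect=SUSPECT):
    """True if any process name starts with the suspect agent's name."""
    return any(name.startswith(suspect) for name in names)


if __name__ == "__main__":
    print(is_suspect_running(running_process_names()))
```

If this prints `True` on a host showing the netlink error, stopping the service as described above is a reasonable first test.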

@mrchoke

commented Oct 5, 2017

Here is my screen recording to confirm.

[screen recording: ds_agent]

@taylor2464

commented Nov 2, 2017

I can confirm what @mrchoke said about stopping the ds_agent service. On SUSE 12, Linux 4.4.74-92.35, when running:

sudo docker run hello-world

I got:

container_linux.go:247: starting container process caused "process_linux.go:332: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: time=\"2017-11-02T14:56:57-04:00\" level=info msg=\"SUSE:secrets :: enabled\" \ntime=\"2017-11-02T14:56:57-04:00\" level=fatal msg=\"failed to create a netlink handle: failed to set into network namespace 20 while creating netlink socket: invalid argument\" \n""
docker: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:332: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: time=\"2017-11-02T14:56:57-04:00\" level=info msg=\"SUSE:secrets :: enabled\" \ntime=\"2017-11-02T14:56:57-04:00\" level=fatal msg=\"failed to create a netlink handle: failed to set into network namespace 20 while creating netlink socket: invalid argument\" \n"".
ERRO[0000] error getting events from daemon: net/http: request canceled

Stopping the ds_agent service let it run with no problem.

@thaJeztah

Member

commented Nov 2, 2017

@scott-wood-vgh would the above problem apply to your situation as well? (i.e. do you have ds_agent running?)

@scott-wood-vgh

Author

commented Nov 3, 2017

@thaJeztah Yes! I have not seen this problem in a while now, but now that you mention it, we were running ds_agent on these boxes before we moved to exclude them from Deep Security for other business reasons. I believe these events must be correlated.

@thaJeztah

Member

commented Nov 3, 2017

Thanks for the update @scott-wood-vgh. With that information, this does not appear to be a bug in Docker/Moby; it appears to be caused by virus-scanning software interfering with Docker. If you are running into this, I recommend opening a ticket with Trend Micro.

I'm closing this issue because of the above, but feel free to continue the discussion.

@thaJeztah thaJeztah closed this Nov 3, 2017
