Error response from daemon: rpc error: code = 9 desc = service needs ingress network, but no ingress network is present #33420

Closed
adityacs opened this Issue May 27, 2017 · 17 comments

adityacs commented May 27, 2017

Description

Port forwarding fails on docker service create with "Error response from daemon: rpc error: code = 9 desc = service needs ingress network, but no ingress network is present"

Steps to reproduce the issue:
I am seeing this on a 10-node Docker swarm mode cluster. I am not sure how to reproduce it.

Describe the results you received:

Error response from daemon: rpc error: code = 9 desc = service needs ingress network, but no ingress network is present
[Screenshot from 2017-05-27 19:24:08]

Describe the results you expected:

The service should be created.

Output of docker version:

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Wed May 10 22:45:11 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Wed May 10 22:45:11 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

Containers: 7
 Running: 0
 Paused: 0
 Stopped: 7
Images: 6
Server Version: 17.05.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: l0j7jl95mzzvrc7puwccyhuik
 Is Manager: true
 ClusterID: k1ww6j2k2nagulfphn93doa2v
 Managers: 4
 Nodes: 9
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.18.2.127
 Manager Addresses:
  172.18.2.127:2377
  172.18.2.133:2377
  172.18.2.139:2377
  172.18.2.147:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: v0.13.2 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  Profile: default
 selinux
Kernel Version: 4.11.0-coreos
Operating System: Container Linux by CoreOS 1409.0.0 (Ladybug)
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 7.801GiB
Name: swarmd-1.igloo.vztelematics.in
ID: MPWK:IK7C:MQQJ:LBVI:M7AD:VPTO:6OGV:QTQ6:CZPP:PLCW:LJNJ:FHRR
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 registry.igloo.vztelematics.in:5000
 127.0.0.0/8
Live Restore Enabled: false

coreos 1409.0.0

cpuguy83 May 27, 2017

Contributor

Sounds like you've removed your ingress network.
You can recreate it with docker network create --ingress --driver overlay ingress
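
A minimal sketch of that recovery step, guarded so it only recreates the network when none is flagged as ingress. It assumes the daemon exposes an "Ingress" field through docker network inspect (this depends on the engine version); the function names are hypothetical helpers, not part of Docker.

```shell
# Print the name of every network whose "Ingress" flag is true, one per line.
# Assumption: `docker network inspect` reports an "Ingress" field.
find_ingress_networks() {
  docker network ls --format '{{.Name}}' | while IFS= read -r name; do
    flag=$(docker network inspect --format '{{.Ingress}}' "$name" </dev/null 2>/dev/null)
    if [ "$flag" = "true" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Recreate the ingress network only when no network is flagged as ingress.
ensure_ingress() {
  if [ -n "$(find_ingress_networks)" ]; then
    echo "ingress network present"
  else
    echo "no ingress network found; creating one"
    docker network create --ingress --driver overlay ingress
  fi
}
```

Run ensure_ingress on a manager node. If a network named ingress is listed but not flagged as ingress, the create step is expected to fail with a name conflict, and that network would have to be removed first.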

adityacs May 27, 2017

I have not deleted any network. When I ran "docker network ls", ingress was in the list.

thaJeztah May 29, 2017

Member

Is it possible you ran docker system prune when no services were running?

Could you provide

  • the output of docker network ls
  • the output of docker network inspect ingress

Does it work after you re-create the ingress network (using the steps @cpuguy83 provided)?
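
The two outputs requested above can be gathered in one go. This is only a convenience sketch; the function name and the section headers are made up for illustration.

```shell
# Print the output of both commands with simple section headers.
collect_ingress_diagnostics() {
  echo "== docker network ls =="
  docker network ls
  echo "== docker network inspect ingress =="
  # inspect fails when no network is named "ingress"; note that and carry on
  docker network inspect ingress || echo "(no network named ingress)"
}
```

For example, collect_ingress_diagnostics > ingress-report.txt 2>&1 writes a file that can be attached to the issue.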

adityacs May 29, 2017

After hitting the above issue, I ran docker network prune, which removed the ingress network.
Later, I recreated the ingress network and it worked fine.

thaJeztah May 29, 2017

Member

OK, I suspect this installation was upgraded from an older version of Docker. Older versions marked "ingress" networks in a different way, and a bug in 17.05 prevented Docker from recognising such networks. As a result, docker network prune (or docker system prune) removed the ingress network if no container or service was attached to it.

That situation will be resolved in Docker 17.06 (see #33286). If I'm correct, an ingress network created with Docker 17.05 will not be removed by docker network prune, so if you run docker network prune again, the network should not be removed.

I'll go ahead and close this issue, because this will be addressed in docker 17.06, but feel free to continue the conversation 👍
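
One way to check whether a given engine is affected by the prune behaviour described above is to compare the network list before and after pruning. The sketch below assumes the ingress network carries the default name "ingress"; the helper names are hypothetical. Note that it really does prune unused networks.

```shell
# Succeed if a network named "ingress" is currently listed.
has_ingress_name() {
  docker network ls --format '{{.Name}}' | grep -qx 'ingress'
}

# Run `docker network prune` and report whether "ingress" disappeared.
prune_and_check() {
  if has_ingress_name; then ingress_before=yes; else ingress_before=no; fi
  docker network prune --force
  if [ "$ingress_before" = yes ] && ! has_ingress_name; then
    echo "WARNING: prune removed the ingress network" >&2
    return 1
  fi
  echo "ingress state unchanged"
}
```

A non-zero exit status from prune_and_check means the network was removed and needs to be recreated.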

@thaJeztah thaJeztah closed this May 29, 2017

adityacs May 29, 2017

@thaJeztah Will try network prune.

I am still confused about which version of Docker to use. In any of the Docker versions after 1.12, something will break.

Could you please let me know which version is the most stable for production use with Docker swarm mode?

thaJeztah May 30, 2017

Member

> In any of the Docker versions after 1.12, something will break.

Can you provide information on what breaks?

> Could you please let me know which is the most stable version for use in production for docker swarm mode?

The 17.03.x and 17.06.x (to be released soon) releases are "quarterly" releases, which have a 4-month support cycle for critical bug fixes (community edition) and are supported for 1 year (enterprise edition).

adityacs May 30, 2017

> Can you provide information on what breaks?

  • In 1.13, I faced issues with SELinux (#31255).
  • In 17.05-ce, I faced the above issue (#33420).

Will try out 17.06. Hope we can use it in prod :)

Thanks for the help.

thaJeztah May 31, 2017

Member

#31255 actually is an issue with docker 1.12.4. Note that Docker 17.03.0 is actually what 1.13.2 would have been, so no functional changes from 1.13.0, only bug fixes. For production, I'd recommend installing the "quarterly" releases, unless upgrading once a month is no problem for your situation. For longer support cycles, Docker EE provides 1 year of support (bug fixes and security fixes), so may be worth considering for production

adityacs May 31, 2017

> For longer support cycles, Docker EE provides 1 year of support (bug fixes and security fixes), so may be worth considering for production

Will look into this.

cabloo Jun 17, 2017

@thaJeztah I can confirm your suspicion here; I just ran into the same issue in my dev environment. I had installed a swarm in the past, upgraded Docker, and ran docker system prune while the swarm was offline. When I brought the swarm back up, I got this error. Running docker network create --ingress --driver overlay ingress resolved the issue.

yunghoy Aug 1, 2017

The method @cabloo suggested resolves the problem. The issue still occurs in 17.06.0-ce.

johnjelinek Feb 7, 2018

@thaJeztah: I'm having this issue upgrading our swarm from 17.09 to 17.12. All the nodes are now 17.12, but I can't seem to rebuild the ingress network.

$ docker service create --name=swarm-visualizer \
  --publish=8080:8080/tcp \
  --constraint=node.role==manager \
  --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
  dockersamples/visualizer

Error response from daemon: rpc error: code = FailedPrecondition desc = service needs ingress network, but no ingress network is present
$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
68bb9e73c8db        bridge              bridge              local
315e6882961f        docker_gwbridge     bridge              local
d972764776bb        host                host                local
2aittxoqcj4c        ingress             overlay             swarm
64614b19d7aa        none                null                local
js7axcrq1ddj        proxy               overlay             swarm
whthoj3zcugc        swarmpit_net        overlay             swarm
$ docker network rm ingress
WARNING! Before removing the routing-mesh network, make sure all the nodes in your swarm run the same docker engine version. Otherwise, removal may not be effective and functionality of newly create ingress networks will be impaired.
Are you sure you want to continue? [y/N] y
ingress
$ docker network create --ingress --driver overlay ingress
Error response from daemon: network with name ingress already exist

So, it doesn't really remove the ingress network.
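
The warning printed by docker network rm asks that all nodes run the same engine version. A rough way to verify that from a manager node is sketched below; it assumes docker node inspect exposes .Description.Hostname and .Description.Engine.EngineVersion, and the function names are illustrative only.

```shell
# Print "<hostname> <engine version>" for every node (run on a manager).
node_engine_versions() {
  docker node ls -q | while IFS= read -r id; do
    docker node inspect \
      --format '{{.Description.Hostname}} {{.Description.Engine.EngineVersion}}' \
      "$id" </dev/null
  done
}

# Succeed only when every node reports the same engine version.
same_engine_version() {
  node_engine_versions | awk '
    { seen[$2] = 1 }
    END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}
```

For example: same_engine_version || node_engine_versions prints the per-node versions only when they differ.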

thaJeztah Feb 7, 2018

Member

@johnjelinek could it be there were still tasks lingering around that were using the ingress network? Anything in the logs that could be helpful?

johnjelinek Feb 7, 2018

Nah. Previously, when I ran docker network rm ingress, it would complain about a service that depended on it (like swarm-visualizer). So I removed all those services first. Now, when I run docker network rm ingress, the network won't go away, and yet I can't bring swarm-visualizer back.
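
To see which services still depend on the routing mesh before removing the ingress network, something like the following may help. It is a sketch that assumes docker service inspect reports published ports under .Endpoint.Ports with a "PublishMode" field; the function name is hypothetical.

```shell
# Print the name of each service that publishes a port in "ingress" mode.
services_using_ingress() {
  docker service ls -q | while IFS= read -r id; do
    line=$(docker service inspect \
      --format '{{.Spec.Name}} {{json .Endpoint.Ports}}' "$id" </dev/null)
    case "$line" in
      # the service name is everything before the first space
      *'"PublishMode":"ingress"'*) printf '%s\n' "${line%% *}" ;;
    esac
  done
}
```

An empty result suggests that no service is publishing ports through the ingress network. Lingering tasks are not covered by this check.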

johnjelinek Feb 7, 2018

I saw these errors in the logs; could they be related?

Feb 07 18:54:22 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:22.661441266Z" level=error msg="Failed to deserialize netlink ndmsg: Link not found"
Feb 07 18:54:23 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:23.908867756Z" level=info msg="memberlist: Suspect 2a3ee67b8d68 has failed, no acks received"
Feb 07 18:54:48 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:48.005063849Z" level=info msg="NetworkDB stats dfw-dev-swm-m-1(8cafedcba7b1) - netID:2aittxoqcj4cne658awt4q596 leaving:false netPeers:2 entries:4 Queue qLen:0 n
Feb 07 18:54:48 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:48.005700612Z" level=info msg="NetworkDB stats dfw-dev-swm-m-1(8cafedcba7b1) - netID:whthoj3zcugc5qq9trcueqz8y leaving:false netPeers:5 entries:12 Queue qLen:0
Feb 07 18:54:50 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:50.646988639Z" level=info msg="Node 7681dc520428/172.20.96.132, joined gossip cluster"
Feb 07 18:54:50 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:50.647536681Z" level=info msg="Node 7681dc520428/172.20.96.132, is the new incarnation of the failed node 2a3ee67b8d68/172.20.96.132"
Feb 07 18:54:50 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:50.647830092Z" level=info msg="Node 2a3ee67b8d68 change state NodeFailed --> NodeLeft"
Feb 07 18:54:50 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:54:50.648100530Z" level=info msg="Node 7681dc520428/172.20.96.132, added to nodes list"
Feb 07 18:55:00 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:55:00.148770834Z" level=error msg="Failed to deserialize netlink ndmsg: Link not found"
Feb 07 18:55:00 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:55:00.149299708Z" level=error msg="Failed to deserialize netlink ndmsg: Link not found"
Feb 07 18:55:00 dfw-dev-swm-m-1 dockerd[14070]: time="2018-02-07T18:55:00.149556987Z" level=error msg="Failed to deserialize netlink ndmsg: Link not found"
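
When scanning daemon logs like the ones above, it can help to collapse the error-level lines into counts per message. A small sketch, assuming the level=... msg="..." line format shown here; the function name is made up.

```shell
# Read dockerd log lines on stdin and count each distinct level=error message.
summarize_errors() {
  grep 'level=error' \
    | sed -n 's/.*msg="\(.*\)".*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

On a systemd host the input could come from journalctl -u docker.service | summarize_errors, where the unit name docker.service is an assumption about the installation.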

johnjelinek Feb 7, 2018

meh, ended up resorting to blowing away the swarm and starting from scratch 😖 ingress network is back on all nodes now.
