New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conntrack flush support #32505

Merged
merged 3 commits into from Apr 11, 2017

Conversation

Projects
None yet
7 participants
@fcrisciani
Contributor

fcrisciani commented Apr 11, 2017

- What I did
Fixes #8795
Fixes #31610
Fixes #31414

On external connectivity removal, the bridge driver will take care of erasing all the conntrack flows relative to the endpoint that is going down

- How I did it
Introduced the conntrack support in the netlink library and then from libnetwork will cleanup the flows passing the IP address of the endpoint that is going down

- How to verify it
Added a test that creates a server container and a client container and establishes an UDP flow (using UDP for simplicity) between them.
The test verifies the presence of the flow fetching its from the conntrack table.
The server container is then destroyed and the tests validates that the previous flow got purged from conntrack.

- Description for the changelog

Conntrack flush on external connectivity removal

- A picture of a cute animal (not mandatory but encouraged)
15

fcrisciani added some commits Apr 11, 2017

Vendoring Netlink library
- Added conntrack support

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Vendoring Libnetwork library
- adding conntrack flush fix for #8795

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

@mavenugo mavenugo added this to the 17.05.0 milestone Apr 11, 2017

@mavenugo

This comment has been minimized.

Show comment
Hide comment
@mavenugo

mavenugo Apr 11, 2017

Contributor

@fcrisciani thanks for resolving this long standing issue and a good test to catch it as well.

LGTM

Contributor

mavenugo commented Apr 11, 2017

@fcrisciani thanks for resolving this long standing issue and a good test to catch it as well.

LGTM

@vieux

This comment has been minimized.

Show comment
Hide comment
@vieux

vieux Apr 11, 2017

Collaborator

LGTM

Collaborator

vieux commented Apr 11, 2017

LGTM

@mavenugo

This comment has been minimized.

Show comment
Hide comment
@mavenugo

mavenugo Apr 11, 2017

Contributor

@fcrisciani the windowsRS1 failure seems genuine -

02:32:17 # github.com/docker/docker/vendor/github.com/vishvananda/netns
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:27: undefined: syscall.Stat_t
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:28: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:31: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:39: undefined: syscall.Stat_t
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:43: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:57: cannot use int(*ns) (type int) as type syscall.Handle in argument to syscall.Close
02:32:17 FAIL	github.com/docker/docker/integration-cli [build failed]
Contributor

mavenugo commented Apr 11, 2017

@fcrisciani the windowsRS1 failure seems genuine -

02:32:17 # github.com/docker/docker/vendor/github.com/vishvananda/netns
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:27: undefined: syscall.Stat_t
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:28: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:31: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:39: undefined: syscall.Stat_t
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:43: undefined: syscall.Fstat
02:32:17 ..\vendor\github.com\vishvananda\netns\netns.go:57: cannot use int(*ns) (type int) as type syscall.Handle in argument to syscall.Close
02:32:17 FAIL	github.com/docker/docker/integration-cli [build failed]
Adding test for #8795
When a container was being destroyed was possible to have
flows in conntrack left behind on the host.
If a flow is present into the conntrack table, the packet
processing will skip the POSTROUTING table of iptables and
will use the information in conntrack to do the translation.
For this reason is possible that long lived flows created
towards a container that is destroyed, will actually affect
new flows incoming to the host, creating erroneous conditions
where traffic cannot reach new containers.
The fix takes care of cleaning them up when a container is
destroyed.

The test of this commit is actually reproducing the condition
where an UDP flow is established towards a container that is then
destroyed. The test verifies that the flow established is gone
after the container is destroyed.

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Apr 11, 2017

Member

ping @mavenugo @vieux all green now; ready to merge?

Member

thaJeztah commented Apr 11, 2017

ping @mavenugo @vieux all green now; ready to merge?

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Apr 11, 2017

Member

adding impact/changelog, because #8795 has been a long standing issue, so worth a mention in the changelog

Member

thaJeztah commented Apr 11, 2017

adding impact/changelog, because #8795 has been a long standing issue, so worth a mention in the changelog

@thaJeztah thaJeztah referenced this pull request Apr 11, 2017

Merged

17.05.0 rc1 changelog #32498

@mavenugo

This comment has been minimized.

Show comment
Hide comment
@mavenugo
Contributor

mavenugo commented Apr 11, 2017

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Apr 11, 2017

Member

Thanks! I'll go ahead and merge 👍

Member

thaJeztah commented Apr 11, 2017

Thanks! I'll go ahead and merge 👍

@thaJeztah thaJeztah merged commit f30e94a into moby:master Apr 11, 2017

7 checks passed

dco-signed All commits are signed
experimental Jenkins build Docker-PRs-experimental 32819 has succeeded
Details
janky Jenkins build Docker-PRs 41428 has succeeded
Details
powerpc Jenkins build Docker-PRs-powerpc 1618 has succeeded
Details
vendor Jenkins build Docker-PRs-vendor 3147 has succeeded
Details
windowsRS1 Jenkins build Docker-PRs-WoW-RS1 12566 has succeeded
Details
z Jenkins build Docker-PRs-s390x 1452 has succeeded
Details

@fcrisciani fcrisciani deleted the fcrisciani:conntrack_test branch Apr 11, 2017

// Launch the server, this will remain listening on an exposed port and reply to any request in a ping/pong fashion
cmd := "while true; do echo hello | nc -w 1 -lu 8080; done"
_, _, err := dockerCmdWithError("run", "-d", "--name", "server", "--net", "testbind", "-p", "8080:8080/udp", "appropriate/nc", "sh", "-c", cmd)

This comment has been minimized.

@seemethere

seemethere Aug 31, 2017

Contributor

appropriate/nc?

tisk tisk

@seemethere

seemethere Aug 31, 2017

Contributor

appropriate/nc?

tisk tisk

This comment has been minimized.

@fcrisciani

fcrisciani Aug 31, 2017

Contributor

?

@fcrisciani

fcrisciani Aug 31, 2017

Contributor

?

This comment has been minimized.

@vieux

vieux Aug 31, 2017

Collaborator

@fcrisciani I think @seemethere was refering to the use of a 3rd party image, and it that case it was not updated in 2 years. could you replace it with a simple alpine for the official library ? it includes nc

@vieux

vieux Aug 31, 2017

Collaborator

@fcrisciani I think @seemethere was refering to the use of a 3rd party image, and it that case it was not updated in 2 years. could you replace it with a simple alpine for the official library ? it includes nc

This comment has been minimized.

@fcrisciani

fcrisciani Aug 31, 2017

Contributor

@vieux @seemethere Using that image is not accidental but is intended. The nc version in alpine does not support for example the option -q. There is a bug in nc that let the command never return also when specify the -w option.
Feel free to try to change it to the alpine removing the -q option to let the command work, but most likely the test will fail timing out because the nc command will hang forever.

@fcrisciani

fcrisciani Aug 31, 2017

Contributor

@vieux @seemethere Using that image is not accidental but is intended. The nc version in alpine does not support for example the option -q. There is a bug in nc that let the command never return also when specify the -w option.
Feel free to try to change it to the alpine removing the -q option to let the command work, but most likely the test will fail timing out because the nc command will hang forever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment