New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

networking: ensure the netns directory is mounted #3761

Merged
merged 1 commit into from Aug 30, 2017

Conversation

Projects
None yet
3 participants
@JulienBalestra
Contributor

JulienBalestra commented Aug 1, 2017

Context

Coming from Kubernetes issue 48427.
Initially I was thinking of doing it in the kubelet rkt code but as @squeed suggested, it's simpler and less hacky to do it in rkt codebase.

TL;DR:

This PR ensure the /run/netns directory is always mounted once as tmpfs.
The ip netns add <some-id> command has this behavior by default. The kubelet calls this command for every new hostNetwork: false pods.

A mount of /run/netns over any existing netns files like /run/netns/cni-30b37e42-157e-9796-dd74-ba4a3689bbde cause issues with them.

Warning

Upgrading rkt with this fix needs a full drain of any running rkt-netns containers on the node.

@squeed

This comment has been minimized.

Show comment
Hide comment
@squeed

squeed Aug 1, 2017

Contributor

I did some exploration, and sadly it's a bug in iproute2, not rkt. I'll outline the bug, the fix, and how we might be able to work around it.

iproute2 doesn't actually do a mount -t tmpfs .... Instead, it wants to make /run/netns a mountpoint with shared propagation so that an instance of iproute2 inside a container can manage the host. This is a good idea. The basic sequence of actions it does are:

  1. mkdir -p /run/netns
  2. mount --bind /run/netns /run/netns
  3. mount --make-rshared /run/netns

However, there is a bug in step 2. The bind mount keeps any already-existing files in /run/netns, but doesn't actually copy any mounts in /run/netns. Since network namespaces are recorded as bind mounts, this copies them as files but doesn't preserve the actual ns. The fix is for iproute2 to do mount --rbind

iproute2 skips step 2 if /run/netns is already a mountpoint. So, rkt should mirror the logic in iproute, but with the bind mount an rbind.

Make sense?

Contributor

squeed commented Aug 1, 2017

I did some exploration, and sadly it's a bug in iproute2, not rkt. I'll outline the bug, the fix, and how we might be able to work around it.

iproute2 doesn't actually do a mount -t tmpfs .... Instead, it wants to make /run/netns a mountpoint with shared propagation so that an instance of iproute2 inside a container can manage the host. This is a good idea. The basic sequence of actions it does are:

  1. mkdir -p /run/netns
  2. mount --bind /run/netns /run/netns
  3. mount --make-rshared /run/netns

However, there is a bug in step 2. The bind mount keeps any already-existing files in /run/netns, but doesn't actually copy any mounts in /run/netns. Since network namespaces are recorded as bind mounts, this copies them as files but doesn't preserve the actual ns. The fix is for iproute2 to do mount --rbind

iproute2 skips step 2 if /run/netns is already a mountpoint. So, rkt should mirror the logic in iproute, but with the bind mount an rbind.

Make sense?

@JulienBalestra

This comment has been minimized.

Show comment
Hide comment
@JulienBalestra

JulienBalestra Aug 2, 2017

Contributor

I made the changes suggested to "mirror the logic in iproute" + your suggested fix.

I improved the error message returned by syscall.Mount.
In case of failure, it's too short to be clear in the logs.

Example errno EINVAL:
stage1: networking: failed to setup network: invalid argument

Contributor

JulienBalestra commented Aug 2, 2017

I made the changes suggested to "mirror the logic in iproute" + your suggested fix.

I improved the error message returned by syscall.Mount.
In case of failure, it's too short to be clear in the logs.

Example errno EINVAL:
stage1: networking: failed to setup network: invalid argument

@lucab lucab requested a review from squeed Aug 3, 2017

@lucab

This comment has been minimized.

Show comment
Hide comment
@lucab

lucab Aug 3, 2017

Member

Everything up to here looks fine to me. I've milestoned this for the next release, but I'd like to wait and see whether we get some feedback on the iproute2 patch.

Member

lucab commented Aug 3, 2017

Everything up to here looks fine to me. I've milestoned this for the next release, but I'd like to wait and see whether we get some feedback on the iproute2 patch.

@lucab lucab added this to the 1.29.0 milestone Aug 3, 2017

Show outdated Hide outdated networking/podenv.go Outdated
Show outdated Hide outdated networking/networking.go Outdated
@JulienBalestra

This comment has been minimized.

Show comment
Hide comment
@JulienBalestra

JulienBalestra Aug 4, 2017

Contributor

@lucab I removed the recursive call and moved the mkdir in the method.
Do we have a feedback for the iproute patch ?
While I am here, do you know how long it could takes for that merged patch to be propagated into CoreOS container linux stable channel ?

Contributor

JulienBalestra commented Aug 4, 2017

@lucab I removed the recursive call and moved the mkdir in the method.
Do we have a feedback for the iproute patch ?
While I am here, do you know how long it could takes for that merged patch to be propagated into CoreOS container linux stable channel ?

Show outdated Hide outdated networking/podenv.go Outdated
Show outdated Hide outdated networking/podenv.go Outdated
Show outdated Hide outdated networking/podenv.go Outdated

@lucab lucab added the needs/review label Aug 29, 2017

@lucab

lucab approved these changes Aug 30, 2017

LGTM

@lucab lucab removed the needs/review label Aug 30, 2017

@squeed

This comment has been minimized.

Show comment
Hide comment
@squeed

squeed Aug 30, 2017

Contributor

Tested myself, works great!

Contributor

squeed commented Aug 30, 2017

Tested myself, works great!

@squeed squeed merged commit d3a23a2 into rkt:master Aug 30, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
semaphoreci The build passed on Semaphore.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment