Hook rkt kubelet runtime up to network plugins #25062
Conversation
Force-pushed from 3db3e9d to 125724d
@yifan-gu @euank @bprashanth here's the direction I think the whole rkt/CNI/kubenet ball of wax should go... let me know what you think!

On Fedora 23 with F24's rkt RPM (1.3.0+gitdd7aa64), this runs pods, configures the network, and cleans up when the pod terminates for me. If others could test too, preferably with whatever rkt development version rktnetes will get based on, that would be awesome...
```diff
 }

+	// Run pod in the namespace we just created and set up
+	runPrepared = fmt.Sprintf("%s netns exec %s %s", r.ipPath, uuid, runPrepared)
```
I found that `ip netns exec` breaks on some systems (e.g. my development laptop, running a fairly modern gentoo install):

```console
$ sudo ip netns add foo
$ sudo ip netns exec foo rkt run --net=host docker://busybox --exec=ls -- /sys/fs/cgroup
image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.4.0+gitb7589f9
image: using image from local store for url docker://busybox
stage1: error calling sd_pid_get_owner_uid: exec format error
$ sudo ip netns exec foo ls /sys/fs/cgroup
# nothing
$ ip -V
ip utility, iproute2-ss160111
```

`nsenter` didn't have that issue for me, and `nsenter` is already depended on for `ExecInContainer`.

I can't reproduce this issue on CoreOS (`ip -V` → `iproute2-ss150210`).
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=882047
That's the reason I used `nsenter` for execing rather than `ip`. I don't know the state of this bug on other distros, but if you're confident it doesn't affect any real users (only the fake users running gentoo :), or if it's a kernel misconfiguration on my end or such, I'm happy to be persuaded. Otherwise, `nsenter` seems like the safer way to be sure this isn't an issue.
@euank hmm, interesting. For core tools like that, I tend to think that if the tool is broken, that's not really Kubernetes' problem; the tool should get fixed, and will get fixed. Of course if it impacts a wide audience, then we should work around the problem until it's fixed. Otherwise we never coalesce on common tools because we've got tons of workarounds for older versions :(

@euank not opposed to using nsenter, just wondering what versions of iproute2 had that bug, and how widely that version is deployed, before falling back to nsenter.
```diff
@@ -1320,6 +1393,13 @@ func (r *Runtime) KillPod(pod *api.Pod, runningPod kubecontainer.Pod) error {
 		return err
 	}

+	if !kubecontainer.IsHostNetworkPod(pod) {
+		if err := r.cleanupPodNetwork(&runningPod, true); err != nil {
```
@dcbw This takes care of the network cleanup when we explicitly kill/stop the pods. But what if the pod is in a crash loop and is deleted by garbage collection?

Actually, I don't find any places in dockertools that take care of the network when the pod is garbage collected today; did I miss something?
@dcbw I guess the docker pod works fine today because the `pause` container hardly dies, so in most cases the only time it dies is when it's been explicitly killed, which invokes the `networkplugin.TearDown()` accordingly.
@euank looks like the iproute2 patch that fixes the issue is 144e6ce1679a768e987230efb4afa402a5ab58ac, and that's been part of 3.8 and later, which was released on 2013-02-21, i.e. a really long time ago....

@yifan-gu I was wondering about the GC code, yeah. Are there other places that should clean up the netns besides the GC code?

It's possible I'm encountering a separate bug actually since I'm on 4.4.0... I'll look into it a bit more.

@euank sure; if you can figure out what's going on there that would be great. But it's looking like we should probably use nsenter as you suggest, at least for now. I'll make that change tomorrow and repush the PR.

I don't think so. cc @kubernetes/sig-node @yujuhong @Random-Liu @dchen1107 #25062 (comment)
```diff
@@ -94,6 +94,8 @@ type Runtime interface {
 	RemoveImage(image ImageSpec) error
 	// Returns Image statistics.
 	ImageStats() (*ImageStats, error)
+	// Returns the filesystem path of the pod's network namespace
+	GetNetNS(containerID ContainerID) (string, error)
```
Is it essential to add `GetNetNS()` to the `Runtime` interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside a network namespace.
> Is it essential for adding GetNetNS() to Runtime interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside network namespace.
@feiskyer Good point, though I'd like to reduce the usage of casting in the network plugins and potentially make them usable for more runtimes. What if we specified in the Runtime interface that a Runtime was not required to return a valid netns path if that runtime does not do namespace creation? Network plugins would then get an error and exit, indicating this plugin was not compatible with the runtime.
I think it's cleaner to have a `NetworkPluggableRuntime` (name?) interface which just has `GetNetNS`, since that's the only method needed; then docker & rkt can implement it, and you can cast to that rather than to a concrete implementation.

If some runtimes will stub out the method, it's not really part of a container runtime.
I don't really see the difference right now between `GetNetNS()` and a `NetworkPluggableRuntime`. If we add more methods that runtimes won't implement, then sure, maybe we re-evaluate.

Also note that the only reason there is a `GetRuntime()` is for `GetNetNS()`; CNI uses it for `Status()` too, but I think that's bogus and it should follow the same model as kubenet does — that's another PR, though. I think we can probably get rid of `GetRuntime()` in favor of some other interface that the runtime actually implements. Even better, since the runtimes themselves are calling the functions, we should just pass an interface into them (or nil) instead of having a roundabout `GetRuntime()`.

In summary, I'd like to go with `GetNetNS()` for now, and rework `GetRuntime()` in a further PR, if that's OK? With the understanding that yes, not all runtimes support returning a netns path now or in the future.
The difference is indeed just conceptual/academic right now, not practical, so I'm fine with this for now with the hope of an even better solution later. Thanks for the explanation as well.
```diff
@@ -1424,6 +1504,20 @@ func (r *Runtime) SyncPod(pod *api.Pod, podStatus api.PodStatus, internalPodStat
 		return
 	}

+func netnsPathFromUuid(uuid string) string {
+	return fmt.Sprintf("/var/run/netns/%s", uuid)
```
It might be nice to prepend `k8-rkt-` or something so a user can easily identify namespaces we created. Fine with it as is as well.
LGTM. Thank you very much @dcbw !

It's not clear to me if those three failures are caused by this or by flakes, but since one of them could relate to the ip in podStatus (

Hmm, all 3 instances look like the test failed trying to contact the apiserver:

As a note. The

For reference, two of the commits here are probably what's needed; I'm testing to make sure that didn't break anything. If possible, I want to just get this PR's changes in so we can continue iterating, but if it's better for the
LGTM, pending on tests.

An apiserver failure seems unrelated to this, but I'm not sure enough it's a flake to file a flake issue yet. I'll file an issue if the below retest passes, proving it a flake. ... And the above is a catch-22 for retesting 😄 @k8s-bot test this issue: #IGNORE (failure link in case it needs to be preserved in the case of it being a flake)

@euank the pass-pod-IP commit looks OK to me, my mistake to have missed that originally...

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

GCE e2e build/test passed for commit 552b648.

Automatic merge from submit-queue
Automatic merge from submit-queue rkt: Pass through podIP This is needed for the /etc/hosts mount and the downward API to work. Furthermore, this is required for the reported `PodStatus` to be correct. The `Status` bit mostly worked prior to #25062, and this restores that functionality in addition to the new functionality. In retrospect, the regression in status is large enough the prior PR should have included at least some of this; my bad for not realizing the full implications there. #25902 is needed for downwards api stuff, but either merge order is fine as neither will break badly by itself. cc @yifan-gu @dcbw