Hook rkt kubelet runtime up to network plugins #25062
Conversation
Force-pushed from 3db3e9d to 125724d
@yifan-gu @euank @bprashanth here's the direction I think the whole rkt/CNI/kubenet ball of wax should go... let me know what you think!

On Fedora 23 with F24's rkt RPM (1.3.0+gitdd7aa64), this runs pods, configures the network, and cleans up when the pod terminates for me. If others could test too, preferably with whatever rkt development version rktnetes will get based on, that would be awesome...
```diff
 }

+	// Run pod in the namespace we just created and set up
+	runPrepared = fmt.Sprintf("%s netns exec %s %s", r.ipPath, uuid, runPrepared)
```
I found that `ip netns exec` breaks on some systems (e.g. my development laptop, running a fairly modern gentoo install):

```console
$ sudo ip netns add foo
$ sudo ip netns exec foo rkt run --net=host docker://busybox --exec=ls -- /sys/fs/cgroup
image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.4.0+gitb7589f9
image: using image from local store for url docker://busybox
stage1: error calling sd_pid_get_owner_uid: exec format error
$ sudo ip netns exec foo ls /sys/fs/cgroup
# nothing
$ ip -V
ip utility, iproute2-ss160111
```

`nsenter` didn't have that issue for me, and `nsenter` is already depended on for `ExecInContainer`.

I can't reproduce this issue on CoreOS (`ip -V` → `iproute2-ss150210`).
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=882047
That's the reason I used `nsenter` for execing rather than `ip`. I don't know the state of this bug on other distros, but if you're confident it doesn't affect any real users (only the fake users running gentoo :), or if it's a kernel misconfiguration on my end or such, I'm happy to be persuaded. Otherwise, `nsenter` seems like the safer way to be sure this isn't an issue.
@euank hmm, interesting. For core tools like that, I tend to think that if the tool is broken, that's not really Kubernetes' problem; the tool should get fixed, and will get fixed. Of course if it impacts a wide audience, then we should work around the problem until it's fixed. Otherwise we never coalesce on common tools because we've got tons of workarounds for older versions :(

@euank not opposed to using nsenter, just wondering what versions of iproute2 had that bug, and how widely that version is deployed, before falling back to nsenter.
```diff
@@ -1320,6 +1393,13 @@ func (r *Runtime) KillPod(pod *api.Pod, runningPod kubecontainer.Pod) error {
 		return err
 	}

+	if !kubecontainer.IsHostNetworkPod(pod) {
+		if err := r.cleanupPodNetwork(&runningPod, true); err != nil {
```
@dcbw This takes care of the network cleanup when we explicitly kill/stop the pods. But what if the pod is in a crash loop and is deleted by garbage collection?

Actually, I don't find any places in dockertools that take care of the network when the pod is garbage collected today; did I miss something?
@dcbw I guess the docker pod works fine today because the `pause` container hardly dies, so in most cases the only time it dies is when it's been explicitly killed, which invokes the `networkplugin.TearDown()` accordingly.
@euank looks like the iproute2 patch that fixes the issue is 144e6ce1679a768e987230efb4afa402a5ab58ac, and that's been part of 3.8 and later, which was released on 2013-02-21, i.e. a really long time ago....

@yifan-gu I was wondering about the GC code, yeah. Are there other places that should clean up the netns besides the GC code?

It's possible I'm encountering a separate bug actually since I'm on 4.4.0... I'll look into it a bit more.

@euank sure; if you can figure out what's going on there that would be great. But it's looking like we should probably use nsenter as you suggest, at least for now. I'll make that change tomorrow and repush the PR.

I don't think so. cc @kubernetes/sig-node @yujuhong @Random-Liu @dchen1107 #25062 (comment)
```diff
@@ -94,6 +94,8 @@ type Runtime interface {
 	RemoveImage(image ImageSpec) error
 	// Returns Image statistics.
 	ImageStats() (*ImageStats, error)
+	// Returns the filesystem path of the pod's network namespace
+	GetNetNS(containerID ContainerID) (string, error)
```
Is it essential to add `GetNetNS()` to the `Runtime` interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside a network namespace.
> Is it essential for adding GetNetNS() to Runtime interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside network namespace.
@feiskyer Good point, though I'd like to reduce the usage of casting in the network plugins and potentially make them usable for more runtimes. What if we specified in the Runtime interface that a Runtime was not required to return a valid netns path if that runtime does not do namespace creation? Network plugins would then get an error and exit, indicating this plugin was not compatible with the runtime.
I think it's cleaner to have a `NetworkPluggableRuntime` (name?) interface which just has `GetNetNS`, since that's the only method needed; then docker & rkt can implement it, and you can cast to that rather than to a concrete implementation.

If some runtimes will stub out the method, it's not really part of a container runtime.
I don't really see the difference right now between `GetNetNS()` and a `NetworkPluggableRuntime`. If we add more methods that runtimes won't implement, then sure, maybe we re-evaluate.

Also note that the only reason there is a `GetRuntime()` is for `GetNetNS()`; CNI uses it for `Status()` too, but I think that's bogus and it should follow the same model as kubenet does — that's another PR, though. I think we can probably get rid of `GetRuntime()` in favor of some other interface that the runtime actually implements. Even better, since the runtimes themselves are calling the functions, we should just pass an interface into them (or nil) instead of having a roundabout `GetRuntime()`.

In summary, I'd like to go with `GetNetNS()` for now, and rework `GetRuntime()` in a further PR, if that's OK? With the understanding that yes, not all runtimes support returning a netns path now or in the future.
The difference is indeed just conceptual/academic right now, not practical, so I'm fine with this for now with the hope of an even better solution later. Thanks for the explanation as well.
```diff
@@ -1424,6 +1504,20 @@ func (r *Runtime) SyncPod(pod *api.Pod, podStatus api.PodStatus, internalPodStat
 		return
 	}

+func netnsPathFromUuid(uuid string) string {
+	return fmt.Sprintf("/var/run/netns/%s", uuid)
```
It might be nice to prepend `k8-rkt-` or something so a user can easily identify namespaces we created. Fine with it as is as well.
LGTM. Thank you very much @dcbw !

It's not clear to me if those three failures are caused by this or by flakes, but since one of them could relate to the ip in podStatus (

Hmm, all 3 instances look like the test failed trying to contact the apiserver:

As a note. The

For reference, two of the commits here are probably what's needed; I'm testing to make sure that didn't break anything. If possible, I want to just get this PR's changes in so we can continue iterating, but if it's better for the
LGTM, pending on tests.

An apiserver failure seems unrelated to this, but I'm not sure enough it's a flake to file a flake issue yet. I'll file an issue if the below retest passes, proving it a flake. ... And the above is a catch-22 for retesting 😄 @k8s-bot test this issue: #IGNORE (failure link in case it needs to be preserved in the case of it being a flake)

@euank the pass-pod-IP commit looks OK to me, my mistake to have missed that originally...

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

GCE e2e build/test passed for commit 552b648.

Automatic merge from submit-queue
Automatic merge from submit-queue rkt: Pass through podIP This is needed for the /etc/hosts mount and the downward API to work. Furthermore, this is required for the reported `PodStatus` to be correct. The `Status` bit mostly worked prior to #25062, and this restores that functionality in addition to the new functionality. In retrospect, the regression in status is large enough the prior PR should have included at least some of this; my bad for not realizing the full implications there. #25902 is needed for downwards api stuff, but either merge order is fine as neither will break badly by itself. cc @yifan-gu @dcbw