Hook rkt kubelet runtime up to network plugins #25062

Merged
merged 6 commits into from May 21, 2016

dcbw
Member

@dcbw dcbw commented May 3, 2016

No description provided.

@dcbw dcbw force-pushed the kubenet-rkt branch 2 times, most recently from 3db3e9d to 125724d Compare May 3, 2016 00:57
@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 3, 2016
@dcbw
Member Author

dcbw commented May 3, 2016

@yifan-gu @euank @bprashanth here's the direction I think the whole rkt/CNI/kubenet ball of wax should go... let me know what you think!

@dcbw
Member Author

dcbw commented May 3, 2016

CONTAINER_RUNTIME=rkt RKT_PATH=/usr/bin/rkt RKT_STAGE1_IMAGE=/usr/libexec/rkt/stage1-host.aci LOG_LEVEL=5 NET_PLUGIN=kubenet hack/local-up-cluster.sh -o _output/local/bin/linux/amd64/

on Fedora 23 with F24's rkt RPM (1.3.0+gitdd7aa64), this runs pods, configures the network, and cleans up when the pod terminates for me. If others could test too, preferably with whatever rkt development version rktnetes will be based on, that would be awesome...

}

// Run pod in the namespace we just created and set up
runPrepared = fmt.Sprintf("%s netns exec %s %s", r.ipPath, uuid, runPrepared)
Contributor

I found that ip netns exec breaks on some systems (e.g. my development laptop, running a fairly modern gentoo install):

$ sudo ip netns add foo
$ sudo ip netns exec foo rkt run --net=host docker://busybox --exec=ls -- /sys/fs/cgroup
image: using image from local store for image name coreos.com/rkt/stage1-coreos:1.4.0+gitb7589f9
image: using image from local store for url docker://busybox
stage1: error calling sd_pid_get_owner_uid: exec format error
$ sudo ip netns exec foo ls /sys/fs/cgroup
# nothing
$ ip -V
ip utility, iproute2-ss160111

nsenter didn't have that issue for me and nsenter is already depended on for ExecInContainer.

I can't reproduce this issue on CoreOS (ip -V -> iproute2-ss150210)

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=882047

That's the reason I used nsenter for execing rather than ip. I don't know the state of this bug on other distros, but if you're confident it doesn't affect any real users (only the fake users running gentoo :) or if it's a kernel misconfiguration on my end or such, I'm happy to be persuaded. Otherwise, nsenter seems like the safer way to be sure this isn't an issue.
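
The two ways of entering the namespace discussed in this thread can be put side by side. This is a minimal sketch, not the PR's code: `wrapWithIPNetns` and `wrapWithNsenter` are hypothetical helper names, the binary paths are assumptions passed in by the caller, and the only external behavior relied on is that `nsenter --net=<file>` (util-linux) enters the network namespace referenced by that file.

```go
package main

import "fmt"

// netnsDir is the directory where `ip netns add <name>` creates named namespaces.
const netnsDir = "/var/run/netns"

// wrapWithIPNetns prefixes cmd with `ip netns exec <name>`, the form used in the diff above.
func wrapWithIPNetns(ipPath, name, cmd string) string {
	return fmt.Sprintf("%s netns exec %s %s", ipPath, name, cmd)
}

// wrapWithNsenter (hypothetical helper) prefixes cmd with nsenter so that only
// the network namespace is switched before cmd runs.
func wrapWithNsenter(nsenterPath, name, cmd string) string {
	return fmt.Sprintf("%s --net=%s/%s -- %s", nsenterPath, netnsDir, name, cmd)
}

func main() {
	// The binary locations below are assumptions for illustration.
	fmt.Println(wrapWithIPNetns("/sbin/ip", "foo", "ls /sys/fs/cgroup"))
	fmt.Println(wrapWithNsenter("/usr/bin/nsenter", "foo", "ls /sys/fs/cgroup"))
}
```

One difference that may be relevant to the empty `/sys/fs/cgroup` listing: per ip-netns(8), `ip netns exec` also creates a new mount namespace and remounts `/sys`, whereas `nsenter --net` leaves the mount namespace alone.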

@dcbw
Member Author

dcbw commented May 3, 2016

@euank hmm, interesting. For core tools like that, I tend to think that if the tool is broken, that's not really Kubernetes' problem; the tool should get fixed, and will get fixed. Of course, if it impacts a wide audience, then we should work around the problem until it's fixed. Otherwise we never coalesce on common tools because we've got tons of workarounds for older versions :(

@dcbw
Member Author

dcbw commented May 3, 2016

@euank not opposed to using nsenter, just wondering what versions of iproute2 had that bug, and how widely that version is deployed before falling back to nsenter.

@@ -1320,6 +1393,13 @@ func (r *Runtime) KillPod(pod *api.Pod, runningPod kubecontainer.Pod) error {
		return err
	}

	if !kubecontainer.IsHostNetworkPod(pod) {
		if err := r.cleanupPodNetwork(&runningPod, true); err != nil {
Contributor

@yifan-gu yifan-gu May 3, 2016

@dcbw This takes care of the network cleanup when we explicitly kill/stop the pods. But what if the pod is in a crash loop and is deleted by garbage collection?
Actually, I can't find any place in dockertools that takes care of the network when the pod is garbage collected today. Did I miss something?

Contributor

@yifan-gu yifan-gu May 3, 2016

@dcbw I guess the docker pod works fine today because the pause container hardly dies, so in most cases the only time it dies is when it's been explicitly killed, which invokes the networkplugin.TearDown() accordingly.
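
The gap raised here, teardown on explicit kill but not on garbage collection, can be illustrated with a shared cleanup helper that both paths call. This is only a sketch under assumptions: `networkTeardowner`, `podRecord`, `garbageCollect`, and `recordingPlugin` are hypothetical stand-ins, not kubelet types, and the real `cleanupPodNetwork` in the PR has a different signature.

```go
package main

import "fmt"

// networkTeardowner is a hypothetical stand-in for the network plugin's teardown hook.
type networkTeardowner interface {
	TearDown(podUID string) error
}

// podRecord is a hypothetical, minimal view of a pod known to the runtime.
type podRecord struct {
	UID         string
	HostNetwork bool
	Running     bool
}

// cleanupPodNetwork tears down networking unless the pod shares the host network.
func cleanupPodNetwork(p networkTeardowner, pod podRecord) error {
	if pod.HostNetwork {
		return nil
	}
	return p.TearDown(pod.UID)
}

// garbageCollect applies the same cleanup to every pod that is no longer
// running and returns the errors it encountered, one per failed pod.
func garbageCollect(p networkTeardowner, pods []podRecord) []error {
	var errs []error
	for _, pod := range pods {
		if pod.Running {
			continue
		}
		if err := cleanupPodNetwork(p, pod); err != nil {
			errs = append(errs, fmt.Errorf("pod %s: %v", pod.UID, err))
		}
	}
	return errs
}

// recordingPlugin remembers which pods were torn down, for demonstration.
type recordingPlugin struct{ tornDown []string }

func (r *recordingPlugin) TearDown(uid string) error {
	r.tornDown = append(r.tornDown, uid)
	return nil
}

func main() {
	plugin := &recordingPlugin{}
	pods := []podRecord{
		{UID: "a", Running: true},     // still running: skipped
		{UID: "b"},                    // dead, own netns: torn down
		{UID: "c", HostNetwork: true}, // dead, host network: nothing to do
	}
	errs := garbageCollect(plugin, pods)
	fmt.Println(plugin.tornDown, len(errs))
}
```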

@dcbw
Member Author

dcbw commented May 3, 2016

@euank looks like the iproute2 patch that fixes the issue is 144e6ce1679a768e987230efb4afa402a5ab58ac, which has been part of 3.8 and later. 3.8 was released on 2013-02-21, i.e. a really long time ago...

@dcbw
Member Author

dcbw commented May 3, 2016

@yifan-gu I was wondering about the GC code, yeah. Are there other places that should clean up the netns besides the GC code?

@euank
Contributor

euank commented May 3, 2016

It's possible I'm encountering a separate bug actually since I'm on 4.4.0... I'll look into it a bit more.

@dcbw
Member Author

dcbw commented May 3, 2016

@euank sure; if you can figure out what's going on there that would be great. But it's looking like we should probably use nsenter as you suggest, at least for now. I'll make that change tomorrow and repush the PR.

@yifan-gu
Contributor

yifan-gu commented May 3, 2016

> @yifan-gu I was wondering about the GC code, yeah. Are there other places that should clean up the netns besides the GC code?

I don't think so. cc @kubernetes/sig-node @yujuhong @Random-Liu @dchen1107 #25062 (comment)

@yifan-gu yifan-gu modified the milestones: rktnetes-v1.1, rktnetes-v1.0 May 3, 2016
@@ -94,6 +94,8 @@ type Runtime interface {
	RemoveImage(image ImageSpec) error
	// Returns Image statistics.
	ImageStats() (*ImageStats, error)
	// Returns the filesystem path of the pod's network namespace
	GetNetNS(containerID ContainerID) (string, error)
Member

@feiskyer feiskyer May 3, 2016

Is it essential to add GetNetNS() to the Runtime interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside a network namespace.

Member Author

@dcbw dcbw May 3, 2016

> Is it essential to add GetNetNS() to the Runtime interface? Not all container runtimes (e.g., HyperContainer, which is a hypervisor-based container runtime) run containers inside a network namespace.

@feiskyer Good point, though I'd like to reduce the usage of casting in the network plugins and potentially make them usable for more runtimes. What if we specified in the Runtime interface that a Runtime was not required to return a valid netns path if that runtime does not do namespace creation? Network plugins would then get an error and exit, indicating this plugin was not compatible with the runtime.

Contributor

I think it's cleaner to have a NetworkPluggableRuntime (name?) interface which just has GetNetNS since that's the only method needed, and then docker & rkt can implement it and you can cast to that, rather than a concrete implementation.
If some runtimes will stub out the method, it's not really part of a container runtime.
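
For comparison, the narrow-interface alternative suggested in this comment could be sketched as follows. The interface name `NetworkPluggableRuntime` is the tentative one from the comment; the trimmed `Runtime` interface, the `rktLike` and `hyperLike` types, and `netNSFor` are hypothetical and exist only to show the type assertion.

```go
package main

import "fmt"

// Runtime is a trimmed, hypothetical stand-in for the kubelet's container runtime interface.
type Runtime interface {
	Type() string
}

// NetworkPluggableRuntime is the optional capability suggested in the review:
// only runtimes that really have a network namespace implement it.
type NetworkPluggableRuntime interface {
	GetNetNS(containerID string) (string, error)
}

// rktLike implements both interfaces.
type rktLike struct{}

func (rktLike) Type() string { return "rkt" }
func (rktLike) GetNetNS(id string) (string, error) {
	return "/var/run/netns/" + id, nil
}

// hyperLike implements only Runtime; it has no method to stub out.
type hyperLike struct{}

func (hyperLike) Type() string { return "hyper" }

// netNSFor asserts to the narrow interface instead of a concrete runtime type.
func netNSFor(rt Runtime, id string) (string, error) {
	npr, ok := rt.(NetworkPluggableRuntime)
	if !ok {
		return "", fmt.Errorf("runtime %q does not expose a network namespace", rt.Type())
	}
	return npr.GetNetNS(id)
}

func main() {
	fmt.Println(netNSFor(rktLike{}, "abc"))
	fmt.Println(netNSFor(hyperLike{}, "abc"))
}
```

The trade-off discussed in the thread is visible here: the assertion keeps `Runtime` free of methods some implementations cannot honor, at the cost of one more interface for plugins to know about.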

Member Author

I don't really see the difference right now between GetNetNS() and a NetworkPluggableRuntime. If we add more methods that runtimes won't implement, then sure, maybe we re-evaluate.

Also note that the only reason there is a GetRuntime() is for GetNetNS(). CNI uses it for Status() too, but I think that's bogus and it should follow the same model as kubenet does; that's another PR. I think we can probably get rid of GetRuntime() in favor of some other interface that the runtime actually implements. Even better, since the runtimes themselves are calling the functions, we should just pass an interface into them (or nil) instead of having a roundabout GetRuntime().

In summary, I'd like to go with GetNetNS() for now, and rework GetRuntime() in a further PR if that's OK? With the understanding that yes, not all runtimes support returning a netns path now or in the future.

Contributor

The difference is indeed just conceptual/academic right now, not practical, so I'm fine with this for now with the hope of an even better solution later. Thanks for the explanation as well.

@@ -1424,6 +1504,20 @@ func (r *Runtime) SyncPod(pod *api.Pod, podStatus api.PodStatus, internalPodStat
	return
}

func netnsPathFromUuid(uuid string) string {
	return fmt.Sprintf("/var/run/netns/%s", uuid)
Contributor

It might be nice to prepend k8-rkt- or something so a user can easily identify namespaces we created. Fine with it as is as well.
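
A prefixed variant along the lines of this suggestion could look like the sketch below. It is not what the PR merged: the prefix value is taken from the comment as written, and `netnsPathFromUUID` and `isManagedNetns` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	// netnsDir is the directory used by the diff above.
	netnsDir = "/var/run/netns"
	// netnsPrefix is the prefix floated in the review comment (an assumption, not merged code).
	netnsPrefix = "k8-rkt-"
)

// netnsPathFromUUID builds the namespace path with the identifying prefix.
func netnsPathFromUUID(uuid string) string {
	return path.Join(netnsDir, netnsPrefix+uuid)
}

// isManagedNetns reports whether p names a namespace created with that prefix,
// which is the property that would let a user tell them apart.
func isManagedNetns(p string) bool {
	return path.Dir(p) == netnsDir && strings.HasPrefix(path.Base(p), netnsPrefix)
}

func main() {
	p := netnsPathFromUUID("1234")
	fmt.Println(p, isManagedNetns(p), isManagedNetns("/var/run/netns/foo"))
}
```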

@yifan-gu
Contributor

LGTM. Thank you very much @dcbw !

@euank
Contributor

euank commented May 18, 2016

It's not clear to me if those three failures are caused by this or flakes, but since one of them could relate to the ip in podStatus (kubelet managed /etc/hosts file) and one is networking (healthz), it's certainly worth looking a bit deeper.

@bprashanth
Contributor

Hmm, in all 3 instances it looks like the test failed trying to contact the apiserver:
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/pr-logs/pull/25062/kubernetes-pull-build-test-e2e-gce/40667/

  Expected error:
      <*url.Error | 0xc820509dd0>: {
          Op: "Get",
          URL: "https://104.154.95.7/api/v1/watch/namespaces/e2e-tests-e2e-kubelet-etc-hosts-4a3o9/serviceaccounts?fieldSelector=metadata.name%3Ddefault",
          Err: {
              Op: "dial",
              Net: "tcp",
              Source: nil,
              Addr: {
                  IP: "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x9a_\a",
                  Port: 443,
                  Zone: "",
              },
              Err: {},
          },
      }


/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:508
  should *not* be restarted with a /healthz http liveness probe [Conformance] [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/pods.go:1164

  creating pod liveness-http
  Expected error:
      <*url.Error | 0xc82038cf00>: {
          Op: "Post",
          URL: "https://104.154.95.7/api/v1/namespaces/e2e-tests-pods-6nwro/pods",
          Err: {
              Op: "dial",
              Net: "tcp",
              Source: nil,
              Addr: {
                  IP: "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x9a_\a",
                  Port: 443,
                  Zone: "",
              },
              Err: {},
          },
      }
      Post https://104.154.95.7/api/v1/namespaces/e2e-tests-pods-6nwro/pods: dial tcp 104.154.95.7:443: i/o timeout
  not to have occurred

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:508
  should update labels on modification [Conformance] [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/downwardapi_volume.go:96

  Expected error:
      <*url.Error | 0xc8205d0690>: {
          Op: "Post",
          URL: "https://104.154.95.7/api/v1/namespaces/e2e-tests-downward-api-7q9y5/pods",
          Err: {
              Op: "dial",
              Net: "tcp",
              Source: nil,
              Addr: {
                  IP: "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x9a_\a",
                  Port: 443,
                  Zone: "",
              },
              Err: {},
          },
      }
      Post https://104.154.95.7/api/v1/namespaces/e2e-tests-downward-api-7q9y5/pods: dial tcp 104.154.95.7:443: i/o timeout
  not to have occurred
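
The `IP` field in these dumps is a Go-quoted 16-byte string rather than dotted-quad text. A small sketch (the helper name `decodeIP` is made up for illustration) shows that it is the IPv4-mapped IPv6 form of 104.154.95.7, the same apiserver address that appears in the URLs, so all three failures are dial errors against one endpoint.

```go
package main

import (
	"fmt"
	"net"
)

// decodeIP treats the raw bytes printed in the error dump as a net.IP.
func decodeIP(raw string) net.IP {
	return net.IP([]byte(raw))
}

func main() {
	// Copied from the Addr.IP field in the log output above.
	raw := "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xffh\x9a_\a"
	ip := decodeIP(raw)
	// 16 bytes: ten zero bytes, 0xff 0xff, then 'h'=104, 0x9a=154, '_'=95, '\a'=7.
	fmt.Println(len(ip), ip.String())
}
```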

@yifan-gu
Contributor

yifan-gu commented May 18, 2016

As a note: the defaultNetworkName can be dropped after this PR, and we need to update GetPodStatus() to return the correct pod IP once this PR is merged. cc @euank

@euank
Contributor

euank commented May 18, 2016

For reference, two of the commits here are probably what's needed, I'm testing to make sure that didn't break anything. If possible, I want to just get this PR's changes in so we can continue iterating, but if it's better for the networkPlugin.GetPodNetworkStatus related bits to be here (in terms of speed and/or correctness), that's fine too.

@freehan
Contributor

freehan commented May 18, 2016

LGTM, pending tests.

@euank
Contributor

euank commented May 19, 2016

An apiserver failure seems unrelated to this, but I'm not yet sure enough that it's a flake to file a flake issue. I'll file an issue if the retest below passes, proving it a flake.

... And the above is a catch-22 for retesting 😄

@k8s-bot test this issue: #IGNORE

(failure link in case it needs to be preserved in the case of it being a flake)

@yifan-gu yifan-gu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 19, 2016
@dcbw
Member Author

dcbw commented May 19, 2016

@euank the pass-pod-IP commit looks OK to me; my mistake for missing that originally...

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot

k8s-bot commented May 21, 2016

GCE e2e build/test passed for commit 552b648.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 423a415 into kubernetes:master May 21, 2016
euank added a commit to euank/kubernetes that referenced this pull request May 23, 2016
This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.

The `Status` bit mostly worked prior to kubernetes#25062, and this restores that
functionality in addition to the new functionality.
This was referenced May 23, 2016
euank added a commit to euank/kubernetes that referenced this pull request May 25, 2016
This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.

The `Status` bit mostly worked prior to kubernetes#25062, and this restores that
functionality in addition to the new functionality.
k8s-github-robot pushed a commit that referenced this pull request May 28, 2016
Automatic merge from submit-queue

rkt: Pass through podIP

This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.

The `Status` bit mostly worked prior to #25062, and this restores that
functionality in addition to the new functionality.

In retrospect, the regression in status is large enough that the prior PR should have included at least some of this; my bad for not realizing the full implications there.

#25902 is needed for the downward API changes, but either merge order is fine, as neither will break badly by itself.

cc @yifan-gu @dcbw
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this pull request Jun 25, 2020
[release-4.4] Bug 1843876: UPSTREAM: 91008: Do not swallow NotFound error for DeletePod in dsc.manage

Origin-commit: 024016aca68ca0588c6f5199a181a0616451cfdd
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.