
Add namespace targeting mode to CRI and kubelet #84731

Merged: 3 commits into kubernetes:master on Feb 20, 2020

Conversation

verb (Contributor) commented Nov 4, 2019

What type of PR is this?
/kind feature
/sig node
/priority important-longterm

What this PR does / why we need it: This adds a TARGET NamespaceMode to the CRI and implements namespace targeting for the PID namespace in the kubelet. This is necessary to implement the TargetContainerName field of EphemeralContainer.

Which issue(s) this PR fixes:
Fixes #84359

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The kubelet and the default docker runtime now support running ephemeral containers in the Linux process namespace of a target container. Other container runtimes must implement this feature before it will be available in that runtime.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://git.k8s.io/enhancements/keps/sig-node/20190212-ephemeral-containers.md
verb (Contributor, Author) commented Nov 5, 2019

/assign @yujuhong

yujuhong (Member) commented Nov 7, 2019

k8s-ci-robot requested review from feiskyer, mrunalp and Random-Liu on Nov 7, 2019
verb (Contributor, Author) commented Dec 18, 2019

@yujuhong Shall we try to get this into 1.18?

verb force-pushed the verb:ec-pid branch from fee37ea to bbbd22b on Dec 18, 2019
verb (Contributor, Author) commented Jan 13, 2020

/retest

Random-Liu (Member) commented Jan 14, 2020

We discussed this in sig-node today. One concern brought up is who is going to reap the ephemeral container processes.

Actually, an ephemeral container process targeting a container is similar to an exec process; there is not much difference.

Here is something we found while dealing with an internal exec process leak:

I highly suspect that this is caused by the kernel behavior change torvalds/linux@c6c70f4.

  • In Linux <4.11, background exec children are reparented to containerd-shim after exec exits, and containerd-shim reaps all of its child processes, so those processes won't be left as zombies.
  • In Linux >=4.11, background exec children are reparented to the container init process after exec exits, and most container init processes don't reap arbitrary children.

Based on the kernel change, this doesn't seem to be a kernel bug; it is expected kernel behavior. If that is the case, we have to live with it.

I talked with Docker people and proposed some ideas, but all 3 of them need users to change their application:

  1. Don't run background processes in exec (e.g. liveness and readiness probes);
  2. Run a simple init process in their container;
  3. Switch to a shared pod pid namespace if it works for their application, because when the pid namespace is shared, the pause process reaps all zombies in the pid namespace. https://github.com/kubernetes/kubernetes/tree/master/build/pause

Given so, I believe:

  1. We'll have a similar issue with ephemeral containers on Linux 4.11+;
  2. We already have this issue with exec today.
yujuhong (Member) left a comment

3. Switch to shared pod pid namespace if it works for their application, because when pid namespace is shared, the pause process will reap all zombies in the pid namespace. https://github.com/kubernetes/kubernetes/tree/master/build/pause

If the target pod uses a shared pid namespace, there wouldn't be a reaping problem, right? Maybe we should add this to the feature description to warn users.

Also, since this is for debugging/troubleshooting, users may still want to do this even with the possibility of leakage.

Random-Liu (Member) commented Jan 14, 2020

Also, since this is for debugging/troubleshooting, users may still want to do this even with the possibility of leakage.

Discussed with @yujuhong offline.

Yeah, during the meeting we only brought up the concern about the process reaping thing, but we didn't know the exact impact. If #84731 (comment) is the only impact, it seems fine to us, because:

  1. Exec already has the same issue today;
  2. Being able to debug a bad pod seems more important than leaking a zombie in the bad pod; :P
  3. Users can avoid this problem once they know about it, e.g. share the pid namespace, run an init process, or clean up background processes before exiting the debug container.

Maybe clear documentation for this case is good enough?
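The shared-pid-namespace workaround mentioned above is a standard pod-level setting. A hypothetical manifest might look like this (pod name and image are illustrative); with `shareProcessNamespace: true`, the pause container runs as PID 1 for the whole pod and reaps any zombies in the shared namespace:

```yaml
# Illustrative pod spec: the pause process becomes PID 1 for all
# containers and reaps re-parented children, avoiding the zombie
# leak discussed in this thread.
apiVersion: v1
kind: Pod
metadata:
  name: debug-target      # name is illustrative
spec:
  shareProcessNamespace: true
  containers:
  - name: app
    image: nginx          # image is illustrative
```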

"github.com/stretchr/testify/assert"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"

yujuhong (Member) commented Jan 14, 2020

Why the alias? Same in other places too.

verb (Contributor, Author) commented Jan 15, 2020

v1 "k8s.io/api/core/v1"

This was added by goimports and I didn't notice. TIL this is intentional: golang/go#30051 (comment)

// will be made to match that of container target_id.
// For example, a container with a PID namespace of TARGET expects to view
// all of the processes that container target_id can view.
TARGET = 3;

yujuhong (Member) commented Jan 14, 2020

Looks okay to me.

yujuhong (Member) commented Jan 21, 2020

/lgtm
Will approve soon-ish if no one else has any comments

k8s-ci-robot added the lgtm label on Jan 21, 2020
verb force-pushed the verb:ec-pid branch from bbbd22b to d05bcf6 on Jan 30, 2020
k8s-ci-robot removed the lgtm label on Jan 30, 2020
verb (Contributor, Author) commented Jan 30, 2020

@yujuhong rebased due to a merge conflict in pkg/kubelet/kuberuntime/kuberuntime_container_linux_test.go

Also, I sent kubernetes/enhancements#1533 to you to answer the questions that were asked in the sig-node meeting.

verb (Contributor, Author) commented Jan 30, 2020

/retest

verb posted 1 similar /retest comment on Jan 31, 2020

verb (Contributor, Author) commented Jan 31, 2020

"Something went wrong: failed to acquire k8s binaries"

/retest

feiskyer (Member) left a comment

/lgtm

verb (Contributor, Author) commented Feb 10, 2020

@yujuhong friendly ping, looks ready for approval?

yujuhong (Member) commented Feb 19, 2020

/approve

k8s-ci-robot (Contributor) commented Feb 19, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: verb, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot commented Feb 20, 2020

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

fejta-bot posted 2 similar /retest comments on Feb 20, 2020

k8s-ci-robot merged commit d0983b5 into kubernetes:master on Feb 20, 2020
16 checks passed:
  • cla/linuxfoundation: verb authorized
  • pull-kubernetes-bazel-build: Job succeeded.
  • pull-kubernetes-bazel-test: Job succeeded.
  • pull-kubernetes-dependencies: Job succeeded.
  • pull-kubernetes-e2e-gce: Job succeeded.
  • pull-kubernetes-e2e-gce-100-performance: Job succeeded.
  • pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
  • pull-kubernetes-e2e-kind: Job succeeded.
  • pull-kubernetes-e2e-kind-ipv6: Job succeeded.
  • pull-kubernetes-integration: Job succeeded.
  • pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
  • pull-kubernetes-node-e2e: Job succeeded.
  • pull-kubernetes-node-e2e-containerd: Job succeeded.
  • pull-kubernetes-typecheck: Job succeeded.
  • pull-kubernetes-verify: Job succeeded.
  • tide: In merge pool.
k8s-ci-robot added this to the v1.18 milestone on Feb 20, 2020