Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support total process ID limiting for nodes #73651

Merged
merged 1 commit into from Feb 14, 2019

Conversation

@RobertKrawitz
Copy link
Contributor

RobertKrawitz commented Feb 1, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
Implement node-level process count limit to prevent multiple pods from collectively fork bombing a system

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubelet now accepts `pid=<number>` in the `--system-reserved` and `--kube-reserved` options to ensure that the specified number of process IDs will be reserved for the system as a whole and for Kubernetes system daemons respectively.  Please reference `Kube Reserved` and `System Reserved` in `Reserve Compute Resources for System Daemons` in the Kubernetes documentation for general discussion of resource reservation.  To utilize this functionality, you must set the feature gate `SupportNodePidsLimit=true`
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 1, 2019

Hi @RobertKrawitz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RobertKrawitz

This comment has been minimized.

Copy link
Contributor Author

RobertKrawitz commented Feb 1, 2019

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Feb 2, 2019

great to see the progress here, would be good if we have a node e2e test that validates as well, most likely here: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/node_container_manager_test.go

@RobertKrawitz RobertKrawitz force-pushed the RobertKrawitz:node_pids_limit branch from cd3901a to 945b0dd Feb 5, 2019

@RobertKrawitz

This comment has been minimized.

Copy link
Contributor Author

RobertKrawitz commented Feb 5, 2019

Current code successfully applies PID limit at least with one pod.

@rphillips

This comment has been minimized.

Copy link
Member

rphillips commented Feb 5, 2019

/ok-to-test

@RobertKrawitz RobertKrawitz force-pushed the RobertKrawitz:node_pids_limit branch from 945b0dd to d51a5a8 Feb 5, 2019

@RobertKrawitz

This comment has been minimized.

Copy link
Contributor Author

RobertKrawitz commented Feb 5, 2019

@dashpole ping

@@ -4157,6 +4157,9 @@ const (
// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
ResourceEphemeralStorage ResourceName = "ephemeral-storage"
// Process IDs, in count

This comment has been minimized.

@dashpole

dashpole Feb 5, 2019

Contributor

I don't think we want to add ResourcePID as a first-class resource. We don't expect users to add requests/limits for pids, and we probably don't want pids to show up in the capacity/allocatable of the node.

This comment has been minimized.

@RobertKrawitz

RobertKrawitz Feb 5, 2019

Author Contributor

Is it OK for me to use a fixed string somewhere for this, or do I really need to keep this completely special cased?

This comment has been minimized.

@dashpole

dashpole Feb 5, 2019

Contributor

You can definitely add a string somewhere. I am just making the general point that we don't want to add this to capacity/allocatable. I tried to implement this yesterday, so I know that complicates the implementation.

This comment has been minimized.

@RobertKrawitz

RobertKrawitz Feb 5, 2019

Author Contributor

It definitely makes it rather messy.

@RobertKrawitz RobertKrawitz force-pushed the RobertKrawitz:node_pids_limit branch 5 times, most recently from 6f3005b to e548aad Feb 5, 2019

@RobertKrawitz RobertKrawitz force-pushed the RobertKrawitz:node_pids_limit branch from d76def6 to e0ebfcb Feb 13, 2019

@@ -125,6 +125,7 @@ var (
NoSnatTestProxy = Config{e2eRegistry, "no-snat-test-proxy", "1.0"}
// Pause - when these values are updated, also update cmd/kubelet/app/options/container_runtime.go
Pause = Config{gcRegistry, "pause", "3.1"}
Perl = Config{dockerLibraryRegistry, "perl", "5.26"}

This comment has been minimized.

@dashpole

dashpole Feb 13, 2019

Contributor

nit: remove this (unless you plan to add the test back in this PR)

This comment has been minimized.

@RobertKrawitz

RobertKrawitz Feb 13, 2019

Author Contributor

Fixed (and image_list.go).

@dashpole

This comment has been minimized.

Copy link
Contributor

dashpole commented Feb 13, 2019

This looks great. So glad this is going in 1.14

@RobertKrawitz RobertKrawitz force-pushed the RobertKrawitz:node_pids_limit branch from e0ebfcb to 2597a1d Feb 13, 2019

@dashpole

This comment has been minimized.

Copy link
Contributor

dashpole commented Feb 13, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Feb 13, 2019

@sjenning

This comment has been minimized.

Copy link
Contributor

sjenning commented Feb 13, 2019

/priority important-soon

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Feb 13, 2019

feature is approved for milestone

/milestone 1.14

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 13, 2019

@derekwaynecarr: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.10, v1.11, v1.12, v1.13, v1.14, v1.15, v1.16, v1.17, v1.18, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9]

Use /milestone clear to clear the milestone.

In response to this:

feature is approved for milestone

/milestone 1.14

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Feb 13, 2019

great job @RobertKrawitz !

/approve

@derekwaynecarr

This comment has been minimized.

Copy link
Member

derekwaynecarr commented Feb 13, 2019

/milestone v1.14

@k8s-ci-robot k8s-ci-robot added this to the v1.14 milestone Feb 13, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 13, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, RobertKrawitz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RobertKrawitz

This comment has been minimized.

Copy link
Contributor Author

RobertKrawitz commented Feb 13, 2019

/retest

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 13, 2019

@RobertKrawitz: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 2597a1d link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 888ff40 into kubernetes:master Feb 14, 2019

9 of 17 checks passed

pull-kubernetes-local-e2e-containerized Job failed.
Details
pull-kubernetes-bazel-test Job triggered.
Details
pull-kubernetes-e2e-gce Job triggered.
Details
pull-kubernetes-e2e-gce-100-performance Job triggered.
Details
pull-kubernetes-integration Job triggered.
Details
pull-kubernetes-kubemark-e2e-gce-big Job triggered.
Details
pull-kubernetes-node-e2e Job triggered.
Details
pull-kubernetes-verify Job triggered.
Details
cla/linuxfoundation RobertKrawitz authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-godeps Skipped
pull-kubernetes-local-e2e Skipped
pull-kubernetes-typecheck Job succeeded.
Details
pull-publishing-bot-validate Skipped
tide In merge pool.
Details
}
pidlimits, err := pidlimit.Stats()
if err == nil && pidlimits != nil && pidlimits.MaxPID != nil {
internalCapacity[pidlimit.PIDs] = *resource.NewQuantity(

This comment has been minimized.

@tedyu

tedyu Feb 14, 2019

Contributor

It seems internalCapacity only needs to be allocated when the above if condition is true.
Otherwise capacity can be used.

@RobertKrawitz RobertKrawitz deleted the RobertKrawitz:node_pids_limit branch Feb 15, 2019

@sjenning sjenning referenced this pull request Mar 1, 2019

Open

Pid Limiting #757

RobertKrawitz added a commit to RobertKrawitz/website that referenced this pull request Mar 1, 2019

@RobertKrawitz RobertKrawitz referenced this pull request Mar 1, 2019

Merged

Node PID limiting #12932

RobertKrawitz added a commit to RobertKrawitz/website that referenced this pull request Mar 4, 2019

k8s-ci-robot added a commit to kubernetes/website that referenced this pull request Mar 16, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.