SupportPodPidsLimit feature beta with tests #72076
Dec 15, 2018
Dec 17, 2018
@yujuhong - I have a hold so we can discuss in sig-node. I am fine adding a knob to protect across all pods in addition to per-pod, as it simplifies configuring a pid reserve. I felt that big and small pod pid limits could be added in the future, as that is not inconsistent with a default pod pid limit enforced locally on the node.
To capture discussion from sig-node meeting:
My slight concern before was that the per-pod limit was going to be hard to pick/set, and not so useful after all. Alternatively, a node-wide limit for all pods (similar to allocatable) will provide a safety net for the node, and would be easier to roll out.
Setting a sensible default limit like this addressed my concern, so this looks good to me.
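For context, a default per-pod limit like the one discussed here is configured on the kubelet. The fragment below is a sketch based on my understanding of the SupportPodPidsLimit feature gate and the podPidsLimit configuration field; the value 1024 is purely illustrative:

```yaml
# KubeletConfiguration fragment (sketch): enable the feature gate and
# apply a default PID limit to every pod on the node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SupportPodPidsLimit: true
podPidsLimit: 1024
```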
@derekwaynecarr a couple more thoughts to consider:
We currently have just basic node-level protection through node-level eviction. The next step, IMO, is isolation between node daemons (kubelet, runtime, etc.) and user pods using Node Allocatable. That would confine user-caused PID exhaustion to the user pods' cgroup, preventing the node from being made unusable if eviction doesn't catch the pressure in time. Achieving pod-to-pod PID isolation through
Node Allocatable is a well-established pattern for isolating node daemons from user workloads using a combination of cgroups and eviction. It would be easy to implement, and the design should not be contentious. I think we should seriously consider adding it under this same feature gate.
We should definitely still move forward with the
Edit: Realized this feature gate is just for pids per pod, so allocatable shouldn't be tied to it. We should still consider adding it though!
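For reference, node-level PID isolation of the kind described above would eventually look something like the kubelet invocation below. This is a sketch: pid reservation via --system-reserved was not part of this PR, and all values are illustrative examples, not recommendations:

```shell
# Sketch: reserve PIDs for node daemons so user pods cannot exhaust them,
# alongside the per-pod default limit this PR introduces.
# (pid= support in --system-reserved is assumed to land in a follow-on PR.)
kubelet \
  --enforce-node-allocatable=pods \
  --system-reserved=pid=1000 \
  --pod-max-pids=1024
```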
Jan 7, 2019
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: derekwaynecarr
I updated the help text for
@dashpole - I agree we should have eviction threshold and node allocatable enforcement for pids in a follow-on PR. This affords us the opportunity for a future PR to restrict pid limiting at the node allocatable level while maintaining backward compatibility. FWIW, I am not convinced right now that PidPressure condition is actually working, we should investigate that further. PTAL at this PR.
It wouldn't be too hard to write an actual fork-bomb test using the eviction framework. I'll try and add that in the next few weeks. I opened #72654 to track this.
@dashpole updated the copyright.
At the node allocatable level, we currently do not bound pids.
$ cat /sys/fs/cgroup/pids/kubepods/pids.max
max

This means that by default, pods are bound to the node allocatable pid limit, which is `max` (no limit).
At the pod cgroup level, we do not write to the pid cgroup unless the configured value is positive.
The moment we add support for setting a node allocatable pid limit, the pod pid limit in this PR will be bounded by that value in the cgroup hierarchy. I kept the help text documentation in terms of node allocatable rather than host configuration since we know that will happen in a follow-on PR.
Jan 10, 2019
18 of 19 checks passed
@derekwaynecarr this has caused the Validate Node Allocatable test to fail: https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-serial&include-filter-by-regex=Validate%20Node%20Allocatable
We need to check if