Add node-level plugin support for pod admission handler #1461
Conversation
Welcome @SaranBalaji90!

Hi @SaranBalaji90. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
riking left a comment:

Should this be integrated with the CRI?
> This functionality will give cluster admins the ability to restrict specific pods to a subset of their worker nodes. This includes restricting pods that use host networking, the host's PID namespace, certain volume types, etc. Even though this can be achieved through other mechanisms such as pod security policies and taints and tolerations, there are caveats where a pod might still end up on this subset of worker nodes.

> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.
"Cluster admins could simply set the 'node' field of a pod, bypassing any constraints enforced by a scheduler or its plugins, such as taints."
It's not clear it makes sense to protect against the cluster admin's ability to circumvent security policies.
> In the future, if the kernel or docker supports new functionality and not all Kubernetes worker nodes in a cluster support that feature, we need to keep adding pod admit handlers to the Kubernetes source code to validate whether a pod can be admitted by the kubelet.

> This KEP is to discuss whether we can move these validations out of the Kubernetes source code into a separate plugin that runs outside Kubernetes. That way, we don't have to update the Kubernetes source code for every new functionality supported in the future.
A finished KEP describes an enhancement, and is not an invitation to discussion.
@riking Sorry, I meant to say this KEP proposes changes to move some of this logic out of the k8s source code. Will update this along with the implementation detail of whatever we decide on.

That's a good suggestion. I haven't looked at CRI in depth; will take a look at this.
derekwaynecarr left a comment:

I am not averse to additional plugins for kubelet admission that are not compiled with the kubelet, but I feel like the kubelet should have the function to validate the pod spec.
> ## Summary

> Today, the kubelet on the worker node performs a known set of validations and decides whether the pod can be allowed to execute on the node. The kubelet rejects a pod if any of the requirements in the pod spec are not supported by docker running on the node, for example pods that need to update specific sysctl parameters, require privilege escalation, or use a non-default proc mount. The kubelet rejects pods based on node capabilities as well, not just docker capabilities; for example, it rejects pods which require a specific AppArmor profile that is not supported by the kernel, and also based on the topology manager component's decision.
nit: s/docker/container-engine to reflect cri
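For context, the in-tree admit handlers referenced above implement roughly the following interface from `pkg/kubelet/lifecycle` (paraphrased here; see the kubelet source for the exact definitions). The proposal is essentially about letting implementations of this contract live outside the kubelet binary.

```go
package lifecycle

import v1 "k8s.io/api/core/v1"

// PodAdmitAttributes carries the pod being evaluated together with the pods
// already bound to this kubelet.
type PodAdmitAttributes struct {
	Pod       *v1.Pod
	OtherPods []*v1.Pod
}

// PodAdmitResult reports whether the pod may run on this node and, if not, why.
type PodAdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// PodAdmitHandler is consulted by the kubelet before a pod is started.
type PodAdmitHandler interface {
	Admit(attrs *PodAdmitAttributes) PodAdmitResult
}
```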
> This functionality will give cluster admins the ability to restrict specific pods to a subset of their worker nodes. This includes restricting pods that use host networking, the host's PID namespace, certain volume types, etc. Even though this can be achieved through other mechanisms such as pod security policies and taints and tolerations, there are caveats where a pod might still end up on this subset of worker nodes.

> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.
It's not clear it makes sense to protect against the cluster admin's ability to circumvent security policies.
> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.

> In the future, if the kernel or docker supports new functionality and not all Kubernetes worker nodes in a cluster support that feature, we need to keep adding pod admit handlers to the Kubernetes source code to validate whether a pod can be admitted by the kubelet.
are there specific scenarios you have in mind?
> (I'm still working on the actual implementation but wanted to put this idea out and get initial feedback before arriving at a complete solution.)

> The initial proposal is to build something similar to the networking plugin architecture, where the kubelet invokes a CNI binary and retrieves the IP associated with the pod. In the same way, we can have a set of plugins defined on the node which indicate whether the kubelet can admit the pod. In the future we can move some of the pod admit handlers to their own plugins if required, for example a plugin specifically for docker that validates the pod spec against available docker functionality.
I think this inverts the problem: the expectation is that a node can satisfy the pod spec requirements; the container runtime is a detail.
We discussed this during the Jan 21st sig-node meeting. Will update the KEP and bring it up again in an upcoming sig-node meeting.
Force-pushed from 0965c20 to 8201200.
/assign

Discussed in sig-node on 4/21; will review and reflect on potential issues.

/cc
> ## Proposal

> The approach taken is similar to the container networking interface (CNI) plugin architecture. With CNI, kubelet invokes one or more CNI plugin binaries on the host to set up a Pod's networking. kubelet discovers available CNI plugins by [examining](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L242) a well-known directory (`/etc/cni/net.d`) for configuration files and [loading](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L248) plugin [descriptors](https://github.com/kubernetes/kubernetes/blob/f4db8212be53c69a27d893d6a4111422fbce8008/pkg/kubelet/dockershim/network/plugins.go#L52) upon startup.
I dealt with CNI while developing the kuryr-kubernetes CNI plugin, and I found the CNI command-line interface a bit outdated; it doesn't reflect the actual state of things, since all the CNIs I know of nowadays run as daemons (persistently, in a pod) with a command-line client that connects to the daemon. So RPC is better; what kind of RPC is another question. I liked the way podresources and device plugins were implemented. If the necessity of this feature is proven, I vote for a solution based on RPC (gRPC) over a unix domain socket.
Under Design, I have included the structs for the configuration file. There I specified three options for plugin types: binary file, unix socket and local gRPC server. We can decide if we need to support just the unix socket and remove the other two.
Unlike device plugins or CNI, there is an implication that no other pod is running on this host as part of bootstrapping the host; DNS and SDN are configured outside the kube-api model itself.
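To make the CNI analogy concrete, here is a minimal sketch of how the kubelet could discover plugin descriptors from a well-known directory at startup. The `PluginConfig` fields, file extension and directory layout are illustrative assumptions, not part of the KEP text.

```go
package admissionplugins

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// PluginConfig is an illustrative descriptor for one node-local admission
// plugin, loosely mirroring the structs discussed under Design Details.
type PluginConfig struct {
	Name string `json:"name"`
	Type string `json:"type"` // "shell", "unix-socket" or "grpc"
	Path string `json:"path"` // plugin binary or socket path
}

// DiscoverPlugins loads every *.json descriptor found in dir (e.g. the
// proposed PodAdmissionPluginDir) and returns the parsed configurations.
func DiscoverPlugins(dir string) ([]PluginConfig, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return nil, err
	}
	var plugins []PluginConfig
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			return nil, err
		}
		var cfg PluginConfig
		if err := json.Unmarshal(data, &cfg); err != nil {
			return nil, err
		}
		plugins = append(plugins, cfg)
	}
	return plugins, nil
}
```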
> The EKS user’s worker node administrative access depends on the type of worker node the EKS user chooses. EKS users have three options. The first option is to bring their own EC2 instances as worker nodes. The second option is for EKS users to launch a managed worker node group. These first two options both result in the EKS user maintaining full host-level administrative rights on the worker nodes. The final option — the option that motivated this proposal — is for the EKS user to forego worker node management entirely using AWS Fargate, a serverless computing environment. With AWS Fargate, the EKS user does not have host-level administrative access to their worker node; in fact, the worker node runs on a serverless computing platform that abstracts away the entire notion of a host.

> In building the AWS EKS support for AWS Fargate, the AWS Kubernetes engineering team faced a dilemma: how could they prevent Pods destined to run on Fargate nodes from using host networking or assuming elevated host user privileges?
Are daemonset pods targeting this node type? Is there a DNS plugin or SDN configured via a daemonset? Do things like a node exporter or similar monitoring components exclude this node type?
Currently daemonset pods don't run on these node types, but that's something we are looking into.
> The approach taken is similar to the container networking interface (CNI) plugin architecture. With CNI, kubelet invokes one or more CNI plugin binaries on the host to set up a Pod's networking. kubelet discovers available CNI plugins by [examining](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L242) a well-known directory (`/etc/cni/net.d`) for configuration files and [loading](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L248) plugin [descriptors](https://github.com/kubernetes/kubernetes/blob/f4db8212be53c69a27d893d6a4111422fbce8008/pkg/kubelet/dockershim/network/plugins.go#L52) upon startup.

> To support pluggable validation for pod admission on the worker node, we propose to have kubelet similarly discover node-local Pod admission plugins listed in a new PodAdmissionPluginDir flag.
the implication here is that dynamic kubelet configuration is disabled on managed nodes.
That's right. I guess you mean that if it's enabled, users can update the kubelet configuration and change the plugin dir?
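As a sketch of the proposed configuration surface: the flag would most likely be backed by a field on the kubelet configuration type. Only the `PodAdmissionPluginDir` name comes from the KEP; the placement and wording below are assumptions.

```go
// Hypothetical addition; the real KubeletConfiguration lives in
// k8s.io/kubelet/config/v1beta1 and has many more fields.
package v1beta1

type KubeletConfiguration struct {
	// ... existing fields elided ...

	// PodAdmissionPluginDir is the directory the kubelet scans at startup for
	// node-local pod admission plugin configuration files. Empty means the
	// feature is not used.
	// +optional
	PodAdmissionPluginDir string `json:"podAdmissionPluginDir,omitempty"`
}
```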
> A node-local Pod admission plugin has the following structure:
Similar to the kube-apiserver admission configuration, I could see wanting to enable more flexibility here outside of a file-based configuration surface on the local kubelet. Maybe the kubelet can source its admission configuration from multiple sources so future scenarios could allow node-local extension where desired.
That's a good idea. I guess you mean we shouldn't stick with just file-based configuration for reading the plugin details, but that this could also be extended in the future to read from different sources.
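A sketch of the "multiple configuration sources" idea from this thread: the kubelet would consume descriptors through a small interface, with the file-based directory scan from the earlier sketch as just one implementation. All names here are illustrative.

```go
package admissionplugins

// PluginConfigSource yields the set of node-local pod admission plugin
// descriptors, regardless of where they are stored (files, API objects, etc.).
type PluginConfigSource interface {
	Plugins() ([]PluginConfig, error)
}

// fileSource is the file-based source this KEP proposes; it simply wraps the
// directory scan sketched earlier.
type fileSource struct {
	dir string
}

func (s fileSource) Plugins() ([]PluginConfig, error) {
	return DiscoverPlugins(s.dir)
}
```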
> Kubelet will reject the Pod if any required capabilities in the Pod.Spec are not supported by the container engine running on the node. Such capabilities might include the ability to set sysctl parameters, use of elevated system privileges or use of a non-default process mount. Likewise, kubelet checks the Pod against node capabilities; for example, the presence of a specific apparmor profile or host kernel.

> These validations represent final, last-minute checks immediately before the Pod is started by the container runtime. These node-local checks differ from API-layer validations like Pod Security Policies or Validating Admission webhooks. Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled. As such, they represent a final defense against malicious actors and misconfigured Pods.
> Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled.

This doesn't make much sense to me. Why are node-local checks different?
@tallclair The node-local checks can be the same; it's just that these validations can't be removed by a cluster administrator, whereas validating webhooks or PSPs can be.
See #1712, which proposes static admission webhooks. Does that address this concern?
> Kubelet will reject the Pod if any required capabilities in the Pod.Spec are not supported by the container engine running on the node. Such capabilities might include the ability to set sysctl parameters, use of elevated system privileges or use of a non-default process mount. Likewise, kubelet checks the Pod against node capabilities; for example, the presence of a specific apparmor profile or host kernel.

> These validations represent final, last-minute checks immediately before the Pod is started by the container runtime. These node-local checks differ from API-layer validations like Pod Security Policies or Validating Admission webhooks. Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled. As such, they represent a final defense against malicious actors and misconfigured Pods.
> they represent a final defense against malicious actors and misconfigured Pods

I think it's worth keeping these two use cases separate. It makes sense to me to have some protection against misconfigured pods, since not all configuration details are available at the cluster level. However, I'm more skeptical of node-level admission offering an increase in security over cluster-level admission.
Makes sense, will update this. Also, we do perform validations in the control plane and block such misconfigured pods. But if cluster admins remove the webhook/PSP or have their own scheduler that bypasses our checks, then we need something on the node to block such pods from running.
> Amazon Elastic Kubernetes Service (EKS) provides users a managed Kubernetes control plane. EKS users are provisioned a Kubernetes cluster running on AWS cloud infrastructure. While the EKS user does not have host-level administrative access to the master nodes, it is important to point out that they do have administrative rights on that Kubernetes cluster.

> The EKS user’s worker node administrative access depends on the type of worker node the EKS user chooses. EKS users have three options. The first option is to bring their own EC2 instances as worker nodes. The second option is for EKS users to launch a managed worker node group. These first two options both result in the EKS user maintaining full host-level administrative rights on the worker nodes. The final option — the option that motivated this proposal — is for the EKS user to forego worker node management entirely using AWS Fargate, a serverless computing environment. With AWS Fargate, the EKS user does not have host-level administrative access to their worker node; in fact, the worker node runs on a serverless computing platform that abstracts away the entire notion of a host.
nit: please linewrap the whole document (I like 80 chars) so that it's easier to leave comments on parts of the paragraph.
Sorry about that. Will update the doc.
> In building the AWS EKS support for AWS Fargate, the AWS Kubernetes engineering team faced a dilemma: how could they prevent Pods destined to run on Fargate nodes from using host networking or assuming elevated host user privileges?

> The team initially investigated using a Pod Security Policy (PSP) that would prevent Pods with a Fargate scheduler type from having an elevated security context or using host networking. However, because the EKS user has administrative rights on the Kubernetes cluster, API-layer constructs such as a Pod Security Policy may be deleted, which would effectively disable the effect of that PSP. Likewise, the second solution the team landed on — using Node taints and tolerations — was similarly bound to the Kubernetes API layer, which meant EKS users could modify those Node taints and tolerations, effectively disabling the effects. A third potential solution involving OCI hooks was then investigated. OCI hooks are separate executables that an OCI-compatible container runtime invokes that can modify the behaviour of the containers in a sandbox. While this solution would have solved the API-layer problem, it introduced other issues, such as the inefficiency of downloading the container image to the Node before the OCI hook was run.
This reads like you considered some built-in policy controls that didn't work, and then jumped to building a new node-level custom policy enforcement mechanism. What about custom cluster-level policy? We already have AdmissionWebhooks for exactly that reason. If your concern is a cluster admin being able to mess with the admission webhook, then I would rather consider a statically configured admission webhook (I think this has already been proposed elsewhere?) before proposing a completely new mechanism in the kubelet.
Even with a static admission webhook, the problem is that we will put heavy load on the controllers, right? The webhook is going to reject the pod and the controller will keep creating pods. But adding this as a soft admit handler in the kubelet will not put pressure on the controllers.
Also, if I'm not wrong, users in the system:masters group can delete the validating webhook, right?
BTW, #1712 is the proposal I was referring to.

> the problem is we will put heavy load on the controllers right, because webhook is going to reject and controller will keep creating the pods.

Controllers will back off. I'm not sure if they treat a rejection on pod creation differently from a failed pod.
> "type": "shell"
> },
> {
> "type": "fargatecheck",
Suggested change:

> "type": "fargatecheck",
> "name": "fargatecheck",
> This functionality adds a new feature gate named "PodAdmissionPlugin" which decides whether to invoke the admission plugins or not.
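For illustration, the gate could be declared the same way existing kubelet feature gates are registered via `k8s.io/component-base/featuregate`. Only the gate name comes from the KEP; the registration below is a sketch.

```go
package features

import "k8s.io/component-base/featuregate"

const (
	// PodAdmissionPlugin enables node-local pod admission plugins.
	PodAdmissionPlugin featuregate.Feature = "PodAdmissionPlugin"
)

// The default feature-gate map would gain an entry like this: alpha and
// disabled by default.
var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
	PodAdmissionPlugin: {Default: false, PreRelease: featuregate.Alpha},
}
```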
> #### Kubelet to pod admission plugin communication
What is the interface for the shell type? Send over stdin, and get a response over stdout?
Yes, this is similar to how CNI operates today (passing the required parameters using environment variables and reading the response over stdout).
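A minimal sketch of that shell-type contract, assuming the kubelet passes context through environment variables, writes the pod spec to stdin, and reads a JSON verdict from stdout. Every name here (the environment variable, the `Verdict` fields) is hypothetical.

```go
package admissionplugins

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"

	v1 "k8s.io/api/core/v1"
)

// Verdict is the hypothetical response a plugin prints on stdout.
type Verdict struct {
	Admit   bool   `json:"admit"`
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

// AdmitViaShellPlugin execs one shell-type plugin, sends the pod spec on stdin
// and parses the verdict from stdout, mirroring how CNI drives its plugins.
func AdmitViaShellPlugin(binaryPath string, pod *v1.Pod) (*Verdict, error) {
	podJSON, err := json.Marshal(pod)
	if err != nil {
		return nil, err
	}

	cmd := exec.Command(binaryPath)
	cmd.Env = append(os.Environ(), "POD_ADMISSION_COMMAND=ADMIT") // hypothetical variable
	cmd.Stdin = bytes.NewReader(podJSON)

	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("plugin %s failed: %w", binaryPath, err)
	}

	var v Verdict
	if err := json.Unmarshal(out, &v); err != nil {
		return nil, fmt.Errorf("plugin %s returned malformed output: %w", binaryPath, err)
	}
	return &v, nil
}
```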
> #### Implementation detail
How does bootstrapping work? For the non-shell types, it looks like the assumption is that the server is running prior to the Kubelet?
Yes, you're right, the server should be running before the kubelet can accept pods. I was initially thinking about managing these with the same component that manages the kubelet, e.g. in a Linux environment, managing it through systemd. But the more I think about this, the shell type might be simpler for this approach: users don't have to monitor one more component on the host. Happy to hear your feedback here.
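For the non-shell path discussed here (a long-lived plugin daemon started before the kubelet, e.g. by systemd), the connection could look like the sketch below: gRPC over a unix domain socket, as suggested earlier in the review. The socket path and the admission service itself are assumptions; no such proto exists today.

```go
package admissionplugins

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialPlugin connects to a plugin daemon listening on a unix domain socket,
// e.g. /var/run/pod-admission/fargate.sock. A generated PodAdmission gRPC
// client (not shown) would then be created from the returned connection.
func dialPlugin(socketPath string) (*grpc.ClientConn, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	// grpc-go resolves "unix://" targets natively, so no custom dialer is needed.
	return grpc.DialContext(ctx, "unix://"+socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock())
}
```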
> #### Implementation detail
What are the failure modes? Fail open or fail closed? How would a failure be debugged? Would kubelet start if it couldn't connect to an admission hook?
It should be fail closed, the reason being that if we can't validate the spec and we admit the pod anyway, then whatever we are trying to protect might be violated. The kubelet can start even if it couldn't connect to an admission hook, but it will not accept pods without validating the pod spec with the plugin.
Please add these details to the KEP.
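A sketch of the fail-closed behaviour described above, adapting the external plugins to the kubelet's existing `PodAdmitHandler` shape shown earlier. The plugin dispatch, reason string and type names are assumptions.

```go
package admissionplugins

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/kubelet/lifecycle"
)

// pluginAdmitHandler runs every configured node-local plugin and fails closed:
// a plugin error results in rejection rather than an unvalidated admit.
type pluginAdmitHandler struct {
	plugins []PluginConfig
}

func (h *pluginAdmitHandler) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
	for _, p := range h.plugins {
		verdict, err := admit(p, attrs.Pod)
		if err != nil {
			// Fail closed: if a plugin cannot be reached or errors out, reject
			// the pod instead of admitting it without validation.
			return lifecycle.PodAdmitResult{
				Admit:   false,
				Reason:  "PodAdmissionPluginError",
				Message: fmt.Sprintf("plugin %q: %v", p.Name, err),
			}
		}
		if !verdict.Admit {
			return lifecycle.PodAdmitResult{Admit: false, Reason: verdict.Reason, Message: verdict.Message}
		}
	}
	return lifecycle.PodAdmitResult{Admit: true}
}

// admit dispatches on the plugin type; only the shell type sketched earlier is
// shown here.
func admit(p PluginConfig, pod *v1.Pod) (*Verdict, error) {
	switch p.Type {
	case "shell":
		return AdmitViaShellPlugin(p.Path, pod)
	default:
		return nil, fmt.Errorf("unsupported plugin type %q", p.Type)
	}
}
```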
> Another option is to enable this functionality through a feature flag "enablePodAdmissionPlugin" and have the directory path defined inside the kubelet itself.

> ### Design Details
A problem with the existing kubelet admission approach is that it can cause controllers to thrash. E.g. what if a DaemonSet controller is trying to schedule a pod on the Kubelet, and the kubelet keeps rejecting it?
You're right about this, but if a pod is rejected by a soft admit handler then this doesn't apply, right?
Yeah. Historically soft-reject was added because controllers treated a failed pod differently from an error on create. I think this has since been resolved, and controllers should properly back off on failed pods. It would be good to clarify these interactions in the KEP.
> Another option is to enable this functionality through a feature flag "enablePodAdmissionPlugin" and have the directory path defined inside the kubelet itself.

> ### Design Details
Would admission still apply to static pods?
Good question, I will look into this. Not sure if kubelet invokes admit handlers for static pods.
Whatever you conclude, please update the KEP to include it.
Looks like admit handlers are invoked on static pods too. I will update the KEP to reflect this. Thanks.
Issues go stale after 90d of inactivity. Stale issues rot after 30d of inactivity. Rotten issues close after 30d of inactivity.

@fejta-bot: Closed this PR.
We didn't go with this approach, as https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1872-manifest-based-admission-webhooks will help achieve the same result.
This is the initial proposal doc for adding "out of tree" plugin support for the pod admission handler. It is not currently possible to add additional validations (without changing kubelet code) before admitting a Pod. This PR provides flexibility for adding such validations without updating the kubelet source code.