Add node-level plugin support for pod admission handler #1461
Conversation
Welcome @SaranBalaji90!

Hi @SaranBalaji90. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
riking left a comment:

Should this be integrated with the CRI?
> This functionality will give cluster admins the ability to restrict specific pods to a subset of their worker nodes. This includes restricting pods that use host networking, the host's PID namespace, certain volume types, etc. Even though this can be achieved through other mechanisms such as pod security policies and taints and tolerations, there are caveats where a pod might still end up on this subset of worker nodes.

> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.
"Cluster admins could simply set the 'node' field of a pod, bypassing any constraints enforced by a scheduler or its plugins, such as taints."
It's not clear it makes sense to protect against the cluster admin's ability to circumvent security policies.
> In the future, if the kernel or docker supports new functionality and not all Kubernetes worker nodes in a cluster support that feature, we need to keep adding pod admit handlers to the Kubernetes source code to validate whether a pod can be admitted by the kubelet.

> This KEP is to discuss whether we can move these validations out of the Kubernetes source code into a separate plugin that runs outside Kubernetes. That way, we don't have to update the Kubernetes source code for every new functionality supported in the future.
A finished KEP describes an enhancement, and is not an invitation to discussion.
@riking Sorry, I meant to say this KEP proposes changes to move some of this logic out of the k8s source code. Will update this along with the implementation detail of whatever we decide on.

That's a good suggestion. I haven't looked at CRI in depth; will take a look at this.
derekwaynecarr left a comment:

I am not averse to additional plugins for kubelet admission that are not compiled with the kubelet, but I feel like the kubelet should have the function to validate the pod spec.
> ## Summary

> Today, the kubelet on the worker node performs a known set of validations and decides whether the pod can be allowed to execute on the node. The kubelet rejects a pod if any of the requirements in the pod spec are not supported by docker running on the node, for example pods that need to update specific sysctl parameters, require privilege escalation, or use a non-default proc mount. The kubelet rejects pods based on node capabilities as well, not just docker capabilities; for example, it rejects pods which require a specific AppArmor profile that is not supported by the kernel, and also based on the topology manager component's decision.
nit: s/docker/container-engine to reflect cri
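For context, the in-tree admit handlers referenced above implement roughly the following interface from `pkg/kubelet/lifecycle` (paraphrased here; see the kubelet source for the exact definitions). The proposal is essentially about letting implementations of this contract live outside the kubelet binary.

```go
package lifecycle

import v1 "k8s.io/api/core/v1"

// PodAdmitAttributes carries the pod being evaluated together with the pods
// already bound to this kubelet.
type PodAdmitAttributes struct {
	Pod       *v1.Pod
	OtherPods []*v1.Pod
}

// PodAdmitResult reports whether the pod may run on this node and, if not, why.
type PodAdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// PodAdmitHandler is consulted by the kubelet before a pod is started.
type PodAdmitHandler interface {
	Admit(attrs *PodAdmitAttributes) PodAdmitResult
}
```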
> This functionality will give cluster admins the ability to restrict specific pods to a subset of their worker nodes. This includes restricting pods that use host networking, the host's PID namespace, certain volume types, etc. Even though this can be achieved through other mechanisms such as pod security policies and taints and tolerations, there are caveats where a pod might still end up on this subset of worker nodes.

> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.
It's not clear it makes sense to protect against the cluster admin's ability to circumvent security policies.
> For example, when admins use clusters provisioned through a managed service provider, they still have full access to the cluster. They can always delete a pod security policy, or launch a pod that tolerates any taint and therefore lands on these nodes.

> In the future, if the kernel or docker supports new functionality and not all Kubernetes worker nodes in a cluster support that feature, we need to keep adding pod admit handlers to the Kubernetes source code to validate whether a pod can be admitted by the kubelet.
are there specific scenarios you have in mind?
> (I'm still working on the actual implementation but wanted to put this idea out and get initial feedback before arriving at a complete solution.)

> The initial proposal is to build something similar to the networking plugin architecture, where the kubelet invokes a CNI binary and retrieves the IP associated with the pod. In the same way, we can have a set of plugins defined on the node which indicate whether the kubelet can admit the pod. In the future we can move some of the pod admit handlers to their own plugins if required, for example a plugin specifically for docker that validates the pod spec against available docker functionality.
I think this inverts the problem: the expectation is that a node can satisfy the pod spec requirements; the container runtime is a detail.
We discussed this during the Jan 21st sig-node meeting. Will update the KEP and bring it up again in an upcoming sig-node meeting.
Force-pushed from 0965c20 to 8201200.
/assign

Discussed in sig-node on 4/21; will review and reflect on potential issues.

/cc
> ## Proposal

> The approach taken is similar to the container networking interface (CNI) plugin architecture. With CNI, kubelet invokes one or more CNI plugin binaries on the host to set up a Pod's networking. kubelet discovers available CNI plugins by [examining](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L242) a well-known directory (`/etc/cni/net.d`) for configuration files and [loading](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L248) plugin [descriptors](https://github.com/kubernetes/kubernetes/blob/f4db8212be53c69a27d893d6a4111422fbce8008/pkg/kubelet/dockershim/network/plugins.go#L52) upon startup.
I dealt with CNI while developing the kuryr-kubernetes CNI plugin, and I found the CNI command-line interface a bit outdated; it doesn't reflect the actual state of things, since all the CNIs I know of nowadays run as daemons (persistently, in a pod) with a command-line client that connects to the daemon. So RPC is better; what kind of RPC is another question. I liked the way podresources and device plugins were implemented. If the necessity of this feature is proven, I vote for a solution based on RPC (gRPC) over a unix domain socket.
Under Design, I have included the structs for the configuration file. There I specified three options for plugin types: binary file, unix socket and local gRPC server. We can decide if we need to support just the unix socket and remove the other two.
Unlike device plugins or CNI, there is an implication that no other pod is running on this host as part of bootstrapping the host; DNS and SDN are configured outside the kube-api model itself.
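To make the CNI analogy concrete, here is a minimal sketch of how the kubelet could discover plugin descriptors from a well-known directory at startup. The `PluginConfig` fields, file extension and directory layout are illustrative assumptions, not part of the KEP text.

```go
package admissionplugins

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// PluginConfig is an illustrative descriptor for one node-local admission
// plugin, loosely mirroring the structs discussed under Design Details.
type PluginConfig struct {
	Name string `json:"name"`
	Type string `json:"type"` // "shell", "unix-socket" or "grpc"
	Path string `json:"path"` // plugin binary or socket path
}

// DiscoverPlugins loads every *.json descriptor found in dir (e.g. the
// proposed PodAdmissionPluginDir) and returns the parsed configurations.
func DiscoverPlugins(dir string) ([]PluginConfig, error) {
	files, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return nil, err
	}
	var plugins []PluginConfig
	for _, f := range files {
		data, err := os.ReadFile(f)
		if err != nil {
			return nil, err
		}
		var cfg PluginConfig
		if err := json.Unmarshal(data, &cfg); err != nil {
			return nil, err
		}
		plugins = append(plugins, cfg)
	}
	return plugins, nil
}
```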
> The EKS user’s worker node administrative access depends on the type of worker node the EKS user chooses. EKS users have three options. The first option is to bring their own EC2 instances as worker nodes. The second option is for EKS users to launch a managed worker node group. These first two options both result in the EKS user maintaining full host-level administrative rights on the worker nodes. The final option — the option that motivated this proposal — is for the EKS user to forego worker node management entirely using AWS Fargate, a serverless computing environment. With AWS Fargate, the EKS user does not have host-level administrative access to their worker node; in fact, the worker node runs on a serverless computing platform that abstracts away the entire notion of a host.

> In building the AWS EKS support for AWS Fargate, the AWS Kubernetes engineering team faced a dilemma: how could they prevent Pods destined to run on Fargate nodes from using host networking or assuming elevated host user privileges?
Are daemonset pods targeting this node type? Is there a DNS plugin or SDN configured via a daemonset? Do things like a node exporter or similar monitoring components exclude this node type?
Currently daemonset pods don't run on these node types, but that's something we are looking into.
> The approach taken is similar to the container networking interface (CNI) plugin architecture. With CNI, kubelet invokes one or more CNI plugin binaries on the host to set up a Pod's networking. kubelet discovers available CNI plugins by [examining](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L242) a well-known directory (`/etc/cni/net.d`) for configuration files and [loading](https://github.com/kubernetes/kubernetes/blob/dd5272b76f07bea60628af0bb793f3cca385bf5e/pkg/kubelet/dockershim/docker_service.go#L248) plugin [descriptors](https://github.com/kubernetes/kubernetes/blob/f4db8212be53c69a27d893d6a4111422fbce8008/pkg/kubelet/dockershim/network/plugins.go#L52) upon startup.

> To support pluggable validation for pod admission on the worker node, we propose to have kubelet similarly discover node-local Pod admission plugins listed in a new PodAdmissionPluginDir flag.
the implication here is that dynamic kubelet configuration is disabled on managed nodes.
That's right. I guess you mean that if it's enabled, users can update the kubelet configuration and change the plugin dir?
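As a sketch of the proposed configuration surface: the flag would most likely be backed by a field on the kubelet configuration type. Only the `PodAdmissionPluginDir` name comes from the KEP; the placement and wording below are assumptions.

```go
// Hypothetical addition; the real KubeletConfiguration lives in
// k8s.io/kubelet/config/v1beta1 and has many more fields.
package v1beta1

type KubeletConfiguration struct {
	// ... existing fields elided ...

	// PodAdmissionPluginDir is the directory the kubelet scans at startup for
	// node-local pod admission plugin configuration files. Empty means the
	// feature is not used.
	// +optional
	PodAdmissionPluginDir string `json:"podAdmissionPluginDir,omitempty"`
}
```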
> A node-local Pod admission plugin has the following structure:
Similar to the kube-apiserver admission configuration, I could see wanting to enable more flexibility here outside of a file-based configuration surface on the local kubelet. Maybe the kubelet can source its admission configuration from multiple sources so future scenarios could allow node-local extension where desired.
That's a good idea. I guess you mean we shouldn't stick with just file-based configuration for reading the plugin details, but that this could also be extended in the future to read from different sources.
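A sketch of the "multiple configuration sources" idea from this thread: the kubelet would consume descriptors through a small interface, with the file-based directory scan from the earlier sketch as just one implementation. All names here are illustrative.

```go
package admissionplugins

// PluginConfigSource yields the set of node-local pod admission plugin
// descriptors, regardless of where they are stored (files, API objects, etc.).
type PluginConfigSource interface {
	Plugins() ([]PluginConfig, error)
}

// fileSource is the file-based source this KEP proposes; it simply wraps the
// directory scan sketched earlier.
type fileSource struct {
	dir string
}

func (s fileSource) Plugins() ([]PluginConfig, error) {
	return DiscoverPlugins(s.dir)
}
```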
> Kubelet will reject the Pod if any required capabilities in the Pod.Spec are not supported by the container engine running on the node. Such capabilities might include the ability to set sysctl parameters, use of elevated system privileges or use of a non-default process mount. Likewise, kubelet checks the Pod against node capabilities; for example, the presence of a specific apparmor profile or host kernel.

> These validations represent final, last-minute checks immediately before the Pod is started by the container runtime. These node-local checks differ from API-layer validations like Pod Security Policies or Validating Admission webhooks. Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled. As such, they represent a final defense against malicious actors and misconfigured Pods.
> Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled.

This doesn't make much sense to me. Why are node-local checks different?
@tallclair The node-local checks can be the same; it's just that these validations can't be removed by a cluster administrator, whereas validating webhooks or PSPs can be.
See #1712, which proposes static admission webhooks. Does that address this concern?
> Kubelet will reject the Pod if any required capabilities in the Pod.Spec are not supported by the container engine running on the node. Such capabilities might include the ability to set sysctl parameters, use of elevated system privileges or use of a non-default process mount. Likewise, kubelet checks the Pod against node capabilities; for example, the presence of a specific apparmor profile or host kernel.

> These validations represent final, last-minute checks immediately before the Pod is started by the container runtime. These node-local checks differ from API-layer validations like Pod Security Policies or Validating Admission webhooks. Whereas the latter may be deactivated or removed by Kubernetes cluster administrators, the former node-local checks cannot be disabled. As such, they represent a final defense against malicious actors and misconfigured Pods.
> they represent a final defense against malicious actors and misconfigured Pods

I think it's worth keeping these two use cases separate. It makes sense to me to have some protection against misconfigured pods, since not all configuration details are available at the cluster level. However, I'm more skeptical of node-level admission offering an increase in security over cluster-level admission.
Makes sense, will update this. Also, we do perform validations in the control plane and block such misconfigured pods. But if cluster admins remove the webhook/PSP or have their own scheduler that bypasses our checks, then we need something on the node to block such pods from running.
> Amazon Elastic Kubernetes Service (EKS) provides users a managed Kubernetes control plane. EKS users are provisioned a Kubernetes cluster running on AWS cloud infrastructure. While the EKS user does not have host-level administrative access to the master nodes, it is important to point out that they do have administrative rights on that Kubernetes cluster.

> The EKS user’s worker node administrative access depends on the type of worker node the EKS user chooses. EKS users have three options. The first option is to bring their own EC2 instances as worker nodes. The second option is for EKS users to launch a managed worker node group. These first two options both result in the EKS user maintaining full host-level administrative rights on the worker nodes. The final option — the option that motivated this proposal — is for the EKS user to forego worker node management entirely using AWS Fargate, a serverless computing environment. With AWS Fargate, the EKS user does not have host-level administrative access to their worker node; in fact, the worker node runs on a serverless computing platform that abstracts away the entire notion of a host.
nit: please linewrap the whole document (I like 80 chars) so that it's easier to leave comments on parts of the paragraph.
Sorry about that. Will update the doc.
> In building the AWS EKS support for AWS Fargate, the AWS Kubernetes engineering team faced a dilemma: how could they prevent Pods destined to run on Fargate nodes from using host networking or assuming elevated host user privileges?

> The team initially investigated using a Pod Security Policy (PSP) that would prevent Pods with a Fargate scheduler type from having an elevated security context or using host networking. However, because the EKS user has administrative rights on the Kubernetes cluster, API-layer constructs such as a Pod Security Policy may be deleted, which would effectively disable the effect of that PSP. Likewise, the second solution the team landed on — using Node taints and tolerations — was similarly bound to the Kubernetes API layer, which meant EKS users could modify those Node taints and tolerations, effectively disabling the effects. A third potential solution involving OCI hooks was then investigated. OCI hooks are separate executables that an OCI-compatible container runtime invokes that can modify the behaviour of the containers in a sandbox. While this solution would have solved the API-layer problem, it introduced other issues, such as the inefficiency of downloading the container image to the Node before the OCI hook was run.
This reads like you considered some built-in policy controls that didn't work, and then jumped to building a new node-level custom policy enforcement mechanism. What about custom cluster-level policy? We already have AdmissionWebhooks for exactly that reason. If your concern is a cluster admin being able to mess with the admission webhook, then I would rather consider a statically configured admission webhook (I think this has already been proposed elsewhere?) before proposing a completely new mechanism in the kubelet.
Even with a static admission webhook, the problem is that we will put heavy load on the controllers, right? The webhook is going to reject the pod and the controller will keep creating pods. But adding this as a soft admit handler in the kubelet will not put pressure on the controllers.
Also, if I'm not wrong, users in the system:masters group can delete the validating webhook, right?
BTW, #1712 is the proposal I was referring to.

> the problem is we will put heavy load on the controllers right, because webhook is going to reject and controller will keep creating the pods.

Controllers will back off. I'm not sure if they treat a rejection on pod creation differently from a failed pod.
> "type": "shell"
> },
> {
> "type": "fargatecheck",
Suggested change:

> "type": "fargatecheck",
> "name": "fargatecheck",
> This functionality adds a new feature gate named "PodAdmissionPlugin" which decides whether to invoke the admission plugins or not.
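For illustration, the gate could be declared the same way existing kubelet feature gates are registered via `k8s.io/component-base/featuregate`. Only the gate name comes from the KEP; the registration below is a sketch.

```go
package features

import "k8s.io/component-base/featuregate"

const (
	// PodAdmissionPlugin enables node-local pod admission plugins.
	PodAdmissionPlugin featuregate.Feature = "PodAdmissionPlugin"
)

// The default feature-gate map would gain an entry like this: alpha and
// disabled by default.
var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
	PodAdmissionPlugin: {Default: false, PreRelease: featuregate.Alpha},
}
```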
> #### Kubelet to pod admission plugin communication
What is the interface for the shell type? Send over stdin, and get a response over stdout?
Yes, this is similar to how CNI operates today (passing the required parameters using environment variables and reading the response over stdout).
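A minimal sketch of that shell-type contract, assuming the kubelet passes context through environment variables, writes the pod spec to stdin, and reads a JSON verdict from stdout. Every name here (the environment variable, the `Verdict` fields) is hypothetical.

```go
package admissionplugins

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"

	v1 "k8s.io/api/core/v1"
)

// Verdict is the hypothetical response a plugin prints on stdout.
type Verdict struct {
	Admit   bool   `json:"admit"`
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

// AdmitViaShellPlugin execs one shell-type plugin, sends the pod spec on stdin
// and parses the verdict from stdout, mirroring how CNI drives its plugins.
func AdmitViaShellPlugin(binaryPath string, pod *v1.Pod) (*Verdict, error) {
	podJSON, err := json.Marshal(pod)
	if err != nil {
		return nil, err
	}

	cmd := exec.Command(binaryPath)
	cmd.Env = append(os.Environ(), "POD_ADMISSION_COMMAND=ADMIT") // hypothetical variable
	cmd.Stdin = bytes.NewReader(podJSON)

	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("plugin %s failed: %w", binaryPath, err)
	}

	var v Verdict
	if err := json.Unmarshal(out, &v); err != nil {
		return nil, fmt.Errorf("plugin %s returned malformed output: %w", binaryPath, err)
	}
	return &v, nil
}
```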
> #### Implementation detail
How does bootstrapping work? For the non-shell types, it looks like the assumption is that the server is running prior to the Kubelet?
Yes, you're right, the server should be running before the kubelet can accept pods. I was initially thinking about managing these with the same component that manages the kubelet, e.g. in a Linux environment, managing it through systemd. But the more I think about this, the shell type might be simpler for this approach: users don't have to monitor one more component on the host. Happy to hear your feedback here.
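For the non-shell path discussed here (a long-lived plugin daemon started before the kubelet, e.g. by systemd), the connection could look like the sketch below: gRPC over a unix domain socket, as suggested earlier in the review. The socket path and the admission service itself are assumptions; no such proto exists today.

```go
package admissionplugins

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialPlugin connects to a plugin daemon listening on a unix domain socket,
// e.g. /var/run/pod-admission/fargate.sock. A generated PodAdmission gRPC
// client (not shown) would then be created from the returned connection.
func dialPlugin(socketPath string) (*grpc.ClientConn, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	// grpc-go resolves "unix://" targets natively, so no custom dialer is needed.
	return grpc.DialContext(ctx, "unix://"+socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock())
}
```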
> #### Implementation detail
What are the failure modes? Fail open or fail closed? How would a failure be debugged? Would kubelet start if it couldn't connect to an admission hook?
It should be fail closed, the reason being that if we can't validate the spec and we admit the pod anyway, then whatever we are trying to protect might be violated. The kubelet can start even if it couldn't connect to an admission hook, but it will not accept pods without validating the pod spec with the plugin.
Please add these details to the KEP.
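A sketch of the fail-closed behaviour described above, adapting the external plugins to the kubelet's existing `PodAdmitHandler` shape shown earlier. The plugin dispatch, reason string and type names are assumptions.

```go
package admissionplugins

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/kubelet/lifecycle"
)

// pluginAdmitHandler runs every configured node-local plugin and fails closed:
// a plugin error results in rejection rather than an unvalidated admit.
type pluginAdmitHandler struct {
	plugins []PluginConfig
}

func (h *pluginAdmitHandler) Admit(attrs *lifecycle.PodAdmitAttributes) lifecycle.PodAdmitResult {
	for _, p := range h.plugins {
		verdict, err := admit(p, attrs.Pod)
		if err != nil {
			// Fail closed: if a plugin cannot be reached or errors out, reject
			// the pod instead of admitting it without validation.
			return lifecycle.PodAdmitResult{
				Admit:   false,
				Reason:  "PodAdmissionPluginError",
				Message: fmt.Sprintf("plugin %q: %v", p.Name, err),
			}
		}
		if !verdict.Admit {
			return lifecycle.PodAdmitResult{Admit: false, Reason: verdict.Reason, Message: verdict.Message}
		}
	}
	return lifecycle.PodAdmitResult{Admit: true}
}

// admit dispatches on the plugin type; only the shell type sketched earlier is
// shown here.
func admit(p PluginConfig, pod *v1.Pod) (*Verdict, error) {
	switch p.Type {
	case "shell":
		return AdmitViaShellPlugin(p.Path, pod)
	default:
		return nil, fmt.Errorf("unsupported plugin type %q", p.Type)
	}
}
```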
> Another option is to enable this functionality through a feature flag "enablePodAdmissionPlugin" and have the directory path defined inside the kubelet itself.

> ### Design Details
A problem with the existing kubelet admission approach is that it can cause controllers to thrash. E.g. what if a DaemonSet controller is trying to schedule a pod on the Kubelet, and the kubelet keeps rejecting it?
You're right about this, but if a pod is rejected by a soft admit handler then this doesn't apply, right?
Yeah. Historically soft-reject was added because controllers treated a failed pod differently from an error on create. I think this has since been resolved, and controllers should properly back off on failed pods. It would be good to clarify these interactions in the KEP.
> Another option is to enable this functionality through a feature flag "enablePodAdmissionPlugin" and have the directory path defined inside the kubelet itself.

> ### Design Details
Would admission still apply to static pods?
Good question, I will look into this. Not sure if kubelet invokes admit handlers for static pods.
Whatever you conclude, please update the KEP to include it.
Looks like admit handlers are invoked on static pods too. I will update the KEP to reflect this. Thanks.
Issues go stale after 90d of inactivity. Stale issues rot after 30d of inactivity. Rotten issues close after 30d of inactivity.

@fejta-bot: Closed this PR.
We didn't go with this approach, as https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1872-manifest-based-admission-webhooks will help achieve the same result.
This is the initial proposal doc for adding "out of tree" plugin support for the pod admission handler. It is not currently possible to add additional validations (without changing kubelet code) before admitting a Pod. This PR provides flexibility for adding such validations without updating the kubelet source code.