Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax namespace restriction for critical pods #76310

Open
wants to merge 3 commits into
base: master
from

Conversation

@ravisantoshgudimetla
Copy link
Contributor

commented Apr 9, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

I think, we should relax the restriction on critical pods to be created within kube-system, I have outlined an approach which ensures that, initial quota will be restricted to kube-system namespace which can later be expanded to other namespaces(https://github.com/kubernetes/kubernetes/pull/76310/files#diff-4e1245d9faac6e47c01c1c068f4de517R199). Let me know, if I have missed something.

Which issue(s) this PR fixes:

Fixes #76308

Special notes for your reviewer:
/cc @bsalamat @vikaschoudhary16

Does this PR introduce a user-facing change?:

Allow critical pods to be created within namespaces other than kube-system
@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor Author

commented Apr 9, 2019

/sig scheduling
/kind bug

return admission.NewForbidden(a, fmt.Errorf("pods with %v priorityClass is not permitted in %v namespace", pcName, a.GetNamespace()))
// If ResourceQuotaScopeSelectors is enabled, we should let critical priorityClass to be created
// any namespace where administrator wants it to be created.
// TODO: @ravig Add a default quota at cluster bootstrap which limits critical priorityClasses

This comment has been minimized.

Copy link
@vikaschoudhary16

vikaschoudhary16 Apr 9, 2019

Member

I think we would desire that quota is required mandatorily to be present in a namespace for allowing SystemPriorityClasses to be created in that namespace. To enforce this, AdmissionConfiguration object with matching scopes for SystemPriorityClasses must be configured.

Then to allow creation, quota can be created in desired namespaces.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla Apr 9, 2019

Author Contributor

Thnx, @vikaschoudhary16, yes that's what I am planning to do. Will create an admission config obj with SystemPriorityClasses in kube-system namespace at the bootstrap phase so that we are backwards compatible.

/cc @bsalamat

This comment has been minimized.

Copy link
@bsalamat

bsalamat Apr 9, 2019

Member

The problem with this approach is that users will need to add extra configuration to their clusters when they enable ResourceQuotaScopeSelectors. If they don't, this change may create a security gap in their clusters.
Whatever solution we propose here should not need extra configuration by users.

@bsalamat
Copy link
Member

left a comment

This restriction was an informed decision that we made to prevent a potential vulnerability where a user may create a lot of high priority pods that would preempt system critical components. I still believe that other pods in the system shouldn't be run with those predefined priority classes reserved for system critical components. Why pods in other namespaces need to run with these special priority classes instead of running with a user defined priority that is higher than any other priority, but lower than the system critical priorities?

// any namespace where administrator wants it to be created.
// TODO: @ravig Add a default quota at cluster bootstrap which limits critical priorityClasses
// to be in kube-system namespace. This can be modified later.
if !utilfeature.DefaultFeatureGate.Enabled(features.ResourceQuotaScopeSelectors) {

This comment has been minimized.

Copy link
@bsalamat

bsalamat Apr 9, 2019

Member

Our recent benchmarks show that checking feature gates is relatively slow. We should avoid checking feature gates here. Instead we should check the feature gates at the time of plugin creation and store it in a member variable and use that variable instead.

@trdavi2

This comment has been minimized.

Copy link

commented Apr 24, 2019

Is there any estimate as to when this will be merged and released? My team and I could really use it

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from f989af8 to c00ebc2 May 13, 2019

@k8s-ci-robot k8s-ci-robot added size/XL and removed size/S labels May 13, 2019

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from c00ebc2 to b392aa2 May 13, 2019

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/XL labels May 13, 2019

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from b392aa2 to 9dd6107 May 13, 2019

@k8s-ci-robot k8s-ci-robot added size/L and removed size/XXL labels May 13, 2019

lister schedulingv1listers.PriorityClassLister
client kubernetes.Interface
lister schedulingv1listers.PriorityClassLister
features featuregate.FeatureGate

This comment has been minimized.

Copy link
@deads2k

deads2k May 28, 2019

Contributor

no. Use the bools for the things you want to check and assign them.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 28, 2019

Author Contributor

Looking at this comment #76310 (comment), I made featuregates part of plugin, we don't assign here rather check for those values. If we set it as bool I cannot think of cleaner way to get status for all the featuregates we need inside the admission plugin.

@@ -42,6 +43,7 @@ func Register(plugins *admission.Plugins) {
// load the configuration provided (if any)
configuration, err := LoadConfiguration(config)
if err != nil {
klog.Infof("Error while loading default config %v", err)

This comment has been minimized.

Copy link
@deads2k

deads2k May 28, 2019

Contributor

this looks extraneous. Why isn't it normally reported?

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 28, 2019

Author Contributor

Ya, it wasn't showing up in my log when testing locally. I did not debug it though. I will revert this for now and come back to it after debugging.

// LoadConfiguration loads the provided configuration.
func LoadConfiguration(config io.Reader) (*resourcequotaapi.Configuration, error) {
// if no config is provided, return a default configuration
if config == nil {
externalConfig := &resourcequotav1beta1.Configuration{}
scheme.Default(externalConfig)

This comment has been minimized.

Copy link
@deads2k

deads2k May 28, 2019

Contributor

did you just lose defaulting?

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 28, 2019

Author Contributor

I did that purposefully. I did not notice any defaulterFuncs for this type. Not sure, if this is needed for every type? Do we need to?

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 29, 2019

Author Contributor

Added it back.

@@ -63,5 +79,7 @@ func LoadConfiguration(config io.Reader) (*resourcequotaapi.Configuration, error
if !ok {
return nil, fmt.Errorf("unexpected type: %T", decodedObj)
}
// append default quota admission config
resourceQuotaConfiguration.LimitedResources = append(resourceQuotaConfiguration.LimitedResources, defaultLimitedResources()...)

This comment has been minimized.

Copy link
@deads2k

deads2k May 28, 2019

Contributor

ux on this is weird. Is it a default or is requiredLimitedResources

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 28, 2019

Author Contributor

So, if the user provides some limitedResources this value will be appended. It's not a default.

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from 2e3c3ad to 7f2501a May 28, 2019

if err != nil {
return fmt.Errorf("unable to add default system priority classes: %v", err)
// Unable to get the default resource quota for reasons other than "not found".
klog.Warning("Unable to get system default resourceQuota %v: %v. Retrying..", DefaultSystemQuota, err)

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 28, 2019

Author Contributor

Do you want this to be logged at V(5) and above @deads2k?

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch 3 times, most recently from b85a45b to ef7cd19 May 28, 2019

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from ef7cd19 to 9b3a2d6 May 28, 2019

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

/retest

}

// newFeatureGates returns the list of needed featuregates for this plugin to be used.
func newFeatureGates() featureGateStatus {

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 29, 2019

Author Contributor

@deads2k - Are you fine with this initialization?

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from e56ad66 to c36eba3 May 29, 2019

@timmycarr

This comment has been minimized.

Copy link

commented May 29, 2019

Hi! Friendly reminder from your release team: we are starting the code freeze for 1.15 tomorrow EOD. Looks like we have movement here. Can I get quick confirmation that this is still planned for the 1.15 cycle?

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

/retest

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

Can I get quick confirmation that this is still planned for the 1.15 cycle?

It's still planned for 1.15

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from c36eba3 to 9a5aea2 May 29, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented May 29, 2019

@ravisantoshgudimetla: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e 9dd6107 link /test pull-kubernetes-local-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ravisantoshgudimetla

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2019

/retest

}
}

func AddSystemPriorityClasses(hookContext genericapiserver.PostStartHookContext) error {

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

I don't think this function needs to be public.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

Sure, I can make these functions private.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

Done..

// This quota can be deleted/updated by cluster-admin. If the cluster-admin wants critical pods to be created in
// namespace other than `kube-system`, he/she can configure quotas which allows these critical pods to be created
// in that namespace.
func AddDefaultSystemQuota(hookContext genericapiserver.PostStartHookContext) error {

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

This function can be private too.

return admission.NewForbidden(a, fmt.Errorf("pods with %v priorityClass is not permitted in %v namespace", pcName, a.GetNamespace()))
// If ResourceQuotaScopeSelectors is enabled, we should let pods with critical priorityClass to be created
// any namespace where administrator wants it to be created.
if !p.featureGates.resourceQuotaFeatureGateEnabled {

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

This is a risky change. In our current code, we protect clusters from users who may create pods with system-critical priority causing preemption of important workloads. The protection is enabled regardless of existence of quota or whether the quota feature is enabled. With this change, users can create pods with system-critical priority if quota is disabled.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

You mean users can create pods with system-critical priority in any namespace when resourceQuotaScopeSelectors is disabled? If yes, I don't think so - You cannot create critical pods in any namespace other than kube-system, if you try to create cluster-critical pods, it would throw an error because of quota violation with current change. If resourcequota is disabled, we will go back to priority admission controller rejecting pod creation in namespaces other than kube-system namespace.

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

You are right. I missed it initially.

// append pods as limited resources so that we can limit the number of critical pods to be created. We have
// a matching default quota in `kube-system` namespace which allows unlimited pods to be created in that
// namespace
config.LimitedResources = append(config.LimitedResources, AddCriticalPodLimitedResources()...)

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

What if there is an existing and conflicting LimitedResource config? What would be the behavior of ResourceQuota?

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

It may not happen here since we're creating config object here but it can certainly happen at https://github.com/kubernetes/kubernetes/pull/76310/files#diff-468653fefaaca80a88656dcb46002defR87. I thought about this but don't have a clean solution atm because of combinations possible.

This comment has been minimized.

Copy link
@bsalamat

bsalamat May 30, 2019

Member

Yeah, those conflicts could cause surprises. For that reason, I am not comfortable with this change at the moment. I think more checks are needed and if conflicting LimitedResources exist, we should either merge them, or at least not add the new rules and throw errors.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

if conflicting LimitedResources exist, we should either merge them, or at least not add the new rules and throw errors.

So to be clear, even if we have conflicting LimitedResources it won't cause an error, as long as we have a matching limitedresource we would proceed with allowing system-critical-pods to be created in kube-system namespace. The only thing that concerns me is should we allow that to happen or not(the combinations I was talking about earlier) and it might not be transparent to cluster-admin as to why this is happening but this has been age old problem and clear documentation on why this happens might solve the problem.

I updated the integration test to have conflicting LimitedResource operators(In, NotIn) and still the test passes - https://github.com/kubernetes/kubernetes/pull/76310/files#diff-4b71ba6f24c40153543bb61e06d2e54cR403

This comment has been minimized.

Copy link
@derekwaynecarr

derekwaynecarr May 30, 2019

Member

i am not sure i see an immediate conflict, by definition, a pod with system-critical or system-node critical priority was a limited resource in that it could only be created in one location with an implied quota. this is basically codifying that rule in the configuration.

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

I agree with you. No one wants to create or would have created a config which tells I want my critical-pods to be created in any namespace.

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from 9a5aea2 to 8f4a8a5 May 30, 2019

@ravisantoshgudimetla ravisantoshgudimetla force-pushed the ravisantoshgudimetla:fix-priority-quota branch from 8f4a8a5 to 81e9e61 May 30, 2019

client := clientset.NewForConfigOrDie(hookContext.LoopbackClientConfig)
_, err = client.CoreV1().ResourceQuotas(metav1.NamespaceSystem).Get(DefaultSystemQuota, metav1.GetOptions{})
if apierrors.IsNotFound(err) {
_, err := client.CoreV1().ResourceQuotas(metav1.NamespaceSystem).Create(systemDefaultQuota())

This comment has been minimized.

Copy link
@derekwaynecarr

derekwaynecarr May 30, 2019

Member

doesnt this prevent the user from deleting the quota if its just recreated?

This comment has been minimized.

Copy link
@ravisantoshgudimetla

ravisantoshgudimetla May 30, 2019

Author Contributor

I don't think so, the re-creation happens during bootstrap stage where user cannot interact with the api-server yet. Once the bootstrap is complete, users will be able to delete the quota.

// append pods as limited resources so that we can limit the number of critical pods to be created. We have
// a matching default quota in `kube-system` namespace which allows unlimited pods to be created in that
// namespace
config.LimitedResources = append(config.LimitedResources, AddCriticalPodLimitedResources()...)

This comment has been minimized.

Copy link
@derekwaynecarr

derekwaynecarr May 30, 2019

Member

i am not sure i see an immediate conflict, by definition, a pod with system-critical or system-node critical priority was a limited resource in that it could only be created in one location with an implied quota. this is basically codifying that rule in the configuration.

@soggiest

This comment has been minimized.

Copy link
Contributor

commented Jun 3, 2019

/milestone v1.16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.