Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate control-plane labels #1061

Open
sozercan opened this issue Jan 12, 2021 · 19 comments
Open

Deprecate control-plane labels #1061

sozercan opened this issue Jan 12, 2021 · 19 comments
Labels
enhancement New feature or request triaged

Comments

@sozercan
Copy link
Member

Describe the solution you'd like
[A clear and concise description of what you want to happen.]
control-plane labels are leftovers from kubebuilder 1.0. They can be deprecated but we should give users time to move off using these labels.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Gatekeeper version:
  • Kubernetes version: (use kubectl version):
@sozercan sozercan added the enhancement New feature or request label Jan 12, 2021
@shomron
Copy link
Contributor

shomron commented Jan 13, 2021

+1 for deprecation, however since deployment spec.selector is immutable after creation, we need to consider documentation for the upgrade process.

@maxsmythe
Copy link
Contributor

What is the normal upgrade path for something like this? Deleting the deployment and recreating?

@cristiklein
Copy link

Could someone clarify why the deprecation of the control-plane label is necessary?

It seems to me that the control-plane label is an established (although not fully "standardized") way of marking system namespace, for which admission controllers should be excluded. At least AKS advocates that approach.

We are about to contribute a PR to kubespray to add said label to all clusters created with kubespray.

Would it be possible to keep the control-plane label for webhook exclusion in Gatekeeper?

@maxsmythe
Copy link
Contributor

TBH I have no strong feelings about the existence (or lack thereof) of a control-plane label on Gatekeeper resources.

I do think that the label being under-defined is a problem. What should the significance of control-plane be? What is an appropriate value for that label to have? I think that, without an authoritative answer to this question, the notion of using this label to make meaningful decisions is potentially brittle as different projects may use it in different ways.

From a security standpoint, I'm not in love with the idea of using labels to control which resources are subject to a webhook and which are not. This makes the ability to set labels equivalent to the ability to bypass policy checks. It can make sense, but needs to be done carefully to avoid unintended consequences.

Historically, Gatekeeper had that label because it was auto-generated by Kubebuilder v1. I think it was then applied to the gatekeeper-system namespace and used as a way to bypass the G8r webhook to avoid self-management until it was replaced by the (more locked-down and explicit) admission.gatekeeper.sh/ignore label. Now it exists as a kind of vestigial appendage (or so I thought).

As I said, I am kind of "meh" about the label with a lean toward removing it just to avoid bikeshedding over label values, etc. If there is an emerging standard that should be something to consider.

Curious to hear others' opinions.

@shomron
Copy link
Contributor

shomron commented Jan 22, 2021

What is the normal upgrade path for something like this? Deleting the deployment and recreating?
Yes, I believe that is the only way. That implies a temporary lapse in enforcement as that environment rolls over to the new deployment.

Would it be possible to keep the control-plane label for webhook exclusion in Gatekeeper?

I think we're talking about several different changes all in one.

  1. Removal of the control-plane labels from the Deployment's pod selector. This is internal cleanup and reduces the chances of different kubebuilder-generated deployments in the same namespace from conflicting with each other (trying to steal each other's pods). Even if they were left in place, I think the generic value control-plane: controller-manager on the GK webhook deployment is risky.
  2. Removal of the control-plane labels from the gatekeeper-system namespace - again this is auto-generated and its significance is not defined anywhere. Gatekeeper already defines a new label with well-defined semantics for avoiding self-managing and having both set on the namespace does not provide any value I can see.
  3. Removal of the control-plane as a default exclusion in the GK reference ValidatingWebhookConfig - if we do this, we should call it out in our release notes and give users clear guidance on a safe transition and how to keep the old behavior if they want.

The control-plane label does not follow best-practices as it is not prefixed in any way. As a reference see some well-known labels documented by Kubernetes here and here. My opinion is that it is better to phase it out than to perpetuate it into other projects like kubespray.

My 2c.

@ritazh
Copy link
Member

ritazh commented Jan 23, 2021

+1 on phased approach to deprecate this.

We can also introduce a new field in the helm chart to allow users to provide this label via helm values.yaml. It can initially default to control-plane then after few releases with warnings in our release notes, we can remove it from the default values.yaml but provide steps to help guide users to set it in their own values.yaml.

@cristiklein
Copy link

cristiklein commented Jan 23, 2021

I see your point. The control-plane label is too generic and needs to be replaced with something more intuitive. I retracted the kubespray PR and we'll look for a better alternative for Compliant Kubernetes.

@sozercan
Copy link
Member Author

  1. Removal of the control-plane as a default exclusion in the GK reference ValidatingWebhookConfig - if we do this, we should call it out in our release notes and give users clear guidance on a safe transition and how to keep the old behavior if they want.

This has been removed starting with v3.1.0: #758

@ritazh
Copy link
Member

ritazh commented Apr 2, 2021

Closed via #758

@ritazh ritazh closed this as completed Apr 2, 2021
@sozercan sozercan reopened this Apr 2, 2021
@sozercan
Copy link
Member Author

sozercan commented Apr 2, 2021

@ritazh we still have control-plane references in our manifests. #758 only removed from vwh.

@stale
Copy link

stale bot commented Jul 23, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 23, 2022
@sozercan sozercan added triaged and removed wontfix This will not be worked on labels Jul 27, 2022
@tspearconquest
Copy link

tspearconquest commented Nov 14, 2022

Hi, is it currently safe as an end user to remove control-plane from the gatekeeper-system namespace? I believe that label is contributing to #2142 as per my comment on that issue. Please see comment for more detail

Edit: I believe this may also be causing https://issuetracker.google.com/issues/204835306?pli=1

@ritazh
Copy link
Member

ritazh commented Nov 14, 2022

Is it safe to remove the control-plane: controller-manager label from the gatekeeper-system namespace currently, if we have already applied the admission.gatekeeper.sh/ignore: no-self-managing label?

If you already have the admission.gatekeeper.sh/ignore: no-self-managing label, then yes it's safe to remove the control-plane label.

I think we have waited long enough to remove the control-plane label everywhere. WDYT? @sozercan @maxsmythe I can open a PR to remove it.

@srmars
Copy link

srmars commented Nov 14, 2022

I am having admission.gatekeeper.sh/ignore: no-self-managing and control-plane=controller-manager label in gatekeeper-system namespace. For testing, to fix the TLS handshake error, I tied to remove the control-plane label from gatekeeper-system namespace and its deployments. After the rolling restart of the gatekeeper deployments the control-plane labels got added automatically back to gatekeeper namespace and deployments. Do we need to update some where to make it permanent

gatekeeper version - v3.8.1

@tspearconquest
Copy link

Hi @sri53 how do you deploy Gatekeeper? Is it by Helm, or by K8S manifests, or by some other method? Also what version of Gatekeeper and Kubernetes do you deploy to?

@srmars
Copy link

srmars commented Nov 18, 2022

Hi @tspearconquest
how do you deploy Gatekeeper? Deployed through Azure Policy Add-on for AKS
what version of Gatekeeper - 3.8.1
Kubernetes version - AKS 1.23.12

@tspearconquest
Copy link

I believe Azure Policy Add-on enforces the label in your case, so if that's correct then removing it for your gatekeeper instance won't be possible until it has been removed in upstream and Azure Policy upgrades to a version which has it removed.

@tspearconquest
Copy link

Hi @ritazh - It seems my suspicion was not correct, and removing the control-plane label did not help.

It's really interesting that this is only affecting Gatekeeper, as we do have other tools with MWH and VWH which do not see this problem, and the traffic causing the errors is 100% coming from the konnectivity-agent pods in kube-system

I also took a look in konnectivity configmap and deployment manifest in one of our clusters to see if I could find a log format option, but I'm afraid I couldn't find any. My main concern is that these are not coming in json format, so it causes a lot of spam for our fluentd instance to try to parse non-json log outputs as json.

@ritazh
Copy link
Member

ritazh commented Nov 18, 2022

Thanks for the update @tspearconquest! I can dig into this a bit later and report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triaged
Projects
None yet
Development

No branches or pull requests

7 participants