Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-4040: Controller name #4073

Closed
wants to merge 1 commit into from

Conversation

mwielgus
Copy link
Contributor

@mwielgus mwielgus commented Jun 9, 2023

  • One-line PR description: Establishes controllerName field and adds it to Job and HPA.
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 9, 2023
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 9, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mwielgus
Once this PR has been reviewed and has the lgtm label, please assign johnbelamaric, kow3ns for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

keps/sig-apps/4040-controller-name/README.md Show resolved Hide resolved

### Risks and Mitigations

Objects handled by custom controllers may have different field value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth giving an example. Here is one for Job: .status.failed higher than .spec.backoffLimit, and the Job is not marked as Failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


## Design Details

The supported API controllers will simply skip API objects that have non-default `controllerName` field.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about defaulting from apiserver?

Would it set the value to default?

One lesson learned from schedulerName is that we shouldn't do it at the defaulting layer. We should do it in the strategy kubernetes/kubernetes#93067

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: ControllerName
- Components depending on the feature gate: kube-controller-manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apiserver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

kep-number: 4040
authors:
- "@mwielgus"
owning-sig: sig-apps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

participating-sigs:
  - sig-autoscaling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


## Proposal

Add `controllerName` at the top level of Spec of API objects to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the endpointslice controller we are using a label "endpointslice.kubernetes.io/managed-by"

https://github.com/kubernetes/kubernetes/blob/18d05b646d09b2971dc5400bc288062b0414e8cf/staging/src/k8s.io/api/discovery/v1/well_known_labels.go#L27

I see it in alternatives but is there any special reason why label were discarded?

labels also have the benefit that are already supported by informers to filter information, that makes it easier to support environments with multiple controllers

@thockin should we try to be consistent across all the APIs about this feature? maybe is more a sig-arch topic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think managed-by is different pattern - EndpointSlice is more or less "dumb data", and not produced by humans (usually). Job and HPA have a lot more "active" semantic and are directly produced by humans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounded like a quite important piece of API so I went with a dedicated field, but the whole idea would also work with a label.


## Summary

This feature unblocks custom handling for selected standard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Handling" doesn't tell me anything about this KEP - handling what? I think "custom implementations" is clearer

Currently Kubernetes users, who need custom behavior for
some API objects, can either:

* disable the corresponding controller in controller-manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone do this? Is this a thing I just don't know about?

Most of our APIs were not really designed as robust abstractions, they are more like configuration files for the core implementations. Making them into abstractions ex post facto seems high-risk. I am commenting before reading the whole thing, but I want to really understand the reward that comes with this risk, versus a new custom API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if anyone actually does it as it is super inconvenient - I'm exploring options to change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone do this? Is this a thing I just don't know about?

@thockin - yes, see #3901


* duplicate/fork the API as CRD.

However, on managed environments (like, for example, GKE),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't tell me why this is a bad answer.


### Goals

* Establish a simple and common method to mark an API object as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the custom-scheduler is a poor analogy. This is more like PersistentVolumes or Ingress or Gateways - APIs which are defined as an abstraction meant to cover a wide array of implementations. We already have a pattern for this - FoobarClass. It comes with some overhead, but it's the result of trying this several times and finally landing on an acceptable balance of complexity and flexibility.

Gateway goes even further in that it was designed, from the start, for extension at each layer, but I don't know if that is warranted here.

In short:

Foobar.spec.foobarClassName is a string that names a FoobarClass resource.

The FoobarClass type includes a controller string and a parameters stanza, which references a custom resource of some type that the named controller can comprehend.

Usually one FoobarClass is designated the default (via annotation) and any CREATE Foobar operation which does not specify a class gets that.

Users can kubectl get foobarclass to see what is available, or they can ignore it.

Cluster admins can install any number of classes with various parameter sets, and can name them as they see fit.

Cluster providers do not need to do anything in particular to enable this, but MAY pre-populate some classes for their own environment (e.g. cloud provider might define N different GatewayClasses to correspond to LB infra).

To be clear - there's significant complexity buried here, and ample opportunities for mistakes (though nothing that doesn't exist in the simpler model proposed here). Status gets complicated. Defaulting and validation gets complicated. Nothing ensures that a given controller is actually running, so UX gets complicated. Requiring a CR for params is complicated (until all the declarative validation lands). But it's "the devil we know".

So, if we are going to do anything like this, I would want to understand why we would not follow the established pattern.

Also, the fact that we DON'T have a way to kubectl get schedulerclass (or something) makes custom schedulers less interesting.


* While the KEP enables providing custom handling for standard
K8S APIs,
it doesn't focus on any customization or additional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to leave the params out AS LONG AS we know how we will do it when it (inevitably) comes up.


#### Story 1

I want to run a modified, custom version of ControllerManager in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are those still Jobs and not MyCustomJob ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it work with whatever tools are for the original Job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.


#### Story 2

I want to provide a significantly different implementation of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy this story. You could use annotations for this, which we did in Service for a while and ultimately added a field for it because annotations suck. That said, we half-assed it and DIDN'T add a class resource, because we know that Gateway is the real path forward (obviates much of Service API).

So I actually buy HPAClass as an idea, though I wonder about how much the API will need to evolve to provide a "significantly different implementation". Job is less obvious to me.

combinations than handled by the default controller manager. These
combinations, while correct from the API validation perspective,
may project logical error or simply be not understood by dependant
code, and, as a result, potentially crash some external components.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go this route there needs to be a firm contract and conformance testing (mayber not k8s conformance but "are you a conformant HPA implementation". I think "may crash components" is not good.

custom controller, there is no guarantee that the object will
behave in GA or test-conformant manner.

Adding the field on security-related APIs may potentially
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a distraction to even talk about this as a system-wide idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i will narrow down the scope.

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.28"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping this KEP with literal days left before the KEP freeze is unfortunate. I didn't get to read the PRR sections and I have significant concerns about the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for that. Not all plans match perfectly with K8S release schedule.

@mwielgus
Copy link
Contributor Author

Closing the KEP. Will follow the recommended *Class pattern.

@mwielgus mwielgus closed this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants