Skip to content

Commit

Permalink
Review remarks
Browse files Browse the repository at this point in the history
  • Loading branch information
mimowo committed Dec 21, 2023
1 parent c24755c commit 0f68654
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 269 deletions.
2 changes: 0 additions & 2 deletions keps/prod-readiness/sig-apps/4368.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
kep-number: 4368
alpha:
approver: "@wojtek-t"
beta:
approver: "@wojtek-t"
272 changes: 5 additions & 267 deletions keps/sig-apps/4368-support-managed-by-label-for-batch-jobs/README.md
Original file line number Diff line number Diff line change
@@ -1,81 +1,5 @@
<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.
To get started with this template:
- [ ] **Pick a hosting SIG.**
Make sure that the problem space is something the SIG is interested in taking
up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
When filing an enhancement tracking issue, please make sure to complete all
fields in that template. One of the fields asks for a link to the KEP. You
can leave that blank until this KEP is filed, and then go back to the
enhancement and add the link.
- [ ] **Make a copy of this template directory.**
Copy this template into the owning SIG's directory and name it
`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
leading-zero padding) assigned to your enhancement above.
- [ ] **Fill out as much of the kep.yaml file as you can.**
At minimum, you should fill in the "Title", "Authors", "Owning-sig",
"Status", and date-related fields.
- [ ] **Fill out this file as best you can.**
At minimum, you should fill in the "Summary" and "Motivation" sections.
These should be easy if you've preflighted the idea of the KEP with the
appropriate SIG(s).
- [ ] **Create a PR for this KEP.**
Assign it to people in the SIG who are sponsoring this process.
- [ ] **Merge early and iterate.**
Avoid getting hung up on specific details and instead aim to get the goals of
the KEP clarified and merged quickly. The best way to do this is to just
start with the high-level sections and fill out details incrementally in
subsequent PRs.
Just because a KEP is merged does not mean it is complete or approved. Any KEP
marked as `provisional` is a working document and subject to change. You can
denote sections that are under active debate as follows:
```
<<[UNRESOLVED optional short context or usernames ]>>
Stuff that is being argued.
<<[/UNRESOLVED]>>
```
When editing KEPS, aim for tightly-scoped, single-topic PRs to keep discussions
focused. If you disagree with what is already in a document, open a new PR
with suggested changes.
One KEP corresponds to one "feature" or "enhancement" for its whole lifecycle.
You do not need a new KEP to move from beta to GA, for example. If
new details emerge that belong in the KEP, edit the KEP. Once a feature has become
"implemented", major changes should get new KEPs.
The canonical place for the latest set of instructions (and the likely source
of this file) is [here](/keps/NNNN-kep-template/README.md).
**Note:** Any PRs to move a KEP to `implementable`, or significant changes once
it is marked `implementable`, must be approved by each of the KEP approvers.
If none of those approvers are still appropriate, then changes to that list
should be approved by the remaining approvers and/or the owning SIG (or
SIG Architecture for cross-cutting KEPs).
-->
# KEP-4368: Support managed-by label for Jobs

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

<!--
A table of contents is helpful for quickly jumping to sections of a KEP and for
highlighting any additional information provided beyond the standard KEP
template.
Ensure the TOC is wrapped with
<code>&lt;!-- toc --&rt;&lt;!-- /toc --&rt;</code>
tags, and then generate with `hack/update-toc.sh`.
-->

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
Expand Down Expand Up @@ -170,32 +94,13 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap. It should be
possible to collect this information before implementation begins, in order to
avoid requiring implementors to split their attention between writing release
notes and implementing the feature itself. KEP editors and SIG Docs
should help to ensure that the tone and content of the `Summary` section is
useful for a wide audience.
A good summary is probably at least a paragraph in length.
Both in this section and below, follow the guidelines of the [documentation
style guide]. In particular, wrap lines to a reasonable length, to make it
easier for reviewers to cite specific portions, and to minimize diff churn on
updates.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
-->

We support the `managed-by` label as a lightweight mechanism to delegate the Job
synchronization to an external controller.

## Motivation

As a part of [Kueue](https://github.com/kubernetes-sigs) (an effort done by
Batch-WG, in cooperation with SIG-Autoscaling, SIG-Scheduling and SIG-Apps) we
Batch-WG, in cooperation with SIG-Node, SIG-Autoscaling, SIG-Scheduling and SIG-Apps) we
are working on a multi-cluster job dispatcher project, called
[MultiKueue](https://github.com/kubernetes-sigs/kueue/pull/1380).

Expand All @@ -208,56 +113,21 @@ of the Job created by the user.
In order to support this workflow we need a mechanism to disable the main Job
controller, and delegate the status synchronization to the Kueue controller.

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users. The
motivation section can optionally provide links to [experience reports] to
demonstrate the interest in a KEP within the wider Kubernetes community.
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
-->

### Goals

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->

- support delegation of Job synchronization to an external controller

### Non-Goals

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

- passing custom parameters to the external controller

## Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?.
The "Design Details" section below is for the real
nitty-gritty.
-->

The proposal is to support the `managed-by` label to disabled the Job controller
from synchronizing jobs with the annotation.

### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

#### Story 1

As a developer of Kueue I want to have Job API which allows me to implement the
Expand Down Expand Up @@ -322,13 +192,6 @@ portion of jobs would require an actual etcd write after cluster upgrade.

We add evaluation of the strategy to [GA requirements](#ga).

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

### Risks and Mitigations

#### Ecosystem fragmentation due to forks
Expand All @@ -345,27 +208,8 @@ control plane can disable by passing `controllers=-job,*` in the manifest for of
Second, we believe that users who had the need to fork the Job controller
already introduced dedicated Job CRDs for their needs.

<!--
What are the risks of this proposal, and how do we mitigate? Think broadly.
For example, consider both security and how this will impact the larger
Kubernetes ecosystem.
How will security be reviewed, and by whom?
How will UX be reviewed, and by whom?
Consider including folks who also work outside the SIG or subproject.
-->

## Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

#### API

```golang
Expand All @@ -383,74 +227,24 @@ In the [Beta](#beta) iteration of the feature we skip synchronization of the
Jobs with the `managed-by` label, if it has any different value than `job-controller`.
We also default the label for all newly created jobs, to avoid implicit defaults.

There is no validation for the values of the field.
There is no validation for the values of the field beyond that of standard permitted label values.

### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
The goal is to ensure that we don't accept enhancements with inadequate testing.
All code is expected to have adequate tests (eventually with coverage
expectations). Please adhere to the [Kubernetes testing guidelines][testing-guidelines]
when drafting this test plan.
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

<!--
Based on reviewers feedback describe what additional tests need to be added prior
implementing this enhancement to ensure the enhancements have also solid foundations.
-->

##### Unit tests

<!--
In principle every added code should have complete unit test coverage, so providing
the exact set of tests will not bring additional value.
However, if complete unit test coverage is not possible, explain the reason of it
together with explanation why this is acceptable.
-->

<!--
Additionally, for Alpha try to enumerate the core package you will be touching
to implement this enhancement and provide the current unit coverage for those
in the form of:
- <package>: <date> - <current test coverage>
The data can be easily read from:
https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit
This can inform certain test coverage improvements that we want to do before
extending the production code to implement this enhancement.
-->

- `pkg/controller/job`: `2023-12-20` - `91.5%`
- `pkg/registry/batch/job`: `2023-12-20` - `92.2%`
- `pkg/apis/batch/v1`: `2023-12-20` - `29.3%` (mostly generated code)

##### Integration tests

<!--
Integration tests are contained in k8s.io/kubernetes/test/integration.
Integration tests allow control of the configuration parameters used to start the binaries under test.
This is different from e2e tests which do not allow configuration of parameters.
Doing this allows testing non-default options and multiple different and potentially conflicting command line options.
-->

<!--
This question should be filled when targeting a release.
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
https://storage.googleapis.com/k8s-triage/index.html
-->

The following scenarios are covered:
- a new created job has the `managed-by` label defaulted to `job-controller`
- the Job controller reconciles jobs with the `managed-by` label equal to `job-controller`
Expand All @@ -461,70 +255,14 @@ During the implementation more scenarios might be covered.

##### e2e tests

<!--
This question should be filled when targeting a release.
For Alpha, describe what tests will be added to ensure proper quality of the enhancement.
The feature does not depend on kubelet, so the functionality can be fully
covered with unit & integration tests.

For Beta and GA, add links to added tests together with links to k8s-triage for those tests:
https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

The following scenarios are covered:
We propose a single e2e test for the following scenario:
- the Job controller does not reconcile a job with any other value of the `managed-by` label

### Graduation Criteria

<!--
**Note:** *Not required until targeted at a release.*
Define graduation milestones.
These may be defined in terms of API maturity, [feature gate] graduations, or as
something else. The KEP should keep this high-level with a focus on what
signals will be looked at to determine graduation.
Consider the following in developing the graduation criteria for this enhancement:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Feature gate][feature gate] lifecycle
- [Deprecation policy][deprecation-policy]
Clearly define what graduation means by either linking to the [API doc
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)
or by redefining what graduation means.
In general we try to use the same stages (alpha, beta, GA), regardless of how the
functionality is accessed.
[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
#### Alpha
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
**Note:** Generally we also wait at least two releases between beta and
GA/stable, because there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.
**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
#### Deprecation
- Announce deprecation and support policy of the existing flag
- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag
-->

#### Beta

- skip synchronization of jobs when the `managed-by` label does not exist, or equals `job-controller`
Expand Down

0 comments on commit 0f68654

Please sign in to comment.