Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that the framework is available using RESTMapper instead of getting CRDs #1046

Merged
merged 1 commit into from
Aug 11, 2023

Conversation

tenzen-y
Copy link
Member

@tenzen-y tenzen-y commented Aug 7, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Using RESTMapper, we can ensure that the manager recognizes the frameworks, and we can reduce passing permissions to the manager.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Aug 7, 2023
@netlify
Copy link

netlify bot commented Aug 7, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 76c693d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/64d6256524aa160008e1aa31

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 7, 2023
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this. Looks like the right direction, some questions.

main.go Outdated
if isFrameworkEnabled(cfg, name) && crds.Has(name) {
if err := cb.NewReconciler(
gvk := cb.JobType.GetObjectKind().GroupVersionKind()
if _, err := mgr.GetRESTMapper().RESTMapping(gvk.GroupKind(), gvk.Version); err != nil && isFrameworkEnabled(cfg, name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isFrameworkEnabled(cfg, name) && crds.Has(name) was the happy path. So IIUC it should correspond to no error (err= nil).

Also, make sure the error we get is not lost. At least log it, but probably we should propagate it up.

I would also do any checks only if the isFrameworkEnabled responds to true. Otherwise we risk logging an error or propagating up for frameworks which are disabled.

what is the err if there is no mapping, is it NotFound? Maybe there are 3 cases:

  • no error -> happy path
  • NotFound error- > path for !crds.Has(name)
  • other generic error -> propagate the error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isFrameworkEnabled(cfg, name) && crds.Has(name) was the happy path. So IIUC it should correspond to no error (err= nil).

Yes, that's right. It's my bad :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the err if there is no mapping,

It's expected meta.IsNoMatchError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the controller-runtime v0.15, the returned error seems to be changed with discovery.ErrGroupDiscoveryFailed .

And the changing seems the unintended breaking changes:
kubernetes-sigs/controller-runtime#2425

Copy link
Contributor

@trasc trasc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is doing the same thing, especially since we no longer need RBAC permissions I suspect that GetRESTMapper will only have the view of the local schema, which will always contain the types in question.

Was this tested in a live cluster?

@tenzen-y
Copy link
Member Author

tenzen-y commented Aug 8, 2023

I don't think this is doing the same thing, especially since we no longer need RBAC permissions I suspect that GetRESTMapper will only have the view of the local schema, which will always contain the types in question.

Was this tested in a live cluster?

It's a great catch @trasc! You're right. We need to pass GVK in another way.

@trasc
Copy link
Contributor

trasc commented Aug 8, 2023

I don't think this is doing the same thing, especially since we no longer need RBAC permissions I suspect that GetRESTMapper will only have the view of the local schema, which will always contain the types in question.

Was this tested in a live cluster?

It's a great catch @trasc! You're right. We need to pass GVK in another way.

If the check is problematic in some usecase, we can make it configurable (enabled by default).

main.go Outdated
Comment on lines 229 to 232
var NoMatchingErr *discovery.ErrGroupDiscoveryFailed
if !errors.As(err, &NoMatchingErr) {
return err
}
log.Info("No matching API server for job framework, skip to create controller and webhook")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed if this check is effective using ytenzen/kueue-controller:rest-mapper in the KinD cluster:

...
{"level":"info","ts":"2023-08-08T12:35:10.352203711Z","logger":"setup","caller":"workspace/main.go:233","msg":"No matching API server for job framework, skip to create controller and webhook","jobFrameworkName":"kubeflow.org/mpijob"}
...

@tenzen-y
Copy link
Member Author

tenzen-y commented Aug 8, 2023

I don't think this is doing the same thing, especially since we no longer need RBAC permissions I suspect that GetRESTMapper will only have the view of the local schema, which will always contain the types in question.
Was this tested in a live cluster?

It's a great catch @trasc! You're right. We need to pass GVK in another way.

If the check is problematic in some usecase, we can make it configurable (enabled by default).

No worries. I just would like to clean up.

main.go Outdated
Comment on lines 223 to 227
// TODO: If the below PR is released, we need to change a way to check if the GVK is registered.
// REF: https://github.com/kubernetes-sigs/controller-runtime/pull/2425
// if !meta.IsNoMatchError(err) {
// return err
// }
// log.Info("No matching API server for job framework, skip to create controller and webhook")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the controller-runtime v0.15, the returned error seems to be changed with discovery.ErrGroupDiscoveryFailed .

And the changing seems the unintended breaking changes:
kubernetes-sigs/controller-runtime#2425

@tenzen-y tenzen-y changed the title Ensure that the framework is available using RESTMapper instead of getting CRDs WIP: Ensure that the framework is available using RESTMapper instead of getting CRDs Aug 8, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2023
}
}
if err := noop.SetupWebhook(mgr, cb.JobType); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should think not be under else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current behavior, the noop.SetupWebhook is run even when isFrameworkEnabled==true && crds.Has==false.
So I think we need to run here even when isFrameworkEnabled==true && errors.As(err, &NoMatchingErr)==true.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that in the happy path you now do return nil. I was thinking the noop.SetupWebhook is called now in that case.

@tenzen-y tenzen-y changed the title WIP: Ensure that the framework is available using RESTMapper instead of getting CRDs Ensure that the framework is available using RESTMapper instead of getting CRDs Aug 8, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2023
@tenzen-y
Copy link
Member Author

tenzen-y commented Aug 9, 2023

@trasc @mimowo Could you recheck this PR?
/assign @trasc @mimowo

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

main.go Outdated
// if !meta.IsNoMatchError(err) {
// return err
// }
// log.Info("No matching API server for job framework, skip to create controller and webhook")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for the log line, it's not changing :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good point!
Done.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 10, 2023

if isFrameworkEnabled(cfg, name) {
if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
// TODO: If the below PR is released, we need to change a way to check if the GVK is registered.
Copy link
Contributor

@mimowo mimowo Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, one more thought. There is a risk that once the PR is released, say in controller-runtime 0.16, we may not remember to check this when bumping the version and break this. I think it is unlikely a person doing a bump of controller-runtime would look at this comment without prior knowledge.
Some ideas to mitigate this risk:

  1. prepare the check in advance, if !meta.IsNoMatchError(err) && !errors.As(err, &NoMatchingErr) { return err }. Then leave the TODO comment to cleanup up once released
  2. coordinate with the release of the PR and bump controller-runtime (but might be not necessarily involving)
  3. create an issue in advance in kueue to increase visibility, giving a title like "Bump controller-runtime and adjust RESTMapper usage"

I'm leaning towards (1.) or (2.), but (2.) only if it is to happen with Kueue 0.5 release, because I don't like keeping PRs in a freezer for long :). (1.) seems safe to do.

EDIT: 4. integration tests for the scenario.
(4.) would be great, but might be overkill

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 sounds good, but also leave a TODO and issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mimowo It's a great suggestion. I think 1 would be nice.

1 sounds good, but also leave a TODO and issue.

I agree. We should create an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed logic and created an issue: #1054

if !errors.As(err, &NoMatchingErr) {
return err
}
log.Info("No matching API server for job framework, skip to create controller and webhook")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include which framework, by adding a key and a value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The framework name is already set in

kueue/main.go

Line 219 in ed880c7

log := setupLog.WithValues("jobFrameworkName", name)
.


if isFrameworkEnabled(cfg, name) {
if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
// TODO: If the below PR is released, we need to change a way to check if the GVK is registered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 sounds good, but also leave a TODO and issue.

main.go Outdated
return err

if isFrameworkEnabled(cfg, name) {
if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use something like

		mgr.GetScheme().ObjectKinds(cb.JobType)	

or apiutil.GVKForObject() insted if chaging IntegrationCallbacks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, both ways (mgr.GetScheme().ObjectKinds(cb.JobType) and apiutil.GVKForObject()) are different from what is expected.

As I checked both ways in the KinD cluster, both ways returned the GVK even if the jobFramework CRD wasn't deployed because I guess that we registered all jobFramework schemes as an init().

Also, mgr.GetScheme().ObjectKinds returns false as a second returned value regardless of whether the jobFramework CRD is deployed in the cluster.

Copy link
Contributor

@trasc trasc Aug 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but what I meant is that instead of changing the IntegrationCallbacks structure and add new field in all the integrations, get the GVK with one of those two calls. something like:

Suggested change
if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
gvk := apiutil.GVKForObject(cb.JobType)
if _, err := mgr.GetRESTMapper().RESTMapping(gvk.GroupKind(), gvk.Version); err != nil {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would work well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

…tting theCRDs

Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Copy link
Contributor

@trasc trasc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 11, 2023
@k8s-ci-robot k8s-ci-robot merged commit acec9e1 into kubernetes-sigs:main Aug 11, 2023
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Aug 11, 2023
@tenzen-y tenzen-y deleted the use-mapper-provider branch August 11, 2023 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants