-
Notifications
You must be signed in to change notification settings - Fork 993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance controller setup based on available CRDs #3472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way, don't we need tests for this change? crashlooping is a big issue so adding tests would be a good idea.
For(&v1alpha1api.InferenceGraph{}). | ||
Owns(&appsv1.Deployment{}). | ||
Complete(r) | ||
ksvcFound, err := utils.IsCrdAvailable(mgr.GetConfig(), "serving.knative.dev/v1", "Service") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If istio changes api version, it would behavior something wrong. Why don't we check all api versions that have the kind of CRD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To manipulate the KNative-related and Istio-related resources, their API libraries are imported and used in go.mod
. Right now, Istio library version 1.19.4 is being used and KNative 0.39.3 is being used. Such libraries would not contain any future versions of the Istio or KNative APIs. Furthermore, we can watch (invoke Own
) only on a specific API version of the CRDs.
So, if either Istio or KNative change their APIs, KServe would require more code changes than just changing this check (see an example in #3150). So, to me, it does not make sense to check for all versions, while the rest of the code would only work against a specific version.
BTW, there is a recommended version matrix in the website which seems to be outdated. I'd think it should work as a compatibility matrix, as I don't think it is feasible to support any combinations.
@israel-hdez
|
ksvcFound, err := utils.IsCrdAvailable(mgr.GetConfig(), "serving.knative.dev/v1", "Service") | ||
if err != nil { | ||
return err | ||
} | ||
|
||
vsFound, err := utils.IsCrdAvailable(mgr.GetConfig(), "networking.istio.io/v1beta1", "VirtualService") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these strings to constants and consider using a loop to check all of these CRDs since we may need to check them for different versions, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree on moving to constants, I'll do it.
About a loop for checking different versions, I'm not sure that's useful. Read more in my reply to Jooho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess the best way is to have a const (array in case of more versions) and check those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified to using constants and 3rd-party library-provided variables
For(&v1alpha1api.InferenceGraph{}). | ||
Owns(&knservingv1.Service{}). | ||
Complete(r) | ||
r.Log.Info("The InferenceGraph controller won't watch serving.knative.dev/v1/Service resources because the CRD is not available.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The r.Log
does not have a warning function. Only Info
and Error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess warnings would looks like:
V(0).Info(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would V(0)
be meaningful?
I'm not sure it will have any outcome in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear in the documentation, but digging a little bit about it, V(0) seems to be a way to add verbosity in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's the verbosity level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you run the controller you can do something like --gloglevel 5
to print out only logs that falls into that verbosity boundary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this and I think we should not raise the verbosity for this one as it helps to see why the controller would not watch the involved CRD -- and it is printed only once for the pod lifetime.
So, I think it is worth to have it as a regular Info, given r.Log
does not have a warning function.
I think detecting the change in the setup would add some complexity. I'm not sure if it is worth to do that work, given changes to the stack may not be that common. For this PR, if a Serverless setup is done after KServe is installed, the only additional action needed would be to restart the KServe controller pod and it should start using KNative as normally.
I just tried it. There is no crash, although the controller will try to reconcile continuously. I do see that it tries to write something in the status field, but it fails for some reason (looks like a bug) - so, looks like the error message is hidden by another one. I agree that it should fail more gracefully. Given the detected CRDs it shouldn't reconcile continuously a Serverless InferenceService when the stack is absent. I'll do the needed code changes for this. |
} | ||
|
||
return ctrlBuilder.Complete(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to add tests to check if the configuration is correctly applied after the reconcile runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What configuration are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to the isvc, e.g.
if you change the isvc and force knative, and if it is disabled, the isvc should not be reconciled, right? so the confs should be the same than before running the reconcile loop again.
does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point in time, I added to this PR some units in pkg/controller/v1beta1/inferenceservice/controller_test.go
and pkg/controller/v1alpha1/inferencegraph/controller_test.go
. Maybe that's enough?
b6bf6aa
to
3c993e2
Compare
I just added some code to abort reconciliation early and log an |
a3446a2
to
9be59e7
Compare
@Jooho @terrytangyuan @spolti The code is ready. Please, do another review round. I added tests around the impacted I failed to find a way to add tests for the |
return false, err | ||
} | ||
|
||
found := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be reduced to something like:
return gvResources != nil && any(crd.Kind == kind for crd := range gvResources.APIResources), nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied and pasted and it does not build....
I guess this is pseudocode. If it is, then yes, what you provided is the short form of all that code.
return nil, getGvResourcesErr | ||
} | ||
|
||
SetAvailableResourcesForApi(groupVersion, gvResources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SetAvailableResourcesForApi used outside this package?
If not you could, instead create a func to store it, just put them in the cache here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it is. :)
if ksvcFound { | ||
ctrlBuilder = ctrlBuilder.Owns(&knservingv1.Service{}) | ||
} else { | ||
r.Log.Info("The InferenceService controller won't watch serving.knative.dev/v1/Service resources because the CRD is not available.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't be good to publish it to events as well?
Most of the time we don't usually see logs, having it in the events would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rarely deal with the Events API. So, I don't know.
Personally, looking at the logs would be the first thing I'd do to diagnose things. Then, I typically look at Events of some resource. If we want to record an event, I'm not sure which resource the event should belong to?
/approve |
cmd/manager/main.go
Outdated
|
||
ksvcFound, ksvcCheckErr := utils.IsCrdAvailable(cfg, knservingv1.SchemeGroupVersion.String(), constants.KnativeServiceKind) | ||
if ksvcCheckErr != nil { | ||
setupLog.Error(ksvcCheckErr, "error when checking if Knative Services are available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: seems like not all changes are synced from opendatahub-io#265?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@terrytangyuan @spolti I finally had time to finish this PR. |
/lgtm |
I pushed an empty commit to re-run CI, which is now passing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Per prow instructions: /assign @yuzisun |
This enhances the setup of the InferenceService controller and the InferenceGraph controller. Instead of relying on the `defaultDeploymentMode` configuration to determine what CRDs to watch, the setup now checks whether KNative Services and Istio VirtualServices are available in the cluster and setup the watches (invoke `Owns`) accordingly. This enhancement has the following advantages: * A crashloop is prevented if the CRDs are missing in the cluster. The user would still be able to create InferenceServices by taking care of annotating the ISVC for RawDeployment mode. * If RawDeployment mode is configured as the default mode, the controllers would still watch for KNative and Istio resources if these components are available. This will let the controller watch for changes for the dependent resources if the user uses Serverless mode for some of the InferenceServices. * In the InferenceService controller, the watch for the VirtualServices is still conditioned to the value of the `disableVirtualHost` configuration. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Since KServe controllers are modified to watch resources based on available CRDs, a similar change in the setup of the manager is needed: schemas need to be added to the manager based on available CRDs rather than based only on the values in the inferenceservice-config ConfigMap. This would keep both manager setup and controller setup in sync with regards schemas and watches around the CRDs. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
/lgtm |
@yuzisun This is rebased to resolve conflicts, and CI is passing. |
Thanks @israel-hdez for the nice PR descriptions for easier review! /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: israel-hdez, spolti, terrytangyuan, yuzisun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Enhance controller setup based on available CRDs This enhances the setup of the InferenceService controller and the InferenceGraph controller. Instead of relying on the `defaultDeploymentMode` configuration to determine what CRDs to watch, the setup now checks whether KNative Services and Istio VirtualServices are available in the cluster and setup the watches (invoke `Owns`) accordingly. This enhancement has the following advantages: * A crashloop is prevented if the CRDs are missing in the cluster. The user would still be able to create InferenceServices by taking care of annotating the ISVC for RawDeployment mode. * If RawDeployment mode is configured as the default mode, the controllers would still watch for KNative and Istio resources if these components are available. This will let the controller watch for changes for the dependent resources if the user uses Serverless mode for some of the InferenceServices. * In the InferenceService controller, the watch for the VirtualServices is still conditioned to the value of the `disableVirtualHost` configuration. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com> * Controller setup - add schemas based on CRDs available Since KServe controllers are modified to watch resources based on available CRDs, a similar change in the setup of the manager is needed: schemas need to be added to the manager based on available CRDs rather than based only on the values in the inferenceservice-config ConfigMap. This would keep both manager setup and controller setup in sync with regards schemas and watches around the CRDs. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com> --------- Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
* Enhance controller setup based on available CRDs This enhances the setup of the InferenceService controller and the InferenceGraph controller. Instead of relying on the `defaultDeploymentMode` configuration to determine what CRDs to watch, the setup now checks whether KNative Services and Istio VirtualServices are available in the cluster and setup the watches (invoke `Owns`) accordingly. This enhancement has the following advantages: * A crashloop is prevented if the CRDs are missing in the cluster. The user would still be able to create InferenceServices by taking care of annotating the ISVC for RawDeployment mode. * If RawDeployment mode is configured as the default mode, the controllers would still watch for KNative and Istio resources if these components are available. This will let the controller watch for changes for the dependent resources if the user uses Serverless mode for some of the InferenceServices. * In the InferenceService controller, the watch for the VirtualServices is still conditioned to the value of the `disableVirtualHost` configuration. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com> * Controller setup - add schemas based on CRDs available Since KServe controllers are modified to watch resources based on available CRDs, a similar change in the setup of the manager is needed: schemas need to be added to the manager based on available CRDs rather than based only on the values in the inferenceservice-config ConfigMap. This would keep both manager setup and controller setup in sync with regards schemas and watches around the CRDs. Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com> --------- Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com> Signed-off-by: asd981256 <asd981256@gmail.com>
What this PR does / why we need it:
This enhances the setup of the InferenceService controller and the InferenceGraph controller. Instead of relying on the
defaultDeploymentMode
configuration to determine what CRDs to watch, the setup now checks whether KNative Services and Istio VirtualServices are available in the cluster and setup the watches (invokeOwns
) accordingly.This enhancement has the following advantages:
disableVirtualHost
configuration.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #3470
Type of changes
Please delete options that are not relevant.
Feature/Issue validation/testing:
The InferenceGraph controller won't watch serving.knative.dev/v1/Service resources because the CRD is not available.
and verified there no errors are logged.Checklist: