-
Notifications
You must be signed in to change notification settings - Fork 981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable transformers to work with ModelMesh #2136
Enable transformers to work with ModelMesh #2136
Conversation
if ok && (deploymentMode == string(constants.ModelMeshDeployment)) { | ||
// Get predictor host and protocol from annotations in modelmesh deployment mode | ||
argumentPredictorHost = metadata.Annotations["predictor-host"] | ||
argumentPredictorProtocol := metadata.Annotations["predictor-protocol"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a protocol
field on the InferenceService
component spec instead of using the annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzisun I don't believe these annotations are meant for the user to use, but only as an internal intermediary for passing in needed values from the transformer Reconcile function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 I think it would be simpler to just add an additional predictorURL
arg to this GetContainer
function? Then no intermediary annotations would be needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njhill GetContainer is an interface, also implemented in other places. The goal is to minimize interface change therefore use of intermediary annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzisun @njhill @pvaneck The current use of intermediary annotations is aimed to minimize interface changes. Adding a new arg to GetContainer, part of ComponentImplementation interface, would require changes in 30+ files, including predictor, explainer, and a bunch of examples. Not a difficult task, I just wanted to discuss whether it is warranted before making an interface change.
If we decide a new arg should be added, I'd like to make sure the type is correct since the impact is broad. I am currently thinking InferenceServiceStatus might work?
The use of annotation is intended to minimize spec and interface changes. We could certainly make a bigger spec change by adding a new field and then have it populated as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @chinhuang007. Left some comments.
if ok && (deploymentMode == string(constants.ModelMeshDeployment)) { | ||
// Get predictor host and protocol from annotations in modelmesh deployment mode | ||
argumentPredictorHost = metadata.Annotations["predictor-host"] | ||
argumentPredictorProtocol := metadata.Annotations["predictor-protocol"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzisun I don't believe these annotations are meant for the user to use, but only as an internal intermediary for passing in needed values from the transformer Reconcile function.
} | ||
|
||
isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host | ||
if predictorURL.Scheme == "http" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add the grpc check as the first conditional check since in most cases, the URL from ModelMesh will be gRPC based.
Also, for completeness, check for https
as a potential scheme as well.
pkg/constants/constants.go
Outdated
@@ -109,6 +109,13 @@ var ( | |||
DefaultMinReplicas = 1 | |||
) | |||
|
|||
// Predictor Protocol Constants | |||
var ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up that https://github.com/kserve/kserve/pull/2118/files#diff-36043890c52c8201a8bc84238c219be45ce07bf172d89f693c7c54ffe70d046e is defining some protocol constants and will soon be merged, so you can rebase once that is in.
@@ -72,6 +73,23 @@ func (c *CustomTransformer) GetContainer(metadata metav1.ObjectMeta, extensions | |||
container := &c.Containers[0] | |||
argumentPredictorHost := fmt.Sprintf("%s.%s", constants.DefaultPredictorServiceName(metadata.Name), metadata.Namespace) | |||
|
|||
deploymentMode, ok := metadata.Annotations[constants.DeploymentMode] | |||
logger, _ := pkglogging.NewLogger("", "INFO") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use the logging library that is typically used in KServe:
import (
logf "sigs.k8s.io/controller-runtime/pkg/log"
)
var log = logf.Log.WithName("CustomTransformerReconciler")
Another example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out the common KServe logger. Just updated, please take a look.
9ee794b
to
9bfae2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 looks good, I added a few comments inline.
container.Args = append(container.Args, []string{ | ||
"--protocol", | ||
argumentPredictorProtocol, | ||
}...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
container.Args = append(container.Args, []string{ | |
"--protocol", | |
argumentPredictorProtocol, | |
}...) | |
container.Args = append(container.Args, "--protocol", argumentPredictorProtocol) |
if ok && (deploymentMode == string(constants.ModelMeshDeployment)) { | ||
// Get predictor host and protocol from annotations in modelmesh deployment mode | ||
argumentPredictorHost = metadata.Annotations["predictor-host"] | ||
argumentPredictorProtocol := metadata.Annotations["predictor-protocol"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 I think it would be simpler to just add an additional predictorURL
arg to this GetContainer
function? Then no intermediary annotations would be needed.
// check if predictor URL is populated | ||
if isvc.Status.Components["predictor"].URL == nil { | ||
// exit transformer reconcile with an error when predictor URL not populated | ||
return fmt.Errorf("Predictor URL from ModelMesh is not ready") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would want to differentiate this from actual errors since it just means "not ready yet". Instead of returning an error from the top-level reconcile func it would be better to return e.g. ctrl.Result{RequeueAfter: 2 * time.Second}
(and could log an info message here).
I guess an additional return value might need to be added to this func to facilitate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njhill This requires interface changes as well. I will be happy to make such changes if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007, looks like you already made the change.
@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf | |||
// This is important because the copy we loaded from the informer's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether the DeepEqual comparison above should be reduced in scope to not look at the parts of the status that the modelmesh controller "owns" in the case that this is a modelmesh deployment?
@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf | |||
// This is important because the copy we loaded from the informer's | |||
// cache may be stale and we don't want to overwrite a prior update | |||
// to status with this stale state. | |||
} else if err := r.Status().Update(context.TODO(), desiredService); err != nil { | |||
} else if err := r.Status().Patch(context.TODO(), desiredService, client.Merge, &client.PatchOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a comment explaining why we're doing a patch rather than update here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to prevent from overriding status when ModelMesh and KServe were not using the same InferenceService status spec. After ModelMesh changed to use the type definitions from KServe, Patch is no longer needed. Will change back to Update.
9bfae2b
to
b050bf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chinhuang007 updates look good! I just made one more small simplification suggestion.
if isvc.Status.Components["predictor"].URL == nil { | ||
// tansformer reconcile will retry every 3 second until predictor URL is populated | ||
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated") | ||
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil | ||
} | ||
|
||
// add predictor host and protocol to metadata | ||
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String()) | ||
if err != nil { | ||
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can simplify:
if isvc.Status.Components["predictor"].URL == nil { | |
// tansformer reconcile will retry every 3 second until predictor URL is populated | |
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated") | |
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil | |
} | |
// add predictor host and protocol to metadata | |
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String()) | |
if err != nil { | |
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err) | |
} | |
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL) | |
if predictorURL == nil { | |
// tansformer reconcile will retry every 3 second until predictor URL is populated | |
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated") | |
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil | |
} | |
// add predictor host and protocol to metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njhill Good point! Changed as suggested.
c8bcb52
to
9b880ab
Compare
@@ -168,7 +175,7 @@ func (r *InferenceServiceReconciler) Reconcile(ctx context.Context, req ctrl.Req | |||
reconcilers = append(reconcilers, components.NewExplainer(r.Client, r.Scheme, isvcConfig)) | |||
} | |||
for _, reconciler := range reconcilers { | |||
if err := reconciler.Reconcile(isvc); err != nil { | |||
if _, err := reconciler.Reconcile(isvc); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: is the Result with the RequeueAfter
propagated out? Seems like we would just ignore the returned Requeue Result here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Add code to propagate requeue result to isvc reconcile.
// check if predictor URL is populated | ||
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL) | ||
if predictorURL == nil { | ||
// tansformer reconcile will retry every 3 second until predictor URL is populated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: transformer typo
6f32e8f
to
6ec4554
Compare
Add Transformers support for ModelMesh Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: Chin Huang <chhuang@us.ibm.com>
6ec4554
to
eaddab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
} | ||
|
||
// add predictor host and protocol to metadata | ||
isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chinhuang007 Can you help move this to a constant and rename following the internal annotation naming convention
https://github.com/kserve/kserve/blob/master/pkg/constants/constants.go#L80
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chinhuang007, njhill, yuzisun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Can you also update this document with an example of a transformer working with model mesh? |
Sure, will update the doc. |
* Add Transformers support for ModelMesh Add Transformers support for ModelMesh Signed-off-by: Chin Huang <chhuang@us.ibm.com> * add configmap prototype Signed-off-by: Chin Huang <chhuang@us.ibm.com> * Add isvc reconcile login and clean up debug Signed-off-by: Chin Huang <chhuang@us.ibm.com> * Rebase and update based on latest ModelMesh Signed-off-by: Chin Huang <chhuang@us.ibm.com> * Change component reconcile to support retry Signed-off-by: Chin Huang <chhuang@us.ibm.com> * Reduce status comparison scope in ModelMesh mode Signed-off-by: Chin Huang <chhuang@us.ibm.com> Signed-off-by: alexagriffith <agriffith96@gmail.com>
What this PR does / why we need it:
The PR enables transformers to work with ModelMesh predictors in the same InferenceService instance
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):It addresses the issue. The design doc is available here
Type of changes
Feature/Issue validation/testing:
Plan to add or update transformer examples with ModelMesh predictor as a separate PR
Checklist:
Release note: