
Enable transformers to work with ModelMesh #2136

Merged (6 commits) on May 12, 2022

Conversation

@chinhuang007 (Contributor) commented Apr 7, 2022

What this PR does / why we need it:
This PR enables transformers to work with ModelMesh predictors in the same InferenceService instance.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
It addresses the issue. The design doc is available here.

Type of changes

  • [x] New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing:
Transformer examples with a ModelMesh predictor will be added or updated in a separate PR.

Checklist:

  • [x] Has code been commented, particularly in hard-to-understand areas?

Release note:

Transformers are now enabled to work with ModelMesh predictors in the same InferenceService instance.

@chinhuang007 changed the title from "Support mm transformer" to "Enable transformers to work with ModelMesh" on Apr 7, 2022
if ok && (deploymentMode == string(constants.ModelMeshDeployment)) {
// Get predictor host and protocol from annotations in modelmesh deployment mode
argumentPredictorHost = metadata.Annotations["predictor-host"]
argumentPredictorProtocol := metadata.Annotations["predictor-protocol"]
Member commented:
Should we add a protocol field on the InferenceService component spec instead of using the annotation?

@pvaneck (Member) commented Apr 13, 2022:
@yuzisun I don't believe these annotations are meant for the user to use, but only as an internal intermediary for passing in needed values from the transformer Reconcile function.

Contributor commented:
@chinhuang007 I think it would be simpler to just add an additional predictorURL arg to this GetContainer function? Then no intermediary annotations would be needed.

@chinhuang007 (Author) commented:
@njhill GetContainer is an interface method that is also implemented in other places. The goal is to minimize interface changes, hence the use of intermediary annotations.

@chinhuang007 (Author) commented:
@yuzisun @njhill @pvaneck The current use of intermediary annotations aims to minimize interface changes. Adding a new arg to GetContainer, part of the ComponentImplementation interface, would require changes in 30+ files, including the predictor, the explainer, and a bunch of examples. Not a difficult task, I just wanted to discuss whether it is warranted before making an interface change.

If we decide a new arg should be added, I'd like to make sure the type is correct since the impact is broad. I am currently thinking InferenceServiceStatus might work?
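
For readers following the thread, here is a minimal sketch of the annotation-based flow being discussed, pieced together from the diff snippets above (not the exact PR code; the --protocol handling is simplified):

deploymentMode, ok := metadata.Annotations[constants.DeploymentMode]
if ok && deploymentMode == string(constants.ModelMeshDeployment) {
	// In ModelMesh mode the InferenceService reconciler has already stashed the
	// predictor host and protocol on the object metadata as internal annotations,
	// so GetContainer can read them without any change to its signature.
	argumentPredictorHost = metadata.Annotations["predictor-host"]
	if protocol, found := metadata.Annotations["predictor-protocol"]; found {
		container.Args = append(container.Args, "--protocol", protocol)
	}
}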

@chinhuang007 (Author) commented:

The use of annotations is intended to minimize spec and interface changes. We could certainly make a bigger spec change by adding a new field and then have it populated as needed.

@pvaneck (Member) left a review:
Thanks, @chinhuang007. Left some comments.


}

isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host
if predictorURL.Scheme == "http" {
Member commented:
nit: add the grpc check as the first conditional check since in most cases, the URL from ModelMesh will be gRPC based.

Also, for completeness, check for https as a potential scheme as well.
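
A sketch of the ordering being suggested, with gRPC checked first and https handled for completeness (the protocol constant names and the http-to-protocol mapping here are illustrative, not necessarily what the PR ends up with):

switch predictorURL.Scheme {
case "grpc":
	// Most ModelMesh predictor URLs are gRPC based, so handle this case first.
	isvc.ObjectMeta.Annotations["predictor-protocol"] = string(constants.ProtocolGRPCV2)
case "http", "https":
	isvc.ObjectMeta.Annotations["predictor-protocol"] = string(constants.ProtocolV2)
default:
	// Anything else is unexpected for a ModelMesh predictor URL; surface it.
	return fmt.Errorf("unsupported predictor URL scheme: %q", predictorURL.Scheme)
}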

@@ -109,6 +109,13 @@ var (
DefaultMinReplicas = 1
)

// Predictor Protocol Constants
var (
Member commented:
Heads up that https://github.com/kserve/kserve/pull/2118/files#diff-36043890c52c8201a8bc84238c219be45ce07bf172d89f693c7c54ffe70d046e is defining some protocol constants and will soon be merged, so you can rebase once that is in.

@@ -72,6 +73,23 @@ func (c *CustomTransformer) GetContainer(metadata metav1.ObjectMeta, extensions
container := &c.Containers[0]
argumentPredictorHost := fmt.Sprintf("%s.%s", constants.DefaultPredictorServiceName(metadata.Name), metadata.Namespace)

deploymentMode, ok := metadata.Annotations[constants.DeploymentMode]
logger, _ := pkglogging.NewLogger("", "INFO")
Member commented:
I think we should use the logging library that is typically used in KServe:

import (
    logf "sigs.k8s.io/controller-runtime/pkg/log"
)

var log = logf.Log.WithName("CustomTransformerReconciler")

Another example.

@chinhuang007 (Author) commented:
Thanks for pointing out the common KServe logger. Just updated, please take a look.

@njhill (Contributor) left a review:
Thanks @chinhuang007, looks good. I added a few comments inline.

Comment on lines 91 to 94
container.Args = append(container.Args, []string{
"--protocol",
argumentPredictorProtocol,
}...)
Contributor commented:
Suggested change
container.Args = append(container.Args, []string{
"--protocol",
argumentPredictorProtocol,
}...)
container.Args = append(container.Args, "--protocol", argumentPredictorProtocol)


// check if predictor URL is populated
if isvc.Status.Components["predictor"].URL == nil {
// exit transformer reconcile with an error when predictor URL not populated
return fmt.Errorf("Predictor URL from ModelMesh is not ready")
Contributor commented:

I think we would want to differentiate this from actual errors since it just means "not ready yet". Instead of returning an error from the top-level reconcile func it would be better to return e.g. ctrl.Result{RequeueAfter: 2 * time.Second} (and could log an info message here).

I guess an additional return value might need to be added to this func to facilitate that.
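
A sketch of the suggested pattern, assuming the component Reconcile signature gains a ctrl.Result return value (which is what later commits in this PR end up doing):

if isvc.Status.Components["predictor"].URL == nil {
	// Not an error, just "not ready yet": ask controller-runtime to retry shortly
	// instead of surfacing a reconcile failure.
	p.Log.Info("Predictor URL not yet populated, requeueing transformer reconcile")
	return ctrl.Result{RequeueAfter: 2 * time.Second}, nil
}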

@chinhuang007 (Author) commented:
@njhill This requires interface changes as well. I will be happy to make such changes if necessary.

Contributor commented:
Thanks @chinhuang007, looks like you already made the change.

@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf
// This is important because the copy we loaded from the informer's
Contributor commented:
Not sure whether the DeepEqual comparison above should be reduced in scope to not look at the parts of the status that the modelmesh controller "owns" in the case that this is a modelmesh deployment?
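
One hedged way to narrow the comparison, using a hypothetical kserveOwnedStatus helper that clears the ModelMesh-owned portions of the status before the DeepEqual; the commit "Reduce status comparison scope in ModelMesh mode" in this PR may well do it differently, and existingService here stands for the copy loaded from the informer's cache mentioned in the diff context:

// kserveOwnedStatus is a hypothetical helper returning a copy of the status with
// the fields owned by the ModelMesh controller zeroed out.
if equality.Semantic.DeepEqual(kserveOwnedStatus(existingService.Status), kserveOwnedStatus(desiredService.Status)) {
	// Nothing this controller owns has changed, so skip the status write and avoid
	// fighting with the ModelMesh controller over the same fields.
	return nil
}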

@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf
// This is important because the copy we loaded from the informer's
// cache may be stale and we don't want to overwrite a prior update
// to status with this stale state.
} else if err := r.Status().Update(context.TODO(), desiredService); err != nil {
} else if err := r.Status().Patch(context.TODO(), desiredService, client.Merge, &client.PatchOptions{}); err != nil {
Contributor commented:
Could add a comment explaining why we're doing a patch rather than update here?

@chinhuang007 (Author) commented:
This was needed to prevent overriding the status when ModelMesh and KServe were not using the same InferenceService status spec. After ModelMesh changed to use the type definitions from KServe, Patch is no longer needed. Will change back to Update.
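
For reference, the two forms being weighed here, using controller-runtime's client (only one of the two would actually be used):

// Merge patch: only the fields set on desiredService are sent, so status written
// by the ModelMesh controller is left untouched.
if err := r.Status().Patch(context.TODO(), desiredService, client.Merge); err != nil {
	return err
}

// Plain update: replaces the whole status subresource, which is safe again once
// both controllers share the KServe status type definitions.
if err := r.Status().Update(context.TODO(), desiredService); err != nil {
	return err
}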

@njhill (Contributor) left a review:
Thanks @chinhuang007, the updates look good! I just made one more small simplification suggestion.

Comment on lines 95 to 107
if isvc.Status.Components["predictor"].URL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}

// add predictor host and protocol to metadata
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String())
if err != nil {
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err)
}

Contributor commented:
can simplify:

Suggested change
if isvc.Status.Components["predictor"].URL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}
// add predictor host and protocol to metadata
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String())
if err != nil {
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err)
}
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL)
if predictorURL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}
// add predictor host and protocol to metadata

@chinhuang007 (Author) commented:
@njhill Good point! Changed as suggested.

@@ -168,7 +175,7 @@ func (r *InferenceServiceReconciler) Reconcile(ctx context.Context, req ctrl.Req
reconcilers = append(reconcilers, components.NewExplainer(r.Client, r.Scheme, isvcConfig))
}
for _, reconciler := range reconcilers {
if err := reconciler.Reconcile(isvc); err != nil {
if _, err := reconciler.Reconcile(isvc); err != nil {
Member commented:
Question: is the Result with the RequeueAfter propagated out? Seems like we would just ignore the returned Requeue Result here.

@chinhuang007 (Author) commented:
Good catch! Added code to propagate the requeue result to the isvc reconcile.
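
A sketch of what propagating the result out of the component loop can look like (simplified; the actual change in this PR may differ in its details):

for _, reconciler := range reconcilers {
	result, err := reconciler.Reconcile(isvc)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Bubble a non-empty result (e.g. the transformer's RequeueAfter while the
	// ModelMesh predictor URL is still missing) up to the top-level
	// InferenceService Reconcile so controller-runtime actually requeues.
	if result.Requeue || result.RequeueAfter > 0 {
		return result, nil
	}
}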

// check if predictor URL is populated
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL)
if predictorURL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
Member commented:
nit: transformer typo

@chinhuang007 force-pushed the support-mm-transformer branch 3 times, most recently from 6f32e8f to 6ec4554 on April 27, 2022
@yuzisun mentioned this pull request on May 1, 2022
Add Transformers support for ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>
@pvaneck (Member) left a review:
/lgtm

}

// add predictor host and protocol to metadata
isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host
@yuzisun (Member) commented May 11, 2022:
@chinhuang007 Can you help move this to a constant and rename it following the internal annotation naming convention?
https://github.com/kserve/kserve/blob/master/pkg/constants/constants.go#L80
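
For illustration, the rename could end up looking something like the sketch below; the constant names are assumptions, and only the internal-annotation prefix pattern is taken from the linked constants file:

// pkg/constants/constants.go (hypothetical names, reusing the existing
// internal annotation prefix, e.g. "internal.serving.kserve.io")
PredictorHostAnnotationKey     = InferenceServiceInternalAnnotationsPrefix + "/predictor-host"
PredictorProtocolAnnotationKey = InferenceServiceInternalAnnotationsPrefix + "/predictor-protocol"

// transformer reconciler
isvc.ObjectMeta.Annotations[constants.PredictorHostAnnotationKey] = predictorURL.Host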

@njhill (Contributor) commented May 11, 2022:

/lgtm

@yuzisun (Member) commented May 12, 2022:

/approve

@kserve-oss-bot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chinhuang007, njhill, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nishank-lily commented:

Can you also update this document with an example of a transformer working with ModelMesh?
https://kserve.github.io/website/0.9/modelserving/mms/modelmesh/overview/#kserve-integration
This still says transformers do not yet work with ModelMesh.

@chinhuang007 (Author) commented:

Sure, will update the doc.

alexagriffith pushed a commit to alexagriffith/kserve that referenced this pull request Sep 19, 2022
* Add Transformers support for ModelMesh

Add Transformers support for ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* add configmap prototype

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Add isvc reconcile login and clean up debug

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Rebase and update based on latest ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Change component reconcile to support retry

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Reduce status comparison scope in ModelMesh mode

Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: alexagriffith <agriffith96@gmail.com>