
Enable transformers to work with ModelMesh #2136

Merged (6 commits) on May 12, 2022

Conversation

@chinhuang007 (Contributor) commented Apr 7, 2022

What this PR does / why we need it:
This PR enables transformers to work with ModelMesh predictors in the same InferenceService instance.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
It addresses the issue. The design doc is available here.

Type of changes

  • [x] New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing:
Transformer examples with a ModelMesh predictor will be added or updated in a separate PR.

Checklist:

  • [x] Has code been commented, particularly in hard-to-understand areas?

Release note:

Transformers are now enabled to work with ModelMesh predictors in the same InferenceService instance.

@chinhuang007 changed the title from "Support mm transformer" to "Enable transformers to work with ModelMesh" on Apr 7, 2022
if ok && (deploymentMode == string(constants.ModelMeshDeployment)) {
// Get predictor host and protocol from annotations in modelmesh deployment mode
argumentPredictorHost = metadata.Annotations["predictor-host"]
argumentPredictorProtocol := metadata.Annotations["predictor-protocol"]
Member commented:
Should we add a protocol field on the InferenceService component spec instead of using the annotation?

@pvaneck (Member) commented Apr 13, 2022:
@yuzisun I don't believe these annotations are meant for the user to use, but only as an internal intermediary for passing in needed values from the transformer Reconcile function.

Contributor commented:
@chinhuang007 I think it would be simpler to just add an additional predictorURL arg to this GetContainer function? Then no intermediary annotations would be needed.

@chinhuang007 (Author) commented:
@njhill GetContainer is an interface method that is also implemented in other places. The goal is to minimize interface changes, hence the use of intermediary annotations.

@chinhuang007 (Author) commented:
@yuzisun @njhill @pvaneck The current use of intermediary annotations aims to minimize interface changes. Adding a new arg to GetContainer, part of the ComponentImplementation interface, would require changes in 30+ files, including the predictor, the explainer, and a bunch of examples. Not a difficult task, I just wanted to discuss whether it is warranted before making an interface change.

If we decide a new arg should be added, I'd like to make sure the type is correct since the impact is broad. I am currently thinking InferenceServiceStatus might work?
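
For readers following the thread, here is a minimal sketch of the annotation-based flow being discussed, pieced together from the diff snippets above (not the exact PR code; the --protocol handling is simplified):

deploymentMode, ok := metadata.Annotations[constants.DeploymentMode]
if ok && deploymentMode == string(constants.ModelMeshDeployment) {
	// In ModelMesh mode the InferenceService reconciler has already stashed the
	// predictor host and protocol on the object metadata as internal annotations,
	// so GetContainer can read them without any change to its signature.
	argumentPredictorHost = metadata.Annotations["predictor-host"]
	if protocol, found := metadata.Annotations["predictor-protocol"]; found {
		container.Args = append(container.Args, "--protocol", protocol)
	}
}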

@chinhuang007 (Author) commented:

The use of annotations is intended to minimize spec and interface changes. We could certainly make a bigger spec change by adding a new field and then have it populated as needed.

@pvaneck (Member) left a review:
Thanks, @chinhuang007. Left some comments.


}

isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host
if predictorURL.Scheme == "http" {
Member commented:
nit: add the grpc check as the first conditional check since in most cases, the URL from ModelMesh will be gRPC based.

Also, for completeness, check for https as a potential scheme as well.
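
A sketch of the ordering being suggested, with gRPC checked first and https handled for completeness (the protocol constant names and the http-to-protocol mapping here are illustrative, not necessarily what the PR ends up with):

switch predictorURL.Scheme {
case "grpc":
	// Most ModelMesh predictor URLs are gRPC based, so handle this case first.
	isvc.ObjectMeta.Annotations["predictor-protocol"] = string(constants.ProtocolGRPCV2)
case "http", "https":
	isvc.ObjectMeta.Annotations["predictor-protocol"] = string(constants.ProtocolV2)
default:
	// Anything else is unexpected for a ModelMesh predictor URL; surface it.
	return fmt.Errorf("unsupported predictor URL scheme: %q", predictorURL.Scheme)
}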

@@ -109,6 +109,13 @@ var (
DefaultMinReplicas = 1
)

// Predictor Protocol Constants
var (
Member commented:
Heads up that https://github.com/kserve/kserve/pull/2118/files#diff-36043890c52c8201a8bc84238c219be45ce07bf172d89f693c7c54ffe70d046e is defining some protocol constants and will soon be merged, so you can rebase once that is in.

@@ -72,6 +73,23 @@ func (c *CustomTransformer) GetContainer(metadata metav1.ObjectMeta, extensions
container := &c.Containers[0]
argumentPredictorHost := fmt.Sprintf("%s.%s", constants.DefaultPredictorServiceName(metadata.Name), metadata.Namespace)

deploymentMode, ok := metadata.Annotations[constants.DeploymentMode]
logger, _ := pkglogging.NewLogger("", "INFO")
Member commented:
I think we should use the logging library that is typically used in KServe:

import (
    logf "sigs.k8s.io/controller-runtime/pkg/log"
)

var log = logf.Log.WithName("CustomTransformerReconciler")

Another example.

@chinhuang007 (Author) commented:
Thanks for pointing out the common KServe logger. Just updated, please take a look.

@njhill (Contributor) left a review:
Thanks @chinhuang007, looks good. I added a few comments inline.

Comment on lines 91 to 94
container.Args = append(container.Args, []string{
"--protocol",
argumentPredictorProtocol,
}...)
Contributor commented:
Suggested change
container.Args = append(container.Args, []string{
"--protocol",
argumentPredictorProtocol,
}...)
container.Args = append(container.Args, "--protocol", argumentPredictorProtocol)


// check if predictor URL is populated
if isvc.Status.Components["predictor"].URL == nil {
// exit transformer reconcile with an error when predictor URL not populated
return fmt.Errorf("Predictor URL from ModelMesh is not ready")
Contributor commented:

I think we would want to differentiate this from actual errors since it just means "not ready yet". Instead of returning an error from the top-level reconcile func it would be better to return e.g. ctrl.Result{RequeueAfter: 2 * time.Second} (and could log an info message here).

I guess an additional return value might need to be added to this func to facilitate that.
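
A sketch of the suggested pattern, assuming the component Reconcile signature gains a ctrl.Result return value (which is what later commits in this PR end up doing):

if isvc.Status.Components["predictor"].URL == nil {
	// Not an error, just "not ready yet": ask controller-runtime to retry shortly
	// instead of surfacing a reconcile failure.
	p.Log.Info("Predictor URL not yet populated, requeueing transformer reconcile")
	return ctrl.Result{RequeueAfter: 2 * time.Second}, nil
}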

@chinhuang007 (Author) commented:
@njhill This requires interface changes as well. I will be happy to make such changes if necessary.

Contributor commented:
Thanks @chinhuang007, looks like you already made the change.

@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf
// This is important because the copy we loaded from the informer's
Contributor commented:
Not sure whether the DeepEqual comparison above should be reduced in scope to not look at the parts of the status that the modelmesh controller "owns" in the case that this is a modelmesh deployment?
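
One hedged way to narrow the comparison, using a hypothetical kserveOwnedStatus helper that clears the ModelMesh-owned portions of the status before the DeepEqual; the commit "Reduce status comparison scope in ModelMesh mode" in this PR may well do it differently, and existingService here stands for the copy loaded from the informer's cache mentioned in the diff context:

// kserveOwnedStatus is a hypothetical helper returning a copy of the status with
// the fields owned by the ModelMesh controller zeroed out.
if equality.Semantic.DeepEqual(kserveOwnedStatus(existingService.Status), kserveOwnedStatus(desiredService.Status)) {
	// Nothing this controller owns has changed, so skip the status write and avoid
	// fighting with the ModelMesh controller over the same fields.
	return nil
}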

@@ -223,7 +230,7 @@ func (r *InferenceServiceReconciler) updateStatus(desiredService *v1beta1api.Inf
// This is important because the copy we loaded from the informer's
// cache may be stale and we don't want to overwrite a prior update
// to status with this stale state.
} else if err := r.Status().Update(context.TODO(), desiredService); err != nil {
} else if err := r.Status().Patch(context.TODO(), desiredService, client.Merge, &client.PatchOptions{}); err != nil {
Contributor commented:
Could add a comment explaining why we're doing a patch rather than update here?

@chinhuang007 (Author) commented:
This was needed to prevent overriding the status when ModelMesh and KServe were not using the same InferenceService status spec. After ModelMesh changed to use the type definitions from KServe, Patch is no longer needed. Will change back to Update.
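
For reference, the two forms being weighed here, using controller-runtime's client (only one of the two would actually be used):

// Merge patch: only the fields set on desiredService are sent, so status written
// by the ModelMesh controller is left untouched.
if err := r.Status().Patch(context.TODO(), desiredService, client.Merge); err != nil {
	return err
}

// Plain update: replaces the whole status subresource, which is safe again once
// both controllers share the KServe status type definitions.
if err := r.Status().Update(context.TODO(), desiredService); err != nil {
	return err
}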

@njhill (Contributor) left a review:
Thanks @chinhuang007, the updates look good! I just made one more small simplification suggestion.

Comment on lines 95 to 107
if isvc.Status.Components["predictor"].URL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}

// add predictor host and protocol to metadata
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String())
if err != nil {
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err)
}

Contributor commented:
can simplify:

Suggested change
if isvc.Status.Components["predictor"].URL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}
// add predictor host and protocol to metadata
predictorURL, err := url.Parse(isvc.Status.Components["predictor"].URL.String())
if err != nil {
return ctrl.Result{}, fmt.Errorf("unable to parse predictor URL: %v", err)
}
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL)
if predictorURL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
p.Log.Info("Transformer reconciliation is waiting for predictor URL to be populated")
return ctrl.Result{RequeueAfter: 3 * time.Second}, nil
}
// add predictor host and protocol to metadata

@chinhuang007 (Author) commented:
@njhill Good point! Changed as suggested.

@@ -168,7 +175,7 @@ func (r *InferenceServiceReconciler) Reconcile(ctx context.Context, req ctrl.Req
reconcilers = append(reconcilers, components.NewExplainer(r.Client, r.Scheme, isvcConfig))
}
for _, reconciler := range reconcilers {
if err := reconciler.Reconcile(isvc); err != nil {
if _, err := reconciler.Reconcile(isvc); err != nil {
Member commented:
Question: is the Result with the RequeueAfter propagated out? Seems like we would just ignore the returned Requeue Result here.

@chinhuang007 (Author) commented:
Good catch! Added code to propagate the requeue result to the isvc reconcile.
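
A sketch of what propagating the result out of the component loop can look like (simplified; the actual change in this PR may differ in its details):

for _, reconciler := range reconcilers {
	result, err := reconciler.Reconcile(isvc)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Bubble a non-empty result (e.g. the transformer's RequeueAfter while the
	// ModelMesh predictor URL is still missing) up to the top-level
	// InferenceService Reconcile so controller-runtime actually requeues.
	if result.Requeue || result.RequeueAfter > 0 {
		return result, nil
	}
}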

// check if predictor URL is populated
predictorURL := (*url.URL)(isvc.Status.Components["predictor"].URL)
if predictorURL == nil {
// tansformer reconcile will retry every 3 second until predictor URL is populated
Member commented:
nit: transformer typo

@chinhuang007 force-pushed the support-mm-transformer branch 3 times, most recently from 6f32e8f to 6ec4554 on April 27, 2022
@yuzisun mentioned this pull request on May 1, 2022
Add Transformers support for ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>
@pvaneck (Member) left a review:
/lgtm

}

// add predictor host and protocol to metadata
isvc.ObjectMeta.Annotations["predictor-host"] = predictorURL.Host
@yuzisun (Member) commented May 11, 2022:
@chinhuang007 Can you help move this to a constant and rename it following the internal annotation naming convention?
https://github.com/kserve/kserve/blob/master/pkg/constants/constants.go#L80
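
For illustration, the rename could end up looking something like the sketch below; the constant names are assumptions, and only the internal-annotation prefix pattern is taken from the linked constants file:

// pkg/constants/constants.go (hypothetical names, reusing the existing
// internal annotation prefix, e.g. "internal.serving.kserve.io")
PredictorHostAnnotationKey     = InferenceServiceInternalAnnotationsPrefix + "/predictor-host"
PredictorProtocolAnnotationKey = InferenceServiceInternalAnnotationsPrefix + "/predictor-protocol"

// transformer reconciler
isvc.ObjectMeta.Annotations[constants.PredictorHostAnnotationKey] = predictorURL.Host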

@njhill (Contributor) commented May 11, 2022:

/lgtm

@yuzisun (Member) commented May 12, 2022:

/approve

@kserve-oss-bot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chinhuang007, njhill, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nishank-lily commented:

Can you also update this document with an example of a transformer working with ModelMesh?
https://kserve.github.io/website/0.9/modelserving/mms/modelmesh/overview/#kserve-integration
This still says transformers do not yet work with ModelMesh.

@chinhuang007 (Author) commented:

Sure, will update the doc.

alexagriffith pushed a commit to alexagriffith/kserve that referenced this pull request Sep 19, 2022
* Add Transformers support for ModelMesh

Add Transformers support for ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* add configmap prototype

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Add isvc reconcile login and clean up debug

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Rebase and update based on latest ModelMesh

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Change component reconcile to support retry

Signed-off-by: Chin Huang <chhuang@us.ibm.com>

* Reduce status comparison scope in ModelMesh mode

Signed-off-by: Chin Huang <chhuang@us.ibm.com>
Signed-off-by: alexagriffith <agriffith96@gmail.com>