[RHOAIENG-1051] - update opendatahub folder to kustomize #280

spolti · 2024-02-21T14:09:04Z

chore: opendatahub folder have
- quickstart
- docs
- manifests/scripts to support fvt After transition manifests are merged, the way to deploy modelmesh-serving is changed so we should update all related one.

How to test:

checkout https://github.com/spolti/modelmesh-serving/tree/RHOAIENG-1051
access the opendatahub/quickstart/basic
run ./deploy

PR checklist

Checklist items below are applicable for development targeted to both fast and stable branches/tags

Unit tests pass locally
FVT tests pass locally
If the PR adds a new container image or updates the tag of an existing image (not build within cpaas), is the corresponding change made in live-builder and cpaas-midstream to add/update the image tag in the operator CSV? Link the PRs if applicable

Checklist items below are applicable for development targeted to both fast and stable branches/tags

Tested modelmesh serving deployment with odh-manifests and ran odh-manifests-e2e tests locally

Jooho · 2024-03-19T17:39:43Z

@spolti I followed the description and hit errors:

./deploy.sh: line 4: /tmp/modelmesh-serving/opendatahub/quickstart/scripts/utils.sh: No such file or directory
make: *** No rule to make target 'deploy-mm-for-odh'.  Stop.
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
No resources found
Error from server (NotFound): namespaces "model-serving" not found
Now using project "model-serving" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname

error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-serving-runtime.yaml" does not exist
error: the path "/tmp/modelmesh-serving/opendatahub/quickstart/basic/common_manifests/openvino-inference-service.yaml" does not exist
./deploy.sh: line 28: wait_for_pods_ready: command not found
./deploy.sh: line 30: success: command not found

Jooho · 2024-03-19T17:40:50Z

@spolti Basic folder is only one to test or are there any other folders?

spolti · 2024-03-19T19:25:15Z

@Jooho it should be working now, not sure why it happened, but was able to run previously locally.

Jooho · 2024-03-19T19:58:26Z

now, it removed the modelmesh-serving folder itself.

  [RHOAIENG-1051](https://issues.redhat.com//browse/RHOAIENG-1051)  /tmp/modelmesh-serving/opendatahub/quickstart/basic                                                                   15:57:06  jooho 
❯ ./deploy.sh 
./opendatahub/scripts/install_odh.sh 
.. Downloading binaries
.. Creating a bin folder
yq already installed.
Installing kustomize.
curl: (22) The requested URL returned error: 404
tar: /tmp/kustomize.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
mv: cannot stat '/tmp/kustomize': No such file or directory
rm: cannot remove '/tmp/kustomize.tar.gz': No such file or directory
installed kustomize version v5.3.0
Delete the exising /tmp/modelmesh-e2e folder
Creating a /tmp/modelmesh-e2e folder
Already on project "opendatahub" on server "https://api.jooho.n1ai.p3.openshiftapps.com:443".
.. Archiving odh-manifests
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/../../config': No such file or directory
cp: cannot stat '/tmp/modelmesh-serving/opendatahub/scripts/manifests': No such file or directory
Latest Manifest will be used, fast tag
.. Deploying ModelMesh with kustomize
./opendatahub/scripts/install_odh.sh: line 158: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory
params.env:
cat: /tmp/modelmesh-serving/config/overlays/odh/params.env: No such file or directory

installing namespaced rbac
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
error: no objects passed to apply
cat: /tmp/modelmesh-serving/config/namespace-runtimes/kustomization.yaml: No such file or directory
2024/03/19 15:57:09 unable to make loader at '.'; not a valid directory: abs path error on '.' : getwd: no such file or directory
^Cmake: *** [Makefile:297: deploy-mm-for-odh] Interrupt

cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
cd: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

please double check rm command.

chore: opendatahub folder have - quickstart - docs - manifests/scripts to support fvt After transition manifests are merged, the way to deploy modelmesh-serving is changed so we should update all related one. Signed-off-by: Spolti <fspolti@redhat.com>

Jooho · 2024-04-02T18:14:09Z

/retest

@spolti With this pr, what should I test?
3 quickstarts ?

basic
hpa
PVC

anything else?

spolti · 2024-04-02T18:17:50Z

only these 3 should be fine.

Jooho · 2024-04-02T19:36:45Z

opendatahub/scripts/cleanup.sh

-  kustomize build  ${MANIFESTS_DIR}/runtimes/ |oc delete -f -
-  oc delete pvc,pod --all --force -n modelmesh-serving
-  oc delete ns $namespace
+  kustomize build  "${MANIFESTS_DIR}"/runtimes/ |oc delete -f -


this one should be moved after line number 35 because of crd creation.

Jooho · 2024-04-02T20:16:08Z

basic/hpa worked but pvc failed.

modelmesh-controller failed

{"level":"info","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"MMesh Configuration","serviceName":"modelmesh-serving","port":8033,"mmeshEndpoint":""}
{"level":"error","ts":"2024-04-02T20:12:53Z","logger":"setup","msg":"In cluster scope mode but controller does not have cluster scope permissions, exiting","stacktrace":"main.main\n\t/go/src/github.com/opendatahub-io/modelmesh-serving/main.go:210\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}

Latest uses go1.22 while we use 1.20 Signed-off-by: Spolti <fspolti@redhat.com>

VedantMahabaleshwarkar · 2024-04-05T18:35:06Z

Is this quickstart still deploying model-mesh?
The current issue with these set of quickstarts, in my opinion, is that the deploy.sh script looks to install model-mesh, which is not what I have found I needed whenever I looked to these quickstarts. Most of the time I find myself looking at these quickstarts, I already have an ODH installation, and what I'm looking for is a 1 line command to deploy a sample inferenceservice.
IMO, these quickstarts should operate under the assumption that ODH is already installed, and only focus on deploying the models and performing inference. Or, at least there should be an option for this in addition to what already exists.

WDYT @Jooho @israel-hdez @spolti @danielezonca

spolti · 2024-04-05T21:16:30Z

FVT is working fine, just one test is failing, seems to be a issue with it:

Scaling of runtime deployments with HPA Autoscaler when there are no predictors Scale all runtimes down after a created test predictor is deleted
/Users/fspolti/data/dev/sources/modelmesh-serving/fvt/hpa/hpa_test.go:149
  2024-04-05T17:45:14-03:00	INFO	Delete all predictors ...
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-mlserver-1.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:false modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-ovms-1.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:false modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-torchserve-0.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:false]
  2024-04-05T17:45:17-03:00	INFO	Watcher got event with object	{"name": "modelmesh-serving-triton-2.x", "replicas": 0, "available": 0, "updated": 0}
  2024-04-05T17:45:17-03:00	INFO	deployStatusesReady: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
  2024-04-05T17:45:17-03:00	INFO	All deployments are ready: map[modelmesh-serving-mlserver-1.x:true modelmesh-serving-ovms-1.x:true modelmesh-serving-torchserve-0.x:true modelmesh-serving-triton-2.x:true]
  2024-04-05T17:45:27-03:00	INFO	Timed out after 10s without events
  STEP: Creating a test predictor for one Runtime @ 04/05/24 17:45:27.361
  STEP: Creating predictor mlserver-sklearn-mnist-svm-gpqgm @ 04/05/24 17:45:27.361
  STEP: Waiting for predictor mlserver-sklearn-mnist-svm-gpqgm to be 'Loaded' @ 04/05/24 17:45:27.665
  2024-04-05T17:45:27-03:00	INFO	Watcher got event with object	{"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": null}
  2024-04-05T17:45:27-03:00	INFO	Watcher got event with object	{"name": "mlserver-sklearn-mnist-svm-gpqgm", "status.available": false, "status.activeModelState": "Pending", "status.targetModelState": "", "status.transitionStatus": "UpToDate", "status.lastFailureInfo": {"message":"Waiting for runtime Pod to become available","modelId":"mlserver-sklearn-mnist-svm-gpqgm__ksp-b20a0c5aca","reason":"RuntimeUnhealthy"}}

  [FAILED] in [It] - /Users/fspolti/data/dev/sources/modelmesh-serving/fvt/helpers.go:355 @ 04/05/24 17:47:27.664
  2024-04-05T17:47:27-03:00	INFO	Running command	{"args": "kubectl get predictors -n model-serving"}
  =====================================================================================================================================
  NAME                               TYPE      AVAILABLE   ACTIVEMODEL   TARGETMODEL   TRANSITION   AGE
  mlserver-sklearn-mnist-svm-gpqgm   sklearn   false       Pending                     UpToDate     2m3s

israel-hdez · 2024-04-08T23:57:36Z

Is this quickstart still deploying model-mesh? The current issue with these set of quickstarts, in my opinion, is that the deploy.sh script looks to install model-mesh, which is not what I have found I needed whenever I looked to these quickstarts. Most of the time I find myself looking at these quickstarts, I already have an ODH installation, and what I'm looking for is a 1 line command to deploy a sample inferenceservice. IMO, these quickstarts should operate under the assumption that ODH is already installed, and only focus on deploying the models and performing inference. Or, at least there should be an option for this in addition to what already exists.

WDYT @Jooho @israel-hdez @spolti @danielezonca

Note that I haven't reviewed the code and I think I don't have full context. So, just trying to answer...

In most projects I have played with, quickstarts assume a clean environment and install everything to quickly give you a working setup. Also, usually quickstarts are for trying the project (i.e. non-production, demos). Because of this, a quickstart don't let you customize the setup (that's left to the official installer).

I would agree that deploying a sample model should be left to a different script than the quickstart setup, although it could be part of the same doc page (probably arranged like a tutorial).

That said, I also didn't use too much the quickstarts because what I personally remember is that, rather than providing you with a setup that you can play with, I think it prepared the env more like a demo that is also suited for running FVTs/CI... and I had to spend time "cleaning" the env. This is different from your case: you just want to deploy a sample model on an existent setup, while I wanted to quickly have a base setup (without additional stuff) to try my own models. ... but don't trust me about this (I may be remembering incorrectly why I didn't use the quickstarts that much).

Jooho

/lgtm

I did e2e test and it works

CONTROLLERNAMESPACE=opendatahub NAMESPACE=modelmesh-serving NAMESPACESCOPEMODE=true make e2e-test-for-odh
...
Ginkgo ran 1 suite in 3m56.85221035s
Test Suite Passed
Passed fvt/hpa. Move on the next test
[SUCCESS] FVT Test Passed!

openshift-ci · 2024-04-09T18:34:28Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jooho, spolti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Jooho,spolti]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

#### Motivation Support for TorchServe was added in opendatahub-io#250 and kserve/modelmesh-runtime-adapter#34. A test should be added for it as well. #### Modifications - Adds basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API - Disables OVMS runtime and tests to allow TorchServe to be tested due to resource constraints #### Result Closes opendatahub-io#280 Signed-off-by: Rafael Vasquez <raf.vasquez@ibm.com>

spolti requested review from davidesalerno and Jooho February 21, 2024 14:09

spolti marked this pull request as draft February 21, 2024 14:09

openshift-ci bot added the do-not-merge/work-in-progress label Feb 21, 2024

openshift-ci bot requested a review from terrytangyuan February 21, 2024 14:09

openshift-ci bot added the approved label Feb 21, 2024

spolti force-pushed the RHOAIENG-1051 branch from 5006dce to d9062ff Compare March 14, 2024 20:02

spolti marked this pull request as ready for review March 14, 2024 20:04

openshift-ci bot removed the do-not-merge/work-in-progress label Mar 14, 2024

openshift-ci bot requested review from israel-hdez and VedantMahabaleshwarkar March 14, 2024 20:04

spolti force-pushed the RHOAIENG-1051 branch from d9062ff to cea0ff8 Compare March 19, 2024 19:24

spolti force-pushed the RHOAIENG-1051 branch from cea0ff8 to 4a8ab64 Compare March 19, 2024 21:17

spolti force-pushed the RHOAIENG-1051 branch from 4a8ab64 to 77fc435 Compare March 22, 2024 16:45

Jooho reviewed Apr 2, 2024

View reviewed changes

pins k8s env test to concrete version

154657c

Latest uses go1.22 while we use 1.20 Signed-off-by: Spolti <fspolti@redhat.com>

Jooho approved these changes Apr 9, 2024

View reviewed changes

openshift-ci bot assigned Jooho Apr 9, 2024

openshift-ci bot added the lgtm label Apr 9, 2024

openshift-merge-bot bot merged commit 0917661 into opendatahub-io:main Apr 9, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RHOAIENG-1051] - update opendatahub folder to kustomize #280

[RHOAIENG-1051] - update opendatahub folder to kustomize #280

spolti commented Feb 21, 2024 •

edited

Jooho commented Mar 19, 2024

Jooho commented Mar 19, 2024

spolti commented Mar 19, 2024

Jooho commented Mar 19, 2024 •

edited by openshift-ci bot

Jooho commented Apr 2, 2024

spolti commented Apr 2, 2024

Jooho Apr 2, 2024

Jooho commented Apr 2, 2024

VedantMahabaleshwarkar commented Apr 5, 2024

spolti commented Apr 5, 2024

israel-hdez commented Apr 8, 2024

Jooho left a comment

openshift-ci bot commented Apr 9, 2024

[RHOAIENG-1051] - update opendatahub folder to kustomize #280

[RHOAIENG-1051] - update opendatahub folder to kustomize #280

Conversation

spolti commented Feb 21, 2024 • edited

PR checklist

Jooho commented Mar 19, 2024

Jooho commented Mar 19, 2024

spolti commented Mar 19, 2024

Jooho commented Mar 19, 2024 • edited by openshift-ci bot

Jooho commented Apr 2, 2024

spolti commented Apr 2, 2024

Jooho Apr 2, 2024

Choose a reason for hiding this comment

Jooho commented Apr 2, 2024

VedantMahabaleshwarkar commented Apr 5, 2024

spolti commented Apr 5, 2024

israel-hdez commented Apr 8, 2024

Jooho left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Apr 9, 2024

spolti commented Feb 21, 2024 •

edited

Jooho commented Mar 19, 2024 •

edited by openshift-ci bot