
kfserver add load/unload endpoint #1082

Merged
merged 22 commits into kserve:master on Sep 16, 2020

Conversation

wengyao04 (Contributor) commented Sep 4, 2020

What this PR does / why we need it:
Add load/unload endpoints for KFModel, which are used for Multi-Model Serving.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1043

Special notes for your reviewer:

The load/unload methods only work for a single process and single thread; issue #1074 tracks investigating parallelism in the KFServing Python model server.

  • move the sklearn and xgboost models into the kfserving package
  • add class KFModelRepository with the following methods (see the sketch after this list):
    get_models()
    get_model()
    update()
    async load()
    unload()

  • add class KFModelFactory, which creates a KFModel based on the model file suffix
  • add LoadHandler for endpoint v1/models/${MODEL_NAME}/load
  • add UnloadHandler for endpoint v1/models/${MODEL_NAME}/unload
  • Test sklearn/xgboost (run docker locally)
    • single model case: does not break
    • multi model case:
      • able to load/unload model
      • able to list models
      • able to predict if the model exists, and returns the expected error when the model does not exist
  • Test pytorch (we don't support pytorch in the MMS POC):
    • single model case: does not break
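
A minimal sketch of the KFModelRepository interface described above (illustrative only: the models_dir default, the in-memory dict, and the method bodies are assumptions, not this PR's actual implementation):

from typing import Dict, List, Optional


class KFModel:
    # Stand-in for kfserving.KFModel with only the fields this sketch needs.
    def __init__(self, name: str):
        self.name = name
        self.ready = False


class KFModelRepository:
    def __init__(self, models_dir: str = "/mnt/models"):
        self.models_dir = models_dir
        self.models: Dict[str, KFModel] = {}

    def get_models(self) -> List[KFModel]:
        return list(self.models.values())

    def get_model(self, name: str) -> Optional[KFModel]:
        return self.models.get(name)

    def update(self, model: KFModel):
        self.models[model.name] = model

    async def load(self, name: str) -> bool:
        # The PR detects the model type from the file suffix and creates a
        # concrete KFModel via KFModelFactory; this sketch just marks it ready.
        model = KFModel(name)
        model.ready = True
        self.update(model)
        return model.ready

    def unload(self, name: str):
        if name not in self.models:
            raise KeyError(f"Model with name {name} does not exist.")
        del self.models[name]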

Release note:

Add load/unload endpoints for KFModel, used for Multi-Model Serving.

@kubeflow-bot

This change is Reviewable

@k8s-ci-robot (Contributor)

Hi @wengyao04. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

yuzliu (Contributor) commented Sep 4, 2020

/assign @cliveseldon

@@ -0,0 +1,39 @@
# Copyright 2019 kubeflow.org.
Member:

nit: 2020
you can run the script under hack/boilerplate.sh

model_type, model_full_path = get_kfmodel_type(name, self.models_dir)

model = KFModelFactory.create_model(name, self.models_dir, model_type)
model.set_full_model_path(model_full_path)
Member:

Can we call set_full_model_path in create_model?

yuzliu (Contributor) commented Sep 7, 2020:

Ditto, can lines 47 to 50 be aggregated into one line like

 model = KFModelFactory.create_model(name, self.models_dir, model_type)

Contributor:

Also should full_model_path be passed in init?

path = os.path.join(model_dir, model_name + extension)
if os.path.exists(path):
    return model_type, path
return None, ""
Member:

suggest throwing an exception instead of returning None

Contributor:

agreed, definitely throw an exception
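
A minimal sketch of the suggested change, raising instead of returning (None, "") (the helper name follows the snippet above; the suffix-to-type mapping is an assumption):

import os

# Assumed mapping of file suffix to model type, for illustration only.
SUPPORTED_SUFFIXES = {".joblib": "sklearn", ".bst": "xgboost"}


def get_kfmodel_type(model_name: str, model_dir: str):
    for extension, model_type in SUPPORTED_SUFFIXES.items():
        path = os.path.join(model_dir, model_name + extension)
        if os.path.exists(path):
            return model_type, path
    # Fail loudly so the caller cannot silently create a model of unknown type.
    raise Exception(f"Cannot determine model type for {model_name} under {model_dir}")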

async def post(self, name: str):
    try:
        loop = asyncio.get_running_loop()
        ready = loop.run_until_complete(self.models.load(name))
Member:

I thought we wanted to make this async and use await here.

Contributor (Author):

I prefer to wait on the future until it finishes, to ensure the model is loaded successfully

yuzisun (Member) commented Sep 5, 2020:

@wengyao04 effectively await does that, and that's probably also how await is implemented under the hood. The main benefit of async is that it gives other handlers or tasks a chance to run while this one is waiting on IO. Alibi uses it mainly because it calls an async predict inside the sync explainer handler.
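
A sketch of the await-based handler the reviewers suggest; the handler and repository names follow the snippets in this thread, while the 500 status code and JSON body reflect the other review comments and are assumptions:

import tornado.web


class LoadHandler(tornado.web.RequestHandler):
    def initialize(self, models):  # models is a KFModelRepository
        self.models = models

    async def post(self, name: str):
        try:
            # await still blocks this request until the load finishes, but it
            # lets other handlers and tasks run while the load waits on IO.
            await self.models.load(name)
        except Exception as e:
            raise tornado.web.HTTPError(
                status_code=500,
                reason=f"Fail to load model {name}: {e}"
            )
        self.write({"name": name, "load": True})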

except Exception as e:
    ex_type, ex_value, ex_traceback = sys.exc_info()
    raise tornado.web.HTTPError(
        status_code=503,
Member:

this sounds more like 500 instead of 503

    status_code=404,
    reason="Model with name %s does not exist." % name
)
self.write(f"succeed to unload model {name}")
Member:

suggest a JSON response with the model name instead of a string

    status_code=503,
    reason=f"Model with name {name} is not ready."
)
self.write(f"succeed to load model {name}")
Member:

suggest a JSON response with the model name instead of a string
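
A tiny sketch of the suggested JSON payloads (the helper and field names are hypothetical, not part of this PR):

import json


def success_payload(name: str, action: str) -> str:
    # e.g. success_payload("my-model", "unload") -> '{"name": "my-model", "unload": true}'
    return json.dumps({"name": name, action: True})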

@@ -21,7 +21,7 @@
]
setup(
name='sklearnserver',
version='0.4.0',
version='0.4.1',
Member:

do not change this for now

logging.error(f"fail to load model {args.model_name} from dir {args.model_dir}. "
              f"exception type {ex_type}, exception msg: {ex_value}")
model.ready = False
# if the model fails to load, start kfserver with an empty model list
Member:

I think here we might want to check if the model repository is empty and start with an empty list
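
A sketch of the startup logic being discussed, falling back to an empty model list when the single model fails to load (the helper name is hypothetical, and the reviewer's repository-emptiness check is not shown):

import logging
import sys


def models_to_register(model) -> list:
    # Return the list passed to kfserver at startup: the single model if it
    # loaded, otherwise an empty list so the server can still start (e.g. the
    # multi-model case, where models are loaded later via the load endpoint).
    try:
        model.load()  # expected to set model.ready = True on success
    except Exception:
        ex_type, ex_value, _ = sys.exc_info()
        logging.error(f"fail to load model {model.name}. "
                      f"exception type {ex_type}, exception msg: {ex_value}")
        model.ready = False
    return [model] if model.ready else []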

yuzisun (Member) commented Sep 4, 2020

@wengyao04 Liked how you structured the model repo code, thanks for the contribution!

yuzisun (Member) commented Sep 4, 2020

/ok-to-test

wengyao04 (Contributor, Author) commented:

Failed at sdk-test

tornado.httpclient.HTTPClientError: HTTP 503: Model with name model is not ready. Error type: <class 'AttributeError'> error msg: module 'asyncio' has no attribute 'get_running_loop'

But I can pass ./test/scripts/sdk-test.sh locally, checking ...


def load(self):
def load_from_model_dir(self):
model_path = kfserving.Storage.download(self.model_dir)
Contributor:

Do we still need to download the model? I think the model agent will handle the download from bcs/gcs, and the model server only needs to load from the local file system.

Contributor (Author):

We still need to use the storage initializer for the single-model mode, so let's keep this logic for the single-model inference service.
The storage initializer downloads the files:
https://github.com/kubeflow/kfserving/blob/master/python/storage-initializer/scripts/initializer-entrypoint

load_from_model_dir will not download again; it does some sanity checks and creates a symlink.

def __init__(self, name: str, model_dir: str, nthread: int, booster: \
XGBModel = None):
def __init__(self, name: str, model_dir: str, nthread: int = DEFAULT_NTHREAD,
booster: XGBModel = None):
Contributor:

Should we just remove the booster argument? We should always load the model, not initialize an XGBoostModel from an existing booster.
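
A sketch of that suggestion, dropping booster and always loading from disk (the class shape, default nthread, and model file name are assumptions):

import os

import xgboost as xgb

DEFAULT_NTHREAD = 1  # assumed default


class XGBoostModel:
    def __init__(self, name: str, model_dir: str, nthread: int = DEFAULT_NTHREAD):
        self.name = name
        self.model_dir = model_dir
        self.nthread = nthread
        self.ready = False

    def load(self):
        # Always construct the booster from the model file instead of
        # accepting a pre-built booster in __init__.
        model_file = os.path.join(self.model_dir, "model.bst")
        self._booster = xgb.Booster(params={"nthread": self.nthread},
                                    model_file=model_file)
        self.ready = True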

WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
Contributor:

move this change to a separate PR?

Contributor:

(these header comment changes)

yuzisun (Member) commented Sep 10, 2020

/retest

@k8s-ci-robot removed the lgtm label Sep 10, 2020

yuzisun (Member) left a comment:

@wengyao04 I think we should leave the storage uri as the model repository path in the control plane. This change assumes the single-model case, but it won't work for MMS; for example, triton always expects a model repository path, so we should keep the same behavior for sklearn and xgboost. When we start the sklearn/xgboost server, --model-name is always passed in, so you can derive the model path for the single-model case.
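
A small sketch of deriving the single-model path from the repository path and the model name, so the storage uri can stay the repository path (the helper name and file extension are assumptions):

import os


def single_model_path(model_repository: str, model_name: str,
                      extension: str = ".joblib") -> str:
    # In the single-model case the server is started with --model-name, so the
    # concrete model path can be derived from the repository path.
    return os.path.join(model_repository, model_name + extension)


print(single_model_path("/mnt/models", "my-sklearn-model"))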

pkg/apis/serving/v1beta1/predictor_sklearn.go (outdated, resolved)
pkg/apis/serving/v1alpha2/framework_triton.go (outdated, resolved)
@@ -162,14 +162,19 @@ func (mi *StorageInitializerInjector) InjectStorageInitializer(pod *v1.Pod) erro
storageInitializerImage = mi.config.Image
}

modelLocalMountPath := constants.DefaultModelLocalMountPath
if modelName, ok := pod.ObjectMeta.Labels[constants.InferenceServicePodLabelKey]; ok {
Member:

modelName may not always be the same as the service name; a custom container can pass modelName as an argument

securityContext := userContainer.SecurityContext.DeepCopy()
// Add an init container to run provisioning logic to the PodSpec
initContainer := &v1.Container{
Name: StorageInitializerContainerName,
Image: storageInitializerImage,
Args: []string{
srcURI,
constants.DefaultModelLocalMountPath,
modelLocalMountPath,
Member:

this is likely a non-backwards-compatible change, as some users might depend on the old model path

pkg/apis/serving/v1alpha2/framework_onnx.go (outdated, resolved)
pkg/apis/serving/v1alpha2/explainer_alibi.go (outdated, resolved)
class KFServer:
def __init__(self, http_port: int = args.http_port,
def __init__(self, registered_models: KFModelRepository = KFModelRepository(),
yuzisun (Member) commented Sep 15, 2020:

Can we move registered_models to the end of the argument list to ensure backwards compatibility?
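
A sketch of the suggested reordering, with registered_models last so existing positional callers keep working (the other parameters and defaults shown are assumptions):

class KFModelRepository:
    # Stand-in for the repository class; see the sketch earlier in this thread.
    pass


class KFServer:
    def __init__(self, http_port: int = 8080,    # assumed default
                 grpc_port: int = 8081,           # assumed default
                 registered_models: KFModelRepository = KFModelRepository()):
        self.http_port = http_port
        self.grpc_port = grpc_port
        self.registered_models = registered_models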

yuzisun (Member) commented Sep 16, 2020

/retest

yuzisun (Member) commented Sep 16, 2020

@wengyao04 Thanks for the awesome contribution!

/lgtm
/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit d883c34 into kserve:master Sep 16, 2020