Support downloading models from S3/Blob or mounts for TensorRT #137
Comments
I've been thinking about the model storage problem. Re-downloading from S3/GCS is somewhat expensive. It would be nice to have a shared model cache that lives somewhere, maybe as a shared mountable volume. We could then have some process for loading the model onto a local volume and point all model servers to read from there. We could encapsulate the TensorRT details onto that volume. Thoughts?
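For illustration only (the `model-cache` claim name and the `/mnt/models` path are assumptions, not anything agreed on in this thread), a shared read-only cache volume could be wired into each server container roughly like this:

```go
// Sketch only: a shared, read-only model cache mounted into a model server
// container. The claim name "model-cache" and mount path "/mnt/models" are
// hypothetical; a separate loader process is assumed to populate the volume.
package storage

import (
	corev1 "k8s.io/api/core/v1"
)

// withSharedModelCache attaches a pre-populated ReadOnlyMany PVC to the given
// server container so it reads models locally instead of re-downloading from
// S3/GCS. It returns the volume to add to the PodSpec and the updated container.
func withSharedModelCache(server corev1.Container) (corev1.Volume, corev1.Container) {
	vol := corev1.Volume{
		Name: "model-cache",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: "model-cache", // hypothetical claim populated by the loader process
				ReadOnly:  true,
			},
		},
	}
	server.VolumeMounts = append(server.VolumeMounts, corev1.VolumeMount{
		Name:      "model-cache",
		MountPath: "/mnt/models",
		ReadOnly:  true,
	})
	return vol, server
}
```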
Looks like there is a complication with syncing from S3/GCS model storage: TensorRT does polling and then adds/removes model versions (https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_repository.html#modifying-the-model-repository).
Well, if we were to copy, then that feature would be disabled since the server would only poll the local, unchanging copy. For production this seems like the right thing to do; for interactive use this is probably okay.
In general we need to think about consistency of experience across different model types vs. exposing the native behaviors of servers.
Thoughts?
Definitely agreed, we can disable this at this stage. However, there is a real prod use case for continuous training and active learning where this feature can be useful and manual canary is not an option; the rollout risk is managed somewhere else in the train-serve loop.
I was thinking of downloading in an init container and exposing a mount to the TensorRT container; however, as noted in #129 we need to figure out how to enhance or work around Knative here...
Why do you need to expose a mount? If you download in the init container, the pod shares a disk, right?
They may share the disk on the node, if that is what you mean, but each container has its own filesystem, so you would still need mount support.
Closed via #148.
/kind feature
Describe the solution you'd like
The TensorRT spec can only be used with models in GCS. It would be nice to allow models in S3 or Azure blobs, or models that can be mounted.
One possibility is to create an init container to download the models and expose them to the server as a mount. This would allow us to easily add support for a range of sources in a way that would work for all servers...?
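A minimal sketch of that init-container idea, with the caveat that everything here is a placeholder rather than the project's actual implementation (the `storage-initializer` name, the `amazon/aws-cli` image, the `/mnt/models` path, and the `trtserver --model-store` flag are all illustrative):

```go
// Sketch only, not the project's actual implementation. The init container
// name, the amazon/aws-cli image, the /mnt/models path, and the trtserver
// --model-store flag are placeholders for illustration.
package reconciler

import corev1 "k8s.io/api/core/v1"

// podSpecWithInitDownloader builds a PodSpec where an init container copies
// the model from an S3 URI into an emptyDir volume, and the TensorRT server
// container serves from that same volume.
func podSpecWithInitDownloader(serverImage, srcURI string) corev1.PodSpec {
	modelVol := corev1.Volume{
		Name:         "model-dir",
		VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
	}
	mount := corev1.VolumeMount{Name: "model-dir", MountPath: "/mnt/models"}

	return corev1.PodSpec{
		Volumes: []corev1.Volume{modelVol},
		InitContainers: []corev1.Container{{
			Name:         "storage-initializer", // hypothetical name
			Image:        "amazon/aws-cli",      // placeholder downloader image
			Args:         []string{"s3", "cp", "--recursive", srcURI, "/mnt/models"},
			VolumeMounts: []corev1.VolumeMount{mount},
		}},
		Containers: []corev1.Container{{
			Name:         "tensorrt-server",
			Image:        serverImage,
			Args:         []string{"trtserver", "--model-store=/mnt/models"}, // flag name may differ
			VolumeMounts: []corev1.VolumeMount{mount},
		}},
	}
}
```

Because both containers mount the same emptyDir volume, the server sees exactly the files the init container downloaded, which is the mount support discussed in the comments above.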
Knative supports PodSpec so this is possible, but it will require us to modify the frameworkhandler interface method CreateModelServingContainer to become CreateModelServingPod. The user interface (e.g. CustomSpec) will remain unchanged (i.e. this doesn't mean we allow users to give us pods).
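To make the proposed interface change concrete, here is a hedged sketch; the real frameworkhandler interface and its method signatures may differ from these:

```go
// Hedged sketch of the interface change being discussed; the real
// frameworkhandler interface and its signatures may differ.
package frameworks

import corev1 "k8s.io/api/core/v1"

// Today (roughly): each framework handler only contributes its serving container.
type FrameworkHandlerContainer interface {
	CreateModelServingContainer(modelName string) *corev1.Container
}

// Proposed: the handler contributes the whole PodSpec, so it can also add an
// init container that downloads from S3/Blob and the volume it mounts, while
// the user-facing spec (e.g. CustomSpec) stays container-level.
type FrameworkHandlerPod interface {
	CreateModelServingPod(modelName string) *corev1.PodSpec
}
```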
Anything else you would like to add:
Related issue opened on TensorRTIS triton-inference-server/server#324