Support downloading models from S3/Blob or mounts for TensorRT #137
Comments
I've been thinking about the model storage problem. Re-downloading from S3/GCS is somewhat expensive. It would be nice to have a shared model cache that lives somewhere, maybe as a shared mountable volume. We could then have some process for loading the model onto a local volume and point all model servers to read from there. We could encapsulate the TensorRT details onto that volume. Thoughts?
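For illustration only (the `model-cache` claim name and the `/mnt/models` path are assumptions, not anything agreed on in this thread), a shared read-only cache volume could be wired into each server container roughly like this:

```go
// Sketch only: a shared, read-only model cache mounted into a model server
// container. The claim name "model-cache" and mount path "/mnt/models" are
// hypothetical; a separate loader process is assumed to populate the volume.
package storage

import (
	corev1 "k8s.io/api/core/v1"
)

// withSharedModelCache attaches a pre-populated ReadOnlyMany PVC to the given
// server container so it reads models locally instead of re-downloading from
// S3/GCS. It returns the volume to add to the PodSpec and the updated container.
func withSharedModelCache(server corev1.Container) (corev1.Volume, corev1.Container) {
	vol := corev1.Volume{
		Name: "model-cache",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: "model-cache", // hypothetical claim populated by the loader process
				ReadOnly:  true,
			},
		},
	}
	server.VolumeMounts = append(server.VolumeMounts, corev1.VolumeMount{
		Name:      "model-cache",
		MountPath: "/mnt/models",
		ReadOnly:  true,
	})
	return vol, server
}
```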
Looks like there is a complication with syncing from S3/GCS model storage: TensorRT does polling and then adds/removes model versions (https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_repository.html#modifying-the-model-repository).
Well, if we were to copy, then that feature would be disabled since the server would only poll the local, unchanging copy. For production this seems like the right thing to do; for interactive use this is probably okay.
In general we need to think about consistency of experience across different model types vs. exposing the native behaviors of servers.
Thoughts?
Definitely agreed, we can disable this at this stage. However, there is a real prod use case for continuous training and active learning where this feature can be useful and manual canary is not an option; the rollout risk is managed somewhere else in the train-serve loop.
I was thinking of downloading in an init container and exposing a mount to the TensorRT container; however, as noted in #129 we need to figure out how to enhance or work around Knative here...
Why do you need to expose a mount? If you download in the init container, the pod shares a disk, right?
They may share the disk on the node, if that is what you mean, but each container has its own filesystem, so you would still need mount support.
Closed via #148.
/kind feature
Describe the solution you'd like
The TensorRT spec can only be used with models in GCS. It would be nice to allow models in S3 or Azure blobs, or models that can be mounted.
One possibility is to create an init container to download the models and expose them to the server as a mount. This would allow us to easily add support for a range of sources in a way that would work for all servers...?
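A minimal sketch of that init-container idea, with the caveat that everything here is a placeholder rather than the project's actual implementation (the `storage-initializer` name, the `amazon/aws-cli` image, the `/mnt/models` path, and the `trtserver --model-store` flag are all illustrative):

```go
// Sketch only, not the project's actual implementation. The init container
// name, the amazon/aws-cli image, the /mnt/models path, and the trtserver
// --model-store flag are placeholders for illustration.
package reconciler

import corev1 "k8s.io/api/core/v1"

// podSpecWithInitDownloader builds a PodSpec where an init container copies
// the model from an S3 URI into an emptyDir volume, and the TensorRT server
// container serves from that same volume.
func podSpecWithInitDownloader(serverImage, srcURI string) corev1.PodSpec {
	modelVol := corev1.Volume{
		Name:         "model-dir",
		VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
	}
	mount := corev1.VolumeMount{Name: "model-dir", MountPath: "/mnt/models"}

	return corev1.PodSpec{
		Volumes: []corev1.Volume{modelVol},
		InitContainers: []corev1.Container{{
			Name:         "storage-initializer", // hypothetical name
			Image:        "amazon/aws-cli",      // placeholder downloader image
			Args:         []string{"s3", "cp", "--recursive", srcURI, "/mnt/models"},
			VolumeMounts: []corev1.VolumeMount{mount},
		}},
		Containers: []corev1.Container{{
			Name:         "tensorrt-server",
			Image:        serverImage,
			Args:         []string{"trtserver", "--model-store=/mnt/models"}, // flag name may differ
			VolumeMounts: []corev1.VolumeMount{mount},
		}},
	}
}
```

Because both containers mount the same emptyDir volume, the server sees exactly the files the init container downloaded, which is the mount support discussed in the comments above.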
Knative supports PodSpec so this is possible, but it will require us to modify the frameworkhandler interface method CreateModelServingContainer to become CreateModelServingPod. The user interface (e.g. CustomSpec) will remain unchanged (i.e. this doesn't mean we allow users to give us pods).
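To make the proposed interface change concrete, here is a hedged sketch; the real frameworkhandler interface and its method signatures may differ from these:

```go
// Hedged sketch of the interface change being discussed; the real
// frameworkhandler interface and its signatures may differ.
package frameworks

import corev1 "k8s.io/api/core/v1"

// Today (roughly): each framework handler only contributes its serving container.
type FrameworkHandlerContainer interface {
	CreateModelServingContainer(modelName string) *corev1.Container
}

// Proposed: the handler contributes the whole PodSpec, so it can also add an
// init container that downloads from S3/Blob and the volume it mounts, while
// the user-facing spec (e.g. CustomSpec) stays container-level.
type FrameworkHandlerPod interface {
	CreateModelServingPod(modelName string) *corev1.PodSpec
}
```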
Anything else you would like to add:
Related issue opened on TensorRTIS triton-inference-server/server#324