
Support downloading models from S3/Blob or mounts for TensorRT #137

Closed
rakelkar opened this issue Jun 2, 2019 · 9 comments

Comments

@rakelkar
Contributor

rakelkar commented Jun 2, 2019

/kind feature

Describe the solution you'd like
The TensorRT spec can currently only be used with models stored in GCS. It would be nice to also support models in S3 or Azure Blob Storage, or models that can be mounted.

One possibility is to create an init container that downloads the model and exposes it to the server as a mount. This would let us easily add support for a range of sources in a way that works for all servers.
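A minimal sketch of that idea using the Kubernetes Go API types; the helper name, the storage-initializer image, and the /mnt/models path are all assumptions for illustration, not anything the project ships today:

```go
package storage

import (
	corev1 "k8s.io/api/core/v1"
)

// buildServingPodSpec wraps an existing model server container in a pod
// whose init container downloads the model (from S3, Azure Blob, GCS, ...)
// into an emptyDir volume that both containers mount.
func buildServingPodSpec(server corev1.Container, modelURI string) corev1.PodSpec {
	const volumeName = "model-store"
	const mountPath = "/mnt/models"

	// The server reads the downloaded model from the shared volume.
	server.VolumeMounts = append(server.VolumeMounts, corev1.VolumeMount{
		Name:      volumeName,
		MountPath: mountPath,
		ReadOnly:  true,
	})

	return corev1.PodSpec{
		InitContainers: []corev1.Container{{
			Name:  "storage-initializer",                    // hypothetical name
			Image: "example.com/storage-initializer:latest", // assumed image
			// Assumed convention: download <modelURI> into <mountPath>.
			Args: []string{modelURI, mountPath},
			VolumeMounts: []corev1.VolumeMount{{
				Name:      volumeName,
				MountPath: mountPath,
			}},
		}},
		Containers: []corev1.Container{server},
		Volumes: []corev1.Volume{{
			Name: volumeName,
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{},
			},
		}},
	}
}
```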

Knative supports PodSpec, so this is possible, but it will require us to modify the framework handler interface method CreateModelServingContainer to become CreateModelServingPod. The user-facing interface (e.g. CustomSpec) would remain unchanged (i.e. this doesn't mean we allow users to give us pods).
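Roughly the following; CreateModelServingContainer is the method named above, but the exact signatures here are assumptions for illustration:

```go
package frameworks

import corev1 "k8s.io/api/core/v1"

// Current shape: each framework handler only builds the server container.
type FrameworkHandler interface {
	CreateModelServingContainer(modelName string) *corev1.Container
}

// Proposed shape: handlers build the whole pod spec, so an init container
// and shared volume can be injected for non-GCS model sources.
type PodFrameworkHandler interface {
	CreateModelServingPod(modelName string) *corev1.PodSpec
}
```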

Anything else you would like to add:
Related issue opened on the TensorRT Inference Server: triton-inference-server/server#324

@rakelkar
Contributor Author

rakelkar commented Jun 2, 2019

#117

@ellistarn
Contributor

I've been thinking about the model storage problem. Re-downloading from S3/GCS is somewhat expensive. It would be nice to have a shared model cache that lives somewhere, maybe as a shared mountable volume.

We could then have some process for loading the model onto a local volume and point all model servers to read from there. We could encapsulate the TensorRT details on that volume. Thoughts?
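One way to sketch that cache: back the model volume with a shared, pre-populated PVC instead of a per-pod emptyDir. The claim name is made up, and it would need ReadOnlyMany-capable storage such as NFS or EFS:

```go
package storage

import corev1 "k8s.io/api/core/v1"

// sharedCacheVolume swaps the per-pod emptyDir for a shared, read-only PVC
// so many server pods can reuse a single cached copy of the model.
func sharedCacheVolume() corev1.Volume {
	return corev1.Volume{
		Name: "model-store",
		VolumeSource: corev1.VolumeSource{
			PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
				ClaimName: "model-cache", // assumed, pre-warmed claim
				ReadOnly:  true,
			},
		},
	}
}
```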

@yuzisun
Member

yuzisun commented Jun 3, 2019

It looks like there is a complication with syncing from S3/GCS model storage: TensorRT does polling and then adds/removes model versions (https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/model_repository.html#modifying-the-model-repository).
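For reference, the repository layout the server polls looks roughly like this (names illustrative); adding or removing a numeric version directory is what triggers a version load/unload:

```
models/
└── my_model/
    ├── config.pbtxt     # model configuration
    ├── 1/
    │   └── model.plan   # TensorRT engine, version 1
    └── 2/
        └── model.plan   # dropping 2/ in place loads version 2
```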

@rakelkar
Contributor Author

rakelkar commented Jun 3, 2019 via email

@yuzisun
Member

yuzisun commented Jun 3, 2019

Definitely agreed; we can disable this for now. However, there is a real production use case for continuous training and active learning where this feature is useful and manual canarying is not an option, because the rollout risk is managed elsewhere in the train-serve loop.

@rakelkar
Contributor Author

rakelkar commented Jun 5, 2019

I was thinking of downloading in an init container and exposing a mount to the TensorRT container; however, as noted in #129, we need to figure out how to enhance or work around Knative here...

@ellistarn
Contributor

Why do you need to expose a mount? If you download in the init container, the pod shares a disk, right?

@rakelkar
Contributor Author

> pod shares a disk, right

They may share the disk on the node, if that's what you mean, but each container has its own filesystem, so you would still need volume mount support.

@rakelkar
Contributor Author

rakelkar commented Jul 2, 2019

Closed via #148

@rakelkar rakelkar closed this as completed Jul 2, 2019
rafvasq pushed a commit to rafvasq/kserve that referenced this issue Jul 21, 2023
* fix: Fix isvc inference fvt failure

Occasionally during FVT runs, InferenceService inference requests would fail
if run after the TLS predictor inference tests because the port-forward
was not reset. This commit disconnects in the preparation steps for both
Predictors and Isvc tests to ensure that a new connection is
established.

Signed-off-by: Paul Van Eck <pvaneck@us.ibm.com>

* update comment

Signed-off-by: Paul Van Eck <pvaneck@us.ibm.com>