diff --git a/docs/contents.rst b/docs/contents.rst
index a6f23d8d459..5d33041f2f7 100644
--- a/docs/contents.rst
+++ b/docs/contents.rst
@@ -16,6 +16,7 @@
    model_zoo
    request_envelopes
    server
+   mps
    snapshot
    sphinx/requirements
    torchserve_on_win_native
diff --git a/docs/index.md b/docs/index.md
index 824d7ab259b..523e672b38a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -49,3 +49,4 @@ TorchServe is a performant, flexible and easy to use tool for serving PyTorch ea
 * [TorchServe on Kubernetes](https://github.com/pytorch/serve/blob/master/kubernetes/README.md#torchserve-on-kubernetes) - Demonstrates a TorchServe deployment in Kubernetes using a Helm chart, supported on both Azure Kubernetes Service and Google Kubernetes Engine
 * [mlflow-torchserve](https://github.com/mlflow/mlflow-torchserve) - Deploy mlflow pipeline models into TorchServe
 * [Kubeflow pipelines](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/pytorch-samples) - Kubeflow pipelines and Google Vertex AI Managed pipelines
+* [NVIDIA MPS](mps.md) - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU
diff --git a/docs/mps.md b/docs/mps.md
index 70cd1f93d21..4b10048435b 100644
--- a/docs/mps.md
+++ b/docs/mps.md
@@ -1,4 +1,4 @@
-# Enabling NVIDIA MPS in TorchServe
+# Running TorchServe with NVIDIA MPS
 In order to deploy ML models, TorchServe spins up each worker in a separate process, thus isolating each worker from the others. Each process creates its own CUDA context to execute its kernels and access the allocated memory.
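The new `docs/mps.md` builds on the observation in that last context line: because each worker process owns its own CUDA context, kernels from different workers cannot overlap on the GPU, and MPS addresses this by funneling all processes through one shared context. As a minimal sketch (not part of this diff), enabling MPS on the host before starting TorchServe typically looks like the following; it assumes a Linux host, and GPU index `0` is illustrative:

```bash
# Put the GPU into EXCLUSIVE_PROCESS compute mode so that only the MPS
# daemon creates a context on it (recommended by NVIDIA for MPS).
sudo nvidia-smi -i 0 -c 3

# Start the MPS control daemon; CUDA processes launched afterwards
# (e.g. TorchServe workers) transparently share one GPU context.
nvidia-cuda-mps-control -d

# ... start torchserve with multiple workers pinned to GPU 0 ...

# Tear MPS down again once TorchServe is stopped.
echo quit | nvidia-cuda-mps-control
sudo nvidia-smi -i 0 -c 0   # restore DEFAULT compute mode
```

No TorchServe flags are involved; the daemon is picked up automatically by any CUDA process started while it is running, which is presumably why the doc change is documentation-only.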