From 616de44c9b3564d0d75ef6d7a9eaf1002a620fb3 Mon Sep 17 00:00:00 2001
From: Matthias Reso <13337103+mreso@users.noreply.github.com>
Date: Tue, 28 Mar 2023 13:22:40 -0700
Subject: [PATCH 1/2] Add NVIDIA MPS documentation to doc index

---
 docs/index.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/index.md b/docs/index.md
index 824d7ab259b..523e672b38a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -49,3 +49,4 @@ TorchServe is a performant, flexible and easy to use tool for serving PyTorch ea
 * [TorchServe on Kubernetes](https://github.com/pytorch/serve/blob/master/kubernetes/README.md#torchserve-on-kubernetes) - Demonstrates a Torchserve deployment in Kubernetes using Helm Chart supported in both Azure Kubernetes Service and Google Kubernetes service
 * [mlflow-torchserve](https://github.com/mlflow/mlflow-torchserve) - Deploy mlflow pipeline models into TorchServe
 * [Kubeflow pipelines](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/pytorch-samples) - Kubeflow pipelines and Google Vertex AI Managed pipelines
+* [NVIDIA MPS](mps.md) - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU

From be58cd50cddefe49787c9c6b0269b373b90aa863 Mon Sep 17 00:00:00 2001
From: Matthias Reso <13337103+mreso@users.noreply.github.com>
Date: Tue, 28 Mar 2023 13:43:22 -0700
Subject: [PATCH 2/2] Add mps doc to content + change title

---
 docs/contents.rst | 1 +
 docs/mps.md       | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/contents.rst b/docs/contents.rst
index a6f23d8d459..5d33041f2f7 100644
--- a/docs/contents.rst
+++ b/docs/contents.rst
@@ -16,6 +16,7 @@
 model_zoo
 request_envelopes
 server
+mps
 snapshot
 sphinx/requirements
 torchserve_on_win_native

diff --git a/docs/mps.md b/docs/mps.md
index 70cd1f93d21..4b10048435b 100644
--- a/docs/mps.md
+++ b/docs/mps.md
@@ -1,4 +1,4 @@
-# Enabling NVIDIA MPS in TorchServe
+# Running TorchServe with NVIDIA MPS
 In order to deploy ML models, TorchServe spins up each worker in a separate process, thus isolating each worker from the others. Each process creates its own CUDA context to execute its kernels and access the allocated memory.