From 616de44c9b3564d0d75ef6d7a9eaf1002a620fb3 Mon Sep 17 00:00:00 2001
From: Matthias Reso <13337103+mreso@users.noreply.github.com>
Date: Tue, 28 Mar 2023 13:22:40 -0700
Subject: [PATCH 1/2] Add NVIDIA MPS documentation to doc index

---
 docs/index.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/index.md b/docs/index.md
index 824d7ab259b..523e672b38a 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -49,3 +49,4 @@ TorchServe is a performant, flexible and easy to use tool for serving PyTorch ea
 * [TorchServe on Kubernetes](https://github.com/pytorch/serve/blob/master/kubernetes/README.md#torchserve-on-kubernetes) - Demonstrates a Torchserve deployment in Kubernetes using Helm Chart supported in both Azure Kubernetes Service and Google Kubernetes service
 * [mlflow-torchserve](https://github.com/mlflow/mlflow-torchserve) - Deploy mlflow pipeline models into TorchServe
 * [Kubeflow pipelines](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/pytorch-samples) - Kubeflow pipelines and Google Vertex AI Managed pipelines
+* [NVIDIA MPS](mps.md) - Use NVIDIA MPS to optimize multi-worker deployment on a single GPU

From be58cd50cddefe49787c9c6b0269b373b90aa863 Mon Sep 17 00:00:00 2001
From: Matthias Reso <13337103+mreso@users.noreply.github.com>
Date: Tue, 28 Mar 2023 13:43:22 -0700
Subject: [PATCH 2/2] Add mps doc to content + change title

---
 docs/contents.rst | 1 +
 docs/mps.md       | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/contents.rst b/docs/contents.rst
index a6f23d8d459..5d33041f2f7 100644
--- a/docs/contents.rst
+++ b/docs/contents.rst
@@ -16,6 +16,7 @@
 model_zoo
 request_envelopes
 server
+mps
 snapshot
 sphinx/requirements
 torchserve_on_win_native

diff --git a/docs/mps.md b/docs/mps.md
index 70cd1f93d21..4b10048435b 100644
--- a/docs/mps.md
+++ b/docs/mps.md
@@ -1,4 +1,4 @@
-# Enabling NVIDIA MPS in TorchServe
+# Running TorchServe with NVIDIA MPS
 In order to deploy ML models, TorchServe spins up each worker in a separate process, thus isolating each worker from the others. Each process creates its own CUDA context to execute its kernels and access the allocated memory.