Update instructions to build with nvidia cuda runtime image for ONNX #2435

Merged Jul 29, 2023 (19 commits)
Changes from 4 commits
16 changes: 14 additions & 2 deletions docker/README.md
@@ -34,6 +34,7 @@ Use `build_image.sh` script to build the docker images. The script builds the `p
|-h, --help|Show script help|
|-b, --branch_name|Specify a branch name to use. Default: master |
|-g, --gpu|Build image with GPU based ubuntu base image|
|-bi, --baseimage|Specify base docker image. Example: nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04|
|-bt, --buildtype|Which type of docker image to build. Can be one of : production, dev, codebuild|
|-t, --tag|Tag name for image. If not specified, script uses torchserve default tag names.|
|-cv, --cudaversion| Specify the cuda version to use. Supported values `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`, `cu118`. Default `cu117`|
@@ -52,10 +53,12 @@ Creates a docker image with publicly available `torchserve` and `torch-model-arc
./build_image.sh
```

- To create a GPU based image with cuda 10.2. Options are `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`
- To create a GPU based image with cuda 10.2. Options are `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`, `cu118`

- GPU images are built with NVIDIA CUDA base image. If you want to use ONNX, please specify the base image as shown in the next section.

```bash
./build_image.sh -g -cv cu102
./build_image.sh -g -cv cu117
```

- To create an image with a custom tag
@@ -64,6 +67,15 @@ Creates a docker image with publicly available `torchserve` and `torch-model-arc
./build_image.sh -t torchserve:1.0
```

**NVIDIA CUDA RUNTIME BASE IMAGE**

To make use of ONNX, we need to use [NVIDIA CUDA runtime](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA) as the base image.
Note that this will increase the size of your Docker image.

```bash
./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117
```
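When passing `-bi`, the `-cv` flag should match the CUDA version baked into the base image tag. As a sketch (assuming the standard `nvidia/cuda:<version>-...` tag format; the variable names are illustrative, not part of `build_image.sh`), the matching `-cv` value can be derived from the tag:

```bash
# Derive the -cv flag (e.g. cu117) from a CUDA base image tag.
# Assumes the conventional nvidia/cuda tag layout: <major>.<minor>.<patch>-...
IMAGE="nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04"

tag="${IMAGE#*:}"                 # 11.7.0-cudnn8-runtime-ubuntu20.04
version="${tag%%-*}"              # 11.7.0
major_minor="${version%.*}"       # 11.7
CUDA_FLAG="cu$(echo "$major_minor" | tr -d '.')"

echo "$CUDA_FLAG"                 # cu117
```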

**DEVELOPER ENVIRONMENT IMAGES**

Creates a docker image with `torchserve` and `torch-model-archiver` installed from source.
14 changes: 14 additions & 0 deletions docker/build_image.sh
@@ -7,6 +7,8 @@ BRANCH_NAME="master"
DOCKER_TAG="pytorch/torchserve:latest-cpu"
BUILD_TYPE="production"
BASE_IMAGE="ubuntu:20.04"
USER_BASE_IMAGE="ubuntu:20.04"
UPDATE_BASE_IMAGE=false
USE_CUSTOM_TAG=false
CUDA_VERSION=""
USE_LOCAL_SERVE_FOLDER=false
@@ -21,6 +23,7 @@ do
echo "-h, --help show brief help"
echo "-b, --branch_name=BRANCH_NAME specify a branch_name to use"
echo "-g, --gpu specify to use gpu"
echo "-bi, --baseimage specify base docker image. Example: nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 "
echo "-bt, --buildtype specify to created image for codebuild. Possible values: production, dev, codebuild."
echo "-cv, --cudaversion specify to cuda version to use"
echo "-t, --tag specify tag name for docker image"
@@ -47,6 +50,12 @@ do
CUDA_VERSION="cu117"
shift
;;
-bi|--baseimage)
USER_BASE_IMAGE="$2"
UPDATE_BASE_IMAGE=true
shift
shift
;;
-bt|--buildtype)
BUILD_TYPE="$2"
shift
@@ -135,6 +144,11 @@ then
DOCKER_TAG=${CUSTOM_TAG}
fi

if [ "$UPDATE_BASE_IMAGE" = true ]
then
BASE_IMAGE=${USER_BASE_IMAGE}
fi

if [ "${BUILD_TYPE}" == "production" ]
then
DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE="${BASE_IMAGE}" --build-arg CUDA_VERSION="${CUDA_VERSION}" --build-arg PYTHON_VERSION="${PYTHON_VERSION}" -t "${DOCKER_TAG}" .
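The `-bi` handling added to `build_image.sh` follows the usual two-token flag pattern: the flag and its value are consumed together, and the default base image stands unless the user overrides it. A minimal self-contained sketch of that pattern (the hard-coded argument list is for illustration only):

```bash
# Sketch of the -bi/--baseimage handling: a flag that takes a value
# consumes two positional parameters, hence the double shift.
BASE_IMAGE="ubuntu:20.04"
UPDATE_BASE_IMAGE=false

set -- -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04   # simulated CLI args

while [ $# -gt 0 ]; do
  case "$1" in
    -bi|--baseimage)
      USER_BASE_IMAGE="$2"
      UPDATE_BASE_IMAGE=true
      shift 2
      ;;
    *)
      shift
      ;;
  esac
done

# The default is only replaced when the flag was actually passed.
if [ "$UPDATE_BASE_IMAGE" = true ]; then
  BASE_IMAGE="$USER_BASE_IMAGE"
fi

echo "$BASE_IMAGE"
```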
2 changes: 2 additions & 0 deletions docs/performance_guide.md
@@ -16,6 +16,8 @@ At a high level what TorchServe allows you to do is
2. Load those weights from `base_handler.py` using `ort_session = ort.InferenceSession(self.model_pt_path, providers=providers, sess_options=sess_options)` which supports reasonable defaults for both CPU and GPU inference
3. Allows you to define custom pre- and post-processing functions in a custom handler, to pass in data in the format your ONNX model expects

To use ONNX with GPU on TorchServe Docker, we need to build an image with [NVIDIA CUDA runtime](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA) as the base image as shown [here](https://github.com/pytorch/serve/blob/master/docker/README.md#create-torchserve-docker-image)
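The provider list passed to `ort.InferenceSession` is what determines whether the session runs on GPU, which is why the CUDA runtime libraries must be present in the image. A hedged sketch (not TorchServe's actual handler code; the function name is illustrative) of the provider-selection logic:

```python
# Illustrative provider selection for an ONNX Runtime session.
# CUDAExecutionProvider only works if the CUDA runtime libraries are in the
# image -- hence the nvidia/cuda base image requirement above.
def pick_providers(gpu_available: bool) -> list[str]:
    if gpu_available:
        # Keep CPU as a fallback after the CUDA provider.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

# The resulting list would be passed to ONNX Runtime, e.g.:
# ort_session = ort.InferenceSession(model_path, providers=pick_providers(True))
print(pick_providers(False))
```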

<h4>TensorRT</h4>

TorchServe also supports models optimized via TensorRT. To leverage the TensorRT runtime you can convert your model by [following these instructions](https://github.com/pytorch/TensorRT) and once you're done you'll have serialized weights which you can load with [`torch.jit.load()`](https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html#getting-started-with-python-api).
4 changes: 4 additions & 0 deletions examples/large_models/deepspeed/Readme.md
@@ -44,3 +44,7 @@ torchserve --start --ncs --model-store model_store --models opt.tar.gz
```bash
curl "http://localhost:8080/predictions/opt" -T sample_text.txt
```

### Running using TorchServe Docker Image

To use DeepSpeed with GPU on TorchServe Docker, we need to build an image with [NVIDIA CUDA dev](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA) as the base image, as shown [here](https://github.com/pytorch/serve/blob/master/docker/README.md#create-torchserve-docker-image).
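Unlike ONNX, which only needs the CUDA *runtime* libraries, DeepSpeed compiles CUDA extensions and therefore needs a *devel* base image. A sketch of the distinction (the exact image tag is an assumption, not taken from this PR):

```bash
# DeepSpeed JIT-compiles CUDA kernels, so a -devel image (with nvcc and
# headers) is assumed here rather than the -runtime image used for ONNX.
BASE_IMAGE="nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04"

# ./build_image.sh -bi "$BASE_IMAGE" -g -cv cu117   # run from serve/docker

case "$BASE_IMAGE" in
  *devel*) echo "devel base image selected" ;;
  *)       echo "warning: DeepSpeed kernel compilation needs a devel image" ;;
esac
```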