Update instructions to build with nvidia cuda runtime image for ONNX (#2435)

* Update instructions to build with nvidia cuda runtime image for docker
* updated deepspeed documentation
* updated deepspeed documentation
* updated deepspeed documentation
* added example command
* Lint failure
* changed variable name
* Exit if -bi and -g are specified

---------

Co-authored-by: Mark Saroufim <marksaroufim@fb.com>
pytorch · Jul 29, 2023 · 35ef00f
1 parent e2cd91b
commit 35ef00f
Showing 4 changed files with 31 additions and 2 deletions.
diff --git a/docker/README.md b/docker/README.md
@@ -34,6 +34,7 @@ Use `build_image.sh` script to build the docker images. The script builds the `p
 |-h, --help|Show script help|
 |-b, --branch_name|Specify a branch name to use. Default: master |
 |-g, --gpu|Build image with GPU based ubuntu base image|
+|-bi, --baseimage|Specify the base Docker image. Example: nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04|
 |-bt, --buildtype|Which type of docker image to build. Can be one of : production, dev, ci, codebuild|
 |-t, --tag|Tag name for image. If not specified, script uses torchserve default tag names.|
 |-cv, --cudaversion|Specify the CUDA version to use. Supported values: `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`, `cu118`. Default: `cu117`|
@@ -55,8 +56,10 @@ Creates a docker image with publicly available `torchserve` and `torch-model-arc
 
  - To create a GPU based image with cuda 10.2. Options are `cu92`, `cu101`, `cu102`, `cu111`, `cu113`, `cu116`, `cu117`, `cu118`
 
+    - GPU images are built with the NVIDIA CUDA base image. To use ONNX, specify the base image as shown in the next section.
+
   ```bash
-  ./build_image.sh -g -cv cu102
+  ./build_image.sh -g -cv cu117
   ```
 
  - To create an image with a custom tag
@@ -65,6 +68,15 @@ Creates a docker image with publicly available `torchserve` and `torch-model-arc
 ./build_image.sh -t torchserve:1.0
 ```
 
+**NVIDIA CUDA RUNTIME BASE IMAGE**
+
+To use ONNX, use the [NVIDIA CUDA runtime](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA) image as the base image.
+This will increase the size of your Docker image.
+
+```bash
+  ./build_image.sh -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g -cv cu117
+  ```
+
 **DEVELOPER ENVIRONMENT IMAGES**
 
 Creates a docker image with `torchserve` and `torch-model-archiver` installed from source.
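
The `-cv` values listed in the README hunk above can be checked before a build is started. The following is a minimal sketch, not part of `build_image.sh`: the helper name `is_supported_cuda_version` and the variable `SUPPORTED_CUDA_VERSIONS` are hypothetical, and the list simply mirrors the values documented in the README.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of build_image.sh).
# The list mirrors the -cv values documented in docker/README.md.
SUPPORTED_CUDA_VERSIONS="cu92 cu101 cu102 cu111 cu113 cu116 cu117 cu118"

# Returns 0 if the given value is in the documented list, 1 otherwise.
is_supported_cuda_version() {
  candidate="$1"
  for version in $SUPPORTED_CUDA_VERSIONS; do
    if [ "$candidate" = "$version" ]; then
      return 0
    fi
  done
  return 1
}

# Usage: guard a build invocation with the check.
if is_supported_cuda_version "cu117"; then
  echo "cu117 is a documented -cv value"
fi
```

A wrapper script could call this check before invoking `./build_image.sh`, so that a mistyped version fails fast instead of partway through a Docker build.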

diff --git a/docker/build_image.sh b/docker/build_image.sh
@@ -7,6 +7,7 @@ BRANCH_NAME="master"
 DOCKER_TAG="pytorch/torchserve:latest-cpu"
 BUILD_TYPE="production"
 BASE_IMAGE="ubuntu:20.04"
+UPDATE_BASE_IMAGE=false
 USE_CUSTOM_TAG=false
 CUDA_VERSION=""
 USE_LOCAL_SERVE_FOLDER=false
@@ -22,6 +23,7 @@ do
           echo "-h, --help  show brief help"
           echo "-b, --branch_name=BRANCH_NAME specify a branch_name to use"
           echo "-g, --gpu specify to use gpu"
+          echo "-bi, --baseimage specify base docker image. Example: nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 "
           echo "-bt, --buildtype specify to created image for codebuild. Possible values: production, dev, codebuild."
           echo "-cv, --cudaversion specify to cuda version to use"
           echo "-t, --tag specify tag name for docker image"
@@ -49,6 +51,12 @@ do
           CUDA_VERSION="cu117"
           shift
           ;;
+        -bi|--baseimage)
+          BASE_IMAGE="$2"
+          UPDATE_BASE_IMAGE=true
+          shift
+          shift
+          ;;
         -bt|--buildtype)
           BUILD_TYPE="$2"
           shift
@@ -141,6 +149,12 @@ then
   DOCKER_TAG=${CUSTOM_TAG}
 fi
 
+if [[ $UPDATE_BASE_IMAGE == true && $MACHINE == "gpu" ]];
+then
+  echo "Incompatible options: -bi doesn't work with -g option"
+  exit 1
+fi
+
 if [ "${BUILD_TYPE}" == "production" ]
 then
   DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE="${BASE_IMAGE}" --build-arg CUDA_VERSION="${CUDA_VERSION}"  --build-arg PYTHON_VERSION="${PYTHON_VERSION}" --build-arg BUILD_NIGHTLY="${BUILD_NIGHTLY}" -t "${DOCKER_TAG}" --target production-image  .
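
The option handling added in this hunk can be exercised in isolation. Below is a simplified sketch that assumes `MACHINE` defaults to `cpu` and is set to `gpu` by `-g` (the diff only shows the comparison, not the assignment). The function name `parse_build_args` and the missing-value guard are hypothetical additions; the real script handles more options and sets further variables for `-g`. With the check as written in this diff, any command line that passes both `-bi` and `-g` exits with an error, which includes the README example added in the same commit.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified re-creation of the -bi / -g handling in this diff.
parse_build_args() {
  BASE_IMAGE="ubuntu:20.04"   # default taken from build_image.sh
  UPDATE_BASE_IMAGE=false
  MACHINE="cpu"               # assumption: default is cpu

  while [ "$#" -gt 0 ]; do
    case "$1" in
      -g|--gpu)
        MACHINE="gpu"         # the real script also sets CUDA defaults here
        shift
        ;;
      -bi|--baseimage)
        # Guard added for this sketch: -bi needs a value.
        if [ "$#" -lt 2 ]; then
          echo "-bi requires an image name"
          return 2
        fi
        BASE_IMAGE="$2"
        UPDATE_BASE_IMAGE=true
        shift
        shift
        ;;
      *)
        shift                 # ignore options this sketch does not model
        ;;
    esac
  done

  # Same incompatibility check as in the diff, written with POSIX tests.
  if [ "$UPDATE_BASE_IMAGE" = true ] && [ "$MACHINE" = "gpu" ]; then
    echo "Incompatible options: -bi doesn't work with -g option"
    return 1
  fi
  return 0
}

# Usage: combining both flags trips the check.
parse_build_args -bi nvidia/cuda:11.7.0-cudnn8-runtime-ubuntu20.04 -g || echo "build aborted"
```

Returning a status from a function, instead of calling `exit 1` as the script does, keeps the sketch safe to source and test from an interactive shell.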

diff --git a/docs/performance_guide.md b/docs/performance_guide.md
@@ -16,6 +16,8 @@ At a high level what TorchServe allows you to do is
 2. Load those weights from `base_handler.py` using `ort_session = ort.InferenceSession(self.model_pt_path, providers=providers, sess_options=sess_options)` which supports reasonable defaults for both CPU and GPU inference
 3. Allow you define custom pre and post processing functions to pass in data in the format your onnx model expects with a custom handler
 
+To use ONNX with GPU on TorchServe Docker, build an image with [NVIDIA CUDA runtime](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA) as the base image, as shown [here](https://github.com/pytorch/serve/blob/master/docker/README.md#create-torchserve-docker-image).
+
  <h4>TensorRT</h4>
 
 TorchServe also supports models optimized via TensorRT. To leverage the TensorRT runtime you can convert your model by [following these instructions](https://github.com/pytorch/TensorRT) and once you're done you'll have serialized weights which you can load with [`torch.jit.load()`](https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html#getting-started-with-python-api).
@@ -77,7 +79,7 @@ You can find more information on TorchServe benchmarking [here](https://github.c
 
 TorchServe has native support for the PyTorch profiler which will help you find performance bottlenecks in your code.
 
-If you created a custom `handle` or `initialize` method overwriting the BaseHandler, you must define the `self.manifest` attribute to be able to run `_infer_with_profiler`.  
+If you created a custom `handle` or `initialize` method overwriting the BaseHandler, you must define the `self.manifest` attribute to be able to run `_infer_with_profiler`.
 
 ```
 export ENABLE_TORCH_PROFILER=TRUE

diff --git a/ts_scripts/spellcheck_conf/wordlist.txt b/ts_scripts/spellcheck_conf/wordlist.txt
@@ -1065,5 +1065,6 @@ ActionSLAM
 statins
 ci
 chatGPT
+baseimage
 cuDNN
 Xformer