The recommended way to set up the environment is to create a Python virtual environment:
virtualenv -p python3.10 .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
First, download the model from Torch Hub and save it as TorchScript and ONNX:
python3 utils/export.py
Next, start a Docker container and convert the ONNX model to TensorRT:
docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt:22.04-py3
bash utils/convert_to_tensorrt.sh
exit
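utils/convert_to_tensorrt.sh is not shown here; inside the TensorRT container the conversion typically boils down to a trtexec call along these lines (the file names and the FP16 flag are assumptions, not the script's actual contents):

```sh
# Build a TensorRT engine from the exported ONNX model (hypothetical paths)
trtexec --onnx=model.onnx \
        --saveEngine=model.plan \
        --fp16   # optional: build with FP16 precision
```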
Move the exported models into the model repository:
bash utils/move_models.sh
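utils/move_models.sh is not shown, but Triton expects each model in its own versioned subdirectory of the repository. Assuming the model names used later (model_torchscript, model_onnx) plus a TensorRT model, the layout would look roughly like this (the exact names are inferred, not taken from the script):

```
model_repository/
├── model_torchscript/
│   ├── config.pbtxt
│   └── 1/
│       └── model.pt
├── model_onnx/
│   ├── config.pbtxt
│   └── 1/
│       └── model.onnx
└── model_trt/
    ├── config.pbtxt
    └── 1/
        └── model.plan
```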
Start a Docker container running Triton server, with the model_repository directory
mounted:
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd)/model_repository:/models --shm-size 1024m --net=host nvcr.io/nvidia/tritonserver:22.04-py3
Next, install dependencies and run Triton:
pip install pillow torch transformers
tritonserver --model-repository=/models
With Triton running in the other container, run the command below to enter the SDK
container and run perf_analyzer:
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):/workspace --net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
perf_analyzer -m model_torchscript -b 1 --concurrency-range 1:4
To use model-analyzer, shut down the previously started Triton server. Then enter a
Docker container and run an analysis. All model-analyzer parameters are described at
https://github.com/triton-inference-server/model_analyzer/blob/main/docs/config.md.
docker run -it --gpus all -v /var/run/docker.sock:/var/run/docker.sock -v $(pwd):$(pwd) --shm-size 1024m --net=host nvcr.io/nvidia/tritonserver:22.04-py3-sdk
cd /home/...  # change to the same path as on your host filesystem
model-analyzer profile --model-repository $(pwd)/model_repository --profile-models model_onnx --triton-launch-mode=docker --output-model-repository-path $(pwd)/output/ -f perf.yaml --override-output-model-repository
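The perf.yaml file passed with -f is not included above; a minimal sketch of such a config, assuming the keys documented in the model-analyzer config reference (the values are illustrative, not the benchmark's actual settings):

```yaml
# Hypothetical model-analyzer config; tune values to your hardware
profile_models:
  - model_onnx
batch_sizes: 1
concurrency: 1,2,4
run_config_search_max_instance_count: 2
```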
To generate a report, follow the instructions from model-analyzer.
To reproduce the benchmark reported in the blog post, run the run_benchmark.sh script
in the utils directory, using the same configuration as in the perf_analyzer section.