end2end.engine to Triton #465
@lvhan028 Thanks for your help, really appreciate it! (As for #460, I don't think this is related, since that issue only occurs when I attempt to use a dynamic size mmdeploy config, whereas in this example I use a fixed size, which successfully produces an engine file. However, you could reproduce that issue with all of the steps below by swapping in a dynamic config.)

Here are all of the steps to reproduce what I've done (the Triton failure log is at the bottom of this post):

1. MMDeploy docker build

This is nearly the same as the base MMDeploy GPU image, except that I've updated the versions of tensorrt, torch, onnx, mmcv, and pplcv and added an mmdetection install at the end.
FROM nvcr.io/nvidia/tensorrt:22.04-py3
ARG CUDA=11.3
ARG PYTHON_VERSION=3.8
ARG TORCH_VERSION=1.11.0
ARG TORCHVISION_VERSION=0.12.0
ARG ONNXRUNTIME_VERSION=1.11.1
ARG MMCV_VERSION=1.5.0
ARG PPLCV_VERSION=0.6.3
ENV FORCE_CUDA="1"
ENV DEBIAN_FRONTEND=noninteractive
### change the system source for installing libs
ARG USE_SRC_INSIDE=false
RUN if [ ${USE_SRC_INSIDE} == true ] ; \
then \
sed -i s/archive.ubuntu.com/mirrors.aliyun.com/g /etc/apt/sources.list ; \
sed -i s/security.ubuntu.com/mirrors.aliyun.com/g /etc/apt/sources.list ; \
echo "Use aliyun source for installing libs" ; \
else \
echo "Keep the download source unchanged" ; \
fi
### update apt and install libs
RUN apt-get update &&\
apt-get install -y vim libsm6 libxext6 libxrender-dev libgl1-mesa-glx git wget libssl-dev libopencv-dev libspdlog-dev --no-install-recommends &&\
rm -rf /var/lib/apt/lists/*
RUN curl -fsSL -v -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install -y python=${PYTHON_VERSION} conda-build pyyaml numpy ipython cython typing typing_extensions mkl mkl-include ninja && \
/opt/conda/bin/conda clean -ya
### pytorch
RUN /opt/conda/bin/conda install pytorch==${TORCH_VERSION} torchvision==${TORCHVISION_VERSION} cudatoolkit=${CUDA} -c pytorch
ENV PATH /opt/conda/bin:$PATH
### install mmcv-full
RUN /opt/conda/bin/pip install mmcv-full==${MMCV_VERSION} -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${TORCH_VERSION}/index.html
WORKDIR /root/workspace
### get onnxruntime
RUN wget https://github.com/microsoft/onnxruntime/releases/download/v${ONNXRUNTIME_VERSION}/onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}.tgz \
&& tar -zxvf onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}.tgz &&\
pip install onnxruntime-gpu==${ONNXRUNTIME_VERSION}
### cp trt from pip to conda
RUN cp -r /usr/local/lib/python${PYTHON_VERSION}/dist-packages/tensorrt* /opt/conda/lib/python${PYTHON_VERSION}/site-packages/
### install mmdeploy
ENV ONNXRUNTIME_DIR=/root/workspace/onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}
ENV TENSORRT_DIR=/workspace/tensorrt
ARG VERSION
RUN git clone https://github.com/open-mmlab/mmdeploy &&\
cd mmdeploy &&\
if [ -z ${VERSION} ] ; then echo "No MMDeploy version passed in, building on master" ; else git checkout tags/v${VERSION} -b tag_v${VERSION} ; fi &&\
git submodule update --init --recursive &&\
mkdir -p build &&\
cd build &&\
cmake -DMMDEPLOY_TARGET_BACKENDS="ort;trt" .. &&\
make -j$(nproc) &&\
cd .. &&\
pip install -e .
### build sdk
RUN git clone https://github.com/openppl-public/ppl.cv.git &&\
cd ppl.cv &&\
git checkout tags/v${PPLCV_VERSION} -b v${PPLCV_VERSION} &&\
./build.sh cuda
ENV BACKUP_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-11.6/compat/lib.real/:$LD_LIBRARY_PATH
RUN cd /root/workspace/mmdeploy &&\
rm -rf build/CM* build/cmake-install.cmake build/Makefile build/csrc &&\
mkdir -p build && cd build &&\
cmake .. \
-DMMDEPLOY_BUILD_SDK=ON \
-DCMAKE_CXX_COMPILER=g++ \
-Dpplcv_DIR=/root/workspace/ppl.cv/cuda-build/install/lib/cmake/ppl \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DONNXRUNTIME_DIR=${ONNXRUNTIME_DIR} \
-DMMDEPLOY_BUILD_SDK_PYTHON_API=ON \
-DMMDEPLOY_TARGET_DEVICES="cuda;cpu" \
-DMMDEPLOY_TARGET_BACKENDS="ort;trt" \
-DMMDEPLOY_CODEBASES=all &&\
make -j$(nproc) && make install &&\
cd install/example && mkdir -p build && cd build &&\
cmake -DMMDeploy_DIR=/root/workspace/mmdeploy/build/install/lib/cmake/MMDeploy .. &&\
make -j$(nproc) && export SPDLOG_LEVEL=warn &&\
if [ -z ${VERSION} ] ; then echo "Built MMDeploy master for GPU devices successfully!" ; else echo "Built MMDeploy version v${VERSION} for GPU devices successfully!" ; fi
ENV LD_LIBRARY_PATH="/root/workspace/mmdeploy/build/lib:${BACKUP_LD_LIBRARY_PATH}"
# Add mmdetection
# install mmcv and mmdetection
RUN cd / && \
git clone -b v2.24.1 https://github.com/open-mmlab/mmdetection.git && \
cd mmdetection && \
pip install -r requirements/build.txt && \
pip install -v -e .
Build command I use:

nvidia-docker build -f Dockerfile.mmdeploy -t mmdeploy:latest .

2. Engine file creation

# create directory and download checkpoint into it
mkdir volume_share
wget https://download.openmmlab.com/mmdetection/v2.0/yolox/yolox_s_8x8_300e_coco/yolox_s_8x8_300e_coco_20211121_095711-4592a793.pth -O volume_share/checkpoint.pth
# run conversion command with checkpoint dir mounted
nvidia-docker run -it -v $(pwd)/volume_share:/volume_share mmdeploy:latest python /root/workspace/mmdeploy/tools/deploy.py \
/root/workspace/mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_static-800x1344.py \
/mmdetection/configs/yolox/yolox_s_8x8_300e_coco.py \
/volume_share/checkpoint.pth \
/mmdetection/demo/demo.jpg \
--work-dir /volume_share \
--show \
--device cuda:0

For me this results in an end2end.engine file in the work directory.

3. Triton server build

I'd like to use Triton with Amazon SageMaker, so I've added the provided serve script:

FROM nvcr.io/nvidia/tritonserver:22.04-py3
# Get /bin/serve for SageMaker
RUN wget https://raw.githubusercontent.com/triton-inference-server/server/main/docker/sagemaker/serve -P /bin/ && \
chmod +x /bin/serve

And here's my build command:

nvidia-docker build -f Dockerfile.serve -t triton:latest .

4. Package model

I create a model directory using the following:
And run these commands to create a model directory:
Which results in this directory structure for my model contents:
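For reference, a typical layout for serving a TensorRT engine from Triton is sketched below; the model name (yolox), the output names, data types, and dims are illustrative assumptions and should be checked against the actual engine:

# sketch: package end2end.engine as a Triton "tensorrt_plan" model
mkdir -p model_repository/yolox/1
cp volume_share/end2end.engine model_repository/yolox/1/model.plan
cat > model_repository/yolox/config.pbtxt <<'EOF'
name: "yolox"
platform: "tensorrt_plan"
# static 1x3x800x1344 engine, so the batch dim is part of dims
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 800, 1344 ]
  }
]
output [
  # output names and dims are assumptions; inspect the engine to confirm
  {
    name: "dets"
    data_type: TYPE_FP32
    dims: [ 1, 100, 5 ]
  },
  {
    name: "labels"
    data_type: TYPE_INT32
    dims: [ 1, 100 ]
  }
]
EOF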
5. Testing serving locally

I test serving by running this command, which simulates how it would be run on SageMaker:
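Roughly, that looks like the sketch below, assuming the model repository from step 4; the SageMaker serve script reads models from /opt/ml/model and binds the SageMaker REST endpoint on port 8080:

# sketch: mount the model repository where the serve script expects it
nvidia-docker run --rm -p 8080:8080 \
  -v $(pwd)/model_repository:/opt/ml/model \
  triton:latest serve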
Result

After going through these steps, here's the output error I get:
So it looks like the primary error is:
Maybe I need to somehow identify and then copy over some shared libraries from my mmdeploy docker image into my triton docker image, and then prepend them to the library path?

If there's any way I can help to make reproducing this issue faster for you, please just let me know. Thanks!

Edit: It looks like I can launch the server successfully by adding the following:
I guess I can do the first step with a multi-stage build like:

FROM mmdeploy:latest AS mmdeploy
FROM nvcr.io/nvidia/tritonserver:22.04-py3
COPY --from=mmdeploy /root/workspace/mmdeploy/build/lib/libmmdeploy_tensorrt_ops.so /opt/tritonserver/lib/

And I can do the second step with a …
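A minimal sketch of what that second step could be, assuming it means preloading the copied plugin library so Triton's TensorRT backend can deserialize engines that use the custom mmdeploy ops:

# assumption: make the dynamic loader pick up the mmdeploy TensorRT plugins
ENV LD_PRELOAD=/opt/tritonserver/lib/libmmdeploy_tensorrt_ops.so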
I got it. Let me find out if there is any improvement.
Hi @austinmw, sorry for the late reply.
Thanks for your response!
@austinmw I used your pipeline, but on Triton Server it only supports batch-size=1, and on SageMaker it cannot receive and send a response. Did you face the same problem?
@manhtd98 You need to set the batch dimension in the mmdeploy config file to the max batch size you want the engine to support.
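For example, the relevant portion of a dynamic-batch mmdeploy TensorRT config looks roughly like the sketch below; the max_shape batch dimension is what becomes the engine's maximum batch size (the opt shape and workspace size here are just assumptions):

# sketch of the TensorRT backend section of an mmdeploy deploy config with a
# dynamic batch dimension; opt shape and workspace size are assumptions
backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=True, max_workspace_size=1 << 31),
    model_inputs=[
        dict(
            input_shapes=dict(
                input=dict(
                    min_shape=[1, 3, 800, 1344],
                    opt_shape=[8, 3, 800, 1344],
                    max_shape=[128, 3, 800, 1344])))
    ])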
@austinmw I get an error when creating the TensorRT file. Here is the config file:

Error Code 4: Internal Error (input: kMAX dimensions in profile 0 are [128,3,800,1344] but input has static dimensions [1,3,800,1344].)
I didn't use an … Also, probably unrelated, but you may want to increase the …
I can convert YOLO, but for Cascade only batch size 1 succeeds. The error is: Error[10]: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[3450 + (Unnamed Layer* 3695) [Shuffle]...Reshape_1887 + Reshape_1889 + Unsqueeze_1890 + Reshape_1905]}.)
Hmm, sorry, I'm not sure about that error. Maybe it's a version compatibility issue with one of the libraries in the MMDeploy Dockerfile.
@manhtd98 Hi
@leemengwei You need to increase …
@austinmw How did you start the client inference? I managed to start SageMaker on port 8080 but cannot send a request to it.
I used the Amazon SageMaker docker container to run Triton: https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-triton
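For a rough idea of the client side, a request against the locally running endpoint could look like this sketch; the tensor name, shape, and use of the plain-JSON Triton/KServe v2 request format on /invocations are assumptions:

# sketch of a client request (assumptions: a single model with an FP32 input
# tensor named "input" of shape 1x3x800x1344, server reachable on port 8080)
import json

import numpy as np
import requests

image = np.random.rand(1, 3, 800, 1344).astype(np.float32)  # stand-in for a preprocessed image

payload = {
    "inputs": [
        {
            "name": "input",
            "shape": list(image.shape),
            "datatype": "FP32",
            "data": image.flatten().tolist(),
        }
    ]
}

response = requests.post(
    "http://localhost:8080/invocations",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(response.json())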
@austinmw Could you provide the same code? I've tried many times and am stuck there.
Hello, I'm trying to convert an mmdetection model into an .engine file through mmdeploy and deploy it to Triton, but when I make an inference request, the output of the model is all 0s and -1s. I need your help! For a detailed description of the problem, please refer to the following: Thank you very much!
Hi, I built the MMDeploy GPU Dockerfile and installed MMDetection on top of it to generate an end2end.engine file. I tried to run this in NVIDIA’s Triton docker image, but got errors due to missing plugins.
I see that there’s some documentation on installing plugins, but it does not seem straightforward to add those instructions to a Dockerfile that uses Triton’s image as a base.
Since this is probably a very common use case, could I request details, or file a feature request, for a Dockerfile for Triton server with the necessary plugins added?
Appreciate any assistance or advice, thanks!