Add ONNX and ORT support + Docs for TensorRT #1857
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1857 +/- ##
==========================================
+ Coverage 41.67% 44.66% +2.99%
==========================================
Files 55 63 +8
Lines 2282 2624 +342
Branches 1 56 +55
==========================================
+ Hits 951 1172 +221
- Misses 1331 1452 +121
Besides the inline comments, we would also need to think about preprocessing for ONNX (to avoid custom handlers), since the input needs to be converted to NumPy. We might be able to have separate util files for ONNX, TRT, or future backends/dynamo, or alternatively have one backend utils file to keep all the related helper functions in. This may help keep the base_handler cleaner and more maintainable. cc @lxning on this as well.
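To illustrate the kind of helper such a shared backend utils file could hold (the function name and placement here are hypothetical, not the PR's code), a minimal sketch of converting handler inputs to NumPy for ONNX Runtime:

```python
import numpy as np

def to_numpy(data):
    """Convert common handler input types to a NumPy array for ORT.

    Torch tensors are detected via duck typing so this helper does not
    require torch itself to be installed.
    """
    if isinstance(data, np.ndarray):
        return data
    if hasattr(data, "detach") and hasattr(data, "cpu"):
        # Looks like a torch.Tensor: move to CPU and drop the autograd graph
        return data.detach().cpu().numpy()
    return np.asarray(data)
```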
requirements/developer.txt (Outdated)

@@ -13,4 +13,5 @@ pygit2==1.6.1
pyspelling
pre-commit
twine
+onnxruntime
It should be dynamically loaded based on the model runtime type.
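A hedged sketch of what dynamic loading could look like (the mapping, module names, and function name are all illustrative assumptions): the backend module is imported only when the model's runtime type actually asks for it.

```python
import importlib

# Hypothetical mapping from runtime type to the module that implements it.
BACKEND_MODULES = {
    "onnx": "onnxruntime",
    "trt": "torch_tensorrt",
}

def load_backend(runtime_type):
    """Import a backend module lazily, based on the model's runtime type."""
    module_name = BACKEND_MODULES.get(runtime_type)
    if module_name is None:
        raise ValueError(f"Unsupported runtime type: {runtime_type}")
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The dependency is only required when a model actually uses it.
        raise RuntimeError(
            f"{module_name} is required for runtime '{runtime_type}' "
            "but is not installed"
        ) from exc
```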
I think we should do that for our common dependencies, but for our developer dependencies we should install everything so we can run tests in CI.
Just a quick update: the main challenge for this PR will be correctly packaging setup.py with all the correct package versions, and I'm seeing all sorts of basic issues even with basic models while setting up an E2E test. Some relevant issues:
Is linking to specific versions enough? Should I use git submodules? When do I update them? How will Docker support work? EDIT: The installation experience is now
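One common approach to this kind of packaging question (a sketch only, not necessarily what this PR does, and the package names are illustrative) is optional extras in setup.py, so backend dependencies install only on demand:

```python
# Hypothetical extras_require mapping; with this, something like
# `pip install torchserve[onnx]` would pull in the ONNX deps,
# while a plain install stays slim.
EXTRAS_REQUIRE = {
    "onnx": ["onnx", "onnxruntime"],
    "tensorrt": ["torch-tensorrt"],
}

# In setup.py this dict would be passed as:
# setup(..., extras_require=EXTRAS_REQUIRE)
```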
And here's an example of an inference running from my logs. Repro is in
requirements/developer.txt (Outdated)

onnx
onnx-runtime
numpy
This would potentially make the Docker image too big (> 10 GB) if we want to support multiple platforms. We need to figure out a way to package for multi-platform support.
That's true, though I guess that seems fine for dev dependencies. I can remove these if we disable the ONNX tests by default, but I don't believe we should do that either.
Thanks @msaroufim, I added some comments, but more generally I wonder if we have plans to move all the backend initialization logic and utilities into backend-specific files/utils. This would make the base_handler easier to maintain and more readable. Maybe it's a good bootcamp task?
ts/torch_handler/base_handler.py (Outdated)

# Load class mapping for classifiers
mapping_file_path = os.path.join(model_dir, "index_to_name.json")
self.mapping = load_label_mapping(mapping_file_path)

self.initialized = True

def _load_onnx_model(self, model_onnx_path):
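For reference, a minimal sketch of what such a loader might do, assuming onnxruntime is installed (the standalone function name and the provider selection are assumptions, not the PR's actual implementation):

```python
def load_onnx_model(model_onnx_path, use_gpu=False):
    """Create an ORT inference session for a serialized .onnx file."""
    # Deferred import so handlers that never load ONNX models
    # do not need onnxruntime installed.
    import onnxruntime as ort

    providers = ["CPUExecutionProvider"]
    if use_gpu:
        # Prefer CUDA when a GPU is available; ORT falls back to CPU
        # if the CUDA provider cannot be initialized.
        providers.insert(0, "CUDAExecutionProvider")
    return ort.InferenceSession(model_onnx_path, providers=providers)
```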
Are we going to replace the above lines 120-137 with this function?
Yeah this is a typo, will fix
Discussed offline with Li and Hamid.

To merge this PR:

Future:
@HamidShojanazeri @lxning I made all the changes we discussed.

Logs

Pytest

Link check is failing because I'm linking to a test file that doesn't exist on master yet (https://github.com/pytorch/serve/actions/runs/3425428486/jobs/5706209933#step:5:867) - this won't be a problem after we merge.
Thanks @msaroufim, LGTM. I also suggest that for the next set of PRs, having linting/formatting as a separate PR makes it easier to focus on the changes.
EDIT 11/4: To make this easier to merge, I've cut the scope to only ONNX and ORT; I'll revisit the rest after pytorch/tensorrt gets official PyPI binaries, but this is now ready for review.
EDIT 11/6: I'm going to write a brief doc page on using different optimization runtimes
EDIT 11/8: Addressed most feedback, rest will be addressed in future work
This PR

- Adds support for passing a `--serialized-file` that's in `.onnx` format, which will be correctly loaded by the base handler using an `ort.InferenceSession()` (https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html)
- Adds docs for TensorRT models with a `.ts` extension, which can be loaded via `torch.jit.load()` (https://pytorch.org/TensorRT/getting_started/getting_started_with_python_api.html#getting-started-with-python-api)
- Adds a `model_config.json`, similar to `index_to_name.json`, which would hold model-specific information you can access via `model_config.get("property")`. For now I'm not using any special configs.

Open question

- Should we convert models inside `initialize()`? My gut is no: we don't ask users to train models in `initialize()`, they should prepare their models and once prepared use torchserve, but I could be convinced otherwise. EDIT: NO
- What goes in `model_config.json`? I'd like to wait and see what schema gets chosen for "Modularize ipex optimization in `base_handler.py` into `ts/utils/ipex_optimization.py` #1664" and then decide.

Future work

- `torch.jit.script()`
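As a sketch of the `model_config.json` idea above (the keys here are hypothetical, since the PR doesn't use any special configs yet), a handler could read the file with plain stdlib JSON and treat every property as optional:

```python
import json

# Hypothetical model_config.json contents; keys are illustrative only.
config_text = '{"precision": "fp16", "max_batch_size": 4}'
model_config = json.loads(config_text)

# .get() returns None (or a supplied default) for absent keys, so the
# handler never crashes on a config that omits a property.
precision = model_config.get("precision")              # "fp16"
warmup = model_config.get("warmup_iters", 0)           # falls back to 0
```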