
Add support for ONNX-only #4291

Conversation

thiagocrepaldi
Contributor

@thiagocrepaldi thiagocrepaldi commented May 31, 2022

This PR is composed of different fixes to enable end-to-end ONNX export for detectron2 models:

  • The `add_export_config` API is publicly exposed even when caffe2 is not compiled along with PyTorch (which is the new default behavior in the latest PyTorch). A warning message informing users about its deprecation in future versions is also added

  • `tensor.shape[0]` replaces `len(tensor)`, and `for idx, img in enumerate(tensors)` replaces `for tmp_var1, tmp_var2 in zip(tensors, batched_imgs)`, so that the tracer does not lose the reference to the user input in the graph.

    • Before the changes above, the graph (see screenshot below) does not have an actual input. Instead, the input is exported as a model weight
      [screenshot]
    • After the fix, the user images are properly recognized as the model's input during ONNX export (see screenshot below)
      [screenshot]
  • Added unit tests (tests/torch_export_onnx.py) for detectron2 models

  • ONNX is added as a dependency so that the CI can run the aforementioned tests

  • Added custom symbolic functions to allow CI pipelines to succeed. The symbolics are needed because PyTorch 1.8, 1.9, and 1.10, as adopted by detectron2, have several bugs. They can be removed when 1.11+ is adopted by detectron2's CI infra
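The tracer-friendly batching in the second bullet can be sketched roughly as follows. This is a minimal standalone sketch, not detectron2's actual `ImageList` code; `pad_batch` and its parameters are hypothetical:

```python
import torch

def pad_batch(tensors, pad_value=0.0):
    # Pad a list of (C, H, W) images to a common (N, C, Hmax, Wmax) batch.
    max_h = max(t.shape[-2] for t in tensors)
    max_w = max(t.shape[-1] for t in tensors)
    batched = tensors[0].new_full(
        (len(tensors), tensors[0].shape[0], max_h, max_w), pad_value
    )
    # Write each image into the output with enumerate + in-place copy_,
    # instead of zip()-ing over a temporary copy: this keeps the trace
    # connected to the user-supplied inputs. Likewise, tensor.shape[0]
    # is traced symbolically, while len(tensor) would be baked into the
    # graph as a Python constant.
    for idx, img in enumerate(tensors):
        batched[idx, :, : img.shape[-2], : img.shape[-1]].copy_(img)
    return batched
```

Because the loop writes into `batched` in place, the exported graph keeps a real input node for each user image instead of freezing it as a weight.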

Fixes #3488
Fixes pytorch/pytorch#69674 (PyTorch repo)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 31, 2022
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch 3 times, most recently from 7e7e074 to 510027f Compare June 1, 2022 14:57
@thiagocrepaldi thiagocrepaldi marked this pull request as ready for review June 1, 2022 15:33
@thiagocrepaldi
Contributor Author

thiagocrepaldi commented Jun 1, 2022

@ppwwyyxx @FrancescoMandru splitting up Detectron2 ONNX export in a separate (and clean) PR

Comment on lines 21 to 28
def add_export_config(cfg):
    warnings.warn("add_export_config has been deprecated and behaves as no-op function.")
    return cfg


Contributor

There is no need to move this function if it's not used at all.

Contributor Author

This is still a public API and fully working for detectron2 + pytorch < 1.11, so better give users a grace period before removing it. What do you think?

Contributor

I'm not suggesting removing it. But I suggest keeping the function where it was; it seems that won't break anyone, so there is no need to move the function into __init__.py

Contributor Author

For PyTorch >= 1.11, the current detectron2 implementation (i.e. keeping add_export_config at detectron2/export/api.py) does break existing training scripts. See a full repro below:

conda create -n torch111_py39cu113 python=3.9 numpy
conda activate torch111_py39cu113 
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch # pytorch 1.11
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -c "from detectron2.export import add_export_config"

The result is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name 'add_export_config' from 'detectron2.export' (/anaconda3/envs/torch111_py39cu113/lib/python3.9/site-packages/detectron2/export/__init__.py)

The root cause is that after a "recent" PyTorch BC-breaking change, libcaffe2 is no longer distributed with the official PyTorch packages. When detectron2 tries to import anything from the detectron2.export.api namespace, it will indirectly try to import caffe2 bits, such as from caffe2.proto import caffe2_pb2

This change tries to create an explicit deprecation path by printing a scary warning so that users update their code, while still keeping things working (by moving add_export_config to a caffe2-free spot)

detectron2/export/c10.py (review thread resolved)
tests/test_export_onnx.py (review thread resolved)
"COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml", inference_func, batch=2
)

@skipIfUnsupportedMinOpsetVersion(16, STABLE_ONNX_OPSET_VERSION)
Contributor

@unittest.skipIf(
    STABLE_ONNX_OPSET_VERSION < 16, "torch version too old")

this should be sufficient.

Also, is it the plan that a newer version of pytorch will support a new-enough onnx opset? If that's the case please add a todo here so this check can be removed in the future?

Contributor Author

Opset 16 is the current ONNX opset version, and the next PyTorch version already implements it.
We need to skip this test for now because detectron2 still uses the older PyTorch versions 1.8, 1.9, and 1.10 in the CI

Contributor

Please add a todo here to update the value of STABLE_ONNX_OPSET_VERSION so this check can be removed in the future.

skipIfUnsupportedMinOpsetVersion is used only once, so it's easier to just write @unittest.skipIf( STABLE_ONNX_OPSET_VERSION < 16, "torch version too old")

Contributor Author

It is used once because ONNX export functionality is divided into several PRs.

#4205 uses skipIfUnsupportedMinOpsetVersion at least 2 times (here and here) and possibly more when I add more tests. This PR is on hold until all other 3 are in

setup.py (review thread resolved)
detectron2/export/README.md (review thread resolved)
detectron2/structures/image_list.py (review thread resolved)
detectron2/structures/instances.py (review thread resolved)
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from c239e5e to c14e3b6 Compare June 3, 2022 15:43
@thiagocrepaldi thiagocrepaldi changed the title Add support for ONNX-only (original PR #4120) Add support for ONNX-only Jun 6, 2022
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from c14e3b6 to 317346a Compare June 6, 2022 23:00
@FrancescoMandru

Any updates on this support?

@orionr

orionr commented Jun 14, 2022

We are looking for the right POCs to review. Stay tuned.

@orionr

orionr commented Jun 14, 2022

Also, can you please rebase since there seems to be conflicts? Thanks.

@FrancescoMandru

> Also, can you please rebase since there seems to be conflicts? Thanks.

@thiagocrepaldi is the main (sole) contributor on this feature, so I'm tagging him.

from .flatten import TracingAdapter
from .torchscript import scripting_with_instances, dump_torchscript_IR

STABLE_ONNX_OPSET_VERSION = 11
Contributor

does it make sense to condition this value based on torch.__version__? e.g. set to 16 for newer torch versions

Contributor Author

Not really: PyTorch versions and ONNX opset versions have a 1:N relationship. Many model authors experiment with performance using different combinations of opset and/or PyTorch versions. Many times newer PyTorch versions improve the implementation of older opsets; other times, newer opsets have new operators available, which allows a more efficient implementation

It is up to the authors to pick their preferred (possibly older) opset version on, generally speaking, the latest PyTorch version

Contributor Author

When ONNX Runtime is integrated on detectron2 CI pipeline and ONNX export is thoroughly tested, I will work on updating the supported opset to 16 and see how they compare to opset 11

@ppwwyyxx
Contributor

Thanks @orionr ! I can help with most of the review but will definitely need someone at Meta to help land.

@facebook-github-bot
Contributor

@zhanghang1989 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 317346a to 2720f3e Compare June 23, 2022 22:53
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 2720f3e to 793b649 Compare June 23, 2022 23:31
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 793b649 to 99a9ff7 Compare June 23, 2022 23:31
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@thiagocrepaldi
Contributor Author

@ppwwyyxx I think I have addressed everything, hopefully it is aligned with your ideas

I will jump to #4315 while you take your time with this one

Thanks for taking the time

@FrancescoMandru

@orionr Hello Sir, any updates on this PR?

@thiagocrepaldi
Contributor Author

> @zhanghang1989 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Thanks @zhanghang1989 @orionr @wat3rBro. The internal test failed after 6h. Could you help me with a backtrace or a relevant, non-proprietary log so that I can fix it? I would guess this is a false positive, as Caffe2 export is not currently supported by detectron2 and the files I changed are only pertinent to Caffe2 export

@facebook-github-bot
Contributor

@mcimpoi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mcimpoi

mcimpoi commented Jul 14, 2022

Re-importing the changes to fbsource.

The test failure was a transient infra error on our end. Re-ran the tests.

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 27a6209 to 95f05c3 Compare July 14, 2022 18:06
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@mcimpoi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@thiagocrepaldi
Contributor Author

@mcimpoi Cheers Mircea, I saw that the Linter issue is gone after your tip. THANK YOU

The internal Build & Tests failure persists, though. Would you know whether it is a transient infra issue or an actual problem that I could work on? Thank you again

Although facebookresearch#4120 exported a valid ONNX graph, after running it with ONNX
Runtime, I realized that all model inputs were exported as constants.

The root cause was that in detectron2/structures/image_list.py, the
padding of batched images was done on a temporary (copy) variable, so
the ONNX tracer could not track the user input, leading to inputs being
suppressed and stored as constants during export.

Also, len(torch.Tensor) is traced as a constant shape, which poses a
problem if the next input has a different shape.
This PR fixes that by using torch.Tensor.shape[0] instead
@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 95f05c3 to 4bc6262 Compare July 15, 2022 17:24
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@thiagocrepaldi thiagocrepaldi force-pushed the thiagofc/add-onnx-export-support branch from 4bc6262 to 2626240 Compare July 15, 2022 17:41
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@ppwwyyxx
Contributor

Please just disregard the "internal Build & Tests failure" on GitHub.

To Meta employee: I had a complaint in the open source support group that internal warnings will show as failures on github.

@thiagocrepaldi
Contributor Author

The reason I have rebased this branch is that #4295 was merged (yay!!!!), conflicting with this one (that was expected, as I have been working on several PRs simultaneously to keep things going)

@facebook-github-bot
Contributor

@mcimpoi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@mcimpoi mcimpoi left a comment


Thanks @thiagocrepaldi for fixing the linter errors!
I imported the changes again to fbsource.

@mcimpoi

mcimpoi commented Jul 15, 2022

Safe to ignore the last internal tools failure -- it was caused by some of the tests being cancelled due to rebasing.

@thiagocrepaldi thiagocrepaldi deleted the thiagofc/add-onnx-export-support branch July 16, 2022 04:31
@thiagocrepaldi
Contributor Author

Thank you all for reviewing and accepting this :)

facebook-github-bot pushed a commit that referenced this pull request Feb 10, 2023
Summary:
In order to export `KRCNNConvDeconvUpsampleHead` to ONNX using `torch.jit.script`, changes to both PyTorch and detectron2 are needed:

- PyTorch has a bug which prevents casting a tensor wrapped in a list to float. Refer to the required [fix](pytorch/pytorch#81386)

- `detectron2/structures/keypoints.py::heatmaps_to_keypoints` internally does advanced indexing on a `squeeze`d tensor. The aforementioned `squeeze` fails rank inference due to the presence of `onnx::If` in its implementation (to support dynamic dims). The fix is replacing `squeeze` with `reshape`. A possible fix to `squeeze` on the PyTorch side might be done too (TBD, and it would take some time), but the proposed change here does not bring any downside to detectron2, while it enables ONNX support with a scriptable `KRCNNConvDeconvUpsampleHead`.
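The `squeeze`-to-`reshape` substitution described in the second bullet can be illustrated like this. This is a hedged sketch with a hypothetical helper, not the actual `heatmaps_to_keypoints` code:

```python
import torch

def drop_channel_dim(scores: torch.Tensor) -> torch.Tensor:
    # scores: (N, 1, H, W). scores.squeeze(1) is exported with a guard for
    # the "dim 1 != 1" case (an onnx::If), which breaks rank inference for
    # the advanced indexing that follows. Reshaping to an explicit rank
    # keeps the output rank static for the exporter.
    n, _, h, w = scores.shape
    return scores.reshape(n, h, w)
```

For a well-formed `(N, 1, H, W)` input, the result is numerically identical to `scores.squeeze(1)`; only the exported graph differs.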

After the proposed changes, `KRCNNConvDeconvUpsampleHead` includes a `Loop` node to represent a for-loop inside the model, as well as dynamic outputs, as shown below:

![image](https://user-images.githubusercontent.com/5469809/179559001-f60fb8af-ec79-4758-b271-736467b5d96f.png)

This PR has been tested with ONNX Runtime (this [PR](#4205)) to ensure the ONNX output matches PyTorch's for different `gen_input(X, Y)` combinations, and it succeeded. The model was converted to ONNX once with a particular input, then tested with inputs of different shapes and compared for equality with PyTorch's output
Depends on: pytorch/pytorch#81386 and #4291

Pull Request resolved: #4315

Reviewed By: newstzpz

Differential Revision: D42756423

fbshipit-source-id: dc410df18da07f48c14f4cae9a4a91530a0ec602