
Conversation

@lara-hdr (Contributor)

  • Create a library for the custom ops.
  • Register Roi_Align, Roi_Pool and NMS as PyTorch custom ops.
  • Implement the ONNX symbolics for Roi_Align, Roi_Pool and NMS (a sketch of the mechanism follows this list).
  • Add tests for exporting the ops to ONNX.
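For context, here is a minimal sketch of how the ONNX side of this can be wired up. It is an illustrative reconstruction, not necessarily the PR's exact code: the op name torchvision::nms, the shapes, and the opset choice are assumptions, and it assumes torch.onnx.register_custom_op_symbolic as available in PyTorch of that era.

import sys

import torch
from torch.onnx import register_custom_op_symbolic
from torch.onnx.symbolic_helper import parse_args

# 'f' tells the exporter to unwrap iou_threshold into a Python float.
@parse_args('v', 'v', 'f')
def symbolic_nms(g, boxes, scores, iou_threshold):
    # ONNX NonMaxSuppression (opset 10) wants batched, per-class inputs:
    # boxes [num_batches, num_boxes, 4], scores [num_batches, num_classes, num_boxes].
    boxes = g.op('Unsqueeze', boxes, axes_i=[0])
    scores = g.op('Unsqueeze', g.op('Unsqueeze', scores, axes_i=[0]), axes_i=[0])
    max_out = g.op('Constant', value_t=torch.tensor([sys.maxsize], dtype=torch.long))
    iou = g.op('Constant', value_t=torch.tensor([iou_threshold], dtype=torch.float))
    nms_out = g.op('NonMaxSuppression', boxes, scores, max_out, iou)
    # The output is [num_selected, 3]; keep only the box-index column so the
    # exported graph returns the flat index list the eager op returns.
    col = g.op('Constant', value_t=torch.tensor([2], dtype=torch.long))
    return g.op('Squeeze', g.op('Gather', nms_out, col, axis_i=1), axes_i=[1])

register_custom_op_symbolic('torchvision::nms', symbolic_nms, 10)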

@codecov-io commented Aug 28, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@04f70c1).
The diff coverage is 100%.


@@            Coverage Diff            @@
##             master    #1267   +/-   ##
=========================================
  Coverage          ?   65.38%           
=========================================
  Files             ?       75           
  Lines             ?     5818           
  Branches          ?      886           
=========================================
  Hits              ?     3804           
  Misses            ?     1738           
  Partials          ?      276
Impacted Files                    Coverage Δ
torchvision/ops/boxes.py          94.59% <100%> (ø)
torchvision/ops/roi_pool.py       70.21% <100%> (ø)
torchvision/ops/_custom_ops.py    100% <100%> (ø)
torchvision/ops/roi_align.py      68% <100%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04f70c1...6cfb171.

@lara-hdr (Contributor, Author)

@fmassa

@fmassa (Member) left a comment

This is looking very good, thanks a lot Lara!

I've made a few comments, let me know what you think.

Also, I'd like @t-vi and @suo to have a look at this PR. @t-vi has worked a bit in the past on having C++ ops with backprop, and @suo might know what is the current status of RegisterOperator supporting backward ops.

@suo self-assigned this Aug 28, 2019
@@ -135,7 +144,14 @@ def get_extensions():
             include_dirs=tests_include_dirs,
             define_macros=define_macros,
             extra_compile_args=extra_compile_args,
-        )
+        ),
+        extension(
@t-vi (Contributor) commented on this diff:

I think one key decision to make is whether we want to cover building a "no-python" library here, too.
If we do, building this in Python setup.py (rather than with cmake) is a bit dubious. If we want this, I think we might want to refine this to not build a Python module (cpp_extension.load has an is_python_module flag, but the *Extension classes do not).

If not, we might fold the custom ops into the extension module.
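For reference, a sketch of the cpp_extension.load route t-vi mentions; the name and source path are hypothetical:

from torch.utils.cpp_extension import load

# is_python_module=False builds the sources into a plain shared library
# and loads it into the current process instead of importing it as a
# Python extension module; its ops then surface under torch.ops.
load(
    name='torchvision_custom_ops',                # hypothetical
    sources=['torchvision/csrc/custom_ops.cpp'],  # hypothetical
    is_python_module=False,
    verbose=True,
)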

@fmassa (Member) replied:

This makes a lot of sense, and I imagine that one would want to run those models in C++ as well on torchscript.

One proposal: we keep this approach as is for this PR, but open a new issue to discuss this?

@t-vi (Contributor) commented Aug 29, 2019

So the two things for differentiable custom ops that I'm aware of (but I haven't really caught up after my vacation) are the "classic" approach of adding a backward, and the "modern" TorchScript source-to-source differentiation (which I have a hunch will take a while yet).

I would recommend enabling the first relatively soon and then maybe deprecating the old extension module. Adding the source-to-source differentiation will be nice, but should be transparent for users.
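A minimal sketch of the "classic" route, assuming hypothetical registered ops torchvision::roi_align_forward and torchvision::roi_align_backward: autograd support is stitched on in Python with an autograd.Function rather than inside the custom op itself.

import torch

class _RoIAlign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, rois, spatial_scale, out_h, out_w, sampling_ratio):
        ctx.save_for_backward(rois)
        ctx.input_shape = input.shape
        ctx.params = (spatial_scale, out_h, out_w, sampling_ratio)
        # The forward pass runs entirely inside the registered custom op.
        return torch.ops.torchvision.roi_align_forward(
            input, rois, spatial_scale, out_h, out_w, sampling_ratio)

    @staticmethod
    def backward(ctx, grad_output):
        rois, = ctx.saved_tensors
        spatial_scale, out_h, out_w, sampling_ratio = ctx.params
        # The backward pass is a second registered op; only the input
        # needs a gradient, the other arguments get None.
        grad_input = torch.ops.torchvision.roi_align_backward(
            grad_output, rois, spatial_scale, out_h, out_w, sampling_ratio,
            ctx.input_shape[0], ctx.input_shape[1],
            ctx.input_shape[2], ctx.input_shape[3])
        return grad_input, None, None, None, None, None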

@fmassa (Member) commented Sep 2, 2019

So the two things for differentiable custom ops that I'm aware of (but I haven't really caught up after my vacation) are the "classic" approach of adding a backward, and the "modern" TorchScript source-to-source differentiation (which I have a hunch will take a while yet).

I would recommend enabling the first relatively soon and then maybe deprecating the old extension module. Adding the source-to-source differentiation will be nice, but should be transparent for users.

This sounds reasonable to me. @suo what's your take on this?

@suo (Member) commented Sep 2, 2019

@t-vi's take seems right to me. The "classic" way is a bit onerous, but we should do that first to unblock the torchscript compat work, then follow up once symbolic-script style backwards is available.

@fmassa (Member) commented Sep 3, 2019

@suo @t-vi sounds good, thanks for your feedback!

To unblock @lara-hdr, we will take a simpler first step and not support autograd in the custom ops for now. I'll file a new task to address this once this PR gets merged, given that it is a separate task from the work Lara needs to do to get those models exportable to ONNX.

@lara-hdr, could you address the comments in this issue (apart from the auto-differentiation, which will be tackled separately)? Then this is good to merge.

Thanks!

@t-vi (Contributor) commented Sep 3, 2019

One small thing: I wonder if we'd want to name the ops roi_align and roi_pool without the "forward".
In the module, we have those because we expect an autograd.Function wrapper to abstract them, but in script they would show up more directly. (I guess we can have a @script wrapper function, too.)

In a similar vein, do we want to return the argmax from the op?
It would feel more natural to me to match the "end user function".
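A sketch of the @script wrapper idea; the op name and signature here are assumptions. The scripted function hides both the "_forward" suffix and the extra argmax output, so script code sees the same surface as the eager end-user function:

import torch

@torch.jit.script
def roi_pool(input: torch.Tensor, rois: torch.Tensor, spatial_scale: float,
             pooled_h: int, pooled_w: int) -> torch.Tensor:
    # Call the registered op and drop the argmax from the result.
    output, _argmax = torch.ops.torchvision.roi_pool_forward(
        input, rois, spatial_scale, pooled_h, pooled_w)
    return output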

@t-vi (Contributor) commented Sep 4, 2019

Based on this PR (at its current state), I wrote https://github.com/t-vi/vision/tree/diff_op, which replaces the autograd.Functions with differentiation in the ops. I haven't really found pytorch/pytorch#23572 to be that great a match, so I implemented it directly. I'll give some thought to streamlining by subclassing CppNode for the backward Node, but my feeling is that quite a few bits of it are not too great a match for us here (I could be wrong about that).

@soumith (Member) commented Sep 4, 2019

cc: @ezyang about Thomas' comment that #23572 wasn't a great match.

@t-vi (Contributor) commented Sep 4, 2019

Just to clarify: The C++ Function facility is great, I just don't know how to use those bits from within ops (but now that I'm writing this, I should try using Function from inside the op and see if it works).

@ezyang (Contributor) commented Sep 4, 2019

Yes, you literally just call the Function from within the traditional op binding, and put the actual forward implementation inside of the C++ class.

You'd be the first real user, so bug reports and feature requests are helpful.

@fmassa (Member) commented Sep 4, 2019

One small thing: I wonder if we'd want to name the ops roi_align and roi_pool without the "forward".
In the module, we have those because we expect an autograd.Function wrapper to abstract them, but in script they would show up more directly. (I guess we can have a @script wrapper function, too.)

Great point. I think we should have a name that kind of matches what we would have if the operator was instead added directly in PyTorch. In this case, it would probably not have the _forward, and the backward op would maybe have a _backward attached to it?

In a similar vein, do we want to return the argmax from the op?
It would feel more natural to me to match the "end user function".

Another great point. I think we should follow something similar to what max_pool does in PyTorch on the JIT (I'm not sure what it currently is, though).
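For reference, the max_pool precedent in eager PyTorch: the end-user function returns only values by default and exposes the indices behind an explicit flag, which is one plausible model for roi_pool's argmax.

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
values = F.max_pool2d(x, kernel_size=2)                         # values only
values, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)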

@fmassa (Member) commented Sep 4, 2019

@t-vi

Based on this PR (at its current state), I wrote https://github.com/t-vi/vision/tree/diff_op, which replaces the autograd.Functions with differentiation in the ops. I haven't really found pytorch/pytorch#23572 to be that great a match, so I implemented it directly. I'll give some thought to streamlining by subclassing CppNode for the backward Node, but my feeling is that quite a few bits of it are not too great a match for us here (I could be wrong about that).

This patch is awesome! I think it's worth integrating it either here, or in a follow-up PR.

One question that I have is how often we will have breakages in torchvision due to us using "internal" functionality. @ezyang any ideas?

@lara-hdr (Contributor, Author) commented Sep 6, 2019

@fmassa, @t-vi, @suo, thanks for the review and all the comments.

  • @t-vi, good point about the no-python library; as @fmassa said, we could start by merging this for now and add it in a follow-up PR.
  • For argmax, we are discussing adding the argmax as an output of MaxRoiPool in ONNX, so for now we will not support it, but it should be added soon.
  • @t-vi, https://github.com/t-vi/vision/tree/diff_op looks great; we could merge this PR first, and I'll let you add your changes to preserve your authorship?

@t-vi (Contributor) commented Sep 6, 2019

I think the workflow should be optimized for handling the merges; I'm not concerned about authorship.
Using the new autograd::Function should make the patch simpler, so there is still an iteration to make.

@t-vi (Contributor) commented Sep 6, 2019

Something seems to want to rename custom_ops.*.so when it is already custom_ops.so for me in python3 setup.py bdist_wheel.
Edit: Ah, no, the second rename fails because the source doesn't exist. Maybe this happens between setup.py develop and bdist_wheel, and we could rename only when the file exists. I am not sure that copying is necessarily a superior workaround. (This seems to be the CPU CI failure, too, when running setup.py bdist_wheel from a fresh checkout. Keeping the renames and making the second rename into torchvision_dir conditional on the file existing fixes it for me.)
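A sketch of the conditional second rename t-vi describes; the paths are examples, not the PR's actual variables:

import os

src = os.path.join('build', 'lib.linux-x86_64-3.7', 'torchvision', 'custom_ops.so')
dst = os.path.join('torchvision', 'custom_ops.so')
# Only rename into the torchvision dir when the build actually produced
# the file; under setup.py develop it may not exist.
if os.path.exists(src):
    os.rename(src, dst)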

@lara-hdr (Contributor, Author) commented Sep 6, 2019

Something seems to want to rename custom_ops.*.so when it is already custom_ops.so for me in python3 setup.py bdist_wheel.
Edit: Ah, no, the second rename fails because the source doesn't exist. Maybe this happens between setup.py develop and bdist_wheel, and we could rename only when the file exists. I am not sure that copying is necessarily a superior workaround. (This seems to be the CPU CI failure, too, when running setup.py bdist_wheel from a fresh checkout. Keeping the renames and making the second rename into torchvision_dir conditional on the file existing fixes it for me.)

You are right about it trying to rename custom_ops.*.so to custom_ops.so in certain cases.
I modified the code to search for the pattern "custom_ops*" (like in the first commit) and copy the found file into the torchvision directory.
The reason I am copying the file is to avoid constructing the path to the built file in _custom_ops.py in a hacky way (and to avoid using glob at runtime, as discussed before).
But it is not ideal, since the library is a couple of MB...

Maybe going back to using glob at runtime would make sense (a sketch follows below)? Any better ideas?
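A sketch of the glob-at-runtime option; the pattern and location are assumptions:

import glob
import os

import torch

# Find the built library next to this file instead of copying it into
# the package at build time.
lib_dir = os.path.dirname(__file__)
matches = glob.glob(os.path.join(lib_dir, 'custom_ops.*'))
if matches:
    torch.ops.load_library(matches[0])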

@t-vi (Contributor) commented Sep 6, 2019

Looking at the build log, I'm having doubts about the loader code in _custom_ops.py:

lib_dir = os.path.join('torchvision')
extension = os.path.basename(torch._C.__file__).rsplit('.', 1)[1]
custom_op_lib = os.path.join(lib_dir, 'custom_ops.' + extension)
torch.ops.load_library(custom_op_lib)

shouldn't lib_dir be relative to __file__ to capture the torchvision installation dir?

It would seem to me that copying the .so to the torchvision dir except for setup.py develop is a red herring.

@lara-hdr (Contributor, Author) commented Sep 6, 2019

@t-vi, yes, as I said in my last comment, I want to avoid doing a copy. But then I would have to access the build directory from _custom_ops.py to find the library (the build directory being something like /vision/build/lib.linux-x86_64-3.7/torchvision).
And I am not sure how to access this directory without using glob at runtime.

@t-vi (Contributor) commented Sep 6, 2019

Maybe I'm missing something, but to me it looks like the problem is that currently the code expects the library to sit in ./torchvision/custom_ops.so relative to the current working dir. I would think that

-lib_dir = os.path.join('torchvision')
+lib_dir = os.path.join(os.path.dirname(__file__), '..')

makes it look in the right dir - i.e. where torchvision is installed.
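Putting the diff and the original loader together, the fixed _custom_ops.py loader would read:

import os

import torch

# Resolve the library relative to the installed package rather than
# the current working directory.
lib_dir = os.path.join(os.path.dirname(__file__), '..')
extension = os.path.basename(torch._C.__file__).rsplit('.', 1)[1]
custom_op_lib = os.path.join(lib_dir, 'custom_ops.' + extension)
torch.ops.load_library(custom_op_lib)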

@t-vi (Contributor) commented Sep 6, 2019

Hm. It seems to be looking in the right place now and locally I get it installed in the right place, too.

@t-vi (Contributor) commented Sep 6, 2019

So after looking at this some more: we do have a no_python_abi_suffix option for BuildExtension, but not for the Cpp/CudaExtension classes in PyTorch.
Apparently getting a custom_ops.so alongside the Python extension module without that is terribly hacky, so I have to retract the suggestion to try to get a custom_ops.so without the ABI suffix.
Please accept my apologies for putting you on the wrong track here!

@lara-hdr (Contributor, Author) commented Sep 7, 2019

Thanks @t-vi for all your help!

@fmassa (Member) commented Sep 8, 2019

FYI, the CI failures are unrelated, and we are looking into fixing them.

@t-vi (Contributor) left a comment

From what I can see, I think it's good to merge. Thank you for working on this!

@fmassa (Member) left a comment

LGTM, thanks a lot!

I'll merge this once I get CI fixed, which I hope will be tomorrow.

@fmassa fmassa closed this Sep 9, 2019
@fmassa fmassa reopened this Sep 9, 2019
@fmassa fmassa merged commit 78f169b into pytorch:master Sep 9, 2019
@fmassa (Member) commented Sep 9, 2019

Thanks a lot Lara!

@ezyang (Contributor) commented Sep 9, 2019

This broke pytorch-master https://circleci.com/gh/pytorch/pytorch/2711933?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console

Sep 09 12:02:26 ======================================================================
Sep 09 12:02:26 ERROR: test_list_entrypoints (__main__.TestHub)
Sep 09 12:02:26 ----------------------------------------------------------------------
Sep 09 12:02:26 Traceback (most recent call last):
Sep 09 12:02:26   File "test_utils.py", line 556, in test_list_entrypoints
Sep 09 12:02:26     entry_lists = hub.list('pytorch/vision', force_reload=True)
Sep 09 12:02:26   File "/opt/conda/lib/python3.6/site-packages/torch/hub.py", line 290, in list
Sep 09 12:02:26     hub_module = import_module(MODULE_HUBCONF, repo_dir + '/' + MODULE_HUBCONF)
Sep 09 12:02:26   File "/opt/conda/lib/python3.6/site-packages/torch/hub.py", line 72, in import_module
Sep 09 12:02:26     spec.loader.exec_module(module)
Sep 09 12:02:26   File "<frozen importlib._bootstrap_external>", line 678, in exec_module
Sep 09 12:02:26   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/hubconf.py", line 4, in <module>
Sep 09 12:02:26     from torchvision.models.alexnet import alexnet
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/__init__.py", line 1, in <module>
Sep 09 12:02:26     from torchvision import models
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/models/__init__.py", line 12, in <module>
Sep 09 12:02:26     from . import detection
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/models/detection/__init__.py", line 1, in <module>
Sep 09 12:02:26     from .faster_rcnn import *
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/models/detection/faster_rcnn.py", line 7, in <module>
Sep 09 12:02:26     from torchvision.ops import misc as misc_nn_ops
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/ops/__init__.py", line 1, in <module>
Sep 09 12:02:26     from .boxes import nms, box_iou
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/ops/boxes.py", line 2, in <module>
Sep 09 12:02:26     import torchvision.ops._custom_ops
Sep 09 12:02:26   File "/var/lib/jenkins/.cache/torch/hub/pytorch_vision_master/torchvision/ops/_custom_ops.py", line 9, in <module>
Sep 09 12:02:26     file, path, description = imp.find_module("_custom_ops", [lib_dir])
Sep 09 12:02:26   File "/opt/conda/lib/python3.6/imp.py", line 297, in find_module
Sep 09 12:02:26     raise ImportError(_ERR_MSG.format(name), name=name)
Sep 09 12:02:26 ImportError: No module named '_custom_ops'

ezyang added a commit that referenced this pull request Sep 9, 2019
ezyang added a commit that referenced this pull request Sep 9, 2019
fmassa added a commit that referenced this pull request Sep 9, 2019
@fmassa (Member) commented Sep 9, 2019

I'm sending a follow-up patch making the imports lazy again in #1317
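A sketch of the lazy-import pattern #1317 moves to; the function body here is illustrative, not the patch itself. The extension is only loaded when an op is first called, so a bare "import torchvision" (e.g. through torch.hub) cannot fail on a missing library:

import torch

def nms(boxes, scores, iou_threshold):
    # Deferring the import loads the custom-ops library on first use.
    from torchvision.ops import _custom_ops  # noqa: F401
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)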

fmassa pushed a commit that referenced this pull request Sep 10, 2019
* Revert "Revert "Register Torchvision Ops as Cutom Ops (#1267)" (#1316)"

This reverts commit fe234fc.

* Make import of C++ extensions lazy

* define python initialization functions for extension

* Fix lint