Register Torchvision Ops as Custom Ops #1267
Conversation
lara-hdr
commented
Aug 28, 2019
- Create a library for the custom ops.
- Register Roi_Align, Roi_Pool and NMS as PyTorch custom ops.
- Implement the ONNX symbolics for Roi_Align, Roi_Pool and NMS.
- Add tests for exporting the ops to ONNX.
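For readers unfamiliar with the ops being registered: the sketch below illustrates the greedy NMS algorithm in pure Python. This is only a conceptual illustration of what the op computes; the actual torchvision kernel registered by this PR is implemented in C++/CUDA and operates on tensors.

```python
def nms(boxes, scores, iou_threshold):
    """Greedy non-maximum suppression on [x1, y1, x2, y2] boxes.

    Pure-Python illustration of the NMS op this PR registers; the real
    torchvision implementation is a C++/CUDA kernel on tensors.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    # Visit boxes in decreasing score order, suppressing overlaps.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```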
Codecov Report

@@           Coverage Diff           @@
##            master    #1267  +/-  ##
=======================================
  Coverage         ?   65.38%
=======================================
  Files            ?       75
  Lines            ?     5818
  Branches         ?      886
=======================================
  Hits             ?     3804
  Misses           ?     1738
  Partials         ?      276
=======================================
This is looking very good, thanks a lot Lara!
I've made a few comments, let me know what you think.
Also, I'd like @t-vi and @suo to have a look at this PR. @t-vi has worked a bit in the past on having C++ ops with backprop, and @suo might know the current status of RegisterOperator supporting backward ops.
@@ -135,7 +144,14 @@ def get_extensions():
         include_dirs=tests_include_dirs,
         define_macros=define_macros,
         extra_compile_args=extra_compile_args,
-    )
+    ),
+    extension(
I think one key decision to make is whether we want to cover building a "no-python" library here, too.
If we do, building this in Python setup.py (rather than with cmake) is a bit dubious. In that case, I think we would want to refine this to not build a Python module (cpp_extension.load has an is_python_module flag, but the *Extension classes do not).
If not, we might fold the custom ops into the extension module.
This makes a lot of sense, and I imagine that one would want to run those models in C++ on TorchScript as well.
One proposal: we keep this approach as is for this PR, but open a new issue to discuss this?
So the two things for differentiable custom ops I'm aware of (but I haven't really caught up after my vacation) are the "classic" adding of a backward, and the custom "modern" TorchScript source-to-source differentiation (which I have a hunch will take a while yet). I would recommend enabling the first relatively soon and then maybe deprecating the old extension module. Adding the source-to-source differentiation will be nice, but should be transparent for users.
This sounds reasonable to me. @suo what's your take on this?
@t-vi's take seems right to me. The "classic" way is a bit onerous, but we should do that first to unblock the TorchScript compat work, then follow up once symbolic-script style backwards is available.
@suo @t-vi sounds good, thanks for your feedback! To unblock @lara-hdr, we will follow a simpler first step and not support autograd in the custom ops for now. I'll file a new task to address this once this PR gets merged, given that it is a separate task from the work Lara needs to get those models exportable to ONNX. @lara-hdr could you address the comments in this issue (apart from the auto differentiation, which will be tackled separately), and then this is good to merge? Thanks!
One small thing: I wonder if we'd want to name the ops … In a similar vein, do we want to return the argmax from the op?
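On the question of returning the argmax: the pattern under discussion is a pooling op that returns both the pooled value and the index it came from, so the backward pass can route gradients. A minimal pure-Python sketch of that value-plus-argmax shape (the function name and scalar interface are illustrative; torchvision's actual roi_pool works on batched tensors):

```python
def region_max_pool(feature_map, x1, y1, x2, y2):
    """Max-pool one rectangular region of a 2-D feature map, returning
    both the max value and its flat argmax index.

    Hypothetical sketch of the value+argmax pattern discussed here;
    not torchvision's API.
    """
    best_val, best_idx = float("-inf"), -1
    width = len(feature_map[0])
    for y in range(y1, y2):
        for x in range(x1, x2):
            if feature_map[y][x] > best_val:
                # Record the winner and its flattened position.
                best_val, best_idx = feature_map[y][x], y * width + x
    return best_val, best_idx
```

Keeping the argmax around is what makes the backward cheap: gradients flow only to the recorded index.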
Based on this PR (at its current state), I wrote https://github.com/t-vi/vision/tree/diff_op which replaces the autograd.Functions with differentiation in the ops. I haven't really found pytorch/pytorch#23572 to be that great a match, so I implemented it directly. I'll give some thought to streamlining by subclassing CppNode for the backward node, but my feeling is that quite a few bits of that are not too great a match for us here (I could be wrong about that).
cc: @ezyang about Thomas' comment that #23572 wasn't a great match.
Just to clarify: the C++ Function facility is great, I just don't know how to use those bits from within ops (but now that I'm writing this, I should try using Function from inside the op and see if it works).
Yes, you literally just call the Function from within the traditional op binding, and put the actual forward implementation inside of the C++ class. You'd be the first real user, so bug reports and feature requests are helpful.
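For context on the "classic" approach being discussed: it pairs each forward with a hand-written backward, where the forward stashes whatever the backward needs. A framework-free sketch of that structure (no torch, class and context names are hypothetical; in PyTorch this shape corresponds to a torch.autograd.Function with forward/backward static methods):

```python
class SquareFunction:
    """Sketch of the 'classic' custom-op pattern: forward saves what
    backward needs, backward applies the chain rule by hand.

    Shown without torch so the structure is visible on its own; the
    ctx dict stands in for the autograd context object.
    """

    @staticmethod
    def forward(ctx, x):
        ctx["saved_x"] = x  # stash the input for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # d(x^2)/dx = 2x, scaled by the incoming gradient
        return 2.0 * ctx["saved_x"] * grad_output
```

The "modern" source-to-source alternative derives the backward automatically from the forward's TorchScript source, which is why it should be transparent to users once available.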
Great point. I think we should have a name that matches what we would have if the operator was instead added directly in PyTorch. In this case, it would probably not have the …
Another great point. I think we should follow something similar to what …
This patch is awesome! I think it's worth integrating it either here, or in a follow-up PR. One question that I have is how often we will have breakages in torchvision due to us using "internal" functionality. @ezyang any ideas?
@fmassa, @t-vi, @suo, thanks for the review and all the comments.
I think the flow should optimize handling of the merges; I'm not concerned about authorship.
Something seems to want to rename …
You are right about trying to rename … Maybe going back to using glob at runtime would make sense? Any better ideas?
Looking at the build log, I'm having doubts about the loader code in _custom_ops.py:

```python
lib_dir = os.path.join('torchvision')
extension = os.path.basename(torch._C.__file__).rsplit('.', 1)[1]
custom_op_lib = os.path.join(lib_dir, 'custom_ops.' + extension)
torch.ops.load_library(custom_op_lib)
```

Shouldn't … It would seem to me that copying the .so to the torchvision dir except for …
@t-vi, yes, as I said in my last comment, I want to avoid doing a copy, but I would have to access the build directory from _custom_ops.py to access the library (the build directory being something like /vision/build/lib.linux-x86_64-3.7/torchvision).
Maybe I'm missing something, but to me it looks like the problem is that currently the code expects the library to sit in …
… makes it look in the right dir, i.e. where torchvision is installed.
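The fix being described is to resolve the library relative to the installed package instead of the current working directory. A stdlib-only sketch of that idea (function name and parameters are illustrative; in the real loader the package file would be torchvision's own __file__ and the suffix would come from torch._C, so both are passed in here to keep the sketch self-contained):

```python
import os

def find_custom_op_lib(package_file, ext_suffix):
    """Build the path to the custom-ops shared library next to the
    installed package, rather than relying on the current working
    directory.

    Hypothetical helper: `package_file` stands in for the package's
    __file__ and `ext_suffix` for the platform extension (e.g. 'so').
    """
    lib_dir = os.path.dirname(os.path.abspath(package_file))
    return os.path.join(lib_dir, "custom_ops." + ext_suffix)
```

The returned path would then be handed to torch.ops.load_library; anchoring on __file__ is what makes the lookup independent of where Python was launched from.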
Hm. It seems to be looking in the right place now, and locally I get it installed in the right place, too.
So after looking at this some more, we do have a no_python_abi_suffix for the BuildExtension but not for the Cpp/CudaExtension "class" in PyTorch.
Thanks @t-vi for all your help!
FYI the CI failures are unrelated, and we are looking into fixing them.
From what I can see, I think it's good to merge. Thank you for working on this!
LGTM, thanks a lot!
I'll merge this once I get CI fixed, which I hope will be tomorrow.
Thanks a lot Lara!
This broke pytorch-master https://circleci.com/gh/pytorch/pytorch/2711933?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console
I'm sending a follow-up patch making the imports lazy again in #1317 |