
[WIP] Tracing / Scripting #138

Open, wants to merge 17 commits into base: master

6 participants

t-vi commented Nov 10, 2018

With this patch you can get a traced/scripted MaskRCNN model. To facilitate discussion (and maybe also to have a test case for the remaining features wanted in the JIT), I am putting this out in its current state rather than working in private towards a mergeable patch.
I appreciate your feedback, but note that it's not quite ready yet.

We have:

  • demo/trace_model.py produces a torch script model by a mix of tracing and scripting (a minimal generic sketch of the mix follows after this list),
  • demo/traced_model.cpp uses the model in C++.
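For orientation, here is a minimal, generic sketch of the tracing/scripting mix. It is illustrative only: the toy module and function names are made up and this is not the actual demo/trace_model.py, which needs the workarounds listed below.

```python
import torch

@torch.jit.script
def keep_confident(scores, boxes):
    # type: (Tensor, Tensor) -> Tensor
    # scripted part: data-dependent selection survives serialization
    idx = torch.nonzero(scores > 0.5).squeeze(1)
    return boxes.index_select(0, idx)

class ToyDetector(torch.nn.Module):
    def forward(self, x):
        scores = x.mean(dim=1)  # stand-in for a real score head
        boxes = x[:, :4]        # stand-in for box regression
        return keep_confident(scores, boxes)

# tracing records the tensor ops; the scripted function is embedded in the trace
traced = torch.jit.trace(ToyDetector(), torch.rand(8, 16))
traced.save("toy_model.pt")
```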

So all of this is very raw, and there are the classes of hacks described in issue #27, in particular:

  • calling forward directly everywhere, to avoid errors about tracing functions whose inputs/outputs are not tensors or tuples; my understanding is that there may be a way to have more structured types in the JIT soonish,
  • some script functions with funny indirections to make the JIT work better (though some may already be unneeded thanks to the awesome work of the PyTorch team),
  • paths with alternative calculations, in particular to avoid in-place operations,
  • a few custom ops, e.g. to avoid untraceable variable-length loops (the tracing-state guard pattern is sketched just below).
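To illustrate the custom-op bullet, the guard-on-tracing-state pattern (the same idea appears in the roi_align diff further down) looks roughly like this; the module and the ops in both branches are stand-ins, not the actual patch:

```python
import torch

class GuardedOp(torch.nn.Module):
    def forward(self, x):
        if torch._C._get_tracing_state():
            # taken only while torch.jit.trace is recording: emit something
            # the tracer can record (in the real patch, a registered custom
            # op such as torch.ops.maskrcnn_benchmark.roi_align_forward)
            return torch.clamp(x, min=0)
        # eager/training path, e.g. a Python autograd.Function that the
        # tracer cannot see through
        return torch.relu(x)

traced = torch.jit.trace(GuardedOp(), torch.randn(2, 3))
```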

Lots of todos:

  • there are warnings to be investigated, in particular for loops and boxlist sizes,
  • it needs some PyTorch JIT fixes that are in master (all there, thanks!),
  • clean up the model, including reverting unneeded changes,
  • clean up cmake, e.g. proper detection of OpenCV,
  • invoke cmake / build the custom ops from setup.py (a sketch of that pattern follows after this list),
  • try a different model/config,
  • maybe script more, to be more flexible with sizes,
  • have a leaner pretrained model.
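For the setup.py bullet, the usual pattern (my sketch under assumed paths, not the eventual patch) is a build_ext subclass that shells out to cmake:

```python
import os
import subprocess
from setuptools.command.build_ext import build_ext

class CMakeBuild(build_ext):
    # configure and build the custom-ops library with cmake; the source
    # directory is assumed from the repository layout
    def run(self):
        source_dir = os.path.abspath("maskrcnn_benchmark/csrc/custom_ops")
        build_dir = os.path.abspath(self.build_temp)
        if not os.path.exists(build_dir):
            os.makedirs(build_dir)
        subprocess.check_call(["cmake", source_dir], cwd=build_dir)
        subprocess.check_call(["cmake", "--build", "."], cwd=build_dir)
        build_ext.run(self)
```

This would be wired in via cmdclass={'build_ext': CMakeBuild} in setup().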

As for my use case: it would also work on Android if PyTorch were there yet.

Initial stab at tracing
As this is very much work in progress, here are some quick notes:
- demo/trace_model.py has the current state

Lots of todos:
- there are warnings to be investigated, in particular for loops and
  boxlist sizes
- it needs some PyTorch JIT fixes
- clean up, including reverting unneeded changes
- round off the displaying
- do a C++ app
- try a different model/config

fmassa commented on maskrcnn_benchmark/modeling/poolers.py in 84af4e9 Nov 7, 2018

I think the indentation in here is off


t-vi replied Nov 7, 2018

Thanks, yes!


facebook-github-bot commented Nov 10, 2018

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!


facebook-github-bot commented Nov 10, 2018

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!


Eric-Zhang1990 commented Nov 12, 2018

@t-vi Can you tell me why this error shows up when I run your demo trace_model.py? Thanks. The error is:

    OSError: /home/eric/Disk100G/githubProject/maskrcnn-benchmark/maskrcnn_benchmark/csrc/custom_ops/libmaskrcnn_benchmark_customops.so: undefined symbol: _ZN5torch3jit8ListType9ofTensorsEv


t-vi commented Nov 12, 2018

@Eric-Zhang1990 That symbol (aka torch::jit::ListType::ofTensors()) should be provided by libtorch, and libmaskrcnn_benchmark_customops.so should be linked against libtorch if all goes OK. Maybe the libtorch you built against isn't the one you are loading? Given the lightning speed of JIT development, my experience is that having matching versions at build time and at invocation often helps.
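A quick, generic sanity check is to print which torch installation Python uses, and then point both the build and LD_LIBRARY_PATH at the same lib/ directory:

```python
import os
import torch

print(torch.__version__)
# the lib/ directory under here holds the libtorch the extension should be
# built against and loaded with
print(os.path.join(os.path.dirname(torch.__file__), "lib"))
```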


xxradon commented Nov 12, 2018

@t-vi Wonderful work you have done! Can you give a tutorial to reproduce your work? I met nearly the same error as @Eric-Zhang1990 did. Thanks.

    OSError: /home/shining/Projects/github-projects/pytorch-project/maskrcnn-benchmark/maskrcnn_benchmark/libmaskrcnn_benchmark_customops.so: undefined symbol: _ZN3c108demangleEPKc


t-vi commented Nov 12, 2018

@xxradon I think it's premature to expect this to work without bumps, unfortunately. But of course I appreciate that you are trying, and I'll be very happy for suggestions on how to improve the build process.

One key aspect is that you have to make 100% sure that the libtorch you run against exactly matches the one you built against (and use in Python), and the headers need to be the right ones, too. Try pointing LD_LIBRARY_PATH at torch/lib or so.
In your specific case: for me,

    objdump -T maskrcnn_benchmark/csrc/custom_ops/build/libmaskrcnn_benchmark_customops.so | c++filt | grep demang

seems to indicate that the lib doesn't require that symbol directly, so apparently whichever libtorch is used doesn't find a matching libc10/libcaffe2...


xxradon commented Nov 12, 2018

@t-vi Thanks, your suggestion was right: I can run trace_model.py without errors and get the end_to_end_model.pt file. And when I tested traced_model.cpp, the build was OK, but when I ran the demo I got this error:

    terminate called after throwing an instance of 'c10::Error'
      what(): read_bytes == 8 ASSERT FAILED at ../caffe2/serialize/inline_container.h:182, please report a bug to PyTorch. Expected to read 8 bytes but got %llu bytes0 (read64BitIntegerLittleEndian at ../caffe2/serialize/inline_container.h:182)
    frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7fb099312aaa in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libc10.so)
    frame #1: <unknown function> + 0x5b1a86 (0x7fb0aefada86 in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #2: <unknown function> + 0x5b3aef (0x7fb0aefafaef in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #3: <unknown function> + 0x5b490f (0x7fb0aefb090f in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #4: <unknown function> + 0x5ad9cc (0x7fb0aefa99cc in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #5: torch::jit::load(std::istream&) + 0x2f4 (0x7fb0aefad2d4 in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #6: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3a (0x7fb0aefad49a in /home/shining/Projects/github-projects/pytorch-project/pytorch-build/torch/lib/tmp_install/lib/libtorch.so.1)
    frame #7: main + 0x2d6 (0x409546 in /home/shining/Projects/github-projects/pytorch-project/cpp-pytorch/Release/traced_model)
    frame #8: __libc_start_main + 0xf0 (0x7fb095d4a830 in /lib/x86_64-linux-gnu/libc.so.6)
    frame #9: _start + 0x29 (0x409c39 in /home/shining/Projects/github-projects/pytorch-project/cpp-pytorch/Release/traced_model)

And I think I know why this error happens: libmaskrcnn_benchmark_customops.so is not added as a link dependency. But I added LINK_DIRECTORIES and target_link_libraries in CMakeLists.txt, and that didn't work either.
So can you share the CMakeLists.txt you use to run traced_model.cpp?
Thanks a lot!!!


t-vi commented Nov 12, 2018

Oh dear, I forgot to include that. I didn't actually use cmake, just

    g++ traced_model.cpp -lopencv_core -lopencv_highgui -lopencv_imgcodecs -I /usr/local/lib/python3.6/dist-packages/torch/lib/include/ -L /usr/local/lib/python3.6/dist-packages/torch/lib/ -ltorch -lcaffe2 -lc10 -lopencv_imgproc -L ../maskrcnn_benchmark/csrc/custom_ops/build/ -lmaskrcnn_benchmark_customops

and then I called it with

    LD_LIBRARY_PATH=/usr/local/lib/python3.6/dist-packages/torch/lib/:../maskrcnn_benchmark/csrc/custom_ops/build/ ./a.out

It's less than fancy, sorry about that.


xxradon commented Nov 12, 2018

(quoting t-vi's g++ command and LD_LIBRARY_PATH invocation above)

Thanks for your reply, but I got the same error... Can you try using a CMakeLists.txt instead, or give a more specific tutorial?
Thanks a lot.


t-vi commented Nov 12, 2018

Can you actually read the .pt from torch?
(This is the exact error message you get when libtorch cannot open the file or it doesn't exist; I'll put up a PR for a better message.)
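In other words, before debugging the C++ side, it is worth checking from Python that the file exists and deserializes at all (the path is assumed from the demo):

```python
import os
import torch

path = "end_to_end_model.pt"
print(os.path.exists(path) and os.path.getsize(path))
module = torch.jit.load(path)  # raises if the archive cannot be read
```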


Eric-Zhang1990 commented Nov 13, 2018

@t-vi Now I can trace and save the model, but when I run your traced_model.cpp file, it shows an error about nms, which is:

    Starting /home/eric/Disk100G/githubProject/trace_model_cpp_test/build/traced_model...
    load model ok
    terminate called after throwing an instance of 'torch::jit::script::ErrorReport'
      what():
      Schema not found for node. File a bug report.
      Node: %3556 : Dynamic = maskrcnn_benchmark::nms(%3550, %3554, %3555)
      Input types: Dynamic, Dynamic, float
      candidates were:

Can you tell me how I can fix it?
Thanks a lot.


t-vi commented Nov 13, 2018

@Eric-Zhang1990 Did you find and link against the libmaskrcnn_benchmark_customops.so (see the g++ command above; I'm looking to provide cmake)?
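For reference, the Python-side equivalent of linking the custom-ops library is loading it before running the graph, which registers the maskrcnn_benchmark::nms schema (the path is assumed from the build layout above):

```python
import torch

torch.ops.load_library(
    "maskrcnn_benchmark/csrc/custom_ops/build/libmaskrcnn_benchmark_customops.so"
)
# after this, torch.ops.maskrcnn_benchmark.nms resolves and traced graphs
# containing that node can execute
```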


t-vi commented Nov 13, 2018

With the latest set of changes you should get the custom ops built for you. The C++ demo now has a CMakeLists.txt, but it isn't built for now (for lack of a good place to put the binary).

t-vi added some commits Nov 13, 2018

Cleanup: nms
With recent JIT improvements, we can use the op directly
Python 2 compatibility
Thank you, Francisco, for the hint!

t-vi referenced this pull request Nov 13, 2018

Closed

Jit load error msg #13894

Review thread on this diff excerpt:

        self.spatial_scale = spatial_scale
        self.sampling_ratio = sampling_ratio

    def forward(self, input, rois):
        if torch._C._get_tracing_state():  # we cannot currently trace through the autograd function
            return torch.ops.maskrcnn_benchmark.roi_align_forward(

fmassa Nov 13, 2018

question: do we need to register the backwards for the op in C++ then so that we can perform training properly using the same codepath?

t-vi Nov 13, 2018

The current obstacle here is that we cannot trace through the Python autograd function.
I'm not aware of a way to register the derivative so we could go through the op directly during training as well.

fmassa Nov 13, 2018

Thanks! @goldsborough do you know if we can register derivatives for custom ops?

t-vi Nov 14, 2018

Peter responded elsewhere: "not quite yet", but it is a to-do on his list.


t-vi commented Nov 13, 2018

So I had a cleanup todo in `box_head/inference.py`. Done.

For the .forward elimination: we'd have to wait on @zdevito for advice on whether we can patch torch.nn.Module tracing to allow our BoxLists by using _get_tensors on inputs and outputs (or we could do our own subclass).

The other changes to the main modules I saw are in the detailed comments above.

facebook-github-bot added a commit to pytorch/pytorch that referenced this pull request Nov 13, 2018

Jit load error msg (#13894)
Summary:
When loading a non-existent / non-openable file, the current error message is
```
Expected to read 8 bytes but got %llu bytes0
```

This
- fixes two ASSERTM formatting calls (including the above),
- throws a more specific error message if the ifstream constructor sets `.fail`.

Here is someone apparently confused by the current message: facebookresearch/maskrcnn-benchmark#138 (comment)
Pull Request resolved: #13894

Differential Revision: D13043228

Pulled By: soumith

fbshipit-source-id: b348b482c66d5e420874ae6e101b834106b89e82

xxradon commented Nov 14, 2018

@t-vi Thanks for your contribution. I can run demo/traced_model.cpp and get the right result, but everything is in CPU mode. I know the current method builds libmaskrcnn_benchmark_customops with gcc only, so is there a way to run our serialized .pt file in GPU mode?
I have tried adding nvcc compilation to build libmaskrcnn_benchmark_customops.so, but failed with some errors, and I am working on it.


t-vi commented Nov 14, 2018

There isn't currently, but it shouldn't be too hard. We'd need GPU support for unifying the code paths, so if you have it, I'd be happy to use it; if you don't, I'll try to get to it todayish.

t-vi added some commits Nov 14, 2018

More cleanup
- Pooler's merge_levels can move from C++ to JIT script, but
  the enumeration over module list needs to remain in tracing
  for now.
- Remove demo put_text custom op
- Leave the only slightly odd full_like use, but comment on it.
More cleanups for bounding_box
Use @fmassa's trick for only two clamp calls (thanks!) and comment on clamp vs. clamp_.

t-vi added a commit to t-vi/pytorch that referenced this pull request Nov 14, 2018

allow cooperative structured objects to be passed to modules in tracing
Before this patch, the JIT does not allow Module's forward to take
structured objects.
This patch allows cooperative objects to do so.
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
  tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes
  (a list/tuple?) the flattened_input (potentially more than it needs)
  and returns itself (updated) and the unconsumed flattened_inputs.
  This is then used in the _unflatten mechanism.

This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.

facebookresearch/maskrcnn-benchmark#138
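Based on the protocol described in that commit message, a cooperative type would look roughly like this (my sketch; the real BoxList/ImageList carry much more state):

```python
import torch

class ToyBoxList(object):
    # minimal cooperative structure per the _jit_unwrap/_jit_wrap protocol
    def __init__(self, bbox):
        self.bbox = bbox

    def _jit_unwrap(self):
        # the tensors the tracer should flatten into graph inputs/outputs
        return (self.bbox,)

    def _jit_wrap(self, flattened_inputs):
        # consume what we need, hand the rest back for other structures
        self.bbox = flattened_inputs[0]
        return self, flattened_inputs[1:]
```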

t-vi added some commits Nov 14, 2018

Lose the .forward workaround
This means that you won't be able to trace the module without the
PyTorch PR #13961.
But it looks so clean!

t-vi commented Nov 14, 2018

Just so no one is disappointed too much: tracing on GPU won't work currently due to #13969 (at least).
I think we might just wait for that to be fixed before tracing on GPU is supported. The obvious workaround is scripting, but I'm sure it's good to fix PyTorch here and there if we can.

facebook-github-bot added a commit to pytorch/pytorch that referenced this pull request Nov 16, 2018

Allow cooperative structured objects to be passed to modules in tracing (#13961)

Summary:
Before this patch, the JIT does not allow Module's forward to take
structured objects.
This patch allows cooperative objects to do so.
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
  tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes
  (a list/tuple?) the flattened_input (potentially more than it needs)
  and returns itself (updated) and the unconsumed flattened_inputs.
  This is then used in the _unflatten mechanism.

This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.
I'll push a model working with this patch in
facebookresearch/maskrcnn-benchmark#138

I must admit I haven't fully checked whether ONNX needs changes before it, too, can profit from this, but I am hopeful that anything currently usable remains so.

cc @fmassa @zdevito

So the main downside that I'm aware of is that people will later want to use more elaborate mechanisms, but I think this could be done by just amending what wrap/unwrap are returning / consuming.
Pull Request resolved: #13961

Differential Revision: D13103927

Pulled By: soumith

fbshipit-source-id: 2cbc724cc4b53197388b662f75d9e601a495c087
@kleisauke

Added some code suggestions to make tracing/scripting compatible with Python 2.

Context:

    @torch.jit.script
    def my_paste_mask(mask, bbox, height: int, width: int, threshold: float=0.5, padding: int=1, contour: bool=True, rectangle: bool=False):

kleisauke Nov 20, 2018

Suggested change (Python 2 compatible type comment instead of annotations):

    def my_paste_mask(mask, bbox, height, width, threshold=0.5, padding=1, contour=True, rectangle=False):
        # type: (Tensor, Tensor, int, int, float, int, bool, bool) -> Tensor
Context:

    @torch.jit.script
    def add_annotations(image, labels, scores, bboxes, class_names: str=','.join(coco_demo.CATEGORIES), color=torch.tensor([255, 255, 255], dtype=torch.long)):

kleisauke Nov 20, 2018

Suggested change:

    def add_annotations(image, labels, scores, bboxes, class_names=','.join(coco_demo.CATEGORIES), color=torch.tensor([255, 255, 255], dtype=torch.long)):
        # type: (Tensor, Tensor, Tensor, Tensor, str, Tensor) -> Tensor
Context:

    @torch.jit.script
    def combine_masks(image, labels, masks, scores, bboxes, threshold: float=0.5, padding: int=1, contour: bool=True, rectangle: bool=False, palette=torch.tensor([33554431, 32767, 2097151])):

kleisauke Nov 20, 2018

Suggested change:

    def combine_masks(image, labels, masks, scores, bboxes, threshold=0.5, padding=1, contour=True, rectangle=False, palette=torch.tensor([33554431, 32767, 2097151])):
        # type: (Tensor, Tensor, Tensor, Tensor, Tensor, float, int, bool, bool, Tensor) -> Tensor
Context:

    # this could be merged with the for loop at the end of Pooler.forward
    # into a single script_method
    @torch.jit.script
    def merge_levels(levels, unmerged_results: List[torch.Tensor]):

kleisauke Nov 20, 2018

Suggested change:

    def merge_levels(levels, unmerged_results):
        # type: (Tensor, List[Tensor]) -> Tensor
Context (start of a new file):

    @@ -0,0 +1,169 @@
    # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
    import numpy

kleisauke Nov 20, 2018

Suggested change (add the __future__ import for Python 2 division semantics):

    from __future__ import division
    import numpy

t-vi Dec 11, 2018

Thanks for the reminder that there still is Python 2. :) I'll update the PR in due course.


t-vi commented Dec 11, 2018

pytorch/pytorch#14992 is keeping the scripted version of merge_levels from working with 1.0. The obvious workaround would be to move it into a custom op (as it was in the initial version).

On the upside, we might be closer to tracing on GPU, as the tracing of tensor creation is fixed now.

If you think it makes sense to split out the move to custom ops, and also the "uncontroversial" parts of the model changes, into separate PRs, I'd do that.


fmassa commented Dec 11, 2018

I think there are a few parts of this PR that can be sent separately and could be merged straight away:

  • PR1: change the signatures of the functions so that they can be custom ops (float -> double etc.)
  • PR2: out-of-place ops via torch.stack / torch.cat instead of in-place modification (although in-place modifications are now supposed to work in the JIT, I believe); a sketch of this pattern follows at the end of this comment
  • PR3: minor refactoring of detections_to_keep into a function for easier JIT
  • PR4: misc improvements to make the JIT happier (add a tuple for the return argument, etc.)

Then we will be almost there for the JIT, and the misc changes can be discussed afterwards.
But I'd like to keep this PR open for now.

What do you think?
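For PR2, the out-of-place pattern would be roughly the following sketch (assuming per-level results arrive grouped by level, as in Pooler.forward; names are illustrative):

```python
import torch

def merge_levels_out_of_place(levels, per_level_results):
    # instead of writing slices in place (res[levels == l] = r), concatenate
    # the per-level pieces and restore the original row order
    all_results = torch.cat(per_level_results, dim=0)
    all_indices = torch.cat(
        [torch.nonzero(levels == l).squeeze(1) for l in range(len(per_level_results))]
    )
    _, order = all_indices.sort()
    return all_results.index_select(0, order)
```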


t-vi commented Dec 11, 2018

Sounds good, thanks Francisco! I'll get those parts going and also check with in-place ops (even though I'm a bit sceptical, as my repro for pytorch/pytorch#14992 seems to also work with indexed assignments; while the ops probably work, there seems to be some optimization going on that might not be desirable for us).
Would you also take the switch to custom ops itself?


fmassa commented Dec 11, 2018

> Would you also take the switch to custom ops itself?

Not for the next few PRs, given that it's not really clear to me that it is working 100% for backwards computation, and I'd like to be cautious about it.
