
Forward/backward hooks for C++ torch::nn modules #25888

Open
yf225 opened this issue Sep 10, 2019 · 29 comments
Labels

  • module: autograd (Related to torch.autograd, and the autograd engine in general)
  • module: cpp (Related to C++ API)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

yf225 (Contributor) commented Sep 10, 2019

🚀 Feature

As part of the Python/C++ API parity work, we would like to provide forward/backward hooks for C++ torch::nn modules:
(left: Python API, right: C++ API)

  • torch.nn.Module.register_forward_pre_hook -> torch::nn::Module::register_forward_pre_hook
  • torch.nn.Module.register_forward_hook -> torch::nn::Module::register_forward_hook
  • torch.nn.Module.register_backward_hook -> torch::nn::Module::register_backward_hook

Possible Steps

  • Implement torch::utils::hooks::RemovableHandle in the C++ API, mirroring torch.utils.hooks.RemovableHandle in the Python API (a rough sketch follows this list).
  • Implement register_forward_pre_hook, register_forward_hook and register_backward_hook methods for torch::nn::Module.
    • We might need to add _backward_hooks, _forward_pre_hooks and _forward_hooks to torch::nn::Module (torch/csrc/api/include/torch/nn/module.h).
  • Add appropriate C++ tests for the new API, similar to Python tests.
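
Below is a rough, hedged sketch of what the first two steps could look like. The hook signatures are simplified to Tensor-in / Tensor-out (the real design would have to cope with arbitrary forward() signatures; see the discussion further down), and apart from RemovableHandle every name here is hypothetical, not part of the actual implementation:

```cpp
// Sketch only, not PyTorch's actual implementation. Hook signatures are
// simplified to Tensor-in / Tensor-out for illustration.
#include <torch/torch.h>

#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Mirror of torch.utils.hooks.RemovableHandle: remove() unregisters the hook.
class RemovableHandle {
 public:
  explicit RemovableHandle(std::function<void()> remove_fn)
      : remove_fn_(std::move(remove_fn)) {}
  void remove() {
    if (remove_fn_) {
      remove_fn_();
      remove_fn_ = nullptr;
    }
  }

 private:
  std::function<void()> remove_fn_;
};

// Hypothetical module base holding the _forward_pre_hooks / _forward_hooks
// dictionaries that the steps above propose adding to torch::nn::Module.
class ModuleWithHooks : public torch::nn::Module {
 public:
  using ForwardPreHook = std::function<void(const torch::Tensor& input)>;
  using ForwardHook =
      std::function<void(const torch::Tensor& input, const torch::Tensor& output)>;

  RemovableHandle register_forward_pre_hook(ForwardPreHook hook) {
    const uint64_t id = next_id_++;
    forward_pre_hooks_.emplace(id, std::move(hook));
    return RemovableHandle([this, id] { forward_pre_hooks_.erase(id); });
  }

  RemovableHandle register_forward_hook(ForwardHook hook) {
    const uint64_t id = next_id_++;
    forward_hooks_.emplace(id, std::move(hook));
    return RemovableHandle([this, id] { forward_hooks_.erase(id); });
  }

 protected:
  // A derived module's forward() would call these around its own body.
  void run_forward_pre_hooks(const torch::Tensor& input) {
    for (auto& entry : forward_pre_hooks_) entry.second(input);
  }
  void run_forward_hooks(const torch::Tensor& input, const torch::Tensor& output) {
    for (auto& entry : forward_hooks_) entry.second(input, output);
  }

 private:
  // std::map keyed by an increasing id keeps registration order, similar to
  // the OrderedDict-based hooks on the Python side.
  uint64_t next_id_ = 0;
  std::map<uint64_t, ForwardPreHook> forward_pre_hooks_;
  std::map<uint64_t, ForwardHook> forward_hooks_;
};
```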

cc @ezyang @ssnl @albanD @zou3519 @yf225

yf225 added the module: cpp and module: autograd labels on Sep 10, 2019
yf225 (Contributor, Author) commented Sep 10, 2019

cc. @pbelevich

albanD (Collaborator) commented Sep 10, 2019

Hi,

You might want to wait for the backward hooks in the Python version to be fixed, and then only implement the equivalent of the fixed version in C++?

yf225 (Contributor, Author) commented Sep 10, 2019

@albanD That sounds great. Do you know if there is a tracking issue for fixing the backward hooks for the Python version?

ezyang (Contributor) commented Sep 10, 2019

Note that @malvika2147 implemented hooks on variables.

albanD (Collaborator) commented Sep 10, 2019

@yf225 I would say #598 is the base issue for backward hooks on nn.Modules.

fmassa added the triaged label on Sep 13, 2019
ssbotelh commented:

The authors of this paper (https://arxiv.org/abs/1706.02677) claim to achieve great scaling on ResNet50 training by, among other things, overlapping gradient computation and communication:

"as soon as the gradient for a layer is computed, it is aggregated across workers, while gradient computation for the next layer continues".

If I understand it correctly, this would only be possible using something like register_backward_hook on modules. Since it seems like it'll be a while before this feature is fixed in Python (#598 and #12331) and makes its way to libtorch, can anyone think of an alternative way of being notified when the gradient computation is done for a module? Thanks in advance.

albanD (Collaborator) commented Oct 21, 2019

Hi,

I don't really see why the hooks on Module could help you do that.
But both DataParallel and DistributedDataParallel already overlap computation and communication if I remember correctly.

ssbotelh commented Oct 21, 2019

Hi @albanD, thanks for your comment.

DataParallel only works on GPUs, which is not what I'm interested in. Besides, it still doesn't seem to work correctly on C++ libtorch, probably due to lack of NCCL integration -- see last comment in #18837.

I'm curious about your first statement. If a module hook can notify me that the gradient computation is done for layer N, I can dispatch the task of communicating that gradient among processes to a separate thread, while the main thread continues with the gradient for layer N+1. I assume that's what the authors of the above paper did in order to overlap computation and communication.

albanD (Collaborator) commented Oct 21, 2019

Hi,

The same thing can be done using hooks on Tensors with register_hook(). These work as expected.
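
For context, a minimal sketch of that approach in the C++ frontend, assuming the Tensor::register_hook API from #26568; aggregate_gradient and the hand-off to a communication thread are placeholders, not PyTorch APIs:

```cpp
// Sketch of the per-parameter gradient hook workaround. aggregate_gradient()
// stands in for whatever communication step (e.g. an allreduce on a separate
// thread) the training loop uses.
#include <torch/torch.h>

#include <string>

void aggregate_gradient(const std::string& name, const torch::Tensor& grad) {
  // Placeholder: enqueue `grad` for aggregation across workers here.
  (void)name;
  (void)grad;
}

void attach_grad_hooks(torch::nn::Module& model) {
  for (auto& param : model.named_parameters()) {
    if (!param.value().requires_grad()) {
      continue;
    }
    const std::string name = param.key();
    // The hook fires as soon as autograd has produced this parameter's
    // gradient, so the communication can overlap with the remainder of the
    // backward pass.
    param.value().register_hook([name](torch::Tensor grad) {
      aggregate_gradient(name, grad);
    });
  }
}
```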

ssbotelh commented:

Is this available in the C++ API? Could you please point me to the documentation? I haven't found any. Thanks!

albanD (Collaborator) commented Oct 21, 2019

The PR for it is here: #26568

yf225 (Contributor, Author) commented Nov 21, 2019

@albanD Curious: is register_forward_pre_hook also blocked by the need to fix backward_hooks, or are they independent of each other? Thanks!

albanD (Collaborator) commented Nov 21, 2019

Hi,

No, they are independent.
Both forward_hook and forward_pre_hook are easy to do and not problematic :)

ly1231cn commented:

@albanD Is register_forward_pre_hook available in the nightly build?

albanD (Collaborator) commented Dec 26, 2019

I don't think it has been added to the C++ API yet (it is only available in Python). cc @yf225 ?

nuka137 (Contributor) commented Jun 22, 2020

Has this feature been dropped?
I need register_forward_hook in order to output a model summary (number of parameters, tensor shapes, etc.).
It helps us debug models easily.
There is already a Python version which uses torch.nn.Module.register_forward_hook.
I think a similar API is also needed in C++.

albanD (Collaborator) commented Jun 22, 2020

It is not. We did not have time to get around to doing it.
We would be very happy to help prepare and accept a PR adding this feature though!

nuka137 (Contributor) commented Jun 22, 2020

@albanD

Thanks for your reply.
I'm interested in adding register_forward_hook to the C++ API.
BTW, register_backward_hook seems to be blocked by #25888 (comment).
So, is it acceptable if I implement only register_forward_hook?

albanD (Collaborator) commented Jun 22, 2020

Yes, the forward hook and forward pre-hook alone would be sufficient.
More work needs to be done for the backward hooks, I'm afraid.

nuka137 (Contributor) commented Jun 30, 2020

After working on this for a week, I found it is difficult to expose the hook function as simply as in Python.

I think we need to add the hook function call to this method.
But the inputs/outputs of the forward function are not consistent across modules, so we need to expose the API arguments explicitly (or use a template?).
(e.g. ASMoutput is the output of AdaptiveLogSoftmaxWithLoss, while Tensor is the output of DropoutImpl, and so on.)

This hook function should be similar to the Python hook function, but I don't have a good idea of how to realize this.
I would appreciate any advice on this.
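
To make the mismatch concrete, here is a small example with two existing torch::nn modules whose forward() types differ; the options and shapes below are only illustrative:

```cpp
#include <torch/torch.h>

#include <iostream>

int main() {
  torch::Tensor x = torch::randn({4, 8});

  // DropoutImpl::forward takes a Tensor and returns a Tensor.
  torch::nn::Dropout dropout(0.5);
  torch::Tensor y = dropout->forward(x);

  // AdaptiveLogSoftmaxWithLossImpl::forward takes (input, target) and
  // returns an ASMoutput struct rather than a plain Tensor.
  torch::nn::AdaptiveLogSoftmaxWithLoss head(
      torch::nn::AdaptiveLogSoftmaxWithLossOptions(
          /*in_features=*/8, /*n_classes=*/10, /*cutoffs=*/{4, 8}));
  torch::Tensor target = torch::randint(0, 10, {4}, torch::kLong);
  torch::nn::ASMoutput out = head->forward(x, target);

  // A single std::function-based hook signature cannot cover both cases,
  // which is why some form of templating is being discussed.
  std::cout << y.sizes() << " " << out.output.sizes() << std::endl;
  return 0;
}
```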

albanD (Collaborator) commented Jun 30, 2020

My guess is that some template magic will be needed here, yes.
The main features that are needed here are:

  • Call the user hooks before and after the forward call. One takes the Args... as input and the other one takes torch::detail::return_type_of_forward_t<Contained, Args...>.
  • Allow the user to change the values in these hooks. This one I'm not sure how to do.

cc @yf225 do you have an idea of how this can be done?
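
As a rough illustration of the first point (not a design proposal), the output type of the contained module's forward() can be deduced with the existing torch::detail::return_type_of_forward_t trait; forward_with_hooks below is a hypothetical helper, not part of the C++ API:

```cpp
// Rough illustration only. The trait deduces the per-module output type, so
// the post-hook sees a Tensor for Dropout but an ASMoutput for
// AdaptiveLogSoftmaxWithLoss.
#include <torch/torch.h>

#include <utility>

template <typename Contained, typename PreHook, typename PostHook, typename... Args>
auto forward_with_hooks(Contained& module,
                        PreHook&& pre_hook,
                        PostHook&& post_hook,
                        Args&&... args)
    -> torch::detail::return_type_of_forward_t<Contained, Args...> {
  // The pre-hook sees the inputs exactly as forward() will receive them.
  pre_hook(args...);
  auto output = module.forward(std::forward<Args>(args)...);
  // The post-hook sees the output. Allowing hooks to replace the output (as
  // the Python hooks can) would require the hook to return a value of this
  // same type, which is the open question above.
  post_hook(output);
  return output;
}

// Usage sketch:
//   torch::nn::Dropout dropout(0.5);
//   auto out = forward_with_hooks(
//       *dropout,
//       [](const torch::Tensor& in) { /* forward pre-hook */ },
//       [](const torch::Tensor& out) { /* forward hook */ },
//       torch::randn({2, 3}));
```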

yf225 (Contributor, Author) commented Jun 30, 2020

I had a draft PR that might be able to do it: #30484. I haven't gotten it to the finish line, but maybe it can serve as inspiration :) also cc @glaringlee

glaringlee (Contributor) commented Jun 30, 2020

@yf225 Thanks, I'll take a look at it and continue working on it.
@nuka137
I will prioritize this and hope we can finish the forward pre-hook later this year. We are currently prioritizing the work to add the transformer layers to the PyTorch C++ API.

nuka137 (Contributor) commented Jun 30, 2020

@glaringlee

Thanks! I hope this feature lands in the future.

FlyinTeller commented:

@glaringlee @yf225 Is there any plan to finish this functionality in the near future? I would very much like to use spectral normalization from the C++ API, but it depends on register_forward_pre_hook being available.

bjaeger1 commented:

Is there any progress on the implementation of register_forward_hook in the C++ API? I would need it to extract feature maps from certain layers of a trained model.

spiegelball commented:

I am also very curious about this.

ezyang (Contributor) commented Feb 10, 2023

There are currently no plans from the core team to add this. If someone can come up with a short PR that adds the feature, I can help review it, but it may not be so easy to play the same tricks in C++ that you can do in Python.

Tihaw commented Jun 28, 2023

Are backward hooks supported by TorchScript modules by now?
