
Any progress for pytorch 1.5? #6

Closed
LuChengTHU opened this issue Jul 19, 2020 · 4 comments

@LuChengTHU

Hi, do you have any ideas for running the code on PyTorch 1.5 with data parallel?

@jerrybai1995
Member

Yup, I have pushed a branch named `pytorch-1.5` to the repo. Please pull the repo, do `git checkout pytorch-1.5`, and train the model there. Also, see the updated README for what's been changed.

Let me know if it works!

@LuChengTHU
Author

Thanks! I'm also confused about the `func_copy` model: it seems we need 2x GPU memory because of this implementation. Is there a more efficient way of implementing the backward method?

@jerrybai1995
Member

Yes, that is a design choice forced by PyTorch's nn.DataParallel. If you use only 1 GPU (i.e., no nn.DataParallel), then you can do the actual implicit differentiation entirely in the backward() in deq.py, without func_copy. You can simply do it through one layer, as we intended.
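
For reference, here is a minimal single-GPU sketch of that pattern. It loosely follows the general DEQ recipe rather than the repo's actual deq.py: `SimpleDEQ`, the tanh-linear cell, and the plain fixed-point loops are placeholders for illustration. The equilibrium is found without building a graph, and a backward hook iterates a vector-Jacobian product to recover the implicit gradient through "one layer".

```python
import torch
import torch.nn as nn

class SimpleDEQ(nn.Module):
    """Illustrative single-GPU DEQ layer; self.f stands in for the real cell."""

    def __init__(self, dim, max_iter=50, tol=1e-5):
        super().__init__()
        self.f = nn.Linear(dim, dim)
        self.max_iter, self.tol = max_iter, tol

    def forward(self, x):
        # Phase 1: find the equilibrium z* = f(z*, x) without tracking the graph.
        with torch.no_grad():
            z = torch.zeros_like(x)
            for _ in range(self.max_iter):
                z_next = torch.tanh(self.f(z) + x)
                if (z_next - z).norm() < self.tol:
                    z = z_next
                    break
                z = z_next

        # Re-engage autograd with one application of f at the equilibrium, so the
        # corrected gradient can flow to the parameters and to x.
        z = torch.tanh(self.f(z) + x)

        # A second, detached application that only serves the backward hook's
        # vector-Jacobian products.
        z0 = z.detach().requires_grad_()
        f0 = torch.tanh(self.f(z0) + x)

        def backward_hook(grad):
            # Solve g = grad + (df/dz)^T g by fixed-point iteration, i.e.
            # g = (I - df/dz)^{-T} grad, the implicit-function-theorem gradient.
            g = grad
            for _ in range(self.max_iter):
                g = torch.autograd.grad(f0, z0, g, retain_graph=True)[0] + grad
            return g

        if z.requires_grad:
            z.register_hook(backward_hook)
        return z
```

With this, something like `SimpleDEQ(64)(torch.randn(8, 64)).sum().backward()` backpropagates through the equilibrium without storing any of the solver iterations.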

However, the weird thing we found was that once nn.DataParallel was invoked, the parameter gradients on the replica would all vanish. In other words, the gradients computed in the backward() would disappear. This happened in PyTorch 1.4; I'm not so sure about 1.5. But anyway, that was the rationale behind this design choice: we found no better option than to keep a func_copy there for the Jacobian-vector product computation.
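
To make the memory cost concrete, here is a rough sketch of that kind of workaround, under the assumption that func_copy is simply a weight-synchronized duplicate of the layer that the backward pass differentiates through for the Jacobian-vector product. The helper names are hypothetical, not the repo's actual API; the point is that holding a second copy of the weights is where the roughly 2x parameter memory comes from.

```python
import copy
import torch.nn as nn

def make_func_copy(func: nn.Module) -> nn.Module:
    # Duplicate the layer once; the copy only supplies vector-Jacobian
    # products inside the custom backward, so its parameters never need
    # to accumulate gradients themselves.
    func_copy = copy.deepcopy(func)
    for p in func_copy.parameters():
        p.requires_grad_(False)
    return func_copy

def sync_func_copy(func: nn.Module, func_copy: nn.Module) -> None:
    # Refresh the copy before each forward pass so it mirrors the live
    # weights; keeping both modules around is the 2x memory overhead.
    func_copy.load_state_dict(func.state_dict())
```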

Indeed, once we are able to solve this issue, we will have much better memory efficiency than the numbers reported in our paper.

@LuChengTHU
Author

Thanks! I'm looking forward to the improved implementation!
