Move io_same_device hook to before attach_align_device hook on cpu_offload and disk_offload. #768
The fix is not exactly right: by doing so, the hook that ensures the input and output of the model are on the same device is now erased. In your code sample in #767, since
This is slightly more advanced than the current PR, so let me know if you'd prefer me to do it :-)
@sgugger I would like to try, if that's OK with you. What do you think of creating an
That works for me, though the name of the argument could simply be
@sgugger
@sgugger It is ready for review; I've also added the tests.
sgugger left a comment:
Very nice, thanks! I left a couple of nits, and I think you should still put the io hook first: I just tested locally and we still have the same issue of the output of net(x) ending up on the wrong device, because the io hook runs second and the input has already been moved.
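To see why the order matters, here is a toy model in plain Python. `Tensor`, `IoSameDeviceHook`, `AlignHook` and `run_with_hooks` are hypothetical stand-ins, not accelerate's classes, and the sketch assumes that `pre_forward` hooks fire in list order before the forward pass and `post_forward` hooks fire in list order after it. The io hook can only restore the caller's device if it records the input device before any other hook has moved the input.

```python
# Toy model of hook ordering. All names here are hypothetical, not accelerate's API.
from dataclasses import dataclass


@dataclass
class Tensor:
    device: str

    def to(self, device: str) -> "Tensor":
        return Tensor(device)


class IoSameDeviceHook:
    """Records where the input lives and moves the output back there."""

    def __init__(self) -> None:
        self.input_device = None

    def pre_forward(self, x: Tensor) -> Tensor:
        self.input_device = x.device  # records whatever device it sees at this point
        return x

    def post_forward(self, out: Tensor) -> Tensor:
        return out.to(self.input_device)


class AlignHook:
    """Moves the input to the execution device before forward."""

    def __init__(self, execution_device: str) -> None:
        self.execution_device = execution_device

    def pre_forward(self, x: Tensor) -> Tensor:
        return x.to(self.execution_device)

    def post_forward(self, out: Tensor) -> Tensor:
        return out


def run_with_hooks(hooks, x: Tensor) -> Tensor:
    # pre_forward hooks run in list order, then the "forward", then post_forward hooks.
    for hook in hooks:
        x = hook.pre_forward(x)
    out = Tensor(x.device)  # the toy forward returns a tensor on the input's device
    for hook in hooks:
        out = hook.post_forward(out)
    return out


x = Tensor("cpu")
good = run_with_hooks([IoSameDeviceHook(), AlignHook("cuda:0")], x)
bad = run_with_hooks([AlignHook("cuda:0"), IoSameDeviceHook()], x)
print(good.device)  # cpu
print(bad.device)   # cuda:0
```

With the io hook second, it records `cuda:0` as the "input" device, so the output is never brought back to CPU, which matches the symptom described in the review.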
@sgugger I've just addressed your nits and moved the io hook to the top. Thanks for the review! I've tested it locally on the snippet from the bug report, and it brings the tensor back to CPU after inference.
sgugger left a comment:
Perfect, thanks! Re-tested locally and got the expected results for the code sample you shared in the issue.
@sgugger, a test step failed because of an HTTP error while installing a library. I've pushed an empty commit to re-run it.
Solves #767.
Moves the `AlignDevicesHook` with `io_same_device` in `accelerate.cpu_offload` and `accelerate.disk_offload` to before the `attach_align_device_hook` call. That way we can keep the changes to the forward method for the whole module without deleting the hook we want to keep: the one that holds the execution device and the configuration for moving tensors between devices.
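The difference between replacing a module's hook and adding to it can be sketched with a simplified model. `Module` and `attach_hook` (with its `append` flag) are hypothetical and only illustrate the idea discussed above; they are not accelerate's actual functions or signatures, and hooks are represented by plain strings.

```python
# Hypothetical toy: Module and attach_hook are illustrative, not accelerate's API.
class Module:
    def __init__(self) -> None:
        self.hooks = []  # hooks wrapping forward, in execution order


def attach_hook(module: Module, hook: str, append: bool = False) -> None:
    if append:
        module.hooks.append(hook)  # keep existing hooks, run the new one after them
    else:
        module.hooks = [hook]      # replace: any previously attached hook is dropped


# Replacing: the io hook attached first is erased by the second call.
m1 = Module()
attach_hook(m1, "io_same_device")
attach_hook(m1, "align_devices(execution_device)")
print(m1.hooks)  # ['align_devices(execution_device)']

# Attaching the io hook first, then appending: both survive and the io hook runs first.
m2 = Module()
attach_hook(m2, "io_same_device")
attach_hook(m2, "align_devices(execution_device)", append=True)
print(m2.hooks)  # ['io_same_device', 'align_devices(execution_device)']
```

Attaching first and appending afterwards keeps both hooks and also puts the io hook at the front, which is the ordering the review asked for.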