Use storage.cpu() for moving storage to CPU in serialization. #46028
As reported in pytorch#46020, something seems to go wrong with the storage._write_file method when it is used with a BytesIO and a GPU buffer. Given that we were going to create the BytesIO intermediate buffer anyway, we might as well use storage.cpu() to move the storage to the CPU. This appears to work better. This is a hot fix; further investigation is highly desirable.
I could be wrong, but on further inspection it would seem that the new method might even be a bit of an optimization: going via write_file incurs an additional copy compared to the method used here. In writeFileRaw there is a cudaMemcpy to a new CPU buffer and then another copy when BytesIO.write is called, while now we just copy to the CPU once. |
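The copy-count argument above can be illustrated with a toy model. This is a sketch in plain Python, not PyTorch code: `device_to_host`, `old_path`, and `new_path` are hypothetical names, a `bytes` object stands in for the GPU storage, and the counters are incremented by hand to mirror the copies described in the comment.

```python
import io

# Stands in for the contents of a GPU storage (assumption for illustration).
PAYLOAD = bytes(range(16))


def device_to_host(device_bytes):
    """Model the device-to-host transfer as one copy into a new CPU buffer."""
    return bytearray(device_bytes)


def old_path(device_bytes):
    """Model of the write_file route: temporary CPU buffer, then BytesIO."""
    copies = 0
    host = device_to_host(device_bytes)  # copy 1: device -> temporary CPU buffer
    copies += 1
    stream = io.BytesIO()
    stream.write(host)                   # copy 2: CPU buffer -> BytesIO storage
    copies += 1
    return stream.getbuffer(), copies    # getbuffer() returns a view, no copy


def new_path(device_bytes):
    """Model of the storage.cpu() route: the CPU buffer is used directly."""
    copies = 0
    host = device_to_host(device_bytes)  # copy 1: device -> CPU storage
    copies += 1
    return memoryview(host), copies      # hand out a view of the CPU buffer


old_data, old_copies = old_path(PAYLOAD)
new_data, new_copies = new_path(PAYLOAD)
print(old_copies, new_copies)
```

Both routes yield the same bytes; under this model the second one does so with one copy fewer.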
I added commentary on the root problem to the bug report. It indicates that this change does fix the bug and, as discussed above, I think it saves a memory copy, too, so I'd suggest using this fix. |
Codecov Report
@@ Coverage Diff @@
## master #46028 +/- ##
=======================================
Coverage 68.28% 68.29%
=======================================
Files 410 410
Lines 53609 53606 -3
=======================================
Hits 36608 36608
+ Misses 17001 16998 -3
|
@seemethere has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
really nice digging! I don't really see how this avoids the copy -- isn't the writeFile buffer copy just moved to the |
For a reproduction -- maybe we should just fix the error checking of |
I haven't verified this yet, but I'm guessing #46036 will blow up the serialization tests, so that and this together should convince us that the issue is fixed. |
Yeah, but we skip the other copy, the one from the CPU buffer into the BytesIO. I'm reasonably certain this is the exact right thing, but you could also just add the False.
I had this fix before doing the analysis posted to the issue.
|
The thing I was wondering about is the change of assumption, i.e. now we assume that |
Good catch, but based on the test results, anything that has serialization tests would have .cpu(). |
@gchanan keep or trash this? I still think it is superior to eliminate the BytesIO additional copy, but obviously your fix might be more conservative. |
Well, looks like we keep the copying. |
@t-vi I think we should merge this (I wanted to get the more conservative one in first so we could cherry-pick it to the release branch). |
do you want to fix the conflict or should I? |
OK, thanks. I'll update the PR. |
@gchanan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Nice, thanks for this! |
As reported in #46020, something seems to go wrong with the storage._write_file method when it is used with a BytesIO and a GPU buffer.
Given that we were going to create the intermediate buffer (currently via BytesIO) anyway, we might as well use storage.cpu() to move the storage to the CPU. This appears to work better.
This is a hot fix; further investigation is highly desirable. In particular, I don't have a reproducing test to show.
Fixes #46020