caffe2: fix PinnedCPUAllocator cudaHostRegister() leak #16340
Conversation
Thanks a lot, and nice catch. Just fresh in master, there is a much more compact way to do an equivalent thing: use the compare_exchange method on DataPtr.
cc @jerryzh168
Thank you for having a look. Sure; we'll try that.
Nice catch! Thanks! @hartb
caffe2/core/context_gpu.h
@@ -357,14 +357,13 @@ struct CAFFE2_CUDA_API PinnedCPUAllocator final : public at::Allocator {
     at::DataPtr data_ptr;
     std::lock_guard<std::mutex> lock(CUDAContext::mutex());
     if (IsNUMAEnabled()) {
-      data_ptr = baseAllocator_.allocate(nbytes);
-      data = data_ptr.get();
+      data = baseAllocator_.naked_allocate(nbytes);
Actually, we are in the process of merging the PyTorch and Caffe2 allocators, and we'll be using Allocator* for baseAllocator_; see: https://github.com/pytorch/pytorch/pull/14517/files#diff-6286b32ea83ee15c66db129928f27c42R343
@jerryzh168 A complication with that approach: the Default allocator can attach ReportAndDelete() as the deleter instead of Delete() (when memory usage reporting is enabled). So I think for the pinned allocator to use compare_exchange, it first has to know about ReportAndDelete() (as well as just Delete()). And then in the ReportAndDelete() case it has to choose either to exchange the deleter (leaking Default's Reporter.New()) or not to exchange the deleter (leaking Pinned's cudaHostRegister()). Do I have that right?
Hmm, yes, you're right. I guess you have two options: …
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@hartb Are you planning to look at this, or should one of us adopt this patch? Thanks!
I hope to update the PR with the suggestion above; I'm thinking maybe use … It's taking a bit longer than I'd hoped to swap my build setup from 1.0.0 to master; I'll let you know if that derails me for some reason.
@ezyang Here's a proposed fix (one commit) implementing the above, based on PR 14517 (as @jerryzh168 mentioned above): https://github.com/hartb/pytorch/tree/hartb-pr14517-add Would you like to pick this over to that PR, or should I modify it to base on master via this PR? Note that my worry above about leaking or breaking the Reporting case isn't an issue: the Pinned Delete() calls the base Allocator's delete to finish the job, so we get base/Default allocator clean-up that way.
@hartb Feel free to force push! Sorry about the delay responding.
Will update this PR once I've tested the fix rebased to master.
Allocations returned by the PinnedCPUAllocator must carry that allocator's Delete() function, or cudaHostRegister() registrations made by the pinned allocator will be leaked. Ensure that in the NUMA case by swapping the pinned allocator's Delete() in place of the baseAllocator_'s deleter. The swap should succeed unless something else has already swapped the deleter (in which case developer attention is required). In the swap case, the pinned allocator's Delete() calls baseAllocator_'s deleter explicitly, so any teardown actions to be done there are preserved.
Sweet and simple
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
the Default... Delete() rather than the Pinned... Delete(). That
meant Pinned... Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.
See: #16280
This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. The pinned allocator uses that, then
constructs a DataPtr that references its own Delete().