record_stream() for shifted view tensors #27371
Conversation
243d15d to 7c61173 (Compare)
```python
# Create a new tensor and check its address.
# It should not be allocated to the storage above.
try_realloc = torch.cuda.FloatTensor([10, 10])
self.assertNotEqual(try_realloc.data_ptr(), data_ptr)
```
How reliably does this test fail on current master? Is it flaky or 100% reliable?
Thanks for the nice question!
Originally, I tried to make it more reliable with `torch.cuda.empty_cache()` at the beginning. Even though I always saw the failure on current master, I'm not sure that it's 100% reliable.
On second thought, it seemed better to isolate the block pools with a separate stream, so I updated the test. It will now reliably fail without this patch, unless the CPU is slower than the 50ms GPU sleep.
LGTM
Thanks for the PR
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I think the more correct fix for #21449 is to stop distributed from unconditionally calling `recordStream` on all CUDA tensors without first checking that they come from the caching allocator; see #21449 (comment). I see @mrshenli thumbs-upped my comment, but I guess the follow-up here never happened.
Created #27405 to track.
As @colesbury mentioned, this silent failure was introduced in PyTorch 1.2.0 (1.1.0 behaved differently).
For other users, I suggest a compact workaround on the Python side:

```python
tmp = tensor.new_empty([0]).set_(tensor.storage())
tmp.record_stream(stream)
```
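A quick way to see why this workaround works: the zero-sized tensor created by `set_(storage)` points at the start of the storage even when the original tensor is a shifted view. A minimal sketch, assuming a recent PyTorch; it uses a CPU tensor since the pointer arithmetic is the same as on CUDA:

```python
import torch

x = torch.rand(10, 10)                     # base tensor owning the storage
y = x[2:]                                  # shifted view: data_ptr() is offset from the head
tmp = y.new_empty([0]).set_(y.storage())   # zero-sized alias of the whole storage

assert y.data_ptr() != x.data_ptr()                # the view's pointer is shifted
assert tmp.data_ptr() == y.storage().data_ptr()    # tmp points at the storage head
# On CUDA, tmp.record_stream(stream) would now record against the head address,
# which the caching allocator can find.
```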
May I ask about the next steps for this pull request?
@albanD don't forget to land it :)
I was waiting for the result of the discussion and then was offline travelling. Happening now!
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Issue: pytorch#27366

The address of a view tensor might be shifted from the head of the storage.

```python
>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'
```

Currently, `Tensor.record_stream()` silently ignores shifted view tensors, because `CUDACachingAllocator` cannot find the block from the shifted address.

```c++
void recordStream(void* ptr, cuda::CUDAStream stream) {
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}
```

So we cannot protect a shifted view tensor used for computation or a copy in an arbitrary stream against unexpected reallocation. Once we call `record_stream()` on a tensor, our intention is to protect the storage behind the tensor against reallocation until all work in the stream finishes. This rule should be consistent regardless of the type of tensor, including views.

We can retrieve the head address from any tensor via `tensor.storage().data_ptr()`. Hence, it is better to pass that to `recordStream()` rather than `tensor.data_ptr()` for consistent behavior.

Pull Request resolved: pytorch#27371
Reviewed By: ezyang
Differential Revision: D17768558
Pulled By: albanD
fbshipit-source-id: 7705f52b0177625168edb6f71c07a029df471bc5
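The failed lookup in `recordStream()` can be mimicked without CUDA. The sketch below is a toy, pure-Python stand-in for the allocator (the names `malloc_block` and `find_allocated_block` echo the C++ but are hypothetical here): blocks are keyed by their base address, so a pointer shifted into the middle of a block is simply not found.

```python
# Toy model of CUDACachingAllocator's exact-match block lookup.
# Not the real implementation; it only illustrates why a shifted
# view pointer turns recordStream() into a silent no-op.
allocated_blocks = {}  # base address -> block size in bytes

def malloc_block(base, size):
    allocated_blocks[base] = size

def find_allocated_block(ptr):
    # Exact match on the base address, as in the C++ allocator.
    return allocated_blocks.get(ptr)

malloc_block(0x7F1B15C00000, 400)  # storage of a 10x10 float tensor

assert find_allocated_block(0x7F1B15C00000) == 400   # head pointer: found
assert find_allocated_block(0x7F1B15C00050) is None  # shifted pointer: missed
```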
Issue: #27366

The address of a view tensor might be shifted from the head of the storage.

Currently, `Tensor.record_stream()` silently ignores shifted view tensors, because `CUDACachingAllocator` cannot find the block from the shifted address. So we cannot protect a shifted view tensor used for computation or a copy in an arbitrary stream against unexpected reallocation. Once we call `record_stream()` on a tensor, our intention is to protect the storage behind the tensor against reallocation until all work in the stream finishes. This rule should be consistent regardless of the type of tensor, including views.

We can retrieve the head address from any tensor via `tensor.storage().data_ptr()`. Hence, I've thought it's better to pass that to `recordStream()` rather than `tensor.data_ptr()` for consistent behavior.
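The difference between the two pointers is easy to check. A minimal sketch, assuming a recent PyTorch and using a CPU tensor (the pointer arithmetic is identical on CUDA):

```python
import torch

x = torch.rand(10, 10)  # contiguous float32: 10 * 10 * 4 = 400 bytes
y = x[2:]               # view skipping 2 rows = 2 * 10 * 4 = 80 bytes

# data_ptr() differs between the base tensor and the view...
assert y.data_ptr() == x.data_ptr() + 80
# ...but storage().data_ptr() is the same head address for both,
# which is why it is the right key for the allocator lookup.
assert y.storage().data_ptr() == x.storage().data_ptr() == x.data_ptr()
```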