record_stream() for shifted view tensors #27371

Closed
wants to merge 3 commits

Conversation

@sublee (Contributor) commented Oct 4, 2019

Issue: #27366

The address of a view tensor might be shifted from the head of the storage.

>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'

Currently, Tensor.record_stream() silently ignores shifted view tensors, because CUDACachingAllocator cannot find the block from the shifted address.

void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}

So we cannot protect a shifted view tensor, used for computation or a copy on an arbitrary stream, against unexpected reallocation. Once we call record_stream() on a tensor, the intention is to protect the storage behind the tensor from reallocation until all work queued on that stream finishes. This rule should hold consistently regardless of the type of tensor, including views.

We can retrieve the head address of the storage from any type of tensor via tensor.storage().data_ptr(). Hence, I think it is better to pass that to recordStream(), rather than tensor.data_ptr(), for consistent behavior.
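
For instance, continuing the example above (an illustrative check only, not part of the patch): the view's data_ptr() is shifted by its storage offset, while its storage's data_ptr() still points at the head of the allocation.

>>> import torch
>>> x = torch.rand(10, 10, device=0)
>>> y = x[2:]
>>> y.data_ptr() - x.data_ptr()
80
>>> y.storage().data_ptr() == x.data_ptr()
True
>>> y.data_ptr() == y.storage().data_ptr() + y.storage_offset() * y.element_size()
True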

@sublee sublee force-pushed the sublee:record-stream-view branch from 243d15d to 7c61173 Oct 4, 2019
@soumith soumith requested a review from albanD Oct 4, 2019
# Create a new tensor and check its address.
# It should not be allocated to the storage above.
try_realloc = torch.cuda.FloatTensor([10, 10])
self.assertNotEqual(try_realloc.data_ptr(), data_ptr)

@albanD (Contributor) commented Oct 4, 2019

How reliably does this test fail on current master? Is it flaky or 100% reliable?

@sublee (Author, Contributor) commented Oct 4, 2019

Thanks for the nice question!

Originally, I tried to make it more reliable by calling torch.cuda.empty_cache() at the beginning. Although I always saw it fail on the current master, I'm not sure it is 100% reliable.

On second thought, it seemed better to isolate the block pools with a separate stream, so I updated the test. It should now fail reliably without this patch, unless the CPU is slower than the 50 ms GPU sleep.
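
Roughly, the updated test takes the following shape (a sketch of the approach only, not the exact test code; torch.cuda._sleep is an internal helper, and the cycle count used here for the ~50 ms sleep is an assumption):

import torch

def check_record_stream_on_shifted_view():
    # Allocate the base tensor on a dedicated stream so its block lives in
    # that stream's pool, isolated from allocations on the default stream.
    stream_alloc = torch.cuda.Stream()
    with torch.cuda.stream(stream_alloc):
        base = torch.cuda.FloatTensor([10, 10])

    # A shifted view: its data_ptr() no longer matches the block head.
    view = base[1:]
    assert view.storage_offset() > 0

    # Keep a second stream busy for roughly 50 ms of GPU time, then record
    # that stream on the shifted view. Without this patch the record is
    # silently ignored.
    stream_record = torch.cuda.Stream()
    with torch.cuda.stream(stream_record):
        torch.cuda._sleep(50 * 1000 * 1000)  # rough cycle count, an assumption
    view.record_stream(stream_record)

    # Free the tensors. The block must stay reserved until stream_record
    # finishes its pending work.
    data_ptr = base.data_ptr()
    del base, view

    # A new allocation in the same pool must not reuse the reserved block
    # while the recorded stream is still running.
    stream_alloc.synchronize()
    with torch.cuda.stream(stream_alloc):
        try_realloc = torch.cuda.FloatTensor([10, 10])
    assert try_realloc.data_ptr() != data_ptr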

@albanD (Contributor) approved these changes Oct 4, 2019

LGTM
Thanks for the PR


@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury (Member) commented Oct 4, 2019

This seems good, but I'm troubled that we no longer raise an error when the pointer isn't found. That behavior was introduced in June in #21449 (@ezyang @mrshenli)

@ezyang (Contributor) commented Oct 4, 2019

I think the more correct fix for #21449 is to stop distributed from unconditionally calling recordStream on all CUDA tensors without first checking that they come from the caching allocator; see #21449 (comment). I see @mrshenli thumbs-up'ed my comment, but I guess the follow-up here never happened.

@mrshenli (Contributor) commented Oct 4, 2019

Created #27405 to track.

@sublee (Author, Contributor) commented Oct 5, 2019

As @colesbury mentioned, this silent failure was introduced in PyTorch 1.2.0. In 1.1.0, record_stream() on a shifted view tensor throws a RuntimeError:

RuntimeError: invalid device pointer: %p0x7f30b7a00004

For other users, I suggest a compact workaround on the Python side:

tmp = tensor.new_empty([0]).set_(tensor.storage())
tmp.record_stream(stream)
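
For example, the workaround can be wrapped in a small helper (just a sketch; record_stream_via_storage is a hypothetical name, not an API introduced by this PR):

import torch

def record_stream_via_storage(tensor, stream):
    # Hypothetical helper around the workaround above: a zero-sized alias
    # over the whole storage makes record_stream() see the head address,
    # so the caching allocator can find the block even for a shifted view.
    tmp = tensor.new_empty([0]).set_(tensor.storage())
    tmp.record_stream(stream)

# Usage sketch: protect a shifted view that is consumed on a side stream.
x = torch.rand(10, 10, device=0)
view = x[2:]
side = torch.cuda.Stream()
with torch.cuda.stream(side):
    y = view * 2  # work on the side stream that reads the shifted view
record_stream_via_storage(view, side)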
@sublee (Author, Contributor) commented Oct 8, 2019

May I ask what the next step is for this pull request?

@ezyang (Contributor) commented Oct 8, 2019

@albanD don't forget to land it :)

@albanD (Contributor) commented Oct 8, 2019

I was waiting for the outcome of the discussion and then was offline travelling. Happening now!


@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented Oct 8, 2019

@albanD merged this pull request in c1c176d.

@sublee (Author, Contributor) commented Oct 9, 2019

@albanD @ezyang Thanks for merging it!
