Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record_stream() for shifted view tensors #27371

Closed
wants to merge 3 commits into from

Conversation

sublee
Copy link
Contributor

@sublee sublee commented Oct 4, 2019

Issue: #27366

The address of a view tensor might be shifted from the head of the storage.

>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'

Currently, Tensor.record_stream() silently ignores shifted view tensors, because CUDACachingAllocator cannot find the block from the shifted address.

void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}

So we cannot protect shifted view tensor which is used to compute or copy in an arbitrary stream against unexpected reallocation. Once we call record_stream() on a tensor, our intention is to protect the storage behind the tensor against reallocation until all works in the stream finish. This rule should be consistent regardless of the type of tensors including the view.

We can retrieve the head of the address from any types of tensors by tensor.storage().data_ptr(). Hence, I've thought it's better to pass to recordStream() rather than tensor.data_ptr() for consistent behavior.

@pytorchbot pytorchbot added module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen module: pybind Related to our Python bindings / interactions with other Python libraries labels Oct 4, 2019
# Create a new tensor and check its address.
# It should not be allocated to the storage above.
try_realloc = torch.cuda.FloatTensor([10, 10])
self.assertNotEqual(try_realloc.data_ptr(), data_ptr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How reliably does this test fails on current master? Is it flaky or 100% reliable?

Copy link
Contributor Author

@sublee sublee Oct 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice question!

Originally, I tried to make it more reliable with torch.cuda.empty_cache() at the beginning. Even though I always saw failure on the current master, I'm not sure that it's 100% reliable.

On second thought, it seems to be better to isolate block pools by a separate stream. I updated the test. And it will reliably fail without this patch unless CPU is slower than 50ms of GPU sleep.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks for the PR

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury
Copy link
Member

This seems good, but I'm troubled that we no longer raise an error message when the pointer isn't found. This was introduced in June in #21449 (@ezyang @mrshenli)

@ezyang
Copy link
Contributor

ezyang commented Oct 4, 2019

I think the more correct fix for #21449 is to stop distributed from unconditionally calling recordStream on all cuda tensors without first checking they come from the caching allocator, see #21449 (comment) I see @mrshenli thumbs up'ed my comment but I guess the follow up here never happened.

@mrshenli
Copy link
Contributor

mrshenli commented Oct 4, 2019

Created #27405 to track.

@sublee
Copy link
Contributor Author

sublee commented Oct 5, 2019

As @colesbury mentioned, this silent failure was introduced in PyTorch 1.2.0. As 1.1.0, record_stream() on a shifted view tensor throws RuntimeError.

RuntimeError: invalid device pointer: %p0x7f30b7a00004

For other users, I suggest a compact workaround in Python-side:

tmp = tensor.new_empty([0]).set_(tensor.storage())
tmp.record_stream(stream)

@sublee
Copy link
Contributor Author

sublee commented Oct 8, 2019

May I know the next process on this pull request?

@ezyang
Copy link
Contributor

ezyang commented Oct 8, 2019

@albanD don't forget to land it :)

@albanD
Copy link
Collaborator

albanD commented Oct 8, 2019

I was waiting for the result of the discussion and then was offline travelling. Happening now !

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@albanD merged this pull request in c1c176d.

@sublee
Copy link
Contributor Author

sublee commented Oct 9, 2019

@albanD @ezyang Thanks for merging it!

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Issue: pytorch#27366

The address of a view tensor might be shifted from the head of the storage.

```python
>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'
```

Currently, `Tensor.record_stream()` silently ignores shifted view tensors, because `CUDACachingAllocator` cannot find the block from the shifted address.

```c++
void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}
```

So we cannot protect shifted view tensor which is used to compute or copy in an arbitrary stream against unexpected reallocation. Once we call `record_stream()` on a tensor, our intention is to protect the storage behind the tensor against reallocation until all works in the stream finish. This rule should be consistent regardless of the type of tensors including the view.

We can retrieve the head of the address from any types of tensors by `tensor.storage().data_ptr()`. Hence, I've thought it's better to pass to `recordStream()` rather than `tensor.data_ptr()` for consistent behavior.
Pull Request resolved: pytorch#27371

Reviewed By: ezyang

Differential Revision: D17768558

Pulled By: albanD

fbshipit-source-id: 7705f52b0177625168edb6f71c07a029df471bc5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen module: pybind Related to our Python bindings / interactions with other Python libraries
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants