record_stream() for shifted view tensors #27371

Closed
wants to merge 3 commits

Conversation

@sublee (Contributor) commented Oct 4, 2019

Issue: #27366

The address of a view tensor might be shifted from the head of the storage.

>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'

Currently, Tensor.record_stream() silently ignores shifted view tensors, because CUDACachingAllocator cannot find the block from the shifted address.

void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}

So we cannot protect a shifted view tensor, used for computation or a copy on an arbitrary stream, against unexpected reallocation. Once we call record_stream() on a tensor, the intention is to protect the storage behind the tensor from reallocation until all work queued on that stream finishes. This rule should hold consistently regardless of the type of tensor, including views.

We can retrieve the head address of the storage from any type of tensor via tensor.storage().data_ptr(). Hence, I think it is better to pass that to recordStream(), rather than tensor.data_ptr(), for consistent behavior.
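
For instance, continuing the example above (an illustrative check only, not part of the patch): the view's data_ptr() is shifted by its storage offset, while its storage's data_ptr() still points at the head of the allocation.

>>> import torch
>>> x = torch.rand(10, 10, device=0)
>>> y = x[2:]
>>> y.data_ptr() - x.data_ptr()
80
>>> y.storage().data_ptr() == x.data_ptr()
True
>>> y.data_ptr() == y.storage().data_ptr() + y.storage_offset() * y.element_size()
True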

@sublee sublee force-pushed the sublee:record-stream-view branch from 243d15d to 7c61173 Oct 4, 2019
@soumith soumith requested a review from albanD Oct 4, 2019
# Create a new tensor and check its address.
# It should not be allocated to the storage above.
try_realloc = torch.cuda.FloatTensor([10, 10])
self.assertNotEqual(try_realloc.data_ptr(), data_ptr)

@albanD (Contributor) commented Oct 4, 2019

How reliably does this test fail on current master? Is it flaky or 100% reliable?

@sublee (Author, Contributor) commented Oct 4, 2019

Thanks for the nice question!

Originally, I tried to make it more reliable by calling torch.cuda.empty_cache() at the beginning. Although I always saw it fail on the current master, I'm not sure it is 100% reliable.

On second thought, it seemed better to isolate the block pools with a separate stream, so I updated the test. It should now fail reliably without this patch, unless the CPU is slower than the 50 ms GPU sleep.
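
Roughly, the updated test takes the following shape (a sketch of the approach only, not the exact test code; torch.cuda._sleep is an internal helper, and the cycle count used here for the ~50 ms sleep is an assumption):

import torch

def check_record_stream_on_shifted_view():
    # Allocate the base tensor on a dedicated stream so its block lives in
    # that stream's pool, isolated from allocations on the default stream.
    stream_alloc = torch.cuda.Stream()
    with torch.cuda.stream(stream_alloc):
        base = torch.cuda.FloatTensor([10, 10])

    # A shifted view: its data_ptr() no longer matches the block head.
    view = base[1:]
    assert view.storage_offset() > 0

    # Keep a second stream busy for roughly 50 ms of GPU time, then record
    # that stream on the shifted view. Without this patch the record is
    # silently ignored.
    stream_record = torch.cuda.Stream()
    with torch.cuda.stream(stream_record):
        torch.cuda._sleep(50 * 1000 * 1000)  # rough cycle count, an assumption
    view.record_stream(stream_record)

    # Free the tensors. The block must stay reserved until stream_record
    # finishes its pending work.
    data_ptr = base.data_ptr()
    del base, view

    # A new allocation in the same pool must not reuse the reserved block
    # while the recorded stream is still running.
    stream_alloc.synchronize()
    with torch.cuda.stream(stream_alloc):
        try_realloc = torch.cuda.FloatTensor([10, 10])
    assert try_realloc.data_ptr() != data_ptr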

@albanD (Contributor) approved these changes Oct 4, 2019

LGTM
Thanks for the PR


@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@colesbury (Member) commented Oct 4, 2019

This seems good, but I'm troubled that we no longer raise an error when the pointer isn't found. That behavior was introduced in June in #21449 (@ezyang @mrshenli)

@ezyang (Contributor) commented Oct 4, 2019

I think the more correct fix for #21449 is to stop distributed from unconditionally calling recordStream on all CUDA tensors without first checking that they come from the caching allocator; see #21449 (comment). I see @mrshenli thumbs-up'ed my comment, but I guess the follow-up here never happened.

@mrshenli (Contributor) commented Oct 4, 2019

Created #27405 to track.

@sublee (Author, Contributor) commented Oct 5, 2019

As @colesbury mentioned, this silent failure was introduced in PyTorch 1.2.0. In 1.1.0, record_stream() on a shifted view tensor throws a RuntimeError:

RuntimeError: invalid device pointer: %p0x7f30b7a00004

For other users, I suggest a compact workaround on the Python side:

tmp = tensor.new_empty([0]).set_(tensor.storage())
tmp.record_stream(stream)
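
For example, the workaround can be wrapped in a small helper (just a sketch; record_stream_via_storage is a hypothetical name, not an API introduced by this PR):

import torch

def record_stream_via_storage(tensor, stream):
    # Hypothetical helper around the workaround above: a zero-sized alias
    # over the whole storage makes record_stream() see the head address,
    # so the caching allocator can find the block even for a shifted view.
    tmp = tensor.new_empty([0]).set_(tensor.storage())
    tmp.record_stream(stream)

# Usage sketch: protect a shifted view that is consumed on a side stream.
x = torch.rand(10, 10, device=0)
view = x[2:]
side = torch.cuda.Stream()
with torch.cuda.stream(side):
    y = view * 2  # work on the side stream that reads the shifted view
record_stream_via_storage(view, side)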
@sublee (Author, Contributor) commented Oct 8, 2019

May I ask what the next step is for this pull request?

@ezyang (Contributor) commented Oct 8, 2019

@albanD don't forget to land it :)

@albanD (Contributor) commented Oct 8, 2019

I was waiting for the outcome of the discussion and then was offline travelling. Happening now!


@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented Oct 8, 2019

@albanD merged this pull request in c1c176d.

@sublee (Author, Contributor) commented Oct 9, 2019

@albanD @ezyang Thanks for merging it!
