
Implement Tensor.new_empty_strided(sizes, strides, *, dtype, device, requires_grad) #47225

Closed

zou3519 (Contributor) commented Nov 2, 2020

Stack from ghstack:

Summary

This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
Tensor.new_empty), but there is no corresponding method for
torch.empty_strided. This PR adds one.
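
For context, a minimal usage sketch (the sizes and strides below are illustrative; they are not taken from the PR):

```python
import torch

x = torch.randn(2, 3, dtype=torch.float64)

# Existing: torch.empty_strided needs dtype/device spelled out explicitly.
a = torch.empty_strided((2, 3), (3, 1), dtype=x.dtype, device=x.device)

# New method added by this PR: the buffer inherits dtype/device from `x`.
b = x.new_empty_strided((2, 3), (3, 1))

assert b.dtype == x.dtype and b.device == x.device
assert b.stride() == (3, 1)
```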

Motivation

The real motivation is to let vmap work through CopySlices. CopySlices
shows up frequently in double backwards because many view functions have
backward formulas that perform a view followed by an in-place write. The
relevant excerpt (see
https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106):

```cpp
// Excerpt from CopySlices::apply (torch/csrc/autograd/functions/tensor.cpp).

// Allocate a buffer with the geometry of `base` and fill it with `grad`.
auto result = at::empty_strided(base.sizes(), base.strides(), grad.options());
result.copy_(grad);

// Recover the slice of `result` that corresponds to the original view.
at::Tensor grad_slice;
if (view_fn.has_value()) {
  const auto& view_func = view_fn.value();
  grad_slice = view_func(result);
} else {
  auto offset = view.storage_offset() - base.storage_offset();
  grad_slice = result.as_strided(view.sizes(), view.strides(), offset);
}

// TODO: We clone grad_slice because we modify it below and "fn" might save
// it for the backward of res. We might be able to avoid the clone() if
// double-backprop is disabled.
auto res = (*fn)({ grad_slice.clone(at::MemoryFormat::Contiguous) });

variable_list grad_inputs(num_outputs());
for (size_t i = 0; i < res.size(); i++) {
  if (should_compute_output(i)) {
    AT_ASSERT(res[i].defined());
    if (i == 0) {
      grad_slice.copy_(res[i]);
      grad_inputs[i] = std::move(result); // NOLINT(bugprone-use-after-move)
    } else {
      grad_inputs[i] = std::move(res[i]);
    }
  }
}
```
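
For reference, a minimal Python snippet (not part of this PR) that makes a
CopySlices node appear in the autograd graph; it uses the same view+inplace
pattern that the backward formulas mentioned above rely on:

```python
import torch

x = torch.randn(4, requires_grad=True)
base = x * 2     # non-leaf tensor, so it carries a grad_fn
view = base[:2]  # a view into `base`
view.mul_(3)     # in-place write through the view rewires base.grad_fn

# After the in-place op, base's grad_fn is a CopySlices node
# (the exact repr may vary across versions).
print(base.grad_fn)
```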

To support vmap through CopySlices, the approach in this stack is to:

  • Add Tensor.new_empty_strided and replace empty_strided in
    CopySlices with it so that we can propagate batch information
    (see the sketch after this list).
  • Make some slight modifications to AsStridedBackward (and add an
    as_strided batching rule).
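
To make "propagate batch information" concrete, here is an illustrative
contrast; the helper names are hypothetical and exist only for this sketch:

```python
import torch

def alloc_buffer_free_function(grad):
    # Hypothetical helper: torch.empty_strided takes no `self` tensor, so
    # under vmap a batching rule has nothing to read batch dimensions from.
    return torch.empty_strided(grad.shape, grad.stride())

def alloc_buffer_method(grad):
    # Hypothetical helper: grad.new_empty_strided can be given a batching
    # rule, because the batch dimension(s) of `grad` are visible via `self`.
    return grad.new_empty_strided(grad.shape, grad.stride())
```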

Please let me know if it would be better to squash everything related to
supporting vmap over CopySlices into a single big PR.

Test Plan

  • New tests (a sketch of one such check is below).
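
For illustration only, the kind of check such a test might perform (a sketch,
not the actual test added in this PR):

```python
import torch

def check_new_empty_strided():
    x = torch.randn(2, 3, dtype=torch.float64)
    out = x.new_empty_strided((4, 5), (5, 1))
    assert out.shape == (4, 5)
    assert out.stride() == (5, 1)
    assert out.dtype == x.dtype
    assert out.device == x.device
```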

Differential Revision: D24741688

@zou3519 zou3519 requested review from albanD and ezyang November 2, 2020 22:49
dr-ci bot commented Nov 2, 2020
💊 CI failures summary and remediations

As of commit 7ea21b5 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build (1/1)

Step: "Checkout pytorch/builder repo"

fatal: reference is not a tree: cd5a9b73c3028d2496666201588111a8c8d84878
+ sleep 2 
+ git submodule update --init --recursive 
fatal: reference is not a tree: cd5a9b73c3028d2496666201588111a8c8d84878 
Unable to checkout 'cd5a9b73c3028d2496666201588111a8c8d84878' in submodule path 'third_party/nccl/nccl' 
+ sleep 4 
+ git submodule update --init --recursive 
fatal: reference is not a tree: cd5a9b73c3028d2496666201588111a8c8d84878 
Unable to checkout 'cd5a9b73c3028d2496666201588111a8c8d84878' in submodule path 'third_party/nccl/nccl' 
+ sleep 8 
+ git submodule update --init --recursive 
fatal: reference is not a tree: cd5a9b73c3028d2496666201588111a8c8d84878 
Unable to checkout 'cd5a9b73c3028d2496666201588111a8c8d84878' in submodule path 'third_party/nccl/nccl' 


@albanD albanD (Collaborator) left a comment

lgtm

(Review comment on aten/src/ATen/native/TensorFactories.cpp: outdated, resolved.)

…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

[ghstack-poisoned]
…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

[ghstack-poisoned]
…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

[ghstack-poisoned]
…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

[ghstack-poisoned]
albanD (Collaborator) commented Nov 4, 2020

Note that CI failures are real because __torch_function__ is not properly handled by the current implementation.

…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

[ghstack-poisoned]
@ezyang ezyang (Contributor) left a comment

Separate is better. Thanks!

…e, device, requires_grad)"

Summary
-------
This PR implements Tensor.new_empty_strided. Many of our torch.* factory
functions have a corresponding new_* method (e.g., torch.empty and
torch.new_empty), but there is no corresponding method to
torch.empty_strided. This PR adds one.

Motivation
----------
The real motivation behind this is for vmap to be able to work through
CopySlices. CopySlices shows up a lot in double backwards because a lot
of view functions have backward formulas that perform view+inplace.

https://github.com/pytorch/pytorch/blob/e0fd590ec950cb1e65ea0431c9e765f8cda27908/torch/csrc/autograd/functions/tensor.cpp#L78-L106

To support vmap through CopySlices, the approach in this stack is to:
- add `Tensor.new_empty_strided` and replace `empty_strided` in
CopySlices with that so that we can propagate batch information.
- Make some slight modifications to AsStridedBackward (and add
as_strided batching rule)

Please let me know if it would be better if I squashed everything related to
supporting vmap over CopySlices together into a single big PR.

Test Plan
---------
- New tests.

Differential Revision: [D24741688](https://our.internmc.facebook.com/intern/diff/D24741688)

[ghstack-poisoned]
@zou3519 merged this pull request in 59aca02.

facebook-github-bot deleted the gh/zou3519/324/head branch November 13, 2020 15:22