
[cudnn nhwc support] #23861

Closed · wants to merge 15 commits

Conversation

@jjsjann123 (Collaborator) commented Aug 6, 2019

Added nhwc support for:

  1. cudnn_batch_norm & cudnn_batch_norm_backward
  2. cudnn_convolution_forward & cudnn_convolution_backward
  3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:

  1. tensor with NCHW where C == 1.
     We could use the stride of C as a hint to infer the intended memory format.
  2. tensor with NCHW where H == W == 1.
     There is no way to identify the intended memory format from the strides.

Currently we fall back to NCHW whenever we see a contiguous tensor, which avoids
the ambiguity in these special cases.
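For illustration, here is a minimal Python sketch (using current PyTorch APIs, not code from this PR) showing why strides cannot disambiguate the intended format in these cases:

```python
import torch

# Case 1: NCHW with C == 1. Size-1 dimensions make the stride checks
# degenerate, so the very same tensor passes both contiguity checks.
a = torch.randn(2, 1, 4, 4)
print(a.is_contiguous())                                   # True
print(a.is_contiguous(memory_format=torch.channels_last))  # True

# Case 2: NCHW with H == W == 1. Again both checks pass, so strides alone
# cannot tell whether the tensor was meant to be NCHW or NHWC.
b = torch.randn(2, 4, 1, 1)
print(b.is_contiguous())                                   # True
print(b.is_contiguous(memory_format=torch.channels_last))  # True
```

Since both checks pass, suggest_memory_format has to pick a default, which is why the fallback to NCHW for contiguous tensors is used here.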

Added cudnn nhwc support for:
1. batch norm
2. convolution
3. convolution_transpose
@pytorchbot added the module: cudnn (Related to torch.backends.cudnn, and CuDNN support) and module: operators labels Aug 6, 2019
@jjsjann123 (Collaborator, Author)

This is to support #23403. Passing my local tests and triggering the correct nhwc kernels.
cc'ing @csarofeen @VitalyFedyunin @ptrblck for visibility

Saw some breaking conv tests. Will handle that in this PR as well.
Also need to put in tests / code cleaning.

@pytorchbot added the module: nn (Related to torch.nn) label Aug 7, 2019
@jjsjann123 (Collaborator, Author)

The code should be good to review. I can't get any useful info from the previous failing tests.

@li-roy added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Aug 7, 2019
@VitalyFedyunin (Contributor) left a comment:

Still reviewing the C++ parts, but better tests will simplify the process.

Review threads on test/test_nn.py (outdated, resolved)
// b. Tensor with both spatial size == 1
// It causes mismatch memory format for data & filter in convolution. Hence we
// check for contiguous here to fallback to NCHW in those cases.
if (!t.is_contiguous() && t.suggest_memory_format() == at::MemoryFormat::ChannelsLast) {
Review comment (Contributor):

No need to check !t.is_contiguous() as it is mutually exclusive with t.suggest_memory_format() == at::MemoryFormat::ChannelsLast

@jjsjann123 (Collaborator, Author) commented Aug 8, 2019:

This is the special case that the comment above refers to.
is_contiguous() == True and suggest_memory_format() == ChannelsLast are not mutually exclusive (or identical, which I believe is what you were actually saying) in the two cases listed above.

I agree that it would be better not to have the check here and to put it inside TensorImpl instead; we could make the tags mutually exclusive there: https://github.com/pytorch/pytorch/blob/master/c10/core/TensorImpl.h#L1514-L1515

@jjsjann123 (Collaborator, Author) commented Aug 8, 2019:

Is this an oversight, or am I missing something? We are not copying is_channels_last_contiguous_ & is_channels_last_:

pytorch/c10/core/TensorImpl.h, lines 1524 to 1546 in 32efb43:

```cpp
static void copy_tensor_metadata(
    const TensorImpl* src_impl,
    TensorImpl* dest_impl,
    const c10::VariableVersion& version_counter,
    bool allow_tensor_metadata_change) {
  dest_impl->storage_ = src_impl->storage_;
  dest_impl->sizes_ = src_impl->sizes_;
  dest_impl->strides_ = src_impl->strides_;
  dest_impl->storage_offset_ = src_impl->storage_offset_;
  dest_impl->data_type_ = src_impl->data_type_;
  dest_impl->device_opt_ = src_impl->device_opt_;
  dest_impl->type_id_ = src_impl->type_id_;
  dest_impl->is_contiguous_ = src_impl->is_contiguous_;
  dest_impl->is_wrapped_number_ = src_impl->is_wrapped_number_;
  dest_impl->reserved_ = src_impl->reserved_;
  dest_impl->set_version_counter(version_counter);
  dest_impl->set_allow_tensor_metadata_change(allow_tensor_metadata_change);
#ifdef BUILD_NAMEDTENSOR
  if (src_impl->named_tensor_meta_ != nullptr) {
    dest_impl->named_tensor_meta_ = src_impl->named_tensor_meta_->clone();
  }
#endif
}
```
But even after patching that, my fix is still not working. :/ For some reason, calling contiguous(channels_last) on a non-contiguous (NC11) tensor ends up with both flags being True. Tracing down the code path, I am stepping into empty_like, which I know you are working on in a different thread. I'll double-check the issue I mentioned earlier after your #23899.

On second thought, the flags are not necessarily exclusive. I'll go back to your earlier suggestion and handle the fallback to ChannelsLast in suggest_memory_format instead (just like my code is doing right now).
But this means that we cannot represent an NC11 kernel with the NHWC flag. I'll open an issue to track this in case we want to revisit the design later.
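A concrete illustration of that limitation (again using current PyTorch Python APIs, not code from this PR):

```python
import torch

# A non-contiguous NC11 tensor, explicitly converted to channels_last.
x = torch.randn(2, 3, 5, 5)[:, :, :1, :1]      # shape (2, 3, 1, 1), not contiguous
y = x.contiguous(memory_format=torch.channels_last)

# Both flags end up True because every stride check degenerates on the
# size-1 spatial dimensions, so a fallback-to-NCHW policy in
# suggest_memory_format cannot preserve the NHWC intent for NC11 tensors.
print(x.is_contiguous())                                   # False
print(y.is_contiguous())                                   # True
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```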

@pytorchbot added the module: internals (Related to internal abstractions in c10 and ATen) label Aug 8, 2019
@jjsjann123 (Collaborator, Author)

ROCm doesn't seem to support nhwc output. :/
Should I disable nhwc in the ROCm tests, and if so, how?
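One possible way to guard such tests (a hedged sketch, not necessarily how this PR resolved it) is to detect ROCm via torch.version.hip and skip the channels-last cases there; the test name and shapes below are illustrative:

```python
import unittest
import torch

# torch.version.hip is a version string on ROCm builds and None otherwise.
IS_ROCM = torch.version.hip is not None

class TestCudnnNhwc(unittest.TestCase):
    @unittest.skipIf(IS_ROCM, "channels-last (nhwc) path not supported on ROCm")
    def test_conv2d_channels_last(self):
        conv = torch.nn.Conv2d(8, 16, kernel_size=3, padding=1).cuda()
        conv = conv.to(memory_format=torch.channels_last)
        x = torch.randn(2, 8, 16, 16, device="cuda").to(memory_format=torch.channels_last)
        out = conv(x)
        self.assertTrue(out.is_contiguous(memory_format=torch.channels_last))

if __name__ == "__main__":
    unittest.main()
```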

@apaszke previously requested changes Aug 9, 2019
Review thread on aten/src/ATen/core/Tensor.h (outdated, resolved)
@VitalyFedyunin (Contributor)

Looks good; let's hold off on merging until we land the controls from #23899.

@VitalyFedyunin (Contributor)

It fails during the backward pass on the #25102 branch, here:

Workspace workspace = chooseAlgorithm(args, benchmark, &bwdDataAlgPerf);

With inputs:

```
weight       strides [512, 1, 1, 1]           sizes [2048, 512, 1, 1]  contiguous 1  channels last 1
grad_output  strides [100352, 1, 14336, 2048] sizes [64, 2048, 7, 7]   contiguous 0  channels last 1
grad_input   strides [25088, 1, 3584, 512]    sizes [64, 512, 7, 7]    contiguous 0  channels last 1
Padding [0, 0]
stride [1, 1]
dilation [1, 1]
groups 1
benchmark 1
deterministic 0
```
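For context, a hypothetical repro consistent with the shapes in that log (the layer, batch size, and memory formats below are guesses, not the actual failing test):

```python
import torch
import torch.nn as nn

# From the log: input (64, 512, 7, 7), weight (2048, 512, 1, 1),
# grad_output (64, 2048, 7, 7), all in channels-last layout.
conv = nn.Conv2d(512, 2048, kernel_size=1, bias=False).cuda()
conv = conv.to(memory_format=torch.channels_last)

x = torch.randn(64, 512, 7, 7, device="cuda", requires_grad=True)
out = conv(x.to(memory_format=torch.channels_last))
out.sum().backward()   # exercises the cudnn backward-data/backward-weight paths
```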

@jjsjann123 (Collaborator, Author)

The failed test seems to be related to some core changes. It looks like cuda.comm is calling is_contiguous on a sparse tensor. I don't see it in other changes; I'll merge master to see if it goes away.

@jjsjann123 (Collaborator, Author)

As @VitalyFedyunin mentioned earlier, the BC breakage from the additional arguments required by the cudnn Ex API is getting flagged by the CI test:

```
Oct 30 07:17:39 processing existing schema:  aten::_batch_norm_impl_index(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor, Tensor, Tensor, int)
Oct 30 07:17:39 Can NOT find backward compatible schemas after changes for schema aten::_batch_norm_impl_index(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor, Tensor, Tensor, int) from the following candidates:
Oct 30 07:17:39 [
Oct 30 07:17:39 aten::_batch_norm_impl_index(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor, Tensor, Tensor, Tensor, int)
```

How do we want to proceed here?

@VitalyFedyunin (Contributor)

Let me verify; adding arguments should not trigger the BC breakage tests, as it is (usually) only FC-breaking.

@VitalyFedyunin (Contributor)

Can you please rebase onto master?

@VitalyFedyunin (Contributor)

Also you need to whitelist _batch_norm_impl_index and _batch_norm_impl_index_backward inside of check_backward_compatibility.py
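For reference, a hedged sketch of what such whitelist entries might look like, assuming the file keeps a list of (schema name, expiration date) tuples; the exact file layout and dates here are assumptions, not the actual change:

```python
# Hypothetical excerpt from check_backward_compatibility.py
import datetime

white_list = [
    # ... existing entries ...
    ('aten::_batch_norm_impl_index', datetime.date(2019, 12, 1)),
    ('aten::_batch_norm_impl_index_backward', datetime.date(2019, 12, 1)),
]
```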

@jjsjann123 (Collaborator, Author)

Will do.
BTW, the failure from calling is_contiguous on a sparse tensor looks real. Is there a PR in your pipeline that fixes it?

@jjsjann123 (Collaborator, Author)

Quick note to myself:
The current Conv implementation respects the layout of the input (the layout of the input tensor determines the layout of the output tensor).

I should update this behavior in a future PR so that the layout of the weight is dominant instead. That would make it easier to convert a model from NCHW to NHWC.

@VitalyFedyunin (Contributor)

> I should update this behavior in a future PR so that the layout of the weight is dominant instead. That would make it easier to convert a model from NCHW to NHWC.

I recommend adding an expectedFailure test for it, which will cover the edge case and explain what we are trying to achieve.
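A sketch of what such an expectedFailure test could look like (not the test actually added in this PR; names and shapes are illustrative). It encodes the intended future behavior that the weight's layout, not the input's, determines the output layout:

```python
import unittest
import torch
import torch.nn as nn

class TestConvMixedLayout(unittest.TestCase):
    @unittest.expectedFailure
    def test_weight_layout_dominates(self):
        # Intended future behavior: channels-last weight + NCHW input
        # should produce a channels-last output.
        conv = nn.Conv2d(8, 16, kernel_size=3, padding=1).cuda()
        conv = conv.to(memory_format=torch.channels_last)   # nhwc weight
        x = torch.randn(4, 8, 32, 32, device="cuda")         # NCHW-contiguous input
        out = conv(x)
        self.assertTrue(out.is_contiguous(memory_format=torch.channels_last))

if __name__ == "__main__":
    unittest.main()
```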

@VitalyFedyunin dismissed their stale review October 31, 2019 15:20

Sorry, wrong button; I still need to figure out what is going on with Sparse and how it even got affected.

1. fixing BC compatibility check date;
2. fixing is_contiguous call on sparse tensor;
3. added expectedFailure test to show intended behavior for Conv2d layer with
   mixed input/weight layouts;
@facebook-github-bot (Contributor) left a comment:

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin (Contributor)

Everything looks good; the only thing we want to add is a TORCH_WARN_ONCE warning if we dispatch a channels-last tensor to the old cudnn function.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 4, 2019
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:
1. tensor with NCHW where C == 1.
   We could use the stride of C as a hint to infer the intended memory format.
2. tensor with NCHW where H == W == 1.
   There is no way to identify the intended memory format from the strides.

Currently we fall back to NCHW whenever we see a contiguous tensor, which avoids
the ambiguity in these special cases.
Pull Request resolved: pytorch/pytorch#23861

Differential Revision: D18263434

Pulled By: VitalyFedyunin

fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
@facebook-github-bot (Contributor)

@VitalyFedyunin merged this pull request in 8160f39.

@gchanan (Contributor) commented Nov 5, 2019

@jjsjann123 can you not put the title of the PR in brackets "[]"? It doesn't display in the GitHub UI (e.g. https://github.com/pytorch/pytorch/commits/master); it only shows the commit number, which makes it much more difficult to figure out what changes are involved.

@jjsjann123 (Collaborator, Author)

@gchanan noted. I'll update that for my other in-flight PR and do so for my future PRs.

VitalyFedyunin added a commit to VitalyFedyunin/pytorch that referenced this pull request Nov 6, 2019
```python
import torch

x = torch.randn(192, 16, 50).cuda()
# After permute -> contiguous -> permute, x has shape (192, 16, 50) with
# strides (800, 1, 16): channels are the fastest-moving dimension, so the
# tensor is no longer NCHW-contiguous.
x = x.permute(0, 2, 1).contiguous().permute(0, 2, 1)
m = torch.nn.Conv1d(
    in_channels=16,
    out_channels=32,
    kernel_size=2,
    bias=True,
).cuda()

m(x)
```

This reverts commit 8160f39.
facebook-github-bot pushed a commit that referenced this pull request Nov 7, 2019
Summary:
Broken case:

```python
x = torch.randn(192, 16, 50).cuda()
x = x.permute(0, 2, 1).contiguous().permute(0, 2, 1)
m = torch.nn.Conv1d(
    in_channels=16,
    out_channels=32,
    kernel_size=2,
    bias=True,
).cuda()

m(x)
```

This reverts commit 8160f39.
Pull Request resolved: #29329

Differential Revision: D18357674

Pulled By: VitalyFedyunin

fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245