Memory Format support for Resnet models #23403
CC @ezyang @ifedan @zou3519 @csarofeen @dzhulgakov @soumith
Question about the reshape / view comment above: does this perform any permutation on the data to put it in NCHW? I suppose not, as that would not be very intuitive.
At this moment I changed …
Exactly the same applies to AlexNet.
What if we replace the occurrences of `reshape` with `flatten`, and leave the possibility to specify some kind of memory format in `flatten` itself?
Well, ultimately … The good news is: …
To be honest, this point is the most questionable one, but at the same time the easiest to solve one way or another, and we can move it out of the performance test task. I just wanted to write it down, as the question might come up.
`flatten` is a subset of `reshape`: it only merges dimensions together, and can't create new dimensions. I think it can be very mysterious to users to switch from … But I need to think a bit more about all this.
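For concreteness, a small sketch of that distinction (the shapes here are arbitrary):

```python
import torch

x = torch.randn(2, 3, 4)

# flatten can only merge a contiguous range of existing dimensions...
y = x.flatten(start_dim=1)   # shape (2, 12)

# ...while reshape can also split dimensions or create new ones,
# which flatten cannot express.
z = x.reshape(2, 2, 6)       # shape (2, 2, 6)
```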
I would be very careful about … Why should we bother with layouts in …?
I'm not worried about the …
The worst case for memory formats is that there will be an extra transpose and copy. Memory formats can never give wrong results.
@soumith I think I misunderstood what was going to happen.
So now inside of … In this case, there will indeed be no problems for the case I was thinking of. But this could also be confusing to users, as now the semantics of … But if the solution that we are proposing is to keep the behavior of …

Imagine the two situations:

```python
x = x.contiguous().reshape(x.shape[0], -1)
# or
x = x.reshape(x.shape[0], -1)
```

Currently the two are equivalent and one can just use the latter. But with memory layouts, this doesn't hold true anymore:

```python
x = x.contiguous().reshape(x.shape[0], -1)
# is equivalent to
x = x.contiguous(memory_format=torch.contiguous_format).reshape(x.shape[0], -1)
# but not to
x = x.reshape(x.shape[0], -1)
```
@fmassa …
Regarding pytorch/aten/src/ATen/native/TensorShape.cpp, lines 360 to 362 in 7e31c02: if layout-preserving `.clone()` also preserves zero strides, then gradient accumulation will break here. As far as I can tell at a quick glance, SpectralOps rely on the output of `.clone()` being contiguous.
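A minimal sketch of the concern; an expanded tensor stands in here for any zero-stride gradient buffer:

```python
import torch

# An expanded tensor has stride 0 along the broadcast dimension,
# so all four rows alias the same storage.
base = torch.zeros(1, 3)
expanded = base.expand(4, 3)
print(expanded.stride())  # (0, 1)

# Today .clone() materializes a dense, contiguous copy, so in-place
# accumulation into the clone touches 12 independent elements:
buf = expanded.clone()
buf.add_(torch.ones(4, 3))
print(buf.sum().item())  # 12.0

# If a layout-preserving clone kept the zero stride, the rows of buf
# would share memory and the accumulation would double-count.
```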
Thanks @ngimel, hence my hacky implementation was preserving …
Just to put it out there for tracking who's working on what, so that we are not duplicating effort: I'm currently looking at the last two things on the list.
As discussed with Vitaly, I'm taking on these now:
Moving progress tracking to, and extending the scope in, #28619.
We define a 4D tensor as stored in the channels last memory format when the dimension order is NCHW and

C-strides < W-strides < H-strides < N-strides

(if the size of any dimension equals 1, that dimension's stride value is not taken into account). A channels last contiguous tensor is a channels last tensor which occupies a contiguous memory block. So

`x.is_contiguous(memory_format=torch.channels_last)`

checks whether a tensor is channels last contiguous.

The goal of the experiment is to use the channels last memory format in all operators of the Resnet models (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) and to measure the performance gains on Volta devices with the cuDNN library available.
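For concreteness, a minimal sketch of what this stride ordering looks like in practice (the sizes are arbitrary and only illustrative):

```python
import torch

x = torch.randn(2, 3, 4, 5)                          # NCHW, default contiguous layout
y = x.contiguous(memory_format=torch.channels_last)  # same sizes, NHWC memory layout

print(x.stride())  # (60, 20, 5, 1) -> N > C > H > W strides
print(y.stride())  # (60, 1, 15, 3) -> C < W < H < N strides
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```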
This experiment requires us to avoid changing the model itself and, more importantly, to bring this optimization to existing saved models. We need to introduce the following changes:
- `to` operator should preserve memory format.
- `copy_device_to_device` should be memory format aware. (`empty_like`, `to`, `resize_as_` and `clone` now preserve memory format: #23899)
- `empty_like` operator should preserve memory format by default. (#23899)
- `resize_as_` operator should be memory format aware. (#23899)
- `clone` operator should preserve memory format. (#23899)
- `scatter` and `gather` functions should be memory format aware. ([WIP] Scatter gather memory format: #24121)
- `TensorIterator` based point-wise operators should preserve memory format. ([WIP] Add Tensor Iterator and some cuda functions memory propagation: #24038)
- `adaptive_avg_pool2d_cuda` and `adaptive_avg_pool2d_backward_cuda` should have channels last optimized kernels. (nhwc support for adaptive_avg_pool2d & adaptive_avg_pool2d_backward: #24396)
- `max_pool2d_with_indices_cuda` and `max_pool2d_with_indices_backward_cuda` should have channels last optimized kernels. (max_pool2d cuda should have channels last optimized kernels [performance improvement]: #24872)
- `cudnn_batch_norm` and `cudnn_batch_norm_backward` should support the channels last memory format. (cudnn nhwc support: #23861)
- `cudnn_convolution_forward` and `cudnn_convolution_backward` should support the channels last memory format. (cudnn nhwc support: #23861)

Writing memory format aware operators requires the special functions introduced in #23391.
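Putting the list above together, a rough sketch of the kind of end-to-end run this experiment targets; torchvision's ResNet-50, a CUDA/cuDNN build, and the `Module.to(memory_format=...)` overload from current PyTorch are assumptions here, not taken from the issue itself:

```python
import torch
import torchvision

# Convert both the model's weights and the input to channels last, then run
# a forward pass; with NHWC-capable cuDNN kernels this is the path measured.
model = torchvision.models.resnet50().cuda().to(memory_format=torch.channels_last)
x = torch.randn(8, 3, 224, 224, device="cuda").contiguous(memory_format=torch.channels_last)

with torch.no_grad():
    out = model(x)

print(out.shape)  # torch.Size([8, 1000])
```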
Notes
- Since the model does `x = x.reshape(x.size(0), -1)` before the linear layers, we are going to update the `reshape` and `view` code and convert the tensor's memory format to `torch.contiguous_format` at this step.
- `empty_like` (and all `_like` operators) will return a channels last tensor if the input is channels last; the same will apply to `to`, `clone`, and `resize_as`. We are thinking about the ability to control the `suggest_memory_format` behavior via a global variable. (A small illustration of the preserve-format behavior follows below.)
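A sketch of the preserve-by-default behavior described in the last note, as it works in current PyTorch (where `torch.preserve_format` is the default `memory_format` for the `_like` operators and `clone`):

```python
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

# The default memory_format=torch.preserve_format keeps the channels last layout.
y = torch.empty_like(x)
print(y.is_contiguous(memory_format=torch.channels_last))  # True

# An explicit memory_format overrides the default.
z = torch.empty_like(x, memory_format=torch.contiguous_format)
print(z.is_contiguous())  # True

# clone() accepts the same keyword and copies the input's strides for dense tensors.
w = x.clone(memory_format=torch.preserve_format)
print(w.stride() == x.stride())  # True
```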