
[discussion] Smarter version of torch.reshape (can avoid realloc in some cases) #28090

Open
vadimkantorov opened this issue Oct 16, 2019 · 8 comments
Labels
function request: A request for a new function or the addition of new arguments/modes to an existing function.
module: viewing and reshaping
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.

Comments

@vadimkantorov
Contributor

vadimkantorov commented Oct 16, 2019

Imagine the following situation:

import torch

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)
# someltwiseinplacefunc is a placeholder for any element-wise in-place op
b_ = someltwiseinplacefunc(a_.reshape(2, -1))  # reshape seems to reallocate (checked by data_ptr); view will error out
b = b_.view_as(a)  # or b_.view_as(a_) (currently view_as copies the dimension order, disregarding the strides, though)

Currently a_.reshape(2, -1) seems to reallocate, but that is not necessary for all operations, especially element-wise ones (given that one will frequently reshape the result back anyway). Situations like this come up in C++ code sometimes. I guess the current solution is manual handling of strides.

If it doesn't reallocate, the semantics are different, but it may still be worthwhile to allow flattening contiguous chunks of memory without reallocation, ignoring the stride-imposed dimension order.
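
For reference, a quick way to check the reallocation behaviour described above, using data_ptr as in the comment (a small sketch, not part of the original post):

import torch

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)                   # non-contiguous view of a, shape (2, 4, 3)

flat = a_.reshape(2, -1)                   # has to copy: the flattened dims are not contiguous in memory
print(flat.data_ptr() == a_.data_ptr())    # False -> a new buffer was allocated
# a_.view(2, -1) would raise a RuntimeError for the same reason

c = a.reshape(2, -1)                       # a itself is contiguous, so reshape returns a view
print(c.data_ptr() == a.data_ptr())        # True -> no copy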

@smessmer added the module: operators and triaged labels Oct 17, 2019
@vadimkantorov
Contributor Author

As far as I understand, part of the question is the semantics of view_as (does it take strides into account or not), and the accompanying question of the treatment of -1 (whether C-contiguity of dimensions is enforced regardless of the strides).

@albanD
Collaborator

albanD commented Oct 24, 2019

Hi,

  • View definitely does not take strides into account. A view that takes strides into account is a transpose, right?
  • The -1 is simply computed so that the number of elements in the Tensor remains the same: nb_elements / (product of all specified dims) (see the short example after this list).
  • I see your point that reshape in this case could be smarter and realize that a reallocation is not necessary, but I'm not sure how complex this new detection algorithm would be.
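
A quick illustration of the second point (added for clarity, not from the original comment):

import torch

t = torch.rand(2, 3, 4)        # 24 elements
r = t.reshape(2, -1)           # the -1 is inferred as 24 / 2 = 12
print(r.shape)                 # torch.Size([2, 12])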

@vadimkantorov
Contributor Author

  • Not sure. If view_as takes strides into account, it becomes something else, more like some new stride_as (which would do as_strided(other.size(), other.stride())); see the sketch after this list.
  • How does reshape decide whether it needs to realloc or not for the -1 case? I think it checks for C-contiguity in the sense that the last dimension should have stride 1 (i.e. after viewing back with the same dimensions we should get the logically same tensor; in my example that's not true, unless one takes strides into account).
  • I think it may be even better to have another method, or a keyword argument to reshape / reshape_as, that would take strides into account (and have slightly different semantics w.r.t. memory contiguity and dimension order).
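
A minimal sketch of what such a stride_as could look like, expressed with the existing as_strided (the helper name is illustrative, not an actual API):

import torch

def stride_as(t, other):
    # Hypothetical helper: reinterpret t's storage with other's shape and strides.
    return t.as_strided(other.size(), other.stride())

a = torch.rand(2, 3, 4)
a_t = a.transpose(-1, -2)          # shape (2, 4, 3), strides (12, 1, 4)
b = torch.rand(24)                 # a flat buffer with the same number of elements
print(stride_as(b, a_t).shape)     # torch.Size([2, 4, 3])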

@vadimkantorov
Contributor Author

E.g. currently (a_ == a_.reshape(...).reshape_as(a_)).all() == True. If reshape is made smarter without also making reshape_as smarter, this invariant will break.
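
Written out as a runnable check for the transposed example above (a small sketch):

import torch

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)                        # shape (2, 4, 3), non-contiguous
roundtrip = a_.reshape(2, -1).reshape_as(a_)    # copy on the way out, view on the way back
print((a_ == roundtrip).all())                  # tensor(True)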

@albanD
Collaborator

albanD commented Oct 24, 2019

Ah, your point was about view_as. Yes, it is currently just a shortcut for view: b.view_as(a) == b.view(a.size()).
More generally, the *_as() functions always ignore strides.

The related existing attempt is the new memory format system, which more and more functions support and which is preserved by some operations.
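
For reference, a short sketch of that memory format mechanism (added for illustration, not part of the original comment):

import torch

x = torch.rand(1, 3, 8, 8)
y = x.contiguous(memory_format=torch.channels_last)   # same logical shape, NHWC layout underneath
print(y.shape)       # torch.Size([1, 3, 8, 8])
print(y.stride())    # (192, 1, 24, 3): channels-last strides
z = torch.relu(y)    # many element-wise ops preserve the memory format
print(z.is_contiguous(memory_format=torch.channels_last))   # True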

And I'm going to go back on what I said: in your case, the reshape cannot be done without a copy:

  • If the view were allowed, it would give you the wrong result (the elements would not be properly ordered along the new dimension).
  • The Tensor that reshape must return only has two sizes and two strides (it is 2D), and no choice of strides can map the requested element order onto the existing storage. So there is no way to make this work in place.

I get your point that if you only do this to perform an element-wise in-place operation, this is sub-optimal.
But in that case, could you not apply the element-wise op without the reshape?
Do you have a concrete example where this forces you to make a copy (whichever way you rearrange your code) although you could, in theory, do without the copy?
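
For instance, a small sketch of that suggestion:

import torch

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)   # non-contiguous view
a_.mul_(2)                 # in-place element-wise ops work directly on the non-contiguous view,
                           # with no reshape and no copy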

@vadimkantorov
Contributor Author

vadimkantorov commented Oct 24, 2019

I think we are saying the same thing: the existing reshape / reshape_as have semantics that do not use strides. If one uses strides to be smart and avoid the realloc, the semantics are different (so maybe a new name, or at least a kwarg, would be needed).

In practice I stumbled upon this in https://github.com/pytorch/pytorch/pull/27621/files#diff-7541a840cb6478a343d371533cd95a7aL678, which does a reshape in order to do element-wise ops later (and then reshapes back at the very end). It does the reshape to simplify the code and always deal with only one dimension. If the input is transposed and is not logically contiguous in the dimensions being flattened, a copy will happen.

If these new semantics existed, some methods that use reshape for convenience (or for generalising 1d/2d/3d ops) would probably support the other memory format with fewer reallocations.

@albanD
Collaborator

albanD commented Oct 24, 2019

Is the following what you want?

import torch

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)
a_strided_view = a_.as_strided((2, 12), (12, 1))  # Note that the data inside a_strided_view are impossible to interpret except element-wise!
b_ = someltwiseinplacefunc_(a_strided_view)  # This has to be an in-place operation.
# If it returns a new contiguous Tensor, the next line will output nonsense.
b = b_.as_strided(a_.size(), a_.stride())  # Make these data sensible again by re-using the proper strides.

The problem with this approach is that you need to make sure that the intermediary function does not do any copy, otherwise the result won't be correct.
I can't really think of an API that would allow users to do this "safely". Do you have a proposal?

@vadimkantorov
Contributor Author

vadimkantorov commented Oct 25, 2019

A scope reduction: I mean only taking strides into consideration when the flattened area is in fact contiguous in memory. If we can check that it is contiguous, then only the order of the dimensions matters (not the strides' actual values), i.e. we can sort the dimensions by strides first, then do a conventional view, then apply whatever operation we wish (including a copying one), then reshape back, then permute back.

bad pseudo code:

# if the dimensions that we want to flatten with -1 are in fact contiguous in memory,
# we can permute them into stride order, and then the view should succeed

b_ = a_.permute(0, 2, 1)                  # sort dims by decreasing stride; for the (2, 3, 4) example this is contiguous again
b__ = b_.view(2, -1)                      # succeeds without a copy
c = somefunctionincludingcopying(b__)     # any op, even a copying one
c_ = c.view_as(b_).permute(0, 2, 1)       # view back, then restore the dimension order of a_

I'll try to write a better snippet.
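
For reference, one shape such a snippet could take: a sketch with illustrative helper names, assuming the flattened memory really is contiguous (not taken from the thread):

import torch

def flatten_by_stride(t):
    # Sort dimensions by decreasing stride so that, if the underlying memory
    # region is contiguous, .view(-1) succeeds without a reallocation.
    order = sorted(range(t.dim()), key=lambda d: t.stride(d), reverse=True)
    permuted = t.permute(order)
    return permuted.view(-1), order, permuted.shape

def unflatten_by_stride(flat, order, permuted_shape):
    # Undo the flattening: view back to the permuted shape, then invert the permutation.
    inverse = [order.index(d) for d in range(len(order))]
    return flat.view(permuted_shape).permute(inverse)

a = torch.rand(2, 3, 4)
a_ = a.transpose(-1, -2)                     # non-contiguous view, shape (2, 4, 3)
flat, order, shape = flatten_by_stride(a_)
print(flat.data_ptr() == a_.data_ptr())      # True: no reallocation happened
flat.mul_(2)                                 # element-wise op in place
restored = unflatten_by_stride(flat, order, shape)
print(restored.shape == a_.shape, (restored == a_).all())   # True tensor(True)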
