
Conversation


@VitalyFedyunin VitalyFedyunin commented Jul 18, 2019

API operators are now routed to `at::native::resize_as_*_` and `at::native::clone` accordingly.
The internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remain to support older TH code.

@pytorchbot pytorchbot added module: cpu CPU specific problem (e.g., perf, algorithm) module: cuda Related to torch.cuda, and CUDA support in general module: internals Related to internal abstractions in c10 and ATen module: operators labels Jul 18, 2019
@VitalyFedyunin VitalyFedyunin changed the title [WIP] Port resize_as_ from TH to Aten [WIP] Port resize_as_ and clone from TH to Aten Jul 18, 2019

@facebook-github-bot facebook-github-bot left a comment


@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin VitalyFedyunin changed the title [WIP] Port resize_as_ and clone from TH to Aten Port resize_as_ and clone from TH to Aten Jul 18, 2019
@VitalyFedyunin VitalyFedyunin requested review from gchanan and izdeby July 18, 2019 18:30

izdeby commented Jul 18, 2019

> Internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remain to support older TH code.

Is it possible to replace those calls with the ATen implementation?

return self;
}

Tensor& resize_as_cpu_(Tensor& self, const Tensor& the_template) {
Contributor


I know that it's not your change, but I'm wondering why this file isn't located in native/cpu, following the CUDA example.

Contributor Author


native/cpu is compiled three times: for baseline CPUs, with AVX, and with AVX2. This code cannot be optimized with AVX.

{
THCTensor *tensor = THCTensor_(new)(state);
THCTensor_(resizeAs)(state, tensor, self);
at::Tensor tensor_wrap = THTensor_wrap(tensor);
Contributor


I see that the CPU version does wrapping; was it a bug that the CUDA version wasn't doing it?

Contributor Author


Probably just partial porting, as

void THCTensor_(copy)(THCState* state, THCTensor* dst, THCTensor* src) {

does wrapping. These will eventually go away as we deprecate all call sites.

Contributor


I'm a bit confused about what the strategy is here.

You don't just dispatch to at::resize_as in THCTensor_(resizeAs), but you do call at::resize_as here.

Contributor Author


Let me follow up with two more PRs, completely removing the TH[C]Tensor_(resizeAs) and TH[C]Tensor_(newClone) code. I'm just afraid that having hundreds of lines of code updates here would be hard to review.

Contributor


Ya, that's totally fine; I just wanted to understand the strategy.


VitalyFedyunin commented Jul 18, 2019

> Is it possible to replace those calls with the ATen implementation?

Yes, but it is about 100 call sites with additional wrapping. TH is already beaten up, have mercy!
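To make the "additional wrapping" concrete, here is a minimal self-contained sketch of the wrap-and-forward pattern under discussion. `MockTensor`, `at_native_resize_as_` and `th_resize_as` are illustrative stand-ins, not actual PyTorch APIs:

```python
class MockTensor:
    """Stand-in for at::Tensor (illustrative only)."""
    def __init__(self, sizes):
        self.sizes = list(sizes)


def at_native_resize_as_(self_t, template_t):
    """Stand-in for the new ATen-side implementation: adopt the template's sizes."""
    self_t.sizes = list(template_t.sizes)
    return self_t


def th_resize_as(self_t, src_t):
    """Legacy-style entry point: wrap the old handle, then forward to ATen.

    In the real C++ code each legacy call site needs a THTensor_wrap()
    before it can call the ATen implementation -- the 'additional
    wrapping' mentioned in the reply above.
    """
    self_wrap = self_t   # in real code: THTensor_wrap(self)
    src_wrap = src_t     # in real code: THTensor_wrap(src)
    return at_native_resize_as_(self_wrap, src_wrap)


a = MockTensor([3])
b = MockTensor([2, 4])
th_resize_as(a, b)
print(a.sizes)  # [2, 4]
```

Multiplied across roughly 100 legacy call sites, this per-call wrapping is the review burden being deferred to the follow-up PRs.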

@VitalyFedyunin VitalyFedyunin requested a review from izdeby July 18, 2019 21:15
@VitalyFedyunin

Fixed the failing tests. @izdeby might be interested to take a look.

}

Tensor& resize_as_cpu_(Tensor& self, const Tensor& the_template) {
return resize_cpu_(self, the_template.sizes());
Collaborator


does this happen to fix #11665?

Contributor Author


After porting, this now works:

In [1]: torch.randn(3).resize_as_(torch.ones(2, dtype=torch.uint8))
Out[1]: tensor([-0.5847,  1.3565])
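The semantics demonstrated there can be sketched with a plain Python stand-in (`MockTensor` here is hypothetical, not torch): `resize_as_` adopts only the template's sizes, leaving the dtype of `self` untouched, which is why a float tensor can be resized against a uint8 template:

```python
class MockTensor:
    """Illustrative stand-in for a tensor carrying sizes and a dtype."""
    def __init__(self, sizes, dtype):
        self.sizes = list(sizes)
        self.dtype = dtype

    def resize_as_(self, template):
        # Only the template's sizes are adopted; dtype is unchanged.
        self.sizes = list(template.sizes)
        return self


x = MockTensor([3], "float32")
t = MockTensor([2], "uint8")
x.resize_as_(t)
print(x.sizes, x.dtype)  # [2] float32
```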


void THTensor_(resizeAs)(THTensor *self, THTensor *src)
{
// already available in Aten as at::resize_as_()
Contributor


why not call resize_as_ here?

Contributor Author


Well, because I had no reason to change this function; we are going to nuke it altogether as the call sites go away.

y = x.clone()
if (device == 'cuda' and dt == torch.bfloat16):
    self.assertRaises(RuntimeError, lambda: x.clone())
    self.assertRaises(RuntimeError, lambda: self.assertEqual(x, y))
Contributor


Can you explain what's going on here? I presume this assert was there to check that clone threw an exception with cuda+bfloat16, but now it's throwing an exception on the comparison? That seems weird.

Contributor Author


This is a funny one: assertEqual is not yet functional for bfloat16, but because I fixed x.clone(), I had to update the assertRaises piece.

Contributor


Right, the first part makes sense: the assert is there so you have to update the test, which makes sure things actually get tested (otherwise these exceptional cases just live in the code forever and we never get test coverage).

But the second part doesn't make sense to me: why does assertEqual throw?

Contributor Author


Because x - y throws =)
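The shape of that situation can be reproduced with a self-contained stand-in (`MockBF16Tensor` and `assert_tensors_equal` are hypothetical, not torch APIs): the equality check subtracts element-wise under the hood, so when subtraction is unimplemented for the dtype, the comparison itself raises and must be wrapped in assertRaises:

```python
import unittest


class MockBF16Tensor:
    """Stand-in for a tensor whose dtype lacks a subtraction kernel."""
    def __sub__(self, other):
        raise RuntimeError("sub not implemented for bfloat16")


def assert_tensors_equal(x, y):
    # assertEqual-style comparison: computes x - y under the hood,
    # which is exactly the op that throws for this dtype.
    return x - y


class CloneBF16Test(unittest.TestCase):
    def test_compare_raises(self):
        x, y = MockBF16Tensor(), MockBF16Tensor()
        # clone itself now works, but comparing the result still raises,
        # so the assertEqual call is itself wrapped in assertRaises:
        self.assertRaises(RuntimeError, lambda: assert_tensors_equal(x, y))


result = unittest.main(exit=False, argv=["mock_test"], verbosity=0).result
print(result.wasSuccessful())  # True
```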

Contributor


Ah, I think @izdeby has a fix for that in another PR (#22851). There might be a conflict between those two; just a heads up.

@VitalyFedyunin VitalyFedyunin requested a review from gchanan July 25, 2019 15:46

@VitalyFedyunin VitalyFedyunin added better-engineering Relatively self-contained tasks for better engineering contributors module: porting Issues related to porting TH/THNN legacy to ATen native labels Jul 25, 2019
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 30, 2019
Summary:
API operators are now routed to `at::native::resize_as_*_` and `at::native::clone` accordingly.
The internal `THTensor_(resizeAs)`, `THCTensor_(resizeAs)`, `THTensor_(newClone)` and `THCTensor_(newClone)` remain to support older TH code.
Pull Request resolved: pytorch/pytorch#23027

Differential Revision: D16362304

Pulled By: VitalyFedyunin

fbshipit-source-id: 4c1e8516da685f3fdea632ff791d143f27aeebeb
@facebook-github-bot

@VitalyFedyunin merged this pull request in 401fbb0.
