use codegen'd inplace kernels, and delete manually written inplace kernels when possible #2962
Conversation
Branch force-pushed from 9e91c22 to a285fcb.
Branch force-pushed from 49791eb to c223040.
Branch force-pushed from a285fcb to dc43fe2.
Mostly LGTM, some minor questions.
test/cpp/test_tensor.cpp (Outdated)
at::Tensor input = at::zeros({32, 20, 4, 4}, at::TensorOptions(at::kFloat));
at::Tensor one = at::tensor(1.0, at::TensorOptions(at::kFloat));
- at::Tensor output = input.view({-1, 320});
+ at::Tensor output = input.view({-1, 8});
Out of curiosity, why change 320 -> 8? Is it just a performance issue?
yeah that was a mistake, I changed it when I was debugging and forgot to put it back 😛 it's fixed now.
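For reference, the shape arithmetic behind the original question (a standalone ATen snippet for illustration only, not part of the PR's test changes):

```cpp
#include <ATen/ATen.h>

int main() {
  // {32, 20, 4, 4} holds 32 * 20 * 4 * 4 = 10240 elements, so the two
  // views in the diff above produce different shapes:
  at::Tensor input = at::zeros({32, 20, 4, 4}, at::TensorOptions(at::kFloat));
  at::Tensor a = input.view({-1, 320});  // shape {32, 320}
  at::Tensor b = input.view({-1, 8});    // shape {1280, 8}
  return 0;
}
```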
Branch force-pushed from dc43fe2 to 5990200.
Thanks @bdhirsh !
Branch force-pushed from f60faf9 to c967eb1 (use codegen'd inplace kernels, and delete manually written inplace kernels when possible).
This PR deletes the inplace implementations of all lowerings that can be auto-generated from the codegen, removing ~1000 LoC. It applies to any operator that XLA already has a functional lowering for (e.g. `add.Tensor`), and that has a "trio" of operators in the PyTorch codebase (in this case, `add.Tensor`, `add_.Tensor`, and `add.out` are all valid PyTorch ops). For example, it doesn't apply to `relu_`, because we don't have a `relu.out` operator in PyTorch.
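For illustration, here is roughly what that trio looks like at the ATen call level (an informal sketch, not code from this PR; the tensors and the `main` harness are just placeholders):

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor a = at::ones({4});
  at::Tensor b = at::ones({4});
  at::Tensor c = at::empty({4});

  // The "trio" of add overloads that makes a codegen'd inplace kernel possible:
  at::Tensor out = at::add(a, b);  // functional form: add.Tensor
  a.add_(b);                       // inplace form:    add_.Tensor
  at::add_out(c, a, b);            // out= form:       add.out

  // relu only has functional and inplace forms; there is no relu.out,
  // so relu_ is not covered by this codegen path.
  at::Tensor r = at::relu(a);
  a.relu_();
  return 0;
}
```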
@JackCaoG Other than a small test change that I had to make, all of the tests are still passing. One thing that's worth double checking after this lands is that you don't see any major perf divergences in the performance dashboard. I don't think we should expect a perf change, since the generated kernels are all implemented by just calling the functional operator, followed by a call to `at::_copy_from()` to move the result into `self`. But definitely worth confirming.
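As a rough sketch of that pattern (hypothetical code, not the actual generated kernel; the function name and dispatch details are invented for illustration):

```cpp
#include <ATen/ATen.h>

// Hypothetical illustration of what a codegen'd inplace kernel boils down to:
// run the functional lowering, then copy the result back into self.
at::Tensor& generated_add_(at::Tensor& self, const at::Tensor& other,
                           const at::Scalar& alpha) {
  // 1. Call the functional operator (dispatch routes this to the backend's
  //    add.Tensor lowering).
  at::Tensor result = at::add(self, other, alpha);
  // 2. Move the result into self; this is the at::_copy_from() call
  //    mentioned above.
  at::_copy_from(result, self, /*non_blocking=*/false);
  return self;
}
```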
For each inplace lowering, I removed the corresponding code from:
- xla_native_functions.yaml
- aten_xla_type.cpp
- tensor.h
- tensor_methods.cpp
You can see the full list of inplace kernels that I removed in `xla_native_functions.yaml`, but the list is: