Make codegen backend agnostic minus fallbacks #2944
Conversation
Force-pushed from 535ff1a to 6e49287
Force-pushed from e2fe1f4 to 1db52f3
Force-pushed from 6e49287 to fc9f54d
…nus CPU fallback)"

**Summary**

This PR tries to remove all xla-specific logic from the codegen except for two places:

- renaming the `aten_xla_type.h/cpp` template files; going to do that in a separate PR just to make the diff easier to understand
- CPU fallback logic (everything in `aten_xla_type_default.h/cpp` and `gen_external_aten_fallbacks.py`). I'm trying to kill all of that logic in a subsequent PR by making the CPU fallback a boxed kernel (sketched below), so it felt unnecessary to go through it all and remove the xla references here.

**Notable changes**

The xla codegen includes some custom logging in each kernel wrapper, so I added a few new knobs to the external yaml, which we now test. I have a corresponding [xla-side PR](pytorch/xla#2944) with the new yaml changes, which look like this:

```
per_op_log: XLA_FN_TRACK(3)
per_argument_log: TF_VLOG(3)
cpu_fallback_counter: XLA_COUNTER("aten::{name}", 1)
extra_headers: >
  #include <tensorflow/compiler/xla/xla_client/debug_macros.h>
  #include <tensorflow/compiler/xla/xla_client/metrics.h>
  #include <tensorflow/compiler/xla/xla_client/tf_logging.h>
  #include <torch_xla/csrc/function_call_tracker.h>
  #include <torch_xla/csrc/aten_xla_type.h>
  #include <torch_xla/csrc/aten_xla_type_default.h>
```

[ghstack-poisoned]
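For context on the boxed-kernel plan above, here is a minimal sketch of what a single boxed CPU fallback can look like, assuming only the public PyTorch dispatcher APIs; `xla_cpu_fallback` and its body are illustrative stand-ins, not the actual torch_xla implementation.

```cpp
#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/library.h>

// Hypothetical boxed fallback: one kernel handles every op the backend has not
// explicitly lowered, instead of one generated wrapper per op.
void xla_cpu_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // A real fallback would move XLA tensors to CPU, redispatch the op there,
  // and copy the results back; here we only log and redispatch to CPU as-is.
  TORCH_WARN("CPU fallback for ", op.schema().name());
  op.redispatchBoxed(c10::DispatchKeySet(c10::DispatchKey::CPU), stack);
}

// Registering one boxed kernel as the fallback for the entire XLA dispatch key
// is what would replace the per-op logic in aten_xla_type_default.h/cpp.
TORCH_LIBRARY_IMPL(_, XLA, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_cpu_fallback>());
}
```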
- convolution_overrideable
- convolution_backward_overrideable
- _copy_from
- _copy_from_and_resize
Huh, not actually sure why these are showing up in the diff, since they're already there in master.
fixed
Force-pushed from c8c7681 to d2ba37d
`xla_native_functions.yaml` (outdated)
backend: XLA
cpp_namespace: torch_xla
per_op_log: XLA_FN_TRACK(3)
per_argument_log: TF_VLOG(3)
@ailzhang @JackCaoG I'm actually hoping that I can just completely remove this, but I wanted to confirm with you. Right now, the auto-generated `add_out()` kernel for XLA has the same logging behavior as the original XLA codegen, and looks like this:
at::Tensor wrapper_Tensor_add(const at::Tensor &self, const at::Tensor &other,
const at::Scalar &alpha) {
// This is the actual XLA add kernel
return torch_xla::add(self, other, alpha);
}
} // anonymous namespace
at::Tensor &wrapper_out_add_out(const at::Tensor &self, const at::Tensor &other,
const at::Scalar &alpha, at::Tensor &out) {
XLA_FN_TRACK(3);
TF_VLOG(3) << "XLA wrapper_out_add_out :"
<< " self=" << self.toString() << " other=" << other.toString()
<< " out=" << out.toString();
auto wrapper_out_add_out_tmp = wrapper_Tensor_add(self, other, alpha);
at::_copy_from_and_resize(wrapper_out_add_out_tmp, out);
return out;
}
That logging information in the generated `add_out` kernel seems unnecessary though, since it ends up calling directly into `torch_xla::add`, which has its own logging too.
Passing C++ macros directly into yaml so the codegen can plop them into a C++ function is also pretty fragile; what if another backend wants some other, more custom logging?
So I think it would be cleaner to kill that code-generated logging and just expect all logging to be done in the functional kernel lowerings, but I wanted to see what your thoughts are!
I agree that the auto-generated `add_out()` should not have logging and a counter if the actual kernel already has the logging and counter.
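To make that concrete, this is roughly what the generated `out=` wrapper quoted above would reduce to once the codegen'd logging is dropped; it reuses `wrapper_Tensor_add` from the snippet above and is an illustration, not the exact generated output.

```cpp
// Same wrapper as above, minus XLA_FN_TRACK / TF_VLOG: it only adapts the out=
// signature onto the functional kernel, which already does its own logging.
at::Tensor &wrapper_out_add_out(const at::Tensor &self, const at::Tensor &other,
                                const at::Scalar &alpha, at::Tensor &out) {
  auto tmp = wrapper_Tensor_add(self, other, alpha);
  at::_copy_from_and_resize(tmp, out);
  return out;
}
```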
Mostly LGTM, some minor nits.
`OP_LOWERING_GUIDE.md` (outdated)
2. `aten_xla_type.h/.cpp` are entry points of PyTorch to the pytorch_xla world. `aten_xla_type.h` is auto-generated through a combination of `xla_native_functions.yaml` and the PyTorch core `native_functions.yaml` file, and contains declarations for kernels that need to be defined in `aten_xla_type.cpp`. The kernels written here need to construct 'XLATensor' using the input `at::Tensor` and other parameters. The resulting `XLATensor` needs to be converted back to the `at::Tensor` before returning to the PyTorch world.
1. `xla_native_functions.yaml` contains the list of all operators that are lowered. Each operator name must directly match a pytorch operator listed in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml). This file serves as the interface to adding new xla operators, and is an input to PyTorch's [codegen machinery](https://github.com/pytorch/pytorch/blob/master/tools/codegen/gen_backend_stubs.py). It generates the below 3 files: `XLANativeFunctions.h`, `aten_xla_type_default.h`, and `aten_xla_type_default.cpp`
2. `XLANativeFunctions.h` and `aten_xla_type.cpp` are entry points of PyTorch to the pytorch_xla world, and contain the manually written lowerings to XLA for each operator. `XLANativeFunctions.h` is auto-generated through a combination of `xla_native_functions.yaml` and the PyTorch core `native_functions.yaml` file, and contains declarations for kernels that need to be defined in `aten_xla_type.cpp`. The kernels written here need to construct 'XLATensor' using the input `at::Tensor` and other parameters. The resulting `XLATensor` needs to be converted back to the `at::Tensor` before returning to the PyTorch world.
2. `RegisterXLA.cpp` and `RegisterAutogradXLA.cpp` are auto-generated files that register all lowerings to the PyTorch Dispatcher. They also include auto-generated wrapper implementations of `out=` and `inplace` operators.
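As a rough sketch of the two pieces described above — a handwritten lowering that PyTorch dispatches into, and the auto-generated `RegisterXLA.cpp` entry that binds it to the XLA dispatch key — consider the following; `my_xla_add` is a hypothetical stand-in, and a real lowering would go through `XLATensor` rather than the placeholder body shown here.

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Stand-in for a handwritten lowering in aten_xla_type.cpp. A real kernel would
// unwrap the at::Tensor inputs into XLATensors, build the XLA lowering, and
// wrap the result back into an at::Tensor before returning to PyTorch.
at::Tensor my_xla_add(const at::Tensor &self, const at::Tensor &other,
                      const at::Scalar &alpha) {
  return at::add(self.cpu(), other.cpu(), alpha);  // placeholder body only
}

// Roughly what the generated RegisterXLA.cpp does: register the kernel with the
// PyTorch dispatcher under the XLA dispatch key.
TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl("add.Tensor", TORCH_FN(my_xla_add));
}
```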
Could you change `2.` to `3.` and update the order below.
`OP_LOWERING_GUIDE.md` (outdated)
2. `RegisterXLA.cpp` and `RegisterAutogradXLA.cpp` are auto-generated files that register all lowerings to the PyTorch Dispatcher. They also include auto-generated wrapper implementations of `out=` and `inplace` operators.
3. `aten_xla_type_default.h/.cpp` are also auto-generated, and contain our default implementation of the PyTorch operations which simply fall back to the underlying CPU implementation. Functions in here will be used if lowering is not explicitly defined in `xla_native_functions.yaml` + `aten_xla_type.cpp`.
4. `tensor.h` contains the `XLATensor` declarations. These declarations are one to one mapping of the `at::Tensor` nodes we declared in `aten_xla_type.h`
4. `tensor.h` contains the `XLATensor` declarations. These declarations are one to one mapping of the `at::Tensor` nodes we declared in `XLANativeFunctions.h`
Not directly related to your change, but could you update "These declarations are one to one mapping" to something similar to "These declarations are usually one to one mapping"? We sometimes reuse `XLATensor` for multiple AtenXlaType functions.
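A small sketch of that "usually one to one, but sometimes reused" point, assuming a hypothetical shared helper (in torch_xla the shared piece would be an `XLATensor` method rather than the `at::` call used here):

```cpp
#include <ATen/ATen.h>

// Hypothetical shared lowering reused by both aten-facing entry points.
static at::Tensor lowered_add(const at::Tensor &self, const at::Tensor &other,
                              const at::Scalar &alpha) {
  return at::add(self, other, alpha);  // placeholder for the XLATensor lowering
}

at::Tensor add(const at::Tensor &self, const at::Tensor &other,
               const at::Scalar &alpha) {
  return lowered_add(self, other, alpha);
}

at::Tensor &add_(at::Tensor &self, const at::Tensor &other,
                 const at::Scalar &alpha) {
  self.copy_(lowered_add(self, other, alpha));  // in-place variant reuses it
  return self;
}
```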
Fixed the docs. I also removed the logging from the generated inplace/out wrappers, which simplifies some of the codegen and makes the yaml more robust.
Thanks @bdhirsh!
…e kernels when possible
Force-pushed from 49791eb to c223040
Summary: Pull Request resolved: #58064
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D28711095
Pulled By: bdhirsh
fbshipit-source-id: 90a48440f2e865a948184e2fb167ea240ada47bb
This PR removes all xla-specific logic from the pytorch codegen, except for the CPU fallback logic (being worked on in a later PR here). The corresponding pytorch-side changes are pytorch/pytorch#58064 and pytorch/pytorch#58568.

Main changes:

- Renamed `aten_xla_type.h` -> `XLANativeFunctions.h`, which is a little more in line with how we name other in-tree generated files. Also updated the corresponding documentation.
- Removed `InitializeAtenBindings()`, mostly because it looks like legacy (it's a no-op), and it would be an extra hoop to force every external backend to implement the same function.
- Removed the `AtenXlaType` class. XLA lowerings are now just defined under `torch_xla::{op}` (see the sketch after this list).
- Removed the logging from the auto-generated `out`/`inplace` wrappers, but those wrappers just call into the handwritten xla functional lowering, which also has its own logging information.
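As referenced in the list above, a minimal sketch of the `AtenXlaType`-class-to-free-function change; the signatures here are illustrative, and the authoritative declarations are the ones generated into `XLANativeFunctions.h`.

```cpp
#include <ATen/ATen.h>

namespace torch_xla {

// Before: lowerings were declared as static methods on a class, e.g.
//   struct AtenXlaType {
//     static at::Tensor add(const at::Tensor &self, const at::Tensor &other,
//                           const at::Scalar &alpha);
//   };

// After: the same lowering is just a free function in the torch_xla namespace,
// matching the declaration code-generated into XLANativeFunctions.h.
at::Tensor add(const at::Tensor &self, const at::Tensor &other,
               const at::Scalar &alpha);

}  // namespace torch_xla
```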