2 changes: 1 addition & 1 deletion CODEGEN_MIGRATION_GUIDE.md
@@ -52,7 +52,7 @@ When you work on your first few codegens, we generally recommend you to start wi
```
if (!IsSupportedAdaptivePool(XlaHelpers::I64List(self.sizes()),
output_size_list, /*pool_dim=*/3)) {
-return at::native::call_fallback_fn<&xla_cpu_fallback, ATEN_OP(_adaptive_avg_pool3d)>::call(self, output_size);
+return at::native::call_fallback_fn<&xla_fallback, ATEN_OP(_adaptive_avg_pool3d)>::call(self, output_size);
}
```
2. Results in dynamic shape as these ops are WIP and may evolve over time. At some future point, we may bring the ops into codegen.
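The `&xla_fallback` template argument in the snippet above refers to PyTorch/XLA's boxed fallback kernel, declared in `aten_fallback.h` after this rename. As a rough, simplified sketch of the general pattern only (not the repository's actual implementation, which also supports the CUDA fallback path and tracks fallback counters), such a boxed kernel and its dispatcher registration typically look like this:

```cpp
#include <ATen/core/dispatch/Dispatcher.h>
#include <ATen/native/CPUFallback.h>
#include <torch/library.h>

// Boxed kernel: receives any unlowered op plus its arguments on the stack.
void xla_fallback(const c10::OperatorHandle& op, torch::jit::Stack* stack) {
  // Delegate to PyTorch's generic boxed CPU fallback helper.
  at::native::cpu_fallback(op, stack);
}

// Register it as the catch-all kernel for ops under the XLA dispatch key.
TORCH_LIBRARY_IMPL(_, XLA, m) {
  m.fallback(torch::CppFunction::makeFromBoxedFunction<&xla_fallback>());
}
```

The `call_fallback_fn<&xla_fallback, ATEN_OP(...)>::call(...)` helper used in the guide simply routes one specific op through that same boxed kernel.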
4 changes: 2 additions & 2 deletions OP_LOWERING_GUIDE.md
@@ -16,15 +16,15 @@ export PJRT_DEVICE=CPU
```

## Understanding the operation
-You can find the definition of the C++ ATen operations in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml). After you build Pytorch/XLA from source, you will also find our default implementation (a boxed kernel which forwards calls to PyTorch native CPU) in `xla/torch_xla/csrc/aten_cpu_fallback.h/cpp`. Pytorch operations can usually be mapped to [PyTorch tensor api](https://pytorch.org/docs/stable/index.html) easily. If that is not the case searching the PyTorch native implementation under [PyTorch repo](https://github.com/pytorch/pytorch) is recommended. The goal is to lower the PyTorch operations into a sequence of XLA operations defined in [here](https://www.tensorflow.org/xla/operation_semantics).
+You can find the definition of the C++ ATen operations in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml). After you build Pytorch/XLA from source, you will also find our default implementation (a boxed kernel which forwards calls to either PyTorch native CPU or CUDA kernels) in `xla/torch_xla/csrc/aten_fallback.h/cpp`. Pytorch operations can usually be mapped to [PyTorch tensor api](https://pytorch.org/docs/stable/index.html) easily. If that is not the case searching the PyTorch native implementation under [PyTorch repo](https://github.com/pytorch/pytorch) is recommended. The goal is to lower the PyTorch operations into a sequence of XLA operations defined in [here](https://www.tensorflow.org/xla/operation_semantics).
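As a purely illustrative aside (not code from this repository), "a sequence of XLA operations" concretely means composing `xla::XlaOp` values through the XLA client builder API. For example, a softplus could be expressed as:

```cpp
// Illustrative only; the exact header path depends on the OpenXLA version
// (e.g. "xla/client/xla_builder.h" in many checkouts).
#include "xla/client/xla_builder.h"

// softplus(x) = log(1 + exp(x)), written as two chained XLA operations.
xla::XlaOp BuildSoftplus(xla::XlaOp input) {
  return xla::Log1p(xla::Exp(input));
}
```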

## File structure
All files mentioned below live under the `xla/torch_xla/csrc` folder, with the exception of `codegen/xla_native_functions.yaml`.

1. `xla_native_functions.yaml` contains the list of all operators (from the [Core Aten list](https://pytorch.org/docs/stable/torch.compiler_ir.html)) that are explicitly lowered. Composed operators are not listed here. Each operator name here must directly match a pytorch operator listed in [native_functions.yaml](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml). This file serves as the interface to adding new xla operators, and is an input to PyTorch's [codegen machinery](https://github.com/pytorch/pytorch/blob/main/torchgen/gen_backend_stubs.py). It generates the below 3 files: `XLANativeFunctions.h`, `RegisterXLA.cpp`, and `RegisterAutogradXLA.cpp`
2. `XLANativeFunctions.h` and `aten_xla_type.cpp` are entry points of PyTorch to the pytorch_xla world, and contain the manually written lowerings to XLA for each operator. `XLANativeFunctions.h` is auto-generated through a combination of `xla_native_functions.yaml` and the PyTorch core `native_functions.yaml` file, and contains declarations for kernels that need to be defined in `aten_xla_type.cpp`. The kernels written here need to construct 'XLATensor' using the input `at::Tensor` and other parameters. The resulting `XLATensor` needs to be converted back to the `at::Tensor` before returning to the PyTorch world.
3. `RegisterXLA.cpp` and `RegisterAutogradXLA.cpp` are auto-generated files that register all lowerings to the PyTorch Dispatcher. They also include auto-generated wrapper implementations of `out=` and `inplace` operators.
-4. `aten_cpu_fallback.h/.cpp` contain our boxed fallback implementation to CPU. The boxed fallback kernel will be used if a lowering is not explicitly defined in `xla_native_functions.yaml` + `aten_xla_type.cpp`, and the operator is not composite.
+4. `aten_fallback.h/.cpp` contain our boxed fallback implementation. The boxed fallback kernel will be used if a lowering is not explicitly defined in `xla_native_functions.yaml` + `aten_xla_type.cpp`, and the operator is not composite.
5. `tensor_methods.h` contains the `XLATensor` declarations. These declarations are usually a one-to-one mapping of the `at::Tensor` methods we declared in `XLANativeFunctions.h`.
6. `tensor_methods.cpp` contains the implementations of the `XLATensor` methods declared in `tensor_methods.h`. We construct the corresponding `ir::op` from the parameter's `ir::Value` and wrap it inside an `XLATensor`. IR stands for intermediate representation.
7. `ops/` directory contains all `ir::ops` declaration and definition. Smaller nodes can be put in `ops/ops.h/.cpp`. More complicated nodes can be put into a separate file. All ops inherit from `ir::ops::Node` and provide a way to lower input `ir::Value` to a sequence of `XlaOp`.
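To make items 2, 5 and 6 above concrete, here is a hypothetical sketch of the usual shape of a hand-written kernel in `aten_xla_type.cpp`. The op name `my_op` is made up; a real kernel must match a declaration generated from `xla_native_functions.yaml`, and `tensor_methods::my_op` stands in for the corresponding tensor method.

```cpp
#include "torch_xla/csrc/XLANativeFunctions.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/tensor_methods.h"

namespace torch_xla {

at::Tensor XLANativeFunctions::my_op(const at::Tensor& self) {
  // Unwrap the at::Tensor into the lazy XLATensor representation.
  XLATensorPtr xla_self = bridge::GetXlaTensor(self);
  // The (hypothetical) tensor method builds the ir::op for this operation;
  // the result is wrapped back into an at::Tensor before returning to PyTorch.
  return bridge::AtenFromXlaTensor(tensor_methods::my_op(xla_self));
}

}  // namespace torch_xla
```

Ops that are not listed in `xla_native_functions.yaml` (and are not composite) skip this path entirely and land in the boxed fallback from `aten_fallback.cpp` instead.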
5 changes: 1 addition & 4 deletions benchmarks/experiment_runner.py
@@ -645,9 +645,6 @@ def _collect_cuda_cpu_metrics_individual_ops(
def is_aten_op(op_name):
  return 'aten::' in op_name

-def get_xla_cpu_fallback_ops(met):
-  return set(name for name in met.counter_names() if is_aten_op(name))
-
extract_prof_info = lambda event: {
    "self_cpu_time_s": us_to_s(event.self_cpu_time_total),
    "self_cuda_time_s": us_to_s(event.self_cuda_time_total),
@@ -657,7 +654,7 @@ def get_xla_cpu_fallback_ops(met):
}

if benchmark_experiment.xla:
-  unlowered_ops = get_xla_cpu_fallback_ops(met)
+  unlowered_ops = met.executed_fallback_ops()
  if not unlowered_ops:
    return
  if "xla_unlowered_ops" not in metrics:
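As a hedged usage sketch of the metrics call adopted above (it assumes the `executed_fallback_ops()` helper is exposed by `torch_xla.debug.metrics`, as its use in the benchmark runner suggests), fallback ops can be inspected after a step like so:

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(8, 8, device=device)
y = (x @ x).sum()
xm.mark_step()  # materialize the pending XLA computation

# ATen ops that executed through the OpenXLA fallback
# (empty if everything was lowered to XLA).
print(met.executed_fallback_ops())
```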
5 changes: 5 additions & 0 deletions configuration.yaml
@@ -394,3 +394,8 @@ variables:
        your code.
    type: bool
    default_value: false
+  XLA_FALLBACK_CPU:
+    description:
+      - Forces CPU OpenXLA fallback. By default, PyTorch/XLA runs any operation
+        that doesn't have an XLA lowering on PyTorch CUDA as a fallback. Setting
+        this flag forces PyTorch/XLA to use PyTorch CPU as the fallback instead.
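A minimal sketch of how the new flag would be used, assuming it is read from the environment like the other variables in this file (the equivalent shell form would be `XLA_FALLBACK_CPU=1 python train.py`):

```python
import os

# Must be set before torch_xla is imported so the runtime picks it up.
os.environ["XLA_FALLBACK_CPU"] = "1"

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
# Ops without an XLA lowering will now fall back to PyTorch CPU kernels
# instead of the default CUDA fallback.
x = torch.ones(2, 2, device=device)
```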
4 changes: 2 additions & 2 deletions torch_xla/csrc/BUILD
@@ -32,7 +32,7 @@ ptxla_cc_library(
name = "tensor",
srcs = [
"aten_autograd_ops.cpp",
"aten_cpu_fallback.cpp",
"aten_fallback.cpp",
"aten_xla_bridge.cpp",
"aten_xla_type.cpp",
"autocast_mode.cpp",
@@ -75,7 +75,7 @@ ptxla_cc_library(
] + glob(["ops/*.cpp"]),
hdrs = [
"aten_autograd_ops.h",
"aten_cpu_fallback.h",
"aten_fallback.h",
"aten_cuda_functions.h",
"aten_xla_bridge.h",
"batch_norm.h",
2 changes: 1 addition & 1 deletion torch_xla/csrc/aten_autograd_ops.cpp
@@ -5,7 +5,7 @@
#include <ATen/native/CPUFallback.h>
#include <c10/core/impl/PythonDispatcherTLS.h>

#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/helpers.h"
#include "torch_xla/csrc/tensor_methods.h"
2 changes: 1 addition & 1 deletion torch_xla/csrc/aten_cuda_functions.cpp
@@ -7,7 +7,7 @@

// Context
// =======
-// aten_cpu_fallback.cpp (compiled into _XLAC.so library) uses these functions
+// aten_fallback.cpp (compiled into _XLAC.so library) uses these functions
// for providing OpenXLA fallback on CUDA. Therefore, they must be defined at
// some point, somewhere.
//
@@ -1,4 +1,4 @@
#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"

#include <ATen/DLConvertor.h>
#include <ATen/ops/_copy_from_and_resize.h>
File renamed without changes.
2 changes: 1 addition & 1 deletion torch_xla/csrc/aten_xla_type.cpp
@@ -24,7 +24,7 @@
#include "torch_xla/csrc/LazyIr.h"
#include "torch_xla/csrc/XLANativeFunctions.h"
#include "torch_xla/csrc/aten_autograd_ops.h"
#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/debug_util.h"
#include "torch_xla/csrc/device.h"
2 changes: 1 addition & 1 deletion torch_xla/csrc/generated_file_include.h
@@ -3,7 +3,7 @@

#include <torch/csrc/lazy/core/shape.h>

#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/ir.h"
#include "torch_xla/csrc/ops/ops_xla_shape_fn.h"
2 changes: 1 addition & 1 deletion torch_xla/csrc/init_python_bindings.cpp
@@ -34,7 +34,7 @@
#include "pybind11/stl_bind.h"
#include "torch_xla/csrc/XLANativeFunctions.h"
#include "torch_xla/csrc/aten_autograd_ops.h"
#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/device.h"
#include "torch_xla/csrc/dl_convertor.h"
2 changes: 1 addition & 1 deletion torch_xla/csrc/xla_manual_registration.cpp
@@ -1,7 +1,7 @@
#include <ATen/ATen.h>
#include <torch/library.h>

#include "torch_xla/csrc/aten_cpu_fallback.h"
#include "torch_xla/csrc/aten_fallback.h"
#include "torch_xla/csrc/aten_xla_bridge.h"
#include "torch_xla/csrc/debug_util.h"
#include "torch_xla/csrc/ops/nms.h"