From d603a66ae6ba9f55d1b1e3026aeb1e83df67e9a1 Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Wed, 5 Mar 2025 21:30:10 -0800
Subject: [PATCH 1/8] Update CODEGEN_MIGRATION_GUIDE.md

---
 CODEGEN_MIGRATION_GUIDE.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/CODEGEN_MIGRATION_GUIDE.md b/CODEGEN_MIGRATION_GUIDE.md
index dccffdd26520..0e8ebaf96855 100644
--- a/CODEGEN_MIGRATION_GUIDE.md
+++ b/CODEGEN_MIGRATION_GUIDE.md
@@ -76,7 +76,11 @@ at::Tensor XLANativeFunctions::abs(const at::Tensor& self) {
 ```
 
 ### 2. Codegen the op and inspect the generated file
-Find the op in `xla/codegen/xla_native_functions.yaml` and move it to the full_codegen column and run `python setup.py install` under xla directory again. The build will fail (reason explained later in this guide) but you can still see the generated file. The code snippets below uses `abs` as an example.
+Find the op in `xla/codegen/xla_native_functions.yaml`, move it to the full_codegen column, and run `python setup.py install` under the xla directory again. The build will fail (reason explained later in this guide), but you can still see the generated file.
+
+If, while generating the file, you run into an error involving [`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h), PyTorch may not yet have the shape-inference implementation needed for the function being generated. You can attempt to add the necessary function to [`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h) to unblock yourself.
+
+The code snippets below use `abs` as an example.
 #### XLANativeFunctions.cpp
 ```
 at::Tensor XLANativeFunctions::abs(const at::Tensor & self) {

From c592e5634307af97b9b3b399efafe90f77c5e023 Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Wed, 5 Mar 2025 21:32:19 -0800
Subject: [PATCH 2/8] Update OP_LOWERING_GUIDE.md

---
 OP_LOWERING_GUIDE.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md
index 8288e57779ea..db5af924ea57 100644
--- a/OP_LOWERING_GUIDE.md
+++ b/OP_LOWERING_GUIDE.md
@@ -8,6 +8,8 @@ Here's an example of what you might see from the PyTorch/XLA debugging tool for
 pt-xla-profiler: Op(s) not lowered: aten::_ctc_loss, aten::_ctc_loss_backward,  Please open a GitHub issue with the above op lowering requests.
 ```
+Furthermore, if possible, we want to lower operations to use `full_codegen`; see our [Codegen migration guide](https://github.com/pytorch/xla/edit/document_xla_override/CODEGEN_MIGRATION_GUIDE.md) for more instructions.
+
 ## Before you start
 You should follow the instructions in [here](https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md) to install required dependencies and build pytorch and pytorch/XLA from the source. You do not need access to TPU to implement the lowering. It is recommended to experiment on a workstation and configure it to use XLA:CPU. You can configure Pytorch/XLA to use XLA:CPU by running

From 5c2ac028a0e942d12f32209f0882303dc6c9c71d Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Wed, 5 Mar 2025 21:32:48 -0800
Subject: [PATCH 3/8] Fix minor typo

---
 OP_LOWERING_GUIDE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md
index db5af924ea57..56f710d55aa5 100644
--- a/OP_LOWERING_GUIDE.md
+++ b/OP_LOWERING_GUIDE.md
@@ -8,7 +8,7 @@ Here's an example of what you might see from the PyTorch/XLA debugging tool for
 pt-xla-profiler: Op(s) not lowered: aten::_ctc_loss, aten::_ctc_loss_backward,  Please open a GitHub issue with the above op lowering requests.
 ```
-Furthermore, if possible, we want to lower operations to use `full_codegen`; see our [Codegen migration guide](https://github.com/pytorch/xla/edit/document_xla_override/CODEGEN_MIGRATION_GUIDE.md) for more instructions.
+Furthermore, if possible, we want to lower operations to use `full_codegen`; see our [codegen migration guide](https://github.com/pytorch/xla/edit/document_xla_override/CODEGEN_MIGRATION_GUIDE.md) for more instructions.
 
 ## Before you start
 You should follow the instructions in [here](https://github.com/pytorch/xla/blob/master/CONTRIBUTING.md) to install required dependencies and build pytorch and pytorch/XLA from the source. You do not need access to TPU to implement the lowering. It is recommended to experiment on a workstation and configure it to use XLA:CPU. You can configure Pytorch/XLA to use XLA:CPU by running

From fa1502fd639d3eb0b9f0cf3fc31377bce0423eff Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Wed, 5 Mar 2025 21:50:05 -0800
Subject: [PATCH 4/8] Update OP_LOWERING_GUIDE.md

---
 OP_LOWERING_GUIDE.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md
index 56f710d55aa5..80a91d2f84bd 100644
--- a/OP_LOWERING_GUIDE.md
+++ b/OP_LOWERING_GUIDE.md
@@ -91,3 +91,10 @@ The codegen will automatically generate lowerings for `lerp_.Scalar` and `lerp.S
 In general, if there is an operator in pytorch core that has both an out-of-place and an out= variant, it's better to write a lowering for the out-of-place variant, since you'll get a code-generated out= lowering for free.
 
 For each node we need to pass an `ir::OpKind`. Here is an ([example](https://github.com/pytorch/xla/blob/5ce99bff336325feb41a982dc80299fb53166b29/torch_xla/csrc/ops/var_mean.cpp#L36)). You can find the `OpKind` definition in [interned_strings.h](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/interned_strings.h). If the aten symbol is missing, you can submit a PR like [this](https://github.com/pytorch/pytorch/pull/36851).
+
+## Overriding `XLA` Flag
+In certain cases, we may need to manually override the `XLA` key implementation of an operation. Ideally, code generation would handle this, but it is useful to know how to handle the occasional edge case.
+
+If you need to override the `XLA` flag, you can do this through macros in the [xla_manual_registration.cpp](https://github.com/pytorch/xla/blob/master/torch_xla/csrc/xla_manual_registration.cpp) file.
+
+You can use PR https://github.com/pytorch/xla/pull/8801 as a reference for which files to change.
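+
+As an illustration, a manual registration might look like the following
+sketch. This is not the actual contents of the file; the handler body is
+a placeholder, and `abs` is only reused here as this guide's running
+example:
+
+```c++
+#include <ATen/ATen.h>
+#include <torch/library.h>
+
+namespace {
+
+// Hypothetical handler for aten::abs under the XLA dispatch key. A real
+// implementation would lower the op to an XLA computation instead of
+// returning a placeholder result.
+at::Tensor abs_override(const at::Tensor& self) {
+  return self.clone();  // placeholder body
+}
+
+// TORCH_LIBRARY_IMPL registers the handler for the XLA dispatch key,
+// taking precedence over the default registration for this op.
+TORCH_LIBRARY_IMPL(aten, XLA, m) {
+  m.impl("abs", TORCH_FN(abs_override));
+}
+
+}  // namespace
+```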
From 5c160d24dfb8c0b91671179760dd0c168fdf4e2d Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Wed, 5 Mar 2025 21:53:39 -0800
Subject: [PATCH 5/8] Update troubleshoot.md

---
 docs/source/learn/troubleshoot.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/docs/source/learn/troubleshoot.md b/docs/source/learn/troubleshoot.md
index fdc97f8a0b8c..22377ecbb5c8 100644
--- a/docs/source/learn/troubleshoot.md
+++ b/docs/source/learn/troubleshoot.md
@@ -380,6 +380,11 @@ We don't expect users to use tools in this section to debug their
 models. But we might ask for them when you submit a bug report since
 they provide additional information that metrics report doesn't have.
 
+### Debugging Tensor Operations
+
+The following tools are useful for gathering information on the execution
+of lowered operations.
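+
+For example, a minimal sketch of driving these dumps (assuming an XLA
+device is available):
+
+```python
+import torch
+import torch_xla
+import torch_xla.core.xla_model as xm
+
+# Build a small lazy graph on the XLA device; nothing executes yet.
+t = torch.randn(2, 2, device=xm.xla_device())
+res = t.abs()
+
+# Dump the IR and the HLO for the still-unevaluated result tensor.
+print(torch_xla._XLAC._get_xla_tensors_text([res]))
+print(torch_xla._XLAC._get_xla_tensors_hlo([res]))
+```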
+
 - `print(torch_xla._XLAC._get_xla_tensors_text([res]))` where `res` is
   the result tensor prints out the IR.
 - `print(torch_xla._XLAC._get_xla_tensors_hlo([res]))` where `res` is

From 2a4ce6d13ce00181923afb600d38ec52331d4e4a Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Fri, 7 Mar 2025 11:20:51 -0800
Subject: [PATCH 6/8] Update OP_LOWERING_GUIDE.md

---
 OP_LOWERING_GUIDE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md
index 80a91d2f84bd..7ff427d1dbdc 100644
--- a/OP_LOWERING_GUIDE.md
+++ b/OP_LOWERING_GUIDE.md
@@ -92,9 +92,9 @@ In general, if there is an operator in pytorch core that has both an out-of-plac
 
 For each node we need to pass an `ir::OpKind`. Here is an ([example](https://github.com/pytorch/xla/blob/5ce99bff336325feb41a982dc80299fb53166b29/torch_xla/csrc/ops/var_mean.cpp#L36)). You can find the `OpKind` definition in [interned_strings.h](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/interned_strings.h). If the aten symbol is missing, you can submit a PR like [this](https://github.com/pytorch/pytorch/pull/36851).
 
-## Overriding `XLA` Flag
+## Overriding `XLA` Dispatch Key
 In certain cases, we may need to manually override the `XLA` key implementation of an operation. Ideally, code generation would handle this, but it is useful to know how to handle the occasional edge case.
 
-If you need to override the `XLA` flag, you can do this through macros in the [xla_manual_registration.cpp](https://github.com/pytorch/xla/blob/master/torch_xla/csrc/xla_manual_registration.cpp) file.
+If you need to override the `XLA` dispatch key, you can do this through macros in the [xla_manual_registration.cpp](https://github.com/pytorch/xla/blob/master/torch_xla/csrc/xla_manual_registration.cpp) file.
 
 You can use PR https://github.com/pytorch/xla/pull/8801 as a reference for which files to change.

From 417dfda818b9818a708d7593891f3e5aba707d82 Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Fri, 7 Mar 2025 11:46:31 -0800
Subject: [PATCH 7/8] Update codegen_migration.md

---
 docs/source/contribute/codegen_migration.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/source/contribute/codegen_migration.md b/docs/source/contribute/codegen_migration.md
index a84e5568ba5d..cadb15ec6feb 100644
--- a/docs/source/contribute/codegen_migration.md
+++ b/docs/source/contribute/codegen_migration.md
@@ -133,8 +133,17 @@
 Find the op in `xla/codegen/xla_native_functions.yaml` and move it to
 the full_codegen column and run `python setup.py install` under xla
 directory again. The build will fail (reason explained later in this
-guide) but you can still see the generated file. The code snippets below
-uses `abs` as an example. \#### XLANativeFunctions.cpp
+guide) but you can still see the generated file.
+
+If, while generating the file, you run into an error involving
+[`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h),
+PyTorch may not yet have the shape-inference implementation needed for
+the function being generated. You can attempt to add the necessary
+function to
+[`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h)
+to unblock yourself.
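+
+For reference, functions in `shape_inference.h` follow a
+`compute_shape_<op>` naming pattern. A sketch of what such a function
+might look like for a hypothetical unary op (the op name and shape
+logic below are illustrative, not actual PyTorch code):
+
+``` c++
+#include <ATen/ATen.h>
+#include <torch/csrc/lazy/core/shape.h>
+#include <vector>
+
+// Hypothetical shape function: the output of `my_op` has the same
+// shape and dtype as its input, so propagate them directly. A matching
+// TORCH_API declaration would go in shape_inference.h.
+std::vector<torch::lazy::Shape> compute_shape_my_op(
+    const at::Tensor& self) {
+  return {torch::lazy::Shape(self.scalar_type(), self.sizes().vec())};
+}
+```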
+
+The code snippets below use `abs` as an example. \#### XLANativeFunctions.cpp
 
 ``` c++
 at::Tensor XLANativeFunctions::abs(const at::Tensor & self) {

From 3eca782d703c1dd3e3283ed0b889159c9ee67bed Mon Sep 17 00:00:00 2001
From: Pedro Goncalves Mokarzel
Date: Fri, 7 Mar 2025 11:49:03 -0800
Subject: [PATCH 8/8] Update CODEGEN_MIGRATION_GUIDE.md

---
 CODEGEN_MIGRATION_GUIDE.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/CODEGEN_MIGRATION_GUIDE.md b/CODEGEN_MIGRATION_GUIDE.md
index 0e8ebaf96855..dccffdd26520 100644
--- a/CODEGEN_MIGRATION_GUIDE.md
+++ b/CODEGEN_MIGRATION_GUIDE.md
@@ -76,11 +76,7 @@ at::Tensor XLANativeFunctions::abs(const at::Tensor& self) {
 ```
 
 ### 2. Codegen the op and inspect the generated file
-Find the op in `xla/codegen/xla_native_functions.yaml`, move it to the full_codegen column, and run `python setup.py install` under the xla directory again. The build will fail (reason explained later in this guide), but you can still see the generated file.
-
-If, while generating the file, you run into an error involving [`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h), PyTorch may not yet have the shape-inference implementation needed for the function being generated. You can attempt to add the necessary function to [`shape_inference.h`](https://github.com/pytorch/pytorch/blob/main/torch/csrc/lazy/core/shape_inference.h) to unblock yourself.
-
-The code snippets below use `abs` as an example.
+Find the op in `xla/codegen/xla_native_functions.yaml` and move it to the full_codegen column and run `python setup.py install` under xla directory again. The build will fail (reason explained later in this guide) but you can still see the generated file. The code snippets below uses `abs` as an example.
 #### XLANativeFunctions.cpp
 ```
 at::Tensor XLANativeFunctions::abs(const at::Tensor & self) {