vkovinicTT

This PR fixes two related issues in `PopulateXlaOpMetadata` that prevented proper use of `CustomOpNameMetaData`:

### Fix 1: Support custom op names without stack frame locations

This enables the user to change the name of an op using `CustomOpNameMetaData` without necessarily having to add stack frame locations. Example usage (if we pass 0 for the `max_stack_depth` field, the custom prefix is simply prepended to the current op name):

    torch_xla._XLAC._set_xla_custom_op_name_prefix(tensor, your_custom_name_prefix, 0)

Additionally, this change guards the `AddStackFrameLocations()` function against an invalid `max_stack_depth`. If a value <= 0 were passed, a segmentation fault would result from improper iterator dereferencing (which could have happened before this change).

The `AddStackFrameLocations()` function in `stack_frame_index_builder.cpp` uses reverse iterators and assumes at least one iteration occurs:

  auto frame_it = frame_info.rbegin();
  for (; frame_it != frame_info.rend() && depth < max_stack_depth; ++frame_it) {
    // Loop never executes when max_stack_depth == 0
  }
  --frame_it;  // ← Segfault: iterator is still at rbegin(), decrement goes past-the-end
  metadata_to_populate.set_source_file(frame_it->file);  // ← Dereference invalid iterator
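
The fix can be sketched as a small self-contained program. The `Frame` struct and the free function `add_stack_frame_locations` below are hypothetical stand-ins for the real code in `stack_frame_index_builder.cpp`; the function returns `frame_it->file` after the loop, mirroring the `set_source_file` call:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the frame records used by the real builder.
struct Frame {
  std::string file;
  int line;
};

// Sketch of AddStackFrameLocations() with the new guard. Returns the file
// of the frame left at frame_it after the loop, or "" when nothing is added.
std::string add_stack_frame_locations(const std::vector<Frame>& frame_info,
                                      int max_stack_depth) {
  // Guard: with max_stack_depth <= 0 (or no frames) the loop below never
  // runs, so the later --frame_it and dereference would be undefined behavior.
  if (max_stack_depth <= 0 || frame_info.empty()) {
    return "";
  }
  int depth = 0;
  auto frame_it = frame_info.rbegin();
  for (; frame_it != frame_info.rend() && depth < max_stack_depth; ++frame_it) {
    ++depth;  // the real code records frame_it->file / frame_it->line here
  }
  --frame_it;  // safe: the loop executed at least once
  return frame_it->file;
}
```

With the guard in place, `max_stack_depth == 0` becomes the "custom prefix only" path instead of undefined behavior.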

### Fix 2: Prevent scope from overwriting custom metadata

Problem:

Even when users set custom metadata via `_set_xla_custom_op_name_prefix()`, the `nmeta.scope` field unconditionally overwrote the custom `op_name_prefix`. This affected operations like `add` and `mul`, which have `scope` set (e.g., `aten::add.3`), resulting in the loss of user-provided semantic location information.

Changes:

  • Modified the condition from `if (!nmeta.scope.empty())` to `else if (!nmeta.scope.empty())`
  • This ensures custom metadata takes precedence: custom metadata is used if available; otherwise `scope` is used as a fallback

Precedence hierarchy (now correctly implemented):

  1. Custom user metadata (via `SetUserMetadata` APIs) - highest priority
  2. Scope-based naming (auto-generated by torch-xla) - fallback
  3. Bare `op_type` - default
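
The resulting precedence can be sketched as a tiny decision function (hypothetical names; the real check lives in `PopulateXlaOpMetadata`):

```cpp
#include <string>

// Hypothetical sketch of the op-name precedence after this change.
std::string choose_op_name(const std::string& custom_prefix,
                           const std::string& scope,
                           const std::string& op_type) {
  if (!custom_prefix.empty()) {
    return custom_prefix + "/" + op_type;  // 1. custom user metadata wins
  } else if (!scope.empty()) {
    return scope + "/" + op_type;  // 2. auto-generated scope as fallback
  }
  return op_type;  // 3. bare op type by default
}
```

The `else if` is the whole fix: before this change, a non-empty `scope` branch ran even when a custom prefix was present, clobbering it.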

@qihqi enabled auto-merge (squash) October 13, 2025 16:38
@vkovinicTT (Author)

Thanks for the fast review 🚀

@vkovinicTT (Author)

I noticed the tests have been canceled a few times - is that part of the normal process?

vkovinicTT added a commit to tenstorrent/tt-xla that referenced this pull request Oct 20, 2025
[Ticket](#1011)

### Problem

`HLO` operations in compiled graphs lack semantic context (module
hierarchy, source file/line), making debugging and profiling difficult.
PyTorch's FX graph captures this metadata, but it's lost during export
and execution since the generated `forward()` code is a flat sequence of
operations.

### Solution

Inject FX metadata into lazy `IR nodes` at runtime using
`TorchDispatchMode` with a **counter-based** mapping approach:

1. **Compile-time**: Extract metadata from FX nodes after all passes
complete, building ordered list of semantic locations (format:
`ModuleClass[instance]/func_name(file.py:line)/`)
2. **Runtime**: Intercept operations during lazy graph construction via
`MetadataDispatchMode`, attaching metadata to XLA tensors using
**torch-xla**'s `_set_xla_custom_op_name_prefix` API

**The counter-based mapping** works because FX enforces topological node
ordering, code generation preserves this order, and `TorchDispatchMode`
intercepts operations in execution order—maintaining a 1:1
correspondence between FX nodes and dispatched operations.
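
Stripped of the torch machinery, the counter-based mapping reduces to consuming an ordered list of locations, one entry per intercepted dispatch. A hedged sketch with hypothetical names (not the real `MetadataDispatchMode` implementation):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Compile time: locations extracted from FX nodes, in topological order
// (format: ModuleClass[instance]/func_name(file.py:line)/).
// Runtime: each intercepted dispatch consumes the next entry, relying on
// codegen preserving node order so the correspondence stays 1:1.
struct MetadataMapper {
  std::vector<std::string> locations;
  std::size_t counter = 0;

  // Returns the location for the current operation, or "" once the two
  // orders have drifted apart (the misalignment case noted below).
  std::string on_dispatch() {
    if (counter >= locations.size()) return "";
    return locations[counter++];
  }
};
```

Each returned location would then be attached to the operation's output tensor via `_set_xla_custom_op_name_prefix`.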

### Key Changes

- `utils.py`: Added `extract_nodes_info()` for metadata extraction and
`MetadataDispatchMode` for runtime interception
- `backend.py`: Integrated metadata extraction in
`torch_pass_pipeline()` after `recompile()`, and metadata injection in
`XLAExecutor`
  - Controlled via `XLA_HLO_DEBUG=1` environment variable

### Important Notes
- In order for this to work, I also had to make changes in the
`pytorch-xla` repo. I've already merged the PR in our fork, and
[here](pytorch/xla#9676) is the open PR in the
`pytorch/xla` repo (which has been approved 🚀).
- Any future FX passes **MUST** be added before
`compiled_graph.recompile()` and `extract_nodes_info()`. Extracting
metadata before passes complete causes misalignment between FX node
order and runtime execution order.

### Result


[Here](https://gist.github.com/vkovinicTT/212b50c0e4382d54494a28b436daf0ee)
is an example model, and
[here](https://gist.github.com/vkovinicTT/efe3e5b51e4f08abab8013ee2c340c70)
are the locs that we get in TTIR with this change.
