Skip to content

Fix backtrace deduplication for Clang debug info#4

Merged
Hzfengsy merged 1 commit into
mainfrom
clang_traceback
Jan 5, 2026
Merged

Fix backtrace deduplication for Clang debug info#4
Hzfengsy merged 1 commit into
mainfrom
clang_traceback

Conversation

@Hzfengsy
Copy link
Copy Markdown
Member

@Hzfengsy Hzfengsy commented Jan 5, 2026

Deduplicate stack frames by PC address to handle Clang's DWARF generation issues where multiple virtual frames are reported for the same program counter with incorrect source locations. Keep only the last frame for each unique PC as it typically represents the actual call site.

Also add "object.h" to the filename filter list to exclude Python object header frames from backtraces.

Deduplicate stack frames by PC address to handle Clang's DWARF generation
issues where multiple virtual frames are reported for the same program
counter with incorrect source locations. Keep only the last frame for each
unique PC as it typically represents the actual call site.

Also add "object.h" to the filename filter list to exclude Python object
header frames from backtraces.
Copilot AI review requested due to automatic review settings January 5, 2026 06:29
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @Hzfengsy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the backtrace generation logic by introducing a frame deduplication strategy to mitigate issues with Clang's debug information, specifically when dealing with inlined functions. Additionally, it refines the backtrace filtering to exclude irrelevant Python internal frames, leading to more accurate and readable stack traces.

Highlights

  • Backtrace Deduplication: Implemented a new mechanism to deduplicate stack frames based on their Program Counter (PC) address. This addresses an issue with Clang's DWARF generation where multiple 'virtual' frames might be reported for the same PC. The logic retains only the last frame for each unique PC, which is typically the most accurate call site.
  • Filename Filter Enhancement: Added 'object.h' to the list of filenames to be filtered out from backtraces. This change aims to exclude Python object header frames, improving the clarity of generated stack traces.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fix for backtrace deduplication to handle issues with Clang's debug information, and adds object.h to the filename filter list. The changes look good, but I have one suggestion to improve the robustness of the deduplication logic. The current implementation assumes that frames with the same program counter are always consecutive, which could be brittle. My suggestion provides a more robust solution that handles non-consecutive duplicate frames as well.

Comment thread src/core/backtrace.cpp
Comment on lines +147 to +156
std::vector<StackFrame> deduplicated_frames;
for (const auto& frame : reversed_frames) {
if (frame.pc != 0 && !deduplicated_frames.empty() && deduplicated_frames.back().pc == frame.pc) {
// Same PC as the previous frame - this is likely a spurious inline frame.
// Replace with the current frame (which is typically more accurate).
deduplicated_frames.back() = frame;
} else {
deduplicated_frames.push_back(frame);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current deduplication logic assumes that stack frames with the same program counter (PC) are always consecutive in the reversed_frames vector. While this might be true for the specific Clang DWARF issue this change is addressing, this assumption makes the implementation fragile. If frames with the same PC were to appear non-consecutively for any reason, this logic would fail to deduplicate them correctly.

A more robust approach would be to iterate through the frames from most-recent to least-recent and use a set to track seen PCs. This ensures that only the most recent frame for each unique PC is kept, regardless of its position in the stack trace. This would correctly handle both consecutive and non-consecutive duplicate frames.

Note: This change would require adding #include <unordered_set> at the top of the file.

  std::vector<StackFrame> deduplicated_frames;
  std::unordered_set<uintptr_t> seen_pcs;
  // Iterate from most-recent to least-recent to keep the most recent frame for each PC.
  for (auto it = reversed_frames.rbegin(); it != reversed_frames.rend(); ++it) {
    const auto& frame = *it;
    if (frame.pc == 0 || seen_pcs.find(frame.pc) == seen_pcs.end()) {
      deduplicated_frames.push_back(frame);
      if (frame.pc != 0) {
        seen_pcs.insert(frame.pc);
      }
    }
  }
  // Reverse to get the correct order for display (least-recent to most-recent).
  std::reverse(deduplicated_frames.begin(), deduplicated_frames.end());

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Clang's DWARF debug information generation issues by implementing PC-based deduplication of stack frames in backtraces, and adds Python's object.h to the filename filter list.

  • Implements deduplication logic to handle multiple virtual frames reported for the same program counter
  • Adds "object.h" to the backtrace filename filter to exclude Python C API header frames

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/core/backtrace.cpp
Comment on lines 148 to +156
for (const auto& frame : reversed_frames) {
if (frame.pc != 0 && !deduplicated_frames.empty() && deduplicated_frames.back().pc == frame.pc) {
// Same PC as the previous frame - this is likely a spurious inline frame.
// Replace with the current frame (which is typically more accurate).
deduplicated_frames.back() = frame;
} else {
deduplicated_frames.push_back(frame);
}
}
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deduplication implementation only removes consecutive frames with the same PC address. While this may be sufficient for Clang's inline frame generation (which likely produces consecutive duplicates), the PR description states "Deduplicate stack frames by PC address" and "keep only the last frame for each unique PC", which suggests complete deduplication is intended. If non-consecutive frames can have the same PC, consider using an unordered_map to track the last frame for each unique PC value. Otherwise, clarify in the comments that only consecutive duplicates are handled.

Copilot uses AI. Check for mistakes.
Comment thread src/core/backtrace.cpp
Comment on lines +142 to +156
// Deduplicate frames by PC address to handle Clang's debug info issues.
// When Clang generates DWARF info for inlined functions/templates, it may
// report multiple "virtual" frames for the same PC with incorrect source
// locations. We keep only the last frame for each unique PC, as that's
// typically the actual call site.
std::vector<StackFrame> deduplicated_frames;
for (const auto& frame : reversed_frames) {
if (frame.pc != 0 && !deduplicated_frames.empty() && deduplicated_frames.back().pc == frame.pc) {
// Same PC as the previous frame - this is likely a spurious inline frame.
// Replace with the current frame (which is typically more accurate).
deduplicated_frames.back() = frame;
} else {
deduplicated_frames.push_back(frame);
}
}
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new PC-based deduplication logic lacks test coverage. Consider adding a test case that verifies frames with duplicate PC addresses are correctly deduplicated, especially to ensure the behavior works as intended for Clang's DWARF generation issues. This could be done by creating a mock scenario with duplicate PC values or by testing with actual Clang-compiled code that exhibits the issue.

Copilot uses AI. Check for mistakes.
Comment thread src/core/backtrace.cpp
"pybind11", // Python binding layer
"__libc_", // C library internals
"include/c++/", // C++ standard library
"object.h", // Python object.h
Copy link

Copilot AI Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "Python object.h" could be more specific to clarify this filters out Python C API header frames, not a local header file. Consider changing to "Python C API object.h" or "CPython object.h" for clarity.

Suggested change
"object.h", // Python object.h
"object.h", // Python C API object.h (CPython header)

Copilot uses AI. Check for mistakes.
@Hzfengsy Hzfengsy merged commit ee0cb5e into main Jan 5, 2026
6 checks passed
@Hzfengsy Hzfengsy deleted the clang_traceback branch January 5, 2026 06:35
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 22, 2026
- Fix conftest docstring to say BEFORE_AND_AFTER (#1, hw-native-sys#5, hw-native-sys#8)
- Align InitMemRef pass property table and docs with code (hw-native-sys#2)
- Add INTERNAL_CHECK for null pass result in Pass::operator() (hw-native-sys#3)
- Switch SSA assignment tracking to pointer identity (hw-native-sys#4)
- Add null instrument checks in RunBeforePass/RunAfterPass (hw-native-sys#6)
- Add strict=True to xfail with descriptive reason (hw-native-sys#7)
- Fix pass_manager.md verification mode to BEFORE_AND_AFTER (hw-native-sys#9)
- Add ExitContext stack invariant check (hw-native-sys#10)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 22, 2026
- Use memory_order_relaxed for atomic ID counter (comment hw-native-sys#3)
- Fix UniqueId doc: "stable for the lifetime of the process" (comment hw-native-sys#2)
- Clarify ir.pyi docstring: variable identity is part of hash (comment #1)
- Fix Type overload docstring: note enable_auto_mapping scope (comment hw-native-sys#4)
- Remove redundant static_cast in hash_var_identity (comment hw-native-sys#5)
- Optimize triple As<T> to single GetKind() check (comment hw-native-sys#6)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 26, 2026
- Add tuple arity validation in Function binding (comments hw-native-sys#2/hw-native-sys#4)
- Add strict validation in param_directions deserialization (comments hw-native-sys#3/hw-native-sys#5)
syfeng-bot pushed a commit that referenced this pull request Mar 1, 2026
- Fix closure variable merging order to follow lexical scoping (comment #1)
- Fix error.span isinstance check - span is ir.Span not dict (comment #6)
- Use forward-compatible AST field copying with ast.copy_location (comment #3)
- Add return handling in inline mode with early termination after return (comment #4)
- Prevent leaking inline function variables to caller scope (comments #2, #5, #7)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 2, 2026
- Reorder __all__ to match import order (comment #1)
- Document all 5 system ops in en/zh-cn syntax docs (comment hw-native-sys#3)
- Fix frame_offset=2 in helper functions for accurate span capture (comment hw-native-sys#4)
- Exclude 'system' from unified dispatch to prevent misleading errors (comment hw-native-sys#5)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 12, 2026
- Derive repo owner/name dynamically via gh repo view (hw-native-sys#2)
- Use comments(last: 1) to fetch latest comment per thread (hw-native-sys#4)
- Add pagination with pageInfo/endCursor for >100 threads (hw-native-sys#6)
- Extract run ID from check link field for gh run view (hw-native-sys#5, hw-native-sys#7)
- Document external check fallback (open link URL directly)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 13, 2026
- Expanded GraphQL query for readability (gemini hw-native-sys#2)
- Added gh pr checks command to define $LINK variable (gemini #1, copilot hw-native-sys#7)
- Replaced grep -oP with portable sed for run ID extraction (gemini #1, copilot hw-native-sys#7)
- Fixed grep pattern to handle whitespace in JSON (copilot hw-native-sys#6)
- Replaced non-standard BRE with grep -E (copilot hw-native-sys#5)
- Added local reproduction fallback for CI logs (gemini hw-native-sys#3)
- Removed ellipsis placeholder, save output inline (copilot hw-native-sys#4)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request May 6, 2026
- inline pass: extend DefVarCollector + VarSubstituteMutator to cover
  WhileStmt return_vars and ForStmt return_vars (Copilot hw-native-sys#5).
- inline pass: reject inline bodies with non-trailing ReturnStmts via
  NestedReturnCounter — silent miscompile if hand-built IR ever does
  early-return inside an If/For branch (Gemini #1).
- inline pass: drop the stale "Inline function as program entry: detected
  before splicing; raises" doc-comment line; the actual behaviour is
  silent removal in the cleanup phase (Copilot hw-native-sys#6).
- verifier: also flag *dangling* Calls — Calls whose callee GlobalVar
  isn't in program->functions_ — with error_code 2. The previous code
  short-circuited when no Inline functions survived, missing the case
  where the InlineFunctions pass dropped the function but left a Call
  behind (Copilot hw-native-sys#4).
- JIT _resolve_int: handle ast.Pow (only with non-negative int exponents)
  and ast.UAdd (Gemini hw-native-sys#2).
- docs/zh-cn/dev/passes/22-lower_pipeline_loops.md: narrow
  CanonicalizeIOOrder scope wording from "全程序每一个 SeqStmts" to
  "ForKind::Pipeline 作用域内的 SeqStmts" (CodeRabbit hw-native-sys#9).
- examples/models/qwen3_jit/kernels/rmsnorm.py: cast resid chunks to FP32
  in post_rmsnorm to mirror input_rmsnorm and avoid BF16 accumulation
  precision loss (CodeRabbit hw-native-sys#11).
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request May 6, 2026
- inline pass: extend DefVarCollector + VarSubstituteMutator to cover
  WhileStmt return_vars and ForStmt return_vars (Copilot hw-native-sys#5).
- inline pass: reject inline bodies with non-trailing ReturnStmts via
  NestedReturnCounter — silent miscompile if hand-built IR ever does
  early-return inside an If/For branch (Gemini #1).
- inline pass: drop the stale "Inline function as program entry: detected
  before splicing; raises" doc-comment line; the actual behaviour is
  silent removal in the cleanup phase (Copilot hw-native-sys#6).
- verifier: also flag *dangling* Calls — Calls whose callee GlobalVar
  isn't in program->functions_ — with error_code 2. The previous code
  short-circuited when no Inline functions survived, missing the case
  where the InlineFunctions pass dropped the function but left a Call
  behind (Copilot hw-native-sys#4).
- JIT _resolve_int: handle ast.Pow (only with non-negative int exponents)
  and ast.UAdd (Gemini hw-native-sys#2).
- docs/zh-cn/dev/passes/22-lower_pipeline_loops.md: narrow
  CanonicalizeIOOrder scope wording from "全程序每一个 SeqStmts" to
  "ForKind::Pipeline 作用域内的 SeqStmts" (CodeRabbit hw-native-sys#9).
- examples/models/qwen3_jit/kernels/rmsnorm.py: cast resid chunks to FP32
  in post_rmsnorm to mirror input_rmsnorm and avoid BF16 accumulation
  precision loss (CodeRabbit hw-native-sys#11).
lyfne123 added a commit to lyfne123/pypto that referenced this pull request May 11, 2026
CI fix (root cause of pypto-lib-model + system-tests failures):

- P6's call-site injector previously inlined `tensor.as_layout(arg, DN)` into
  the kernel-call args directly, which the orchestration codegen rejects with
  `Call to '<callee>' arg N is neither a variable nor a recognized constant
  literal`. Refactor `CallSiteAsLayoutInjector` to operate at the statement
  level: for every AssignStmt / EvalStmt / ReturnStmt whose RHS targets a
  promoted callee, emit one `bridged_<param> = tensor.as_layout(arg, DN)`
  AssignStmt immediately before the call statement and replace the inline
  Call arg with the bound Var. Net IR is SSA-form and matches what
  orchestration codegen consumes per arg slot (Var | const-literal).

Review comments addressed:

- gemini #1: codegen `tensor.as_layout` now special-cases the identity flip
  (target layout == source layout) and emits a plain `Tensor result = input;`
  alias instead of a spurious `.transpose()`. Simplify still folds these
  before codegen in the default pipeline, but the codegen is now robust
  against ad-hoc compile paths that skip Simplify.
- coderabbitai hw-native-sys#2 / hw-native-sys#3: drop the "next default pass" wording in en/zh-cn doc
  17 — `MaterializeTensorStrides` runs later in the pipeline (after
  `CanonicalizeIOOrder`), not immediately after. The zh-cn doc's "17th pass"
  text is also clarified — the 17 is the docs/passes/ slot, not a literal
  pipeline call-count.
- coderabbitai hw-native-sys#4: `DeduceTensorAsLayoutType` now preserves the source
  TensorView's `valid_shape` (with trailing-pair swap on cross-layout flips)
  and `pad` through `tensor.as_layout`. Previously these fields were dropped,
  making the reinterpret silently lossy for sliced or fill-padded inputs.
- coderabbitai hw-native-sys#6: `MaterializeTensorStrides` direct-ctor rebuild path now
  forwards `op->attrs_`. The previous version preserved type and kwargs but
  dropped attrs, which would have silently discarded call metadata
  (arg_directions, manual_dep_edges) attached by earlier passes.
- coderabbitai hw-native-sys#7: update the stale comment block above `VisitExpr_` in
  `simplify_pass.cpp` — it still described the dropped shape-bearing
  `as_layout(x, shape, layout)` form and the never-implemented chain
  folding. New comment accurately describes the single identity-elimination
  rule and explains why chain folding is deferred.
- coderabbitai hw-native-sys#8 / hw-native-sys#9: the unit-test pattern that previously inspected an
  inline `tensor.as_layout` Call as a kernel-call arg no longer applies
  after the SSA refactor above. Tests now look up the bridge via
  `_find_assign_rhs(orch, var)` and guard `op is not None` before reading
  `op.name` (matching the defensive pattern already used in the B^T test).

Skipped (with reason):

- coderabbitai hw-native-sys#5: "Handle 3-arg `tile.load`". `tile.load` registers four
  mandatory args (tensor, offsets, shapes, valid_shapes) and the Python
  builder always materializes `valid_shapes` (defaults to `shapes` when the
  caller omits it). Once IR is constructed, every `tile.load` is 4-arg —
  the 3-arg form only exists at the DSL surface. The internal check stays
  as-is.
lyfne123 added a commit to lyfne123/pypto that referenced this pull request May 11, 2026
CI fix (root cause of pypto-lib-model + system-tests failures):

- P6's call-site injector previously inlined `tensor.as_layout(arg, DN)` into
  the kernel-call args directly, which the orchestration codegen rejects with
  `Call to '<callee>' arg N is neither a variable nor a recognized constant
  literal`. Refactor `CallSiteAsLayoutInjector` to operate at the statement
  level: for every AssignStmt / EvalStmt / ReturnStmt whose RHS targets a
  promoted callee, emit one `bridged_<param> = tensor.as_layout(arg, DN)`
  AssignStmt immediately before the call statement and replace the inline
  Call arg with the bound Var. Net IR is SSA-form and matches what
  orchestration codegen consumes per arg slot (Var | const-literal).

Review comments addressed:

- gemini #1: codegen `tensor.as_layout` now special-cases the identity flip
  (target layout == source layout) and emits a plain `Tensor result = input;`
  alias instead of a spurious `.transpose()`. Simplify still folds these
  before codegen in the default pipeline, but the codegen is now robust
  against ad-hoc compile paths that skip Simplify.
- coderabbitai hw-native-sys#2 / hw-native-sys#3: drop the "next default pass" wording in en/zh-cn doc
  17 — `MaterializeTensorStrides` runs later in the pipeline (after
  `CanonicalizeIOOrder`), not immediately after. The zh-cn doc's "17th pass"
  text is also clarified — the 17 is the docs/passes/ slot, not a literal
  pipeline call-count.
- coderabbitai hw-native-sys#4: `DeduceTensorAsLayoutType` now preserves the source
  TensorView's `valid_shape` (with trailing-pair swap on cross-layout flips)
  and `pad` through `tensor.as_layout`. Previously these fields were dropped,
  making the reinterpret silently lossy for sliced or fill-padded inputs.
- coderabbitai hw-native-sys#6: `MaterializeTensorStrides` direct-ctor rebuild path now
  forwards `op->attrs_`. The previous version preserved type and kwargs but
  dropped attrs, which would have silently discarded call metadata
  (arg_directions, manual_dep_edges) attached by earlier passes.
- coderabbitai hw-native-sys#7: update the stale comment block above `VisitExpr_` in
  `simplify_pass.cpp` — it still described the dropped shape-bearing
  `as_layout(x, shape, layout)` form and the never-implemented chain
  folding. New comment accurately describes the single identity-elimination
  rule and explains why chain folding is deferred.
- coderabbitai hw-native-sys#8 / hw-native-sys#9: the unit-test pattern that previously inspected an
  inline `tensor.as_layout` Call as a kernel-call arg no longer applies
  after the SSA refactor above. Tests now look up the bridge via
  `_find_assign_rhs(orch, var)` and guard `op is not None` before reading
  `op.name` (matching the defensive pattern already used in the B^T test).

Skipped (with reason):

- coderabbitai hw-native-sys#5: "Handle 3-arg `tile.load`". `tile.load` registers four
  mandatory args (tensor, offsets, shapes, valid_shapes) and the Python
  builder always materializes `valid_shapes` (defaults to `shapes` when the
  caller omits it). Once IR is constructed, every `tile.load` is 4-arg —
  the 3-arg form only exists at the DSL surface. The internal check stays
  as-is.
lyfne123 added a commit to lyfne123/pypto that referenced this pull request May 12, 2026
CI fix (root cause of pypto-lib-model + system-tests failures):

- P6's call-site injector previously inlined `tensor.as_layout(arg, DN)` into
  the kernel-call args directly, which the orchestration codegen rejects with
  `Call to '<callee>' arg N is neither a variable nor a recognized constant
  literal`. Refactor `CallSiteAsLayoutInjector` to operate at the statement
  level: for every AssignStmt / EvalStmt / ReturnStmt whose RHS targets a
  promoted callee, emit one `bridged_<param> = tensor.as_layout(arg, DN)`
  AssignStmt immediately before the call statement and replace the inline
  Call arg with the bound Var. Net IR is SSA-form and matches what
  orchestration codegen consumes per arg slot (Var | const-literal).

Review comments addressed:

- gemini #1: codegen `tensor.as_layout` now special-cases the identity flip
  (target layout == source layout) and emits a plain `Tensor result = input;`
  alias instead of a spurious `.transpose()`. Simplify still folds these
  before codegen in the default pipeline, but the codegen is now robust
  against ad-hoc compile paths that skip Simplify.
- coderabbitai hw-native-sys#2 / hw-native-sys#3: drop the "next default pass" wording in en/zh-cn doc
  17 — `MaterializeTensorStrides` runs later in the pipeline (after
  `CanonicalizeIOOrder`), not immediately after. The zh-cn doc's "17th pass"
  text is also clarified — the 17 is the docs/passes/ slot, not a literal
  pipeline call-count.
- coderabbitai hw-native-sys#4: `DeduceTensorAsLayoutType` now preserves the source
  TensorView's `valid_shape` (with trailing-pair swap on cross-layout flips)
  and `pad` through `tensor.as_layout`. Previously these fields were dropped,
  making the reinterpret silently lossy for sliced or fill-padded inputs.
- coderabbitai hw-native-sys#6: `MaterializeTensorStrides` direct-ctor rebuild path now
  forwards `op->attrs_`. The previous version preserved type and kwargs but
  dropped attrs, which would have silently discarded call metadata
  (arg_directions, manual_dep_edges) attached by earlier passes.
- coderabbitai hw-native-sys#7: update the stale comment block above `VisitExpr_` in
  `simplify_pass.cpp` — it still described the dropped shape-bearing
  `as_layout(x, shape, layout)` form and the never-implemented chain
  folding. New comment accurately describes the single identity-elimination
  rule and explains why chain folding is deferred.
- coderabbitai hw-native-sys#8 / hw-native-sys#9: the unit-test pattern that previously inspected an
  inline `tensor.as_layout` Call as a kernel-call arg no longer applies
  after the SSA refactor above. Tests now look up the bridge via
  `_find_assign_rhs(orch, var)` and guard `op is not None` before reading
  `op.name` (matching the defensive pattern already used in the B^T test).

Skipped (with reason):

- coderabbitai hw-native-sys#5: "Handle 3-arg `tile.load`". `tile.load` registers four
  mandatory args (tensor, offsets, shapes, valid_shapes) and the Python
  builder always materializes `valid_shapes` (defaults to `shapes` when the
  caller omits it). Once IR is constructed, every `tile.load` is 4-arg —
  the 3-arg form only exists at the DSL surface. The internal check stays
  as-is.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants