Skip to content

Fix: add mandatory arg_index to l3_l2_orch_comm test#1171

Merged
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
ChaoWao:fix/l3-l2-orch-comm-arg-index
Jun 26, 2026
Merged

Fix: add mandatory arg_index to l3_l2_orch_comm test#1171
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
ChaoWao:fix/l3-l2-orch-comm-arg-index

Conversation

@ChaoWao

@ChaoWao ChaoWao commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the st-sim-a5 breakage on main: test_l3_l2_orch_comm.py calls CoreCallable.build without the mandatory arg_index introduced by #1123, so it raises at build time and fails on every PR merging against current main.

Testing

  • st-sim-a5 (the failing job) — relying on CI; the omitted argument is the sole cause and the value matches the signature=[D.IN, D.OUT] ordering.

Fixes #1170

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38fe3b2a-8863-4e49-923f-f8ae29688c56

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The test case in tests/st/a5/tensormap_and_ringbuffer/l3_l2_orch_comm/test_l3_l2_orch_comm.py now passes arg_index=[0, 1] to CoreCallable.build together with signature=[D.IN, D.OUT].

Changes

L3-L2 orchestration communication test

Layer / File(s) Summary
Explicit child argument mapping
tests/st/a5/tensormap_and_ringbuffer/l3_l2_orch_comm/test_l3_l2_orch_comm.py
The CoreCallable.build(...) child configuration adds arg_index=[0, 1] next to signature=[D.IN, D.OUT].

Sequence Diagram(s)

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 I hop through tests with ears held high,
An index now sits by IN and OUT nearby.
One tiny tweak, a tidy little clue,
And the orchard path runs straight and true.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly states the main fix: adding the required arg_index in the l3_l2_orch_comm test.
Description check ✅ Passed The description is directly related to the test fix and explains the breakage and the intended correction.
Linked Issues check ✅ Passed The change matches #1170 by adding arg_index=[0, 1] alongside signature=[D.IN, D.OUT] in the affected test.
Out of Scope Changes check ✅ Passed The PR appears scoped to the single required test change with no unrelated modifications.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the test configuration in test_l3_l2_orch_comm.py by adding the arg_index=[0, 1] parameter to the CoreCallable.build call. There are no review comments, and I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

PR hw-native-sys#1015 added the l3_l2_orch_comm family (a5 scene test + a2a3 stream
example) with CoreCallable.build calls that omit arg_index. PR hw-native-sys#1123 had
already made arg_index mandatory (parallel to signature, equal length)
and removed the contiguous fallback, so the two merged without
reconciling. Every PR that merges against current main then fails at
build time with:

  ValueError: CoreCallable.build: arg_index is required and must be
  parallel to signature (equal length)

affecting st-sim-a5 (the a5 test) and st-sim-a2a3 / st-onboard-a2a3
(the a2a3 example).

Declare arg_index=[0, 1] for signature=[D.IN, D.OUT] at both sites,
matching the explicit slot mapping every other migrated incore uses:
- tests/st/a5/.../l3_l2_orch_comm/test_l3_l2_orch_comm.py
- examples/a2a3/.../l3_l2_orch_comm_stream/l3_l2_orch_comm_stream.py
@ChaoWao ChaoWao force-pushed the fix/l3-l2-orch-comm-arg-index branch from 689d715 to 5f06ff0 Compare June 26, 2026 08:32
@ChaoWao ChaoWao merged commit a27d9f8 into hw-native-sys:main Jun 26, 2026
16 checks passed
@ChaoWao ChaoWao deleted the fix/l3-l2-orch-comm-arg-index branch June 26, 2026 08:45
indigo1973 added a commit to indigo1973/simpler that referenced this pull request Jun 27, 2026
The args dump no longer needs a per-incore arg_index to map each
declared tensor to a payload slot. Each incore declares its full
(mix-task) signature and the dump maps signature entry i to payload
slot i positionally; every record carries the task's active-subtask
set as a func_id array (its mix membership) rather than a single
scalar func_id.

This supersedes the hw-native-sys#1123 arg_index mechanism (and the hw-native-sys#1171 follow-up
that backfilled arg_index for l3_l2_orch_comm), tailored for the
upcoming l0_swimlane tool, which reconstructs and replays a whole mix
task rather than one kernel.

- dump: a single positional walk over the first active subtask's
  signature; each payload tensor is emitted once, stamped with the
  func array (no per-subtask geometry duplication). Slots beyond the
  payload (a prefix-dispatched task) are skipped.
- record/info/DumpedTensor: func_id scalar -> func_ids[3] + func_count
  (reuses the existing pad, 128B record unchanged; TENSOR_DUMP_MAX_FUNC_IDS
  is tied to PTO2_SUBTASK_SLOT_COUNT by a static_assert).
- args_dump.json: func_id is now an array ([0,1,2] for a mix, [0] for a
  single-kernel task).
- CoreCallable.build / make_callable: drop the arg_index parameter,
  field, and accessor; scene_test and the binding stop threading it.
- migrate every CALLABLE incore / CoreCallable.build repo-wide to drop
  arg_index; complete the offset-mix signatures (mixed_example a2a3/a5,
  l2_swimlane_mixed) to full width so positional mapping covers payload.
- docs/dfx/args-dump.md updated for the func_id array + positional model.

Verified on a2a3 silicon (--dump-args 3, golden PASS, func_id arrays
correct, each slot emitted once): mixed_example (offset mix, 108
records), l2_swimlane_mixed, spmd_basic (cooperative mix),
l3_l2_orch_comm_stream (hw-native-sys#1171 site), dummy_task.
indigo1973 added a commit to indigo1973/simpler that referenced this pull request Jun 29, 2026
The args dump no longer needs a per-incore arg_index to map each
declared tensor to a payload slot. Each incore declares its full
(mix-task) signature and the dump maps signature entry i to payload
slot i positionally; every record carries the task's active-subtask
set as a func_id array (its mix membership) rather than a single
scalar func_id.

This supersedes the hw-native-sys#1123 arg_index mechanism (and the hw-native-sys#1171 follow-up
that backfilled arg_index for l3_l2_orch_comm), tailored for the
upcoming l0_swimlane tool, which reconstructs and replays a whole mix
task rather than one kernel.

- dump: a single positional walk over the first active subtask's
  signature; each payload tensor is emitted once, stamped with the
  func array (no per-subtask geometry duplication). Slots beyond the
  payload (a prefix-dispatched task) are skipped.
- record/info/DumpedTensor: func_id scalar -> func_ids[3] + func_count
  (reuses the existing pad, 128B record unchanged; TENSOR_DUMP_MAX_FUNC_IDS
  is tied to PTO2_SUBTASK_SLOT_COUNT by a static_assert).
- args_dump.json: func_id is now an array ([0,1,2] for a mix, [0] for a
  single-kernel task).
- CoreCallable.build / make_callable: drop the arg_index parameter,
  field, and accessor; scene_test and the binding stop threading it.
- migrate every CALLABLE incore / CoreCallable.build repo-wide to drop
  arg_index; complete the offset-mix signatures (mixed_example a2a3/a5,
  l2_swimlane_mixed) to full width so positional mapping covers payload.
- docs/dfx/args-dump.md updated for the func_id array + positional model.

Verified on a2a3 silicon (--dump-args 3, golden PASS, func_id arrays
correct, each slot emitted once): mixed_example (offset mix, 108
records), l2_swimlane_mixed, spmd_basic (cooperative mix),
l3_l2_orch_comm_stream (hw-native-sys#1171 site), dummy_task.
ChaoZheng109 pushed a commit that referenced this pull request Jun 29, 2026
The args dump no longer needs a per-incore arg_index to map each
declared tensor to a payload slot. Each incore declares its full
(mix-task) signature and the dump maps signature entry i to payload
slot i positionally; every record carries the task's active-subtask
set as a func_id array (its mix membership) rather than a single
scalar func_id.

This supersedes the #1123 arg_index mechanism (and the #1171 follow-up
that backfilled arg_index for l3_l2_orch_comm), tailored for the
upcoming l0_swimlane tool, which reconstructs and replays a whole mix
task rather than one kernel.

- dump: a single positional walk over the first active subtask's
  signature; each payload tensor is emitted once, stamped with the
  func array (no per-subtask geometry duplication). Slots beyond the
  payload (a prefix-dispatched task) are skipped.
- record/info/DumpedTensor: func_id scalar -> func_ids[3] + func_count
  (reuses the existing pad, 128B record unchanged; TENSOR_DUMP_MAX_FUNC_IDS
  is tied to PTO2_SUBTASK_SLOT_COUNT by a static_assert).
- args_dump.json: func_id is now an array ([0,1,2] for a mix, [0] for a
  single-kernel task).
- CoreCallable.build / make_callable: drop the arg_index parameter,
  field, and accessor; scene_test and the binding stop threading it.
- migrate every CALLABLE incore / CoreCallable.build repo-wide to drop
  arg_index; complete the offset-mix signatures (mixed_example a2a3/a5,
  l2_swimlane_mixed) to full width so positional mapping covers payload.
- docs/dfx/args-dump.md updated for the func_id array + positional model.

Verified on a2a3 silicon (--dump-args 3, golden PASS, func_id arrays
correct, each slot emitted once): mixed_example (offset mix, 108
records), l2_swimlane_mixed, spmd_basic (cooperative mix),
l3_l2_orch_comm_stream (#1171 site), dummy_task.
doraemonmj pushed a commit to doraemonmj/simpler_wc that referenced this pull request Jul 1, 2026
…sys#1171)

PR hw-native-sys#1015 added the l3_l2_orch_comm family (a5 scene test + a2a3 stream
example) with CoreCallable.build calls that omit arg_index. PR hw-native-sys#1123 had
already made arg_index mandatory (parallel to signature, equal length)
and removed the contiguous fallback, so the two merged without
reconciling. Every PR that merges against current main then fails at
build time with:

  ValueError: CoreCallable.build: arg_index is required and must be
  parallel to signature (equal length)

affecting st-sim-a5 (the a5 test) and st-sim-a2a3 / st-onboard-a2a3
(the a2a3 example).

Declare arg_index=[0, 1] for signature=[D.IN, D.OUT] at both sites,
matching the explicit slot mapping every other migrated incore uses:
- tests/st/a5/.../l3_l2_orch_comm/test_l3_l2_orch_comm.py
- examples/a2a3/.../l3_l2_orch_comm_stream/l3_l2_orch_comm_stream.py
doraemonmj pushed a commit to doraemonmj/simpler_wc that referenced this pull request Jul 1, 2026
…native-sys#1181)

The args dump no longer needs a per-incore arg_index to map each
declared tensor to a payload slot. Each incore declares its full
(mix-task) signature and the dump maps signature entry i to payload
slot i positionally; every record carries the task's active-subtask
set as a func_id array (its mix membership) rather than a single
scalar func_id.

This supersedes the hw-native-sys#1123 arg_index mechanism (and the hw-native-sys#1171 follow-up
that backfilled arg_index for l3_l2_orch_comm), tailored for the
upcoming l0_swimlane tool, which reconstructs and replays a whole mix
task rather than one kernel.

- dump: a single positional walk over the first active subtask's
  signature; each payload tensor is emitted once, stamped with the
  func array (no per-subtask geometry duplication). Slots beyond the
  payload (a prefix-dispatched task) are skipped.
- record/info/DumpedTensor: func_id scalar -> func_ids[3] + func_count
  (reuses the existing pad, 128B record unchanged; TENSOR_DUMP_MAX_FUNC_IDS
  is tied to PTO2_SUBTASK_SLOT_COUNT by a static_assert).
- args_dump.json: func_id is now an array ([0,1,2] for a mix, [0] for a
  single-kernel task).
- CoreCallable.build / make_callable: drop the arg_index parameter,
  field, and accessor; scene_test and the binding stop threading it.
- migrate every CALLABLE incore / CoreCallable.build repo-wide to drop
  arg_index; complete the offset-mix signatures (mixed_example a2a3/a5,
  l2_swimlane_mixed) to full width so positional mapping covers payload.
- docs/dfx/args-dump.md updated for the func_id array + positional model.

Verified on a2a3 silicon (--dump-args 3, golden PASS, func_id arrays
correct, each slot emitted once): mixed_example (offset mix, 108
records), l2_swimlane_mixed, spmd_basic (cooperative mix),
l3_l2_orch_comm_stream (hw-native-sys#1171 site), dummy_task.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

st-sim-a5 broken on main: test_l3_l2_orch_comm omits mandatory arg_index (#1015 × #1123 merge skew)

1 participant