Fix: add mandatory arg_index to l3_l2_orch_comm test #1171
Code Review
This pull request updates the test configuration in test_l3_l2_orch_comm.py by adding the arg_index=[0, 1] parameter to the CoreCallable.build call. There are no review comments, and I have no feedback to provide.
PR hw-native-sys#1015 added the l3_l2_orch_comm family (a5 scene test + a2a3 stream example) with CoreCallable.build calls that omit arg_index. PR hw-native-sys#1123 had already made arg_index mandatory (parallel to signature, equal length) and removed the contiguous fallback, so the two merged without reconciling. Every PR that merges against current main then fails at build time with:

    ValueError: CoreCallable.build: arg_index is required and must be parallel to signature (equal length)

affecting st-sim-a5 (the a5 test) and st-sim-a2a3 / st-onboard-a2a3 (the a2a3 example).

Declare arg_index=[0, 1] for signature=[D.IN, D.OUT] at both sites, matching the explicit slot mapping every other migrated incore uses:
- tests/st/a5/.../l3_l2_orch_comm/test_l3_l2_orch_comm.py
- examples/a2a3/.../l3_l2_orch_comm_stream/l3_l2_orch_comm_stream.py
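The failure mode described above can be illustrated with a small stand-alone sketch. This is not the repository's CoreCallable.build; check_arg_index and the D enum below are hypothetical stand-ins that only mimic the stated rule (arg_index is required and must have the same length as signature) and reuse the error text quoted in the commit message.

```python
from enum import Enum


class D(Enum):
    """Hypothetical stand-in for the direction tags used in `signature`."""
    IN = "in"
    OUT = "out"


def check_arg_index(signature, arg_index=None):
    """Mimic the rule from the commit message: arg_index must be present
    and parallel to signature (equal length). Assumed behavior, not the
    real CoreCallable.build."""
    if arg_index is None or len(arg_index) != len(signature):
        raise ValueError(
            "CoreCallable.build: arg_index is required and must be "
            "parallel to signature (equal length)"
        )
    # Pair each declared tensor direction with its payload slot.
    return list(zip(signature, arg_index))


# Call shape before the fix: arg_index omitted, so the check raises.
try:
    check_arg_index([D.IN, D.OUT])
    failed = False
except ValueError:
    failed = True

# Call shape after the fix: one payload slot per signature entry.
mapping = check_arg_index([D.IN, D.OUT], arg_index=[0, 1])
```

Under these assumptions, the first call fails for the same reason the CI jobs did, and the second pairs D.IN with slot 0 and D.OUT with slot 1.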
Force-pushed from 689d715 to 5f06ff0
The args dump no longer needs a per-incore arg_index to map each declared tensor to a payload slot. Each incore declares its full (mix-task) signature and the dump maps signature entry i to payload slot i positionally; every record carries the task's active-subtask set as a func_id array (its mix membership) rather than a single scalar func_id.

This supersedes the hw-native-sys#1123 arg_index mechanism (and the hw-native-sys#1171 follow-up that backfilled arg_index for l3_l2_orch_comm), tailored for the upcoming l0_swimlane tool, which reconstructs and replays a whole mix task rather than one kernel.

- dump: a single positional walk over the first active subtask's signature; each payload tensor is emitted once, stamped with the func array (no per-subtask geometry duplication). Slots beyond the payload (a prefix-dispatched task) are skipped.
- record/info/DumpedTensor: func_id scalar -> func_ids[3] + func_count (reuses the existing pad, 128B record unchanged; TENSOR_DUMP_MAX_FUNC_IDS is tied to PTO2_SUBTASK_SLOT_COUNT by a static_assert).
- args_dump.json: func_id is now an array ([0,1,2] for a mix, [0] for a single-kernel task).
- CoreCallable.build / make_callable: drop the arg_index parameter, field, and accessor; scene_test and the binding stop threading it.
- migrate every CALLABLE incore / CoreCallable.build repo-wide to drop arg_index; complete the offset-mix signatures (mixed_example a2a3/a5, l2_swimlane_mixed) to full width so positional mapping covers payload.
- docs/dfx/args-dump.md updated for the func_id array + positional model.

Verified on a2a3 silicon (--dump-args 3, golden PASS, func_id arrays correct, each slot emitted once): mixed_example (offset mix, 108 records), l2_swimlane_mixed, spmd_basic (cooperative mix), l3_l2_orch_comm_stream (hw-native-sys#1171 site), dummy_task.
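The positional model described above can be sketched in a few lines of Python. The real dump lives in the runtime and writes fixed-size records; dump_records, its dict layout, and the string tags here are hypothetical and chosen only for illustration. The limit of 3 func ids is taken from the func_ids[3] field mentioned in the commit message.

```python
MAX_FUNC_IDS = 3  # mirrors func_ids[3] from the commit message


def dump_records(signature, payload, active_func_ids):
    """Map signature entry i to payload slot i and stamp every record
    with the task's active-subtask set (hypothetical record layout)."""
    if len(active_func_ids) > MAX_FUNC_IDS:
        raise ValueError("too many active subtasks for one record")
    records = []
    for slot, direction in enumerate(signature):
        if slot >= len(payload):
            # Slots beyond the payload (prefix-dispatched task) are
            # skipped; every later slot is out of range as well.
            break
        records.append({
            "slot": slot,
            "direction": direction,
            "tensor": payload[slot],
            # An array rather than a scalar: the mix membership.
            "func_id": list(active_func_ids),
        })
    return records


# A three-way mix task whose payload covers the full signature.
mix = dump_records(["IN", "IN", "OUT"], ["a", "b", "c"], [0, 1, 2])

# A single-kernel task dispatched with only a prefix of the signature.
prefix = dump_records(["IN", "IN", "OUT"], ["a", "b"], [0])
```

In this sketch each payload tensor appears in exactly one record, and a record's func_id list is [0, 1, 2] for the mix and [0] for the single-kernel task, matching the args_dump.json shapes listed above.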
Summary
Fixes the st-sim-a5 breakage on main:
- test_l3_l2_orch_comm.py calls CoreCallable.build without the mandatory arg_index introduced by #1123, so it raises at build time and fails on every PR merging against current main.
- The fix declares arg_index=[0, 1] for signature=[D.IN, D.OUT], the explicit per-slot payload mapping every other migrated incore already uses.

Testing
- st-sim-a5 (the failing job): relying on CI; the omitted argument is the sole cause and the value matches the signature=[D.IN, D.OUT] ordering.

Fixes #1170