feat(runtime): unify runtime_env ring sizing into one int-or-list field by ChaoZheng109 · Pull Request #1128 · hw-native-sys/simpler

ChaoZheng109 · 2026-06-24T06:28:56Z

Closes #1126.

Problem

#1099 exposed ring sizing through two near-identical CallConfig.runtime_env names per resource that differ only by a trailing s:

| scalar (broadcast) | per-ring array |
| --- | --- |
| `ring_task_window` | `ring_task_windows` |
| `ring_heap` | `ring_heaps` |
| `ring_dep_pool` | `ring_dep_pools` |

The one-letter difference is an ergonomics footgun (easy to mistype, silently accepted), and the layered "scalar baseline + per-ring override" semantics it bought are not worth the confusing twin names for this project's usage.

Change

Collapse each pair into a single field that accepts either an int (broadcast) or a 4-entry list (per-ring):

cfg.runtime_env.ring_task_window = 128             # broadcast to every ring
cfg.runtime_env.ring_task_window = [128, 0, 0, 0]  # per-ring; 0 falls through
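
The int-or-list setter semantics can be sketched in plain Python. This is only an illustration: the actual property lives in the C++ binding (task_interface.cpp), and `normalize_ring_field` and `RING_COUNT` are hypothetical names, with `RING_COUNT` assumed to mirror `RUNTIME_ENV_RING_COUNT`. The rejection of negative values is also an assumption, made because the wire format uses uint64.

```python
from typing import List, Sequence, Union

RING_COUNT = 4  # assumed to mirror RUNTIME_ENV_RING_COUNT


def normalize_ring_field(value: Union[int, Sequence[int]]) -> List[int]:
    """Turn an int (broadcast) or a 4-entry list (per-ring) into a 4-entry list."""
    # bool is a subclass of int in Python; reject it so True/False is not
    # silently treated as 1/0.
    if isinstance(value, bool):
        raise TypeError("ring sizing must be an int or a list of ints")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("ring sizing must be non-negative")
        return [value] * RING_COUNT  # broadcast: v -> [v, v, v, v]
    entries = list(value)
    if len(entries) != RING_COUNT:
        raise ValueError(f"expected {RING_COUNT} entries, got {len(entries)}")
    for entry in entries:
        if isinstance(entry, bool) or not isinstance(entry, int) or entry < 0:
            raise ValueError("every ring entry must be a non-negative int")
    return entries  # 0 entries are kept as-is so they can fall through later
```

With this shape, the getter can always hand back a 4-list regardless of which form the caller assigned.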

Broadcast happens in the Python binding (int → [v, v, v, v]); the wire format now carries only the three 4-element arrays (12 × uint64, down from 15) and the getter always returns a 4-list.
A 0 entry falls through to the PTO2_RING_* env value, then to the compile-time default. The separate scalar-CallConfig precedence tier is intentionally dropped (accepted trade-off): a 0 in a list can no longer fall back to a sibling scalar, only to env/default.
The internal C-API (run_prepared) and wire layout are internal-only (no external consumers; everything rebuilds together via pip install), so this is a clean break with no back-compat shim.
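
As a rough illustration of the 12 × uint64 wire layout described above, the three 4-element arrays can be flattened and sliced back with the standard `struct` module. The field order, the little-endian byte order, and the helper names `pack_runtime_env` / `unpack_runtime_env` are assumptions made for this sketch; the real pack/unpack code is in worker.py and may differ.

```python
import struct
from typing import Dict, List

RING_COUNT = 4
# Field order on the wire is an assumption for this sketch.
FIELDS = ("ring_task_window", "ring_heap", "ring_dep_pool")
# 3 fields x 4 rings = 12 unsigned 64-bit values; "<" (little-endian) is assumed.
WIRE_FORMAT = "<" + "Q" * (RING_COUNT * len(FIELDS))


def pack_runtime_env(env: Dict[str, List[int]]) -> bytes:
    """Flatten the three per-ring lists into one 96-byte payload."""
    flat = [value for name in FIELDS for value in env[name]]
    return struct.pack(WIRE_FORMAT, *flat)


def unpack_runtime_env(payload: bytes) -> Dict[str, List[int]]:
    """Slice one flat sequence of ring values back into per-ring lists."""
    ring_values = struct.unpack(WIRE_FORMAT, payload)
    return {
        name: list(ring_values[i * RING_COUNT:(i + 1) * RING_COUNT])
        for i, name in enumerate(FIELDS)
    }
```

Because broadcast already happened on the Python side, the payload never needs a separate scalar slot, which is where the drop from 15 to 12 values comes from.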

Surface (mirrored a2a3 ⇄ a5)

Core struct + validate + wire asserts (call_config.h); Python binding int|list property + repr (task_interface.cpp); wire pack/unpack (worker.py); scene-test parse (scene_test.py); internal C-API (pto_runtime_c_api.h, chip_worker.{h,cpp}, onboard+sim c_api_shared.cpp, both host_build_graph/runtime_maker.cpp); resolution (both tensormap_and_ringbuffer/host/runtime_maker.cpp); docs (both MULTI_RING.md); tests (test_call_config.cpp, test_chip_worker.py); and the l2/l3 per_task_runtime_env examples.

Test

tests/ut/py/test_chip_worker.py — 26 passed (defaults/roundtrip, validate rejects, mailbox wire roundtrip, length validation), updated to the unified API.
tests/ut/cpp/types/test_call_config.cpp — updated to the array struct (compiles clean; the local cpput run hits a pre-existing, change-unrelated gtest EqFailure link error that also fails on untouched targets like test_child_memory).
l2 + l3 per_task_runtime_env examples and the paged_attention* scene tests pass under a2a3sim, exercising the full path: binding → wire → C-API → resolve_ring_config → runtime → device sim. Scalar inputs correctly arrive as [v, v, v, v]; lists pass through; repr shows lists.

python examples/workers/l2/per_task_runtime_env/main.py -p a2a3sim -d 0
python examples/workers/l3/per_task_runtime_env/main.py -p a2a3sim -d 0

hw-native-sys#1099 exposed ring sizing through two near-identical CallConfig.runtime_env names per resource that differ only by a trailing `s`: `ring_task_window` (scalar broadcast) vs `ring_task_windows` (per-ring array), etc. The one-letter difference is an ergonomics footgun and the layered "scalar baseline + per-ring override" semantics it bought are not worth the confusing twin names.

Collapse each pair into a single field that accepts EITHER a scalar (broadcast to every ring) OR a 4-entry list (per-ring):

cfg.runtime_env.ring_task_window = 128             # broadcast
cfg.runtime_env.ring_task_window = [128, 0, 0, 0]  # per-ring; 0 falls through

Broadcast happens in the Python binding (int -> [v, v, v, v]); the wire format now carries only the three 4-element arrays (12 uint64, down from 15) and the getter always returns a 4-list. A 0 entry falls through to PTO2_RING_* env -> compile-time default; the separate scalar-CallConfig precedence tier is dropped (accepted trade-off: a 0 in a list no longer falls back to a sibling scalar).

The internal C-API (run_prepared) and wire layout are internal-only and rebuild together via pip install, so this is a clean break with no back-compat shim.

Mirrored across a2a3/a5, both runtimes, bindings, scene-test parsing, docs, unit tests, and the per_task_runtime_env examples.

Closes hw-native-sys#1126.

coderabbitai · 2026-06-24T06:29:16Z

Walkthrough

CallConfig.runtime_env ring-sizing fields (ring_task_window, ring_heap, ring_dep_pool) are unified from a dual scalar+plural-array design into a single field accepting either an int (broadcast to all rings) or a 4-entry list. The change propagates through the C++ struct, C ABI, runtime resolution in a2a3/a5, Python bindings, wire packing, scene-test parsing, unit tests, docs, and examples.

Changes

RuntimeEnv ring-field unification

| Layer / File(s) | Summary |
| --- | --- |
| RuntimeEnv struct, constants, and validate `src/common/task_interface/call_config.h` | Removes `RUNTIME_ENV_SCALAR_FIELD_COUNT` and `RUNTIME_ENV_PER_RING_FIELD_GROUPS`, adds `RUNTIME_ENV_FIELD_GROUPS=3`, restructures `RuntimeEnv` to hold only per-ring arrays (`ring_task_window[N]`, `ring_heap[N]`, `ring_dep_pool[N]`), simplifies `any()` and `validate()` to iterate those arrays only. |
| C ABI signature updates `src/common/worker/pto_runtime_c_api.h`, `src/common/worker/chip_worker.h`, `src/common/worker/chip_worker.cpp`, `src/common/platform/onboard/host/c_api_shared.cpp`, `src/common/platform/sim/host/c_api_shared.cpp` | `run_prepared` and `bind_callable_to_runtime_impl` drop scalar `uint64_t` ring params and plural `ring_s` pointer params, accepting only `const uint64_t` for each ring field. `ChipWorker::run` removes local stack arrays and passes `config.runtime_env` array pointers directly. |
| resolve_ring_config and runtime_maker stubs (a2a3 & a5) `src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp`, `src/a2a3/runtime/host_build_graph/host/runtime_maker.cpp`, `src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp`, `src/a5/runtime/host_build_graph/host/runtime_maker.cpp` | `resolve_ring_config` accepts only per-ring pointer arrays; replaces scalar-broadcast + separate-array logic with a per-ring loop applying overrides when pointer is non-null and value is non-zero. Both host_build_graph stubs update their unused parameter lists to match the new signature. |
| Python bindings, wire packing, and scene-test parsing `python/bindings/task_interface.cpp`, `python/simpler/worker.py`, `simpler_setup/scene_test.py` | Replaces `def_rw` scalar + plural vector properties with unified `def_prop_rw` accepting `int` (broadcast) or `list[int]` of length `RUNTIME_ENV_RING_COUNT`; updates `__repr__`. Wire unpacking slices a single `ring_values` sequence into per-ring lists. Scene-test config drops plural key support. |
| C++ and Python unit tests `tests/ut/cpp/types/test_call_config.cpp`, `tests/ut/py/test_chip_worker.py` | Updates all tests to use per-ring array field access, expect 4-entry list defaults, verify scalar broadcasting, and check mailbox roundtrip with singular field names. Removes `per_ring_any()` assertions. |
| MULTI_RING docs and per-task examples `src/a2a3/runtime/tensormap_and_ringbuffer/docs/MULTI_RING.md`, `src/a5/runtime/tensormap_and_ringbuffer/docs/MULTI_RING.md`, `examples/workers/l2/per_task_runtime_env/`, `examples/workers/l3/per_task_runtime_env/` | MULTI_RING docs update Runtime Overrides section for scalar-or-4-list semantics with `0` as fall-through. L2/L3 examples update `RING_FIELDS` to singular keys and populate configs with 4-entry lists under those keys. |
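
The per-ring resolution loop summarized in the table can be approximated as follows. This is a sketch under stated assumptions, not the C++ `resolve_ring_config` itself: `resolve_ring_values` is a hypothetical name, `env_value` stands in for whatever the PTO2_RING_* environment lookup yields (treated here as a single int where 0 means unset), and `compiled_default` stands in for the compile-time default.

```python
from typing import List, Optional, Sequence

RING_COUNT = 4


def resolve_ring_values(
    overrides: Optional[Sequence[int]],
    env_value: int,
    compiled_default: int,
) -> List[int]:
    """Resolve one resource per ring: CallConfig override -> env -> default."""
    resolved = []
    for ring in range(RING_COUNT):
        # A missing array (None, like a null pointer) or a 0 entry means
        # "no override for this ring".
        override = overrides[ring] if overrides is not None else 0
        if override != 0:
            resolved.append(override)
        elif env_value != 0:
            resolved.append(env_value)  # assumed env fallback, 0 = unset
        else:
            resolved.append(compiled_default)
    return resolved
```

For example, under these assumptions `[128, 0, 0, 0]` with no env value and a default of 64 resolves to 128 for ring 0 and 64 for the remaining rings.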

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hw-native-sys/simpler#1099: Introduced the original scalar + plural per-ring array dual-field design in RuntimeEnv that this PR directly refactors away.
hw-native-sys/simpler#1122: Added the L2/L3 per-task runtime_env examples using the old plural keys (ring_task_windows/ring_heaps/ring_dep_pools) that are updated here to the new singular form.
hw-native-sys/simpler#1042: Modified the same ring_task_window/ring_heap/ring_dep_pool plumbing path through run_prepared → chip_worker → c_api_shared → runtime_maker that this PR reshapes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: unifying runtime_env ring sizing into a single int-or-list field. |
| Description check | ✅ Passed | The description matches the change set and explains the int-or-list unification, precedence changes, and scope. |
| Linked Issues check | ✅ Passed | The PR implements the `#1126` requirements: removes plural variants, supports int broadcast and 4-entry lists, and updates docs/tests/surface. |
| Out of Scope Changes check | ✅ Passed | The changes stay within the ring-sizing API unification and related docs/tests/examples, with no obvious unrelated additions. |

coderabbitai

🧹 Nitpick comments (1)

tests/ut/py/test_chip_worker.py (1)
84-130: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a mixed-zero per-ring case for the new fall-through contract.

Line 86 and the mailbox roundtrip cover all-zero defaults and fully populated lists, but they never exercise the key new behavior from this PR: a 4-entry list with some 0 elements should be accepted and preserved so those rings can fall through to env/default resolution. A regression that rejects or rewrites [0, 32, 0, 256] would still pass this suite today.
Suggested test shape
 def test_runtime_env_defaults_and_roundtrip(self):
     config = CallConfig()
@@
     config.runtime_env.ring_dep_pool = [64, 128, 256, 512]
@@
     assert "runtime_env.ring_dep_pool=[64, 128, 256, 512]" in r
+
+    config.runtime_env.ring_task_window = [0, 32, 0, 256]
+    config.runtime_env.ring_heap = [0, 2048, 0, 8192]
+    config.runtime_env.ring_dep_pool = [0, 128, 0, 512]
+    assert config.runtime_env.ring_task_window == [0, 32, 0, 256]
+    assert config.runtime_env.ring_heap == [0, 2048, 0, 8192]
+    assert config.runtime_env.ring_dep_pool == [0, 128, 0, 512]
+    config.validate()
Also applies to: 331-363
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/ut/py/test_chip_worker.py` around lines 84 - 130, Add a test in
test_runtime_env_defaults_and_roundtrip (or a nearby RuntimeEnv roundtrip test)
that assigns a 4-entry mixed-zero list such as [0, 32, 0, 256] to a per-ring
field like ring_task_window or ring_heap and asserts the exact list is preserved
after readback and validate(). This should exercise the new fall-through
contract without treating 0 as invalid or rewriting it, alongside the existing
RuntimeEnv and CallConfig roundtrip checks.

📥 Commits

Reviewing files that changed from the base of the PR and between ae59a8e and ba585ea.

📒 Files selected for processing (21)

examples/workers/l2/per_task_runtime_env/README.md
examples/workers/l2/per_task_runtime_env/main.py
examples/workers/l3/per_task_runtime_env/README.md
examples/workers/l3/per_task_runtime_env/main.py
python/bindings/task_interface.cpp
python/simpler/worker.py
simpler_setup/scene_test.py
src/a2a3/runtime/host_build_graph/host/runtime_maker.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/docs/MULTI_RING.md
src/a2a3/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/a5/runtime/host_build_graph/host/runtime_maker.cpp
src/a5/runtime/tensormap_and_ringbuffer/docs/MULTI_RING.md
src/a5/runtime/tensormap_and_ringbuffer/host/runtime_maker.cpp
src/common/platform/onboard/host/c_api_shared.cpp
src/common/platform/sim/host/c_api_shared.cpp
src/common/task_interface/call_config.h
src/common/worker/chip_worker.cpp
src/common/worker/chip_worker.h
src/common/worker/pto_runtime_c_api.h
tests/ut/cpp/types/test_call_config.cpp
tests/ut/py/test_chip_worker.py

coderabbitai Bot reviewed Jun 24, 2026

ChaoWao approved these changes Jun 24, 2026

ChaoWao merged commit c635484 into hw-native-sys:main Jun 24, 2026
16 checks passed

This was referenced Jun 24, 2026

[Optimization] Replace wiring with polling-based task readiness test (~17% median device speedup) #1137

Open

Refactor: move HostApi out of Runtime into a shared explicit parameter #1227

Merged

ChaoWao merged 1 commit into hw-native-sys:main from ChaoZheng109:feature/unify-runtime-env-ring-sizing

ChaoZheng109 commented Jun 24, 2026

Uh oh!

gemini-code-assist Bot commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChaoZheng109 commented Jun 24, 2026

Problem

Change

Surface (mirrored a2a3 ⇄ a5)

Test

Uh oh!

gemini-code-assist Bot commented Jun 24, 2026

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading