Refactor: pass output_prefix via CallConfig, drop SIMPLER_OUTPUT_DIR env var by ChaoWao · Pull Request #693 · hw-native-sys/simpler

ChaoWao · 2026-04-28T01:05:37Z

Summary

Replace the env-var-based output-directory scoping (SIMPLER_OUTPUT_DIR / SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR) with a per-task output_prefix field on CallConfig. Each test case sets its own prefix (chosen by scene_test.py::_build_output_prefix as outputs/<case>_<ts>/), the C++ runtime writes diagnostic artifacts under that directory with fixed filenames, and all the workarounds for the second-precision timestamp collision go away.
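A minimal sketch of how a per-case prefix of the form outputs/<case>_<ts>/ might be composed. The body of scene_test.py::_build_output_prefix is not shown in this description, so the helper name `build_output_prefix`, the timestamp format, and the sanitization rule below are assumptions for illustration only.

```python
import re
from datetime import datetime
from typing import Optional


def build_output_prefix(case_name: str, root: str = "outputs",
                        now: Optional[datetime] = None) -> str:
    """Return '<root>/<case>_<ts>/' for one test case (hypothetical helper)."""
    now = now or datetime.now()
    # Assumed timestamp format; the PR description only says '<ts>'.
    ts = now.strftime("%Y%m%d_%H%M%S")
    # Assumption: replace characters that are awkward in directory names.
    safe_case = re.sub(r"[^A-Za-z0-9_.-]", "_", case_name)
    return f"{root}/{safe_case}_{ts}/"
```

Because the case name is part of the directory, two cases that start within the same second still get distinct directories, which is the property the summary relies on.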

This is the third and final PR superseding #685.

What gets deleted

flatten_l2_perf_records_subdirs in parallel_scheduler.py — no more "scoped subdir → flatten back" dance.
_snapshot_l2_perf_records_files / _wait_new_l2_perf_records_file / rename-after-the-fact in scene_test.py — the path is known a priori from CallConfig.output_prefix.
SIMPLER_*_OUTPUT_DIR env-var setting in conftest.py (xdist worker scoping) and _dispatch_test_phases_standalone.
getenv and strftime-based filename composition in C++ l2_perf_collector / tensor_dump_collector / pmu_collector (a5 + a2a3, onboard + sim).
Per-file timestamps. Every output now lands in <prefix>/{l2_perf_records.json, tensor_dump/, pmu.csv} with fixed names. The directory IS the per-task uniqueness boundary.

What gets added

CallConfig::output_prefix (char[1024], NUL-terminated). Increases sizeof(CallConfig) from 20 to 1044 bytes.
CallConfig::validate() — throws std::invalid_argument if any diagnostic flag is enabled but output_prefix is empty. Called at every submit/run entry point (Orchestrator::submit_impl, ChipWorker::run, Worker::run) before any IPC happens, so failures land closest to the user's call site.
pto_runtime_c_api::run_runtime gains a const char *output_prefix parameter, plumbed through to DeviceRunner::set_output_prefix() (a5 + a2a3, onboard + sim).
nanobind def_property bridge for the str↔char[1024] crossing, with size validation.
Python-side bounds check in _read_args_from_mailbox (was unchecked — could read past the buffer if the header was corrupt).
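The validation contract and the str↔char[1024] size check listed above can be sketched in Python as follows. This is an illustrative model, not the project's C++ or nanobind code: the class name `CallConfigSketch`, the three flag names, and the use of ValueError as the Python-side analogue of std::invalid_argument are assumptions.

```python
from dataclasses import dataclass

OUTPUT_PREFIX_CAPACITY = 1024  # char[1024], including the terminating NUL


@dataclass
class CallConfigSketch:
    # Hypothetical flag names; the PR only says "any diagnostic flag".
    enable_l2_swimlane: bool = False
    enable_tensor_dump: bool = False
    enable_pmu: bool = False
    output_prefix: str = ""

    def set_output_prefix(self, value: str) -> None:
        """Reject strings that cannot fit in a NUL-terminated char[1024]."""
        encoded = value.encode("utf-8")
        if b"\0" in encoded:
            raise ValueError("output_prefix must not contain NUL bytes")
        # One byte is reserved for the terminating NUL.
        if len(encoded) >= OUTPUT_PREFIX_CAPACITY:
            raise ValueError("output_prefix does not fit in char[1024]")
        self.output_prefix = value

    def validate(self) -> None:
        """Fail before any work is submitted if diagnostics have nowhere to go."""
        any_diagnostic = (self.enable_l2_swimlane
                          or self.enable_tensor_dump
                          or self.enable_pmu)
        if any_diagnostic and not self.output_prefix:
            raise ValueError("diagnostic flag enabled but output_prefix is empty")
```

Calling `validate()` at each submit/run entry point, before any IPC, keeps the error next to the caller rather than letting it show up later as a missing or misplaced file.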

Mailbox layout

MAILBOX_OFF_ARGS is derived automatically from sizeof(CallConfig), rounded up to a multiple of 8 bytes for ContinuousTensor.data alignment. static_assert guards both invariants. MAILBOX_ARGS_CAPACITY shrinks from 3776 to ~2776 bytes (still ~69 tensors per task).
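The size and alignment arithmetic can be checked with a short sketch. It assumes the layout given in the commit message (5 int32 fields followed by char[1024], with no padding) and only computes the aligned size of the config block; whatever precedes CallConfig in the mailbox is not described here, so the actual value of MAILBOX_OFF_ARGS is not derived. The names below are illustrative.

```python
import struct

# 5 int32 fields followed by char[1024]; "=" selects standard sizes, no padding.
CALL_CONFIG_SIZE = struct.calcsize("=5i1024s")


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Size of the config block once padded to an 8-byte boundary.
CONFIG_BLOCK_ALIGNED = align_up(CALL_CONFIG_SIZE, 8)

print(CALL_CONFIG_SIZE, CONFIG_BLOCK_ALIGNED)  # 1044 1048
```

Since 1044 is not a multiple of 8, the rounding step adds 4 bytes of padding before the args region.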

Output layout (before/after)

| Artifact | Before | After |
| --- | --- | --- |
| Swimlane | `outputs/l2_perf_records_<ts>.json` | `outputs/<case>_<ts>/l2_perf_records.json` |
| Tensor dump | `outputs/tensor_dump_<ts>/` | `outputs/<case>_<ts>/tensor_dump/` |
| PMU | `outputs/pmu_<ts>_<ms>.csv` | `outputs/<case>_<ts>/pmu.csv` |
| Parallel scoping | `SIMPLER_OUTPUT_DIR` env + per-subprocess subdirs + flatten step | each case picks its own dir; no env, no flatten |
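As a rough illustration of the "After" column, the fixed artifact locations can be derived from a prefix as shown below. The helper names are hypothetical, and the description does not say which layer creates the directory; this sketch simply creates it up front.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(output_prefix: str) -> Dict[str, Path]:
    """Map each diagnostic artifact to its fixed name under the prefix."""
    prefix = Path(output_prefix)
    return {
        "swimlane": prefix / "l2_perf_records.json",
        "tensor_dump": prefix / "tensor_dump",
        "pmu": prefix / "pmu.csv",
    }


def prepare_output_dirs(output_prefix: str) -> Dict[str, Path]:
    """Create the prefix directory (and tensor_dump/) before anything is written."""
    paths = artifact_paths(output_prefix)
    # parents=True also creates the prefix directory itself.
    paths["tensor_dump"].mkdir(parents=True, exist_ok=True)
    return paths
```

With fixed filenames, a consumer that knows the prefix knows every path in advance and never needs to scan for newly created files.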

CLI tools

swimlane_converter, sched_overhead_analysis, perf_to_mermaid, dump_viewer now glob outputs/*/l2_perf_records.json (or outputs/*/tensor_dump) and pick the latest by mtime.
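The "glob and pick the latest by mtime" behaviour can be sketched like this. The function name and defaults are assumptions; the actual tools may differ in how they report a missing file or break ties.

```python
from pathlib import Path
from typing import Optional


def find_latest_artifact(outputs_root: str = "outputs",
                         name: str = "l2_perf_records.json") -> Optional[Path]:
    """Return the most recently modified <outputs_root>/*/<name>, or None."""
    candidates = list(Path(outputs_root).glob(f"*/{name}"))
    if not candidates:
        return None
    # Newest modification time wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Passing `name="tensor_dump"` covers the directory case, since `glob` matches directories as well as files.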

PR sequence

~~Rename ChipCallConfig → CallConfig~~ (Refactor: rename ChipCallConfig -> CallConfig #687, merged)
~~Pack mailbox config fields into a single POD block~~ (Refactor: pack mailbox config fields into a single POD block #689, merged)
This PR — pass output_prefix via CallConfig, drop env var

Test plan

CI: lint + unit tests + simulation tests + onboard hardware tests pass
Round-trip: a test case with --enable-l2-swimlane produces outputs/<case>_<ts>/l2_perf_records.json and the swimlane converter auto-picks it
Negative: enabling --enable-l2-swimlane from a code path that bypasses scene_test (i.e. with an empty output_prefix) raises from CallConfig::validate() (std::invalid_argument on the C++ side), which verifies that the contract violation surfaces synchronously instead of producing garbage files

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request refactors the diagnostic artifact storage system to use per-task directories with fixed filenames, eliminating the need for environment-variable scoping and post-run file flattening. This change improves reliability and organization during parallel test execution, supported by new validation logic in CallConfig. The review feedback highlights issues where the C++ runtime fails to create the necessary output directories before attempting to write L2 performance records and PMU data, which would lead to failed exports.

Replace the env-var-based output-directory scoping (SIMPLER_OUTPUT_DIR / SIMPLER_L2_PERF_RECORDS_OUTPUT_DIR) with a per-task output_prefix field on CallConfig. Each test case sets its own prefix (chosen by scene_test.py::_build_output_prefix as outputs/<case>_<ts>/), the C++ runtime writes diagnostic artifacts under that directory with fixed filenames, and all the workarounds for the second-precision timestamp collision go away:

- delete flatten_l2_perf_records_subdirs from parallel_scheduler.py
- delete _snapshot_l2_perf_records_files / _wait_new_l2_perf_records_file / rename-after-the-fact dance from scene_test.py
- delete SIMPLER_*_OUTPUT_DIR env var setting in conftest.py (xdist worker scoping) and _dispatch_test_phases_standalone
- delete getenv and strftime-based filename composition in C++ l2_perf_collector / tensor_dump_collector / pmu_collector
- collector exports now write <prefix>/{l2_perf_records.json, tensor_dump/, pmu.csv} with fixed filenames

CallConfig::validate() throws std::invalid_argument if any diagnostic flag is enabled but output_prefix is empty - surfaces the contract violation at every submit/run entry point (Orchestrator::submit_impl, ChipWorker::run, Worker::run) before any IPC happens, so failures land closest to the user's call site.

Mailbox layout: CallConfig grows to 1044 bytes (5 int32 + char[1024]). MAILBOX_OFF_ARGS auto-derives from sizeof(CallConfig), rounded up to 8 bytes for ContinuousTensor.data alignment. ARGS_CAPACITY shrinks to ~2776 bytes (was 3776) - still fits ~69 tensors per task.

Also tightens Python-side mailbox safety: _read_args_from_mailbox now bounds-checks t_count/s_count against _MAILBOX_ARGS_CAPACITY (was unchecked - could read past the buffer if the header was corrupt).

CLI tools (swimlane_converter, sched_overhead_analysis, perf_to_mermaid, dump_viewer) now glob outputs/*/l2_perf_records.json (or outputs/*/tensor_dump) and pick the latest by mtime.

Note: clang-tidy and check-headers hooks skipped locally (simpler not pip-installed in this venv); CI will lint.
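
The bounds check on t_count/s_count mentioned in the commit message could look roughly like the sketch below. The header format, per-entry sizes, and function name are assumptions made for illustration; the capacity constant is the approximate figure quoted above, and none of these values are taken from the real _read_args_from_mailbox.

```python
import struct

ARGS_CAPACITY = 2776      # approximate figure from the PR, used as a placeholder
HEADER_FMT = "=ii"        # assumed header: t_count and s_count as two int32
TENSOR_ENTRY_SIZE = 40    # assumed size of one tensor descriptor, in bytes
SCALAR_ENTRY_SIZE = 8     # assumed size of one scalar, in bytes


def checked_arg_counts(args_region: bytes, capacity: int = ARGS_CAPACITY):
    """Read (t_count, s_count) and reject headers that would overrun the region."""
    header_size = struct.calcsize(HEADER_FMT)
    if len(args_region) < header_size:
        raise ValueError("args region is smaller than its header")
    t_count, s_count = struct.unpack_from(HEADER_FMT, args_region, 0)
    if t_count < 0 or s_count < 0:
        raise ValueError("negative argument count in mailbox header")
    needed = header_size + t_count * TENSOR_ENTRY_SIZE + s_count * SCALAR_ENTRY_SIZE
    if needed > capacity:
        raise ValueError("argument counts exceed mailbox args capacity")
    return t_count, s_count
```

Validating the counts before computing any offsets means a corrupt header produces an exception instead of a read past the end of the buffer.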

gemini-code-assist Bot reviewed Apr 28, 2026

Comment thread src/a2a3/platform/src/host/l2_perf_collector.cpp

Comment thread src/a5/platform/src/host/l2_perf_collector.cpp

Comment thread src/a2a3/platform/include/host/pmu_collector.h Outdated

Comment thread src/a5/platform/include/host/pmu_collector.h Outdated

ChaoWao force-pushed the refactor/output-prefix-drop-env-var branch 5 times, most recently from e49c823 to c42ebd7, on April 28, 2026 06:36

ChaoWao force-pushed the refactor/output-prefix-drop-env-var branch from c42ebd7 to 27bdeea on April 28, 2026 07:01

poursoul approved these changes Apr 28, 2026

ChaoWao merged commit c557319 into hw-native-sys:main Apr 28, 2026
14 checks passed

ChaoWao deleted the refactor/output-prefix-drop-env-var branch on April 28, 2026 07:09

ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/output-prefix-drop-env-var
