Refactor: extract launch_aicpu + shared collectors/flags into base #913

Merged
ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:extract-group-e-collectors
May 30, 2026

@hw-native-sys-bot
Collaborator

Summary

Continues the onboard host refactor (.docs/ONBOARD_HOST_COMMON_REFACTOR.md, PR 6 minimal). Moves the trivially-shared Group E pieces into DeviceRunnerBase.

Moved to base

  • Method: launch_aicpu_kernel (3-line wrapper around load_aicpu_op_.LaunchBuiltInOp, byte-identical on both arches).
  • Collectors: l2_perf_collector_, dump_collector_, pmu_collector_, scope_stats_collector_.
  • Enable flags + resolved values: enable_l2_swimlane_, enable_dump_tensor_, enable_pmu_, enable_scope_stats_, l2_perf_level_, pmu_event_type_, output_prefix_.
  • Setters/getter: set_l2_swimlane_enabled, set_dump_tensor_enabled, set_pmu_enabled, set_scope_stats_enabled, set_output_prefix, output_prefix().
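The layout described by the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the collector structs and `LoadAicpuOp` are empty stand-ins, the `LaunchBuiltInOp` parameters, the `bool` setter arguments, and the `int` types of `l2_perf_level_` and `pmu_event_type_` are all assumptions, and the `run` and `dep_gen_enabled` helpers are hypothetical additions for the demo.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real collector types live in the repository.
struct L2PerfCollector {};
struct TensorDumpCollector {};
struct PmuCollector {};
struct ScopeStatsCollector {};
struct DepGenCollector {};

// Stand-in loader. The parameter list of LaunchBuiltInOp is an assumption.
struct LoadAicpuOp {
    int launch_count = 0;
    int LaunchBuiltInOp(void* stream, const char* kernel_name) {
        (void)stream;
        (void)kernel_name;
        ++launch_count;
        return 0;  // 0 stands for success in this sketch
    }
};

class DeviceRunnerBase {
public:
    virtual ~DeviceRunnerBase() = default;

    // Shared setters/getter (argument types are assumed).
    void set_l2_swimlane_enabled(bool on) { enable_l2_swimlane_ = on; }
    void set_dump_tensor_enabled(bool on) { enable_dump_tensor_ = on; }
    void set_pmu_enabled(bool on) { enable_pmu_ = on; }
    void set_scope_stats_enabled(bool on) { enable_scope_stats_ = on; }
    void set_output_prefix(std::string prefix) { output_prefix_ = std::move(prefix); }
    const std::string& output_prefix() const { return output_prefix_; }

protected:
    // The thin wrapper that was identical on both arches.
    int launch_aicpu_kernel(void* stream, const char* kernel_name) {
        return load_aicpu_op_.LaunchBuiltInOp(stream, kernel_name);
    }

    LoadAicpuOp load_aicpu_op_;
    L2PerfCollector l2_perf_collector_;
    TensorDumpCollector dump_collector_;
    PmuCollector pmu_collector_;
    ScopeStatsCollector scope_stats_collector_;
    bool enable_l2_swimlane_ = false;
    bool enable_dump_tensor_ = false;
    bool enable_pmu_ = false;
    bool enable_scope_stats_ = false;
    int l2_perf_level_ = 0;   // type assumed
    int pmu_event_type_ = 0;  // type assumed
    std::string output_prefix_;
};

// a2a3 keeps its dep_gen pieces; everything else is inherited.
class A2a3DeviceRunner : public DeviceRunnerBase {
public:
    void set_dep_gen_enabled(bool on) { enable_dep_gen_ = on; }
    bool dep_gen_enabled() const { return enable_dep_gen_; }  // hypothetical helper
    int run(void* stream) { return launch_aicpu_kernel(stream, "demo_kernel"); }

protected:
    bool enable_dep_gen_ = false;
    DepGenCollector dep_gen_collector_;
};

// a5 adds nothing diagnostics-related in this sketch.
class A5DeviceRunner : public DeviceRunnerBase {
public:
    int run(void* stream) { return launch_aicpu_kernel(stream, "demo_kernel"); }
};

// Exercises the shared setters and the inherited launch wrapper on both arches.
int demo_shared_base() {
    A2a3DeviceRunner a2a3;
    A5DeviceRunner a5;
    a2a3.set_pmu_enabled(true);
    a2a3.set_dep_gen_enabled(true);
    a5.set_output_prefix("run0");
    assert(a2a3.dep_gen_enabled());
    assert(a5.output_prefix() == "run0");
    return a2a3.run(nullptr) + a5.run(nullptr);
}
```

Because the wrapper and the setters are non-virtual members of the base, both derived runners resolve them to the same base symbol, which is consistent with the `nm -D` check listed under Verification.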

Left arch-specific

  • a2a3 only: enable_dep_gen_, dep_gen_collector_, set_dep_gen_enabled, init_dep_gen.
  • Each arch's init_l2_perf / init_tensor_dump / init_pmu / init_scope_stats / finalize_collectors — these have genuinely divergent callback shapes (a2a3 wraps halHostRegister/Unregister; a5 uses direct rtMalloc/rtFree). Unifying them would require introducing virtuals; that's a separate design call.
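For context on the deferred design call, one possible shape for "introducing virtuals" is sketched below. It is purely illustrative and is not what this PR does: `CollectorMemoryHooks`, `A2a3Hooks`, `A5Hooks`, and `init_and_finalize` are hypothetical names, and `std::malloc`/`std::free` plus a string log stand in for the real `halHostRegister`/`halHostUnregister` and `rtMalloc`/`rtFree` calls, whose signatures are not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical interface capturing the part that differs per arch.
class CollectorMemoryHooks {
public:
    virtual ~CollectorMemoryHooks() = default;
    virtual void* acquire(std::size_t bytes) = 0;
    virtual void release(void* ptr) = 0;

    std::vector<std::string> log;  // records which stand-in calls ran
};

// a2a3 style: allocate on the host, then register the buffer.
class A2a3Hooks : public CollectorMemoryHooks {
public:
    void* acquire(std::size_t bytes) override {
        void* ptr = std::malloc(bytes);
        if (ptr != nullptr) {
            log.push_back("register");  // stand-in for halHostRegister
        }
        return ptr;
    }
    void release(void* ptr) override {
        log.push_back("unregister");  // stand-in for halHostUnregister
        std::free(ptr);
    }
};

// a5 style: direct allocate and free.
class A5Hooks : public CollectorMemoryHooks {
public:
    void* acquire(std::size_t bytes) override {
        log.push_back("alloc");  // stand-in for rtMalloc
        return std::malloc(bytes);
    }
    void release(void* ptr) override {
        log.push_back("free");  // stand-in for rtFree
        std::free(ptr);
    }
};

// Shared init/finalize logic written once against the interface.
bool init_and_finalize(CollectorMemoryHooks& hooks, std::size_t bytes) {
    assert(bytes > 0);
    void* buffer = hooks.acquire(bytes);
    if (buffer == nullptr) {
        return false;
    }
    // ... a collector would use the buffer here ...
    hooks.release(buffer);
    return true;
}
```

The trade-off is the one the description names: the shared path becomes a single function, at the cost of a virtual interface that each arch must implement and keep in sync.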

Verification

  • nm -D confirms launch_aicpu_kernel now resolves to DeviceRunnerBase on both arches' libhost_runtime.so.
  • a2a3 + a5 host runtimes built clean (onboard + sim, both runtimes).
  • Local a2a3 onboard smoke (dummy_task, alternating_matmul_add, prepared_callable suite) — 8/8 passed in 16s.

Test plan

  • CI st-sim-a2a3 / st-sim-a5
  • CI st-onboard-a2a3 / st-onboard-a5
  • CI ut-a2a3 / ut-a5

@coderabbitai

coderabbitai Bot commented May 30, 2026

Walkthrough

This PR extracts shared diagnostics configuration and AICPU kernel launch functionality from platform-specific DeviceRunner classes (a2a3 and a5) into a common DeviceRunnerBase, reducing code duplication while preserving a2a3-specific dep_gen support.

Changes

DeviceRunner diagnostics refactoring

| Layer / File(s) | Summary |
| --- | --- |
| DeviceRunnerBase diagnostics infrastructure: src/common/platform/onboard/host/device_runner_base.h, src/common/platform/onboard/host/device_runner_base.cpp | New diagnostics headers and public inline setters for L2 swimlane, tensor dump, PMU, and scope stats enablement, plus output prefix configuration. Protected member variables for collectors (L2PerfCollector, TensorDumpCollector, PmuCollector, ScopeStatsCollector) and associated enable flags and settings. A shared launch_aicpu_kernel helper forwards calls to load_aicpu_op_.LaunchBuiltInOp. |
| a2a3 DeviceRunner remove shared diagnostics: src/a2a3/platform/onboard/host/device_runner.h, src/a2a3/platform/onboard/host/device_runner.cpp | Removes shared diagnostics collectors and configuration setters from a2a3 DeviceRunner, keeping only set_dep_gen_enabled and enable_dep_gen_ (documented as a2a3-only). The launch_aicpu_kernel implementation is removed and documented as inherited from DeviceRunnerBase. |
| a5 DeviceRunner remove shared diagnostics: src/a5/platform/onboard/host/device_runner.h, src/a5/platform/onboard/host/device_runner.cpp | Removes all shared diagnostics collectors, enable flags, and configuration members from a5 DeviceRunner. The launch_aicpu_kernel declaration and related implementation are removed, documented as now provided by DeviceRunnerBase. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • hw-native-sys/simpler#880: Establishes the DeviceRunnerBase inheritance pattern that this PR builds upon to extract shared diagnostics and AICPU launch functionality.

Poem

🐰 Collectors climb the family tree,
From leaves to root, duplication flees,
Base and branches share the load,
One launch helper, one shared road.
Dep_gen stays, a2a3's own,
Less to maintain, codebase grown! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main refactoring change: extracting launch_aicpu and shared collectors/flags from arch-specific classes into the base class. |
| Description check | ✅ Passed | The description is directly related to the changeset, clearly explaining what was moved to the base class, what remained arch-specific, and providing verification details. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the DeviceRunner classes for both a2a3 and a5 platforms by moving shared diagnostics collectors, enablement flags, setters, getters, and the launch_aicpu_kernel helper method into the common base class DeviceRunnerBase. The dep_gen collector and enablement setter remain platform-specific to a2a3. Feedback on this PR points out an inaccuracy in a comment within src/a2a3/platform/onboard/host/device_runner.h that refers to "three shared setters" instead of listing them as shared setters and getters, and provides a suggestion to correct it.

Comment thread src/a2a3/platform/onboard/host/device_runner.h
@hw-native-sys-bot hw-native-sys-bot force-pushed the extract-group-e-collectors branch from c033b03 to 8e6bd78 Compare May 30, 2026 08:50
Move the trivially-shared Group E pieces from the a2a3 and a5 onboard
DeviceRunners into the shared DeviceRunnerBase (a net change of -10 lines
relative to the existing code, with much of the structure now centralized):

- `launch_aicpu_kernel`: identical 3-line wrapper around
  `load_aicpu_op_.LaunchBuiltInOp`.
- Shared collector fields: `l2_perf_collector_`, `dump_collector_`,
  `pmu_collector_`, `scope_stats_collector_`.
- Shared enable flags + resolved values: `enable_l2_swimlane_`,
  `enable_dump_tensor_`, `enable_pmu_`, `enable_scope_stats_`,
  `l2_perf_level_`, `pmu_event_type_`, `output_prefix_`.
- Shared setters/getter: `set_l2_swimlane_enabled`,
  `set_dump_tensor_enabled`, `set_pmu_enabled`,
  `set_scope_stats_enabled`, `set_output_prefix`, `output_prefix()`.

Leaves arch-specific:
- a2a3: `enable_dep_gen_`, `dep_gen_collector_`, `set_dep_gen_enabled`,
  `init_dep_gen` (a2a3 only).
- Each arch's `init_l2_perf` / `init_tensor_dump` / `init_pmu` /
  `init_scope_stats` / `finalize_collectors` (divergent callback styles
  — a2a3 wraps halHostRegister/Unregister; a5 uses direct rtMalloc/rtFree).

Verbatim move. Both arches built clean. a2a3 onboard smoke 8/8 in 16s.
@ChaoWao ChaoWao merged commit aa6ce64 into hw-native-sys:main May 30, 2026
29 of 31 checks passed
@ChaoWao ChaoWao deleted the extract-group-e-collectors branch May 30, 2026 09:28
2 participants