Refactor: extract launch_aicpu + shared collectors/flags into base#913
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR extracts shared diagnostics configuration and AICPU kernel launch functionality from platform-specific ChangesDeviceRunner diagnostics refactoring
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors the DeviceRunner classes for both a2a3 and a5 platforms by moving shared diagnostics collectors, enablement flags, setters, getters, and the launch_aicpu_kernel helper method into the common base class DeviceRunnerBase. The dep_gen collector and enablement setter remain platform-specific to a2a3. Feedback on this PR points out an inaccuracy in a comment within src/a2a3/platform/onboard/host/device_runner.h that refers to "three shared setters" instead of listing them as shared setters and getters, and provides a suggestion to correct it.
c033b03 to
8e6bd78
Compare
Move the trivially-shared Group E pieces from a2a3 and a5 onboard DeviceRunners into the shared DeviceRunnerBase (-10 net lines on what already existed, with much of the structure now centralized): - `launch_aicpu_kernel`: identical 3-line wrapper around `load_aicpu_op_.LaunchBuiltInOp`. - Shared collector fields: `l2_perf_collector_`, `dump_collector_`, `pmu_collector_`, `scope_stats_collector_`. - Shared enable flags + resolved values: `enable_l2_swimlane_`, `enable_dump_tensor_`, `enable_pmu_`, `enable_scope_stats_`, `l2_perf_level_`, `pmu_event_type_`, `output_prefix_`. - Shared setters/getter: `set_l2_swimlane_enabled`, `set_dump_tensor_enabled`, `set_pmu_enabled`, `set_scope_stats_enabled`, `set_output_prefix`, `output_prefix()`. Leaves arch-specific: - a2a3: `enable_dep_gen_`, `dep_gen_collector_`, `set_dep_gen_enabled`, `init_dep_gen` (a2a3 only). - Each arch's `init_l2_perf` / `init_tensor_dump` / `init_pmu` / `init_scope_stats` / `finalize_collectors` (divergent callback styles — a2a3 wraps halHostRegister/Unregister; a5 uses direct rtMalloc/rtFree). Verbatim move. Both arches built clean. a2a3 onboard smoke 8/8 in 16s.
Summary
Continues the onboard host refactor (
.docs/ONBOARD_HOST_COMMON_REFACTOR.md, PR 6 minimal). Moves the trivially-shared Group E pieces intoDeviceRunnerBase.Moved to base
launch_aicpu_kernel(3-line wrapper aroundload_aicpu_op_.LaunchBuiltInOp, byte-identical on both arches).l2_perf_collector_,dump_collector_,pmu_collector_,scope_stats_collector_.enable_l2_swimlane_,enable_dump_tensor_,enable_pmu_,enable_scope_stats_,l2_perf_level_,pmu_event_type_,output_prefix_.set_l2_swimlane_enabled,set_dump_tensor_enabled,set_pmu_enabled,set_scope_stats_enabled,set_output_prefix,output_prefix().Left arch-specific
enable_dep_gen_,dep_gen_collector_,set_dep_gen_enabled,init_dep_gen.init_l2_perf/init_tensor_dump/init_pmu/init_scope_stats/finalize_collectors— these have genuinely divergent callback shapes (a2a3 wrapshalHostRegister/Unregister; a5 uses directrtMalloc/rtFree). Unifying them would require introducing virtuals; that's a separate design call.Verification
nm -Dconfirmslaunch_aicpu_kernelnow resolves toDeviceRunnerBaseon both arches'libhost_runtime.so.dummy_task,alternating_matmul_add,prepared_callablesuite) — 8/8 passed in 16s.Test plan