Fix: unify tensor dump control under profiling flags #567
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 15, 2026
Code Review
This pull request unifies profiling and tensor dump controls by introducing a bitmask flag (enable_profiling_flag) in the Handshake structure shared between the host, AICPU, and AICore. It replaces the PTO2_DUMP_TENSOR macro with PTO2_PROFILING and implements pipe_barrier calls in AICore execution loops to ensure memory visibility when tensor dumping is enabled. Review feedback highlights the need to extend pipe_barrier usage to general profiling to prevent race conditions on weak memory model architectures and suggests that the AICPU should utilize the new handshake flag for better configuration consistency.
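As a rough illustration of the bitmask approach summarized above, the sketch below shows one way a single `enable_profiling_flag` field could carry several independent switches. Only the field name `enable_profiling_flag` and the `Handshake` struct name come from this PR; the bit names, their values, and the helper functions are hypothetical and may not match the merged code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments. The PR summary does not list the real values.
constexpr uint32_t kProfilingNone = 0u;
constexpr uint32_t kProfilingDumpTensor = 1u << 0;  // assumed "dump bit"
constexpr uint32_t kProfilingTiming = 1u << 1;      // assumed second switch

// Reduced stand-in for the shared handshake structure. The real struct
// carries more fields; only the flag word is modeled here.
struct Handshake {
    uint32_t enable_profiling_flag = kProfilingNone;
};

// Host side: turn one feature on without disturbing the other bits.
inline void set_profiling_bit(Handshake& hs, uint32_t bit) {
    hs.enable_profiling_flag |= bit;
}

// Device side: test a single feature bit.
inline bool has_profiling_bit(const Handshake& hs, uint32_t bit) {
    return (hs.enable_profiling_flag & bit) != 0u;
}

// Small usage example: enabling the dump bit leaves timing untouched.
inline void example_usage() {
    Handshake hs;
    set_profiling_bit(hs, kProfilingDumpTensor);
    assert(has_profiling_bit(hs, kProfilingDumpTensor));
    assert(!has_profiling_bit(hs, kProfilingTiming));
}
```

Keeping every switch in one word means the host writes the configuration once and both AICPU and AICore can read the same value, which is the consistency concern the review feedback raises.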
- add `enable_profiling_flag` to the AICPU/AICore handshake and
initialize the dump bit in onboard and sim device runners
- replace `PTO2_DUMP_TENSOR` guards with `PTO2_PROFILING` and remove
the old per-runtime dump macro definitions
- add an AICore pipe barrier before completion when dumping tensors to
preserve write visibility for dumps
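The barrier-before-completion step in the last bullet can be sketched on the host as follows. This is not the AICore code from the PR: `std::atomic_thread_fence` stands in for the device `pipe_barrier` call, and the struct, constant, and function names (`DumpHandshake`, `kDumpTensorBit`, `complete_task`) are hypothetical. It only shows the ordering: write outputs, fence if the dump bit is set, then signal completion.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t kDumpTensorBit = 1u << 0;  // hypothetical bit position

// Hypothetical reduced handshake: a flag word plus a completion signal.
struct DumpHandshake {
    uint32_t enable_profiling_flag = 0u;
    std::atomic<uint32_t> task_done{0u};
};

// Host-side stand-in for the AICore pipe barrier (assumption, not the real API).
inline void pipe_barrier_stub() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Writes n outputs, optionally fences, then publishes completion.
// Returns true when the barrier path was taken.
inline bool complete_task(DumpHandshake& hs, float* out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = static_cast<float>(i) * 2.0f;  // stands in for kernel output
    }
    bool fenced = false;
    if ((hs.enable_profiling_flag & kDumpTensorBit) != 0u) {
        // Make the output writes visible before the dumper is told to read them.
        pipe_barrier_stub();
        fenced = true;
    }
    hs.task_done.store(1u, std::memory_order_release);
    return fenced;
}
```

Gating the barrier on the flag keeps the normal path free of the extra synchronization cost. The review feedback suggests the same barrier may also be needed when general profiling is enabled, which in this sketch would mean widening the bit test.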
ChaoWao approved these changes on Apr 15, 2026