Refactor: convert TensorMap to C++ member API and inline hot paths by ChaoWao · Pull Request #156 · hw-native-sys/simpler

ChaoWao · 2026-03-02T12:06:43Z

Summary

Convert TensorMap free functions to C++ member methods (init_default, destroy, reset, lookup, insert, remove_entry, sync_tensormap, valid_count)
Replace index-based entry access with pointer-based API (entry_idx → PTO2TensorMapEntry*) for simpler call sites
Inline hot-path methods (lookup, insert, hash, new_entry) into the header for better codegen
Convert pto2_alloc_packed_buffer from free function to PTO2OrchestratorState member method
Remove unused output_index/num_outputs fields from PTO2TaskDescriptor
Simplify TensorPool overlap detection with early-return and set ref_count=1 in init()

Testing

Simulation tests pass (./ci.sh -p a2a3sim — 10/10)
Hardware tests pass (if applicable)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-03-02T12:07:09Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the TensorMap and related orchestration components to enhance code clarity, improve performance, and streamline memory management. By converting free functions to member methods and adopting pointer-based access, the design becomes more object-oriented and efficient. Key hot paths are inlined for better compiler optimization, and unnecessary fields are removed from task descriptors, contributing to a leaner and faster runtime system.

Highlights

TensorMap Refactoring: Converted all pto2_tensormap_ free functions to C++ member methods of the PTO2TensorMap class, improving encapsulation and object-oriented design.
Pointer-based Entry Access: Replaced index-based access to PTO2TensorMapEntry objects with direct pointer-based access, simplifying call sites and potentially improving performance by avoiding array lookups.
Hot Path Inlining: Inlined critical hot-path methods such as lookup, insert, hash, and new_entry directly into the pto_tensormap.h header file to enable better compiler optimizations and potentially reduce function call overhead.
Orchestrator Buffer Allocation: Transformed the pto2_alloc_packed_buffer free function into a member method of PTO2OrchestratorState, aligning it with the orchestrator's state management.
Task Descriptor Simplification: Removed the unused output_index and num_outputs fields from the PTO2TaskDescriptor struct, reducing memory footprint and simplifying the task representation.
Tensor Overlap Detection: Simplified the TensorPool's overlap detection logic by adding an early-return for non-overlapping cases and ensuring ref_count=1 is set during TensorData initialization.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
- Updated calls to TensorMap functions to use new member method syntax.
- Replaced assert with always_assert for critical checks.
- Removed initialization of num_outputs field from PTO2TaskDescriptor.
- Converted pto2_alloc_packed_buffer call to a member method call on orch.
- Modified TensorMap lookup result handling to use PTO2TensorMapEntry* directly.
src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.h
- Moved pto2_alloc_packed_buffer function implementation into PTO2OrchestratorState as a member method.
- Removed the external declaration of pto2_alloc_packed_buffer.
src/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
- Removed output_index and num_outputs fields from the PTO2TaskDescriptor struct.
src/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp
- Converted all pto2_tensormap_ free functions to member methods of PTO2TensorMap.
- Updated PTO2TensorMap initialization and reset logic to use nullptr for pointers instead of -1 for indices.
- Modified internal logic to use PTO2TensorMapEntry* for linked list navigation and entry management.
- Removed the #include "tensor.h" header.
- Adjusted sync_tensormap to include a while loop for more aggressive cleanup based on free entry count.
src/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
- Changed include guard from #ifndef/#define to #pragma once.
- Modified PTO2TensorMapEntry to use PTO2TensorMapEntry* for linked list pointers and bucket_index instead of in_bucket.
- Added uint64_t addr to PTO2TensorMapEntry.
- Updated PTO2LookupResult::Entry to store PTO2TensorMapEntry* instead of an index.
- Converted all pto2_tensormap_ free function declarations to PTO2TensorMap member method declarations.
- Inlined new_entry, free_entry, lookup, insert, cleanup_retired, hash, entry_valid, remove_entry, and remove_from_task methods directly into the class definition.
src/runtime/tensormap_and_ringbuffer/runtime/tensor_pool.h
- Added ref_count = 1 initialization to all TensorData::init overloads.
- Simplified TensorData::is_overlap method by removing redundant is_same_memref check and optimizing the overlap loop with early returns.
- Removed data[index].ref_count = 1 from TensorPool::new_index as it is now handled during TensorData initialization.

Activity

The author, ChaoWao, has ensured that simulation tests pass (10/10) for the changes, as indicated in the PR description.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request significantly refactors TensorMap by converting it to a C++ class with member methods, changing its internal data structures to be pointer-based, inlining hot-path methods, and removing unused fields from PTO2TaskDescriptor. However, a security audit identified critical and high-severity vulnerabilities, including buffer overflows and integer overflows. Specifically, pto2_submit_task lacks bounds checking and uses 32-bit integers for size calculations that can lead to out-of-bounds accesses, and TensorData::init is vulnerable to a buffer overflow if the number of dimensions exceeds 5. These require strict bounds checking and appropriate 64-bit types for memory sizes. Additionally, a critical issue was found that could lead to an infinite loop in sync_tensormap.

gemini-code-assist · 2026-03-02T12:13:50Z

+void PTO2TensorMap::sync_tensormap() {
+    constexpr int MIN_FREE_NUM = 1024;
+    always_assert(orch != nullptr);
+    while(true) {
+        // Read current last_task_alive from shared memory
+        int32_t new_last_task_alive = PTO2_LOAD_ACQUIRE(&orch->sm_handle->header->last_task_alive);
+        sync_validity(new_last_task_alive);
+        if ((pool_size - next_entry_idx + free_num < MIN_FREE_NUM) || new_last_task_alive - orch->tensormap_last_cleanup >= PTO2_TENSORMAP_CLEANUP_INTERVAL) {
+            cleanup_retired(orch->tensormap_last_cleanup, new_last_task_alive);
+            orch->tensormap_last_cleanup = new_last_task_alive;
+        } else {
+            break;
+        }
    }
 }


The while(true) loop introduced in sync_tensormap could lead to an infinite loop. If cleanup_retired is called but does not free enough entries to satisfy the MIN_FREE_NUM condition, and new_last_task_alive does not advance between loop iterations, this loop will continue indefinitely without making progress. This is because orch->tensormap_last_cleanup is updated to new_last_task_alive, so subsequent calls to cleanup_retired within the loop will have no effect.

This is particularly concerning because the new_entry function in pto_tensormap.h no longer waits for free entries but instead uses always_assert. This means the system will crash if it runs out of entries, which is a regression from the previous behavior of waiting.

Consider changing the while(true) to a single if statement to perform cleanup only once, which was the original behavior. If a loop is desired, a mechanism to prevent infinite spinning, such as a wait counter with an assertion, should be added.

gemini-code-assist · 2026-03-02T12:13:50Z

-        tm->entry_pool = NULL;
-        tm->buckets = NULL;
-        tm->free_entry_list = NULL;
+    task_entry_head = (PTO2TensorMapEntry**)malloc(new_pool_size * sizeof(PTO2TensorMapEntry*));


The allocation size for task_entry_head appears to be incorrect. It's being allocated with new_pool_size, but it's indexed by task_id & (PTO2_TASK_WINDOW_SIZE - 1). Therefore, its size should be PTO2_TASK_WINDOW_SIZE to match its usage. While new_pool_size (via PTO2_TENSORMAP_POOL_SIZE) and PTO2_TASK_WINDOW_SIZE currently have the same value, this is not guaranteed and makes the code fragile. Please use PTO2_TASK_WINDOW_SIZE for the allocation size to ensure correctness and clarity.

Suggested change

task_entry_head = (PTO2TensorMapEntry**)malloc(new_pool_size * sizeof(PTO2TensorMapEntry*));

task_entry_head = (PTO2TensorMapEntry**)malloc(PTO2_TASK_WINDOW_SIZE * sizeof(PTO2TensorMapEntry*));

gemini-code-assist · 2026-03-02T12:13:50Z

+    for (int32_t i = 0; i < pool_size; i++) {
+        task_entry_head[i] = nullptr;
    }


The loop to reset task_entry_head uses pool_size as its bound. However, task_entry_head is allocated with a size related to PTO2_TASK_WINDOW_SIZE. To avoid potential buffer overruns or incomplete resets, the loop bound should be PTO2_TASK_WINDOW_SIZE.

Suggested change

for (int32_t i = 0; i < pool_size; i++) {

task_entry_head[i] = nullptr;

}

for (int32_t i = 0; i < PTO2_TASK_WINDOW_SIZE; i++) {

task_entry_head[i] = nullptr;

}

…w-native-sys#156) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Refactor: convert TensorMap to C++ member API and inline hot paths

2b688aa

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist Bot reviewed Mar 2, 2026

View reviewed changes

poursoul approved these changes Mar 2, 2026

View reviewed changes

poursoul merged commit 1130f71 into main Mar 2, 2026
3 checks passed

ChaoWao deleted the refactor/tensormap-cpp-member-api branch March 3, 2026 03:34

ChaoWao mentioned this pull request Mar 3, 2026

Refactor: migrate scheduling to scheduler API, C++ member API, inline hot paths #155

Closed

2 tasks

PKUZHOU pushed a commit to PKUZHOU/simpler that referenced this pull request Mar 31, 2026

Refactor: convert TensorMap to C++ member API and inline hot paths (h…

e053e66

…w-native-sys#156) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: convert TensorMap to C++ member API and inline hot paths#156

Refactor: convert TensorMap to C++ member API and inline hot paths#156
poursoul merged 1 commit into
mainfrom
refactor/tensormap-cpp-member-api

ChaoWao commented Mar 2, 2026

Uh oh!

gemini-code-assist Bot commented Mar 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	task_entry_head = (PTO2TensorMapEntry*)malloc(new_pool_size sizeof(PTO2TensorMapEntry*));
	task_entry_head = (PTO2TensorMapEntry*)malloc(PTO2_TASK_WINDOW_SIZE sizeof(PTO2TensorMapEntry*));

Conversation

ChaoWao commented Mar 2, 2026

Summary

Testing

Uh oh!

gemini-code-assist Bot commented Mar 2, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants