Skip to content

Refactor: convert TensorMap to C++ member API and inline hot paths#156

Merged
poursoul merged 1 commit into
mainfrom
refactor/tensormap-cpp-member-api
Mar 2, 2026
Merged

Refactor: convert TensorMap to C++ member API and inline hot paths#156
poursoul merged 1 commit into
mainfrom
refactor/tensormap-cpp-member-api

Conversation

@ChaoWao
Copy link
Copy Markdown
Collaborator

@ChaoWao ChaoWao commented Mar 2, 2026

Summary

  • Convert TensorMap free functions to C++ member methods (init_default, destroy, reset, lookup, insert, remove_entry, sync_tensormap, valid_count)
  • Replace index-based entry access with pointer-based API (entry_idxPTO2TensorMapEntry*) for simpler call sites
  • Inline hot-path methods (lookup, insert, hash, new_entry) into the header for better codegen
  • Convert pto2_alloc_packed_buffer from free function to PTO2OrchestratorState member method
  • Remove unused output_index/num_outputs fields from PTO2TaskDescriptor
  • Simplify TensorPool overlap detection with early-return and set ref_count=1 in init()

Testing

  • Simulation tests pass (./ci.sh -p a2a3sim — 10/10)
  • Hardware tests pass (if applicable)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the TensorMap and related orchestration components to enhance code clarity, improve performance, and streamline memory management. By converting free functions to member methods and adopting pointer-based access, the design becomes more object-oriented and efficient. Key hot paths are inlined for better compiler optimization, and unnecessary fields are removed from task descriptors, contributing to a leaner and faster runtime system.

Highlights

  • TensorMap Refactoring: Converted all pto2_tensormap_ free functions to C++ member methods of the PTO2TensorMap class, improving encapsulation and object-oriented design.
  • Pointer-based Entry Access: Replaced index-based access to PTO2TensorMapEntry objects with direct pointer-based access, simplifying call sites and potentially improving performance by avoiding array lookups.
  • Hot Path Inlining: Inlined critical hot-path methods such as lookup, insert, hash, and new_entry directly into the pto_tensormap.h header file to enable better compiler optimizations and potentially reduce function call overhead.
  • Orchestrator Buffer Allocation: Transformed the pto2_alloc_packed_buffer free function into a member method of PTO2OrchestratorState, aligning it with the orchestrator's state management.
  • Task Descriptor Simplification: Removed the unused output_index and num_outputs fields from the PTO2TaskDescriptor struct, reducing memory footprint and simplifying the task representation.
  • Tensor Overlap Detection: Simplified the TensorPool's overlap detection logic by adding an early-return for non-overlapping cases and ensuring ref_count=1 is set during TensorData initialization.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
    • Updated calls to TensorMap functions to use new member method syntax.
    • Replaced assert with always_assert for critical checks.
    • Removed initialization of num_outputs field from PTO2TaskDescriptor.
    • Converted pto2_alloc_packed_buffer call to a member method call on orch.
    • Modified TensorMap lookup result handling to use PTO2TensorMapEntry* directly.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.h
    • Moved pto2_alloc_packed_buffer function implementation into PTO2OrchestratorState as a member method.
    • Removed the external declaration of pto2_alloc_packed_buffer.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
    • Removed output_index and num_outputs fields from the PTO2TaskDescriptor struct.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.cpp
    • Converted all pto2_tensormap_ free functions to member methods of PTO2TensorMap.
    • Updated PTO2TensorMap initialization and reset logic to use nullptr for pointers instead of -1 for indices.
    • Modified internal logic to use PTO2TensorMapEntry* for linked list navigation and entry management.
    • Removed the #include "tensor.h" header.
    • Adjusted sync_tensormap to include a while loop for more aggressive cleanup based on free entry count.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_tensormap.h
    • Changed include guard from #ifndef/#define to #pragma once.
    • Modified PTO2TensorMapEntry to use PTO2TensorMapEntry* for linked list pointers and bucket_index instead of in_bucket.
    • Added uint64_t addr to PTO2TensorMapEntry.
    • Updated PTO2LookupResult::Entry to store PTO2TensorMapEntry* instead of an index.
    • Converted all pto2_tensormap_ free function declarations to PTO2TensorMap member method declarations.
    • Inlined new_entry, free_entry, lookup, insert, cleanup_retired, hash, entry_valid, remove_entry, and remove_from_task methods directly into the class definition.
  • src/runtime/tensormap_and_ringbuffer/runtime/tensor_pool.h
    • Added ref_count = 1 initialization to all TensorData::init overloads.
    • Simplified TensorData::is_overlap method by removing redundant is_same_memref check and optimizing the overlap loop with early returns.
    • Removed data[index].ref_count = 1 from TensorPool::new_index as it is now handled during TensorData initialization.
Activity
  • The author, ChaoWao, has ensured that simulation tests pass (10/10) for the changes, as indicated in the PR description.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly refactors TensorMap by converting it to a C++ class with member methods, changing its internal data structures to be pointer-based, inlining hot-path methods, and removing unused fields from PTO2TaskDescriptor. However, a security audit identified critical and high-severity vulnerabilities, including buffer overflows and integer overflows. Specifically, pto2_submit_task lacks bounds checking and uses 32-bit integers for size calculations that can lead to out-of-bounds accesses, and TensorData::init is vulnerable to a buffer overflow if the number of dimensions exceeds 5. These require strict bounds checking and appropriate 64-bit types for memory sizes. Additionally, a critical issue was found that could lead to an infinite loop in sync_tensormap.

Comment on lines +236 to 250
void PTO2TensorMap::sync_tensormap() {
constexpr int MIN_FREE_NUM = 1024;
always_assert(orch != nullptr);
while(true) {
// Read current last_task_alive from shared memory
int32_t new_last_task_alive = PTO2_LOAD_ACQUIRE(&orch->sm_handle->header->last_task_alive);
sync_validity(new_last_task_alive);
if ((pool_size - next_entry_idx + free_num < MIN_FREE_NUM) || new_last_task_alive - orch->tensormap_last_cleanup >= PTO2_TENSORMAP_CLEANUP_INTERVAL) {
cleanup_retired(orch->tensormap_last_cleanup, new_last_task_alive);
orch->tensormap_last_cleanup = new_last_task_alive;
} else {
break;
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The while(true) loop introduced in sync_tensormap could lead to an infinite loop. If cleanup_retired is called but does not free enough entries to satisfy the MIN_FREE_NUM condition, and new_last_task_alive does not advance between loop iterations, this loop will continue indefinitely without making progress. This is because orch->tensormap_last_cleanup is updated to new_last_task_alive, so subsequent calls to cleanup_retired within the loop will have no effect.

This is particularly concerning because the new_entry function in pto_tensormap.h no longer waits for free entries but instead uses always_assert. This means the system will crash if it runs out of entries, which is a regression from the previous behavior of waiting.

Consider changing the while(true) to a single if statement to perform cleanup only once, which was the original behavior. If a loop is desired, a mechanism to prevent infinite spinning, such as a wait counter with an assertion, should be added.

tm->entry_pool = NULL;
tm->buckets = NULL;
tm->free_entry_list = NULL;
task_entry_head = (PTO2TensorMapEntry**)malloc(new_pool_size * sizeof(PTO2TensorMapEntry*));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The allocation size for task_entry_head appears to be incorrect. It's being allocated with new_pool_size, but it's indexed by task_id & (PTO2_TASK_WINDOW_SIZE - 1). Therefore, its size should be PTO2_TASK_WINDOW_SIZE to match its usage. While new_pool_size (via PTO2_TENSORMAP_POOL_SIZE) and PTO2_TASK_WINDOW_SIZE currently have the same value, this is not guaranteed and makes the code fragile. Please use PTO2_TASK_WINDOW_SIZE for the allocation size to ensure correctness and clarity.

Suggested change
task_entry_head = (PTO2TensorMapEntry**)malloc(new_pool_size * sizeof(PTO2TensorMapEntry*));
task_entry_head = (PTO2TensorMapEntry**)malloc(PTO2_TASK_WINDOW_SIZE * sizeof(PTO2TensorMapEntry*));

Comment on lines +157 to 159
for (int32_t i = 0; i < pool_size; i++) {
task_entry_head[i] = nullptr;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The loop to reset task_entry_head uses pool_size as its bound. However, task_entry_head is allocated with a size related to PTO2_TASK_WINDOW_SIZE. To avoid potential buffer overruns or incomplete resets, the loop bound should be PTO2_TASK_WINDOW_SIZE.

Suggested change
for (int32_t i = 0; i < pool_size; i++) {
task_entry_head[i] = nullptr;
}
for (int32_t i = 0; i < PTO2_TASK_WINDOW_SIZE; i++) {
task_entry_head[i] = nullptr;
}

@poursoul poursoul merged commit 1130f71 into main Mar 2, 2026
3 checks passed
@ChaoWao ChaoWao deleted the refactor/tensormap-cpp-member-api branch March 3, 2026 03:34
PKUZHOU pushed a commit to PKUZHOU/simpler that referenced this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants