
Refactor: inline ring buffer hot paths and expand scheduler API#158

Merged
ChaoWao merged 1 commit into main from refactor/ringbuffer-inline-scheduler-api
Mar 2, 2026

Conversation

@ChaoWao
Collaborator

@ChaoWao ChaoWao commented Mar 2, 2026

Summary

  • Inline ring buffer allocation hot paths (pto2_heap_ring_alloc, pto2_task_ring_alloc) from .cpp into .h as member methods
  • Add spinlock-based ready queue to scheduler (pto2_ready_queue_push/get_ready_task/on_task_complete)
  • Convert pto2_task_slot and pto2_scheduler_init_task to PTO2SchedulerState member methods
  • Add release_fanin_and_check_ready for atomic fanin arrive pattern
  • Update orchestrator call sites for new member APIs (3 call sites)
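
The "atomic fanin arrive" idea in the summary can be illustrated roughly as follows. This is a minimal sketch, not the code from this PR: the `FaninCounter` type, its `arm` and `arrive` methods, and the "+1 submitter reference" convention are all assumptions made for illustration, and `std::atomic` is used in place of the `__atomic_*` builtins the changelog mentions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-task dependency counter; names are illustrative only.
struct FaninCounter {
    std::atomic<int32_t> remaining{0};  // arrivals still outstanding

    // Arm the counter before any producer can arrive. The extra +1 is an
    // assumed "submitter" reference, so a task with zero producers still
    // becomes ready through exactly one arrival.
    void arm(int32_t producer_count) {
        remaining.store(producer_count + 1, std::memory_order_relaxed);
    }

    // One arrival. Returns true for exactly one caller: the one that takes
    // the counter from 1 to 0. acq_rel publishes this arriver's writes and
    // lets the final arriver observe the writes of all earlier arrivers.
    bool arrive() {
        return remaining.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

In this sketch each completing producer calls `arrive()` once and the submitter calls it once after wiring up dependencies; whichever call returns `true` is the one responsible for pushing the task onto the ready queue.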

Testing

  • Simulation tests pass (./ci.sh -p a2a3sim — 10/10)
  • Hardware tests pass (if applicable)

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors core runtime components, primarily focusing on performance and API consistency. By inlining critical ring buffer allocation functions and integrating scheduler task management directly into the PTO2SchedulerState struct, the changes aim to reduce overhead and streamline the interaction between the orchestrator and scheduler. The introduction of spinlocks in the ready queue also enhances thread safety for concurrent task processing.

Highlights

  • Ring Buffer Hot Path Inlining: Inlined pto2_heap_ring_alloc and pto2_task_ring_alloc into their respective PTO2HeapRing and PTO2TaskRing structs as member methods, moving their definitions from .cpp to .h for potential hot path optimization.
  • Scheduler Ready Queue Synchronization: Introduced spinlock-based synchronization for the scheduler's ready queue (PTO2ReadyQueue) to ensure thread-safe push and pop operations.
  • Scheduler API Expansion and Refactoring: Refactored scheduler task management by converting pto2_task_slot and pto2_scheduler_init_task into member methods of PTO2SchedulerState, and added a new release_fanin_and_check_ready member method to handle atomic fanin dependency resolution and task readiness checks.
  • Orchestrator Call Site Updates: Updated orchestrator call sites across pto_orchestrator.cpp and pto_orchestrator.h to utilize the new member method APIs for ring buffer allocations and scheduler interactions.
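
For readers unfamiliar with the pattern, a spinlock-guarded ready queue might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's `PTO2ReadyQueue`: the `ReadyQueueSketch` type, its fixed capacity, and the use of `std::atomic<int32_t>` (instead of a plain `int32_t` driven by `__atomic_*` builtins) are all hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical fixed-capacity ready queue guarded by a test-and-set spinlock.
struct ReadyQueueSketch {
    static constexpr int32_t kCapacity = 8;  // illustrative size
    int32_t tasks[kCapacity] = {};
    int32_t head = 0;                  // next slot to pop
    int32_t count = 0;                 // number of queued task ids
    std::atomic<int32_t> spinlock{0};  // 0 = free, 1 = held

    void lock() {
        int32_t expected = 0;
        // Acquire on success so the critical section sees earlier writes.
        while (!spinlock.compare_exchange_weak(expected, 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed)) {
            expected = 0;  // a failed exchange overwrote it with the observed value
        }
    }

    void unlock() { spinlock.store(0, std::memory_order_release); }

    // Returns false when the queue is full.
    bool push(int32_t task_id) {
        lock();
        const bool ok = count < kCapacity;
        if (ok) {
            tasks[(head + count) % kCapacity] = task_id;
            ++count;
        }
        unlock();
        return ok;
    }

    // Returns -1 when the queue is empty.
    int32_t pop() {
        lock();
        int32_t id = -1;
        if (count > 0) {
            id = tasks[head];
            head = (head + 1) % kCapacity;
            --count;
        }
        unlock();
        return id;
    }
};
```

A spinlock suits this case only if the critical sections stay as short as they are here; longer sections would usually call for a blocking lock or a lock-free queue.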


Changelog
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp
    • Updated calls to pto2_task_slot to use sched->pto2_task_slot().
    • Updated calls to pto2_heap_ring_alloc to use orch->heap_ring.pto2_heap_ring_alloc().
    • Updated calls to pto2_task_ring_alloc to use orch->task_ring.pto2_task_ring_alloc().
    • Updated calls to pto2_scheduler_init_task to use orch->scheduler->init_task().
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.h
    • Updated calls to pto2_heap_ring_alloc to use heap_ring.pto2_heap_ring_alloc().
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.cpp
    • Removed external definitions for pto2_heap_ring_alloc, pto2_heap_ring_try_alloc, pto2_heap_ring_available.
    • Removed external definitions for pto2_task_ring_alloc, pto2_task_ring_try_alloc.
    • Removed PTO2_SPIN_VERBOSE_LOGGING macro definition.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h
    • Moved spin logging and limit macros (PTO2_SPIN_VERBOSE_LOGGING, PTO2_BLOCK_NOTIFY_INTERVAL, PTO2_HEAP_SPIN_LIMIT, PTO2_FLOW_CONTROL_SPIN_LIMIT) from .cpp to .h.
    • Inlined pto2_heap_ring_alloc, pto2_heap_ring_try_alloc, and pto2_heap_ring_available as member methods of PTO2HeapRing.
    • Inlined pto2_task_ring_alloc and pto2_task_ring_try_alloc as member methods of PTO2TaskRing.
    • Removed external function declarations for the inlined ring buffer functions.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.cpp
    • Initialized spinlock member in pto2_ready_queue_init.
    • Implemented spinlock protection for pto2_ready_queue_push and pto2_ready_queue_pop.
    • Updated memset calls in pto2_scheduler_reset to use sched->task_window_size.
    • Removed external pto2_scheduler_init_task and pto2_scheduler_check_ready functions.
    • Updated calls to pto2_task_slot to use sched->pto2_task_slot().
    • Adjusted atomic memory orders for __atomic_compare_exchange_n and __atomic_fetch_add operations for improved concurrency.
    • Refactored pto2_scheduler_on_task_complete to acquire and release the fanout lock earlier and utilize the new release_fanin_and_check_ready method.
  • src/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.h
    • Added int32_t spinlock member to PTO2ReadyQueue.
    • Moved pto2_ready_queue_push declaration to appear before PTO2SchedulerState definition.
    • Changed PTO2SchedulerState from typedef struct to struct to allow member methods.
    • Inlined pto2_task_slot, release_fanin_and_check_ready, and init_task as member methods of PTO2SchedulerState.
    • Removed external function declarations for pto2_scheduler_init_task and pto2_scheduler_check_ready.
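
To make the "free function in .cpp becomes an inline member method in .h" change concrete, here is a small sketch of what a header-defined task-ring allocator could look like. The `TaskRingSketch` type, its fields, and the power-of-two window assumption are hypothetical and not taken from `pto_ring_buffer.h`; the sketch is also single-threaded, whereas the real ring presumably coordinates with the scheduler.

```cpp
#include <cstdint>

// Hypothetical task ring: ids grow monotonically and slots are reused
// modulo the window size (assumed to be a power of two).
struct TaskRingSketch {
    int32_t window_size;
    int32_t next_task_id = 0;  // next id to hand out
    int32_t last_alive = 0;    // oldest id whose slot is still in use

    explicit TaskRingSketch(int32_t size) : window_size(size) {}

    // Defined inside the class body, so implicitly inline: every translation
    // unit that includes the header sees the body and can inline the call.
    int32_t try_alloc() {
        if (next_task_id - last_alive >= window_size) {
            return -1;  // window full, caller must wait or retry
        }
        return next_task_id++;
    }

    // Map a task id to its slot index within the window.
    int32_t slot_of(int32_t task_id) const {
        return task_id & (window_size - 1);
    }

    // Retire the oldest live task, freeing one slot.
    void retire_oldest() {
        if (last_alive < next_task_id) {
            ++last_alive;
        }
    }
};
```

The trade-off is the usual one for header-defined hot paths: call sites can be inlined without link-time optimization, at the cost of recompiling every includer when the body changes.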
Activity
  • No human activity has been recorded on this pull request since its creation.

- Move pto2_heap_ring_alloc and pto2_task_ring_alloc from .cpp to .h
  as inline member methods for better codegen on hot allocation paths
- Add spinlock-based ready queue, release_fanin_and_check_ready, and
  init_task as PTO2SchedulerState member methods
- Convert pto2_task_slot to scheduler member method
- Update orchestrator call sites for new member APIs
@ChaoWao ChaoWao force-pushed the refactor/ringbuffer-inline-scheduler-api branch from e1b313c to 0135388 Compare March 2, 2026 12:25

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request effectively refactors hot-path functions for inlining and enhances the scheduler's concurrency model with spinlocks and optimized atomic operations, aiming to improve performance and thread safety. However, it introduces several critical security vulnerabilities, including a buffer overflow in task submission due to missing parameter count validation, an integer overflow in output size calculation that can lead to heap corruption, and a race condition in the scheduler's ring pointer management. Additionally, a logic error in the dependency resolution mechanism will cause tasks with no dependencies to hang, resulting in a denial of service. These issues should be addressed before merging.
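
The code behind the integer-overflow finding is not shown on this page, so the following is only a generic sketch of one common guard for accumulating aligned buffer sizes. The helper name `add_aligned_size` and its signature are hypothetical, and `align` is assumed to be a nonzero power of two.

```cpp
#include <cstddef>
#include <limits>

// Adds `bytes`, rounded up to `align`, onto `total`.
// Returns false and leaves `total` untouched if either step would wrap.
inline bool add_aligned_size(std::size_t& total, std::size_t bytes, std::size_t align) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    if (bytes > max - (align - 1)) {
        return false;  // rounding up would wrap around
    }
    const std::size_t rounded = (bytes + align - 1) & ~(align - 1);
    if (rounded > max - total) {
        return false;  // the running sum would wrap around
    }
    total += rounded;
    return true;
}
```

A caller would reject the submission when the helper returns `false` instead of passing a wrapped total to the heap allocator.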

Comment thread src/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.h
Comment thread src/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.cpp
Comment thread src/runtime/tensormap_and_ringbuffer/runtime/pto_scheduler.h
@ChaoWao ChaoWao merged commit dcb7a12 into main Mar 2, 2026
3 checks passed
@ChaoWao ChaoWao deleted the refactor/ringbuffer-inline-scheduler-api branch March 2, 2026 12:48
PKUZHOU pushed a commit to PKUZHOU/simpler that referenced this pull request Mar 31, 2026
…ative-sys#158)
