webgpu: Add session-level buffer pool for graph capture reuse by qjia7 · Pull Request #28761 · microsoft/onnxruntime

qjia7 · 2026-06-03T09:32:05Z

Summary

- Introduces SessionBufferPool, which lets a session hold on to retired generator buffer caches (storage + uniform) and seed them into newly created generators.
- Adds provider option ep.webgpuexecutionprovider.sessionBufferPoolGenerations to bound how many generations of retired buffers are kept (default 1; set to 0 to disable).
- Wires the WebGPU EP to donate a retiring BufferManager's cache into the pool and absorb pooled buffers when a new BufferManager is created for the next generator.
- The pool is only created when graph capture is enabled AND the option is > 0, so non-graph-capture sessions are unaffected.
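
The gating rule in the last item can be sketched as follows. This is a minimal illustration, not the PR's code: `EpConfigSketch`, `PoolSketch`, and `MaybeCreatePool` are hypothetical names, and the real EP config and pool types are assumed to carry more state.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the session-level pool; only the bound is modeled.
struct PoolSketch {
  explicit PoolSketch(std::size_t generations) : max_generations(generations) {}
  std::size_t max_generations;
};

// Hypothetical stand-in for the EP configuration fields that matter here.
struct EpConfigSketch {
  bool enable_graph_capture = false;
  std::size_t session_buffer_pool_generations = 1;  // default of 1, per the summary
};

// Returns a pool only when graph capture is enabled AND the option is > 0.
// Otherwise returns nullptr, so sessions without graph capture are unaffected.
std::unique_ptr<PoolSketch> MaybeCreatePool(const EpConfigSketch& config) {
  if (!config.enable_graph_capture || config.session_buffer_pool_generations == 0) {
    return nullptr;
  }
  return std::make_unique<PoolSketch>(config.session_buffer_pool_generations);
}
```

With this shape, callers can treat a null pool as "pooling disabled" and skip the donate and seed steps entirely.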

Motivation

With graph capture enabled, each generator owns its own per-graph BufferManager. When the generator is destroyed (e.g., per-request in GenAI), the entire buffer cache is thrown away and the next generator must reallocate all storage and uniform buffers from scratch, increasing cold-start latency and GPU memory churn.

By keeping a small pool of recently-retired buffer slots at the session level, the next generator can reuse them and skip reallocation entirely after the first cycle.

Test plan

Build ORT (Windows, D3D12) with --use_webgpu — clean build.
lintrunner -a reports no lint issues.
Verified end-to-end with GenAI on phi4 + WebGPU graph capture using two scripts:
- verify_multi_gen.py: sequential and overlapping generators all produce matching, coherent output.
- verify_max_length_change.py: generators with varying max_length all coherent.
With diagnostic prints (since removed), confirmed that after the first generator donates buffers, subsequent generators report storage hits=171 misses=0, uniform hits=296 misses=0, i.e., the pool actually engages and eliminates reallocation.

Notes

Pairs with a GenAI-side change that invokes SessionReleaseCapturedGraph from State::~State() so the per-graph BufferManager is actually released and its buffers reach the pool.

When graph capture is enabled, each generator instance owns a per-graph BufferManager whose cache is discarded when the generator is destroyed. For workloads that repeatedly create and destroy generators on the same session (e.g., GenAI's per-request generators), this means every new generator has to reallocate all storage and uniform buffers from scratch, inflating cold-start cost and GPU memory churn.

This change introduces a SessionBufferPool owned by the session. When a retiring BufferManager is released, its cached storage and uniform buffers are donated to the pool; the next BufferManager seeded from the session absorbs those buffers, skipping reallocation entirely. The pool capacity is controlled by a new provider option "ep.webgpuexecutionprovider.sessionBufferPoolGenerations" (defaults to disabled). The pool evicts the oldest slot when full, keeping the freshest distribution of buffer shapes.

Verified with GenAI multi-generator scripts on phi4: subsequent generators report zero cache misses for both storage and uniform caches and produce coherent output across max_length changes and overlapping generator lifetimes.
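
The donate, seed, and evict-oldest behavior described in this commit message could look roughly like the sketch below (C++17). All names here are hypothetical: `BufferHandle` is a plain integer standing in for a GPU buffer, and handing out the newest slot in `Seed` is an assumption, since the text only states that the oldest slot is evicted when the pool is full.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical placeholder for a GPU buffer handle.
using BufferHandle = int;

// One "generation": the storage and uniform buffers of a retired BufferManager.
struct PoolSlot {
  std::vector<BufferHandle> storage;
  std::vector<BufferHandle> uniform;
};

class SessionBufferPoolSketch {
 public:
  explicit SessionBufferPoolSketch(std::size_t max_generations)
      : max_generations_(max_generations) {}

  // Called when a BufferManager retires: the pool takes over its cached buffers.
  void Donate(PoolSlot slot) {
    if (max_generations_ == 0) return;  // pooling disabled, the slot is dropped
    if (slots_.size() == max_generations_) {
      slots_.pop_front();  // evict the oldest generation to make room
    }
    slots_.push_back(std::move(slot));
  }

  // Called when a new BufferManager is created: hands over the newest slot, if any.
  // (Assumption: newest-first; the source does not say which slot is chosen.)
  std::optional<PoolSlot> Seed() {
    if (slots_.empty()) return std::nullopt;
    PoolSlot slot = std::move(slots_.back());
    slots_.pop_back();
    return slot;
  }

  std::size_t size() const { return slots_.size(); }

 private:
  std::size_t max_generations_;
  std::deque<PoolSlot> slots_;  // front is oldest, back is newest
};
```

A real pool would also have to release the GPU buffers of an evicted or dropped slot; with integer handles there is nothing to release, so the sketch omits that step.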

Copilot

Pull request overview

This PR adds a per-session WebGPU buffer pooling mechanism to improve graph-capture reuse across generator lifetimes by retaining a small number of recently retired per-graph buffer caches and seeding them into newly created per-graph BufferManager instances.

Changes:

Introduces webgpu::SessionBufferPool to retain and recycle storage/uniform buffer caches across captured-graph lifetimes.
Adds a new provider option ep.webgpuexecutionprovider.sessionBufferPoolGenerations and parses/logs it in the WebGPU EP factory/config.
Wires pooling into graph-capture lifecycle: seed buffers on per-graph BufferManager creation and donate buffers on ReleaseCapturedGraph.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
onnxruntime/core/providers/webgpu/webgpu_provider_options.h	Adds config key for session buffer pool generations.
onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc	Parses `sessionBufferPoolGenerations` and logs configured value.
onnxruntime/core/providers/webgpu/webgpu_execution_provider.h	Adds config field and EP member for the session-level buffer pool.
onnxruntime/core/providers/webgpu/webgpu_execution_provider.cc	Creates/clears pool and integrates donate/seed with graph capture lifecycle.
onnxruntime/core/providers/webgpu/session_buffer_pool.h	New pool type definition and slot structure for storage/uniform buffers.
onnxruntime/core/providers/webgpu/session_buffer_pool.cc	New implementation for donate/seed/clear and buffer releasing.
onnxruntime/core/providers/webgpu/buffer_manager.h	Exposes cache managers and adds extract/absorb APIs to cache interface.
onnxruntime/core/providers/webgpu/buffer_manager.cc	Implements extract/absorb for graph-mode cache managers to enable pooling.


- webgpu_provider_factory.cc: require std::from_chars to consume the full string for sessionBufferPoolGenerations so values like "1foo" are rejected instead of silently parsed as 1.
- buffer_manager.h: drop const from StorageCache()/UniformCache() so the mutable cache references can no longer be obtained through a const BufferManager&.
- session_buffer_pool.cc: drop slots_.reserve(max_generations_) to avoid a large up-front allocation when the option is set to an extreme value; slots grow on demand instead.
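
The first fix relies on a standard idiom: `std::from_chars` stops at the first character it cannot parse and reports where it stopped, so a strict parser has to check both the error code and the end pointer. A minimal C++17 sketch of that check follows; `ParseGenerations` is a hypothetical helper name, not the function used in the PR.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses an unsigned count, rejecting partial matches.
std::optional<std::size_t> ParseGenerations(std::string_view text) {
  std::size_t value = 0;
  const char* first = text.data();
  const char* last = text.data() + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  // ec catches empty, non-numeric, and out-of-range input;
  // ptr != last catches trailing characters such as the "foo" in "1foo".
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```

Without the `ptr != last` check, the input "1foo" would parse successfully as 1, which is the behavior the commit removes.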

qjia7 · 2026-06-04T06:58:56Z

@hariharans29 @guschmue Please take a look, thanks.

qjia7 requested a review from Copilot June 3, 2026 09:41

Copilot started reviewing on behalf of qjia7 June 3, 2026 09:41

Copilot AI reviewed Jun 3, 2026


Comment thread onnxruntime/core/providers/webgpu/webgpu_provider_factory.cc Outdated

Comment thread onnxruntime/core/providers/webgpu/buffer_manager.h Outdated

Comment thread onnxruntime/core/providers/webgpu/session_buffer_pool.cc

guschmue added the ep:WebGPU ort-web webgpu provider label Jun 4, 2026

qjia7 wants to merge 2 commits into microsoft:main from qjia7:webgpu-session-buffer-pool


3 participants
