[WebGPU] Proposal: C++ optimization by reserving program inputs, output, and uniform variables #28516

@kareertl

Description

Describe the issue

Problem:
The ProgramBase class has three vector members, inputs_, outputs_, and variables_, and none of them has its capacity reserved before entries are added. Each vector may therefore reallocate repeatedly as it grows, which can hurt performance.

Proposal:
Add three methods to the ProgramBase class:

ProgramBase& ProgramBase::ReserveInputCapacity(size_t capacity) {
  inputs_.reserve(capacity);
  return *this;
}
ProgramBase& ProgramBase::ReserveOutputCapacity(size_t capacity) {
  outputs_.reserve(capacity);
  return *this;
}
ProgramBase& ProgramBase::ReserveUniformVariableCapacity(size_t capacity) {
  variables_.reserve(capacity);
  return *this;
}
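
To illustrate how the proposed methods would behave, here is a minimal, self-contained sketch. SketchProgram is a hypothetical stand-in for ProgramBase, and its AddInput takes a plain std::string rather than the real input descriptor; both are assumptions made so the example compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ProgramBase; only the input vector is modeled.
class SketchProgram {
 public:
  // Reserves storage up front so later AddInput calls do not reallocate
  // as long as the number of inputs stays within `capacity`.
  SketchProgram& ReserveInputCapacity(std::size_t capacity) {
    inputs_.reserve(capacity);
    return *this;
  }

  // Simplified AddInput: the real method takes a richer descriptor.
  SketchProgram& AddInput(std::string name) {
    inputs_.push_back(std::move(name));
    return *this;
  }

  const std::vector<std::string>& Inputs() const { return inputs_; }

 private:
  std::vector<std::string> inputs_;
};
```

Because each method returns *this, the reserve call slots into an existing fluent chain without changing the surrounding code, and the vector's storage stays at the same address while inputs are added within the reserved capacity.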

In addition, call these methods before adding program inputs, outputs, or uniform variables. For example, ComputeInternal() in conv.cc could do this:

    program.CacheHint(activation_.ToString(), std::to_string(is_channels_last))
        .ReserveInputCapacity(has_bias ? 3 : 2)
        .AddInput({input, ProgramTensorMetadataDependency::TypeAndRank, input_shape, 1})
        .AddInput({kernel, ProgramTensorMetadataDependency::TypeAndRank, kernel_shape, 1})
...
        .ReserveUniformVariableCapacity(6)
        .AddUniformVariables({{static_cast<uint32_t>(output_size)}, {dilations}, {strides}, {updated_pads}, {static_cast<uint32_t>(output_channels_per_group)}, {static_cast<uint32_t>(components)}})
...

To reproduce

Use a profiler to check whether reallocation occurs when program inputs, outputs, or uniform variables are added, and how much it affects performance. With LLVM's libc++, the reallocation path may appear as __emplace_back_slow_path.
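
If a profiler is not at hand, the effect can also be observed in isolation with a counting allocator. The sketch below is not ONNX Runtime code: CountingAllocator, g_allocation_count, and AllocationsFor are hypothetical names, and the vector holds plain ints rather than program inputs. It only shows that reserving first reduces the number of allocations a std::vector performs while it is filled.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Global allocation counter; a sketch only, not thread-safe.
static std::size_t g_allocation_count = 0;

// Minimal allocator that counts calls to allocate() and otherwise
// forwards to std::allocator.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    ++g_allocation_count;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
  }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return false;
}

// Returns how many allocations happen while `count` elements are appended,
// with or without reserving the capacity first.
std::size_t AllocationsFor(std::size_t count, bool reserve_first) {
  g_allocation_count = 0;
  std::vector<int, CountingAllocator<int>> values;
  if (reserve_first) {
    values.reserve(count);
  }
  for (std::size_t i = 0; i < count; ++i) {
    values.push_back(static_cast<int>(i));
  }
  return g_allocation_count;
}
```

Comparing AllocationsFor(6, false) with AllocationsFor(6, true) gives a rough picture of the allocations saved for six entries, the number of uniform variables in the conv.cc example. The exact count without reserve depends on the standard library's growth policy.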

Urgency

No response

Platform

Other / Unknown

OS Version

Custom

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.26.0

ONNX Runtime API

C++

Architecture

Other / Unknown

Execution Provider

Other / Unknown

Execution Provider Library Version

WebGPU

Model File

No response

Is this a quantized model?

Unknown

Metadata

Assignees

No one assigned

    Labels

    ep:WebGPU (ort-web webgpu provider), performance (issues related to performance regressions), platform:web (issues related to ONNX Runtime web; typically submitted using template)
