[WebGPU] Proposal: C++ optimization by reserving program inputs, outputs, and uniform variables #28516

### Describe the issue

**Problem:**
The three vector members of the `ProgramBase` class, `inputs_`, `outputs_`, and `variables_`, never have their capacity reserved before entries are added. Each vector may therefore reallocate several times as it grows, which can hurt performance.

**Proposal:**
Add 3 methods to the ProgramBase class:
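```
// Pre-size inputs_ so that adding up to `capacity` inputs does not reallocate.
ProgramBase& ProgramBase::ReserveInputCapacity(size_t capacity) {
  inputs_.reserve(capacity);
  return *this;
}

// Pre-size outputs_ so that adding up to `capacity` outputs does not reallocate.
ProgramBase& ProgramBase::ReserveOutputCapacity(size_t capacity) {
  outputs_.reserve(capacity);
  return *this;
}

// Pre-size variables_ so that adding up to `capacity` uniform variables does not reallocate.
ProgramBase& ProgramBase::ReserveUniformVariableCapacity(size_t capacity) {
  variables_.reserve(capacity);
  return *this;
}
```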


In addition, call these methods before adding program inputs, outputs, or uniform variables. For example, `ComputeInternal()` in conv.cc could do this:
```
    program.CacheHint(activation_.ToString(), std::to_string(is_channels_last))
        .ReserveInputCapacity(has_bias ? 3 : 2)
        .AddInput({input, ProgramTensorMetadataDependency::TypeAndRank, input_shape, 1})
        .AddInput({kernel, ProgramTensorMetadataDependency::TypeAndRank, kernel_shape, 1})
...
        .ReserveUniformVariableCapacity(6)
        .AddUniformVariables({{static_cast<uint32_t>(output_size)}, {dilations}, {strides}, {updated_pads}, {static_cast<uint32_t>(output_channels_per_group)}, {static_cast<uint32_t>(components)}})
...
```
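To illustrate the call pattern in isolation, here is a minimal sketch. `MiniProgram`, `GrowthCount()`, and `BuildWithReserve()` are hypothetical stand-ins written for this example, not part of ONNX Runtime, and inputs are plain strings rather than the real input descriptors. The class only records whether the vector's capacity changed while inputs were being added.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ProgramBase: just enough to show the chaining pattern.
class MiniProgram {
 public:
  // Mirrors the proposed ReserveInputCapacity(): reserve, then return *this for chaining.
  MiniProgram& ReserveInputCapacity(std::size_t capacity) {
    inputs_.reserve(capacity);
    return *this;
  }

  // Adds one input and records whether the vector had to grow to hold it.
  MiniProgram& AddInput(std::string name) {
    const std::size_t before = inputs_.capacity();
    inputs_.emplace_back(std::move(name));
    if (inputs_.capacity() != before) ++growth_count_;
    return *this;
  }

  std::size_t InputCount() const { return inputs_.size(); }
  std::size_t GrowthCount() const { return growth_count_; }

 private:
  std::vector<std::string> inputs_;
  std::size_t growth_count_ = 0;
};

// Builds a program with 2 or 3 inputs, reserving first, and returns how often
// the input vector grew while inputs were being added.
inline std::size_t BuildWithReserve(bool has_bias) {
  MiniProgram program;
  program.ReserveInputCapacity(has_bias ? 3 : 2)
      .AddInput("input")
      .AddInput("kernel");
  if (has_bias) program.AddInput("bias");
  assert(program.InputCount() == (has_bias ? 3u : 2u));
  return program.GrowthCount();
}
```

Because `std::vector` does not reallocate until its size would exceed its capacity, `BuildWithReserve()` should report zero growth events for both the two-input and three-input cases.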


### To reproduce

Use a profiler to check whether reallocation occurs when adding program inputs, outputs, or uniform variables, and to measure its impact on performance. With LLVM's libc++, one might see `__emplace_back_slow_path` in the profile.
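Outside a profiler, the effect can also be observed with a counting allocator. The following is a rough sketch, not code from ONNX Runtime: `CountingAllocator`, `g_allocations`, and `CountAllocations()` are hypothetical names, and a plain `std::vector<int>` stands in for the `ProgramBase` members. Exact counts depend on the standard library's growth policy and build mode.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Incremented every time the vector asks the allocator for memory.
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts allocations.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    ++g_allocations;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
  }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return false;
}

// Appends `entries` elements, optionally reserving first, and returns the
// number of allocations the vector performed.
inline std::size_t CountAllocations(std::size_t entries, bool reserve_first) {
  g_allocations = 0;
  std::vector<int, CountingAllocator<int>> v;
  if (reserve_first) v.reserve(entries);
  for (std::size_t i = 0; i < entries; ++i) {
    v.emplace_back(static_cast<int>(i));
  }
  assert(v.size() == entries);
  return g_allocations;
}
```

Comparing `CountAllocations(6, true)` with `CountAllocations(6, false)` gives a rough idea of how many allocations a single `reserve` call saves for six entries, the number of uniform variables in the conv.cc example.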

### Urgency

_No response_

### Platform

Other / Unknown

### OS Version

Custom

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.26.0

### ONNX Runtime API

C++

### Architecture

Other / Unknown

### Execution Provider

Other / Unknown

### Execution Provider Library Version

WebGPU

### Model File

_No response_

### Is this a quantized model?

Unknown
