Describe the issue
Problem:
No capacity is reserved for the three vector members of the ProgramBase class (inputs_, outputs_, and variables_) before entries are appended to them. As a result, each vector can reallocate several times while entries are added, which hurts performance.
Proposal:
Add three methods to the ProgramBase class:
ProgramBase& ProgramBase::ReserveInputCapacity(size_t capacity) {
  inputs_.reserve(capacity);
  return *this;
}

ProgramBase& ProgramBase::ReserveOutputCapacity(size_t capacity) {
  outputs_.reserve(capacity);
  return *this;
}

ProgramBase& ProgramBase::ReserveUniformVariableCapacity(size_t capacity) {
  variables_.reserve(capacity);
  return *this;
}
In addition, call these methods before adding program inputs, outputs, or uniform variables. For example, ComputeInternal() in conv.cc could do this:
program.CacheHint(activation_.ToString(), std::to_string(is_channels_last))
    .ReserveInputCapacity(has_bias ? 3 : 2)
    .AddInput({input, ProgramTensorMetadataDependency::TypeAndRank, input_shape, 1})
    .AddInput({kernel, ProgramTensorMetadataDependency::TypeAndRank, kernel_shape, 1})
    ...
    .ReserveUniformVariableCapacity(6)
    .AddUniformVariables({{static_cast<uint32_t>(output_size)}, {dilations}, {strides}, {updated_pads}, {static_cast<uint32_t>(output_channels_per_group)}, {static_cast<uint32_t>(components)}})
    ...
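To illustrate the intent outside of ONNX Runtime, here is a minimal standalone sketch of the same fluent pattern. MockProgram, InputStorage(), InputCount(), and InputStorageStaysPut() are hypothetical names made up for this example, and inputs are plain strings instead of real program inputs. The sketch only shows that, once capacity is reserved, the vector's storage does not move while the inputs are added.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ProgramBase: only the inputs_ vector is modeled,
// and inputs are plain strings rather than real tensor descriptors.
class MockProgram {
 public:
  MockProgram& ReserveInputCapacity(std::size_t capacity) {
    inputs_.reserve(capacity);
    return *this;
  }

  MockProgram& AddInput(std::string name) {
    inputs_.emplace_back(std::move(name));
    return *this;
  }

  // Exposes the address of the underlying storage so a move can be detected.
  const std::string* InputStorage() const { return inputs_.data(); }
  std::size_t InputCount() const { return inputs_.size(); }

 private:
  std::vector<std::string> inputs_;
};

// Returns true if the input storage was not reallocated while adding inputs.
bool InputStorageStaysPut(bool has_bias) {
  MockProgram program;
  program.ReserveInputCapacity(has_bias ? 3 : 2);
  const std::string* before = program.InputStorage();

  program.AddInput("input").AddInput("kernel");
  if (has_bias) program.AddInput("bias");

  assert(program.InputCount() == (has_bias ? 3u : 2u));
  return program.InputStorage() == before;
}
```

Calling InputStorageStaysPut(true) or InputStorageStaysPut(false) should return true, because std::vector does not reallocate while its size stays within the reserved capacity.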
To reproduce
Use a profiler to check whether reallocation occurs when adding program inputs, outputs, or uniform variables, and to measure its impact on performance. With LLVM's libc++, reallocation may show up as __emplace_back_slow_path in the profile.
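If a profiler is not at hand, the effect can also be observed in isolation with a counting allocator. The following is a rough standalone sketch and not ONNX Runtime code: CountingAllocator, g_allocations, and CountAllocations() are hypothetical names, and a plain std::vector<int> stands in for the ProgramBase vectors. The exact counts depend on the standard library's growth policy, so only the relative difference is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts calls to allocate(). A file-local global keeps the sketch short;
// it is not thread-safe.
static std::size_t g_allocations = 0;

// Hypothetical minimal allocator that forwards to std::allocator and counts
// every allocation request made by the container.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    ++g_allocations;
    return std::allocator<T>{}.allocate(n);
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
  }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
  return false;
}

// Appends `entries` values to a fresh vector, optionally reserving first,
// and returns how many allocations the vector requested.
std::size_t CountAllocations(std::size_t entries, bool reserve_first) {
  g_allocations = 0;
  std::vector<int, CountingAllocator<int>> values;
  if (reserve_first) values.reserve(entries);
  for (std::size_t i = 0; i < entries; ++i) {
    values.emplace_back(static_cast<int>(i));
  }
  assert(values.size() == entries);
  return g_allocations;
}
```

Comparing CountAllocations(6, true) with CountAllocations(6, false) should show fewer allocations in the reserved case, which mirrors the six uniform variables in the conv.cc example.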
Urgency
No response
Platform
Other / Unknown
OS Version
Custom
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.26.0
ONNX Runtime API
C++
Architecture
Other / Unknown
Execution Provider
Other / Unknown
Execution Provider Library Version
WebGPU
Model File
No response
Is this a quantized model?
Unknown