-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Implement BURST and STEP load patterns #217
Copy link
Copy link
Open
Labels
area: core-engineLoad generator, scheduler, async utilsLoad generator, scheduler, async utilspriority: P2Medium — address within quarterMedium — address within quartertype: featureNew feature or capabilityNew feature or capability
Description
Problem
LoadPatternType.BURST and LoadPatternType.STEP are defined in the config schema enum but have no scheduler implementation:
# src/inference_endpoint/config/schema.py:72-73
BURST = "burst"
STEP = "step"A user who selects either pattern in their YAML config will get no error, but also no expected behavior — the scheduler will silently fall through or raise an unhandled case.
Expected Behavior
BURST: Issue a configurable burst of N queries at a fixed interval (e.g. N queries every T seconds)STEP: Incrementally step up QPS in stages (useful for finding saturation point)
Files to Modify
src/inference_endpoint/config/schema.py— document the pattern parameterssrc/inference_endpoint/load_generator/scheduler.py— add scheduler implementationssrc/inference_endpoint/config/runtime_settings.py— ensure RuntimeSettings handles these patterns
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
area: core-engineLoad generator, scheduler, async utilsLoad generator, scheduler, async utilspriority: P2Medium — address within quarterMedium — address within quartertype: featureNew feature or capabilityNew feature or capability