feat: add maxUtilizationPercent option to quota checker #218
mcowger merged 5 commits into mcowger:main from
Allows reserving quota on shared provider keys by lowering the exhaustion threshold below the default 99%. When any quota window reaches the configured percentage, the provider is placed on cooldown until the window resets.

This enables tiered routing setups where a friend's API key should only be used as emergency fallback, e.g. maxUtilizationPercent: 30 means the provider is cooled down at 30% usage, preserving 70% for the key owner.

Changes:
- Add maxUtilizationPercent (0-100, default 99) to SyntheticQuotaCheckerOptionsSchema
- Replace hardcoded EXHAUSTION_THRESHOLD in quota-scheduler.ts with per-checker configurable threshold from options.maxUtilizationPercent
- Include threshold in cooldown log messages for observability
- Add 4 test cases covering default and custom threshold behavior
- Document the option in CONFIGURATION.md with example config
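The threshold check described above can be sketched roughly as follows. This is an illustration only, not the code in quota-scheduler.ts: the `QuotaWindow` shape, its field names, and both function names are assumptions made for the example, and the `>=` comparison is inferred from the phrase "reaches the configured percentage".

```typescript
// Hypothetical shape of one quota window; field names are assumptions.
interface QuotaWindow {
  name: string;
  utilizationPercent: number; // 0-100
}

// Default taken from the PR description (99%).
const DEFAULT_MAX_UTILIZATION_PERCENT = 99;

// Returns the windows whose utilization has reached the threshold.
function exhaustedWindows(
  windows: QuotaWindow[],
  maxUtilizationPercent: number = DEFAULT_MAX_UTILIZATION_PERCENT,
): QuotaWindow[] {
  return windows.filter((w) => w.utilizationPercent >= maxUtilizationPercent);
}

// A single exhausted window is enough to put the provider on cooldown.
function shouldCooldown(
  windows: QuotaWindow[],
  maxUtilizationPercent: number = DEFAULT_MAX_UTILIZATION_PERCENT,
): boolean {
  return exhaustedWindows(windows, maxUtilizationPercent).length > 0;
}
```

With a threshold of 30, a provider whose windows sit at 35% and 10% would be cooled down because of the first window alone; with the default of 99 it would stay available.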
Based on subagent code review, fixes:
1. CRITICAL: Multi-checker cooldown clearing bug — a lenient checker
(threshold=99) could clear a cooldown set by a strict checker
(threshold=30). Fixed by only allowing the strictest checker for
a provider to clear the provider-wide cooldown. Others are blocked.
2. HIGH: min(0) → min(1) in Zod schema. maxUtilizationPercent=0 was
a footgun (always triggers cooldown). Use enabled:false instead.
3. HIGH: Refactor exhaustionThreshold from unsafe options cast to a
proper getter on QuotaChecker base class. SyntheticQuotaChecker
overrides it via getOption('maxUtilizationPercent', 99). Scheduler
reads checker.exhaustionThreshold — type-safe, no casts.
4. MEDIUM: Update stale '≥99%' comment in cooldown-manager.ts.
5. MEDIUM: Add 3 new test cases:
- Cooldown clearing when utilization drops below threshold
- Multiple windows where only one exceeds threshold
- Lenient checker cannot clear strict checker's cooldown
Co-authored-by: Gwyn <gwyn@gwyn.tel>
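The multi-checker rule in fix 1 could look something like the sketch below. It is a guess at the shape of the logic, not the scheduler's actual code: `CheckerInfo`, `strictestThreshold`, and `mayClearCooldown` are hypothetical names, and letting checkers that tie for the lowest threshold clear the cooldown is an assumption.

```typescript
// Hypothetical view of a checker as seen by the scheduler.
interface CheckerInfo {
  id: string;
  providerId: string;
  exhaustionThreshold: number;
}

// Lowest (strictest) threshold among the checkers attached to a provider.
function strictestThreshold(
  checkers: CheckerInfo[],
  providerId: string,
): number | undefined {
  let strictest: number | undefined = undefined;
  for (const c of checkers) {
    if (c.providerId !== providerId) continue;
    if (strictest === undefined || c.exhaustionThreshold < strictest) {
      strictest = c.exhaustionThreshold;
    }
  }
  return strictest;
}

// Only a checker holding the strictest threshold may clear the
// provider-wide cooldown; more lenient checkers are blocked.
function mayClearCooldown(checker: CheckerInfo, all: CheckerInfo[]): boolean {
  const strictest = strictestThreshold(all, checker.providerId);
  return strictest !== undefined && checker.exhaustionThreshold <= strictest;
}
```

Under this rule a checker with threshold 99 that sees 40% utilization cannot undo a cooldown set by a sibling checker with threshold 30 on the same provider.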
Thanks for this contribution! This is a useful feature for shared API keys.

A note: there is no UI to configure this option in the frontend. Also, YAML configuration is now deprecated in favor of database-driven config (via the management API). Could you add the UI element to

Example:

Note: you'll need to update the handleChange function to accept number types.
Adds a number input field for maxUtilizationPercent (1–100, default 99) to the Synthetic quota checker configuration component. This enables database-driven config (management API) users to set the exhaustion threshold from the Plexus dashboard without editing YAML. Also updates handleChange to accept string | number types.
Based on Gemini CLI review:
1. CRITICAL: Empty input field bug. Clearing maxUtilizationPercent set the value to '' (empty string), which JS coerces to 0 in numeric comparisons, causing instant cooldown at any utilization. Fixed by emitting undefined instead of '' when the field is cleared.
2. Move exhaustionThreshold getter to QuotaChecker base class, so all checker types now benefit from maxUtilizationPercent support without individual overrides. Uses a typeof check with a val > 0 guard to handle '' and null/undefined gracefully.
3. Remove SyntheticQuotaChecker override, since the base class handles it.
4. Update docs: maxUtilizationPercent is now documented as available on all checker types (not just synthetic), since the getter lives on the base class.
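A minimal sketch of the two guards described in this commit, assuming the options arrive as an untyped record. The function names `resolveExhaustionThreshold` and `parseThresholdInput` are hypothetical and do not come from the PR; only the `typeof` check, the `val > 0` guard, the default of 99, and the "emit undefined when cleared" behavior are taken from the text above.

```typescript
const FALLBACK_EXHAUSTION_THRESHOLD = 99;

// Getter-style guard: anything that is not a positive number
// ('' from a cleared field, null, undefined, NaN) falls back to 99.
function resolveExhaustionThreshold(options: Record<string, unknown>): number {
  const val = options["maxUtilizationPercent"];
  return typeof val === "number" && val > 0
    ? val
    : FALLBACK_EXHAUSTION_THRESHOLD;
}

// Input-side guard: a cleared or non-numeric field yields undefined
// rather than '', so the option is simply absent from the saved config.
function parseThresholdInput(raw: string): number | undefined {
  const trimmed = raw.trim();
  if (trimmed === "") return undefined;
  const parsed = Number(trimmed);
  return Number.isFinite(parsed) ? parsed : undefined;
}
```

The two guards are deliberately redundant: the input handler keeps bad values out of stored config, and the getter protects against values that were stored before the fix or written through another path.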
Thanks for the review! I've pushed updates addressing both your feedback and additional issues found during a Gemini code review.

Your requests: implemented
✅ Added UI element to

Gemini review: additional fixes
Critical bug: empty field = instant cooldown. Clearing the input field emitted
Moved
Updated docs

Commit summary
All 1866 backend tests pass, frontend build passes, Biome format clean.
🤖 pi Response

Overall Summary

This PR implements a clean, backward-compatible feature for configurable quota utilization thresholds. The implementation is well-structured with proper separation of concerns: the base class provides a default implementation, the scheduler handles multi-checker safety (preventing lenient checkers from clearing strict cooldowns), and tests cover the key edge cases. The logic for determining when to clear cooldowns based on the strictest threshold is sound.

One discrepancy to address: the documentation states
## Summary
Adds a configurable `maxUtilizationPercent` option to quota checkers
that controls when a provider is placed on cooldown based on quota
utilization. The default threshold remains 99% (existing behavior
unchanged).
### Motivation
For shared API keys (e.g., a friend's Synthetic key used as emergency
fallback), you want to reserve most of the quota for the key owner.
Setting `maxUtilizationPercent: 30` means the provider is treated as
exhausted at 30% usage, preserving 70% for the key owner.
This enables tiered routing setups:
- **Primary**: Your own provider key (no limit)
- **Secondary**: Other providers
- **Tertiary**: Shared key with `maxUtilizationPercent: 30`
### Implementation
- **`QuotaChecker` base class**: Added `exhaustionThreshold` getter
(default 99). Subclasses can override to customize the threshold —
type-safe, no unsafe casts.
- **`SyntheticQuotaChecker`**: Overrides `exhaustionThreshold` to read
from `options.maxUtilizationPercent` (default 99). Config validated via
Zod (1–100).
- **`quota-scheduler.ts`**: Replaced hardcoded `EXHAUSTION_THRESHOLD =
99` with `checker.exhaustionThreshold ?? 99`. Cooldown logs include
threshold value.
- **Multi-checker safety**: Only the strictest checker for a provider
can clear a provider-wide quota cooldown. Prevents a lenient checker
(threshold=99) from clearing a cooldown set by a strict checker
(threshold=30).
- **`cooldown-manager.ts`**: Updated stale `≥99%` comment.
### Config Example
```yaml
providers:
friend-synthetic:
api_base_url:
chat: https://api.synthetic.new/openai/v1
api_key: syn_friends_api_key
quota_checker:
type: synthetic
enabled: true
intervalMinutes: 5
options:
maxUtilizationPercent: 30 # cap at 30%, preserve 70%
```
### Testing
- All 1866 backend tests pass (7 new tests added)
- Pre-commit hooks pass (Biome format + full test suite)
- Backward compatible — default behavior unchanged (99% threshold)
### Extensibility
Any future checker type that needs custom thresholds can override the
`exhaustionThreshold` getter — no config schema changes needed beyond
adding the option to that type's Zod schema.
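
As a rough illustration of that extension point, a checker type might override the getter along these lines. Everything here is a stand-in: `QuotaCheckerSketch` and `ExampleQuotaChecker` are hypothetical classes, and the `getOption` signature is assumed from the `getOption('maxUtilizationPercent', 99)` call mentioned earlier in this PR, not copied from the real base class.

```typescript
const BASE_EXHAUSTION_THRESHOLD = 99;

// Stand-in for the QuotaChecker base class (assumed shape).
class QuotaCheckerSketch {
  constructor(protected readonly options: Record<string, unknown> = {}) {}

  // Assumed helper: returns a numeric option, or the fallback when the
  // option is missing or not a number.
  protected getOption(key: string, fallback: number): number {
    const val = this.options[key];
    return typeof val === "number" ? val : fallback;
  }

  // Default threshold for checker types that do not customize it.
  get exhaustionThreshold(): number {
    return BASE_EXHAUSTION_THRESHOLD;
  }
}

// Hypothetical checker type that opts in to a configurable threshold.
class ExampleQuotaChecker extends QuotaCheckerSketch {
  get exhaustionThreshold(): number {
    return this.getOption("maxUtilizationPercent", BASE_EXHAUSTION_THRESHOLD);
  }
}
```

A scheduler written against the base type would then read `checker.exhaustionThreshold` without knowing which checker type it holds.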
---------
Co-authored-by: Gwyn <gwyn@gwyn.tel>
Co-authored-by: Matt Cowger <mcowger@users.noreply.github.com>