feat: add maxUtilizationPercent option to quota checker by gwyntel-git · Pull Request #218 · mcowger/plexus

gwyntel-git · 2026-04-19T04:47:57Z

Summary

Adds a configurable maxUtilizationPercent option to quota checkers that controls when a provider is placed on cooldown based on quota utilization. The default threshold remains 99% (existing behavior unchanged).

Motivation

For shared API keys (e.g., a friend's Synthetic key used as emergency fallback), you want to reserve most of the quota for the key owner. Setting maxUtilizationPercent: 30 means the provider is treated as exhausted at 30% usage, preserving 70% for the key owner.

This enables tiered routing setups:

Primary: Your own provider key (no limit)
Secondary: Other providers
Tertiary: Shared key with maxUtilizationPercent: 30

Implementation

QuotaChecker base class: Added exhaustionThreshold getter (default 99). Subclasses can override to customize the threshold — type-safe, no unsafe casts.
SyntheticQuotaChecker: Overrides exhaustionThreshold to read from options.maxUtilizationPercent (default 99). Config validated via Zod (1–100).
quota-scheduler.ts: Replaced hardcoded EXHAUSTION_THRESHOLD = 99 with checker.exhaustionThreshold ?? 99. Cooldown logs include threshold value.
Multi-checker safety: Only the strictest checker for a provider can clear a provider-wide quota cooldown. Prevents a lenient checker (threshold=99) from clearing a cooldown set by a strict checker (threshold=30).
cooldown-manager.ts: Updated stale ≥99% comment.
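
The getter-and-override arrangement in the list above can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the `CheckerOptions` type, the constructor, and the `options` field are assumptions made for illustration. Only the names `QuotaChecker`, `SyntheticQuotaChecker`, `exhaustionThreshold`, and `maxUtilizationPercent` come from the description.

```typescript
// Hypothetical, simplified shapes; the real Plexus classes carry more state.
type CheckerOptions = Record<string, unknown>;

const DEFAULT_EXHAUSTION_THRESHOLD = 99;

class QuotaChecker {
  protected readonly options: CheckerOptions;

  constructor(options: CheckerOptions = {}) {
    this.options = options;
  }

  // Base behavior: a checker is treated as exhausted at 99% utilization.
  get exhaustionThreshold(): number {
    return DEFAULT_EXHAUSTION_THRESHOLD;
  }
}

class SyntheticQuotaChecker extends QuotaChecker {
  // Override: read the configured cap, falling back to the default.
  get exhaustionThreshold(): number {
    const configured = this.options.maxUtilizationPercent;
    return typeof configured === 'number' ? configured : DEFAULT_EXHAUSTION_THRESHOLD;
  }
}

const shared = new SyntheticQuotaChecker({ maxUtilizationPercent: 30 });
const plain = new QuotaChecker();
```

A scheduler holding either object can read `exhaustionThreshold` without knowing the concrete type, which is the "no unsafe casts" property the description mentions.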

Config Example

providers:
  friend-synthetic:
    api_base_url:
      chat: https://api.synthetic.new/openai/v1
    api_key: syn_friends_api_key
    quota_checker:
      type: synthetic
      enabled: true
      intervalMinutes: 5
      options:
        maxUtilizationPercent: 30  # cap at 30%, preserve 70%

Testing

All 1866 backend tests pass (7 new tests added)
Pre-commit hooks pass (Biome format + full test suite)
Backward compatible — default behavior unchanged (99% threshold)

Extensibility

Any future checker type that needs custom thresholds can override the exhaustionThreshold getter — no config schema changes needed beyond adding the option to that type's Zod schema.

Allows reserving quota on shared provider keys by lowering the exhaustion threshold below the default 99%. When any quota window reaches the configured percentage, the provider is placed on cooldown until the window resets.

This enables tiered routing setups where a friend's API key should only be used as emergency fallback, e.g. maxUtilizationPercent: 30 means the provider is cooled down at 30% usage, preserving 70% for the key owner.

Changes:

- Add maxUtilizationPercent (0-100, default 99) to SyntheticQuotaCheckerOptionsSchema
- Replace hardcoded EXHAUSTION_THRESHOLD in quota-scheduler.ts with per-checker configurable threshold from options.maxUtilizationPercent
- Include threshold in cooldown log messages for observability
- Add 4 test cases covering default and custom threshold behavior
- Document the option in CONFIGURATION.md with example config
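
The cooldown rule in this commit message (any window at or above the threshold triggers cooldown) could look something like the sketch below. The `QuotaWindow` shape and both function names are hypothetical and are not taken from quota-scheduler.ts.

```typescript
// Hypothetical shape for one quota window (for example a 5-hour or weekly window).
interface QuotaWindow {
  name: string;
  utilizationPercent: number;
}

// Returns the windows that have reached the threshold.
function windowsOverThreshold(windows: QuotaWindow[], threshold: number = 99): QuotaWindow[] {
  return windows.filter((w) => w.utilizationPercent >= threshold);
}

// A provider goes on cooldown as soon as at least one window is over the threshold.
function shouldCooldown(windows: QuotaWindow[], threshold: number = 99): boolean {
  return windowsOverThreshold(windows, threshold).length > 0;
}

const sampleWindows: QuotaWindow[] = [
  { name: '5h', utilizationPercent: 12 },
  { name: 'weekly', utilizationPercent: 31 },
];
```

With these sample windows, `shouldCooldown(sampleWindows, 30)` is true because the weekly window is over 30%, while `shouldCooldown(sampleWindows)` with the default of 99 is false.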

Based on subagent code review, fixes:

1. CRITICAL: Multi-checker cooldown clearing bug. A lenient checker (threshold=99) could clear a cooldown set by a strict checker (threshold=30). Fixed by only allowing the strictest checker for a provider to clear the provider-wide cooldown. Others are blocked.
2. HIGH: min(0) → min(1) in Zod schema. maxUtilizationPercent=0 was a footgun (always triggers cooldown). Use enabled:false instead.
3. HIGH: Refactor exhaustionThreshold from unsafe options cast to a proper getter on QuotaChecker base class. SyntheticQuotaChecker overrides it via getOption('maxUtilizationPercent', 99). Scheduler reads checker.exhaustionThreshold (type-safe, no casts).
4. MEDIUM: Update stale '≥99%' comment in cooldown-manager.ts.
5. MEDIUM: Add 3 new test cases:
   - Cooldown clearing when utilization drops below threshold
   - Multiple windows where only one exceeds threshold
   - Lenient checker cannot clear strict checker's cooldown

Co-authored-by: Gwyn <gwyn@gwyn.tel>
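
The multi-checker fix in item 1 can be illustrated with a small eligibility check. This is only a sketch under assumptions: the `CheckerInfo` shape and the function names are hypothetical, and the commit message does not say how ties between equally strict checkers are handled, so here any checker tied for the lowest threshold is allowed to clear.

```typescript
// Hypothetical summary of a registered checker.
interface CheckerInfo {
  id: string;
  providerId: string;
  exhaustionThreshold: number;
}

// The strictest threshold for a provider is the lowest one among its checkers.
function strictestThreshold(checkers: CheckerInfo[], providerId: string): number | undefined {
  const thresholds = checkers
    .filter((c) => c.providerId === providerId)
    .map((c) => c.exhaustionThreshold);
  return thresholds.length > 0 ? Math.min(...thresholds) : undefined;
}

// Only a checker at the strictest threshold may clear the provider-wide cooldown.
function mayClearCooldown(checker: CheckerInfo, all: CheckerInfo[]): boolean {
  const strictest = strictestThreshold(all, checker.providerId);
  return strictest !== undefined && checker.exhaustionThreshold <= strictest;
}

const strict: CheckerInfo = { id: 'a', providerId: 'friend-synthetic', exhaustionThreshold: 30 };
const lenient: CheckerInfo = { id: 'b', providerId: 'friend-synthetic', exhaustionThreshold: 99 };
const registered = [strict, lenient];
```

Here `mayClearCooldown(lenient, registered)` is false and `mayClearCooldown(strict, registered)` is true. The check covers eligibility only; whether utilization has actually dropped below the threshold is a separate condition.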

mcowger · 2026-04-19T07:19:36Z

Thanks for this contribution! This is a useful feature for shared API keys.

A note: there is no UI to configure this option in the frontend. Also, YAML configuration is now deprecated in favor of database-driven config (via the management API).

Could you add the UI element to packages/frontend/src/components/quota/SyntheticQuotaConfig.tsx?

Example:

<div className="flex flex-col gap-1">
  <label className="font-body text-[13px] font-medium text-text-secondary">
    Max Utilization Percent (optional)
  </label>
  <Input
    type="number"
    min={1}
    max={100}
    value={(options.maxUtilizationPercent as number) ?? ''}
    onChange={(e) => handleChange('maxUtilizationPercent', parseInt(e.target.value))}
    placeholder="99"
  />
  <span className="text-[10px] text-text-muted">
    Threshold (1-100) to trigger cooldown. Default: 99. Set lower to reserve quota.
  </span>
</div>

Note: you'll need to update the handleChange function to accept number types.

Adds a number input field for maxUtilizationPercent (1–100, default 99) to the Synthetic quota checker configuration component. This enables database-driven config (management API) users to set the exhaustion threshold from the Plexus dashboard without editing YAML. Also updates handleChange to accept string | number types.

Based on Gemini CLI review:

1. CRITICAL: Empty input field bug. Clearing maxUtilizationPercent set the value to '' (empty string), which JS evaluates as 0 in numeric comparisons, causing instant cooldown at any utilization. Fixed by emitting undefined instead of '' when field is cleared.
2. Move exhaustionThreshold getter to QuotaChecker base class. All checker types now benefit from maxUtilizationPercent support without individual overrides. Uses typeof check with val > 0 guard to handle '' and null/undefined gracefully.
3. Remove SyntheticQuotaChecker override. The base class handles it.
4. Update docs: maxUtilizationPercent is now documented as available on all checker types (not just synthetic), since the getter lives on the base class.

gwyntel-git · 2026-04-19T07:46:35Z

Thanks for the review! I've pushed updates addressing both your feedback and additional issues found during a Gemini code review.

Your requests — implemented

✅ Added UI element to SyntheticQuotaConfig.tsx — number input with min/max/placeholder matching your example
✅ Updated handleChange to accept string | number types

Gemini review — additional fixes

Critical bug: empty field = instant cooldown. Clearing the input field emitted "" (empty string). In JS, utilizationPercent >= "" evaluates to true for any value ≥ 0, so clearing the field would immediately trigger cooldown at 0% utilization. Fixed by emitting undefined instead of "" when the field is cleared — the backend ?? 99 default then takes over correctly.
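
To make the empty-field fix concrete, here is a minimal sketch of how the raw input string might be converted before it is stored. The helper name `parseThresholdInput` is hypothetical and the actual change in SyntheticQuotaConfig.tsx may be structured differently; the point is only that a cleared field yields `undefined` rather than `''`.

```typescript
// Hypothetical helper: convert the raw <input> string into a value safe to store.
function parseThresholdInput(raw: string): number | undefined {
  const trimmed = raw.trim();
  if (trimmed === '') return undefined; // cleared field: let the backend default apply
  const parsed = Number.parseInt(trimmed, 10);
  if (Number.isNaN(parsed)) return undefined; // non-numeric input is treated as unset
  return parsed;
}

// Why '' was dangerous: JavaScript coerces the empty string to 0 in numeric contexts.
const coercedEmpty = Number('');

// A cleared field now falls through to the default instead of comparing against 0.
const effectiveThreshold = parseThresholdInput('') ?? 99;
```

In this sketch `coercedEmpty` is 0, which is why the old comparison was true at any utilization, and `effectiveThreshold` is 99.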

Moved exhaustionThreshold to base QuotaChecker class. All 22+ checker types now support maxUtilizationPercent without individual overrides. The getter reads options.maxUtilizationPercent with a typeof val === 'number' && val > 0 guard to handle "", undefined, and null gracefully, falling back to the 99 default. Removed the now-redundant SyntheticQuotaChecker override.
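
The guard described above might be written along these lines. This is a standalone sketch rather than the getter itself: the function name `resolveExhaustionThreshold` and the loose `Record<string, unknown>` options type are assumptions, while the `typeof val === 'number' && val > 0` condition and the 99 fallback are taken from the comment.

```typescript
// Hypothetical standalone version of the base-class guard.
function resolveExhaustionThreshold(options: Record<string, unknown> | undefined): number {
  const val = options?.maxUtilizationPercent;
  // Only a positive number counts as a configured threshold; anything else uses the default.
  return typeof val === 'number' && val > 0 ? val : 99;
}

const fromConfig = resolveExhaustionThreshold({ maxUtilizationPercent: 30 });
const fromEmptyString = resolveExhaustionThreshold({ maxUtilizationPercent: '' });
const fromNull = resolveExhaustionThreshold({ maxUtilizationPercent: null });
const fromMissing = resolveExhaustionThreshold(undefined);
```

Only `fromConfig` resolves to 30; the other three resolve to 99. Because `NaN > 0` is false, a `NaN` value would also fall back to the default under this guard.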

Updated docs — maxUtilizationPercent is now documented as available on all checker types, not just synthetic.

Commit summary

| Commit | Description |
| --- | --- |
| `feat: add maxUtilizationPercent` | Config schema, scheduler logic, tests, docs |
| `fix: address code review feedback` | Multi-checker cooldown bug, exhaustionThreshold refactor, new tests |
| `feat: add maxUtilizationPercent UI` | Frontend input, handleChange update |
| `fix: address code review feedback` | Empty string bug, base class getter, docs update |

All 1866 backend tests pass, frontend build passes, Biome format clean.

github-actions · 2026-04-19T07:56:25Z

🤖 pi Response

Overall Summary

This PR implements a clean, backward-compatible feature for configurable quota utilization thresholds. The implementation is well-structured with proper separation of concerns: the base class provides a default implementation, the scheduler handles multi-checker safety (preventing lenient checkers from clearing strict cooldowns), and tests cover the key edge cases. The logic for determining when to clear cooldowns based on the strictest threshold is sound.

One discrepancy to address: the documentation states maxUtilizationPercent is "available on all checker types," but only the SyntheticQuotaCheckerOptionsSchema has been updated to include this field in config.ts.

Co-authored-by: Gwyn <gwyn@gwyn.tel>
Co-authored-by: Matt Cowger <mcowger@users.noreply.github.com>

Gwyn and others added 2 commits April 18, 2026 21:47

Gwyn added 2 commits April 19, 2026 00:38

Merge branch 'main' into upstream/max-utilization-percent

6f7c7ba

mcowger merged commit dc75cc0 into mcowger:main Apr 19, 2026
3 checks passed

mcowger merged 5 commits into mcowger:main from gwyntel:upstream/max-utilization-percent
