feat: add maxUtilizationPercent option to quota checker #218

Merged
mcowger merged 5 commits into mcowger:main from gwyntel:upstream/max-utilization-percent
Apr 19, 2026

@gwyntel-git
Contributor

Summary

Adds a configurable maxUtilizationPercent option to quota checkers that controls when a provider is placed on cooldown based on quota utilization. The default threshold remains 99% (existing behavior unchanged).

Motivation

For shared API keys (e.g., a friend's Synthetic key used as emergency fallback), you want to reserve most of the quota for the key owner. Setting maxUtilizationPercent: 30 means the provider is treated as exhausted at 30% usage, preserving 70% for the key owner.

This enables tiered routing setups:

  • Primary: Your own provider key (no limit)
  • Secondary: Other providers
  • Tertiary: Shared key with maxUtilizationPercent: 30

Implementation

  • QuotaChecker base class: Added exhaustionThreshold getter (default 99). Subclasses can override to customize the threshold — type-safe, no unsafe casts.
  • SyntheticQuotaChecker: Overrides exhaustionThreshold to read from options.maxUtilizationPercent (default 99). Config validated via Zod (1–100).
  • quota-scheduler.ts: Replaced hardcoded EXHAUSTION_THRESHOLD = 99 with checker.exhaustionThreshold ?? 99. Cooldown logs include threshold value.
  • Multi-checker safety: Only the strictest checker for a provider can clear a provider-wide quota cooldown. Prevents a lenient checker (threshold=99) from clearing a cooldown set by a strict checker (threshold=30).
  • cooldown-manager.ts: Updated stale ≥99% comment.
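
The getter-plus-override pattern in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual Plexus source: the class shapes, the constructor, the `CheckerOptions` type, and the `isExhausted` helper are assumptions. Only the names `QuotaChecker`, `SyntheticQuotaChecker`, `exhaustionThreshold`, and `maxUtilizationPercent` come from this PR.

```typescript
// Hypothetical, simplified shapes; the real classes carry far more state.
type CheckerOptions = Record<string, unknown>;

class QuotaChecker {
  protected readonly options: CheckerOptions;

  constructor(options: CheckerOptions = {}) {
    this.options = options;
  }

  // Default: treat the provider as exhausted at 99% utilization.
  get exhaustionThreshold(): number {
    return 99;
  }
}

class SyntheticQuotaChecker extends QuotaChecker {
  // Override: read the configured value, falling back to the default.
  override get exhaustionThreshold(): number {
    const value = this.options["maxUtilizationPercent"];
    return typeof value === "number" ? value : 99;
  }
}

// Hypothetical stand-in for the comparison the scheduler performs.
function isExhausted(checker: QuotaChecker, utilizationPercent: number): boolean {
  return utilizationPercent >= checker.exhaustionThreshold;
}

const own = new QuotaChecker();
const shared = new SyntheticQuotaChecker({ maxUtilizationPercent: 30 });

console.log(isExhausted(own, 45));    // false
console.log(isExhausted(shared, 45)); // true
```

Because the scheduler only reads the getter, it needs no knowledge of which checker type it is dealing with or how that type stores its options.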

Config Example

providers:
  friend-synthetic:
    api_base_url:
      chat: https://api.synthetic.new/openai/v1
    api_key: syn_friends_api_key
    quota_checker:
      type: synthetic
      enabled: true
      intervalMinutes: 5
      options:
        maxUtilizationPercent: 30  # cap at 30%, preserve 70%
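
For readers who want to see the 1–100 range rule spelled out, here is a hand-rolled sketch of the check. The PR itself validates the option through a Zod schema; this plain-TypeScript stand-in, including the helper name `parseMaxUtilizationPercent` and its error messages, is an assumption made so the example runs without dependencies.

```typescript
// Hypothetical stand-in for the Zod range check (1-100, default 99).
function parseMaxUtilizationPercent(raw: unknown): number {
  if (raw === undefined) {
    return 99; // option omitted: keep the existing default
  }
  if (typeof raw !== "number" || !Number.isFinite(raw)) {
    throw new TypeError("maxUtilizationPercent must be a number");
  }
  if (raw < 1 || raw > 100) {
    throw new RangeError("maxUtilizationPercent must be between 1 and 100");
  }
  return raw;
}

console.log(parseMaxUtilizationPercent(30));        // 30
console.log(parseMaxUtilizationPercent(undefined)); // 99
```

Rejecting 0 outright mirrors the reasoning in the review fixes: a threshold of 0 would put the provider on cooldown permanently, which `enabled: false` expresses more clearly.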

Testing

  • All 1866 backend tests pass (7 new tests added)
  • Pre-commit hooks pass (Biome format + full test suite)
  • Backward compatible — default behavior unchanged (99% threshold)

Extensibility

Any future checker type that needs custom thresholds can override the exhaustionThreshold getter — no config schema changes needed beyond adding the option to that type's Zod schema.

Gwyn and others added 2 commits April 18, 2026 21:47
Allows reserving quota on shared provider keys by lowering the
exhaustion threshold below the default 99%. When any quota window
reaches the configured percentage, the provider is placed on cooldown
until the window resets.

This enables tiered routing setups where a friend's API key should only
be used as emergency fallback — e.g. maxUtilizationPercent: 30 means
the provider is cooled down at 30% usage, preserving 70% for the
key owner.

Changes:
- Add maxUtilizationPercent (0-100, default 99) to SyntheticQuotaCheckerOptionsSchema
- Replace hardcoded EXHAUSTION_THRESHOLD in quota-scheduler.ts with
  per-checker configurable threshold from options.maxUtilizationPercent
- Include threshold in cooldown log messages for observability
- Add 4 test cases covering default and custom threshold behavior
- Document the option in CONFIGURATION.md with example config

Based on subagent code review, fixes:

1. CRITICAL: Multi-checker cooldown clearing bug — a lenient checker
   (threshold=99) could clear a cooldown set by a strict checker
   (threshold=30). Fixed by only allowing the strictest checker for
   a provider to clear the provider-wide cooldown. Others are blocked.

2. HIGH: min(0) → min(1) in Zod schema. maxUtilizationPercent=0 was
   a footgun (always triggers cooldown). Use enabled:false instead.

3. HIGH: Refactor exhaustionThreshold from unsafe options cast to a
   proper getter on QuotaChecker base class. SyntheticQuotaChecker
   overrides it via getOption('maxUtilizationPercent', 99). Scheduler
   reads checker.exhaustionThreshold — type-safe, no casts.

4. MEDIUM: Update stale '≥99%' comment in cooldown-manager.ts.

5. MEDIUM: Add 3 new test cases:
   - Cooldown clearing when utilization drops below threshold
   - Multiple windows where only one exceeds threshold
   - Lenient checker cannot clear strict checker's cooldown
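
The multi-checker rule from item 1 can be sketched as below. This is an illustration under stated assumptions rather than the scheduler's real code: the `CheckerInfo` shape and both function names are hypothetical, and letting checkers that tie for the lowest threshold each clear the cooldown is a choice made for the sketch.

```typescript
// Hypothetical minimal view of a registered checker.
interface CheckerInfo {
  id: string;
  providerId: string;
  exhaustionThreshold: number;
}

// "Strictest" means the lowest threshold among a provider's checkers.
function strictestThreshold(checkers: CheckerInfo[], providerId: string): number {
  const thresholds = checkers
    .filter((c) => c.providerId === providerId)
    .map((c) => c.exhaustionThreshold);
  return Math.min(...thresholds);
}

// A checker may clear the provider-wide cooldown only if it is (one of)
// the strictest checkers and its own reading is below its threshold.
function mayClearCooldown(
  checker: CheckerInfo,
  checkers: CheckerInfo[],
  utilizationPercent: number,
): boolean {
  const strictest = strictestThreshold(checkers, checker.providerId);
  return (
    checker.exhaustionThreshold <= strictest &&
    utilizationPercent < checker.exhaustionThreshold
  );
}

const strict: CheckerInfo = { id: "a", providerId: "p1", exhaustionThreshold: 30 };
const lenient: CheckerInfo = { id: "b", providerId: "p1", exhaustionThreshold: 99 };
const all = [strict, lenient];

console.log(mayClearCooldown(lenient, all, 10)); // false
console.log(mayClearCooldown(strict, all, 10));  // true
```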

Co-authored-by: Gwyn <gwyn@gwyn.tel>

@mcowger
Owner

mcowger commented Apr 19, 2026

Thanks for this contribution! This is a useful feature for shared API keys.

A note: there is no UI to configure this option in the frontend. Also, YAML configuration is now deprecated in favor of database-driven config (via the management API).

Could you add the UI element to packages/frontend/src/components/quota/SyntheticQuotaConfig.tsx?

Example:

<div className="flex flex-col gap-1">
  <label className="font-body text-[13px] font-medium text-text-secondary">
    Max Utilization Percent (optional)
  </label>
  <Input
    type="number"
    min={1}
    max={100}
    value={(options.maxUtilizationPercent as number) ?? ''}
    onChange={(e) => handleChange('maxUtilizationPercent', parseInt(e.target.value))}
    placeholder="99"
  />
  <span className="text-[10px] text-text-muted">
    Threshold (1-100) to trigger cooldown. Default: 99. Set lower to reserve quota.
  </span>
</div>

Note: you'll need to update the handleChange function to accept number types.

Gwyn added 2 commits April 19, 2026 00:38
Adds a number input field for maxUtilizationPercent (1–100, default 99)
to the Synthetic quota checker configuration component. This enables
database-driven config (management API) users to set the exhaustion
threshold from the Plexus dashboard without editing YAML.

Also updates handleChange to accept string | number types.

Based on Gemini CLI review:

1. CRITICAL: Empty input field bug — clearing maxUtilizationPercent set
   the value to '' (empty string), which JS evaluates as 0 in numeric
   comparisons, causing instant cooldown at any utilization. Fixed by
   emitting undefined instead of '' when field is cleared.

2. Move exhaustionThreshold getter to QuotaChecker base class — all
   checker types now benefit from maxUtilizationPercent support without
   individual overrides. Uses typeof check with val > 0 guard to handle
   '' and null/undefined gracefully.

3. Remove SyntheticQuotaChecker override — base class handles it.

4. Update docs: maxUtilizationPercent is now documented as available on
   all checker types (not just synthetic), since the getter lives on
   the base class.

@gwyntel-git
Contributor Author

Thanks for the review! I've pushed updates addressing both your feedback and additional issues found during a Gemini code review.

Your requests — implemented

  • Added UI element to SyntheticQuotaConfig.tsx: number input with min/max/placeholder matching your example
  • Updated handleChange to accept string | number types

Gemini review — additional fixes

Critical bug: empty field = instant cooldown. Clearing the input field emitted "" (empty string). In JS, utilizationPercent >= "" evaluates to true for any value ≥ 0, so clearing the field would immediately trigger cooldown at 0% utilization. Fixed by emitting undefined instead of "" when the field is cleared — the backend ?? 99 default then takes over correctly.
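
A small sketch of the fix described above, assuming a helper that converts the raw input string before it is stored. The name `parseThresholdInput` is hypothetical and the actual component may structure this differently. The point is that a cleared field maps to `undefined`, never to `""`.

```typescript
// Hypothetical helper: convert the raw <input> value into an option value.
function parseThresholdInput(raw: string): number | undefined {
  const trimmed = raw.trim();
  if (trimmed === "") {
    return undefined; // cleared field: let the backend default (99) apply
  }
  const parsed = Number.parseInt(trimmed, 10);
  return Number.isNaN(parsed) ? undefined : parsed;
}

// The coercion behind the bug: an empty string converts to the number 0,
// so a stored "" behaves like a threshold of 0 in a numeric comparison.
console.log(Number(""));                // 0

console.log(parseThresholdInput("30")); // 30
console.log(parseThresholdInput(""));   // undefined
```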

Moved exhaustionThreshold to base QuotaChecker class. All 22+ checker types now support maxUtilizationPercent without individual overrides. The getter reads options.maxUtilizationPercent with a typeof val === 'number' && val > 0 guard to handle "", undefined, and null gracefully, falling back to the 99 default. Removed the now-redundant SyntheticQuotaChecker override.
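
The guard described above amounts to something like the following sketch. It is written as a standalone function for brevity; in the PR the logic lives in a getter on the base `QuotaChecker` class. The function name `resolveExhaustionThreshold` and the options type are assumptions.

```typescript
// Hypothetical standalone version of the base-class getter's logic.
function resolveExhaustionThreshold(options: Record<string, unknown>): number {
  const val = options["maxUtilizationPercent"];
  // Accept only positive numbers; "", null, undefined, 0, and NaN
  // all fall back to the 99 default.
  return typeof val === "number" && val > 0 ? val : 99;
}

console.log(resolveExhaustionThreshold({ maxUtilizationPercent: 30 }));   // 30
console.log(resolveExhaustionThreshold({ maxUtilizationPercent: "" }));   // 99
console.log(resolveExhaustionThreshold({ maxUtilizationPercent: null })); // 99
console.log(resolveExhaustionThreshold({}));                              // 99
```

The guard is a second line of defence: even if a client stores a malformed value, the scheduler still sees a usable threshold.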

Updated docs: maxUtilizationPercent is now documented as available on all checker types, not just synthetic.

Commit summary

| Commit | Description |
| --- | --- |
| feat: add maxUtilizationPercent | Config schema, scheduler logic, tests, docs |
| fix: address code review feedback | Multi-checker cooldown bug, exhaustionThreshold refactor, new tests |
| feat: add maxUtilizationPercent UI | Frontend input, handleChange update |
| fix: address code review feedback | Empty string bug, base class getter, docs update |

All 1866 backend tests pass, frontend build passes, Biome format clean.

@github-actions
Contributor

🤖 pi Response

Overall Summary

This PR implements a clean, backward-compatible feature for configurable quota utilization thresholds. The implementation is well-structured with proper separation of concerns: the base class provides a default implementation, the scheduler handles multi-checker safety (preventing lenient checkers from clearing strict cooldowns), and tests cover the key edge cases. The logic for determining when to clear cooldowns based on the strictest threshold is sound.

One discrepancy to address: the documentation states maxUtilizationPercent is "available on all checker types," but only the SyntheticQuotaCheckerOptionsSchema has been updated to include this field in config.ts.

@mcowger merged commit dc75cc0 into mcowger:main on Apr 19, 2026
3 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 20, 2026

Co-authored-by: Gwyn <gwyn@gwyn.tel>
Co-authored-by: Matt Cowger <mcowger@users.noreply.github.com>