Skip to content

fix: stop quota scheduler cooldown drift#371

Closed
ndycode wants to merge 1 commit intomainfrom
fix/preemptive-quota-cooldown-drift
Closed

fix: stop quota scheduler cooldown drift#371
ndycode wants to merge 1 commit intomainfrom
fix/preemptive-quota-cooldown-drift

Conversation

@ndycode
Copy link
Copy Markdown
Owner

@ndycode ndycode commented Apr 6, 2026

Problem

PreemptiveQuotaScheduler.getDeferral() was computing cooldowns from the nearest reset window.

That causes cooldown drift when a later primary or secondary quota window is still active: the scheduler can resume traffic early even though another active quota window has not reset yet.

Fix

Use the longest relevant reset window instead of the nearest one.

  • For 429 snapshots, defer until the latest active reset across primary and secondary windows.
  • For quota-near-exhaustion snapshots, defer until the latest reset among the windows that are actually near exhaustion.

Changes

  • lib/preemptive-quota-scheduler.ts
    • compute per-window waits once
    • use the longest active wait for 429 deferrals
    • use the longest near-exhausted wait for quota deferrals
  • test/preemptive-quota-scheduler.test.ts
    • added regression coverage for 429 cooldowns using the longer reset window
    • added regression coverage for near-exhaustion cooldowns using the longer reset window

Validation

npx vitest run test/preemptive-quota-scheduler.test.ts
Test Files  1 passed (1)
Tests      16 passed (16)

npx vitest run
Test Files  222 passed (222)
Tests      3315 passed (3315)

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

replaces nearest-window cooldown selection with longest-window in getDeferral(), fixing premature quota resumption when a longer secondary reset window is still active. both the 429 and near-exhaustion paths now correctly defer until all relevant windows have reset, backed by two new regression tests.

Confidence Score: 5/5

safe to merge — the fix is correct, all 16 tests pass, and the change is well-scoped

both findings are p2 style/coverage; the core logic correctly uses Math.max across windows for both the 429 and near-exhaustion paths, which is exactly the stated intent. no concurrency issues — all scheduler methods are synchronous with no async gaps in the map mutations. no filesystem or token handling touched.

no files require special attention; redundancy in getDeferral is cosmetic only

Important Files Changed

Filename Overview
lib/preemptive-quota-scheduler.ts fixes cooldown drift by using Math.max across windows for both 429 and near-exhaustion paths; minor redundancy between waitCandidates and primaryWait/secondaryWait is cosmetic only
test/preemptive-quota-scheduler.test.ts adds two targeted regression tests for the new longest-window logic; missing edge case for expired 429 snapshots before prune fires

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["getDeferral(key, now)"] --> B{scheduler enabled?}
    B -- no --> Z[defer: false, waitMs: 0]
    B -- yes --> C{snapshot exists?}
    C -- no --> Z
    C -- yes --> D["compute primaryWait = max(0, primary.resetAtMs - now)\ncompute secondaryWait = max(0, secondary.resetAtMs - now)\nlongestWait = max(primaryWait, secondaryWait)"]
    D --> E{status === 429\nAND longestWait > 0?}
    E -- yes --> F["bounded = min(longestWait, maxDeferralMs)\ndefer: true, reason: rate-limit"]
    E -- no --> G{primaryNearExhausted\nOR secondaryNearExhausted?}
    G -- yes --> H["nearExhaustedWait = max(\n  primaryNearExhausted ? primaryWait : 0,\n  secondaryNearExhausted ? secondaryWait : 0\n)"]
    H --> I{nearExhaustedWait > 0?}
    I -- yes --> J["bounded = min(nearExhaustedWait, maxDeferralMs)\ndefer: true, reason: quota-near-exhaustion"]
    I -- no --> Z
    G -- no --> Z
Loading

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: lib/preemptive-quota-scheduler.ts
Line: 254-258

Comment:
**redundant `waitCandidates` array**

`longestWait` is equivalent to `Math.max(primaryWait, secondaryWait)` — both values are already computed with identical semantics above. the `waitCandidates` intermediate array and the second `.filter(v => v > 0)` are redundant (the first filter already guarantees `v > now`, so `v - now > 0` is always true).

```suggestion
		const longestWait = Math.max(primaryWait, secondaryWait);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/preemptive-quota-scheduler.test.ts
Line: 63-71

Comment:
**missing vitest regression: expired 429 cooldown**

there's no test asserting `getDeferral` returns `{ defer: false }` once both reset windows on a 429 snapshot have elapsed but before `prune()` runs. a future regression dropping the `> now` guard would silently defer on a stale snapshot and this would go undetected.

```ts
it("does not defer after 429 reset windows have elapsed", () => {
  const scheduler = new PreemptiveQuotaScheduler();
  scheduler.update("acc:model", {
    status: 429,
    primary: { resetAtMs: 5_000 },
    secondary: { resetAtMs: 8_000 },
    updatedAt: 1_000,
  });
  // both windows are now in the past
  expect(scheduler.getDeferral("acc:model", 10_000).defer).toBe(false);
});
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: stop quota scheduler cooldown drift" | Re-trigger Greptile

The preemptive quota scheduler calculated deferrals from the nearest
reset window. When a later primary or secondary reset was still active,
that shortened the effective cooldown and let requests resume too early.

Use the longest relevant reset window for 429 deferrals and for
near-exhaustion quota deferrals so cooldowns stay aligned with the
latest active quota window.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

Warning

Rate limit exceeded

@ndycode has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 25 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 25 minutes and 8 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 075b9277-3083-4202-a9a5-663bb080218a

📥 Commits

Reviewing files that changed from the base of the PR and between 478f44c and e70a2ed.

📒 Files selected for processing (2)
  • lib/preemptive-quota-scheduler.ts
  • test/preemptive-quota-scheduler.test.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/preemptive-quota-cooldown-drift
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/preemptive-quota-cooldown-drift

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode
Copy link
Copy Markdown
Owner Author

ndycode commented Apr 6, 2026

Superseded by #387, which rebuilds the full open PR stack onto one reviewed integration branch.

@ndycode
Copy link
Copy Markdown
Owner Author

ndycode commented Apr 6, 2026

Closing in favor of #387.

@ndycode ndycode closed this Apr 6, 2026
@ndycode ndycode deleted the fix/preemptive-quota-cooldown-drift branch April 12, 2026 06:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant