fix fast-expiry token refresh persistence in account pool by ndycode · Pull Request #323 · ndycode/codex-multi-auth

ndycode · 2026-03-31T21:52:27Z

Summary

persist refreshed auth transactionally so fast-expiry refreshes do not get overwritten by stale pool snapshots

What Changed

added AccountManager.commitRefreshedAuth(...) and identity-based account matching so refresh results update the correct stored row before any later pool snapshot save
updated the runtime refresh paths to use transactional persistence for both the main request loop and stream failover refreshes
treated transient refresh and auth-store write failures as retryable refresh errors instead of terminal auth-invalid failures, and extended the proactive refresh / guardian coverage around fast-expiry cases

Validation

Docs and Governance Checklist

README updated (if user-visible behavior changed)
docs/getting-started.md updated (if onboarding flow changed)
docs/features.md updated (if capability surface changed)
relevant docs/reference/* pages updated (if commands/settings/paths changed)
docs/upgrade.md updated (if migration behavior changed)
SECURITY.md and CONTRIBUTING.md reviewed for alignment

Risk and Rollback

Risk level: moderate; this changes how refreshed auth is persisted and how transient refresh failures are classified in the runtime account loop
Rollback plan: revert commit 9cbc72a on fix/fast-expiry-refresh-persistence or revert the PR if the new persistence path causes account rotation regressions

Additional Notes

rebased onto current main and re-ran the full validation gate on the rebased branch

note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr fixes fast-expiry token refresh persistence by replacing the ad-hoc updateFromAuth + clearAuthFailures + saveToDiskDebounced triple with a single commitRefreshedAuth call that runs under withStorageLock, builds a fresh pool snapshot, updates both the on-disk record and the live in-memory account atomically, and rolls back in-memory changes if the write fails.

key changes:

AccountManager.commitRefreshedAuth — transactional persist with in-memory rollback on storage failure; identity-based account matching so the correct row is updated even if the index has shifted
index.ts — main loop and stream failover paths use commitRefreshedAuth; transient refresh/persistence errors (EAGAIN, EBUSY, EPERM, network_error) now trigger a network-class cooldown+continue instead of incrementAuthFailures
lib/proactive-refresh.ts — refreshExpiringAccounts gains an onResult callback so the guardian can commit each refresh transactionally as results arrive in parallel; shouldRefreshProactively now correctly returns true when the access token is missing even if expires is undefined
lib/refresh-guardian.ts — pre-filters cooling-down accounts, routes success results through commitRefreshedAuth, and only triggers saveToDiskDebounced for failure-cooldown writes (not for already-persisted successes)
lib/request/fetch-helpers.ts — isRetryableRefreshFailure classifies network_error/unknown/invalid_response as retryable and 400/401/403 http_error as terminal; missing_refresh stays terminal
tests updated across all six affected files with strong coverage of the transactional commit, rollback, retryable vs terminal error paths, and the cooling-down skip

Confidence Score: 4/5

solid approach with one P1 data-integrity edge case in commitRefreshedAuth where tokens are written to disk without in-memory update when liveAccount lookup fails post-snapshot

the lock-based transactional commit, retryable error classification, and rollback logic are all well-designed and well-tested. previous P1 concerns (missing_refresh as retryable, storage write triggering incrementAuthFailures on healthy accounts) are resolved. the remaining P1 is the "disk write proceeds but live pool not updated" path in commitRefreshedAuth when getAccountByIdentity returns null after storageIndex was found — this leaves the guardian in a perpetual refresh loop for the affected account until restart. the duplicate retryable helper (P2) and missing concurrent-commit test (P2) don't block merge.

lib/accounts.ts — specifically the await persist(nextStorage) call at line ~401 in the "liveAccount not found" branch of commitRefreshedAuth

Important Files Changed

Filename	Overview
lib/accounts.ts	core change: adds `commitRefreshedAuth` with transactional disk write, in-memory rollback on persist failure, and identity-based account matching; one P1 issue (disk write proceeds without in-memory update when `liveAccount` lookup fails post-snapshot) and one P2 (duplicate retryable error checker)
index.ts	main request loop and stream failover now use `commitRefreshedAuth`; retryable vs terminal refresh errors classified correctly; `sessionAffinityStore` forget moved before the retryable branch — looks correct
lib/proactive-refresh.ts	adds optional `onResult` callback to `refreshExpiringAccounts`, reorders `shouldRefreshProactively` checks so missing access-token triggers refresh even without expiry, extends `applyRefreshResult` with identity-field updates — all changes look correct
lib/refresh-guardian.ts	replaces inline `applyRefreshResult`+`clearAuthFailures` with `commitRefreshedAuth` via new `applyRefreshOutcome` helper; pre-filters cooling-down accounts before passing to `refreshExpiringAccounts`; `requiresSave` flag correctly distinguishes failure cooldowns (debounced save) from successful commits (already persisted)
lib/request/fetch-helpers.ts	adds `isRetryableRefreshFailure` and `isRetryableAuthSetterError`; wraps `authSetter.set` failure in `CodexAuthError`; `missing_refresh` correctly returns `false` (terminal) — logic is sound but `isRetryableAuthSetterError` duplicates `isRetryableAuthPersistenceError` in accounts.ts
test/accounts.test.ts	new `commitRefreshedAuth` suite covers transactional write, rollback on EBUSY, terminal EACCES, pool-state preservation, and debounced-save race; missing test for two concurrent commits on different accounts
test/refresh-guardian.test.ts	updated to use `commitRefreshedAuth` mock instead of `applyRefreshResult`; adds cooling-down skip test, null-commit cooldown test, and commit error classification; coverage looks solid
test/fetch-helpers.test.ts	adds tests for retryable network errors, terminal 401 and missing_refresh, EBUSY and EACCES auth persistence failures — comprehensive coverage of the new error classification
test/proactive-refresh.test.ts	adds `onResult` invocation test and identity-field update test for `applyRefreshResult`; also adds the access-missing-without-expiry path for `shouldRefreshProactively` — good coverage
test/accounts-edge.test.ts	plumbs `withAccountStorageTransaction` mock into edge-case setup; no functional change to test logic
test/accounts-load-from-disk.test.ts	migrates to `vi.hoisted` mocks and plumbs `withAccountStorageTransaction`; assertions updated to use named mocks — cleaner setup, no logic change

Sequence Diagram

sequenceDiagram
    participant RL as RequestLoop / Guardian
    participant CM as commitRefreshedAuth
    participant Lock as withStorageLock
    participant Mem as LivePool (in-memory)
    participant Disk as Storage (disk)

    RL->>CM: commitRefreshedAuth(source, auth)
    CM->>Lock: acquire lock
    Lock-->>CM: lock held
    CM->>CM: buildStorageSnapshot() → nextStorage
    CM->>CM: findAccountIndexByIdentity(nextStorage, source, auth)
    CM->>CM: update storedAccount in nextStorage (tokens, identity)
    CM->>Mem: getAccountByIdentity(source, auth) → liveAccount
    CM->>Mem: updateFromAuth(liveAccount, auth)
    CM->>Mem: clearAuthFailures / clearCooldown / enabled=true
    CM->>Disk: persist(nextStorage)
    alt persist succeeds
        Disk-->>CM: ok
        CM->>Lock: release lock
        CM-->>RL: return liveAccount
    else persist fails (EBUSY/EPERM)
        Disk-->>CM: error
        CM->>Mem: rollback liveAccount to previousState
        CM->>Lock: release lock
        CM-->>RL: throw CodexAuthError(retryable=true)
    else persist fails (EACCES/other)
        Disk-->>CM: error
        CM->>Mem: rollback liveAccount to previousState
        CM->>Lock: release lock
        CM-->>RL: throw CodexAuthError(retryable=false)
    end

Comments Outside Diff (3)

lib/refresh-guardian.ts, line 463-477 (link)

redundant double update on the same live account object

in the "success" branch of applyRefreshOutcome, the code:

calls applyRefreshResult(account, result.tokenResult) which writes access, refresh, expires, accountId, email to the live account
calls manager.clearAuthFailures(account) on the live account
then calls await manager.commitRefreshedAuth(sourceAccount, refreshedAuth) which calls this.updateFromAuth(liveAccount, auth) + this.clearAuthFailures(liveAccount) + this.clearAccountCooldown(liveAccount) on the same account object

steps 1-2 are fully superseded by step 3's internal calls. consider removing the pre-commit applyRefreshResult / clearAuthFailures lines and relying solely on commitRefreshedAuth for consistency.

Prompt To Fix With AI

This is a comment left during a code review.
Path: lib/refresh-guardian.ts
Line: 463-477

Comment:
**redundant double update on the same live account object**

in the `"success"` branch of `applyRefreshOutcome`, the code:

1. calls `applyRefreshResult(account, result.tokenResult)` which writes `access`, `refresh`, `expires`, `accountId`, `email` to the live account
2. calls `manager.clearAuthFailures(account)` on the live account
3. then calls `await manager.commitRefreshedAuth(sourceAccount, refreshedAuth)` which calls `this.updateFromAuth(liveAccount, auth)` + `this.clearAuthFailures(liveAccount)` + `this.clearAccountCooldown(liveAccount)` on the **same** account object

steps 1-2 are fully superseded by step 3's internal calls. consider removing the pre-commit `applyRefreshResult` / `clearAuthFailures` lines and relying solely on `commitRefreshedAuth` for consistency.

How can I resolve this? If you propose a fix, please make it concise.

test/accounts.test.ts, line 718-802 (link)

missing vitest coverage for commitRefreshedAuth storage write failure path

the test suite covers the happy path but has no case where withAccountStorageTransaction rejects. given the windows EBUSY concern and the p1 issue where a storage throw incorrectly routes through incrementAuthFailures, a test verifying the error propagation (and the expected type/retryable flag once the fix is applied) would close the gap. add a case like:

it("propagates storage write failure as retryable CodexAuthError", async () => {
  mockWithAccountStorageTransaction.mockRejectedValueOnce(
    Object.assign(new Error("EBUSY"), { code: "EBUSY" }),
  );
  await expect(manager.commitRefreshedAuth(account, refreshedAuth))
    .rejects.toMatchObject({ retryable: true });
});

Prompt To Fix With AI

This is a comment left during a code review.
Path: test/accounts.test.ts
Line: 718-802

Comment:
**missing vitest coverage for `commitRefreshedAuth` storage write failure path**

the test suite covers the happy path but has no case where `withAccountStorageTransaction` rejects. given the windows EBUSY concern and the p1 issue where a storage throw incorrectly routes through `incrementAuthFailures`, a test verifying the error propagation (and the expected type/retryable flag once the fix is applied) would close the gap. add a case like:

```ts
it("propagates storage write failure as retryable CodexAuthError", async () => {
  mockWithAccountStorageTransaction.mockRejectedValueOnce(
    Object.assign(new Error("EBUSY"), { code: "EBUSY" }),
  );
  await expect(manager.commitRefreshedAuth(account, refreshedAuth))
    .rejects.toMatchObject({ retryable: true });
});
```

How can I resolve this? If you propose a fix, please make it concise.

lib/accounts.ts, line 400-406 (link)

disk write with stale in-memory when liveAccount lookup fails

when storageIndex is found in the snapshot but getAccountByIdentity returns null (account removed from live pool between snapshot and lookup), persist(nextStorage) is still called — writing the new tokens to disk — but in-memory this.accounts is never updated. the live pool carries the old expiry/access token for the remainder of the process lifetime; the account will keep triggering proactive refresh on every guardian tick until restart.

the warning log covers observability, but the correct action here is to skip the write (or roll back), not silently persist tokens that can't be applied in-memory. at minimum, the persist on the "no live account" path should be gated the same way as the guarded path, or the caller should be notified the commit was a no-op.

Prompt To Fix With AI

This is a comment left during a code review.
Path: lib/accounts.ts
Line: 400-406

Comment:
**disk write with stale in-memory when liveAccount lookup fails**

when `storageIndex` is found in the snapshot but `getAccountByIdentity` returns `null` (account removed from live pool between snapshot and lookup), `persist(nextStorage)` is still called — writing the new tokens to disk — but in-memory `this.accounts` is never updated. the live pool carries the old expiry/access token for the remainder of the process lifetime; the account will keep triggering proactive refresh on every guardian tick until restart.

the warning log covers observability, but the correct action here is to skip the write (or roll back), not silently persist tokens that can't be applied in-memory. at minimum, the `persist` on the "no live account" path should be gated the same way as the guarded path, or the caller should be notified the commit was a no-op.

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: lib/accounts.ts
Line: 164-194

Comment:
**duplicate retryable error checker vs `isRetryableAuthSetterError` in `fetch-helpers.ts`**

`isRetryableAuthPersistenceError` (here) and `isRetryableAuthSetterError` (`lib/request/fetch-helpers.ts:452-478`) are functionally identical — both check `EAGAIN/EBUSY/EPERM`, `HTTP 429`, and recurse into `.cause`. having two copies risks the sets diverging silently (e.g. a new code added to one but not the other).

extract to a shared helper in `lib/errors.ts` or `lib/utils.ts` and import from both call sites — consistent with the project's existing pattern of centralising storage error logic there.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/accounts.ts
Line: 400-406

Comment:
**disk write with stale in-memory when liveAccount lookup fails**

when `storageIndex` is found in the snapshot but `getAccountByIdentity` returns `null` (account removed from live pool between snapshot and lookup), `persist(nextStorage)` is still called — writing the new tokens to disk — but in-memory `this.accounts` is never updated. the live pool carries the old expiry/access token for the remainder of the process lifetime; the account will keep triggering proactive refresh on every guardian tick until restart.

the warning log covers observability, but the correct action here is to skip the write (or roll back), not silently persist tokens that can't be applied in-memory. at minimum, the `persist` on the "no live account" path should be gated the same way as the guarded path, or the caller should be notified the commit was a no-op.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/accounts.test.ts
Line: 1059-1142

Comment:
**missing vitest coverage: two concurrent `commitRefreshedAuth` calls on different accounts**

the debounced-save race test (`"prevents debounced saves from writing stale auth during refresh persistence"`) covers `commitRefreshedAuth` vs `saveToDiskDebounced` interleaving under the lock. there's no test that launches `commitRefreshedAuth` for accounts A and B concurrently and asserts that:

1. B's snapshot captures A's already-committed token updates (because A runs first under the lock)
2. neither account's tokens overwrite the other's on disk

this is the core serialisation guarantee of `withStorageLock` — given the windows filesystem EBUSY risk and the fact that the guardian now fires all account refreshes in parallel via `Promise.all`, this scenario is realistic and the lock ordering should be vitest-covered.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (6): Last reviewed commit: "test: cover trimmed refreshed account id..." | Re-trigger Greptile}

chatgpt-codex-connector · 2026-03-31T21:52:34Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

coderabbitai · 2026-03-31T21:52:40Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

walkthrough

the pr separates retryable vs terminal oauth refresh failures, adds identity-based account lookup and transactional commit of refreshed auth, and propagates refresh outcomes through proactive refresh and refresh guardian paths.

changes

Cohort / File(s)	Summary
oauth refresh flow `lib/request/fetch-helpers.ts:line`, `index.ts:line`	classifies refresh failures into retryable vs non-retryable and throws `CodexAuthError(retryable)`; refresh success paths now call `AccountManager.commitRefreshedAuth(...)` instead of ad-hoc updates; retryable refresh failures trigger immediate session-affinity forget and alternate guardian handling.
account storage & identity resolution `lib/accounts.ts:line`	introduces `AccountIdentityCandidate`, `getAccountByIdentity(...)`, `commitRefreshedAuth(...)`, and `buildStorageSnapshot()`; uses `withAccountStorageTransaction(...)` for transactional persistence and maps refresh token/account/email into persisted storage while clearing cooldown fields.
proactive refresh & guardian orchestration `lib/proactive-refresh.ts:line`, `lib/refresh-guardian.ts:line`	`refreshExpiringAccounts` accepts `onResult` and applies token-derived identity/email updates; `RefreshGuardian.tick()` filters cooling-down accounts, delegates outcomes to `applyRefreshOutcome(...)` which calls `commitRefreshedAuth(...)`, and only persists when a change requires saving.
tests updated `test/accounts.test.ts:line`, `test/fetch-helpers.test.ts:line`, `test/proactive-refresh.test.ts:line`, `test/refresh-guardian.test.ts:line`, `test/accounts-edge.test.ts:line`, `test/accounts-load-from-disk.test.ts:line`	tests added/updated to cover transactional commit behavior, `CodexAuthError.retryable` semantics, `onResult` invocation, identity extraction from tokens, cooling-down exclusion, debounce/concurrency scenarios, and transactional error mapping.

sequence diagram(s)

sequenceDiagram
    actor client
    participant fetch as "lib/request/fetch-helpers.ts"
    participant oauth as "oauth service"
    participant guardian as "lib/refresh-guardian.ts"
    participant manager as "lib/accounts.ts"
    participant storage as "storage"

    client->>fetch: refreshAndUpdateToken(...)
    fetch->>oauth: queuedRefresh()
    oauth-->>fetch: result (success | failed {reason,statusCode})
    alt success
        fetch-->>client: refreshed auth
        client->>manager: commitRefreshedAuth(account, auth)
        manager->>storage: withAccountStorageTransaction(handler)
        storage-->>manager: persist(snapshot)
        manager-->>client: updated account
    else failed (retryable)
        fetch-->>client: throw CodexAuthError(retryable: true)
        client->>guardian: treat as network failure
        guardian->>manager: forget session affinity immediately
        guardian->>guardian: do not increment auth-failure counters
    else failed (terminal)
        fetch-->>client: throw CodexAuthError(retryable: false)
        client->>guardian: treat as auth failure
        guardian->>manager: incrementAuthFailures()
        guardian->>manager: forget session affinity
    end

estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

suggested labels

bug

notes and risks:

missing regression tests: ensure explicit regression tests for session-affinity behavior when retryable refreshes occur (index.ts:line, lib/refresh-guardian.ts:line).
windows edge cases: transactional file semantics and error codes mapping (EBUSY/EACCES) may behave differently on windows; add platform-specific assertions (lib/accounts.ts:line, test/accounts.test.ts:line).
concurrency risks: concurrent calls to commitRefreshedAuth and debounced saveToDiskDebounced introduce race conditions; verify transactional persist/lock interactions and ensure tests cover simultaneous refreshes for same account (lib/accounts.ts:line, test/accounts.test.ts:line).

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: fixing token refresh persistence during fast-expiry scenarios using transactional writes.
Description check	✅ Passed	pull request description is comprehensive and well-structured, covering summary, what changed, validation checklist, and risk/rollback plan with specific commit reference.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/fast-expiry-refresh-persistence

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch fix/fast-expiry-refresh-persistence

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

index.ts (2)
1001-1016: ⚠️ Potential issue | 🟠 Major

don't count auth-store write failures as auth failures.

commitRefreshedAuth() persists through withAccountStorageTransaction() in lib/accounts.ts:795-847, but this branch only treats CodexAuthError.retryable as retryable. if the transactional write throws under concurrent refreshes/live-sync writes, or on windows file-lock / eperm / ebusy rename races, we still increment auth failures here and can eventually remove a healthy account. please route storage persistence errors through the retryable path and add a regression for a throwing commit call.
suggested fix
-											const isRetryableRefreshError =
-												err instanceof CodexAuthError && err.retryable;
+											const isRetryableRefreshError =
+												(err instanceof CodexAuthError && err.retryable) ||
+												err instanceof StorageError;
if withAccountStorageTransaction() can throw raw fs errors too, normalize them in lib/accounts.ts:795-847 before they reach this branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@index.ts` around lines 1001 - 1016, The catch block after calling
commitRefreshedAuth should not count storage persistence errors as final auth
failures; update the retryable detection so that errors thrown by
commitRefreshedAuth/withAccountStorageTransaction (e.g., raw fs errors like
EPERM/EBUSY or normalized storage errors) are treated as retryable in addition
to CodexAuthError.retryable—adjust the condition around isRetryableRefreshError
and the logic that increments
runtimeMetrics.authRefreshFailures/failedRequests/accountRotations/lastError
accordingly so persistence errors route to the retry path; also add a regression
test that makes commitRefreshedAuth throw (simulate concurrent write/fs errors)
to assert the code retries instead of removing the account; consider normalizing
storage errors inside withAccountStorageTransaction (lib/accounts.ts functions)
so they can be reliably recognized here.
1976-1995: ⚠️ Potential issue | 🟠 Major

don't let a persistence blip disqualify a valid stream failover account.

by the time commitRefreshedAuth() runs, refreshAndUpdateToken() has already produced usable fallbackAuth. a transient store failure from lib/accounts.ts:795-847 now makes this catch skip the account entirely, so concurrent storage contention or windows file-locking can turn a recoverable stream stall into a failed failover. please keep the refreshed token usable for the in-flight retry, and add a regression that forces commitRefreshedAuth() to throw during stream failover.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@index.ts` around lines 1976 - 1995, The current failover loop discards a
usable refreshed token if accountManager.commitRefreshedAuth(fallbackAccount,
fallbackAuth) throws, causing a transient persistence error to skip the account;
change the flow so refreshAndUpdateToken(...) still returns and is used for the
in-flight retry even if commitRefreshedAuth throws: call commitRefreshedAuth in
its own try/catch, log the persistence error but do NOT continue to the next
account on commit failure, and proceed to use fallbackAuth for the subsequent
retry logic; also add a regression test that forces
accountManager.commitRefreshedAuth to throw during stream failover to ensure the
refreshed fallbackAuth is still accepted and retried.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/accounts.ts`:
- Around line 815-818: Replace raw email logging in the auth-rotation failure
warn paths: remove source.email from the log.warn calls (the occurrences that
reference source.email alongside source.index) and either log only sourceIndex
(source.index) or log a redacted/hash of the email (e.g., a stable hash or
masked string) instead; locate the calls to log.warn in lib/accounts.ts that
include source.email (the warn invocations around the auth refresh/persistence
failure) and update them to avoid emitting raw PII while preserving sourceIndex
for correlation.
- Around line 795-864: commitRefreshedAuth currently persists refreshed auth
inside withAccountStorageTransaction (persist) but updates the in-memory account
afterwards, allowing a concurrent saveToDiskDebounced to snapshot stale data and
overwrite the persisted refresh; fix by making the in-memory update atomic with
the disk write: perform the live-account mutation (use getAccountByIdentity +
updateFromAuth + clearAccountCooldown + clearAuthFailures and set enabled)
inside the same withAccountStorageTransaction callback before calling persist
(or extend withAccountStorageTransaction/persist to accept and run a
post-persist-in-memory update atomically), so no interleaving save can observe
the pre-refresh state. Also add a vitest that blocks persist() (e.g., stub
persist to await a promise) and triggers saveToDiskDebounced() to assert no
stale write occurs, and ensure the persistence/queue code (saveToDiskDebounced
and any retry queues) properly handles EBUSY/429 retry semantics per lib/**
guidelines.

In `@lib/refresh-guardian.ts`:
- Around line 119-133: The code mutates in-memory account state
(applyRefreshResult and manager.clearAuthFailures) before persisting via
manager.commitRefreshedAuth, risking memory/disk divergence if
commitRefreshedAuth throws (EBUSY). Change the flow in refresh-guardian (around
applyRefreshResult/getAccountByIdentity/commitRefreshedAuth) so that you do not
call applyRefreshResult or clearAuthFailures until after commitRefreshedAuth
resolves; instead build the refreshedAuth object, call await
manager.commitRefreshedAuth(sourceAccount, refreshedAuth), then locate the live
account with manager.getAccountByIdentity and call applyRefreshResult(account,
result.tokenResult) and manager.clearAuthFailures(account) only on successful
commit; treat commit failures as retryable and ensure stats.refreshed is
incremented only after successful commit; update tests
test/refresh-guardian.test.ts and test/proactive-refresh.test.ts to simulate a
rejecting commitRefreshedAuth (EBUSY) and assert no in-memory mutation and
correct retry classification so 429/network items continue processing.

In `@lib/request/fetch-helpers.ts`:
- Around line 431-449: The classifier is marking "missing_refresh" as retryable
causing possible infinite retry loops; update isRetryableRefreshFailure (in
lib/request/fetch-helpers.ts) to treat "missing_refresh" as terminal (i.e.,
return false) instead of retryable, and ensure behavior aligns with
lib/refresh-guardian.ts's handling of auth-failure; add/adjust vitest in
test/fetch-helpers.test.ts to assert that a TokenResult with reason
"missing_refresh" is not retryable and cite that change in the test message.

---

Outside diff comments:
In `@index.ts`:
- Around line 1001-1016: The catch block after calling commitRefreshedAuth
should not count storage persistence errors as final auth failures; update the
retryable detection so that errors thrown by
commitRefreshedAuth/withAccountStorageTransaction (e.g., raw fs errors like
EPERM/EBUSY or normalized storage errors) are treated as retryable in addition
to CodexAuthError.retryable—adjust the condition around isRetryableRefreshError
and the logic that increments
runtimeMetrics.authRefreshFailures/failedRequests/accountRotations/lastError
accordingly so persistence errors route to the retry path; also add a regression
test that makes commitRefreshedAuth throw (simulate concurrent write/fs errors)
to assert the code retries instead of removing the account; consider normalizing
storage errors inside withAccountStorageTransaction (lib/accounts.ts functions)
so they can be reliably recognized here.
- Around line 1976-1995: The current failover loop discards a usable refreshed
token if accountManager.commitRefreshedAuth(fallbackAccount, fallbackAuth)
throws, causing a transient persistence error to skip the account; change the
flow so refreshAndUpdateToken(...) still returns and is used for the in-flight
retry even if commitRefreshedAuth throws: call commitRefreshedAuth in its own
try/catch, log the persistence error but do NOT continue to the next account on
commit failure, and proceed to use fallbackAuth for the subsequent retry logic;
also add a regression test that forces accountManager.commitRefreshedAuth to
throw during stream failover to ensure the refreshed fallbackAuth is still
accepted and retried.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8f022587-4220-411b-981e-fde3698e8bbc

📥 Commits

Reviewing files that changed from the base of the PR and between 53488ef and 9cbc72a.

📒 Files selected for processing (9)

index.ts
lib/accounts.ts
lib/proactive-refresh.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/proactive-refresh.test.ts
test/refresh-guardian.test.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (2)

test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

test/proactive-refresh.test.ts
test/refresh-guardian.test.ts
test/fetch-helpers.test.ts
test/accounts.test.ts

lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

lib/proactive-refresh.ts
lib/request/fetch-helpers.ts
lib/refresh-guardian.ts
lib/accounts.ts

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/accounts.ts`:
- Around line 807-848: The transactional refresh currently clones the persisted
"current" snapshot and writes it back, which overwrites in-memory live mutation
(e.g., activeIndexByFamily, lastUsed, rate-limit/cooldown, workspace changes);
update the withAccountStorageTransaction callback in lib/accounts.ts so it
starts from this.buildStorageSnapshot() (or merges fields from current into that
live snapshot) before patching the refreshed row (findAccountIndexByIdentity,
storedAccount.* updates) and calling persist, ensuring only auth fields change;
add a vitest near test/accounts.test.ts:1203 that mutates activeIndexByFamily
and a rate-limit/cooldown field in-memory, calls the refresh helper
(commitRefreshedAuth/withAccountStorageTransaction flow), and asserts those
unsaved live changes survive the commit; run/update tests to verify and ensure
any new persistence/queue logic handles EBUSY and 429 retry semantics as per
guidelines.

In `@lib/refresh-guardian.ts`:
- Around line 126-142: commitRefreshedAuth can return ManagedAccount|null but
refresh-guardian.ts treats any non-exception path as success; change the
post-commit logic in refresh-guardian.ts so that after calling
manager.commitRefreshedAuth(sourceAccount, refreshedAuth) you check the returned
value: if null, treat it like a failure — call manager.getAccountByIdentity(...)
to locate the account, call manager.markAccountCoolingDown(account,
this.bufferMs, "network-error") when present, increment stats.failed and
stats.networkFailed, and return true (or the same boolean semantics used for
failures); only increment stats.refreshed and return success when
commitRefreshedAuth returned a non-null ManagedAccount. Also add a vitest in
test/refresh-guardian.test.ts (around the existing null-return branch near line
~638) that asserts the null-return case triggers cooling-down and increments the
failed counters, in addition to keeping the existing thrown-EBUSY test (~714);
ensure test coverage mentions EBUSY/429 queue behavior per guidelines.

In `@lib/request/fetch-helpers.ts`:
- Around line 463-465: The check for a missing or malformed authSetter
(authSetter and authSetter.set) currently throws
CodexAuthError(ERROR_MESSAGES.TOKEN_REFRESH_FAILED, { retryable: true }) but
this is a terminal configuration failure; change the thrown error's options to {
retryable: false } so the runtime surfaces the broken setup instead of retrying,
and update the vitest expectations in test/fetch-helpers.test.ts (around the
assertions at lines referenced near 80 and 93) to expect a
non-retryable/terminal CodexAuthError; ensure you run tests and confirm related
queue/retry logic still properly handles EBUSY and 429 transient cases
elsewhere.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 58b8e919-5593-4ae7-86e4-d7df85c5a3b4

📥 Commits

Reviewing files that changed from the base of the PR and between 9cbc72a and fe363b1.

📒 Files selected for processing (6)

lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (2)

lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

lib/request/fetch-helpers.ts
lib/refresh-guardian.ts
lib/accounts.ts

test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

test/fetch-helpers.test.ts
test/refresh-guardian.test.ts
test/accounts.test.ts

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/accounts.ts`:
- Around line 864-868: The catch in lib/accounts.ts that currently throws new
CodexAuthError(ERROR_MESSAGES.TOKEN_REFRESH_FAILED, { retryable: true, cause:
error }) must be changed to only mark retryable for
transient/storage-concurrency errors (e.g., Windows EBUSY/EACCES, HTTP 429 or
other transient codes) by inspecting error.code / error.status before setting
retryable; for non-transient errors throw a non-retryable CodexAuthError so they
surface as terminal auth-store failures. Update the token refresh error handling
in the same catch (the CodexAuthError construction) to branch on transient
conditions, add a vitest in test/accounts.test.ts that simulates a non-transient
write error and asserts it is not retryable, and ensure any queues that
special-case EBUSY/429 still handle those transient cases.

In `@lib/refresh-guardian.ts`:
- Around line 207-248: The second "post-pass" over refreshResults is unreachable
because refreshExpiringAccounts already awaits the onResult callback for every
refreshed account; remove the dead code that builds snapshotByIndex and the
for-loop iterating refreshResults.entries() (the block that references
snapshotByIndex, processed, and calls applyRefreshOutcome a second time).
Instead rely on the existing callback path passed into refreshExpiringAccounts
(the async onResult which calls applyRefreshOutcome), keep the initial
requiresSave/stat updates and return logic, and delete the snapshotByIndex/map
construction and the duplicate applyRefreshOutcome calls to avoid the
redundant/unreachable path.

In `@test/refresh-guardian.test.ts`:
- Around line 23-50: The test helper findAccountByIdentity currently matches
accounts by index, then refreshToken, then accountId/email which diverges from
the identity-resolution used in lib/accounts.ts; update findAccountByIdentity to
mirror the same identity matching logic used in lib/accounts.ts (use the same
field priority and comparison rules from the resolution routine in
lib/accounts.ts:144-160) so the test selects the exact live account used by
production code—locate the findAccountByIdentity function in the test and change
its matching order/conditions to match the identity resolution in
lib/accounts.ts to ensure the reindex refresh-race scenario is reproduced
deterministically.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33737776-7244-4de5-899c-8af0acd026b3

📥 Commits

Reviewing files that changed from the base of the PR and between fe363b1 and 52c7c8b.

📒 Files selected for processing (6)

lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (2)

lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

lib/request/fetch-helpers.ts
lib/refresh-guardian.ts
lib/accounts.ts

test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

test/fetch-helpers.test.ts
test/accounts.test.ts
test/refresh-guardian.test.ts

🔇 Additional comments (3)

lib/request/fetch-helpers.ts (1)

431-479: the refresh-failure split matches the runtime policy.

lib/request/fetch-helpers.ts:431-479 now makes missing_refresh terminal while keeping network and 429-style failures retryable. test/fetch-helpers.test.ts:106-153 exercises that split, and it lines up with the retryable vs auth-failure branches in index.ts:1015-1082.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

test/accounts.test.ts (1)

1077-1368: good regression coverage for the refresh persistence race.

test/accounts.test.ts:1077-1368 covers the live-state merge, the windows-style ebusy failure, and the blocked-persist/debounced-save interleave from lib/accounts.ts. that is the right vitest net for this concurrency path.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

lib/accounts.ts (1)

762-863: building the transaction from a live snapshot is the right fix.

lib/accounts.ts:762-863 now persists refreshed auth from this.buildStorageSnapshot() instead of cloning the last saved row, and test/accounts.test.ts:1162-1368 covers both unsaved live-state preservation and the stale-save interleave.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

ndycode · 2026-04-01T10:53:36Z

@coderabbitai review

coderabbitai · 2026-04-01T10:53:41Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/accounts.ts`:
- Around line 835-845: The code trims nextAccountId only when writing
nextStorage but leaves the in-memory auth claim unchanged, causing mismatch;
update the auth normalization path so the same normalized ID is used everywhere:
normalize/trim the result of extractAccountId(auth.access) early (e.g., replace
the current nextAccountId assignment) and ensure updateFromAuth (and any call
sites inside withAccountStorageTransaction, buildStorageSnapshot,
findAccountIndexByIdentity) uses that normalized value (or pass the normalized
value into updateFromAuth/commitRefreshedAuth) so memory and disk stay
identical; add a vitest named commitRefreshedAuth near
test/accounts.test.ts:1077-1230 (and reference test/accounts.test.ts:1702-1736)
exercising a token with padded ID, and when modifying persistence flows verify
Windows IO and concurrency helpers/queues handle EBUSY/429 retry/backoff per
lib/** guidelines.

In `@lib/refresh-guardian.ts`:
- Around line 146-157: The catch block around commitRefreshedAuth in
refresh-guardian.ts incorrectly treats all persist errors as transient network
failures; update the handler to inspect the thrown error (specifically check if
it's a CodexAuthError and use its retryable boolean) and only route retryable
errors to manager.markAccountCoolingDown(...) and increment stats.networkFailed,
while non-retryable storage/persistence errors should increment stats.authFailed
(or a new terminal failure counter) and avoid placing the account back into the
cooling path; keep the existing getAccountByIdentity call and return semantics
but branch on error.retryable to decide cooling vs terminal handling. Also add a
vitest alongside test/refresh-guardian.test.ts (covering lines ~802-874) that
simulates a non-retryable persistence error (e.g., EACCES) from
commitRefreshedAuth to assert the account is not cooled, stats reflect an
auth/storage failure, and no retry loop is scheduled. Ensure references to
commitRefreshedAuth, manager.markAccountCoolingDown, CodexAuthError.retryable,
and the affected stats properties are updated accordingly.

In `@test/refresh-guardian.test.ts`:
- Around line 2-8: The test imports the OAuthAuthDetails type from the wrong
module; replace the import of OAuthAuthDetails that currently targets
lib/accounts.js with an import from lib/types.js so the test references the type
as exported by types (update the import statement that mentions OAuthAuthDetails
accordingly and keep the other imports unchanged).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 31188072-b8f0-41fb-95db-abe0872fbfbf

📥 Commits

Reviewing files that changed from the base of the PR and between 52c7c8b and fe99c82.

📒 Files selected for processing (6)

lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (2)

lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

lib/request/fetch-helpers.ts
lib/refresh-guardian.ts
lib/accounts.ts

test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

test/fetch-helpers.test.ts
test/accounts.test.ts
test/refresh-guardian.test.ts

fix: harden fast-expiry token refresh persistence

9cbc72a

greptile-apps Bot reviewed Mar 31, 2026

View reviewed changes

Comment thread lib/request/fetch-helpers.ts Outdated

Comment thread lib/accounts.ts Outdated

coderabbitai Bot requested changes Mar 31, 2026

View reviewed changes

Comment thread lib/accounts.ts

Comment thread lib/accounts.ts Outdated

Comment thread lib/refresh-guardian.ts

Comment thread lib/request/fetch-helpers.ts

fix: harden refresh persistence follow-ups

fe363b1

coderabbitai Bot requested changes Apr 1, 2026

View reviewed changes

Comment thread lib/accounts.ts Outdated

Comment thread lib/refresh-guardian.ts

Comment thread lib/request/fetch-helpers.ts

fix: preserve live refresh state and terminal auth setter failures

52c7c8b

coderabbitai Bot requested changes Apr 1, 2026

View reviewed changes

Comment thread lib/accounts.ts

Comment thread lib/refresh-guardian.ts Outdated

Comment thread lib/request/fetch-helpers.ts

Comment thread test/refresh-guardian.test.ts

fix refresh persistence review followups

fe99c82

test-align-transactional-account-persistence-mocks

412b379

coderabbitai Bot requested changes Apr 1, 2026

View reviewed changes

Comment thread lib/accounts.ts

Comment thread lib/refresh-guardian.ts

Comment thread test/refresh-guardian.test.ts Outdated

greptile-apps Bot reviewed Apr 1, 2026

View reviewed changes

Comment thread lib/accounts.ts

ndycode added 2 commits April 1, 2026 19:40

fix-refresh-guardian-commit-failures

115979c

test: cover trimmed refreshed account ids

77922ec

coderabbitai Bot approved these changes Apr 1, 2026

View reviewed changes

ndycode merged commit 7c06d45 into main Apr 1, 2026
2 checks passed

ndycode deleted the fix/fast-expiry-refresh-persistence branch April 12, 2026 06:00

Conversation

ndycode commented Mar 31, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What Changed

Validation

Docs and Governance Checklist

Risk and Rollback

Additional Notes

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (3)

Uh oh!

chatgpt-codex-connector Bot commented Mar 31, 2026

Uh oh!

coderabbitai Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

walkthrough

changes

sequence diagram(s)

estimated code review effort

suggested labels

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ndycode commented Apr 1, 2026

Uh oh!

coderabbitai Bot commented Apr 1, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ndycode commented Mar 31, 2026 •

edited by greptile-apps Bot

Loading

note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading