fix fast-expiry token refresh persistence in account pool #323
📝 Walkthrough
the pr separates retryable vs terminal oauth refresh failures, adds identity-based account lookup and transactional commit of refreshed auth, and propagates refresh outcomes through proactive refresh and refresh guardian paths.
sequence diagram(s)
sequenceDiagram
actor client
participant fetch as "lib/request/fetch-helpers.ts"
participant oauth as "oauth service"
participant guardian as "lib/refresh-guardian.ts"
participant manager as "lib/accounts.ts"
participant storage as "storage"
client->>fetch: refreshAndUpdateToken(...)
fetch->>oauth: queuedRefresh()
oauth-->>fetch: result (success | failed {reason,statusCode})
alt success
fetch-->>client: refreshed auth
client->>manager: commitRefreshedAuth(account, auth)
manager->>storage: withAccountStorageTransaction(handler)
storage-->>manager: persist(snapshot)
manager-->>client: updated account
else failed (retryable)
fetch-->>client: throw CodexAuthError(retryable: true)
client->>guardian: treat as network failure
guardian->>manager: forget session affinity immediately
guardian->>guardian: do not increment auth-failure counters
else failed (terminal)
fetch-->>client: throw CodexAuthError(retryable: false)
client->>guardian: treat as auth failure
guardian->>manager: incrementAuthFailures()
guardian->>manager: forget session affinity
end
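The retryable and terminal branches of the diagram above can be sketched as a small routing function. This is a minimal illustration, not the PR's code: `routeRefreshError`, `FailureStats`, and the counter names are hypothetical, and the `CodexAuthError` shape is assumed from the diagram. Treating non-`CodexAuthError` values as terminal is also an assumption.

```typescript
// Hypothetical stand-in for the project's CodexAuthError; the real class may differ.
class CodexAuthError extends Error {
  readonly retryable: boolean;
  constructor(message: string, options: { retryable: boolean }) {
    super(message);
    // Keeps `instanceof` working when the code is compiled down to ES5.
    Object.setPrototypeOf(this, CodexAuthError.prototype);
    this.name = "CodexAuthError";
    this.retryable = options.retryable;
  }
}

// Hypothetical counters standing in for guardian/manager state.
interface FailureStats {
  authFailures: number;
  networkFailures: number;
  affinityForgotten: number;
}

// Retryable errors are handled as network failures and leave the
// auth-failure counter alone; everything else counts as an auth failure.
function routeRefreshError(err: unknown, stats: FailureStats): "retry" | "auth-failure" {
  stats.affinityForgotten += 1; // both branches forget session affinity
  if (err instanceof CodexAuthError && err.retryable) {
    stats.networkFailures += 1;
    return "retry";
  }
  stats.authFailures += 1; // assumption: unknown errors are treated as terminal
  return "auth-failure";
}
```

The point of the split is that a transient failure never moves an account closer to removal; only terminal failures touch the auth-failure counter.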
estimated code review effort: 4 (Complex) | ~45 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
index.ts (2)
1001-1016: ⚠️ Potential issue | 🟠 Major
don't count auth-store write failures as auth failures.
commitRefreshedAuth() persists through withAccountStorageTransaction() in lib/accounts.ts:795-847, but this branch only treats CodexAuthError.retryable as retryable. if the transactional write throws under concurrent refreshes/live-sync writes, or on windows file-lock / eperm / ebusy rename races, we still increment auth failures here and can eventually remove a healthy account. please route storage persistence errors through the retryable path and add a regression for a throwing commit call.
suggested fix:
- const isRetryableRefreshError =
-   err instanceof CodexAuthError && err.retryable;
+ const isRetryableRefreshError =
+   (err instanceof CodexAuthError && err.retryable) ||
+   err instanceof StorageError;
if withAccountStorageTransaction() can throw raw fs errors too, normalize them in lib/accounts.ts:795-847 before they reach this branch.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 1001 - 1016, The catch block after calling commitRefreshedAuth should not count storage persistence errors as final auth failures; update the retryable detection so that errors thrown by commitRefreshedAuth/withAccountStorageTransaction (e.g., raw fs errors like EPERM/EBUSY or normalized storage errors) are treated as retryable in addition to CodexAuthError.retryable; adjust the condition around isRetryableRefreshError and the logic that increments runtimeMetrics.authRefreshFailures/failedRequests/accountRotations/lastError accordingly so persistence errors route to the retry path; also add a regression test that makes commitRefreshedAuth throw (simulate concurrent write/fs errors) to assert the code retries instead of removing the account; consider normalizing storage errors inside withAccountStorageTransaction (lib/accounts.ts functions) so they can be reliably recognized here.
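One way to read the suggestion above is a catch-site predicate that accepts both `CodexAuthError.retryable` and raw filesystem errors. The sketch below is an assumption-laden illustration: `TRANSIENT_FS_CODES`, `hasErrnoCode`, and the exact set of codes are hypothetical, and the `CodexAuthError` class is a stand-in for the project's own.

```typescript
// Hypothetical stand-in for the project's CodexAuthError.
class CodexAuthError extends Error {
  readonly retryable: boolean;
  constructor(message: string, options: { retryable: boolean }) {
    super(message);
    Object.setPrototypeOf(this, CodexAuthError.prototype); // ES5-safe instanceof
    this.retryable = options.retryable;
  }
}

// Assumption: these errno codes are treated as transient storage contention.
const TRANSIENT_FS_CODES = new Set<string>(["EBUSY", "EPERM"]);

// Type guard for Node-style system errors that carry a string `code`.
function hasErrnoCode(err: unknown): err is { code: string } {
  return (
    typeof err === "object" &&
    err !== null &&
    typeof (err as { code?: unknown }).code === "string"
  );
}

// True when the refresh can be retried without counting an auth failure.
function isRetryableRefreshError(err: unknown): boolean {
  if (err instanceof CodexAuthError) {
    return err.retryable;
  }
  return hasErrnoCode(err) && TRANSIENT_FS_CODES.has(err.code);
}
```

Normalizing storage errors inside the transaction helper, as the comment proposes, would let the call site drop the errno check and rely on a single error type.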
1976-1995: ⚠️ Potential issue | 🟠 Major
don't let a persistence blip disqualify a valid stream failover account.
by the time commitRefreshedAuth() runs, refreshAndUpdateToken() has already produced usable fallbackAuth. a transient store failure from lib/accounts.ts:795-847 now makes this catch skip the account entirely, so concurrent storage contention or windows file-locking can turn a recoverable stream stall into a failed failover. please keep the refreshed token usable for the in-flight retry, and add a regression that forces commitRefreshedAuth() to throw during stream failover.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@index.ts` around lines 1976 - 1995, The current failover loop discards a usable refreshed token if accountManager.commitRefreshedAuth(fallbackAccount, fallbackAuth) throws, causing a transient persistence error to skip the account; change the flow so refreshAndUpdateToken(...) still returns and is used for the in-flight retry even if commitRefreshedAuth throws: call commitRefreshedAuth in its own try/catch, log the persistence error but do NOT continue to the next account on commit failure, and proceed to use fallbackAuth for the subsequent retry logic; also add a regression test that forces accountManager.commitRefreshedAuth to throw during stream failover to ensure the refreshed fallbackAuth is still accepted and retried.
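The "commit in its own try/catch" flow described above might look roughly like the following. `useRefreshedAuthForFailover`, `Auth`, and the injected `commit` and `logWarn` callbacks are hypothetical names chosen for the sketch; the real failover loop in `index.ts` is not shown in this thread.

```typescript
// Hypothetical minimal auth shape for the sketch.
interface Auth {
  access: string;
}

// Persisting the refreshed auth is best-effort here: a commit failure is
// logged, and the already-refreshed auth is still returned for the retry.
async function useRefreshedAuthForFailover(
  fallbackAuth: Auth,
  commit: (auth: Auth) => Promise<void>,
  logWarn: (message: string) => void,
): Promise<Auth> {
  try {
    await commit(fallbackAuth);
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    // The token itself is valid; only the write to storage failed.
    logWarn(`failed to persist refreshed auth: ${detail}`);
  }
  return fallbackAuth;
}
```

A regression test in this shape would pass a `commit` that rejects and assert that the returned auth is the refreshed one.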
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/accounts.ts`:
- Around line 815-818: Replace raw email logging in the auth-rotation failure
warn paths: remove source.email from the log.warn calls (the occurrences that
reference source.email alongside source.index) and either log only sourceIndex
(source.index) or log a redacted/hash of the email (e.g., a stable hash or
masked string) instead; locate the calls to log.warn in lib/accounts.ts that
include source.email (the warn invocations around the auth refresh/persistence
failure) and update them to avoid emitting raw PII while preserving sourceIndex
for correlation.
- Around line 795-864: commitRefreshedAuth currently persists refreshed auth
inside withAccountStorageTransaction (persist) but updates the in-memory account
afterwards, allowing a concurrent saveToDiskDebounced to snapshot stale data and
overwrite the persisted refresh; fix by making the in-memory update atomic with
the disk write: perform the live-account mutation (use getAccountByIdentity +
updateFromAuth + clearAccountCooldown + clearAuthFailures and set enabled)
inside the same withAccountStorageTransaction callback before calling persist
(or extend withAccountStorageTransaction/persist to accept and run a
post-persist-in-memory update atomically), so no interleaving save can observe
the pre-refresh state. Also add a vitest that blocks persist() (e.g., stub
persist to await a promise) and triggers saveToDiskDebounced() to assert no
stale write occurs, and ensure the persistence/queue code (saveToDiskDebounced
and any retry queues) properly handles EBUSY/429 retry semantics per lib/**
guidelines.
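For the email-logging item in the list above, a masked string is one of the two options the comment names. The helper below is a hypothetical sketch (`maskEmail` is not a function from the repository), and keeping the domain visible is an assumption that may or may not be acceptable for this project.

```typescript
// Mask the local part of an email so logs keep a correlation hint
// without emitting the raw address.
function maskEmail(email: string | undefined): string {
  if (!email) {
    return "<none>";
  }
  const at = email.lastIndexOf("@");
  if (at <= 0) {
    // No usable local part; do not echo the value at all.
    return "<redacted>";
  }
  return `${email.charAt(0)}***${email.slice(at)}`;
}
```

A call such as `maskEmail("alice@example.com")` yields `a***@example.com`; logging only `source.index` remains the stricter alternative.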
In `@lib/refresh-guardian.ts`:
- Around line 119-133: The code mutates in-memory account state
(applyRefreshResult and manager.clearAuthFailures) before persisting via
manager.commitRefreshedAuth, risking memory/disk divergence if
commitRefreshedAuth throws (EBUSY). Change the flow in refresh-guardian (around
applyRefreshResult/getAccountByIdentity/commitRefreshedAuth) so that you do not
call applyRefreshResult or clearAuthFailures until after commitRefreshedAuth
resolves; instead build the refreshedAuth object, call await
manager.commitRefreshedAuth(sourceAccount, refreshedAuth), then locate the live
account with manager.getAccountByIdentity and call applyRefreshResult(account,
result.tokenResult) and manager.clearAuthFailures(account) only on successful
commit; treat commit failures as retryable and ensure stats.refreshed is
incremented only after successful commit; update tests
test/refresh-guardian.test.ts and test/proactive-refresh.test.ts to simulate a
rejecting commitRefreshedAuth (EBUSY) and assert no in-memory mutation and
correct retry classification so 429/network items continue processing.
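The commit-then-mutate ordering requested above can be sketched as follows. `applySuccess`, `Account`, and `Stats` are hypothetical simplifications of the guardian's state, and the injected `commit` callback stands in for `manager.commitRefreshedAuth`.

```typescript
// Hypothetical slices of guardian state for the sketch.
interface Account {
  access: string;
  authFailures: number;
}

interface Stats {
  refreshed: number;
  networkFailed: number;
}

// Persist first; touch in-memory state only after the commit resolves.
async function applySuccess(
  account: Account,
  newAccess: string,
  commit: (access: string) => Promise<void>,
  stats: Stats,
): Promise<boolean> {
  try {
    await commit(newAccess);
  } catch {
    // Commit failed: classified as retryable, in-memory account untouched.
    stats.networkFailed += 1;
    return false;
  }
  account.access = newAccess;
  account.authFailures = 0;
  stats.refreshed += 1;
  return true;
}
```

With this ordering a rejected commit cannot leave memory ahead of disk, which is the divergence the comment is concerned about.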
In `@lib/request/fetch-helpers.ts`:
- Around line 431-449: The classifier is marking "missing_refresh" as retryable
causing possible infinite retry loops; update isRetryableRefreshFailure (in
lib/request/fetch-helpers.ts) to treat "missing_refresh" as terminal (i.e.,
return false) instead of retryable, and ensure behavior aligns with
lib/refresh-guardian.ts's handling of auth-failure; add/adjust vitest in
test/fetch-helpers.test.ts to assert that a TokenResult with reason
"missing_refresh" is not retryable and cite that change in the test message.
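A classifier with `missing_refresh` as terminal could be sketched like this. The reason names are taken from this thread, but the `RefreshFailure` type is hypothetical, and treating every HTTP status other than 400/401/403 as retryable is an assumption made only for the example.

```typescript
// Failure reasons mentioned in this review thread; the real union may differ.
type RefreshFailureReason =
  | "network_error"
  | "unknown"
  | "invalid_response"
  | "http_error"
  | "missing_refresh";

interface RefreshFailure {
  reason: RefreshFailureReason;
  statusCode?: number;
}

function isRetryableRefreshFailure(failure: RefreshFailure): boolean {
  switch (failure.reason) {
    case "missing_refresh":
      // No refresh token to retry with, so retrying can only loop.
      return false;
    case "http_error": {
      const status = failure.statusCode;
      if (status === 400 || status === 401 || status === 403) {
        return false;
      }
      // Assumption: other statuses (for example 429) are transient.
      return true;
    }
    default:
      // network_error, unknown, invalid_response
      return true;
  }
}
```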
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8f022587-4220-411b-981e-fde3698e8bbc
📒 Files selected for processing (9)
index.ts
lib/accounts.ts
lib/proactive-refresh.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/proactive-refresh.test.ts
test/refresh-guardian.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/proactive-refresh.test.ts
test/refresh-guardian.test.ts
test/fetch-helpers.test.ts
test/accounts.test.ts
lib/**
⚙️ CodeRabbit configuration file
focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.
Files:
lib/proactive-refresh.ts
lib/request/fetch-helpers.ts
lib/refresh-guardian.ts
lib/accounts.ts
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/accounts.ts`:
- Around line 807-848: The transactional refresh currently clones the persisted
"current" snapshot and writes it back, which overwrites in-memory live mutation
(e.g., activeIndexByFamily, lastUsed, rate-limit/cooldown, workspace changes);
update the withAccountStorageTransaction callback in lib/accounts.ts so it
starts from this.buildStorageSnapshot() (or merges fields from current into that
live snapshot) before patching the refreshed row (findAccountIndexByIdentity,
storedAccount.* updates) and calling persist, ensuring only auth fields change;
add a vitest near test/accounts.test.ts:1203 that mutates activeIndexByFamily
and a rate-limit/cooldown field in-memory, calls the refresh helper
(commitRefreshedAuth/withAccountStorageTransaction flow), and asserts those
unsaved live changes survive the commit; run/update tests to verify and ensure
any new persistence/queue logic handles EBUSY and 429 retry semantics as per
guidelines.
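The "start from the live snapshot and change only auth fields" idea above can be illustrated with a pure function. All type and function names here (`StoredAccount`, `PoolSnapshot`, `patchAuthOnLiveSnapshot`) are hypothetical, and matching on `accountId` alone is a simplification of the identity matching the PR describes.

```typescript
// Hypothetical, trimmed-down storage shapes for the sketch.
interface StoredAccount {
  accountId: string;
  access: string;
  refresh: string;
  expires: number;
  lastUsed: number;
}

interface PoolSnapshot {
  accounts: StoredAccount[];
  activeIndexByFamily: Record<string, number>;
}

interface RefreshedAuth {
  access: string;
  refresh: string;
  expires: number;
}

// Copy the live snapshot and overwrite only the auth fields of one row,
// so unsaved live state (lastUsed, active indexes) survives the commit.
function patchAuthOnLiveSnapshot(
  live: PoolSnapshot,
  accountId: string,
  auth: RefreshedAuth,
): PoolSnapshot | null {
  const index = live.accounts.findIndex((a) => a.accountId === accountId);
  if (index < 0) {
    return null;
  }
  return {
    activeIndexByFamily: { ...live.activeIndexByFamily },
    accounts: live.accounts.map((a, i) => (i === index ? { ...a, ...auth } : { ...a })),
  };
}
```

Because the function returns a copy, the caller can persist the result and discard it on failure without touching the live pool.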
In `@lib/refresh-guardian.ts`:
- Around line 126-142: commitRefreshedAuth can return ManagedAccount|null but
refresh-guardian.ts treats any non-exception path as success; change the
post-commit logic in refresh-guardian.ts so that after calling
manager.commitRefreshedAuth(sourceAccount, refreshedAuth) you check the returned
value: if null, treat it like a failure — call manager.getAccountByIdentity(...)
to locate the account, call manager.markAccountCoolingDown(account,
this.bufferMs, "network-error") when present, increment stats.failed and
stats.networkFailed, and return true (or the same boolean semantics used for
failures); only increment stats.refreshed and return success when
commitRefreshedAuth returned a non-null ManagedAccount. Also add a vitest in
test/refresh-guardian.test.ts (around the existing null-return branch near line
~638) that asserts the null-return case triggers cooling-down and increments the
failed counters, in addition to keeping the existing thrown-EBUSY test (~714);
ensure test coverage mentions EBUSY/429 queue behavior per guidelines.
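The null-return handling described above might be expressed as a small result recorder. Names such as `recordCommitResult` and the `coolingDownUntil` and `cooldownReason` fields are hypothetical stand-ins for `manager.markAccountCoolingDown(...)`, and the string return value replaces the boolean whose exact semantics are not spelled out in this thread.

```typescript
// Hypothetical account and stats shapes for the sketch.
interface ManagedAccount {
  id: string;
  coolingDownUntil?: number;
  cooldownReason?: string;
}

interface GuardianStats {
  refreshed: number;
  failed: number;
  networkFailed: number;
}

// Only a non-null commit result counts as a refresh; a null result cools
// the live account down (when it can be found) and counts as a failure.
function recordCommitResult(
  committed: ManagedAccount | null,
  liveAccount: ManagedAccount | null,
  stats: GuardianStats,
  bufferMs: number,
  now: number,
): "refreshed" | "failed" {
  if (committed !== null) {
    stats.refreshed += 1;
    return "refreshed";
  }
  if (liveAccount !== null) {
    liveAccount.coolingDownUntil = now + bufferMs;
    liveAccount.cooldownReason = "network-error";
  }
  stats.failed += 1;
  stats.networkFailed += 1;
  return "failed";
}
```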
In `@lib/request/fetch-helpers.ts`:
- Around line 463-465: The check for a missing or malformed authSetter
(authSetter and authSetter.set) currently throws
CodexAuthError(ERROR_MESSAGES.TOKEN_REFRESH_FAILED, { retryable: true }) but
this is a terminal configuration failure; change the thrown error's options to {
retryable: false } so the runtime surfaces the broken setup instead of retrying,
and update the vitest expectations in test/fetch-helpers.test.ts (around the
assertions at lines referenced near 80 and 93) to expect a
non-retryable/terminal CodexAuthError; ensure you run tests and confirm related
queue/retry logic still properly handles EBUSY and 429 transient cases
elsewhere.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 58b8e919-5593-4ae7-86e4-d7df85c5a3b4
📒 Files selected for processing (6)
lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/accounts.ts`:
- Around line 864-868: The catch in lib/accounts.ts that currently throws new
CodexAuthError(ERROR_MESSAGES.TOKEN_REFRESH_FAILED, { retryable: true, cause:
error }) must be changed to only mark retryable for
transient/storage-concurrency errors (e.g., Windows EBUSY/EACCES, HTTP 429 or
other transient codes) by inspecting error.code / error.status before setting
retryable; for non-transient errors throw a non-retryable CodexAuthError so they
surface as terminal auth-store failures. Update the token refresh error handling
in the same catch (the CodexAuthError construction) to branch on transient
conditions, add a vitest in test/accounts.test.ts that simulates a non-transient
write error and asserts it is not retryable, and ensure any queues that
special-case EBUSY/429 still handle those transient cases.
In `@lib/refresh-guardian.ts`:
- Around line 207-248: The second "post-pass" over refreshResults is unreachable
because refreshExpiringAccounts already awaits the onResult callback for every
refreshed account; remove the dead code that builds snapshotByIndex and the
for-loop iterating refreshResults.entries() (the block that references
snapshotByIndex, processed, and calls applyRefreshOutcome a second time).
Instead rely on the existing callback path passed into refreshExpiringAccounts
(the async onResult which calls applyRefreshOutcome), keep the initial
requiresSave/stat updates and return logic, and delete the snapshotByIndex/map
construction and the duplicate applyRefreshOutcome calls to avoid the
redundant/unreachable path.
In `@test/refresh-guardian.test.ts`:
- Around line 23-50: The test helper findAccountByIdentity currently matches
accounts by index, then refreshToken, then accountId/email which diverges from
the identity-resolution used in lib/accounts.ts; update findAccountByIdentity to
mirror the same identity matching logic used in lib/accounts.ts (use the same
field priority and comparison rules from the resolution routine in
lib/accounts.ts:144-160) so the test selects the exact live account used by
production code—locate the findAccountByIdentity function in the test and change
its matching order/conditions to match the identity resolution in
lib/accounts.ts to ensure the reindex refresh-race scenario is reproduced
deterministically.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 33737776-7244-4de5-899c-8af0acd026b3
📒 Files selected for processing (6)
lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts
🔇 Additional comments (3)
lib/request/fetch-helpers.ts (1)
431-479: the refresh-failure split matches the runtime policy.
lib/request/fetch-helpers.ts:431-479 now makes missing_refresh terminal while keeping network and 429-style failures retryable. test/fetch-helpers.test.ts:106-153 exercises that split, and it lines up with the retryable vs auth-failure branches in index.ts:1015-1082.
As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.
test/accounts.test.ts (1)
1077-1368: good regression coverage for the refresh persistence race.
test/accounts.test.ts:1077-1368 covers the live-state merge, the windows-style ebusy failure, and the blocked-persist/debounced-save interleave from lib/accounts.ts. that is the right vitest net for this concurrency path.
As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
lib/accounts.ts (1)
762-863: building the transaction from a live snapshot is the right fix.
lib/accounts.ts:762-863 now persists refreshed auth from this.buildStorageSnapshot() instead of cloning the last saved row, and test/accounts.test.ts:1162-1368 covers both unsaved live-state preservation and the stale-save interleave.
As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/accounts.ts`:
- Around line 835-845: The code trims nextAccountId only when writing
nextStorage but leaves the in-memory auth claim unchanged, causing mismatch;
update the auth normalization path so the same normalized ID is used everywhere:
normalize/trim the result of extractAccountId(auth.access) early (e.g., replace
the current nextAccountId assignment) and ensure updateFromAuth (and any call
sites inside withAccountStorageTransaction, buildStorageSnapshot,
findAccountIndexByIdentity) uses that normalized value (or pass the normalized
value into updateFromAuth/commitRefreshedAuth) so memory and disk stay
identical; add a vitest named commitRefreshedAuth near
test/accounts.test.ts:1077-1230 (and reference test/accounts.test.ts:1702-1736)
exercising a token with padded ID, and when modifying persistence flows verify
Windows IO and concurrency helpers/queues handle EBUSY/429 retry/backoff per
lib/** guidelines.
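Normalizing the account ID once and writing the same value to memory and disk, as requested above, can be sketched like this. `normalizeAccountId`, `applyAccountId`, and the `HasAccountId` shape are hypothetical; the real code derives the raw value from `extractAccountId(auth.access)`.

```typescript
// Trim once; treat empty or whitespace-only IDs as absent.
function normalizeAccountId(raw: string | undefined): string | undefined {
  const trimmed = raw?.trim();
  return trimmed ? trimmed : undefined;
}

// Hypothetical minimal shape shared by the live and stored records.
interface HasAccountId {
  accountId?: string;
}

// Apply the same normalized value to both records so they cannot diverge.
function applyAccountId(live: HasAccountId, stored: HasAccountId, rawId: string | undefined): void {
  const id = normalizeAccountId(rawId);
  if (id === undefined) {
    return; // assumption: keep the existing IDs when the token carries none
  }
  live.accountId = id;
  stored.accountId = id;
}
```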
In `@lib/refresh-guardian.ts`:
- Around line 146-157: The catch block around commitRefreshedAuth in
refresh-guardian.ts incorrectly treats all persist errors as transient network
failures; update the handler to inspect the thrown error (specifically check if
it's a CodexAuthError and use its retryable boolean) and only route retryable
errors to manager.markAccountCoolingDown(...) and increment stats.networkFailed,
while non-retryable storage/persistence errors should increment stats.authFailed
(or a new terminal failure counter) and avoid placing the account back into the
cooling path; keep the existing getAccountByIdentity call and return semantics
but branch on error.retryable to decide cooling vs terminal handling. Also add a
vitest alongside test/refresh-guardian.test.ts (covering lines ~802-874) that
simulates a non-retryable persistence error (e.g., EACCES) from
commitRefreshedAuth to assert the account is not cooled, stats reflect an
auth/storage failure, and no retry loop is scheduled. Ensure references to
commitRefreshedAuth, manager.markAccountCoolingDown, CodexAuthError.retryable,
and the affected stats properties are updated accordingly.
In `@test/refresh-guardian.test.ts`:
- Around line 2-8: The test imports the OAuthAuthDetails type from the wrong
module; replace the import of OAuthAuthDetails that currently targets
lib/accounts.js with an import from lib/types.js so the test references the type
as exported by types (update the import statement that mentions OAuthAuthDetails
accordingly and keep the other imports unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 31188072-b8f0-41fb-95db-abe0872fbfbf
📒 Files selected for processing (6)
lib/accounts.ts
lib/refresh-guardian.ts
lib/request/fetch-helpers.ts
test/accounts.test.ts
test/fetch-helpers.test.ts
test/refresh-guardian.test.ts
Summary
What Changed
- AccountManager.commitRefreshedAuth(...) and identity-based account matching so refresh results update the correct stored row before any later pool snapshot save
Validation
- npm run lint
- npm run typecheck
- npm test
- npm test -- test/documentation.test.ts
- npm run build
Docs and Governance Checklist
- docs/getting-started.md updated (if onboarding flow changed)
- docs/features.md updated (if capability surface changed)
- docs/reference/* pages updated (if commands/settings/paths changed)
- docs/upgrade.md updated (if migration behavior changed)
- SECURITY.md and CONTRIBUTING.md reviewed for alignment
Risk and Rollback
- 9cbc72a on fix/fast-expiry-refresh-persistence or revert the PR if the new persistence path causes account rotation regressions
Additional Notes
- main and re-ran the full validation gate on the rebased branch
note: greptile review for oc-chatgpt-multi-auth. cite files like `lib/foo.ts:123`. confirm regression tests + windows concurrency/token redaction coverage.
Greptile Summary
this pr fixes fast-expiry token refresh persistence by replacing the ad-hoc updateFromAuth + clearAuthFailures + saveToDiskDebounced triple with a single commitRefreshedAuth call that runs under withStorageLock, builds a fresh pool snapshot, updates both the on-disk record and the live in-memory account atomically, and rolls back in-memory changes if the write fails.
key changes:
- AccountManager.commitRefreshedAuth: transactional persist with in-memory rollback on storage failure; identity-based account matching so the correct row is updated even if the index has shifted
- index.ts: main loop and stream failover paths use commitRefreshedAuth; transient refresh/persistence errors (EAGAIN, EBUSY, EPERM, network_error) now trigger a network-class cooldown + continue instead of incrementAuthFailures
- lib/proactive-refresh.ts: refreshExpiringAccounts gains an onResult callback so the guardian can commit each refresh transactionally as results arrive in parallel; shouldRefreshProactively now correctly returns true when the access token is missing even if expires is undefined
- lib/refresh-guardian.ts: pre-filters cooling-down accounts, routes success results through commitRefreshedAuth, and only triggers saveToDiskDebounced for failure-cooldown writes (not for already-persisted successes)
- lib/request/fetch-helpers.ts: isRetryableRefreshFailure classifies network_error/unknown/invalid_response as retryable and 400/401/403 http_error as terminal; missing_refresh stays terminal
Confidence Score: 4/5
solid approach with one P1 data-integrity edge case in commitRefreshedAuth where tokens are written to disk without in-memory update when liveAccount lookup fails post-snapshot
the lock-based transactional commit, retryable error classification, and rollback logic are all well-designed and well-tested. previous P1 concerns (missing_refresh as retryable, storage write triggering incrementAuthFailures on healthy accounts) are resolved. the remaining P1 is the "disk write proceeds but live pool not updated" path in commitRefreshedAuth when getAccountByIdentity returns null after storageIndex was found; this leaves the guardian in a perpetual refresh loop for the affected account until restart. the duplicate retryable helper (P2) and missing concurrent-commit test (P2) don't block merge.
lib/accounts.ts: specifically the await persist(nextStorage) call at line ~401 in the "liveAccount not found" branch of commitRefreshedAuth
Important Files Changed
- commitRefreshedAuth with transactional disk write, in-memory rollback on persist failure, and identity-based account matching; one P1 issue (disk write proceeds without in-memory update when liveAccount lookup fails post-snapshot) and one P2 (duplicate retryable error checker)
- commitRefreshedAuth; retryable vs terminal refresh errors classified correctly; sessionAffinityStore forget moved before the retryable branch; looks correct
- onResult callback to refreshExpiringAccounts, reorders shouldRefreshProactively checks so missing access-token triggers refresh even without expiry, extends applyRefreshResult with identity-field updates; all changes look correct
- applyRefreshResult + clearAuthFailures with commitRefreshedAuth via new applyRefreshOutcome helper; pre-filters cooling-down accounts before passing to refreshExpiringAccounts; requiresSave flag correctly distinguishes failure cooldowns (debounced save) from successful commits (already persisted)
- isRetryableRefreshFailure and isRetryableAuthSetterError; wraps authSetter.set failure in CodexAuthError; missing_refresh correctly returns false (terminal); logic is sound but isRetryableAuthSetterError duplicates isRetryableAuthPersistenceError in accounts.ts
- commitRefreshedAuth suite covers transactional write, rollback on EBUSY, terminal EACCES, pool-state preservation, and debounced-save race; missing test for two concurrent commits on different accounts
- commitRefreshedAuth mock instead of applyRefreshResult; adds cooling-down skip test, null-commit cooldown test, and commit error classification; coverage looks solid
- onResult invocation test and identity-field update test for applyRefreshResult; also adds the access-missing-without-expiry path for shouldRefreshProactively; good coverage
- withAccountStorageTransaction mock into edge-case setup; no functional change to test logic
- vi.hoisted mocks and plumbs withAccountStorageTransaction; assertions updated to use named mocks; cleaner setup, no logic change
Sequence Diagram
sequenceDiagram
participant RL as RequestLoop / Guardian
participant CM as commitRefreshedAuth
participant Lock as withStorageLock
participant Mem as LivePool (in-memory)
participant Disk as Storage (disk)
RL->>CM: commitRefreshedAuth(source, auth)
CM->>Lock: acquire lock
Lock-->>CM: lock held
CM->>CM: buildStorageSnapshot() → nextStorage
CM->>CM: findAccountIndexByIdentity(nextStorage, source, auth)
CM->>CM: update storedAccount in nextStorage (tokens, identity)
CM->>Mem: getAccountByIdentity(source, auth) → liveAccount
CM->>Mem: updateFromAuth(liveAccount, auth)
CM->>Mem: clearAuthFailures / clearCooldown / enabled=true
CM->>Disk: persist(nextStorage)
alt persist succeeds
Disk-->>CM: ok
CM->>Lock: release lock
CM-->>RL: return liveAccount
else persist fails (EBUSY/EPERM)
Disk-->>CM: error
CM->>Mem: rollback liveAccount to previousState
CM->>Lock: release lock
CM-->>RL: throw CodexAuthError(retryable=true)
else persist fails (EACCES/other)
Disk-->>CM: error
CM->>Mem: rollback liveAccount to previousState
CM->>Lock: release lock
CM-->>RL: throw CodexAuthError(retryable=false)
end
Comments Outside Diff (3)
lib/refresh-guardian.ts, line 463-477
in the "success" branch of applyRefreshOutcome, the code:
1. applyRefreshResult(account, result.tokenResult) which writes access, refresh, expires, accountId, email to the live account
2. manager.clearAuthFailures(account) on the live account
3. await manager.commitRefreshedAuth(sourceAccount, refreshedAuth) which calls this.updateFromAuth(liveAccount, auth) + this.clearAuthFailures(liveAccount) + this.clearAccountCooldown(liveAccount) on the same account object
steps 1-2 are fully superseded by step 3's internal calls. consider removing the pre-commit applyRefreshResult / clearAuthFailures lines and relying solely on commitRefreshedAuth for consistency.
test/accounts.test.ts, line 718-802
commitRefreshedAuth storage write failure path
the test suite covers the happy path but has no case where withAccountStorageTransaction rejects. given the windows EBUSY concern and the p1 issue where a storage throw incorrectly routes through incrementAuthFailures, a test verifying the error propagation (and the expected type/retryable flag once the fix is applied) would close the gap. add a case like:
lib/accounts.ts, line 400-406
when storageIndex is found in the snapshot but getAccountByIdentity returns null (account removed from live pool between snapshot and lookup), persist(nextStorage) is still called, writing the new tokens to disk, but in-memory this.accounts is never updated. the live pool carries the old expiry/access token for the remainder of the process lifetime; the account will keep triggering proactive refresh on every guardian tick until restart.
the warning log covers observability, but the correct action here is to skip the write (or roll back), not silently persist tokens that can't be applied in-memory. at minimum, the persist on the "no live account" path should be gated the same way as the guarded path, or the caller should be notified the commit was a no-op.
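A commit that skips the disk write when no live account is found, and rolls back in-memory state when the write fails, could be sketched as below. This is an illustration under stated assumptions, not the PR's implementation: `commitRefreshedAuthSketch`, `LiveAccount`, matching by a plain `id`, and the injected `persist` callback are all hypothetical, and the storage lock from the sequence diagram is omitted.

```typescript
// Hypothetical minimal shapes for the sketch.
interface LiveAccount {
  id: string;
  access: string;
  expires: number;
}

interface RefreshedAuth {
  access: string;
  expires: number;
}

async function commitRefreshedAuthSketch(
  pool: LiveAccount[],
  id: string,
  auth: RefreshedAuth,
  persist: (accounts: LiveAccount[]) => Promise<void>,
): Promise<LiveAccount | null> {
  const live = pool.find((a) => a.id === id) ?? null;
  if (live === null) {
    // No live account to update: report a no-op instead of writing to disk.
    return null;
  }
  const previous: LiveAccount = { ...live };
  live.access = auth.access;
  live.expires = auth.expires;
  try {
    // Persist a copy of the pool that already includes the refreshed auth.
    await persist(pool.map((a) => ({ ...a })));
  } catch (err) {
    // Restore the pre-refresh state so memory and disk stay aligned.
    live.access = previous.access;
    live.expires = previous.expires;
    throw err;
  }
  return live;
}
```

Returning `null` gives the caller a signal that nothing was committed, which matches the "notify the caller the commit was a no-op" option in the comment above.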
Reviews (6): Last reviewed commit: "test: cover trimmed refreshed account id..."