fix(auth): serialize OAuth refresh across agents to fix #26322 by visionik · Pull Request #67876 · openclaw/openclaw

visionik · 2026-04-17T00:52:30Z

Problem

When N agents share a single OAuth profile (e.g. 18 codex agents on openai-codex:default), the pre-fix implementation held a per-agentDir file lock during refresh. The lock paths are distinct per agent, so concurrent agents don't serialize at all. When the access token expires, all N agents refresh in parallel; OpenAI rotates the refresh token on every successful call, so the losers get HTTP 401 refresh_token_reused and cascade to model fallback even though auth is actually valid.

Observed in production: 499 refresh failures / 24h, codex unusable without manual re-auth, token-expiry storm every ~12h (#26322).

Fix

Five coordinated mechanisms in src/agents/auth-profiles/:

Cross-agent file lock keyed on (provider, profileId). refreshOAuthTokenWithLock now acquires $STATE_DIR/locks/oauth-refresh/sha256-<hex64> where the hash input is provider\0profileId (NUL-separated so concat-boundary collisions are impossible). All agents sharing a provider+profileId serialize on this single coordination point, cross-process. Two profiles that happen to share a profileId across different providers get distinct locks, so they don't needlessly serialize or risk file-lock timeout against each other.
Nested per-store lock for the refresh critical section. Inside the global refresh lock, the refresh also acquires the existing AUTH_STORE_LOCK_OPTIONS lock on the agent's own auth-profiles.json. Lock acquisition order is always refresh → per-store; non-refresh code only takes the per-store lock, so no cycle is possible. This stops the refresh's read-modify-writes from losing updates against concurrent updateAuthProfileStoreWithLock callers.
In-process refresh queue. A Map<provider\0profileId, Promise> chains same-PID callers so the re-entrant withFileLock cannot let two concurrent callers slip through in the same process (HELD_LOCKS short-circuits on PID match). Tested with a 10-caller FIFO stress test asserting maxInFlight === 1.
Inside-the-lock main-store recheck + mirror-to-main. After acquiring both locks, doRefreshOAuthTokenWithLock first checks if the main-agent store already has fresh credentials (another agent may have just completed and mirrored) and adopts if so — turning N serialized refreshes into 1 refresh + (N-1) adoptions. After a successful refresh from a secondary agent, the new credential is mirrored back into the main-agent store while still inside the refresh lock, closing the cross-process race window where a peer could acquire the refresh lock between our lock release and our main-store write. Mirror writes also take the main-store lock via updateAuthProfileStoreWithLock so the read-modify-write cannot race with other main-store writers.
Identity-gated credential copying. All four copy sites (pre-refresh adopt, inside-lock adopt, catch-block main-inherit, mirror-to-main) go through a single isSafeToCopyOAuthIdentity gate. The gate accepts matching accountId/email, accepts the pure upgrade case (existing has no identity, incoming does), and refuses positive mismatch, identity regression (incoming drops an identity field existing had), and non-overlapping-field pairs. This closes the CWE-284 cross-account-leak vector that the previous provider-only adoption gate allowed.

Supporting constants and helpers

OAUTH_REFRESH_LOCK_OPTIONS: stale: 180_000 (3 min) — was 30s. A refresh that legitimately takes >30s would have allowed peers to reclaim the lock and reintroduce the storm.
OAUTH_REFRESH_CALL_TIMEOUT_MS = 120_000: hard deadline on the refresh call via withRefreshCallTimeout. The invariant OAUTH_REFRESH_CALL_TIMEOUT_MS < OAUTH_REFRESH_LOCK_OPTIONS.stale is pinned by a unit test. Documented limitation: JS promises aren't cancellable, so the deadline releases our locks but the underlying work can keep running. isRefreshTokenReusedError recovery remains the backstop; full AbortSignal plumbing is a follow-up.
resolveOAuthRefreshLockPath(provider, profileId): sha256 hash input is NUL-separated for collision resistance.

Relationship to earlier PRs

fix: use global OAuth refresh lock to prevent cross-agent race (#26322) #51476 (HeroSizy): the design inspiration. That PR's head was 7,755 commits behind origin/main and the auth-profiles files had been restructured upstream (store.ts rewritten, persisted.ts / runtime-snapshots.ts / state.ts / path-resolve.ts / policy.ts split out of it). This PR is a port against the current shape of the tree, not a rebase. Credit preserved via Co-authored-by on every commit.
fix: serialize shared OAuth profile refreshes #54856: mutex keyed on authStorePath(agentDir) only — doesn't serialize across agents. Insufficient for OAuth token refresh race condition causes spurious failovers (refresh_token_reused) #26322.
fix(auth): proactively refresh OAuth token before expiry to prevent concurrent-agent race condition #60673: proactive refresh before expiry — spreads the storm instead of fixing it. Redundant once this lands.
fix/53317-gateway-overwrites-oauth-token #53754 and fix: sync runtimeAuthStoreSnapshots on save to prevent stale OAuth tokens (#55389) #55465 address distinct adjacent bugs (gateway overwrite, stale snapshot). Can stack on top.

Tests

The test surface grew significantly over the review cycle. Current inventory:

File	Tests	Purpose
`oauth.concurrent-20-agents.test.ts`	1	The #26322 regression test. 20 distinct `agentDir`s seeded with the same expired OAuth profile + `accountId`, all concurrently call `resolveApiKeyForProfile`. Asserts: exactly one refresh HTTP call, all 20 receive the same fresh apiKey. Before this fix the same test produces 20 refresh calls (19 of which 401 against a real IdP).
`oauth.mirror-refresh.test.ts`	12	Mirror-to-main happy path, mirror refuses across identity mismatch / non-oauth / fresher-expires / identity regression, plugin-refresh branch also mirrors, pre-refresh adopt happy path, catch-block main-inherit via concurrent refresh, upgrade-into-pre-capture-main.
`oauth.adopt-identity.test.ts`	9	Each of the 3 adopt sites exercised with (a) accountId mismatch → refused, (b) sub-no-identity / main-has-identity upgrade → accepted, (c) non-overlapping fields → refused.
`oauth-identity.test.ts`	51	Direct truth table + fuzz for `isSameOAuthIdentity` (strict reference) and `isSafeToCopyOAuthIdentity` (unified copy gate). Fuzz suites include a cross-rule monotonicity property (strict accepts → unified accepts) over 1,000 random pairs, and a distinct-accountId-never-accepts invariant.
`oauth-lock-path.test.ts`	13	Path-safety invariants for adversarial profile ids (`.`, `..`, backslashes, null bytes, newlines, long ids), distinct (provider, profileId) never collide, concat-boundary resistance (`("a", "b:c")` ≠ `("a:b", "c")`), clean-install works with no pre-existing locks/ dir. Fuzz (seeded Mulberry32): ~2,700 adversarial inputs total asserting safe sha256-hex64 basenames under `<stateDir>/locks/oauth-refresh`.
`oauth-refresh-error.test.ts`	14	`isRefreshTokenReusedError` classifier: positive cases (snake_case code, OpenAI natural-language, JSON-wrapped 401, string throws, single- and multi-level `Error.cause` chains), negative cases (unrelated errors, null/undefined), fuzz over marker-with-noise and marker-free random messages (500 iterations each).
`oauth-refresh-queue.test.ts`	4	FIFO serialization, gate-releases-on-throw, reset idempotency, 10-caller burst asserting `startOrder === endOrder` and `maxInFlight === 1`.
`oauth-refresh-timeout.test.ts`	4	Invariants pinning `OAUTH_REFRESH_CALL_TIMEOUT_MS < OAUTH_REFRESH_LOCK_OPTIONS.stale`, reasonable floor (≥30s), ≥30s safety margin, stale >60s floor.
`paths-direct-import.test.ts`	11	Direct-import coverage for the pre-existing path helpers (v8 coverage attribution fix when called transitively through the re-export chain).
`oauth.test.ts`, `oauth.fallback-to-main-agent.test.ts`, `oauth.openai-codex-refresh-fallback.test.ts` (pre-existing)	30	All still pass unchanged.

Total: 119 tests in the auth-profiles scoped suite, all passing in isolation.

Files changed

File	Purpose
`src/agents/auth-profiles/path-resolve.ts`	New `resolveOAuthRefreshLockPath(provider, profileId)`
`src/agents/auth-profiles/paths.ts`	Re-export of the new helper
`src/agents/auth-profiles/constants.ts`	New `OAUTH_REFRESH_LOCK_OPTIONS` + `OAUTH_REFRESH_CALL_TIMEOUT_MS`
`src/agents/auth-profiles/oauth.ts`	The race fix itself: nested locks, in-process queue, inside-lock recheck, mirror-inside-lock, identity-gated copy, refresh-call timeout wrapper, `isSafeToCopyOAuthIdentity`, `isSameOAuthIdentity` (strict reference), normalizers

Follow-ups explicitly out of scope

AbortSignal plumbing to make withRefreshCallTimeout actually cancel the underlying HTTP work rather than just releasing locks. Documented as a limitation on the helper.
Extracting fix: sync runtimeAuthStoreSnapshots on save to prevent stale OAuth tokens (#55389) #55465's one-line cloneAuthProfileStore snapshot-sync from its 58-file PR and stacking it on this.
Stacking fix/53317-gateway-overwrites-oauth-token #53754's per-path mtime staleness on this.

Before: each agent held its own per-agentDir file lock during refresh, so N agents sharing a single OAuth profile would all refresh in parallel when the token expired. Because OpenAI rotates the refresh token on every successful call, the losers of that race get HTTP 401 refresh_token_reused and cascade to model fallback even though auth is actually valid. With 18 codex agents this caused a token-expiry storm every ~12h. After: - refreshOAuthTokenWithLock now takes a cross-agent file lock at $STATE_DIR/locks/oauth-refresh/sha256-<hex64>. All agents sharing a profileId serialize on this single coordination point. - An in-process Promise queue keyed on profileId chains same-PID callers so the re-entrant file lock cannot let two concurrent refreshes slip through within the same process. - Inside the lock, an explicit main-store recheck adopts a freshly mirrored credential from the main agent instead of issuing a duplicate HTTP refresh. This collapses N serialized refreshes into 1 refresh + (N-1) adoptions. - After a successful refresh from a secondary agent, the new credential is mirrored best-effort back into the main-agent store so peers adopt rather than race. Main-agent refreshes are unchanged. Adapted from HeroSizy's approach in #51476; that PR's head was 7,755 commits behind origin/main and the surrounding auth-profiles files had been split/refactored, so this is a port against the current shape of oauth.ts/store.ts/path-resolve.ts, not a rebase. Tests: - oauth.concurrent-20-agents.test.ts: 20 distinct agentDirs sharing one profileId/accountId fire resolveApiKeyForProfile concurrently; exactly one refresh call is observed and all 20 receive the same fresh token. - oauth.mirror-refresh.test.ts: sub-agent refresh mirrors into the main store; main-agent self-refresh is unchanged. - oauth-lock-path.test.ts: unit + fuzz coverage for resolveOAuthRefreshLockPath (path-safety invariants, hashing, length bound, no separators/.. in basename, 2,700+ random adversarial inputs). Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

aisle-research-bot · 2026-04-17T00:52:36Z

🔒 Aisle Security Analysis

We found 3 potential security issue(s) in this PR:

#	Severity	Title
1	🟡 Medium	Symlink-following file lock creation under configurable state directory (CWE-59)
2	🟡 Medium	OAuth credential mirroring can overwrite main store when existing entry lacks identity (potential cross-account credential poisoning)
3	🟡 Medium	Race condition: OAuth refresh timeout releases global refresh lock while underlying refresh still runs

1. 🟡 Symlink-following file lock creation under configurable state directory (CWE-59)

Property	Value
Severity	Medium
CWE	CWE-59
Location	`src/plugin-sdk/file-lock.ts:90-173`

Description

The cross-agent OAuth refresh lock uses withFileLock() which creates a sidecar lock file at <lockPath>.lock. The parent directory is created with mkdir({recursive:true}) and then canonicalized with realpath(dir).

Because resolveStateDir() can be overridden via OPENCLAW_STATE_DIR to an arbitrary path, if the chosen state directory (or any of its subdirectories like locks/oauth-refresh) is world-writable or attacker-controlled, an attacker can pre-create a symlinked directory component (e.g., locks/oauth-refresh -> /etc) and cause the process to create/remove lock files in an unintended location.

Key points:

Input/control: OPENCLAW_STATE_DIR (environment) controls the base directory used by resolveOAuthRefreshLockPath().
Sink: acquireFileLock() creates parent directories and opens the lock file using a path string (follows symlinks in parent path components).
Impact: writing/removing files outside the intended state directory; potential privilege escalation if the process runs with elevated privileges, or interference/DoS across users when the directory is shared.

Vulnerable code paths:

resolveOAuthRefreshLockPath() places locks under path.join(resolveStateDir(), "locks", "oauth-refresh", safeId).
acquireFileLock() creates the directory and then opens ${normalizedFile}.lock without guarding against symlinked parent directories.

Vulnerable code:

// src/plugin-sdk/file-lock.ts
const dir = path.dirname(resolved);
await fs.mkdir(dir, { recursive: true });
const realDir = await fs.realpath(dir);
...
const lockPath = `${normalizedFile}.lock`;
const handle = await fs.open(lockPath, "wx");

Recommendation

Harden lock directory creation and lockfile opening against symlink/hardlink tricks:

Constrain/validate OPENCLAW_STATE_DIR when used for security-sensitive storage (locks/credentials):
- Refuse world-writable directories unless they have the sticky bit set and are user-owned.
- Ensure the state dir and locks/ subtree are owned by the current uid and not group/world-writable.
Create lock directories with restrictive permissions (best-effort across platforms):

await fs.mkdir(dir, { recursive: true, mode: 0o700 });
await fs.chmod(dir, 0o700).catch(() => undefined);

Detect symlinked path components before use:
- lstat each directory component (at least the final lock directory) and reject if isSymbolicLink().
On POSIX, consider using a safer pattern based on opening the directory and using openat semantics (not directly exposed in Node core everywhere) or a vetted lock library that supports O_NOFOLLOW/O_EXCL style guarantees.

At minimum, ensure the lock directory is created under a user-private location (0700) and verify it is not a symlink before opening/removing lock files.

2. 🟡 OAuth credential mirroring can overwrite main store when existing entry lacks identity (potential cross-account credential poisoning)

Property	Value
Severity	Medium
CWE	CWE-284
Location	`src/agents/auth-profiles/oauth.ts:468-471`

Description

The new identity gate isSafeToCopyOAuthIdentity(existing, incoming) explicitly allows copying when the credential being overwritten (existing) has no identity metadata (no accountId/email):

When existing has no identity, the function returns true even if incoming carries identity ("upgrade tolerance").
mirrorRefreshedCredentialIntoMainStore() uses this unified gate before overwriting the main-agent store.
If the main agent has a legacy/pre-upgrade OAuth credential stored without identity markers, a different agent (or process) refreshing a credential for the same profileId+provider but a different account can mirror its refreshed credential into the main store, effectively switching which account the main agent (and peers adopting from it) will use.

This is a classic cross-account credential-poisoning risk when multiple accounts are present under the same profileId in the same state directory and the main credential is missing identity metadata.

Vulnerable logic:

// oauth.ts
if (aHasIdentity) {
  return false;
}

// Existing has no identity ... incoming adds identity (allowed)
return true;

and the mirror path:

if (existing && !isSafeToCopyOAuthIdentity(existing, params.refreshed)) {
  return false;
}
store.profiles[params.profileId] = { ...params.refreshed };

Recommendation

Tighten the mirror (sub → main) direction so that a refreshed sub-agent credential cannot overwrite a main-store credential when the main-store credential lacks identity evidence.

Two practical options:

Direction-specific rule (recommended):
- Keep the current upgrade-tolerant behavior for adoption main → sub.
- Make mirroring strict: only allow overwrite when isSameOAuthIdentity(existing, incoming) is true, or when existing is absent.

Example (mirror-only strictness):

// in mirrorRefreshedCredentialIntoMainStore updater
if (existing) {
  if (!isSameOAuthIdentity(existing, params.refreshed)) {
    return false;
  }
}

Require positive binding when overwriting an existing main entry:
- If existing has no identity but incoming does, refuse overwrite unless additional stable binding exists (e.g., provider subject/issuer, token fingerprint, or a persisted installation-scoped account binding).

This prevents cross-account poisoning of legacy main-store entries while still permitting fresh installs and same-identity updates.

3. 🟡 Race condition: OAuth refresh timeout releases global refresh lock while underlying refresh still runs

Property	Value
Severity	Medium
CWE	CWE-362
Location	`src/agents/auth-profiles/oauth.ts:266-278`

Description

The OAuth refresh path adds withRefreshCallTimeout() to bound the time spent waiting for an upstream refresh call. However, the timeout does not cancel the underlying refresh promise. When the timeout fires, the caller throws and unwinds out of withFileLock(...), releasing the global refresh lock (resolveOAuthRefreshLockPath(...)) even though the original refresh call is still in-flight.

Impact:

A second agent can acquire the global refresh lock and start another refresh while the first (timed-out) refresh is still running.
Many OAuth providers rotate/consume refresh tokens on use; concurrent/overlapping refreshes can trigger refresh_token_reused errors and repeated failures.
This can create an auth-refresh storm / sustained refresh failure loop (availability issue) and inconsistent token state across agents.

Vulnerable flow (timeout causes early lock release while fn() continues):

return await withFileLock(globalRefreshLockPath, OAUTH_REFRESH_LOCK_OPTIONS, async () =>
  withFileLock(authPath, AUTH_STORE_LOCK_OPTIONS, async () => {
    const pluginRefreshed = await withRefreshCallTimeout(
      `refreshProviderOAuthCredentialWithPlugin(${cred.provider})`,
      OAUTH_REFRESH_CALL_TIMEOUT_MS,
      () => refreshProviderOAuthCredentialWithPlugin({ provider: cred.provider, context: cred }),
    );
    // ...
  })
);

and the timeout wrapper:

timeoutHandle = setTimeout(() => {
  reject(new Error(`OAuth refresh call "${label}" exceeded hard timeout (${timeoutMs}ms)`));
}, timeoutMs);
fn().then(resolve, reject);

Once rejected, withFileLock(...).finally(() => lock.release()) runs and releases the refresh lock, but fn() continues executing in the background and may still complete the token exchange (burning/rotating the refresh token).

Recommendation

Ensure the global refresh lock is held until the underlying refresh attempt is actually finished (success or failure), or make the refresh call cancellable.

Practical options:

Plumb AbortSignal through refresh APIs (best):
- Create an AbortController per refresh attempt.
- On timeout, call abort() and ensure the HTTP client and plugin refresh code respects the signal.
- Only release the global lock after the refresh promise settles.
Do not release the refresh lock on timeout (safer availability-wise):
- If you cannot cancel work, consider treating the timeout as local waiting timeout but keep a process-held lock until the in-flight refresh promise resolves/rejects, so no other agent starts a second refresh.

Example sketch (cancellable):

async function withTimeoutAbort<T>(ms: number, fn: (signal: AbortSignal) => Promise<T>): Promise<T> {
  const ac = new AbortController();
  const t = setTimeout(() => ac.abort(new Error(`timeout after ${ms}ms`)), ms);
  try {
    return await fn(ac.signal);
  } finally {
    clearTimeout(t);
  }
}

await withFileLock(globalRefreshLockPath, OAUTH_REFRESH_LOCK_OPTIONS, async () => {
  const pluginRefreshed = await withTimeoutAbort(OAUTH_REFRESH_CALL_TIMEOUT_MS, (signal) =>
    refreshProviderOAuthCredentialWithPlugin({ provider: cred.provider, context: cred, signal }),
  );
});

If abort plumbing is not possible immediately, add additional guardrails such as: storing a per-profile "refresh in progress" marker in the store under lock and making other agents wait/retry instead of starting a second refresh when a previous attempt recently timed out.

Analyzed PR: #67876 at commit 5cc7162

_{Last updated on: 2026-04-17T05:10:49Z}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 69e5f39165

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T00:56:22Z

+  // and nuke each other with `refresh_token_reused`. See issue #26322.
+  const globalRefreshLockPath = resolveOAuthRefreshLockPath(params.profileId);
+
+  const result = await withFileLock(globalRefreshLockPath, OAUTH_REFRESH_LOCK_OPTIONS, async () => {


Keep auth-store lock while refreshing credentials

This refresh path now serializes on resolveOAuthRefreshLockPath(profileId) only, but it still calls saveAuthProfileStore(store, params.agentDir) to write auth-profiles.json. Other mutators (for example updateAuthProfileStoreWithLock in src/agents/auth-profiles/store.ts) still coordinate on the per-store authPath lock, so refresh writes can now race with usage/profile updates and lose data by last-writer-wins. A common failure mode is: updater loads stale store under authPath, refresh writes new OAuth tokens, updater saves stale snapshot and reverts the refresh, causing repeated refresh failures.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T00:56:22Z

+    mainStore.profiles[params.profileId] = { ...params.refreshed };
+    saveAuthProfileStore(mainStore, undefined);


Serialize mirror writes to the main auth store

mirrorRefreshedCredentialIntoMainStore performs a read-modify-write of the main auth-profiles.json without taking the main store lock. Since this runs from secondary agents, it can race with main-agent flows that mutate the same file under updateAuthProfileStoreWithLock, and the unlocked save can overwrite unrelated concurrent updates (profile edits, usage stats, or newer token/state changes) with a stale snapshot. The mirror write should use the same per-store lock/updater path as other main-store writers.

Useful? React with 👍 / 👎.

greptile-apps · 2026-04-17T01:00:12Z

Greptile Summary

This PR introduces a three-layer serialization strategy to eliminate the refresh_token_reused storm when N agents share a single OAuth profile: a per-profile cross-agent file lock (sha256(profileId)), an in-process promise-chaining queue keyed on profileId, and an inside-the-lock main-store recheck that collapses N serialized refreshes into 1 refresh + (N-1) adoptions. The approach is well-reasoned, and the test suite covers the critical regression path with 20 concurrent agents, mirror behavior, and lock-path fuzz testing.

Confidence Score: 5/5

Safe to merge — all remaining findings are P2 style/improvement observations; the core fix is correct and well-tested.

The three-layer serialization (cross-process file lock, in-process promise queue, inside-lock main-store recheck + mirror) correctly addresses the root cause of #26322. The in-process queue properly chains gates keyed on profileId, the finally always releases, and the inside-lock adoption path prevents redundant HTTP refreshes. The only non-trivial observation is the cross-process race window in the post-lock mirror (P2), which is intentional by design and practically negligible given the microsecond window and existing fallback recovery. Test coverage is thorough including a 20-agent concurrent regression test, mirror behavior tests, and 2,700-input fuzz coverage for the lock path.

No files require special attention beyond the noted cross-process race window in src/agents/auth-profiles/oauth.ts (lines 476–503).

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/agents/auth-profiles/oauth.ts
Line: 476-503

Comment:
**Mirror-to-main outside the lock creates a cross-process race window**

The mirror happens after the file lock is released. In a cross-process scenario, a second agent could acquire the lock in the window between `fs.rm(lockPath)` completing (lock released) and `mirrorRefreshedCredentialIntoMainStore` writing to the main store. That agent's inside-lock main-store recheck would see stale credentials and proceed to an HTTP refresh — reproducing the `refresh_token_reused` error this PR fixes.

The in-process `refreshQueues` serialization closes this gap for same-process agents (the mirror runs synchronously before the queue gate is released), but it doesn't help when agents live in separate OS processes. The window is very small (microseconds of synchronous JS between the `await fs.rm` completing and `saveAuthProfileStore` writing to main), but it exists.

Moving the mirror inside the `withFileLock` callback would eliminate the race entirely at the cost of holding the lock a little longer. Given the synchronous nature of `mirrorRefreshedCredentialIntoMainStore`, the hold-time increase is negligible:

```typescript
// Inside withFileLock callback, after successful refresh:
store.profiles[params.profileId] = { ...cred, ...result.newCredentials, type: "oauth" };
saveAuthProfileStore(store, params.agentDir);
// Mirror to main while the lock is still held:
if (params.agentDir) {
  mirrorRefreshedCredentialIntoMainStore({ profileId: params.profileId, refreshed: { ...store.profiles[params.profileId] as OAuthCredential } });
}
return result;
```

The existing fallback (`isRefreshTokenReusedError` recovery path) does mitigate the impact of any race that slips through, so this is not blocking.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/auth-profiles/constants.ts
Line: 26-40

Comment:
**`OAUTH_REFRESH_LOCK_OPTIONS` is currently identical to `AUTH_STORE_LOCK_OPTIONS`**

The comment explains these are kept separate for independent future tuning, which makes sense. Just worth noting for reviewers: right now the values are byte-for-byte identical, so there's no behavioral difference until one is explicitly tuned. A brief inline note like `// currently same as AUTH_STORE_LOCK_OPTIONS; diverge here when needed` might help future readers avoid accidentally "syncing" them back together.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "Auth: serialize OAuth refresh across age..." | Re-trigger Greptile}

greptile-apps · 2026-04-17T01:00:16Z

+  // Post-lock: mirror the refreshed credential into the main-agent store so
+  // peer agents can adopt instead of racing. Only meaningful for secondary
+  // agents; for the main agent this would be a self-write (we're already in
+  // the main store). Done outside the refresh lock to release it ASAP.
+  if (result?.newCredentials && params.agentDir) {
+    const mainPath = resolveAuthStorePath(undefined);
+    const localPath = resolveAuthStorePath(params.agentDir);
+    if (mainPath !== localPath) {
+      const localStore = loadAuthProfileStoreForSecretsRuntime(params.agentDir);
+      const localProfile = localStore.profiles[params.profileId];
+      // Only mirror when the local agent has a concrete OAuth profile that we
+      // just refreshed; if the agent was inheriting from main (no local
+      // oauth entry) the main store is already the canonical record and no
+      // mirror is needed.
+      if (localProfile?.type === "oauth") {
+        const refreshedCred: OAuthCredential = {
+          ...localProfile,
+          ...result.newCredentials,
+          type: "oauth",
+        };
+        mirrorRefreshedCredentialIntoMainStore({
+          profileId: params.profileId,
+          refreshed: refreshedCred,
+        });
+      }
+    }
+  }
+


Mirror-to-main outside the lock creates a cross-process race window

The mirror happens after the file lock is released. In a cross-process scenario, a second agent could acquire the lock in the window between fs.rm(lockPath) completing (lock released) and mirrorRefreshedCredentialIntoMainStore writing to the main store. That agent's inside-lock main-store recheck would see stale credentials and proceed to an HTTP refresh — reproducing the refresh_token_reused error this PR fixes.

The in-process refreshQueues serialization closes this gap for same-process agents (the mirror runs synchronously before the queue gate is released), but it doesn't help when agents live in separate OS processes. The window is very small (microseconds of synchronous JS between the await fs.rm completing and saveAuthProfileStore writing to main), but it exists.

Moving the mirror inside the withFileLock callback would eliminate the race entirely at the cost of holding the lock a little longer. Given the synchronous nature of mirrorRefreshedCredentialIntoMainStore, the hold-time increase is negligible:

// Inside withFileLock callback, after successful refresh: store.profiles[params.profileId] = { ...cred, ...result.newCredentials, type: "oauth" }; saveAuthProfileStore(store, params.agentDir); // Mirror to main while the lock is still held: if (params.agentDir) { mirrorRefreshedCredentialIntoMainStore({ profileId: params.profileId, refreshed: { ...store.profiles[params.profileId] as OAuthCredential } }); } return result;

The existing fallback (isRefreshTokenReusedError recovery path) does mitigate the impact of any race that slips through, so this is not blocking.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/auth-profiles/oauth.ts Line: 476-503 Comment: **Mirror-to-main outside the lock creates a cross-process race window** The mirror happens after the file lock is released. In a cross-process scenario, a second agent could acquire the lock in the window between `fs.rm(lockPath)` completing (lock released) and `mirrorRefreshedCredentialIntoMainStore` writing to the main store. That agent's inside-lock main-store recheck would see stale credentials and proceed to an HTTP refresh — reproducing the `refresh_token_reused` error this PR fixes. The in-process `refreshQueues` serialization closes this gap for same-process agents (the mirror runs synchronously before the queue gate is released), but it doesn't help when agents live in separate OS processes. The window is very small (microseconds of synchronous JS between the `await fs.rm` completing and `saveAuthProfileStore` writing to main), but it exists. Moving the mirror inside the `withFileLock` callback would eliminate the race entirely at the cost of holding the lock a little longer. Given the synchronous nature of `mirrorRefreshedCredentialIntoMainStore`, the hold-time increase is negligible: ```typescript // Inside withFileLock callback, after successful refresh: store.profiles[params.profileId] = { ...cred, ...result.newCredentials, type: "oauth" }; saveAuthProfileStore(store, params.agentDir); // Mirror to main while the lock is still held: if (params.agentDir) { mirrorRefreshedCredentialIntoMainStore({ profileId: params.profileId, refreshed: { ...store.profiles[params.profileId] as OAuthCredential } }); } return result; ``` The existing fallback (`isRefreshTokenReusedError` recovery path) does mitigate the impact of any race that slips through, so this is not blocking. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-17T01:00:17Z

+// Separate from AUTH_STORE_LOCK_OPTIONS for independent tuning: this lock
+// serializes the cross-agent OAuth refresh (see issue #26322), whereas
+// AUTH_STORE_LOCK_OPTIONS guards per-store file writes. Keeping them distinct
+// lets us widen the refresh lock's timeout/retry budget without affecting the
+// hot-path auth-store writers.
+export const OAUTH_REFRESH_LOCK_OPTIONS = {
+  retries: {
+    retries: 10,
+    factor: 2,
+    minTimeout: 100,
+    maxTimeout: 10_000,
+    randomize: true,
+  },
+  stale: 30_000,
+} as const;


OAUTH_REFRESH_LOCK_OPTIONS is currently identical to AUTH_STORE_LOCK_OPTIONS

The comment explains these are kept separate for independent future tuning, which makes sense. Just worth noting for reviewers: right now the values are byte-for-byte identical, so there's no behavioral difference until one is explicitly tuned. A brief inline note like // currently same as AUTH_STORE_LOCK_OPTIONS; diverge here when needed might help future readers avoid accidentally "syncing" them back together.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/agents/auth-profiles/constants.ts Line: 26-40 Comment: **`OAUTH_REFRESH_LOCK_OPTIONS` is currently identical to `AUTH_STORE_LOCK_OPTIONS`** The comment explains these are kept separate for independent future tuning, which makes sense. Just worth noting for reviewers: right now the values are byte-for-byte identical, so there's no behavioral difference until one is explicitly tuned. A brief inline note like `// currently same as AUTH_STORE_LOCK_OPTIONS; diverge here when needed` might help future readers avoid accidentally "syncing" them back together. How can I resolve this? If you propose a fix, please make it concise.

Addresses review comments from Codex, Greptile, and Aisle on #67876: - P1 (Codex): refresh path now nests the per-agent AUTH_STORE_LOCK_OPTIONS lock INSIDE the global refresh lock, so saveAuthProfileStore calls can no longer lose writes against concurrent updateAuthProfileStoreWithLock callers. Acquisition order is always refresh -> per-store; non-refresh code only takes per-store, so no cycle. - P1 (Codex) / MED (Aisle) / P2 (Greptile): mirrorRefreshedCredentialIntoMainStore is now async and goes through updateAuthProfileStoreWithLock, so the main-store read-modify-write is serialized with other main-store writers and cannot lose updates. - P2 (Greptile): the mirror call is moved INSIDE the refresh lock. This closes the cross-process race window where a second agent could acquire the refresh lock in the microsecond gap between our lock release and our mirror write, see stale main creds, and redundantly refresh (reproducing refresh_token_reused). - HIGH (Aisle, CWE-284): mirror now requires positive identity binding before overwriting main. New isSameOAuthIdentity helper: if both sides expose accountId they must match, otherwise if both expose email they must match (case-insensitive, trimmed). Provider-only matches are no longer sufficient. Prevents a compromised sub-agent from poisoning the main store with wrong-account tokens. - P2 (Greptile): added clarifying note to OAUTH_REFRESH_LOCK_OPTIONS explaining why the values are currently byte-identical to AUTH_STORE_LOCK_OPTIONS and why they remain declared separately. - MED (Aisle) on 30s lock staleness without renewal is acknowledged but out of scope for this iteration; discussed in PR reply as follow-up. The existing refresh_token_reused recovery path mitigates any race that slips through. Also mirrors refreshed credentials in the plugin-refreshed path (not just the generic refresh path), so Anthropic/other-plugin refreshes also benefit peer agents. All 30 pre-existing auth-profiles tests still pass; the 3 new tests (concurrent-20-agents, mirror-refresh, lock-path with fuzz) still pass in isolation (the Vitest cross-file worker-state leak is a pre-existing test-harness issue, not triggered by runtime behaviour).

visionik · 2026-04-17T01:22:22Z

Pushed 34076d7 addressing the review comments. Summary:

Codex P1 — "Keep auth-store lock while refreshing credentials"

Fixed. doRefreshOAuthTokenWithLock now nests withFileLock(authPath, AUTH_STORE_LOCK_OPTIONS, …) inside the global refresh lock. Refresh writes are now serialized with updateAuthProfileStoreWithLock callers on the same agent store. Lock acquisition order is always refresh → per-store; non-refresh code only takes per-store, so no cycle.

Codex P1 / Aisle MED #3 / Greptile P2 #3 — mirror RMW without main-store lock

Fixed. mirrorRefreshedCredentialIntoMainStore is now async and goes through updateAuthProfileStoreWithLock({ agentDir: undefined, updater }). The main-store read-modify-write is now serialized with every other main-store writer.

Greptile P2 — "Mirror-to-main outside the lock creates cross-process race window"

Fixed. The mirror call is moved inside the refresh lock. The cross-process window where another agent could acquire the refresh lock between our lock release and our mirror write is closed.

Bonus: the plugin-refreshed path (non-Codex-CLI plugins e.g. Anthropic) now also mirrors, not just the generic getOAuthApiKey / chutes path.

Aisle HIGH — "Sub-agent OAuth refresh can overwrite main agent auth store (credential poisoning)" — CWE-284

Fixed. New isSameOAuthIdentity helper requires positive evidence before mirroring:

If both sides expose accountId (Codex CLI's identity signal) they must match.
Otherwise if both sides expose email they must match (trimmed, case-insensitive).
On mismatch, mirror is refused with a log.warn.

A compromised sub-agent can no longer poison the main store with wrong-account tokens. Provider-only matching is no longer sufficient for cross-agent mirroring.

Greptile P2 — `OAUTH_REFRESH_LOCK_OPTIONS` identical to `AUTH_STORE_LOCK_OPTIONS`

Addressed. Added an inline note explicitly calling out that the values are currently byte-identical but declared separately so callers can diverge here (e.g. longer stale for slow plugin-driven refreshes) without disturbing the hot-path store writers.

Aisle MED — 30s lock staleness with no renewal during long refreshes

Acknowledged; follow-up. The concern is real: if a refresh takes >30s the lock could be deemed stale and a second agent could acquire it concurrently. However, the fix requires design choices (heartbeat interval tuning, hard refresh timeout, or switch to OS advisory locks) that I don't want to make unilaterally in this PR. Three factors mitigate the impact today:

The existing isRefreshTokenReusedError recovery path (retained) re-reads the store and adopts the other refresher's fresh creds before failing.
Typical plugin-driven OAuth refreshes complete in well under 5s, comfortably below the 30s threshold.
After this PR, even if a second refresh does slip through, the main-store mirror means one of them writes a winning fresh credential that subsequent agents adopt.

If maintainers prefer, a follow-up PR can add either (a) a heartbeat that touches the lock file every 5s, or (b) a hard refresh timeout below the staleness threshold. Happy to file that separately once a direction is chosen.

Testing

30/30 pre-existing auth-profiles tests pass (oauth.test.ts, oauth.fallback-to-main-agent.test.ts, oauth.openai-codex-refresh-fallback.test.ts).
11/11 new tests pass in isolation (oauth.concurrent-20-agents.test.ts, oauth.mirror-refresh.test.ts, oauth-lock-path.test.ts with its fuzz suites).

Noting for transparency: when all three new test files are run together in the same Vitest worker with --max-workers=1, two tests flake with a cross-file mock state leak. That's a test-harness isolation issue (module-level vi.fn() state persisting across files), not a bug in the runtime behaviour — each file passes individually. I can add explicit per-file worker isolation in a follow-up if the maintainers want the in-suite run green.

/cc @HeroSizy — credit for the overall design remains in the commit trailer.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 34076d7af1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T01:26:43Z

+    maxTimeout: 10_000,
+    randomize: true,
+  },
+  stale: 30_000,


Raise OAuth refresh lock stale timeout

The new cross-agent lock uses stale: 30_000, but acquireFileLock treats any lock older than stale as reclaimable and removes it. Since this critical section performs network refresh work (refreshProviderOAuthCredentialWithPlugin / OAuth token exchange), a refresh that runs longer than 30s lets a waiting agent delete the lock and start a second refresh concurrently, reintroducing the same refresh_token_reused failure mode this change is meant to prevent.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T01:26:43Z

+  // Neither side carries identity metadata; fall back to "no evidence of
+  // mismatch". Provider equality is checked separately by the caller; this
+  // mirrors the looser behaviour of the pre-existing
+  // `adoptNewerMainOAuthCredential` gate.
+  return true;


Reject mirror writes when identity evidence is missing

isSameOAuthIdentity currently returns true whenever a strict accountId/email comparison cannot be made, including the case where the main credential has identity metadata but the incoming refreshed credential does not. That allows mirrorRefreshedCredentialIntoMainStore to overwrite a known main-account credential with an unbound credential from a sub-agent, after which peers can adopt the poisoned token from main; this defeats the CWE-284 protection the mirror path is trying to add.

Useful? React with 👍 / 👎.

Follow-up on #67876 review feedback asking for closer to 100% coverage of affected files with fuzzing. Exports the internal pure helpers that were previously tested only transitively and adds direct unit + fuzz suites. Newly exported from oauth.ts (pure functions, no side effects): - isRefreshTokenReusedError: classifies the OAuth error string - normalizeAuthIdentityToken / normalizeAuthEmailToken: trim/lowercase helpers - isSameOAuthIdentity: the CWE-284 gate that forbids cross-account mirroring New test files: - oauth-identity.test.ts (23 unit + 4 fuzz suites): - normalizeAuthIdentityToken: trim semantics, empty/whitespace handling, case preservation - normalizeAuthEmailToken: lowercasing, trim, unicode, plus-addressing - isSameOAuthIdentity: full truth table covering accountId priority, email fallback, reflexivity, symmetry, whitespace-as-absent, cross-case email insensitivity, mismatch cases - Seeded-RNG fuzz: 1000 iters each for symmetry and reflexivity; 500 iters for "distinct accountIds never share"; 500 iters for email case-insensitivity with randomly-cased valid emails - oauth-refresh-error.test.ts (11 unit + 2 fuzz suites): - Positive: canonical snake_case, mixed case, OpenAI natural-language, JSON-wrapped 401, plain-string throws, single-level and multi-level Error.cause chains, bare-string cause - Negative: unrelated auth errors, null/undefined/non-stringable values, empty messages - Seeded-RNG fuzz: 500 iters with marker embedded at random positions in random ASCII noise (randomly-cased); 500 iters asserting no false positive on marker-free messages - oauth-refresh-queue.test.ts (3 tests): - FIFO serialization of concurrent same-PID callers (asserts refresh order) - Queue gate releases even when the refresh throws (no deadlock) - resetOAuthRefreshQueuesForTest is idempotent and safe New cases in oauth.mirror-refresh.test.ts (4 tests): - Refuses to mirror when main has a non-oauth entry (protects operator api_key overrides) - Refuses to mirror when accountId mismatches (the CWE-284 scenario: sub-agent as acct-mine must not poison main's acct-other) - Refuses to mirror when main already has strictly-fresher creds (no regression of a concurrent main refresh) - Covers the plugin-refreshed branch's mirror call (Anthropic-style plugin refresh, not just generic getOAuthApiKey) All 30 pre-existing auth-profiles tests still pass. The new tests pass in isolation: - oauth-identity.test.ts: 23/23 - oauth-refresh-error.test.ts: 14/14 - oauth-refresh-queue.test.ts: 3/3 - oauth.mirror-refresh.test.ts: 6/6 - oauth.concurrent-20-agents.test.ts: 1/1 - oauth-lock-path.test.ts: 8/8 (includes 2,700 adversarial profile-id inputs) Total new tests: 55 across 5 files, with seeded-RNG fuzz suites covering ~4,500 adversarial inputs across the identity, refresh-error, and lock- path surfaces. Every new exported helper has direct unit coverage; every early-return branch in mirrorRefreshedCredentialIntoMainStore is hit by a focused test. Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

Follow-up: pushes oauth.ts coverage from 70.75% -> 72.72% statements and 68.69% -> 69.62% branches. For code added in this PR, branch coverage is effectively 100%; the remaining uncovered bits are pre-existing recovery paths not touched here. New oauth.mirror-refresh.test.ts cases: - refuses to mirror when main has a different provider for the same profileId (hits the provider-mismatch branch in the mirror updater) - mirrors when main's existing cred has a non-finite expires (hits the Number.isFinite(existing.expires) branch by seeding main with NaN) - inherits main-agent credentials via the pre-refresh adopt path when main is already fresher (hits adoptNewerMainOAuthCredential) - inherits main-agent credentials via the catch-block fallback when refresh throws after main becomes fresh (hits the previously-uncovered branch at oauth.ts:826-848 that copies main's fresh creds into the sub store after a non-refresh_token_reused refresh failure) Total auth-profiles scoped suite is now 51 tests across 9 files, all passing. Remaining uncovered in oauth.ts: - Lines 810-818: legacy-default profile fallback via suggestOAuthProfileIdForLegacyDefault; requires a specific config-wide setup that's out of scope for this PR. - Module-level init and some pre-existing reload/adopt branches covered by broader suites outside the scoped coverage run.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a884fec5a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T02:09:18Z

+  //      via updateAuthProfileStoreWithLock, CLI sync).
+  // Lock acquisition order is always refresh -> per-store; non-refresh code
+  // paths only take the per-store lock, so no cycle is possible.
+  const globalRefreshLockPath = resolveOAuthRefreshLockPath(params.profileId);


Include provider in OAuth refresh lock key

The global refresh lock and in-process queue are keyed only by profileId, so unrelated credentials that happen to share an id (for example different providers/accounts in different agent dirs) are serialized together. This commit already acknowledges that same profileId can map to different providers (oauth.mirror-refresh.test.ts exercises that case), so a slow or stuck refresh on one credential can now block another independent credential and produce file lock timeout errors. Scope the key at least by provider (and ideally by credential/store identity) so only truly shared refresh tokens contend.

Useful? React with 👍 / 👎.

suboss87

Solid fix for a real pain point. The three-layer approach (in-process queue, cross-agent file lock, stale token adoption) is well thought out. A few things I noticed reading through:

Lock directory creation on clean install. resolveOAuthRefreshLockPath returns <stateDir>/locks/oauth-refresh/sha256-<hex>, but the locks/oauth-refresh/ parent directory is never created before withFileLock is called. Compare the existing per-store path where ensureAuthStoreFile(authPath) runs before every withFileLock(authPath, ...). On a clean install the first refresh attempt could throw if the lock library doesn't handle missing parent dirs. The test suite pre-creates the state dir so this wouldn't show up there.

Adoption path skips identity validation. The inside-the-lock adoption block checks mainCred.type, mainCred.provider, and expiry, but doesn't call isSameOAuthIdentity(cred, mainCred) before writing into the agent's local store. The mirror direction (mirrorRefreshedCredentialIntoMainStore) does have that guard. In misconfigured multi-user setups where two agents share a profileId and provider but have different accountId values, the sub-agent could silently adopt the wrong credentials. Probably worth adding the same identity check for defense-in-depth.

In-process queue cleanup under burst arrivals. The chained-promise queue correctly serializes concurrent callers at enqueue time. But refreshQueues.set(key, gate) overwrites the previous entry immediately, so if A, B, C arrive in a burst, C's gate clobbers B's map entry. The if (refreshQueues.get(key) === gate) cleanup for C deletes the key while B might still be running. A late arrival D would find no prev and proceed without waiting for B. Low risk since the file lock backstops it, but the comment about preventing same-PID bypass is only fully correct for the initial burst.

isRefreshTokenReusedError has no call site in the recovery path. The function is exported and well-tested, and the PR description calls it "the gate that triggers the retry/adoption recovery path." But in the actual code the error just propagates as an unhandled throw since the fix works through prevention (collapsing N refreshes to 1) rather than retry. The test comment is a bit misleading. If prevention is the intended strategy, the docstring should say so.

Overall this looks solid. The critical path (serializing refresh so only one agent refreshes at a time) is correct and the test coverage is thorough.

… helpers Pre-existing helpers in path-resolve.ts / paths.ts (resolveAuthStorePath, resolveLegacyAuthStorePath, resolveAuthStatePath, the *ForDisplay variants, and ensureAuthStoreFile) were showing as uncovered in scoped coverage runs despite being exercised transitively by the wider auth-profile suite. The cause is a v8 coverage quirk around ESM re-exports: calls that reach these functions through the `paths.ts -> path-resolve.ts` re-export chain are not reliably attributed back to the original function bodies. This test imports each helper DIRECTLY from ./path-resolve.js (and ensureAuthStoreFile from ./paths.js) and exercises every branch, forcing proper v8 attribution. Coverage movement on the touched surface: - paths.ts: 50% -> 100% - path-resolve.ts: 33% -> 83% (remaining 2 uncovered lines are the same attribution artifact on resolveAuthStorePath's body and cannot be eliminated without restructuring the re-export module) - Overall (oauth.ts + path-resolve.ts + paths.ts + constants.ts): 71.68% -> 74.55% statements. Adds 11 new tests. Total auth-profiles scoped suite is now 62 tests across 10 files, all passing.

…d lock, refresh timeout) Three issues flagged by Codex after the previous review fix landed: P1 \u2014 constants.ts:45 \u2014 Raise OAuth refresh lock stale timeout: With stale=30_000, a refresh that exceeds 30s would let a waiting agent reclaim the lock and start a second concurrent refresh, re-introducing the refresh_token_reused race. Raised stale to 180_000 (3min) and added a hard upper bound OAUTH_REFRESH_CALL_TIMEOUT_MS=120_000 on the refresh call itself, enforced via a new withRefreshCallTimeout helper that wraps refreshProviderOAuthCredentialWithPlugin, refreshChutesTokens, and getOAuthApiKey. The invariant OAUTH_REFRESH_CALL_TIMEOUT_MS < stale guarantees that a legitimate refresh cannot be considered stale by a peer while it is still doing work. P1 \u2014 oauth.ts:313 \u2014 Reject mirror writes when identity evidence is missing: The previous isSameOAuthIdentity returned true whenever a strict accountId/email comparison could not be made, including the asymmetric case where main carried identity metadata but the incoming credential did not. That let a sub-agent with a bare refresh response overwrite a known-account main credential. Tightened the rule: asymmetric identity evidence now refuses; both sides must present identity in a shared field (accountId wins over email when both present). Neither-side-has-identity still returns true (no evidence of mismatch; provider equality is checked separately). P2 \u2014 oauth.ts:386 \u2014 Include provider in OAuth refresh lock key: resolveOAuthRefreshLockPath was keyed on sha256(profileId) alone, so two profiles that happen to share a profileId across providers would needlessly serialize and risk file-lock timeouts. Widened the signature to resolveOAuthRefreshLockPath(provider, profileId); the hash now uses a NUL-separated composite (`provider\0profileId`) so concat-boundary collisions are impossible. refreshQueues in-process key mirrors the same shape via a new refreshQueueKey helper. Tests added/updated: - oauth-identity.test.ts: flipped one previously-passing case to match the stricter rule; added 4 new cases covering asymmetric-identity refusal (main has accountId only vs incoming empty; main has email only vs incoming empty; bidirectional symmetry of those; main and incoming carry identity in non-overlapping fields). Fuzz invariants (symmetry, reflexivity, distinct-accountId-never-matches, email case- insensitivity) still hold. - oauth-lock-path.test.ts: updated all call sites to the new 2-arg signature; added dedicated tests that (a) distinct providers for the same profileId produce distinct paths, (b) concat-boundary collision resistance via the NUL separator, and (c) fuzz suites for holding-provider-fixed / holding-profile-fixed uniqueness over 1500 random inputs. - oauth-refresh-timeout.test.ts (new): invariant tests that pin the relationship between OAUTH_REFRESH_CALL_TIMEOUT_MS and OAUTH_REFRESH_LOCK_OPTIONS.stale: strict-less-than, \u226530s floor, \u226530s safety margin, stale > 60s floor. Behavioural timeout firing is not exercised here because driving it under Vitest fake timers deadlocks against the real file-lock I/O; a regression in timeout wiring would be caught by oauth.concurrent-20-agents.test.ts. All previously-passing tests remain green (30/30 pre-existing + all new suites). Total auth-profiles test count: 66+ across 11 files. Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

visionik · 2026-04-17T03:52:15Z

Pushed 186228f addressing all three new Codex P1/P2 review points.

P1 — "Raise OAuth refresh lock stale timeout"

Fixed. Two-part change:

OAUTH_REFRESH_LOCK_OPTIONS.stale: 30_000 → 180_000 (3 min)
New OAUTH_REFRESH_CALL_TIMEOUT_MS = 120_000 hard deadline on the refresh call itself, enforced via a new withRefreshCallTimeout helper wrapping every HTTP path: refreshProviderOAuthCredentialWithPlugin (both external-managed and generic branches), refreshChutesTokens, and getOAuthApiKey.

Invariant: OAUTH_REFRESH_CALL_TIMEOUT_MS < OAUTH_REFRESH_LOCK_OPTIONS.stale. Documented in constants.ts and pinned by new invariant tests in oauth-refresh-timeout.test.ts (lessThan, ≥30s floor, ≥30s safety margin, >60s stale floor). A future edit that weakens the gap would fail these tests.

Behavioural end-to-end timeout tests were attempted with vi.useFakeTimers but deadlock against the real file-lock I/O inside the nested refresh lock callbacks; a regression in timeout wiring would be caught by the existing oauth.concurrent-20-agents.test.ts (a stuck refresh would hang the suite).

P1 — "Reject mirror writes when identity evidence is missing"

Fixed. isSameOAuthIdentity rewritten with a strict rule:

Asymmetric refuses: if one side carries identity (accountId or email) and the other doesn't, return false.
Both carry identity: at least one shared field must match (accountId wins over email when both present); non-overlapping fields also refuse.
Neither carries identity: return true (no evidence of mismatch; provider equality is checked separately by the caller).

The previous permissive return true fallback that triggered Codex's concern is gone.

Test coverage in oauth-identity.test.ts expanded to 27 tests + 4 seeded-RNG fuzz suites:

Asymmetric-refuse: main-accountId-vs-empty, main-email-vs-empty, empty-vs-main, non-overlapping fields.
Fuzz invariants (symmetry, reflexivity, distinct-accountId-never-matches, email case-insensitivity) still hold under the stricter rule.
One pre-existing case was flipped where the scenario genuinely was a positive-identity match (main carries accountId+email, incoming carries matching email) — that's a valid match under the strict rule; the flip to true is correct.

P2 — "Include provider in OAuth refresh lock key"

Fixed. resolveOAuthRefreshLockPath signature widened from (profileId) to (provider, profileId); the hash input is provider\0profileId so concat-boundary collisions are impossible. The in-process refreshQueues map key mirrors the same shape via a new refreshQueueKey(provider, profileId) helper.

All three call sites of refreshOAuthTokenWithLock updated to pass cred.provider.

Test coverage in oauth-lock-path.test.ts expanded to 12 tests + 4 fuzz suites:

Distinct providers for the same profileId produce distinct paths.
Concat-boundary: ("a", "b:c") ≠ ("a:b", "c").
Fuzz: holding-provider-fixed and holding-profile-fixed uniqueness over 1500 inputs.
Existing path-safety invariants (sha256-hex64 basename, no separators or .., length bound, inside <stateDir>/locks/oauth-refresh) retained and now parameterized over random providers too.

Tests now on branch

File	Tests
oauth.test.ts (existing)	15
oauth.fallback-to-main-agent.test.ts (existing)	7
oauth.openai-codex-refresh-fallback.test.ts (existing)	8
oauth.concurrent-20-agents.test.ts	1
oauth.mirror-refresh.test.ts	10
oauth-lock-path.test.ts	12
oauth-identity.test.ts	27
oauth-refresh-error.test.ts	14
oauth-refresh-queue.test.ts	3
oauth-refresh-timeout.test.ts (new)	4
paths-direct-import.test.ts	11
Total	112

All pass in isolation. No regressions to the 30 pre-existing auth-profiles tests.

/cc @HeroSizy

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 186228f65b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T03:56:35Z

+    retries: 10,
+    factor: 2,
+    minTimeout: 100,
+    maxTimeout: 10_000,
+    randomize: true,


Increase refresh-lock retry budget past refresh timeout

withFileLock stops waiting once retries are exhausted, independent of stale. With retries: 10, minTimeout: 100, factor: 2, maxTimeout: 10000, waiters give up after roughly 43–85 seconds, but OAUTH_REFRESH_CALL_TIMEOUT_MS allows the owner to legitimately run for 120 seconds. In a slow (but successful) 90–120s token exchange, peer agents will throw file lock timeout and fail auth even though the in-flight refresh is still valid and about to complete.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T03:56:35Z

+      timeoutHandle = setTimeout(() => {
+        reject(new Error(`OAuth refresh call "${label}" exceeded hard timeout (${timeoutMs}ms)`));
+      }, timeoutMs);
+      fn().then(resolve, reject);


Abort timed refresh work instead of only rejecting wrapper

This timeout helper rejects after timeoutMs but does not cancel fn(), so the underlying refresh request can continue after the lock owner has already unwound and released locks. If that late request succeeds server-side (rotating refresh token) after the caller has already treated it as failure, a following agent can refresh with stale credentials and hit refresh_token_reused, recreating the race this change is trying to prevent under slow upstream conditions.

Useful? React with 👍 / 👎.

@suboss87

…st test Addresses three substantive observations from @suboss87's review that were missed in the earlier review passes (no inline comments, review was posted as a top-level COMMENTED review). 1. Adoption path skips identity validation \u2014 FIXED. All three adopt sites in oauth.ts now call isSameOAuthIdentity(sub, main) as a defense-in-depth gate: - adoptNewerMainOAuthCredential (pre-refresh adopt, called from resolveApiKeyForProfile before any refresh attempt) - inside-the-lock main adoption inside doRefreshOAuthTokenWithLock (the fast path that collapses N refreshes into 1) - catch-block main-inherit fallback at the end of resolveApiKeyForProfile (used when a non-refresh_token_reused refresh error fires and main happens to have fresh creds) Previously these sites only checked provider equality. In a misconfigured multi-user setup where two agents share a profileId/provider but have different accountId values, a sub-agent could silently adopt a foreign credential. Now the gate matches the mirror direction, closing the CWE-284 surface symmetrically. 2. In-process queue cleanup under burst arrivals \u2014 TESTED. Added a 10-caller burst stress test that asserts (a) startOrder === endOrder (FIFO: no caller completes before any predecessor) and (b) maxInFlight === 1 (no two refresh calls ever run concurrently). This pins the invariant that the refreshQueues.set(key, gate) overwrite pattern only advances the map's tail pointer; FIFO is actually enforced by the `await prev` chain, which the test now demonstrates observably. 3. Lock directory creation on clean install \u2014 TESTED. Added a clean-install test that verifies resolveOAuthRefreshLockPath works when <stateDir>/locks/oauth-refresh does not yet exist. The resolver itself is pure (no side effects); the directory is created lazily by withFileLock via fs.mkdir(..., recursive: true) in acquireFileLock \u2014 that guarantee is now pinned by a test so a future change removing it would fail loudly. 4. isRefreshTokenReusedError unused \u2014 NOT FIXED (reviewer mistaken). The function IS called in the catch block of resolveApiKeyForProfile (line ~861): `if (isRefreshTokenReusedError(error) && ... )` gates the credential-reload-and-retry recovery path. Reply noting this will be posted separately. New tests (8 added across 3 files): - oauth.adopt-identity.test.ts (new file, 3 tests): one per adopt site, each verifying a mismatched accountId refuses the adoption and keeps the sub-agent on its own credential. - oauth-lock-path.test.ts (+1): clean-install invariant for a fresh state dir with no locks hierarchy. - oauth-refresh-queue.test.ts (+1): 10-caller FIFO/no-overlap burst. All pre-existing tests still pass (30/30 auth-profile regression + 10/10 mirror). The stricter identity gate is backward-compatible for the #26322 production case: Codex CLI always returns accountId in the refresh response, and the pre-existing main-store setup for that case also carries accountId, so the gate matches and adoption proceeds as before. Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

visionik · 2026-04-17T04:03:23Z

@suboss87 thanks for the thorough review — sorry I missed it on the first pass, your review was posted as a top-level review without inline anchors so it didn't surface in my initial grep. Pushed e28a752 addressing three of the four items.

Adoption path skips identity validation — FIXED

Good catch. Added isSameOAuthIdentity to all three adopt sites so the CWE-284 gate is symmetric:

adoptNewerMainOAuthCredential (pre-refresh adopt)
Inside-the-lock main adoption inside doRefreshOAuthTokenWithLock
Catch-block main-inherit fallback

New test file oauth.adopt-identity.test.ts exercises each site with mismatched accountId on main vs. sub and asserts the adoption is refused. 3/3 passing.

In-process queue cleanup under burst arrivals — TESTED

I believe the ordering is actually correct — the refreshQueues.set(key, gate) only advances the map's tail pointer; FIFO is enforced by the await prev chain, so caller C's finally (and cleanup) can only run after its doRefresh body has, which in turn can only start after caller B's finally released its gate. So C's cleanup running while B is still in-flight is unreachable.

But the invariant is subtle and deserves an observable test, which is now in oauth-refresh-queue.test.ts: 10 concurrent callers, asserts (a) startOrder === endOrder (no out-of-order completion) and (b) maxInFlight === 1 (no two refreshes overlap). Passes.

Lock directory creation on clean install — TESTED

You're right that the test harness pre-creates the state dir. In practice acquireFileLock in src/plugin-sdk/file-lock.ts already creates missing parent dirs via fs.mkdir(dir, { recursive: true }) in resolveNormalizedFilePath — so clean-install does work, but the guarantee was implicit. Added a clean-install test in oauth-lock-path.test.ts that asserts resolveOAuthRefreshLockPath returns a valid path when the locks/oauth-refresh dir doesn't yet exist, so a future change removing that guarantee in the lock layer would fail loudly.

`isRefreshTokenReusedError` has no call site — respectful pushback

It IS called, at src/agents/auth-profiles/oauth.ts:861 in the catch block of resolveApiKeyForProfile:

if (
  isRefreshTokenReusedError(error) &&
  refreshed?.type === "oauth" &&
  refreshed.provider === cred.provider &&
  hasOAuthCredentialChanged(cred, refreshed)
) {
  const recovered = await loadFreshStoredOAuthCredential({ ... });
  if (recovered) { return ... }
  const retried = await refreshOAuthTokenWithLock({ ... });
  ...
}

This is the reload-and-retry path that catches the residual case where refresh_token_reused slips past the global lock (e.g. cross-machine agents on a shared FS, or stale-lock reclaim). The test comment describing it as "the gate that triggers the retry/adoption recovery path" is accurate — it is exactly that gate.

Happy to add more docs around it if the behaviour feels surprising — let me know.

Test inventory after this commit

File	Tests
oauth.test.ts (existing)	15
oauth.fallback-to-main-agent.test.ts (existing)	7
oauth.openai-codex-refresh-fallback.test.ts (existing)	8
oauth.concurrent-20-agents.test.ts	1
oauth.mirror-refresh.test.ts	10
oauth.adopt-identity.test.ts (new)	3
oauth-lock-path.test.ts	13
oauth-identity.test.ts	27
oauth-refresh-error.test.ts	14
oauth-refresh-queue.test.ts	4
oauth-refresh-timeout.test.ts	4
paths-direct-import.test.ts	11
Total	117

All pass in isolation. 30/30 pre-existing auth-profile tests still pass.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e28a752d57

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T04:10:22Z

+  if (aHasIdentity !== bHasIdentity) {
+    return false;


Relax asymmetric identity rejection for credential adoption

The new isSameOAuthIdentity rule rejects any asymmetric metadata (aHasIdentity !== bHasIdentity), which now blocks all three main-credential adoption paths that call it (adoptNewerMainOAuthCredential, inside-lock main adoption, and catch-block main fallback). In a common upgrade state where an older sub-agent credential has no accountId/email but the fresh main credential does, adoption is refused even when both are the same account, so the sub-agent keeps trying to refresh with a stale token and can still fail with refresh_token_reused instead of reusing the already-fresh main token. This regression was introduced by applying the strict mirror-write identity rule to adoption flows as well.

Useful? React with 👍 / 👎.

…riants Addresses Codex P1 'Relax asymmetric identity rejection for credential adoption' (PR 67876 comment on oauth.ts:370). The previous commit applied the strict mirror-write identity rule to all three adoption paths too; that over-reaches for a common upgrade scenario: * An older sub-agent credential has no accountId/email because it was stored before identity capture. * The fresh main credential it would adopt has accountId (Codex CLI always populates it on recent refreshes). * Strict rule refused the adoption as asymmetric. * Sub-agent fell through to its own HTTP refresh with a stale token and could still hit refresh_token_reused \u2014 the exact failure mode #26322 exists to prevent. Fix: split the gate into two named helpers. isSameOAuthIdentity(strict) used for mirror (sub -> main) isOAuthIdentityCompatible(relaxed) used for adopt (main -> sub) The relaxed rule refuses only when both sides expose identity AND it POSITIVELY MISMATCHES (accountId differs; or emails differ when neither side has accountId). Absence of identity on one side is no longer grounds to refuse adoption. The CWE-284 cross-account-leak protection is preserved because that scenario trips the positive- mismatch branch on both gates. Swapped the three adopt sites: * adoptNewerMainOAuthCredential (pre-refresh adopt) * inside-the-lock main adoption inside doRefreshOAuthTokenWithLock * catch-block main-inherit fallback Mirror (mirrorRefreshedCredentialIntoMainStore) keeps the strict rule unchanged. Tests: - oauth-identity.test.ts (+20 tests): truth-table for isOAuthIdentityCompatible covering positive matches, upgrade tolerance (four sub-has-no-identity variants), positive mismatch still refuses, normalization, reflexivity/ symmetry. Plus 4 fuzz suites (symmetry, reflexivity, distinct-accountId- always-refuses, and a cross-rule property test asserting 'if isSameOAuthIdentity accepts, isOAuthIdentityCompatible must also'). - oauth.adopt-identity.test.ts (+3 tests): one per adopt site confirming the sub-has-no-identity / main-has-identity upgrade scenario now succeeds without a spurious refresh. Cross-account mismatch tests (the previously-added 3 refuse cases) still pass because positive accountId mismatch trips both gates identically. All regressions pass: - oauth-identity.test.ts: 47/47 (was 27) - oauth.adopt-identity.test.ts: 6/6 (was 3) - oauth.mirror-refresh.test.ts: 10/10 (strict mirror rule preserved) - oauth.test.ts + fallback-to-main + openai-codex-refresh-fallback: 30/30 Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

visionik · 2026-04-17T04:22:17Z

Pushed c6b7627 addressing the "Relax asymmetric identity rejection for credential adoption" P1.

Root cause acknowledgement

The previous commit (e28a752) added isSameOAuthIdentity to all three adopt sites, which over-applied the strict mirror-write rule. The result was an upgrade-compat regression: a sub-agent whose stored credential predates accountId/email capture would refuse to adopt a fresh main credential that does have identity metadata, fall through to its own HTTP refresh with a stale token, and potentially hit the refresh_token_reused failure this PR is built to prevent. Caught cleanly — thank you.

Fix

Split the gate into two named helpers with well-defined asymmetry:

// Strict. Used for mirror writes (sub → main), the direction where
// a compromised/misconfigured sub-agent could poison the main store.
// Asymmetric identity (one side has it, the other doesn't) → refuse.
export function isSameOAuthIdentity(a, b): boolean { /* strict */ }

// Relaxed. Used for credential adoption (main → sub), the direction
// where main is the authoritative source. Refuses ONLY on positive
// mismatch evidence (both sides expose identity AND it differs).
export function isOAuthIdentityCompatible(a, b): boolean { /* relaxed */ }

Swapped the three adopt sites to the relaxed variant:

adoptNewerMainOAuthCredential (pre-refresh adopt)
Inside-the-lock main adoption inside doRefreshOAuthTokenWithLock
Catch-block main-inherit fallback

Mirror (mirrorRefreshedCredentialIntoMainStore) keeps the strict rule — the CWE-284 credential-poisoning surface is unchanged there.

Why the two asymmetries are correct

Scenario	Strict (mirror)	Relaxed (adopt)
Both have matching accountId	accept	accept
Both have mismatched accountId	refuse (leak)	refuse (leak)
Sub no identity, main has accountId	refuse (poison risk)	accept (upgrade)
Main no identity, sub has accountId	refuse	accept
Neither has identity	accept	accept
One has accountId, other has only email (non-overlapping)	refuse	accept (no positive-mismatch evidence)

The upgrade tolerance appears only in one direction — the direction where main is authoritative — so this doesn't re-open the credential-poisoning surface.

Test changes

oauth-identity.test.ts: 47 tests now (up from 27). New 20-test truth-table for isOAuthIdentityCompatible plus a 4-case fuzz suite. One of the fuzz cases is a cross-rule property test asserting "if isSameOAuthIdentity accepts, isOAuthIdentityCompatible must also" — the strict rule is a subset of the relaxed rule over 1,000 random inputs.
oauth.adopt-identity.test.ts: 6 tests now (up from 3). Added one upgrade-scenario test per adopt site verifying sub-with-no-identity can now adopt main-with-accountId and NO spurious refresh is triggered.
Positive-mismatch refuse tests (suboss87's scenario, where both sides have accountId but they differ) still pass under the relaxed rule — positive mismatch trips both gates identically.

Regression coverage

Suite	Before	After	Δ
`oauth.test.ts` + fallback + openai-codex-refresh	30	30	unchanged
`oauth-identity.test.ts`	27	47	+20
`oauth.adopt-identity.test.ts`	3	6	+3
`oauth.mirror-refresh.test.ts`	10	10	unchanged (strict rule still applies there)

All pass in isolation.

Commit

c6b7627 on fix/oauth-refresh-cross-agent-lock.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6b762778d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T04:27:28Z

+  // Asymmetric identity evidence — refuse. We cannot prove the two
+  // credentials belong to the same account.
+  if (aHasIdentity !== bHasIdentity) {
+    return false;


Allow mirror update when only incoming credential has identity

The strict asymmetric check in isSameOAuthIdentity makes mirrorRefreshedCredentialIntoMainStore refuse writes whenever the existing main credential has no accountId/email but the freshly refreshed sub-agent credential does. In upgrade states where main still has pre-capture OAuth records, this prevents main-store mirroring after the first sub-agent refresh; subsequent agents then cannot adopt from main in the inside-lock recheck and keep refreshing with stale refresh tokens, reproducing refresh_token_reused failures under multi-agent load.

Useful? React with 👍 / 👎.

…p-refusing) Addresses Codex P1 'Allow mirror update when only incoming credential has identity' (PR 67876 comment on oauth.ts:369). The strict isSameOAuthIdentity rule used for the mirror direction also refused the symmetric upgrade case \u2014 main store has a pre-capture OAuth record (no accountId), sub-agent's fresh refresh response carries accountId. Under the strict rule the mirror refused, main stayed stale, and subsequent peers could not adopt from it in the inside-lock recheck, keeping the refresh_token_reused storm alive. The mirror direction cannot use the relaxed isOAuthIdentityCompatible gate either \u2014 that would allow identity-regression writes (main has accountId, incoming doesn't), which later lets a wrong-account sub-agent pass the relaxed adoption gate and inherit credentials that don't belong to it. Fix: introduce a third gate, isSafeToMirrorOAuthIdentity, strictly between the other two. Rule: 1. no positive identity mismatch (same as isOAuthIdentityCompatible) 2. incoming must carry at least as much identity as existing \u2014 refuses any write that would drop an accountId or email main already has. Swapped mirrorRefreshedCredentialIntoMainStore from isSameOAuthIdentity to isSafeToMirrorOAuthIdentity. The two existing gates stay unchanged: - isSameOAuthIdentity: still used internally by tests as the strict ref - isOAuthIdentityCompatible: still used by the three adopt sites Truth table across the three gates (existing vs incoming identity): match accountIds all three accept mismatch accountIds all three refuse both lack identity all three accept existing none, incoming has strict refuses; mirror-safe + compat accept existing has, incoming none strict + mirror-safe refuse; compat accepts non-overlapping fields strict + mirror-safe refuse; compat accepts Tests: - oauth-identity.test.ts (+13): full truth table for isSafeToMirrorOAuthIdentity plus two relationship tests pinning the ordering (strict \u2264 mirror-safe \u2264 compatible). Now 60 tests total. - oauth.mirror-refresh.test.ts (+2): integration tests for the two formerly-mis-handled cases \u2014 upgrade accepted, regression refused. Now 12 tests total. Regression coverage: - oauth.test.ts + fallback + openai-codex-refresh: 30/30 unchanged - oauth.adopt-identity.test.ts: 6/6 unchanged (adopt path uses compat gate) - oauth.mirror-refresh.test.ts: 12/12 (previously 10; mismatch case still refuses under new rule because positive-mismatch trips both gates) Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

visionik · 2026-04-17T04:41:40Z

Pushed a372790 addressing the "Allow mirror update when only incoming credential has identity" P1.

You're right — the mirror direction had a symmetric version of the same upgrade-regression bug. A pre-capture main (no accountId) + a fresh sub-agent refresh response that does carry accountId was being refused, main stayed stale, and peer agents couldn't adopt from it → refresh_token_reused storm kept firing.

Why the mirror can't just use the relaxed adopt gate

Tempting to unify on isOAuthIdentityCompatible everywhere, but that reopens a different hole: if main has accountId=alice and the incoming mirror drops it (no-accountId refresh response, somehow), main loses its identity marker. A later acct-bob sub-agent with no identity would then pass the relaxed adopt check against identity-less main and inherit alice's token. That's exactly the cross-account leak suboss87 flagged.

So the mirror direction needs a third rule: no positive mismatch AND no identity regression.

The fix: `isSafeToMirrorOAuthIdentity`

Strictly between the two existing gates. Rule:

isOAuthIdentityCompatible(existing, incoming) must pass (no positive mismatch).
Every identity field present on existing must also be present on incoming (no drops).

Swapped mirrorRefreshedCredentialIntoMainStore from isSameOAuthIdentity to isSafeToMirrorOAuthIdentity. The two existing gates keep their roles:

Gate	Direction	Asymmetric cases
`isSameOAuthIdentity` (strict)	—	kept for tests/docs as the pure-symmetric reference
`isSafeToMirrorOAuthIdentity` (new)	mirror (sub → main)	accept upgrades, refuse regressions
`isOAuthIdentityCompatible` (relaxed)	adopt (main → sub)	accept all asymmetric cases

Full truth table across all three gates

existing vs incoming	strict	mirror-safe	compatible
accountId match	✓	✓	✓
accountId mismatch	✗	✗	✗
both have no identity	✓	✓	✓
existing none, incoming has	✗	✓ (upgrade)	✓
existing has, incoming none	✗	✗ (regression)	✓
non-overlapping fields (accountId vs email)	✗	✗	✓

Relationship pinned by tests: strict ≤ mirror-safe ≤ compatible (each is a subset of the next).

Test changes

oauth-identity.test.ts: 60 tests now (up from 47). Full truth table for isSafeToMirrorOAuthIdentity (13 new tests) including explicit "strictly stricter than compatible, strictly more permissive than strict" relationship tests.
oauth.mirror-refresh.test.ts: 12 tests now (up from 10). Added integration tests for both the upgrade case (formerly refused, now accepted) and the regression case (main has identity, incoming drops it — still refused).
Pre-existing mirror tests (positive-mismatch refuse, non-oauth refuse, fresher-expires refuse, plugin-refresh mirror) all still pass — positive mismatch trips both the old strict rule and the new mirror-safe rule identically.

Regression tally

Suite	Before	After
`oauth-identity.test.ts`	47	60
`oauth.mirror-refresh.test.ts`	10	12
`oauth.adopt-identity.test.ts`	6	6
30 pre-existing	30/30	30/30

All pass in isolation.

Commit

a372790 on fix/oauth-refresh-cross-agent-lock.

…sive adopt Addresses maintainer review feedback on adoption-gate permissiveness and the overstated wording on withRefreshCallTimeout. Primary: the relaxed isOAuthIdentityCompatible rule was too permissive. It returned true whenever there was no shared comparable identity field, which meant the non-overlapping-fields case (sub has only accountId, main has only email, or vice versa) passed the gate without a positive identity match. The three adoption sites (adoptNewerMainOAuthCredential, inside-lock main adoption, catch-block main-inherit fallback) would then copy credentials across the gate with no evidence that the two sides belonged to the same account. The unit tests at oauth-identity.test.ts:224-228 locked that behaviour in. The reviewer's narrower rule is identical to the mirror-direction rule I introduced in the previous commit (isSafeToMirrorOAuthIdentity). Since both directions need exactly the same constraint, unifying the two functions is simpler than threading a new rule through the adopt sites. Changes: - Deleted isOAuthIdentityCompatible and isSafeToMirrorOAuthIdentity. - Introduced isSafeToCopyOAuthIdentity as the unified copy-gate: allow iff no positive mismatch on a shared field, AND incoming carries at least as much identity evidence as existing accept: matching accountId; matching email; neither side carries identity; existing has none and incoming has identity (pure upgrade). refuse: positive mismatch; incoming drops existing's identity; non-overlapping fields. - Swapped all four call sites (1 mirror + 3 adopt) to the unified gate. - isSameOAuthIdentity remains exported as the strict reference for documentation and cross-rule property tests. Secondary: updated the withRefreshCallTimeout doc-comment to honestly describe what the helper does. Previous wording called it a 'hard timeout', which overstated the guarantee: the wrapper rejects after the deadline and our caller releases locks, but the underlying fn() keeps running because JS promises aren't cancellable and the pi-ai OAuth stack does not accept an AbortSignal. The isRefreshTokenReusedError recovery path remains the backstop for any residual race; a proper cancel requires plumbing AbortSignal through the refresh stack and is tracked as a follow-up. Tests: - oauth-identity.test.ts (51 tests, was 60 with 3 different functions): clean single truth-table for the unified gate plus cross-rule property tests (strict accepts -> unified accepts; relaxation only in the pure- upgrade direction). Fuzz: reflexivity, distinct-accountId-never- accepts, strict->unified monotonicity, same-accountId-always-accepts. - oauth.adopt-identity.test.ts (9 tests, was 6): added ONE reviewer- requested non-overlapping-identity refuse test PER adopt site (pre-refresh, inside-lock, catch-block). Three sites => three tests. - oauth.mirror-refresh.test.ts unchanged (12 tests), still passes under the unified gate because the gate is identical to the previous isSafeToMirrorOAuthIdentity. - 30/30 pre-existing auth-profiles regression suite unchanged. Co-authored-by: HeroSizy <HeroSizy@users.noreply.github.com>

visionik · 2026-04-17T05:09:08Z

Thanks for verifying the primary race claim against origin/main and for the narrower rule spec. Pushed 5cc7162 addressing both points.

Primary: adoption gate was too permissive

Fixed. You spotted that isOAuthIdentityCompatible returned true whenever there was no shared comparable field, which let the non-overlapping-fields case (sub has only accountId, main has only email, or vice versa) pass the gate without any positive-match evidence. The unit test at oauth-identity.test.ts:224-228 locked that bad behaviour in.

Your narrower rule is identical to the mirror-direction gate I added in the previous commit (isSafeToMirrorOAuthIdentity). Rather than maintain two near-duplicate functions, I unified them:

Deleted isOAuthIdentityCompatible and isSafeToMirrorOAuthIdentity.
Introduced isSafeToCopyOAuthIdentity as the single copy-direction gate used by all four sites (1 mirror + 3 adopt).
Kept isSameOAuthIdentity as the strict-symmetric reference for cross-rule property tests and documentation.

Rule for isSafeToCopyOAuthIdentity (allow iff):

No positive identity mismatch on a shared field.
Incoming carries at least as much identity evidence as existing.

Truth table:

existing → incoming	strict	unified (new)
matching accountId	✓	✓
mismatched accountId	✗	✗
both no identity	✓	✓
none → accountId (upgrade)	✗	✓
accountId → none (regression)	✗	✗
accountId-only ↔ email-only (non-overlap)	✗	✗

All four call sites (pre-refresh adopt, inside-lock adopt, catch-block main-inherit, and mirror-to-main) now go through this single gate.

Tests

oauth-identity.test.ts: 51 tests total. Single clean truth-table for isSafeToCopyOAuthIdentity replacing the previous two overlapping sets. Relationship tests pin isSameOAuthIdentity ⊆ isSafeToCopyOAuthIdentity and verify the relaxation happens only in the pure-upgrade direction. Fuzz includes a monotonicity property: if strict accepts, unified accepts (1,000 random pairs).
oauth.adopt-identity.test.ts: +3 reviewer-requested regression tests, one per adopt site, for the non-overlapping-fields case. 9 tests total now.
oauth.mirror-refresh.test.ts: unchanged (12 tests). The mirror tests already passed under the previous isSafeToMirrorOAuthIdentity; since the unified gate has the same rule, they keep passing.

Secondary: `withRefreshCallTimeout` docstring overstated the guarantee

Fixed the docstring honestly. You're right that the helper rejects our promise after the deadline and releases our locks, but cannot cancel the underlying fn() — JS promises aren't cancellable and the pi-ai OAuth stack doesn't accept an AbortSignal today. The word "hard timeout" was misleading. The doc-comment now reads:

LIMITATION: this does NOT cancel the underlying work. […] When the
deadline fires the caller moves on and releases its file lock, but the
original fn() promise keeps running in the background. That means a
slow upstream refresh could still burn a refresh token well after we
have given up on it, and a waiting peer that has now taken the lock
may hit refresh_token_reused.

The existing isRefreshTokenReusedError recovery path is the backstop
for that residual case — it reloads from the main store and adopts if
another agent's refresh has since landed. A fuller fix requires
plumbing AbortSignal through the refresh stack into the HTTP client;
tracked as a follow-up.

Agreed that actual AbortSignal plumbing is out of scope for the main race fix and should ship separately.

Test totals

Suite	After
oauth-identity.test.ts	51
oauth.adopt-identity.test.ts	9
oauth.mirror-refresh.test.ts	12
oauth.test.ts + fallback + codex-refresh (pre-existing)	30/30

All pass in isolation.

Commit

5cc7162 on fix/oauth-refresh-cross-agent-lock.

visionik · 2026-04-17T05:25:44Z

Re-reviewed the current PR state. The core #26322 bug is real on origin/main: refresh
locking is scoped to agentDir, so shared-profile agents do not actually serialize. The
cross-agent lock + in-process queue + inside-the-lock main-store recheck in this PR
targets that exact root cause.

I also rechecked the earlier identity-copy concern: current code uses
isSafeToCopyOAuthIdentity(...), and the new tests now cover the formerly dangerous
non-overlapping identity-field cases at all three adoption sites.

I reran:

pnpm test src/agents/auth-profiles/oauth.concurrent-20-agents.test.ts
src/agents/auth-profiles/oauth.mirror-refresh.test.ts
src/agents/auth-profiles/oauth-identity.test.ts
src/agents/auth-profiles/oauth.adopt-identity.test.ts
src/agents/auth-profiles/oauth-refresh-queue.test.ts
src/agents/auth-profiles/oauth-refresh-error.test.ts
src/agents/auth-profiles/oauth-lock-path.test.ts
src/agents/auth-profiles/oauth-refresh-timeout.test.ts

Result: 8 files passed, 108 tests passed.

I think this is ready to land.

Non-blocking notes:

the PR body is stale and still describes an older design iteration
withRefreshCallTimeout() still does not abort underlying work, but that limitation is
now documented and can be handled in follow-up work

HeroSizy · 2026-04-17T07:42:53Z

@vincentkoc Thanks for picking this up and porting it against the current tree! makes total sense given how far the auth-profiles/ area has moved since I opened this.

visionik requested a review from a team as a code owner April 17, 2026 00:52

openclaw-barnacle bot added agents Agent runtime and tooling size: L maintainer Maintainer-authored PR labels Apr 17, 2026

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

greptile-apps bot reviewed Apr 17, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

openclaw-barnacle bot added size: XL and removed size: L labels Apr 17, 2026

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

suboss87 reviewed Apr 17, 2026

View reviewed changes

visionik and others added 2 commits April 16, 2026 23:00

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

vincentkoc merged commit 8e79080 into main Apr 17, 2026
68 checks passed

vincentkoc deleted the fix/oauth-refresh-cross-agent-lock branch April 17, 2026 06:44

github-actions bot mentioned this pull request Apr 17, 2026

📡 Upstream Digest — 2026-04-17 07:22 UTC curtismercier/openclaw-mods#597

Open

HeroSizy mentioned this pull request Apr 17, 2026

fix: use global OAuth refresh lock to prevent cross-agent race (#26322) #51476

Closed

11 tasks

		mainStore.profiles[params.profileId] = { ...params.refreshed };
		saveAuthProfileStore(mainStore, undefined);

Uh oh!

Conversation

visionik commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Supporting constants and helpers

Relationship to earlier PRs

Tests

Files changed

Follow-ups explicitly out of scope

Uh oh!

aisle-research-bot bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Aisle Security Analysis

Description

Recommendation

Description

Recommendation

Description

Recommendation

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Apr 17, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

greptile-apps bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

visionik commented Apr 17, 2026

Codex P1 — "Keep auth-store lock while refreshing credentials"

Codex P1 / Aisle MED #3 / Greptile P2 #3 — mirror RMW without main-store lock

Greptile P2 — "Mirror-to-main outside the lock creates cross-process race window"

Aisle HIGH — "Sub-agent OAuth refresh can overwrite main agent auth store (credential poisoning)" — CWE-284

Greptile P2 — OAUTH_REFRESH_LOCK_OPTIONS identical to AUTH_STORE_LOCK_OPTIONS

Aisle MED — 30s lock staleness with no renewal during long refreshes

Testing

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

suboss87 left a comment

Choose a reason for hiding this comment

Uh oh!

visionik commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

P1 — "Raise OAuth refresh lock stale timeout"

P1 — "Reject mirror writes when identity evidence is missing"

P2 — "Include provider in OAuth refresh lock key"

Tests now on branch

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

visionik commented Apr 17, 2026 •

edited

Loading

aisle-research-bot bot commented Apr 17, 2026 •

edited

Loading

Greptile P2 — `OAUTH_REFRESH_LOCK_OPTIONS` identical to `AUTH_STORE_LOCK_OPTIONS`

visionik commented Apr 17, 2026 •

edited

Loading

visionik commented Apr 17, 2026 •

edited

Loading

`isRefreshTokenReusedError` has no call site — respectful pushback

The fix: `isSafeToMirrorOAuthIdentity`

Secondary: `withRefreshCallTimeout` docstring overstated the guarantee