Skip to content

Comments

fix(skills): refresh stale skill snapshot after gateway restart#12209

Open
mcaxtr wants to merge 5 commits intoopenclaw:mainfrom
mcaxtr:fix/12092-skills-stale-session
Open

fix(skills): refresh stale skill snapshot after gateway restart#12209
mcaxtr wants to merge 5 commits intoopenclaw:mainfrom
mcaxtr:fix/12092-skills-stale-session

Conversation

@mcaxtr
Copy link
Contributor

@mcaxtr mcaxtr commented Feb 9, 2026

Summary

  • Fix stale skills in existing sessions after gateway restart (Skills added after session creation not visible until new session #12092)
  • When the gateway restarts, the in-memory skills version resets to 0 while sessions retain snapshots from the prior process (version > 0)
  • The shouldRefreshSnapshot check required snapshotVersion > 0, so it never triggered a rebuild after restart
  • Add restart detection: when in-memory version is 0 but persisted version > 0, rebuild the snapshot

Root Cause

getSkillsSnapshotVersion() returns from in-memory state (workspaceVersions / globalVersion in refresh.ts), which resets to 0 on process restart. The comparison at session-updates.ts:147-148 was:

const shouldRefreshSnapshot =
  snapshotVersion > 0 && (nextEntry?.skillsSnapshot?.version ?? 0) < snapshotVersion;

Since snapshotVersion is 0 after restart, the condition snapshotVersion > 0 is always false, so stale snapshots are reused forever.

Fix

Extend the condition to also detect the restart scenario:

const shouldRefreshSnapshot =
  (snapshotVersion > 0 && persistedVersion < snapshotVersion) ||
  (snapshotVersion === 0 && persistedVersion > 0);

The second clause detects: "process just started (version 0) but session has a snapshot from a prior lifetime (version > 0)" → rebuild.

Test Plan

  • Write failing test reproducing the restart scenario (stale snapshot returned)
  • Confirm test fails before fix
  • Implement fix (2-line change in session-updates.ts)
  • Confirm all 4 tests pass after fix
  • pnpm build passes
  • pnpm check passes (lint + format)
  • codex review --base main returns zero issues

TDD: All 4 new tests fail before, pass after

  1. Restart scenario — in-memory version 0, persisted version > 0 → rebuilds snapshot
  2. Normal operation — in-memory version matches persisted → reuses snapshot
  3. Watcher fired — in-memory version higher than persisted → rebuilds snapshot
  4. No prior snapshot — no existing snapshot, version 0 → builds fresh

Fixes #12092

Greptile Overview

Greptile Summary

This PR updates ensureSkillSnapshot to correctly refresh a session’s persisted skills snapshot after a gateway restart. It introduces a restart-detection clause: if the in-memory snapshot version resets to 0 (fresh process) but the session already has a persisted snapshot with version > 0 from a prior process lifetime, the snapshot is rebuilt instead of reused indefinitely.

It also adds a focused Vitest suite covering:

  • restart scenario (memory version 0 + persisted > 0 → rebuild)
  • normal reuse when versions match
  • rebuild when watcher bumps in-memory version above persisted
  • edge case when sessionEntry is missing but sessionStore contains the stale snapshot
  • fresh snapshot build when no prior snapshot exists.

These changes fit into the existing skills refresh model where getSkillsSnapshotVersion() is maintained in-memory (and resets on restart), while session snapshots are persisted in the session store.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk.
  • The change is narrowly scoped to the snapshot refresh decision logic, aligns with the described restart root cause, and is covered by targeted tests for restart, normal reuse, watcher refresh, and missing-sessionEntry edge cases. No additional call sites were affected beyond ensureSkillSnapshot, and the new logic deterministically rebuilds only when the persisted snapshot is known-stale relative to the process lifetime or version bump.
  • No files require special attention

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 9, 2026

Additional Comments (1)

src/auto-reply/reply/session-updates.ts
Missed refresh on first turn

shouldRefreshSnapshot is computed from nextEntry (initially sessionEntry), but in the isFirstTurnInSession block you may actually use current = sessionStore[sessionKey] when sessionEntry is undefined. In that case persistedVersion becomes 0 even if sessionStore already has a skillsSnapshot.version > 0 from a prior process, so the restart refresh case (snapshotVersion === 0 && persistedVersion > 0) won’t trigger and the stale snapshot can be reused on the first turn.

Consider computing persistedVersion from current (the entry you actually read from) or recomputing shouldRefreshSnapshot inside the first-turn branch after current is resolved.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/session-updates.ts
Line: 186:188

Comment:
**Missed refresh on first turn**

`shouldRefreshSnapshot` is computed from `nextEntry` (initially `sessionEntry`), but in the `isFirstTurnInSession` block you may actually use `current = sessionStore[sessionKey]` when `sessionEntry` is undefined. In that case `persistedVersion` becomes `0` even if `sessionStore` already has a `skillsSnapshot.version > 0` from a prior process, so the restart refresh case (`snapshotVersion === 0 && persistedVersion > 0`) won’t trigger and the stale snapshot can be reused on the first turn.

Consider computing `persistedVersion` from `current` (the entry you actually read from) or recomputing `shouldRefreshSnapshot` inside the first-turn branch after `current` is resolved.

How can I resolve this? If you propose a fix, please make it concise.

@mcaxtr
Copy link
Contributor Author

mcaxtr commented Feb 9, 2026

Addressed in c5fc94c — good catch on the semantic inconsistency.

persistedVersion now falls back to sessionStore[sessionKey]?.skillsSnapshot?.version when sessionEntry has no snapshot. This ensures shouldRefreshSnapshot is correct even when the stale snapshot only exists in the store, not in the direct sessionEntry parameter.

In practice, the behavior was already correct for all paths:

  • First turn: isFirstTurnInSession independently forces a rebuild regardless of shouldRefreshSnapshot
  • Non-first turn with undefined sessionEntry: the fallback at line 196 rebuilds via buildWorkspaceSkillSnapshot because nextEntry?.skillsSnapshot is also undefined

But the fix makes the code robust against future refactors that might remove those independent guards. Added a dedicated test for the edge case (test 5 in the suite).

@mcaxtr
Copy link
Contributor Author

mcaxtr commented Feb 9, 2026

@greptile review

1 similar comment
@mcaxtr
Copy link
Contributor Author

mcaxtr commented Feb 9, 2026

@greptile review

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@mcaxtr
Copy link
Contributor Author

mcaxtr commented Feb 9, 2026

@greptile review

arun-dev-des added a commit to arun-dev-des/moltbot that referenced this pull request Feb 10, 2026
Fixes openclaw#13377

Sessions snapshot allowAgents permissions at creation time and never
refresh them, even after a full gateway restart. This adds version-
tracked allowAgents snapshots with dual-condition restart detection
(mirroring the skills snapshot pattern from PR openclaw#12209) so that
existing sessions pick up the current config on their next turn.
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch from 16b81b4 to 40c2467 Compare February 12, 2026 04:15
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch from 40c2467 to 02adee9 Compare February 13, 2026 02:20
@openclaw-barnacle openclaw-barnacle bot added the trusted-contributor Contributor with 4+ merged PRs label Feb 13, 2026
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch from 02adee9 to 4989f4a Compare February 13, 2026 14:35
@openclaw-barnacle openclaw-barnacle bot added the experienced-contributor Contributor with 10+ merged PRs label Feb 13, 2026
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch 9 times, most recently from e6ddb34 to 2f52958 Compare February 15, 2026 14:46
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch 2 times, most recently from b2ec290 to f562f13 Compare February 17, 2026 01:27
After a gateway restart, the in-memory skills version resets to 0 while
existing sessions retain snapshots with version > 0 from the prior process.
The shouldRefreshSnapshot check required snapshotVersion > 0, so it never
triggered a rebuild — leaving sessions with permanently stale skill lists.

Detect the restart scenario (in-memory version 0, persisted version > 0)
and rebuild the snapshot on the next message.

Fixes openclaw#12092
@mcaxtr mcaxtr force-pushed the fix/12092-skills-stale-session branch from f562f13 to d3d0a50 Compare February 19, 2026 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

experienced-contributor Contributor with 10+ merged PRs size: M trusted-contributor Contributor with 4+ merged PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skills added after session creation not visible until new session

1 participant