fix: clear cron session in DB when not cached (cold-cache after restart) #424
Fixes #365

PGSessionStore.Reset() only cleared the in-memory cache. After a server restart the cache is empty, so Reset was a no-op: the next GetOrCreate loaded the full accumulated history from DB, causing LLM tool loops from contradictory context.

Add a DB fallback: when the session isn't in cache, issue a direct UPDATE to clear messages and summary in PostgreSQL. This ensures cron sessions always start clean regardless of server restarts.

The #294 fix (Reset+Save before each cron run) already had the right intent but only worked within the same server lifetime.
@claude review
Claude finished @mrgoonie's task in 1m 36s
PR Review: fix/cron-session-reset-cold-cache
Summary
The fix is correct and addresses the root cause cleanly. Two issues are worth addressing before merge, one of them a real concurrency bug.
Issues
🔴 Lock held during DB call (
ExecContext was running under s.mu.Lock(), blocking all session cache operations during DB round-trip. Release the lock before the DB call. Also log ExecContext errors instead of silently discarding them — silent failure defeats the purpose of the cold-cache fix.
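A minimal sketch of the pattern this commit describes: unlock before the DB round-trip, and log the error rather than dropping it. The `store` type, the `execer` interface (which `*sql.DB` satisfies), the `probeDB` fake, and the table and column names are all assumptions made for illustration; they are not taken from the actual PGSessionStore code.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"sync"
)

// execer is a hypothetical seam for testing; *sql.DB satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// store is an illustrative stand-in for PGSessionStore, not its real definition.
type store struct {
	mu    sync.Mutex
	cache map[string][]string // session key -> cached messages
	db    execer
}

// Reset clears a cached session in memory, or falls back to the DB when cold.
func (s *store) Reset(ctx context.Context, key string) {
	s.mu.Lock()
	if _, ok := s.cache[key]; ok {
		s.cache[key] = nil // warm path: clear in memory and return early
		s.mu.Unlock()
		return
	}
	// Release the lock before the round-trip so cache operations are not blocked.
	s.mu.Unlock()

	// Table and column names are assumed for illustration.
	const q = `UPDATE sessions SET messages = '[]', summary = '' WHERE session_key = $1`
	if _, err := s.db.ExecContext(ctx, q, key); err != nil {
		// Log instead of discarding: a silent failure would hide a stale session.
		log.Printf("session reset: clearing %q in DB failed: %v", key, err)
	}
}

// probeDB is a fake that records whether the store's lock was free during the call.
type probeDB struct {
	s        *store
	lockFree bool
}

func (p *probeDB) ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error) {
	if p.s.mu.TryLock() { // succeeds only if Reset already released the lock
		p.lockFree = true
		p.s.mu.Unlock()
	}
	return nil, nil
}

// lockFreeDuringExec runs a cold-cache Reset and reports what the probe observed.
func lockFreeDuringExec() bool {
	s := &store{cache: map[string][]string{}}
	p := &probeDB{s: s}
	s.db = p
	s.Reset(context.Background(), "cron:job-1")
	return p.lockFree
}

func main() {
	fmt.Println("lock free during DB call:", lockFreeDuringExec())
}
```

The probe relies on `sync.Mutex.TryLock` (Go 1.18+): if `Reset` still held the lock while calling `ExecContext`, the probe would report `false`.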
…n reliability, storage

Merged 42 upstream commits from nextlevelbuilder/goclaw origin/main.

Key upstream changes included:
- feat(config): system_configs DB table with per-tenant isolation + System Settings modal
- fix: cron reliability, clear session in DB on cold-cache restart (nextlevelbuilder#294, nextlevelbuilder#424)
- refactor(security): remove cross-tenant bypass, enforce strict tenant isolation
- fix(agent): set TenantID on streaming events in runLoop emitRun
- fix: add panic recovery to tool, cron, and summarization goroutines
- fix: check errors in cron store Scan/Unmarshal to prevent data corruption
- feat(teams): smart post-turn task decision + stale detection
- fix(skills): unblock agent read_file on skill paths
- feat(storage): bump upload limit to 50MB, add file upload to Team Workspace
- feat(gateway): version update checker
- fix(channels): propagate tenant_id to pending history

Conflict resolutions:
- gateway_cron.go: adopt upstream sessionMgr.Reset()+Save() pattern; revert Save() to UPDATE-only
- sessions_list.go: revert UPSERT → UPDATE-only (upstream fix covers cold-cache via explicit Save after Reset)
- upgrade/version.go: bump RequiredSchemaVersion 32→33 (adds upstream migration 33_system_configs)
- migrations: renumber upstream 000029_system_configs → 000033_system_configs (avoids conflict with local 000029_kg_entity_embeddings)
- gateway.go: keep local projects wiring + File ACL MediaServeHandler; drop WithCrossTenant on LoadAll per security refactor
- gateway_http_handlers.go: keep local wireHTTP signature with token + projectsH return
- media_serve.go: keep local File ACL implementation (scoped token, rate limiting)
- i18n/index.ts: union both namespace sets; add upstream "system-settings"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes #365
Problem
Cron-triggered agents get stuck in tool call loops after server restarts. The LLM sees contradictory history from previous runs and enters a loop trying to reconcile.
Root cause: PGSessionStore.Reset() only cleared the in-memory cache. After a restart the cache is empty, so Reset was a silent no-op. The subsequent GetOrCreate loaded the full accumulated history from DB, defeating the reset added in #294.
Flow (before fix)
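The pre-fix behavior can be reproduced in a small, self-contained sketch. Everything here is hypothetical scaffolding: `oldStore` and the map-backed `fakeDB` stand in for PGSessionStore and the PostgreSQL table, and the session key is made up.

```go
package main

import "fmt"

// fakeDB stands in for the persisted sessions table: key -> stored messages.
type fakeDB map[string][]string

// oldStore mimics the pre-fix store: an in-memory cache in front of the DB.
type oldStore struct {
	cache map[string][]string
	db    fakeDB
}

// Reset mirrors the pre-fix behavior: it only touches the cache.
func (s *oldStore) Reset(key string) {
	if _, ok := s.cache[key]; ok {
		s.cache[key] = nil
	}
	// Cold cache: nothing is cleared, and the DB row is left untouched.
}

// GetOrCreate returns the cached session, loading it from the DB on a miss.
func (s *oldStore) GetOrCreate(key string) []string {
	if msgs, ok := s.cache[key]; ok {
		return msgs
	}
	msgs := s.db[key]
	s.cache[key] = msgs
	return msgs
}

// historyAfterRestart simulates a restart (empty cache, populated DB),
// then runs Reset followed by GetOrCreate as a cron run would.
func historyAfterRestart() int {
	db := fakeDB{"cron:job-1": {"run 1 output", "run 2 output"}}
	s := &oldStore{cache: map[string][]string{}, db: db}
	s.Reset("cron:job-1") // silent no-op: the session is not cached
	return len(s.GetOrCreate("cron:job-1"))
}

func main() {
	// Prints 2: the old history survives the Reset.
	fmt.Println("messages seen by the next cron run:", historyAfterRestart())
}
```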
Fix
When the session isn't in cache, Reset now issues a direct DB UPDATE to clear messages and summary. The warm-cache path is unchanged (early return before DB code).
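As a rough illustration of the two paths, the sketch below uses a map-backed fake in place of PostgreSQL and counts how often the "UPDATE" is issued. The names `fixedStore`, `fakeDB`, and `clear` are hypothetical, and the warm path deliberately leaves persistence to a later Save, in line with the Reset+Save pattern mentioned for #294.

```go
package main

import "fmt"

// fakeDB stands in for the sessions table and counts the UPDATEs issued.
type fakeDB struct {
	rows    map[string][]string
	updates int
}

// clear plays the role of the direct UPDATE that empties messages and summary.
func (d *fakeDB) clear(key string) {
	if _, ok := d.rows[key]; ok {
		d.rows[key] = nil
	}
	d.updates++
}

// fixedStore is an illustrative stand-in for the store after the fix.
type fixedStore struct {
	cache map[string][]string
	db    *fakeDB
}

// Reset clears the cached session, or the persisted one when the cache is cold.
func (s *fixedStore) Reset(key string) {
	if _, ok := s.cache[key]; ok {
		s.cache[key] = nil // warm path: early return before any DB code
		return
	}
	s.db.clear(key) // cold path: clear the persisted state directly
}

// GetOrCreate returns the cached session, loading it from the DB on a miss.
func (s *fixedStore) GetOrCreate(key string) []string {
	if msgs, ok := s.cache[key]; ok {
		return msgs
	}
	msgs := s.db.rows[key]
	s.cache[key] = msgs
	return msgs
}

// coldThenWarm returns the history length seen after a cold Reset, and the
// total number of DB updates after one further (warm) Reset.
func coldThenWarm() (int, int) {
	db := &fakeDB{rows: map[string][]string{"cron:job-1": {"run 1", "run 2"}}}
	s := &fixedStore{cache: map[string][]string{}, db: db}
	s.Reset("cron:job-1") // cold: one update against the fake DB
	n := len(s.GetOrCreate("cron:job-1"))
	s.Reset("cron:job-1") // warm: the session is cached now, so no DB update
	return n, db.updates
}

func main() {
	n, updates := coldThenWarm()
	fmt.Println("history length after cold reset:", n)
	fmt.Println("DB updates issued:", updates)
}
```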
Impact
All 3 callers of Reset() benefit:
- gateway_cron.go: target fix
- gateway_consumer_handlers.go: same improvement
- sessions.go: harmless, UI sessions are always warm
Test plan
- go build ./... passes
- go vet ./... passes
- go test -race ./internal/store/pg/: all pass
- TestReset_ColdCache_FallsBackToDB: proves DB fallback is reached
- TestReset_WarmCache_ClearsHistory: warm path unchanged
- TestSave_ColdCache_IsNoOp: acceptable after fix
- TestResetAfterGetOrCreate_FixVerification: alternative fix path works