
feat: add SQLite persistence for sidecar task results#45

Merged
bdchatham merged 4 commits into main from feat/sqlite-result-store on Mar 30, 2026


Conversation


@bdchatham bdchatham commented Mar 29, 2026

Summary

  • Replace the in-memory 10-entry ring buffer with a SQLite-backed ResultStore so task results survive pod restarts
  • Uses modernc.org/sqlite (pure Go, no CGo) — zero changes to the Dockerfile, GoReleaser, or CI pipelines
  • DB file lives at {homeDir}/sidecar.db on the existing PVC — no new Kubernetes volumes needed
  • Cloud-agnostic: identical deployment across AWS, GCP, and bare metal

Details

  • New ResultStore interface (Save, Get, List, Delete, Close) in sidecar/engine/store.go
  • SQLiteStore implementation with WAL mode, busy_timeout=5000, synchronous=NORMAL, MaxOpenConns=1
  • Schema migrations via PRAGMA user_version at startup — no migration framework
  • Engine refactored to use the store instead of ring buffer; mutex not held during store I/O on read paths
  • Marker files (.sei-sidecar-*) intentionally left untouched — separate failure domain
  • 9 store-level tests using :memory: SQLite (no disk artifacts)
  • All existing engine and server tests updated and passing
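
For readers who have not opened sidecar/engine/store.go, the following is a minimal sketch of what a five-method ResultStore could look like. Only the method names (Save, Get, List, Delete, Close) come from this PR; the signatures, the TaskResult fields, ErrNotFound, and the map-backed mapStore are assumptions made for illustration, not the actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
	"time"
)

// TaskResult is a hypothetical record shape; the PR's real fields are not shown here.
type TaskResult struct {
	ID        string
	Status    string
	UpdatedAt time.Time
}

// ErrNotFound is an assumed sentinel error for unknown IDs.
var ErrNotFound = errors.New("task result not found")

// ResultStore uses the method names from the PR; the signatures are guesses.
type ResultStore interface {
	Save(r *TaskResult) error
	Get(id string) (*TaskResult, error)
	List(limit int) ([]*TaskResult, error)
	Delete(id string) error
	Close() error
}

// mapStore is an in-memory stand-in that only exists to exercise the interface.
type mapStore struct {
	mu   sync.Mutex
	rows map[string]TaskResult
}

func newMapStore() *mapStore { return &mapStore{rows: make(map[string]TaskResult)} }

// Save upserts by ID and stores a copy so callers cannot mutate stored state.
func (s *mapStore) Save(r *TaskResult) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[r.ID] = *r
	return nil
}

func (s *mapStore) Get(id string) (*TaskResult, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.rows[id]
	if !ok {
		return nil, ErrNotFound
	}
	return &r, nil
}

// List returns up to limit results, newest first.
func (s *mapStore) List(limit int) ([]*TaskResult, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]*TaskResult, 0, len(s.rows))
	for _, r := range s.rows {
		r := r
		out = append(out, &r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].UpdatedAt.After(out[j].UpdatedAt) })
	if limit >= 0 && len(out) > limit {
		out = out[:limit]
	}
	return out, nil
}

func (s *mapStore) Delete(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.rows, id)
	return nil
}

func (s *mapStore) Close() error { return nil }

// Compile-time check that mapStore satisfies the interface.
var _ ResultStore = (*mapStore)(nil)

func main() {
	var store ResultStore = newMapStore()
	defer store.Close()
	_ = store.Save(&TaskResult{ID: "a", Status: "running", UpdatedAt: time.Unix(1, 0).UTC()})
	_ = store.Save(&TaskResult{ID: "a", Status: "completed", UpdatedAt: time.Unix(2, 0).UTC()})
	got, err := store.Get("a")
	fmt.Println(got.Status, err)
}
```

Keeping the engine behind an interface like this is what lets the store-level tests swap a :memory: database for the file-backed one without touching engine code.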

Test plan

  • CGO_ENABLED=0 go build . — confirms pure-Go SQLite works without CGo
  • go test ./sidecar/... — all existing + new tests pass
  • gofmt -s — no formatting issues
  • docker build — verify distroless image builds cleanly in CI
  • Manual: seictl serve, POST a task, kill process, restart, GET task by ID — result persists

Replace the in-memory 10-entry ring buffer with a SQLite-backed
ResultStore so task results survive pod restarts. Uses modernc.org/sqlite
(pure Go, no CGo) — zero changes to the build pipeline, Dockerfile, or
GoReleaser config.

Key decisions:
- DB file at {homeDir}/sidecar.db on the existing PVC
- WAL mode, busy_timeout=5000, synchronous=NORMAL, MaxOpenConns=1
- Schema migrations via PRAGMA user_version at startup
- Marker files (.sei-sidecar-*) left untouched
- Cloud-agnostic: identical deployment on AWS/GCP/bare metal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
}

func scanInto(s scanner) (*TaskResult, error) {
var (

Let's create a type and not make this anonymous.
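
To illustrate the suggestion, here is a rough sketch of promoting the anonymous scanner to a named interface. The name RowScanner appears in the follow-up commit; the TaskResult fields, the two-column scan, and the fakeRow test double are hypothetical. The Scan signature matches the one on database/sql's *sql.Row and *sql.Rows.

```go
package main

import (
	"database/sql"
	"fmt"
)

// TaskResult is a hypothetical two-column record used only for this sketch.
type TaskResult struct {
	ID     string
	Status string
}

// RowScanner names the behavior shared by *sql.Row and *sql.Rows,
// replacing an inline interface{ Scan(...) error } parameter type.
type RowScanner interface {
	Scan(dest ...any) error
}

// Compile-time checks that both database/sql types satisfy RowScanner.
var (
	_ RowScanner = (*sql.Row)(nil)
	_ RowScanner = (*sql.Rows)(nil)
)

// scanInto reads one row into a TaskResult, whichever scanner supplies it.
func scanInto(s RowScanner) (*TaskResult, error) {
	var r TaskResult
	if err := s.Scan(&r.ID, &r.Status); err != nil {
		return nil, err
	}
	return &r, nil
}

// fakeRow is a test double, so scanInto can be exercised without a database.
type fakeRow struct{ id, status string }

func (f fakeRow) Scan(dest ...any) error {
	if len(dest) != 2 {
		return fmt.Errorf("want 2 destinations, got %d", len(dest))
	}
	idPtr, ok1 := dest[0].(*string)
	statusPtr, ok2 := dest[1].(*string)
	if !ok1 || !ok2 {
		return fmt.Errorf("destinations must be *string")
	}
	*idPtr, *statusPtr = f.id, f.status
	return nil
}

func main() {
	r, err := scanInto(fakeRow{id: "task-1", status: "completed"})
	fmt.Println(r.ID, r.Status, err)
}
```

A named type gives the interface a doc comment and a single definition that Get and List can both reference.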

- Add doc comment explaining :memory: DSN on NewMemoryStore
- Enforce single file-backed store per process via mutex guard
- Move migrations to sqlite_migrations.go
- Promote scanner to named RowScanner type in store.go
- Reorganize: store.go (types+interface), sqlite_store.go (impl),
  sqlite_migrations.go (schema)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
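
As a sketch of how version-gated migrations keyed on PRAGMA user_version can work without a framework: read the current version at startup, pick every migration with a higher number, and bump the version alongside the DDL. The table name, columns, and second migration below are invented for illustration, and wrapping each step in BEGIN/COMMIT is one way (assumed here) to keep the schema and the version number in sync.

```go
package main

import "fmt"

// migration pairs a target user_version with the statements that reach it.
type migration struct {
	version int
	stmts   []string
}

// migrations is a hypothetical list, ordered by ascending version.
var migrations = []migration{
	{version: 1, stmts: []string{
		`CREATE TABLE IF NOT EXISTS task_results (id TEXT PRIMARY KEY, status TEXT NOT NULL)`,
	}},
	{version: 2, stmts: []string{
		`ALTER TABLE task_results ADD COLUMN next_run_at TEXT`,
	}},
}

// pending returns the migrations newer than current, where current is the
// value read from PRAGMA user_version (0 on a fresh database).
func pending(current int, all []migration) []migration {
	var out []migration
	for _, m := range all {
		if m.version > current {
			out = append(out, m)
		}
	}
	return out
}

// script renders one migration as a single transaction that also records
// the new version, so a crash cannot leave DDL applied but unrecorded.
func script(m migration) []string {
	out := []string{"BEGIN"}
	out = append(out, m.stmts...)
	out = append(out, fmt.Sprintf("PRAGMA user_version = %d", m.version))
	return append(out, "COMMIT")
}

func main() {
	// Pretend the database reported user_version = 1.
	for _, m := range pending(1, migrations) {
		for _, stmt := range script(m) {
			fmt.Println(stmt + ";")
		}
	}
}
```

In real code these statements would more likely run through a database/sql transaction (`*sql.Tx`) than as literal BEGIN and COMMIT strings; the rendering here only shows the ordering.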
bdchatham and others added 2 commits March 30, 2026 11:34
Address remaining PR comments:

- Store is now the single source of truth for all task lifecycle states.
  Tasks are saved with status=running on submit and updated on completion.
- Remove active and scheduled in-memory maps. Only cancel funcs
  (non-serializable) and the scheduled-task overlap guard remain in memory.
- Remove maxResults cap — RecentResults now returns up to 100 from store.
  The artificial ring buffer limit no longer applies.
- Add ListScheduled(now) to ResultStore for efficient due-task queries.
  EvalSchedules queries the store directly instead of scanning an
  in-memory map.
- Add partial index on (next_run_at) WHERE schedule IS NOT NULL.
- GetResult and RecentResults are now simple store queries with no
  lock contention, which addresses the lock change question.
- Add TestStoreListScheduled covering due/not-due/one-shot filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
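
The due/not-due/one-shot filtering that ListScheduled(now) is described as performing can be sketched in plain Go as below. The field names, the cron-style schedule string, and the use of an empty string for one-shot tasks are assumptions; the real query runs in SQLite against the partial index rather than over a slice.

```go
package main

import (
	"fmt"
	"time"
)

// TaskResult carries only the fields this sketch needs (names are assumed).
type TaskResult struct {
	ID        string
	Schedule  string     // empty for one-shot tasks
	NextRunAt *time.Time // nil when no run is planned
}

// dueTasks mirrors a query of the form
//   ... WHERE schedule IS NOT NULL AND next_run_at <= :now
// by keeping scheduled tasks whose next run is at or before now.
func dueTasks(all []TaskResult, now time.Time) []TaskResult {
	var due []TaskResult
	for _, t := range all {
		if t.Schedule == "" || t.NextRunAt == nil {
			continue // one-shot or unscheduled: never returned
		}
		if !t.NextRunAt.After(now) {
			due = append(due, t)
		}
	}
	return due
}

// sampleDueIDs builds one task of each kind and returns the IDs that are due.
func sampleDueIDs() []string {
	now := time.Date(2026, 3, 30, 12, 0, 0, 0, time.UTC)
	past := now.Add(-time.Minute)
	future := now.Add(time.Hour)
	tasks := []TaskResult{
		{ID: "due", Schedule: "*/5 * * * *", NextRunAt: &past},
		{ID: "not-due", Schedule: "*/5 * * * *", NextRunAt: &future},
		{ID: "one-shot"},
	}
	var ids []string
	for _, t := range dueTasks(tasks, now) {
		ids = append(ids, t.ID)
	}
	return ids
}

func main() {
	fmt.Println(sampleDueIDs()) // prints [due]
}
```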
Three end-to-end tests using real file-backed SQLite databases in
t.TempDir (auto-cleaned, including WAL/SHM files):

- TestE2E_TaskLifecycle: Full lifecycle across submit, fail, dedup,
  mark-ready, scheduled tasks, removal, and simulated process restart
  (close store, reopen same DB, verify persistence and dedup).
- TestE2E_ScheduledTaskExecution: Scheduled task fires via EvalSchedules,
  verifies completion status and NextRunAt advancement in the store.
- TestE2E_ConcurrentSubmit: 10 concurrent task submissions against a
  file-backed store, verifying no races or data loss.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham marked this pull request as ready for review March 30, 2026 19:28
@bdchatham bdchatham merged commit ee76fe1 into main Mar 30, 2026
2 checks passed
@bdchatham bdchatham deleted the feat/sqlite-result-store branch March 30, 2026 19:28
bdchatham added a commit that referenced this pull request Mar 30, 2026
## Summary

Builds on #45 and addresses all findings from the Kubernetes and
Platform specialist reviews.

### Must-fix (4 items)
- **Goroutine drain on shutdown**: `Engine.Wait()` + `sync.WaitGroup`
ensures all in-flight task goroutines complete before `store.Close()`.
Wired into `serve.go` between server stop and DB close.
- **Stale task recovery**: `RecoverStaleTasks()` marks one-shot tasks
left as `running` from a previous crash as `failed` on startup.
Scheduled tasks are preserved for re-evaluation.
- **Transactional migrations**: v1 migration DDL + `PRAGMA user_version`
wrapped in `BEGIN`/`COMMIT` for atomicity.
- **UTC timestamps**: All `time.Now()` and format calls normalized to
`.UTC()` to prevent incorrect string-based comparisons in SQLite.
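
To see why the UTC normalization matters, consider the small illustration below. It assumes timestamps are stored as RFC 3339 text (the exact format used in the PR is not shown) and compares two instants as strings, first with a local offset and then after `.UTC()`.

```go
package main

import (
	"fmt"
	"time"
)

// stringBefore reports whether a sorts before b when both are formatted as
// RFC 3339 text and compared byte-wise, the way a TEXT column would be.
func stringBefore(a, b time.Time) bool {
	return a.Format(time.RFC3339) < b.Format(time.RFC3339)
}

// compare returns the true ordering, the ordering of the mixed-offset
// strings, and the ordering of the UTC-normalized strings.
func compare() (instantBefore, localTextBefore, utcTextBefore bool) {
	east := time.FixedZone("UTC+9", 9*60*60)
	a := time.Date(2026, 3, 30, 1, 0, 0, 0, east)      // 2026-03-29T16:00:00Z
	b := time.Date(2026, 3, 29, 20, 0, 0, 0, time.UTC) // 2026-03-29T20:00:00Z
	return a.Before(b), stringBefore(a, b), stringBefore(a.UTC(), b.UTC())
}

func main() {
	instant, local, utc := compare()
	// a is earlier than b, but its local-offset string sorts later.
	fmt.Println(instant, local, utc) // prints true false true
}
```

Once every value is written in UTC with the same layout, lexicographic order and chronological order agree, which is what a `next_run_at <= ?` comparison on text relies on.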

### Should-fix (3 items)
- **RemoveResult race fix**: Completion goroutine checks if its cancel
func is still active before saving. Skips save if `RemoveResult` already
cleaned it up.
- **Singleton guard removed**: Dropped the `storeMu`/`storeCreated`
guard on `NewSQLiteStore`. Rely on `serve.go` wiring.
- **PVC constraint documented**: `NewSQLiteStore` doc now warns that WAL
mode requires block-device-backed storage (not NFS).
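
The RemoveResult race fix can be pictured with the following sketch. It is an assumption-heavy simplification: the engine type, its method names, and the map standing in for the store are hypothetical, and the save happens under the mutex here purely to keep the example short.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// engine keeps only cancel funcs in memory; saved stands in for the store.
type engine struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc
	saved   map[string]string
}

func newEngine() *engine {
	return &engine{
		cancels: make(map[string]context.CancelFunc),
		saved:   make(map[string]string),
	}
}

// submit registers a cancel func and records the task as running.
func (e *engine) submit(id string) {
	_, cancel := context.WithCancel(context.Background())
	e.mu.Lock()
	defer e.mu.Unlock()
	e.cancels[id] = cancel
	e.saved[id] = "running"
}

// removeResult cancels the task, forgets its cancel func, and deletes its row.
func (e *engine) removeResult(id string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if cancel, ok := e.cancels[id]; ok {
		cancel()
		delete(e.cancels, id)
	}
	delete(e.saved, id)
}

// complete is what a completion goroutine would call. It saves the final
// status only if the cancel func is still registered; otherwise a removal
// already ran and saving would resurrect a deleted row.
func (e *engine) complete(id, status string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	cancel, ok := e.cancels[id]
	if !ok {
		return false
	}
	cancel() // release the context's resources
	delete(e.cancels, id)
	e.saved[id] = status
	return true
}

func main() {
	e := newEngine()
	e.submit("a")
	e.removeResult("a")
	fmt.Println(e.complete("a", "completed"), len(e.saved))
}
```

The presence of the cancel func doubles as an ownership token: whichever of removal or completion takes it first decides the task's final state.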

### Test improvements
- `TestE2E_StaleTaskRecovery` — verifies crash recovery behavior
- `TestE2E_ShutdownDrainsGoroutines` — verifies `Wait()` drains before
close
- `TestE2E_ConcurrentSubmit` — replaced `time.Sleep` with
`sync.WaitGroup`
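
The shutdown ordering that `TestE2E_ShutdownDrainsGoroutines` checks can be sketched as follows. Only `Engine.Wait()` and the use of `sync.WaitGroup` come from the description above; the `store` and `engine` types, the counters, and `runAndShutdown` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// store counts saves and records any save attempted after close.
type store struct {
	mu       sync.Mutex
	closed   bool
	saved    int
	rejected int
}

func (s *store) save() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		s.rejected++ // a write raced past shutdown
		return
	}
	s.saved++
}

func (s *store) close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.closed = true
}

// engine tracks in-flight task goroutines with a WaitGroup.
type engine struct {
	wg sync.WaitGroup
	st *store
}

// submit runs work in a goroutine and saves its result when it finishes.
func (e *engine) submit(work func()) {
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		work()
		e.st.save()
	}()
}

// Wait blocks until every submitted task has saved its result.
func (e *engine) Wait() { e.wg.Wait() }

// runAndShutdown submits n tasks, drains them, and only then closes the store.
func runAndShutdown(n int) (saved, rejected int) {
	st := &store{}
	e := &engine{st: st}
	for i := 0; i < n; i++ {
		e.submit(func() {})
	}
	e.Wait()   // drain first...
	st.close() // ...then close, so no save can hit a closed store
	return st.saved, st.rejected
}

func main() {
	fmt.Println(runAndShutdown(10))
}
```

Waiting on the WaitGroup instead of sleeping also makes the test deterministic, which is the same reason given for replacing `time.Sleep` in `TestE2E_ConcurrentSubmit`.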

## Test plan
- [x] `go test ./sidecar/...` — all tests pass
- [x] `CGO_ENABLED=0 go build .` — binary builds
- [x] `gofmt -s` — clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>