
Fix DuckDB file corruption during maintenance (#218) #221

Merged
erikdarlingdata merged 1 commit into dev from feature/fix-duckdb-reader-corruption
Feb 21, 2026
@erikdarlingdata
Owner

Summary

Fixes #218 — Lite app crashes with "Reached the end of the file" DuckDB errors after ~1 hour of uptime.

  • Root cause: Archival DELETEs + CHECKPOINT reorganized/truncated the DuckDB file while UI connections had stale file offsets
  • Fix: ReaderWriterLockSlim coordinates UI readers (read locks via LockedConnection wrapper) with maintenance writers (exclusive write locks on CHECKPOINT and archive DELETEs)
  • Maintenance overhaul: Replaced compaction cycle (File.Replace race condition) with archive-all-and-reset at 512MB threshold
  • Parquet naming: Per-reset timestamps (20260221_1925_table.parquet) — no merge logic, trivial 90-day retention by date prefix
  • Logging: Wired AppLoggerAdapter<T> for all 8 services that had null loggers
  • Removed: Dead CompactAsync method (~140 lines)
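To illustrate the per-reset naming scheme from the list above, here is a minimal sketch in Python (the project itself is .NET, so this is an illustration, not the merged code). The helper name `archive_file_name` is hypothetical, and reading the `1925` segment as hours and minutes is an assumption based on the example file name.

```python
from datetime import datetime


def archive_file_name(table: str, reset_time: datetime) -> str:
    """Build '<yyyyMMdd>_<HHmm>_<table>.parquet' for one archive-and-reset cycle.

    Assumption: the second segment of the example name (1925) is HHmm.
    """
    return f"{reset_time:%Y%m%d_%H%M}_{table}.parquet"


# Each reset produces a new, uniquely prefixed file, so nothing is ever merged.
print(archive_file_name("table", datetime(2026, 2, 21, 19, 25)))
```

Because the date is a fixed-width prefix, retention can work from the file name alone without opening the parquet file.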

Test plan

  • Built clean, zero warnings
  • 4 SQL Servers under HammerDB load generating ~500MB/hour
  • 3 successful archive+reset cycles (515MB → 19MB in <1s each)
  • Archive views correctly query across hot table + multiple parquet file sets (legacy monthly + new timestamped)
  • Verified all view queries filter by server_id — no cross-server contamination
  • RetentionService parses both yyyyMMdd and yyyy-MM filename formats
  • Zero "end of file" errors during extended run
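The dual-format parsing mentioned for RetentionService could look roughly like the Python sketch below. It is an illustration only: the function names are hypothetical, the position of `yyyy-MM` inside legacy file names is an assumption (the sketch accepts it anywhere in the name), and treating a monthly file as dated at the end of its month is a design choice made here, not something the PR states.

```python
import re
from datetime import date, timedelta
from typing import Optional

# New format: yyyyMMdd_HHmm_<table>.parquet (date is a fixed-width prefix).
_TIMESTAMPED = re.compile(r"^(\d{4})(\d{2})(\d{2})_\d{4}_.+\.parquet$")
# Legacy format: assumed to contain yyyy-MM somewhere in the file name.
_MONTHLY = re.compile(r"(?<!\d)(\d{4})-(\d{2})(?!\d)")


def archive_date(file_name: str) -> Optional[date]:
    """Return the date a parquet archive represents, or None if unrecognized."""
    try:
        m = _TIMESTAMPED.match(file_name)
        if m:
            year, month, day = map(int, m.groups())
            return date(year, month, day)
        m = _MONTHLY.search(file_name)
        if m and file_name.endswith(".parquet"):
            year, month = map(int, m.groups())
            first = date(year, month, 1)  # raises ValueError for a bad month
            # A monthly file covers the whole month, so use its last day.
            next_month = date(first.year + first.month // 12, first.month % 12 + 1, 1)
            return next_month - timedelta(days=1)
    except ValueError:
        return None
    return None


def is_expired(file_name: str, today: date, retention_days: int = 90) -> bool:
    """True when the archive is older than the retention window."""
    archived = archive_date(file_name)
    return archived is not None and archived < today - timedelta(days=retention_days)
```

Unrecognized file names are never reported as expired, so a parsing miss leaves the file in place rather than deleting it.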

🤖 Generated with Claude Code

…ystem (#218)

Root cause: archival DELETEs + CHECKPOINT reorganized/truncated the DuckDB file
while UI connections had stale file offsets, causing "Reached the end of the file"
crashes after ~1 hour of uptime.

Fix: ReaderWriterLockSlim coordinates UI readers with maintenance writers. UI queries
hold read locks (unlimited concurrency) via LockedConnection wrapper. CHECKPOINT and
archive DELETEs hold exclusive write locks (<1s duration).
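
The coordination described above can be sketched as follows. The actual fix uses .NET's ReaderWriterLockSlim and a LockedConnection wrapper; this Python version is only an analogy, and `ReadWriteLock` and `locked_connection` are hypothetical names. This simple variant does not prioritize writers, which is tolerable only if maintenance locks are short and infrequent.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer (no writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:          # wait out any maintenance pass
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:   # last reader lets a writer in
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


@contextmanager
def locked_connection(lock, connection):
    """Hand out a connection that is protected by a read lock while in use."""
    with lock.read():
        yield connection
```

UI code would wrap each query in `locked_connection(...)`, while CHECKPOINT and archive DELETEs would run inside `lock.write()`, so the file is never reorganized while a reader is mid-query.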

Maintenance overhaul:
- Replaced compaction cycle (File.Replace race condition) with archive-all-and-reset
- Size-based trigger at 512MB archives ALL data to parquet, deletes .duckdb, reinits
- Tested: 515MB → 19MB in <1s, 65K rows to ~400KB ZSTD parquet per cycle
- Per-reset timestamped parquet naming (20260221_1925_table.parquet) eliminates
  merge logic and simplifies 90-day retention (delete by date prefix)
- RetentionService handles both new timestamped and legacy monthly formats
- Wired AppLoggerAdapter<T> for all 8 services that had null loggers
- Removed dead CompactAsync method (~140 lines)
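
As a rough illustration of the size-based trigger in the list above (Python, not the project's .NET code): `archive_all` and `reinitialize` are hypothetical callbacks standing in for the parquet export and schema re-creation, and interpreting 512MB as 512 * 1024 * 1024 bytes is an assumption.

```python
import os
from typing import Callable

# Assumption: "512MB" means 512 MiB.
RESET_THRESHOLD_BYTES = 512 * 1024 * 1024


def archive_and_reset(
    db_path: str,
    archive_all: Callable[[], None],
    reinitialize: Callable[[], None],
    threshold: int = RESET_THRESHOLD_BYTES,
) -> bool:
    """Archive everything, delete the database file, and start fresh.

    Returns True when a reset ran. The caller is assumed to hold exclusive
    access to the database (no open readers) for the duration of the call.
    """
    if not os.path.exists(db_path) or os.path.getsize(db_path) < threshold:
        return False
    archive_all()        # export ALL tables to parquet first
    os.remove(db_path)   # then drop the file instead of compacting it
    reinitialize()       # recreate an empty database with the schema
    return True
```

Deleting and re-creating the file avoids swapping files underneath open connections, which is where the earlier File.Replace approach raced.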

Tested with 4 SQL Servers under HammerDB load, 3 successful archive+reset cycles,
zero errors, archive views correctly serve data from all parquet file sets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@erikdarlingdata erikdarlingdata merged commit 3546ebc into dev Feb 21, 2026
3 checks passed
@erikdarlingdata erikdarlingdata deleted the feature/fix-duckdb-reader-corruption branch February 23, 2026 21:07
@erikdarlingdata erikdarlingdata mentioned this pull request Mar 3, 2026
7 tasks