Problem
DuckDB's storage engine is append-oriented. When ArchiveService DELETEs old rows after exporting them to Parquet, the space is marked free but never reclaimed. VACUUM does not fix append-fragmented space. Over time, the database file grows monotonically — we observed 3.8GB for just 35MB of real data (110x bloat), causing 3-7 second collector times and 50-60% CPU.
The only fix is export/reimport (compaction), which we ran manually to go from 3.8GB → 323MB.
Proposed solution
Add automated daily compaction to prevent bloat from recurring:
Daily compaction — after the hourly archive cycle, once per day:
1. Pause collection
2. CHECKPOINT (flush WAL)
3. Export all tables to a temp DuckDB file
4. Close connections, swap files (old → .bak, temp → primary)
5. Reopen connections, resume collection
6. Delete the .bak file
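The file swap is the delicate part of the cycle: it must be atomic, and it must roll back if moving the new file into place fails. A stdlib-only Python sketch of that step (the function and path names are hypothetical; it assumes all connections are already closed):

```python
import os

def swap_database_files(primary_path: str, compacted_path: str) -> None:
    """Swap a freshly compacted database file into place.

    Assumes collection is paused and every connection to primary_path
    is closed before this runs.
    """
    backup_path = primary_path + ".bak"
    # old -> .bak (os.replace is atomic on the same filesystem)
    os.replace(primary_path, backup_path)
    try:
        # temp -> primary
        os.replace(compacted_path, primary_path)
    except OSError:
        # Roll back so the service can reopen the old (bloated) file.
        os.replace(backup_path, primary_path)
        raise
    # Delete .bak only after the swap has succeeded.
    os.remove(backup_path)
```

Keeping the temp file on the same filesystem as the primary is what makes `os.replace` atomic; a cross-device move would silently degrade to copy-and-delete.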
Size watchdog — log a warning if database file exceeds a threshold (e.g., 1GB) between compaction cycles, so runaway bloat is caught early.
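The watchdog can be a cheap file-size check run on the existing timer. A minimal sketch, assuming the 1GB threshold above (the helper name is hypothetical):

```python
import logging
import os

SIZE_WARN_BYTES = 1 * 1024 ** 3  # 1GB threshold from the proposal

def check_database_size(db_path: str, threshold: int = SIZE_WARN_BYTES) -> bool:
    """Log a warning and return True if the database file exceeds the threshold."""
    size = os.path.getsize(db_path)
    if size > threshold:
        logging.warning(
            "Database %s is %d bytes (limit %d); bloat may be outrunning compaction",
            db_path, size, threshold,
        )
        return True
    return False
```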
Context
Compaction takes ~2-5 seconds for a 300MB database
Collection gap is negligible (at most one 1-minute cycle missed)
VACUUM is not an alternative — it does not reclaim append-fragmented space in DuckDB
The checkpoint_threshold=1GB setting (PR #159, DuckDB checkpoint optimization and timing fix) prevents checkpoint stalls but does not address long-term file growth
Files involved
Lite/Database/DuckDbInitializer.cs — compaction logic (owns file path and connection string)
Lite/Services/CollectionBackgroundService.cs — daily compaction timer alongside existing archive/retention timers