Lite: fix compaction OOM by setting DuckDB temp_directory (#933) #935

Merged
erikdarlingdata merged 1 commit into dev from
feature/933-compaction-temp-directory
May 5, 2026

Conversation

@erikdarlingdata
Owner

Summary

  • The in-memory DuckDB connections used for parquet compaction had a 4 GB memory_limit pragma but no temp_directory, so the cap acted as a hard wall — DuckDB had nowhere to spill and OOM'd the moment it was hit.
  • Set temp_directory to <archive>/duckdb_tmp/ on both compaction connections (small-group and incremental pair-merge). Co-locating the spill directory with the archive keeps spill writes on the same volume as the parquet files; a minimal sketch of the setup follows below.
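
For illustration, a minimal sketch of the change on a DuckDB.NET in-memory connection. The archivePath variable and the exact connection wiring here are assumptions for the example, not the project's actual code:

```csharp
using System.IO;
using DuckDB.NET.Data;

// Hypothetical archive location; stands in for the monitor's real path.
var archivePath = @"C:\PerformanceMonitor\archive";
var tempDir = Path.Combine(archivePath, "duckdb_tmp");

using var connection = new DuckDBConnection("DataSource=:memory:");
connection.Open();

using var pragma = connection.CreateCommand();

// The original cap: on an in-memory database with no temp_directory,
// this acts as a hard wall, because there is nowhere to spill when hit.
pragma.CommandText = "PRAGMA memory_limit='4GB';";
pragma.ExecuteNonQuery();

// The fix: give DuckDB a spill location co-located with the archive,
// turning the cap into a spill threshold instead of an OOM.
pragma.CommandText = $"SET temp_directory='{tempDir}';";
pragma.ExecuteNonQuery();
```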

Verification

Tested end-to-end against 4 monitored SQL Servers under HammerDB load:

  • First 512 MB reset (single-file groups) — 26 groups compacted, no errors.
  • Second 512 MB reset (multi-file groups, the path that OOM'd in [BUG] Memory usage on client #933) — 21 groups merged with 2 source files each, completed in ~3.5s, duckdb_tmp/ cleaned up on connection close, no OOM.

Closes #933.

Test plan

  • Builds clean (0 errors / 0 warnings on incremental build)
  • First reset (single-file compaction) succeeds with new code path
  • Second reset (multi-file pair-merge) succeeds — the actual OOM-prone path from [BUG] Memory usage on client #933
  • duckdb_tmp/ directory created on first archive cycle and cleaned by DuckDB on connection close
  • No regression in collector health during archive cycles
  • Reporter (000al000) confirms fix on their 4-server environment

🤖 Generated with Claude Code

The in-memory DuckDB connections used for parquet compaction had a 4 GB
memory_limit pragma but no temp_directory, so the cap acted as a hard
wall — DuckDB had nowhere to spill and OOM'd the moment it was hit.

Co-locate the spill dir with the archive folder so the writes land on
the same volume as the parquet files. Verified end-to-end: 4-server
HammerDB load, second 512 MB reset triggered ArchiveAllAndResetAsync,
all 21 groups went through the multi-file pair-merge path with two
sources each, completed in ~3.5s with no OOM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erikdarlingdata merged commit e0cd22b into dev on May 5, 2026
2 checks passed
MisterZeus pushed a commit to MisterZeus/PerformanceMonitor that referenced this pull request May 8, 2026
…don't OOM

erikdarlingdata#935 added temp_directory so DuckDB could spill, but on wider workloads
the working set still blew past the 4 GB cap before spill caught up
(reporter saw OOM at 3.7 GiB compacting 15 query_snapshots files).
Three knobs combined to feed that:

- memory_limit = 4 GB was too high — DuckDB held off spilling until late
- threads defaulted to N cores, multiplying per-thread row-group buffers
- ROW_GROUP_SIZE 122880 buffered up to 122k wide-VARCHAR rows per group

Drop memory_limit to 1 GB, cap threads to 2, and shrink ROW_GROUP_SIZE
to 8192. On 1.7 M rows of real query_stats data this drops peak working
set from 1236 MB → 166 MB (87% reduction) at a 31% wall-time cost.
Memory now plateaus instead of growing with row count, which is the
load-bearing change for issue erikdarlingdata#933.

Adds tools/CompactionRepro — a standalone reproducer that splits a real
monthly parquet file into N per-cycle-shaped chunks and runs the same
pair-merge logic with the tuning knobs exposed on the command line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
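
A rough sketch of the combined tuning that commit describes, again assuming DuckDB.NET; the input and output file names below are placeholders, not the repo's real identifiers:

```csharp
using DuckDB.NET.Data;

using var connection = new DuckDBConnection("DataSource=:memory:");
connection.Open();

using var cmd = connection.CreateCommand();

// Lower cap: DuckDB begins spilling far earlier instead of letting the
// working set climb toward 4 GB before spill catches up.
cmd.CommandText = "PRAGMA memory_limit='1GB';";
cmd.ExecuteNonQuery();

// Fewer threads: each thread buffers its own parquet row groups, so the
// default of N cores multiplies peak memory on wide machines.
cmd.CommandText = "PRAGMA threads=2;";
cmd.ExecuteNonQuery();

// Smaller row groups: the default of 122,880 rows per group can hold a
// lot of wide-VARCHAR data in memory before each flush; 8192 keeps that
// buffer small at some cost in wall time.
cmd.CommandText = @"
    COPY (SELECT * FROM read_parquet(['a.parquet', 'b.parquet']))
    TO 'merged.parquet' (FORMAT PARQUET, ROW_GROUP_SIZE 8192);";
cmd.ExecuteNonQuery();
```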
