[r3.4] etl: munmap temp files in Dispose to prevent disk space leak #20440

Merged
AskAlexSharov merged 5 commits into release/3.4 from fix/etl-munmap-leak
Apr 10, 2026

@sudeepdino008 sudeepdino008 commented Apr 9, 2026

ETL's zero-copy mmap optimization (March 2026) skipped munmap in Dispose(), keeping disk blocks allocated for deleted temp files until process exit. On long-running stage_exec, 111k+ deleted-but-mmap'd sortable-buf files accumulated 2.8TB of phantom disk space (df vs du gap), causing ENOSPC.

  • Add mmap.Munmap() in Dispose() before file close/delete
  • Remove defer c.Close() from Load(). Callers already have defer collector.Close() (enforced by the closeCollector ruleguard rule in rules.go), and leaving the collector open after Load() avoids bytes.Clone when some bytes are used temporarily after Load().
    • e.g. move wal.Close() after buildFileRange in domain collation so zero-copy mmap slices stay valid during kvs sort/write; no bytes.Clone needed
  • Analyzed the call sites: no munmapped bytes should be accessed; if any are, the access fails fast and gets reported.

Test plan

  • go test -short ./db/etl/ — all pass including TestSortable and TestReuseCollectorAfterLoad
  • make lint — 0 issues
  • make erigon integration — builds clean
  • Deploy and verify grep -c 'sortable-buf.*deleted' /proc/<pid>/maps stays near 0; no premature "out of disk" errors; no segfaults from accessing munmapped bytes

@sudeepdino008 sudeepdino008 marked this pull request as draft April 9, 2026 11:52
@sudeepdino008 (Member, Author)

testing on chiado rn

@sudeepdino008 sudeepdino008 marked this pull request as ready for review April 9, 2026 12:54
@AskAlexSharov AskAlexSharov enabled auto-merge (squash) April 10, 2026 00:15
@AskAlexSharov AskAlexSharov merged commit 1660c99 into release/3.4 Apr 10, 2026
20 checks passed
@AskAlexSharov AskAlexSharov deleted the fix/etl-munmap-leak branch April 10, 2026 00:27
AskAlexSharov added a commit that referenced this pull request Apr 12, 2026
…20440)

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
github-merge-queue Bot pushed a commit that referenced this pull request Apr 13, 2026
Cherry-pick from `release/3.4` to `main`:

- #19677 agg: workers presets. ressplit workers
- #19919 Revert "flush: use etl.IdentityLoadFunc instead custom. part2"
- #19780 etl: zero-copy memDataProvider
- #19941 d_lru: disable for commitment
- #19942 TemporalMemBatch: re-use vals-slice when can
- #19996 etl: pool of bufwriter
- #19995 collate: replace bitmap by array
- #20002 skill creator review results
- #20033 seg: revert global limiter
- #20046 Caplin: prevent calling `glob` per file in
`BuildMissingIndices`
- #20113 seg: more usage of bufio
- #20194 execution/state: revert CodeSizePath in codeChange journal
entry
- #20431 remove `flush complete` log line
- #20440 etl: munmap temp files in Dispose to prevent disk space leak

93 commits skipped due to conflicts (branches diverged significantly).

---------

Co-authored-by: lystopad <oleksandr.lystopad@erigon.tech>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: moskud <sudeepdino008@gmail.com>
Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>