Skip to content

fix(ilp): fix disk-full handling for SF buffer creation#25

Merged
bluestreak01 merged 7 commits into
mainfrom
mt_allocate-cross-platform
May 19, 2026
Merged

fix(ilp): fix disk-full handling for SF buffer creation#25
bluestreak01 merged 7 commits into
mainfrom
mt_allocate-cross-platform

Conversation

@mtopolnik

Copy link
Copy Markdown
Contributor

Summary

Splits the responsibilities of Files.openCleanRW and Files.allocate and unifies the allocate contract across Linux, macOS, and Windows.

The two functions previously overlapped: both advanced logical EOF, and the old allocate on POSIX simply called ftruncate (leaving the file sparse) while on macOS it also over-allocated by currentSize on non-empty files. The single production caller pattern -- openCleanRW(path, sz) then allocate(fd, sz) -- left a sparse file on POSIX and a correctly-reserved one on Windows. When the disk filled later, a producer thread storing into the mmap'd region would trigger a SIGBUS that aborts the JVM (Linux/macOS) or an in-page exception (Windows).

After:

  • openCleanRW(path) owns the file lifecycle: open RW, truncate to empty. No size parameter.
  • allocate(fd, size) owns "extend EOF and reserve real disk blocks for [0, target)." target = max(size, currentSize); the file never shrinks. ENOSPC / ERROR_DISK_FULL surface as a clean false return. Same observable behaviour on all three platforms.

End-user impact: SF buffer creation hitting a full disk now fails synchronously with a MmapSegmentException the sender can recover from, instead of crashing the JVM later.

Tradeoffs

  • API change: Files.openCleanRW and FilesFacade.openCleanRW lost their size parameter. Callers that need a sized file follow with Files.allocate (reserves blocks; fails on ENOSPC) or Files.truncate (sparse; faster). Two production call sites and several test sites updated.
  • On Linux/macOS, if the filesystem rejects posix_fallocate / F_PREALLOCATE, allocate falls back to ftruncate and leaves blocks sparse; the SIGBUS risk re-emerges on that filesystem only. Documented on Files.allocate's Javadoc. Windows has no equivalent fallback.
  • macOS allocate now passes target - currentSize to F_PREALLOCATE (the correct beyond-EOF length under F_PEOFPOSMODE) instead of the full target. Behaviour change on non-empty files: stops requesting duplicate allocation for the existing region.

Test plan

  • `mvn -pl core test` passes (2219 / 2219, 1 skipped, no failures)
  • FilesTest pins the no-shrink and zero-on-fresh-file invariants on `allocate`
  • MmapSegmentTest covers the full create path including fault-injected `openCleanRW` failure and post-`allocate` cleanup
  • AckWatermarkTest covers the wrong-/missing-size branch that now uses `openCleanRW + allocate`

mtopolnik and others added 6 commits May 14, 2026 15:29
openCleanRW(path, size) and allocate(fd, size) both advanced the
logical EOF, which became a problem once allocate gained a
never-shrinks contract: callers that did openCleanRW(path, sz) then
allocate(fd, sz) (the only production pattern) hit the
target == currentSize short-circuit and got no block reservation.
SIGBUS protection on mmap stores was silently disabled.

Clean split of responsibilities:

- openCleanRW(path): sole owner of file lifecycle -- create or
  truncate to empty, open RW. The size parameter is gone from the
  JNI, Java, and FilesFacade API.

- allocate(fd, size): sole owner of "extend EOF and reserve real
  blocks for [0, target)." Cross-platform contract:
    * Linux: posix_fallocate(fd, currentSize, target - currentSize),
      followed by ftruncate only on the sparse-fallback path.
    * macOS: fcntl(F_PREALLOCATE) passes newBytes (not the full
      target) to fst_length, fixing a long-standing over-allocation
      on non-empty files; ftruncate(fd, target) advances EOF.
    * Windows: FILE_ALLOCATION_INFO + FILE_END_OF_FILE_INFO,
      short-circuiting when the file is already at target.

Production callers updated:

- MmapSegment.create: openCleanRW(ptr) + allocate(fd, sizeBytes).
  Restores ENOSPC-at-create semantics for SF buffers.
- AckWatermark.open: openCleanRW(path) + allocate(fd, FILE_SIZE) on
  the wrong-/missing-size branch; the correct-size branch still uses
  openRW to preserve the previous session's watermark.

Tests updated for the new signatures; the full client test suite
passes (2219 / 2219).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
testRotationWakeupTriggersImmediateSparePrep used the same tight
timing budgets as testProducerWakeupBeatsThePollInterval, which was
already relaxed: two 500ms waitFor deadlines plus a 1000ms secondary
bound. On a loaded CI agent the producer-to-manager unpark hop plus
open + allocate + mmap can exceed 500ms even when the wakeup did fire,
making the test flaky for the same reason.

Raise both waitFor budgets to 2000ms and the secondary bound to
4000ms. These stay well below the 5s manager poll interval, so a pass
still proves the rotation-time wakeup fired rather than the poll tick,
while tolerating cross-thread scheduling jitter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AckWatermark.open() closed the fd but left the truncated file on
disk when Files.allocate() returned false. The FilesFacade.allocate
javadoc explicitly states callers must close the fd AND unlink the
partial file -- MmapSegment.create already follows that contract,
AckWatermark did not.

Self-healing on the next session via the size-mismatch branch
masked the leak in practice (FILE_SIZE is 16 bytes), but the
contract is documented in this same change set and the second
caller should honour it consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@bluestreak01 bluestreak01 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Level 2 review done. Pushed dcc8496 to honour the FilesFacade.allocate contract in AckWatermark.open() (close + unlink, matching MmapSegment.create).

@mtopolnik

Copy link
Copy Markdown
Contributor Author

[PR Coverage check]

😍 pass : 8 / 12 (66.67%)

file detail

path covered line new line coverage
🔵 io/questdb/client/std/DefaultFilesFacade.java 1 2 50.00%
🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/AckWatermark.java 4 7 57.14%
🔵 io/questdb/client/std/Files.java 2 2 100.00%
🔵 io/questdb/client/cutlass/qwp/client/sf/cursor/MmapSegment.java 1 1 100.00%

@bluestreak01 bluestreak01 merged commit fcbea0b into main May 19, 2026
12 checks passed
@bluestreak01 bluestreak01 deleted the mt_allocate-cross-platform branch May 19, 2026 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants