v7.31.1 — fix saveBinaryBlob rename race

@dpsifr dpsifr released this 09 Jun 17:37
· 18 commits to main since this release

Fix

A same-key rename race in FileSystemStorage.saveBinaryBlob caused brain.flush() to throw ENOENT, which broke downstream jobs (GCS backups, snapshot exports) for any consumer that runs flush and compaction concurrently.

[job-queue] gcs-backup: failed — ENOENT: no such file or directory,
  rename '/data/brainy-data/.../_column_index/owner/DELETED.bin.tmp'
       -> '/data/brainy-data/.../_column_index/owner/DELETED.bin'

Root cause: saveBinaryBlob used a bare ${filePath}.tmp suffix. Two concurrent calls for the same key computed the same temp path and both wrote to it with writeFile. The first rename succeeded and moved the temp file away; the second rename then ran against a temp path that no longer existed and threw ENOENT.
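The interleaving can be replayed step by step in a scratch directory. This is only an illustrative sketch: replaySharedTempRace is a hypothetical name, it does not call the library, and it runs the two writers' steps sequentially so the failure is deterministic rather than timing-dependent.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not part of the library): replays the order of
// operations described above and returns the error code of the second rename.
export async function replaySharedTempRace(): Promise<string | undefined> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "shared-tmp-race-"));
  const filePath = path.join(dir, "DELETED.bin");
  const tmpPath = `${filePath}.tmp`; // bare suffix: identical for both writers

  try {
    await fs.writeFile(tmpPath, "writer A"); // writer A stages its data
    await fs.writeFile(tmpPath, "writer B"); // writer B overwrites the same temp
    await fs.rename(tmpPath, filePath);      // first rename succeeds, temp is gone

    try {
      await fs.rename(tmpPath, filePath);    // second rename: source is missing
      return undefined;
    } catch (err) {
      return (err as NodeJS.ErrnoException).code;
    }
  } finally {
    // Remove the scratch directory regardless of outcome.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

Awaiting replaySharedTempRace() resolves to "ENOENT", the same code that appears in the job-queue log above.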

Fix: each writer now uses a unique temp suffix (matching the pattern at every other atomic-write site in the same file). In addition, an ENOENT from the rename is swallowed defensively, and the temp file is cleaned up on any other failure.
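The three parts of the fix can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual saveBinaryBlob implementation: atomicWriteBlob and concurrentWriteDemo are hypothetical names, and the suffix format (pid plus random bytes) is one plausible choice, not necessarily the one the library uses.

```typescript
import { promises as fs } from "node:fs";
import { randomBytes } from "node:crypto";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper illustrating the pattern; not the library's code.
export async function atomicWriteBlob(filePath: string, data: Uint8Array): Promise<void> {
  // Assumed suffix format: pid + random bytes, so no two writers share a temp path.
  const tmpPath = `${filePath}.${process.pid}.${randomBytes(8).toString("hex")}.tmp`;

  try {
    await fs.writeFile(tmpPath, data);
  } catch (err) {
    await fs.rm(tmpPath, { force: true }); // clean up a partial temp file
    throw err;
  }

  try {
    await fs.rename(tmpPath, filePath); // replaces the target in one step
  } catch (err) {
    // Defensive: treat a missing temp at rename time as non-fatal.
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return;
    await fs.rm(tmpPath, { force: true }); // any other failure: remove the temp
    throw err;
  }
}

// Fires many same-key writes at once and reports what is left on disk.
export async function concurrentWriteDemo(writers = 20): Promise<{ final: string; entries: string[] }> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "atomic-write-"));
  try {
    const target = path.join(dir, "blob.bin");
    const payloads = Array.from({ length: writers }, (_, i) => `payload-${i}`);
    await Promise.all(payloads.map((p) => atomicWriteBlob(target, Buffer.from(p))));
    const final = await fs.readFile(target, "utf8");
    const entries = await fs.readdir(dir);
    return { final, entries };
  } finally {
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

With unique temp paths, every concurrent writer renames its own file, so the last rename to land decides the final content and no temp files should remain next to the target.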

Audit

There was one bug site. A scoped audit found no other instances of the pattern:

  • FileSystemStorage — six sibling atomic-write sites already used unique suffixes. Only saveBinaryBlob was the outlier. All clean now.
  • OPFSStorage — WritableStream (no tmp+rename).
  • Object-store adapters (GCS / R2 / Azure / S3) — PUT is atomic.
  • MemoryStorage / HistoricalStorageAdapter — not affected.
  • COW / versioning / HNSW / aggregation / snapshot — all delegate to storage adapters; they get the fix automatically.

Beneficiaries beyond the reported bug

HNSW connection persistence (hnswIndex.ts:252) also writes via saveBinaryBlob and was structurally susceptible to the same race. No production reports of HNSW failures (probably because the HNSW write path is more naturally serialized by the index lock), but the fix removes the latent issue.

Tests

  • New tests/integration/savebinaryblob-concurrent-rename.test.ts (4 tests)
  • Unit suite: 1468 / 1468 tests passing

See RELEASES.md for the full entry.