fix(controller): checkpoint the library DB WAL and shut down cleanly #829
Merged
The `library.db` WAL sidecar could grow unbounded (730MB on a 303MB DB in #786): nothing ever ran a TRUNCATE checkpoint, the bulk tag/analyze CLIs exited via `process.exit()` without closing the DB, and the AIO supervisor tore the container down before node's SIGTERM handler could run (there wasn't one yet). Every later query walked that giant WAL on the main thread (better-sqlite3 is synchronous), stalling the whole event loop: sluggish admin pages at ~0% CPU.

- library-db: set `journal_size_limit` (64 MiB) so checkpoints shrink the sidecar; TRUNCATE-checkpoint in `close()`; new `checkpointWal()` helper
- library facade: `checkpoint()` + `shutdown()` passthroughs
- scheduler: hourly best-effort WAL checkpoint in the cleanup job
- tag-library/analyze-library CLIs: close the DB on every exit path
- server: SIGTERM/SIGINT handler closes the DB; `unhandledRejection` now logs instead of crashing (the supervisor-restart 502s)
- AIO supervisor: wait for children after `kill -TERM 0` so node's shutdown handler actually runs before the namespace is torn down
- subsonic: 30s per-request timeout (`NAVIDROME_TIMEOUT_MS`); `/dj/recent` album fan-out bounded to 5 concurrent calls
- docs/template: recommend the direct pool path (`/mnt/cache/...`) over `/mnt/user/...` on Unraid; SQLite WAL over shfs/FUSE is a known-bad combo

Fixes #786
This was referenced Jul 5, 2026
Fixes #786: slow admin UI on the Unraid AIO, caused by an unbounded `library.db-wal` (730MB on a 303MB DB) plus SQLite-over-FUSE on `/mnt/user` appdata paths.

## Root cause
`library-db.ts` set `journal_mode = WAL` at open and nothing else, ever:

- No `wal_checkpoint(TRUNCATE)` and no `journal_size_limit`, so once a bulk write pass (acoustic analysis writes fat `*_json` blobs per track) ballooned the WAL, it stayed at its high-water mark forever.
- The bulk CLIs (`npm run tag` / `npm run analyze`) exit via `process.exit()` without closing the DB, so the pass that grew the WAL never folded it back.
- The AIO supervisor stopped with `kill -TERM 0; exit 0`. As PID 1, its immediate exit tears the namespace down and SIGKILLs node before any shutdown work runs, and `server.ts` had no SIGTERM handler anyway.
Separately, `server.ts` had no `unhandledRejection` handler (Node's default since v15 is to crash), so a stray rejection killed the controller and the supervisor bounced it: the random 502s. And Subsonic fetches had no timeout while `/dj/recent` fanned out ~21 parallel Navidrome calls: the "recent failed (500)".

## Changes
### WAL management
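A minimal TypeScript sketch of the WAL handling this section lists, assuming better-sqlite3's `db.pragma()`, synchronous `db.close()` and the `db.open` flag. The `PragmaDb` interface, `configureWal`, `closeLibraryDb` and `closeOnExit` are hypothetical names for illustration; `checkpointWal` borrows the PR's helper name, but the signature shown is an assumption, not the controller's actual code.

```typescript
// Assumption: the structural subset of better-sqlite3's Database used here.
// pragma() runs a PRAGMA and returns its rows; close() is synchronous.
interface PragmaDb {
  readonly open: boolean;
  pragma(source: string): unknown;
  close(): void;
}

// Row shape SQLite returns for PRAGMA wal_checkpoint(...).
interface CheckpointRow {
  busy: number; // 1 if the checkpoint was blocked and could not complete
  log: number; // frames in the WAL
  checkpointed: number; // frames copied back into the main DB
}

const JOURNAL_SIZE_LIMIT = 64 * 1024 * 1024; // 64 MiB, as in this PR

// Hypothetical helper: apply the WAL pragmas right after opening.
function configureWal(db: PragmaDb): void {
  db.pragma("journal_mode = WAL");
  // When the WAL resets, SQLite truncates a larger file down to this size.
  db.pragma(`journal_size_limit = ${JOURNAL_SIZE_LIMIT}`);
}

// Best-effort checkpoint: never throws. Returns true only when SQLite
// reports the checkpoint completed (busy === 0).
function checkpointWal(
  db: PragmaDb,
  mode: "PASSIVE" | "FULL" | "RESTART" | "TRUNCATE" = "TRUNCATE",
): boolean {
  if (!db.open) return false;
  try {
    const rows = db.pragma(`wal_checkpoint(${mode})`) as CheckpointRow[];
    return rows[0]?.busy === 0;
  } catch (err) {
    console.warn(`wal_checkpoint(${mode}) failed:`, err);
    return false;
  }
}

// Hypothetical close(): fold the WAL back into the DB, then close.
function closeLibraryDb(db: PragmaDb): void {
  if (!db.open) return;
  checkpointWal(db, "TRUNCATE");
  db.close();
}

// For the CLIs: 'exit' listeners must be synchronous, which this is,
// so the DB is closed even when the CLI calls process.exit().
function closeOnExit(db: PragmaDb): void {
  process.once("exit", () => closeLibraryDb(db));
}
```

Reporting `busy` instead of throwing keeps the hourly job best-effort: a checkpoint blocked by a concurrent tagger or analyzer connection just returns `false` and is attempted again on the next run.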
- `journal_size_limit = 64 MiB` at open, so whenever a checkpoint lets SQLite reset the WAL, the sidecar is truncated back down to at most 64 MiB.
- `close()` runs a best-effort `wal_checkpoint(TRUNCATE)` first (SQLite only auto-checkpoints on the last connection close, and controller/tagger/analyzer can hold the DB concurrently).
- New `checkpointWal()` helper; the scheduler's hourly cleanup job now calls it, so the WAL can't balloon unbounded even mid-run.
- The tag/analyze CLIs register a `process.on('exit')` hook that closes the DB on every exit path (better-sqlite3's `close()` is synchronous, so this is safe).

### Clean shutdown
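A sketch of the `server.ts` process hooks, assuming the DB close is synchronous (as with better-sqlite3). `makeShutdownHandler` and `installProcessHandlers` are hypothetical names, and the exit function is injectable only so the handler can be exercised without ending the process. It also includes the `unhandledRejection` logger described under Resilience.

```typescript
type ExitFn = (code: number) => void;

// Hypothetical factory: returns a signal handler that closes the DB once,
// then exits. A second SIGTERM/SIGINT while shutting down is ignored.
function makeShutdownHandler(
  closeDb: () => void,
  exit: ExitFn = (code) => process.exit(code),
): (signal: NodeJS.Signals) => void {
  let shuttingDown = false;
  return (signal) => {
    if (shuttingDown) return;
    shuttingDown = true;
    console.log(`received ${signal}, closing library DB`);
    let code = 0;
    try {
      closeDb(); // synchronous, so it finishes before exit()
    } catch (err) {
      console.error("error while closing library DB:", err);
      code = 1;
    }
    exit(code);
  };
}

function installProcessHandlers(closeDb: () => void): void {
  const onSignal = makeShutdownHandler(closeDb);
  process.on("SIGTERM", onSignal);
  process.on("SIGINT", onSignal);
  // With any 'unhandledRejection' listener registered, Node (v15+) no longer
  // turns the rejection into an uncaught exception: log it and keep serving.
  process.on("unhandledRejection", (reason) => {
    console.error("unhandledRejection (continuing):", reason);
  });
}
```

The handler only helps if node actually receives SIGTERM and gets time to run it, which is why the supervisor change to `wait` after `kill -TERM 0` matters.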
- `server.ts`: a SIGTERM/SIGINT handler closes the library DB before exiting.
- AIO supervisor: after `kill -TERM 0`, `wait` for the supervise loops plus a 2 s grace period for their reparented children before PID 1 exits, so node's handler actually runs. Docker's stop timeout still hard-caps it.

### Resilience
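A sketch of the two Navidrome safeguards: a per-request timeout driven by an `AbortController` (the signal would be passed to `fetch(url, { signal })`), and a small worker-pool `mapPool`. The repo's own `mapPool` util is not shown in this PR, so this version and its `(items, limit, fn)` signature are assumptions; `withTimeout` is a hypothetical helper name.

```typescript
// 30 s default, overridable via NAVIDROME_TIMEOUT_MS (as described above).
const NAVIDROME_TIMEOUT_MS = Number(process.env.NAVIDROME_TIMEOUT_MS) || 30_000;

// Hypothetical helper: runs `work` with a signal that aborts after `ms` and
// turns the abort into a readable error. `work` must honour the signal.
async function withTimeout<T>(
  label: string,
  work: (signal: AbortSignal) => Promise<T>,
  ms: number = NAVIDROME_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await work(controller.signal);
  } catch (err) {
    if (controller.signal.aborted) {
      throw new Error(`${label} timed out after ${ms} ms`);
    }
    throw err;
  } finally {
    clearTimeout(timer); // don't leave the timer pending after success
  }
}

// Assumed stand-in for mapPool: at most `limit` calls in flight,
// results returned in input order.
async function mapPool<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each worker pulls the next index; safe because JS runs this loop on one thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}
```

With a limit of 5, the ~21 album lookups behind `/dj/recent` run in about five waves instead of all at once, and a single stalled Navidrome call fails after the timeout instead of hanging the request.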
- `unhandledRejection` now logs loudly and continues instead of killing the controller.
- Subsonic requests get a 30 s per-request timeout (set `NAVIDROME_TIMEOUT_MS` to override) with a readable error.
- `/dj/recent`'s album expansion is bounded to 5 concurrent Navidrome calls via the existing `mapPool` util.

### Docs / template (FUSE)
`docs/unraid.md`, the `/setup/unraid` page, and the CA template description now recommend the direct pool path (`/mnt/cache/appdata/subwave`) over `/mnt/user/...`, because SQLite WAL over Unraid's shfs/FUSE layer is a documented-bad combination. The template default stays `/mnt/user`, since `/mnt/cache` only exists when a pool is named `cache`.

## Verified
- `npm run lint` is clean in both `controller/` and `web/`.
- … `wal_checkpoint(TRUNCATE)`, and the sidecar is removed on close.