fix(health): fix four DB-ordering and state-management bugs in health worker by javi11 · Pull Request #344 · javi11/altmount

javi11 · 2026-02-26T10:40:44Z

Summary

Bug #1 (CRITICAL): SetCorrupted overwritten by bulk update. triggerFileRepair wrote corrupted to the DB directly, then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered status and silently overwrote it, leaving the file stuck in repair_triggered forever. Fixed by making triggerFileRepair return a repairOutcome enum; the sideEffect captures a pointer to the update and applies the outcome before the bulk write.
Bug #2 (HIGH): repair_retry_count incremented prematurely on the first trigger. Both the initial trigger and re-checks used the same stmtRepair (repair_retry_count + 1), burning one retry immediately. Fixed with a new UpdateTypeRepairTrigger / stmtRepairTrigger that sets repair_triggered without incrementing the counter.
Bug #3 (HIGH): Repair notification loop silently deleted health records. CheckFile was called on files whose metadata had already been moved to corrupted_metadata/; ReadFileMetadata returned nil → DeleteHealthRecord → record gone. Fixed with prepareRepairNotificationUpdate, which re-triggers ARR directly via retriggerFileRepair and never touches the original metadata path.
Bug #4 (MEDIUM): EventTypeFileRemoved unhandled in prepareUpdateForResult. It fell through to the retry/repair path and tried to bulk-update an already-deleted record. Fixed with an explicit case that sets update.Skip = true. Also replaced the ignored error (_ =) on DeleteHealthRecord in checker.go with proper error logging.
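The following is a minimal Go sketch of the branching that Bugs #2 and #4 describe. The PR names EventTypeFileRemoved, UpdateTypeRepairTrigger, UpdateTypeRepairRetry, update.Skip and prepareUpdateForResult. Everything else is an assumption for illustration, not altmount's actual signatures:
- the other event types,
- the simplified HealthStatusUpdate shape,
- the alreadyTriggered parameter.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for a health-check result event.
// Only EventTypeFileRemoved is named in the PR; the others are illustrative.
type EventType int

const (
	EventTypeFileHealthy EventType = iota
	EventTypeFileCorrupted
	EventTypeFileRemoved
)

// UpdateType selects which prepared statement the bulk update would run.
type UpdateType int

const (
	UpdateTypeHealthy       UpdateType = iota // hypothetical
	UpdateTypeRepairTrigger                   // sets repair_triggered, counter unchanged
	UpdateTypeRepairRetry                     // sets repair_triggered, repair_retry_count + 1
)

// HealthStatusUpdate is a simplified, hypothetical bulk-update row.
type HealthStatusUpdate struct {
	FilePath string
	Type     UpdateType
	Skip     bool // true: leave this row out of the bulk write
}

// prepareUpdateForResult sketches the fixed branching. alreadyTriggered
// reports whether the file was already repair_triggered before this check.
func prepareUpdateForResult(path string, ev EventType, alreadyTriggered bool) HealthStatusUpdate {
	u := HealthStatusUpdate{FilePath: path}
	switch ev {
	case EventTypeFileRemoved:
		// Bug #4: the record is already gone, so do not bulk-update it.
		u.Skip = true
	case EventTypeFileHealthy:
		u.Type = UpdateTypeHealthy
	default:
		// Bug #2: only a re-check of an already-triggered file burns a retry.
		if alreadyTriggered {
			u.Type = UpdateTypeRepairRetry
		} else {
			u.Type = UpdateTypeRepairTrigger
		}
	}
	return u
}

func main() {
	fmt.Println(prepareUpdateForResult("a.mkv", EventTypeFileCorrupted, false).Type == UpdateTypeRepairTrigger)
	fmt.Println(prepareUpdateForResult("a.mkv", EventTypeFileCorrupted, true).Type == UpdateTypeRepairRetry)
	fmt.Println(prepareUpdateForResult("a.mkv", EventTypeFileRemoved, false).Skip)
}
```

Keeping trigger and retry as distinct update types means the counter semantics live in which statement runs, not in the caller remembering to skip an increment.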

Root cause pattern

Bugs #1 and #2 share the same root: side effects (SetRepairTriggered, SetCorrupted) wrote to the DB inside goroutines before UpdateHealthStatusBulk, which then overwrote their results. The invariant is now enforced: only the bulk update owns all DB state writes.
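The invariant can be sketched in Go under simplifying assumptions: a map stands in for the health DB, and triggerFileRepair, repairOutcome and the bulk write are hypothetical reductions of the real code. Side effects still run concurrently, but each one only mutates its own pre-prepared update through a pointer. The single bulk write at the end is the only place state is persisted, so a corrupted outcome can no longer be overwritten.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errARRGeneric stands in for a generic error returned by an ARR instance.
var errARRGeneric = errors.New("arr: generic error")

// repairOutcome is reported by triggerFileRepair instead of a DB write.
type repairOutcome int

const (
	repairOutcomeTriggered repairOutcome = iota
	repairOutcomeCorrupted
)

// healthUpdate is a simplified row for the bulk update.
type healthUpdate struct {
	Path   string
	Status string
}

// triggerFileRepair is a stand-in that returns an outcome and writes nothing.
func triggerFileRepair(arrErr error) repairOutcome {
	if arrErr != nil {
		return repairOutcomeCorrupted
	}
	return repairOutcomeTriggered
}

// runChecks prepares updates, runs side effects concurrently, and then does
// the single bulk write, which is the only place state is persisted.
func runChecks(paths []string, arrErrs map[string]error) map[string]string {
	updates := make([]healthUpdate, len(paths))
	for i, p := range paths {
		updates[i] = healthUpdate{Path: p, Status: "repair_triggered"} // pre-prepared status
	}

	var wg sync.WaitGroup
	for i := range updates {
		u := &updates[i] // each side effect owns a pointer to its own update
		wg.Add(1)
		go func() {
			defer wg.Done()
			if triggerFileRepair(arrErrs[u.Path]) == repairOutcomeCorrupted {
				u.Status = "corrupted" // applied to the update, not the DB
			}
		}()
	}
	wg.Wait()

	// Stand-in for UpdateHealthStatusBulk.
	db := make(map[string]string, len(updates))
	for _, u := range updates {
		db[u.Path] = u.Status
	}
	return db
}

func main() {
	db := runChecks([]string{"ok.mkv", "bad.mkv"}, map[string]error{"bad.mkv": errARRGeneric})
	fmt.Println(db["ok.mkv"], db["bad.mkv"]) // repair_triggered corrupted
}
```

The old ordering produced Bug #1. The side effect wrote corrupted straight to the store, and the bulk loop then reset it to the pre-prepared repair_triggered.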

Test plan

go test ./internal/health/... -v -run TestE2E — all existing E2E tests pass
go test ./internal/database/... -v — all DB tests pass
go build ./... — clean build
Manual: trigger a file with max_repair_retries=3, confirm repair_retry_count stays 0 after first trigger and only increments on re-notifications
Manual: trigger a file where ARR returns a generic error, confirm final DB status is corrupted (not repair_triggered)
Manual: confirm repair notification files are not deleted from the health DB after 1 hour

🤖 Generated with Claude Code

When auth.login_required=false the WebDAV handler was still requiring a JWT cookie or Basic-Auth credentials, causing the Files page to show "offline" and return 401 to every PROPFIND. Fix: check configGetter for login_required at request time and skip all authentication when it is false, granting anonymous access so the file browser works without a login session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

StreamHandler.authenticate() always required a download_key query param and matched it against user API keys, so /api/files/stream returned 401 for all requests when auth.login_required=false (no users exist to match). STRM generator also failed with "no admin user with API key found" when login is not required, preventing STRM file creation entirely.
Fixes:
- StreamHandler: check configGetter at request time; return anonymous user when loginRequired=false so all stream requests are allowed
- strm_generator: when loginRequired=false generate URL without download_key
- setup.go: pass configManager.GetConfigGetter() to NewStreamHandler
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… worker
Bug #1 (CRITICAL): SetCorrupted was overwritten by bulk update when ARR returned a generic error. triggerFileRepair called SetCorrupted directly, then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered status and overwrote it. Fix: triggerFileRepair now returns a repairOutcome enum instead of writing to the DB; the sideEffect closure captures a pointer to the update and applies the outcome before the bulk write.
Bug #2 (HIGH): repair_retry_count was prematurely incremented on the very first repair trigger because both initial trigger and re-check used the same stmtRepair statement (repair_retry_count + 1). Fix: add UpdateTypeRepairTrigger and stmtRepairTrigger that set repair_triggered status without incrementing the counter; UpdateTypeRepairRetry (increment) is now only used for re-checks of already-triggered files.
Bug #3 (HIGH): GetFilesForRepairNotification silently deleted health records by running CheckFile on files whose metadata had already been moved to corrupted_metadata/. ReadFileMetadata returned nil → DeleteHealthRecord → health record gone. Fix: add prepareRepairNotificationUpdate that re-triggers ARR directly (via retriggerFileRepair) instead of calling CheckFile, preserving the health record throughout the repair lifecycle.
Bug #4 (MEDIUM): EventTypeFileRemoved was not handled in prepareUpdateForResult; it fell through to the corrupted/retry path and attempted to bulk-update an already-deleted record. Fix: add explicit EventTypeFileRemoved case that sets update.Skip = true. Also remove the silent error ignore (_ =) on DeleteHealthRecord in checker.go.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
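This is a small in-memory simulation of Bug #3. It assumes a map for the health table and a set marking metadata that is still at its original path. checkFileOld is a hypothetical reduction of the old CheckFile path, and retriggerFileRepair is only simulated. The point is that the fixed notification path never reads metadata, so a file whose metadata was moved cannot be mistaken for a missing one.

```go
package main

import "fmt"

// store is a hypothetical in-memory stand-in for the health DB and metadata.
type store struct {
	health      map[string]string // file path -> health status
	metadata    map[string]bool   // true if metadata is still at its original path
	retriggered []string          // files sent back to ARR
}

// newStore models a file that is already repair_triggered and whose metadata
// was moved to corrupted_metadata/, so the original path no longer resolves.
func newStore(path string) *store {
	return &store{
		health:   map[string]string{path: "repair_triggered"},
		metadata: map[string]bool{},
	}
}

// checkFileOld mimics the old notification path: CheckFile reads metadata,
// gets nil, and deletes the health record.
func (s *store) checkFileOld(path string) {
	if !s.metadata[path] { // ReadFileMetadata returned nil
		delete(s.health, path) // DeleteHealthRecord
	}
}

// prepareRepairNotificationUpdate mimics the fix: re-trigger ARR directly
// and keep the record, never reading the original metadata path.
func (s *store) prepareRepairNotificationUpdate(path string) {
	s.retriggered = append(s.retriggered, path) // simulated retriggerFileRepair
	s.health[path] = "repair_triggered"
}

func main() {
	const p = "movies/film.mkv"

	before := newStore(p)
	before.checkFileOld(p)
	_, kept := before.health[p]
	fmt.Println("old path keeps record:", kept) // false

	after := newStore(p)
	after.prepareRepairNotificationUpdate(p)
	_, kept = after.health[p]
	fmt.Println("new path keeps record:", kept, "retriggered:", len(after.retriggered)) // true, 1
}
```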

…V handlers Remove the auth bypass that allowed unauthenticated access to the stream API and WebDAV when LoginRequired was false. download_key is now always required for streaming, and JWT/Basic Auth is always required for WebDAV. Also clean up the now-unused configGetter field and parameter from StreamHandler and its constructor chain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
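Here is a hedged net/http sketch of the restored stream behavior: download_key is required on every request, regardless of auth.login_required. The streamAuth type, its apiKeys map and statusFor are hypothetical. The real StreamHandler resolves keys against its own user store.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// streamAuth is a hypothetical reduction of the stream handler's auth check.
type streamAuth struct {
	apiKeys map[string]string // API key -> user name
}

// authenticate always requires download_key; there is no bypass for
// auth.login_required=false.
func (a streamAuth) authenticate(r *http.Request) (string, bool) {
	key := r.URL.Query().Get("download_key")
	if key == "" {
		return "", false
	}
	user, ok := a.apiKeys[key]
	return user, ok
}

func (a streamAuth) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if _, ok := a.authenticate(r); !ok {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	w.WriteHeader(http.StatusOK) // a real handler would stream the file here
}

// statusFor runs one GET request through the handler and returns the status code.
func statusFor(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	h := streamAuth{apiKeys: map[string]string{"k1": "admin"}}
	fmt.Println(statusFor(h, "/api/files/stream"))                 // 401
	fmt.Println(statusFor(h, "/api/files/stream?download_key=k1"))  // 200
	fmt.Println(statusFor(h, "/api/files/stream?download_key=bad")) // 401
}
```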

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Extract applyRepairOutcome to map repairOutcome → HealthStatusUpdate fields
- Extract resolvePathForRescan for LibraryPath/ImportDir/MountPath lookup
- Extract cleanupZombieRecord for health record + metadata deletion
- Merge duplicate ErrEpisodeAlreadySatisfied/ErrPathMatchFailed branches
- Fix indentation in prepareUpdateForResult
- Remove unused cfg field from repairTestEnv test struct
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
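The applyRepairOutcome extraction and the merged error branch can be sketched as below. The commit names ErrEpisodeAlreadySatisfied, ErrPathMatchFailed, repairOutcome and applyRepairOutcome, but their definitions here are placeholders. In particular, the outcome the merged branch produces (outcomeNoRepair, mapped to Skip) is an assumption and is not taken from the source.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder definitions for the sentinel errors named in the commit.
var (
	ErrEpisodeAlreadySatisfied = errors.New("episode already satisfied")
	ErrPathMatchFailed         = errors.New("path match failed")
)

type repairOutcome int

const (
	outcomeTriggered repairOutcome = iota // ARR accepted the repair
	outcomeCorrupted                      // generic ARR error (Bug #1)
	outcomeNoRepair                       // hypothetical result of the merged branch
)

// HealthStatusUpdate keeps only the fields this sketch touches.
type HealthStatusUpdate struct {
	Status string
	Skip   bool
}

// classifyRepairErr shows the merged branch: both sentinels share one case.
func classifyRepairErr(err error) repairOutcome {
	switch {
	case err == nil:
		return outcomeTriggered
	case errors.Is(err, ErrEpisodeAlreadySatisfied), errors.Is(err, ErrPathMatchFailed):
		return outcomeNoRepair
	default:
		return outcomeCorrupted
	}
}

// applyRepairOutcome writes an outcome onto the update in one place, so every
// caller maps outcomes to fields the same way.
func applyRepairOutcome(u *HealthStatusUpdate, o repairOutcome) {
	switch o {
	case outcomeTriggered:
		u.Status = "repair_triggered"
	case outcomeCorrupted:
		u.Status = "corrupted"
	case outcomeNoRepair:
		u.Skip = true // assumption: leave the row out of the bulk write
	}
}

func main() {
	for _, err := range []error{nil, fmt.Errorf("arr: %w", ErrPathMatchFailed), errors.New("arr: 500")} {
		var u HealthStatusUpdate
		applyRepairOutcome(&u, classifyRepairErr(err))
		fmt.Printf("%+v\n", u)
	}
}
```

Because errors.Is unwraps, the merged case also matches sentinels wrapped with %w by intermediate callers.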


javi11 and others added 7 commits February 25, 2026 19:16

refactor(health): remove explanatory comments from health worker

9a88673

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Discard changes to internal/importer/postprocessor/strm_generator.go

018ef03

javi11 merged commit c67aff2 into main Feb 26, 2026
1 check passed

javi11 deleted the fix/auth-disabled-files branch February 26, 2026 11:22

javi11 merged 7 commits into main from fix/auth-disabled-files
