fix(health): fix four DB-ordering and state-management bugs in health worker#344
Merged
fix(health): fix four DB-ordering and state-management bugs in health worker#344
Conversation
When auth.login_required=false the WebDAV handler was still requiring a JWT cookie or Basic-Auth credentials, causing the Files page to show "offline" and return 401 to every PROPFIND. Fix: check configGetter for login_required at request time and skip all authentication when it is false, granting anonymous access so the file browser works without a login session. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
StreamHandler.authenticate() always required a download_key query param and matched it against user API keys, so /api/files/stream returned 401 for all requests when auth.login_required=false (no users exist to match). STRM generator also failed with "no admin user with API key found" when login is not required, preventing STRM file creation entirely. Fixes: - StreamHandler: check configGetter at request time; return anonymous user when loginRequired=false so all stream requests are allowed - strm_generator: when loginRequired=false generate URL without download_key - setup.go: pass configManager.GetConfigGetter() to NewStreamHandler Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… worker Bug #1 (CRITICAL): SetCorrupted was overwritten by bulk update when ARR returned a generic error. triggerFileRepair called SetCorrupted directly, then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered status and overwrote it. Fix: triggerFileRepair now returns a repairOutcome enum instead of writing to the DB; the sideEffect closure captures a pointer to the update and applies the outcome before the bulk write. Bug #2 (HIGH): repair_retry_count was prematurely incremented on the very first repair trigger because both initial trigger and re-check used the same stmtRepair statement (repair_retry_count + 1). Fix: add UpdateTypeRepairTrigger and stmtRepairTrigger that set repair_triggered status without incrementing the counter; UpdateTypeRepairRetry (increment) is now only used for re-checks of already-triggered files. Bug #3 (HIGH): GetFilesForRepairNotification silently deleted health records by running CheckFile on files whose metadata had already been moved to corrupted_metadata/. ReadFileMetadata returned nil → DeleteHealthRecord → health record gone. Fix: add prepareRepairNotificationUpdate that re-triggers ARR directly (via retriggerFileRepair) instead of calling CheckFile, preserving the health record throughout the repair lifecycle. Bug #4 (MEDIUM): EventTypeFileRemoved was not handled in prepareUpdateForResult; it fell through to the corrupted/retry path and attempted to bulk-update an already-deleted record. Fix: add explicit EventTypeFileRemoved case that sets update.Skip = true. Also remove the silent error ignore (_ =) on DeleteHealthRecord in checker.go. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…V handlers Remove the auth bypass that allowed unauthenticated access to the stream API and WebDAV when LoginRequired was false. download_key is now always required for streaming, and JWT/Basic Auth is always required for WebDAV. Also clean up the now-unused configGetter field and parameter from StreamHandler and its constructor chain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract applyRepairOutcome to map repairOutcome → HealthStatusUpdate fields - Extract resolvePathForRescan for LibraryPath/ImportDir/MountPath lookup - Extract cleanupZombieRecord for health record + metadata deletion - Merge duplicate ErrEpisodeAlreadySatisfied/ErrPathMatchFailed branches - Fix indentation in prepareUpdateForResult - Remove unused cfg field from repairTestEnv test struct Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
drondeseries
referenced
this pull request
in drondeseries/altmount_old
Apr 16, 2026
… worker (#344) * fix(health): fix four DB-ordering and state-management bugs in health worker Bug #1 (CRITICAL): SetCorrupted was overwritten by bulk update when ARR returned a generic error. triggerFileRepair called SetCorrupted directly, then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered status and overwrote it. Fix: triggerFileRepair now returns a repairOutcome enum instead of writing to the DB; the sideEffect closure captures a pointer to the update and applies the outcome before the bulk write. Bug #2 (HIGH): repair_retry_count was prematurely incremented on the very first repair trigger because both initial trigger and re-check used the same stmtRepair statement (repair_retry_count + 1). Fix: add UpdateTypeRepairTrigger and stmtRepairTrigger that set repair_triggered status without incrementing the counter; UpdateTypeRepairRetry (increment) is now only used for re-checks of already-triggered files. Bug #3 (HIGH): GetFilesForRepairNotification silently deleted health records by running CheckFile on files whose metadata had already been moved to corrupted_metadata/. ReadFileMetadata returned nil → DeleteHealthRecord → health record gone. Fix: add prepareRepairNotificationUpdate that re-triggers ARR directly (via retriggerFileRepair) instead of calling CheckFile, preserving the health record throughout the repair lifecycle. Bug #4 (MEDIUM): EventTypeFileRemoved was not handled in prepareUpdateForResult; it fell through to the corrupted/retry path and attempted to bulk-update an already-deleted record. Fix: add explicit EventTypeFileRemoved case that sets update.Skip = true. Also remove the silent error ignore (_ =) on DeleteHealthRecord in checker.go.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
SetCorruptedoverwritten by bulk update —triggerFileRepairwrotecorruptedto the DB directly, thenUpdateHealthStatusBulkran with the pre-preparedrepair_triggeredstatus and silently overwrote it, leaving the file stuck inrepair_triggeredforever. Fixed by makingtriggerFileRepairreturn arepairOutcomeenum; the sideEffect captures a pointer to the update and applies the outcome before the bulk write.repair_retry_countprematurely incremented on first trigger — both initial trigger and re-checks used the samestmtRepair(repair_retry_count + 1), burning one retry immediately. Fixed with newUpdateTypeRepairTrigger/stmtRepairTriggerthat setrepair_triggeredwithout incrementing the counter.CheckFilewas called on files whose metadata had already been moved tocorrupted_metadata/;ReadFileMetadatareturnednil→DeleteHealthRecord→ record gone. Fixed withprepareRepairNotificationUpdatethat re-triggers ARR directly viaretriggerFileRepair, never touching the original metadata path.EventTypeFileRemovedunhandled inprepareUpdateForResult— fell through to the retry/repair path and tried to bulk-update an already-deleted record. Fixed with an explicit case that setsupdate.Skip = true. Also replaced_ =onDeleteHealthRecordinchecker.gowith proper error logging.Root cause pattern
Bugs #1 and #2 share the same root: side effects (
SetRepairTriggered,SetCorrupted) wrote to the DB inside goroutines beforeUpdateHealthStatusBulk, which then overwrote their results. The invariant is now enforced: only the bulk update owns all DB state writes.Test plan
go test ./internal/health/... -v -run TestE2E— all existing E2E tests passgo test ./internal/database/... -v— all DB tests passgo build ./...— clean buildmax_repair_retries=3, confirmrepair_retry_countstays0after first trigger and only increments on re-notificationscorrupted(notrepair_triggered)🤖 Generated with Claude Code