fix(health): fix four DB-ordering and state-management bugs in health worker#344

Merged: javi11 merged 7 commits into main from fix/auth-disabled-files on Feb 26, 2026

Conversation

@javi11 javi11 commented Feb 26, 2026

Summary

  • Bug #1 (CRITICAL): SetCorrupted overwritten by bulk update. triggerFileRepair wrote corrupted to the DB directly, then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered status and silently overwrote it, leaving the file stuck in repair_triggered forever. Fixed by making triggerFileRepair return a repairOutcome enum; the sideEffect closure captures a pointer to the update and applies the outcome before the bulk write.
  • Bug #2 (HIGH): repair_retry_count prematurely incremented on first trigger. Both the initial trigger and re-checks used the same stmtRepair (repair_retry_count + 1), burning one retry immediately. Fixed with new UpdateTypeRepairTrigger / stmtRepairTrigger that set repair_triggered without incrementing the counter.
  • Bug #3 (HIGH): Repair notification loop silently deleted health records. CheckFile was called on files whose metadata had already been moved to corrupted_metadata/; ReadFileMetadata returned nil → DeleteHealthRecord → record gone. Fixed with prepareRepairNotificationUpdate, which re-triggers ARR directly via retriggerFileRepair and never touches the original metadata path.
  • Bug #4 (MEDIUM): EventTypeFileRemoved unhandled in prepareUpdateForResult. It fell through to the retry/repair path and tried to bulk-update an already-deleted record. Fixed with an explicit case that sets update.Skip = true. Also replaced the silent _ = on DeleteHealthRecord in checker.go with proper error logging.
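
The Bug #4 fix can be pictured with a minimal sketch. This is not the code from this PR: the types eventType and update, and the helpers prepareUpdate and pending, are hypothetical stand-ins for the real EventTypeFileRemoved, HealthStatusUpdate, and prepareUpdateForResult. It only shows how an explicit case keeps a removed file out of the bulk update.

```go
package main

import "fmt"

// eventType is a hypothetical stand-in for the health checker's event types.
type eventType int

const (
	eventHealthy eventType = iota
	eventCorrupted
	eventFileRemoved
)

// update is a hypothetical stand-in for one pending row of the bulk update.
type update struct {
	Path   string
	Status string
	Skip   bool
}

// prepareUpdate maps a check result to a pending update.
// The explicit eventFileRemoved case stops removed files from
// falling through to the repair path in the default branch.
func prepareUpdate(path string, ev eventType) update {
	u := update{Path: path}
	switch ev {
	case eventHealthy:
		u.Status = "healthy"
	case eventFileRemoved:
		u.Skip = true // record is already deleted; nothing to write
	default:
		u.Status = "repair_triggered"
	}
	return u
}

// pending filters out skipped updates before the bulk write.
func pending(updates []update) []update {
	var out []update
	for _, u := range updates {
		if !u.Skip {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	updates := []update{
		prepareUpdate("a.mkv", eventHealthy),
		prepareUpdate("gone.mkv", eventFileRemoved),
		prepareUpdate("bad.mkv", eventCorrupted),
	}
	for _, u := range pending(updates) {
		fmt.Println(u.Path, u.Status)
	}
}
```

Without the eventFileRemoved case, gone.mkv would reach the default branch and be queued for a write against a record that no longer exists.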

Root cause pattern

Bugs #1 and #2 share the same root: side effects (SetRepairTriggered, SetCorrupted) wrote to the DB inside goroutines before UpdateHealthStatusBulk, which then overwrote their results. The invariant is now enforced: only the bulk update owns all DB state writes.
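
A minimal sketch of that invariant, under stated assumptions: a map stands in for the database, and statusUpdate, applyOutcome, bulkWrite, and process are hypothetical names, not the functions in this PR. The repair attempt only reports an outcome; the side effect applies it to the pending update through a captured pointer; the bulk write is the single place where state is persisted.

```go
package main

import "fmt"

// repairOutcome is what a repair attempt reports instead of writing to the DB.
type repairOutcome int

const (
	outcomeTriggered repairOutcome = iota // repair request was accepted
	outcomeCorrupted                      // repair request failed; file ends up corrupted
)

// statusUpdate is a hypothetical stand-in for one row of the bulk update.
type statusUpdate struct {
	Path   string
	Status string
}

// applyOutcome mutates the pending update; it never touches the database.
func applyOutcome(u *statusUpdate, o repairOutcome) {
	switch o {
	case outcomeCorrupted:
		u.Status = "corrupted"
	default:
		u.Status = "repair_triggered"
	}
}

// bulkWrite is the only function that writes state (the map stands in for the DB).
func bulkWrite(db map[string]string, updates []*statusUpdate) {
	for _, u := range updates {
		db[u.Path] = u.Status
	}
}

// process prepares the update, runs the side effect, then does the single bulk write.
func process(db map[string]string, path string, trigger func() repairOutcome) {
	u := &statusUpdate{Path: path, Status: "repair_triggered"} // pre-prepared status
	sideEffect := func() { applyOutcome(u, trigger()) }         // closure captures the pointer u
	sideEffect()
	bulkWrite(db, []*statusUpdate{u})
}

func main() {
	db := map[string]string{}
	process(db, "a.mkv", func() repairOutcome { return outcomeCorrupted })
	process(db, "b.mkv", func() repairOutcome { return outcomeTriggered })
	fmt.Println(db["a.mkv"], db["b.mkv"])
}
```

Because the side effect changes the update rather than the stored row, the later bulk write cannot overwrite it with a stale status.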

Test plan

  • go test ./internal/health/... -v -run TestE2E — all existing E2E tests pass
  • go test ./internal/database/... -v — all DB tests pass
  • go build ./... — clean build
  • Manual: trigger a file with max_repair_retries=3, confirm repair_retry_count stays 0 after first trigger and only increments on re-notifications
  • Manual: trigger a file where ARR returns a generic error, confirm final DB status is corrupted (not repair_triggered)
  • Manual: confirm repair notification files are not deleted from the health DB after 1 hour
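
The first manual check above could be modelled roughly as follows. This is an in-memory sketch, not the real SQL statements: record, updateType, and apply are hypothetical, and only the trigger-versus-retry distinction is taken from the PR description.

```go
package main

import "fmt"

// updateType distinguishes the first trigger from later re-checks.
type updateType int

const (
	updateRepairTrigger updateType = iota // first trigger: counter untouched
	updateRepairRetry                     // re-check of a triggered file: counter + 1
)

// record is a hypothetical in-memory stand-in for a health row.
type record struct {
	Status     string
	RetryCount int
}

// apply sets repair_triggered and increments the counter only for retries.
func apply(r *record, t updateType) {
	r.Status = "repair_triggered"
	if t == updateRepairRetry {
		r.RetryCount++
	}
}

func main() {
	r := &record{Status: "pending"}
	apply(r, updateRepairTrigger)
	fmt.Println(r.Status, r.RetryCount) // counter still 0 after the first trigger
	apply(r, updateRepairRetry)
	fmt.Println(r.Status, r.RetryCount) // counter is 1 after one re-check
}
```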

🤖 Generated with Claude Code

javi11 and others added 7 commits February 25, 2026 19:16
When auth.login_required=false the WebDAV handler was still
requiring a JWT cookie or Basic-Auth credentials, causing the
Files page to show "offline" and return 401 to every PROPFIND.

Fix: check configGetter for login_required at request time and
skip all authentication when it is false, granting anonymous
access so the file browser works without a login session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
StreamHandler.authenticate() always required a download_key query param
and matched it against user API keys, so /api/files/stream returned 401
for all requests when auth.login_required=false (no users exist to match).

STRM generator also failed with "no admin user with API key found" when
login is not required, preventing STRM file creation entirely.

Fixes:
- StreamHandler: check configGetter at request time; return anonymous user
  when loginRequired=false so all stream requests are allowed
- strm_generator: when loginRequired=false generate URL without download_key
- setup.go: pass configManager.GetConfigGetter() to NewStreamHandler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… worker

Bug #1 (CRITICAL): SetCorrupted was overwritten by bulk update when ARR
returned a generic error. triggerFileRepair called SetCorrupted directly,
then UpdateHealthStatusBulk ran with the pre-prepared repair_triggered
status and overwrote it. Fix: triggerFileRepair now returns a repairOutcome
enum instead of writing to the DB; the sideEffect closure captures a pointer
to the update and applies the outcome before the bulk write.

Bug #2 (HIGH): repair_retry_count was prematurely incremented on the very
first repair trigger because both initial trigger and re-check used the same
stmtRepair statement (repair_retry_count + 1). Fix: add UpdateTypeRepairTrigger
and stmtRepairTrigger that set repair_triggered status without incrementing the
counter; UpdateTypeRepairRetry (increment) is now only used for re-checks of
already-triggered files.

Bug #3 (HIGH): GetFilesForRepairNotification silently deleted health records
by running CheckFile on files whose metadata had already been moved to
corrupted_metadata/. ReadFileMetadata returned nil → DeleteHealthRecord →
health record gone. Fix: add prepareRepairNotificationUpdate that re-triggers
ARR directly (via retriggerFileRepair) instead of calling CheckFile, preserving
the health record throughout the repair lifecycle.

Bug #4 (MEDIUM): EventTypeFileRemoved was not handled in prepareUpdateForResult;
it fell through to the corrupted/retry path and attempted to bulk-update an
already-deleted record. Fix: add explicit EventTypeFileRemoved case that sets
update.Skip = true. Also remove the silent error ignore (_ =) on
DeleteHealthRecord in checker.go.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…V handlers

Remove the auth bypass that allowed unauthenticated access to the stream
API and WebDAV when LoginRequired was false. download_key is now always
required for streaming, and JWT/Basic Auth is always required for WebDAV.

Also clean up the now-unused configGetter field and parameter from
StreamHandler and its constructor chain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Extract applyRepairOutcome to map repairOutcome → HealthStatusUpdate fields
- Extract resolvePathForRescan for LibraryPath/ImportDir/MountPath lookup
- Extract cleanupZombieRecord for health record + metadata deletion
- Merge duplicate ErrEpisodeAlreadySatisfied/ErrPathMatchFailed branches
- Fix indentation in prepareUpdateForResult
- Remove unused cfg field from repairTestEnv test struct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@javi11 javi11 merged commit c67aff2 into main Feb 26, 2026
1 check passed
@javi11 javi11 deleted the fix/auth-disabled-files branch February 26, 2026 11:22
drondeseries referenced this pull request in drondeseries/altmount_old Apr 16, 2026