Skip to content

fix(integration): categorize HealthErrorType via explicit map, panic on missing (Phase 2.3: T1)#190

Merged
mescon merged 1 commit into
mainfrom
fix/health-error-type-category-map
May 22, 2026
Merged

fix(integration): categorize HealthErrorType via explicit map, panic on missing (Phase 2.3: T1)#190
mescon merged 1 commit into
mainfrom
fix/health-error-type-category-map

Conversation

@mescon
Copy link
Copy Markdown
Owner

@mescon mescon commented May 21, 2026

Closes audit finding T1 — the first PR in Phase 2 (Foundational types). With Phase 1 complete (#189), this kicks off the type-design work.

The bug

IsRecoverable and IsTrueCorruption used switch statements with implicit default: false. When a new ErrorType constant was added (a new corruption detection method, a new failure mode), the new type silently classified as neither recoverable nor true-corruption — meaning the remediation pipeline silently bypassed it entirely.

// Before
func (e *HealthCheckError) IsRecoverable() bool {
    switch e.Type {
    case ErrorTypeAccessDenied, ErrorTypePathNotFound, ...:
        return true
    default:
        return false  // ← new error types fall here
    }
}

A scan would hit the new error type, classify as neither, log nothing about the gap, and pretend the file was fine. Operator never knows; CI never catches it.

The fix

type ErrorCategory int
const (
    CategoryUnknown ErrorCategory = iota
    CategoryRecoverable
    CategoryTrueCorruption
)

var errorCategories = map[string]ErrorCategory{
    ErrorTypeZeroByte:      CategoryTrueCorruption,
    ErrorTypeCorruptHeader: CategoryTrueCorruption,
    // ... every constant must appear here
    ErrorTypeMountLost:     CategoryRecoverable,
}

func (e *HealthCheckError) category() ErrorCategory {
    if cat, ok := errorCategories[e.Type]; ok {
        return cat
    }
    if testing.Testing() {
        panic("...unregistered error type...")  // CI catches the gap
    }
    logger.Errorf("...")                          // operator sees in prod
    return CategoryRecoverable                     // conservative: retry, don't delete
}

IsRecoverable and IsTrueCorruption now both delegate to category().

Why this design

Property Before After
New error type without remediation logic Silently ignored Panics in CI tests
Production hit with unregistered type Silently ignored Errorf log + treat as Recoverable
Conservative fallback (don't delete user data on unknown error) Yes (vacuously) Yes (explicit)
Code repetition between IsRecoverable / IsTrueCorruption Two parallel switches One map, two delegators

The conservative fallback choice — unknown → Recoverable rather than → TrueCorruption — matters: IsTrueCorruption=true triggers the file-deletion-and-redownload pipeline. Defaulting unknown errors to "don't delete the user's data" is the safe failure mode.

Tests

  • TestHealthCheckError_IsRecoverable / _IsTrueCorruption: existing per-type tables continue to pass (every registered type categorizes the same way as before).
  • TestHealthCheckError_UnregisteredTypePanics (new): creates a HealthCheckError{Type: "ThisErrorTypeIsNotRegistered"}, calls IsRecoverable(), asserts panic with "unregistered error type" in the message. This is the regression guard.
  • TestHealthCheckError_CategoryConsistency (new): asserts every errorCategories entry has a non-CategoryUnknown value. CategoryUnknown is only meant to be returned for unregistered lookups; registering an error type as CategoryUnknown would defeat the purpose.

Backward compatibility

Zero behavior change for any existing ErrorType — every registered type categorizes the same way it did before. The only behavior change is for unregistered types, which previously fell through silently and now either panic (tests) or log + retry (prod).

Test plan

  • go build ./... — clean
  • golangci-lint run ./... — 0 issues
  • go test ./internal/integration/... -run TestHealthCheckError — 6 subtests + 2 new top-level tests, all pass
  • go test ./... — full suite passes

Context

Phase 1 complete (last item was #189 — webhook secret separation). Phase 2 (Foundational types) started. Phase 2 has 4 PRs total:

  • ✅ 2.3 Categorized HealthErrorType (this PR)
  • 2.1 Typed enum sweep coordinated Go+TS (M)
  • 2.2 Typed event envelope (M)
  • 2.4 ScanProgress encapsulation (S)

Summary by CodeRabbit

  • Tests

    • Added validation tests to ensure error classifications remain consistent and complete.
  • Improvements

    • Enhanced error categorization system with recoverable and corruption error types for more precise error routing.

Review Change Stack

…on missing

Closes audit finding T1. IsRecoverable and IsTrueCorruption used switch
statements with implicit "default: false" — when a new ErrorType
constant was added without updating both switches, the new type
silently classified as neither recoverable nor true-corruption, making
it invisible to the remediation pipeline. New errors would just be
ignored, with no signal to the operator or developer.

Replace with an authoritative errorCategories map keyed by ErrorType
string:

	var errorCategories = map[string]ErrorCategory{
		ErrorTypeZeroByte:      CategoryTrueCorruption,
		// ... every constant must appear here
		ErrorTypeMountLost:     CategoryRecoverable,
	}

IsRecoverable and IsTrueCorruption now delegate to a category()
helper that consults the map. On lookup miss:

- testing.Testing() == true → panic with a message naming the
  unregistered type and pointing at the source file to update. This
  is the property that closes T1: a new ErrorType added without map
  entry will fail every test that exercises remediation routing,
  caught at CI rather than in production.

- production binary → log Errorf and fall back to CategoryRecoverable
  (the conservative choice — retry rather than delete user data).
  Unknown errors no longer silently bypass remediation; they're
  treated as transient infrastructure issues with operator-visible
  logging.

Tests:
- Existing TestHealthCheckError_IsRecoverable / _IsTrueCorruption
  test tables continue to pass unchanged (registered types still
  classify the same way).
- New TestHealthCheckError_UnregisteredTypePanics asserts the panic
  message contains "unregistered error type" so future refactors
  can't accidentally weaken the regression guard.
- New TestHealthCheckError_CategoryConsistency asserts every entry
  in errorCategories has a non-Unknown category (a CategoryUnknown
  registration would defeat the purpose).

This is the first PR in Phase 2 (Foundational types) of the
remediation plan. Phase 1 (Critical correctness — security & silent
failures) is complete as of PR #189.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Walkthrough

This PR introduces an ErrorCategory classification system for health check errors. A new exported type and three category constants categorize error types for remediation routing. A centralized category() method resolves the category for any error, with tests ensuring all error types are registered and valid.

Changes

Error Categorization System

Layer / File(s) Summary
ErrorCategory type, constants, and registration map
internal/integration/interfaces.go
Import statements updated to include fmt, testing, and logger. ErrorCategory int type introduced with CategoryUnknown, CategoryRecoverable, and CategoryTrueCorruption exported constants. errorCategories map registers all existing ErrorType* constants to their assigned categories.
Category resolution method and public method refactoring
internal/integration/interfaces.go
New (*HealthCheckError).category() method provides centralized category lookup: in test binaries it panics if the error type is missing, while in production it logs an error and returns CategoryRecoverable conservatively. IsRecoverable() and IsTrueCorruption() are refactored to derive results from category() instead of inline switch statements.
Category registration and consistency tests
internal/integration/health_checker_test.go
Two new tests verify category system integrity: TestHealthCheckError_UnregisteredTypePanics confirms unregistered error types trigger a panic during test execution, and TestHealthCheckError_CategoryConsistency iterates all errorCategories entries to assert none are CategoryUnknown and all are limited to recoverable or true-corruption categories.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through error lanes,
Categories mapped, no type remains,
Tests catch the stragglers that try to hide,
Logic simplified, nowhere to bide—
Remedies routed with newfound grace!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing an explicit map-based categorization system for HealthErrorTypes with panic on missing entries during testing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/health-error-type-category-map

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/integration/health_checker_test.go`:
- Around line 95-108: TestHealthCheckError_CategoryConsistency only validates
keys present in errorCategories and misses any newly declared ErrorType that
isn't registered; add an authoritative list of all declared ErrorType values
(e.g., a new slice AllErrorTypes or reuse an existing constant list of ErrorType
values) and update TestHealthCheckError_CategoryConsistency to iterate over that
list, asserting that each ErrorType has an entry in errorCategories and that its
category is not CategoryUnknown and is either CategoryRecoverable or
CategoryTrueCorruption; reference the existing symbols errorCategories,
CategoryUnknown, CategoryRecoverable, CategoryTrueCorruption, and
TestHealthCheckError_CategoryConsistency when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 96ee107a-f250-4681-a800-d576aa16d14b

📥 Commits

Reviewing files that changed from the base of the PR and between bca5ff7 and 17d40f7.

📒 Files selected for processing (2)
  • internal/integration/health_checker_test.go
  • internal/integration/interfaces.go

Comment on lines +95 to +108
// TestHealthCheckError_CategoryConsistency asserts every error type registered
// in errorCategories has a NON-Unknown category. Registering with
// CategoryUnknown defeats the purpose — that value is only returned when an
// unregistered type is looked up.
func TestHealthCheckError_CategoryConsistency(t *testing.T) {
for errType, cat := range errorCategories {
if cat == CategoryUnknown {
t.Errorf("error type %q registered as CategoryUnknown; use Recoverable or TrueCorruption", errType)
}
if cat != CategoryRecoverable && cat != CategoryTrueCorruption {
t.Errorf("error type %q has unknown category value %d", errType, cat)
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test still won't catch a newly added ErrorType that's missing from the map.

It only validates entries that already exist in errorCategories. If someone adds ErrorTypeFoo and forgets the registration, CI stays green unless some other test happens to reference it, so T1 is still only partially covered. Use a single authoritative list of all declared error types and assert that every one has a map entry.

As per coding guidelines, **/*_test.go: Focus on: Test coverage for edge cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/integration/health_checker_test.go` around lines 95 - 108,
TestHealthCheckError_CategoryConsistency only validates keys present in
errorCategories and misses any newly declared ErrorType that isn't registered;
add an authoritative list of all declared ErrorType values (e.g., a new slice
AllErrorTypes or reuse an existing constant list of ErrorType values) and update
TestHealthCheckError_CategoryConsistency to iterate over that list, asserting
that each ErrorType has an entry in errorCategories and that its category is not
CategoryUnknown and is either CategoryRecoverable or CategoryTrueCorruption;
reference the existing symbols errorCategories, CategoryUnknown,
CategoryRecoverable, CategoryTrueCorruption, and
TestHealthCheckError_CategoryConsistency when making the change.

@mescon mescon merged commit fc05469 into main May 22, 2026
8 of 9 checks passed
@mescon mescon deleted the fix/health-error-type-category-map branch May 22, 2026 07:12
mescon added a commit that referenced this pull request May 22, 2026
…ive mutex state (#191)

Closes audit finding T4. The Scanner public surface returned []ScanProgress
to handlers, which then JSON-marshaled the values. ScanProgress embeds a
sync.Mutex; reading exported fields without holding the mutex races with
writers (notably markFileProcessed bumping FilesDone while
emitProgress/GetActiveScans is publishing). The previous workaround in
GetActiveScans manually copied fields under the per-scan lock and silenced
the resulting copylocks warning with //nolint:govet — the type itself was
still wrong, the linter was just being told to stop noticing.

Changes:

1. New ScanProgressView value type — fields only, no mutex, JSON tags
   on the right type. This is what the public API now returns.

2. New (*ScanProgress).Snapshot() ScanProgressView method — takes the
   per-scan lock, copies fields atomically, returns by value. This is
   the only sanctioned way to read ScanProgress fields from code that
   doesn't already hold the mutex.

3. Scanner.GetActiveScans() returns []ScanProgressView (was []ScanProgress).
   Implementation now calls Snapshot() on each entry rather than the
   nolint:govet-suppressed manual copy.

4. emitProgress uses Snapshot() to build event data instead of the
   inline mu.Lock/Unlock block.

5. ScanProgress's exported fields are now json:"-" since marshaling
   should go through ScanProgressView. Internal field access stays
   (writers already hold the mutex correctly).

6. internal/testutil renames its local ScanProgress mirror to
   ScanProgressView — same fields, just matches the new public name.
   Mock and test scanner implementations updated accordingly.

The nolint:govet comment is gone; the race-prone pattern is no longer
even expressible at the public boundary.

Tests: existing scanner + handler tests pass unchanged. No behavior
change — just type-level enforcement that no race-prone field reads
escape across the service boundary.

Phase 2 of the remediation plan, item 2.4. Phase 2 progress:
- ✅ 2.3 (HealthErrorType category map, #190 just merged)
- ✅ 2.4 (this PR)
- 2.1 typed enum sweep (M, coordinated Go+TS)
- 2.2 typed event envelope (M)

Co-authored-by: mescon <mescon@users.noreply.github.com>
mescon added a commit that referenced this pull request May 22, 2026
…ation (#192)

Closes audit finding T2 (ArrType drift between Go and TS) on the Go side.
Previously every site that compared "instance.Type == ArrTypeSonarr" was
working with a bare string field; misspelled strings could be inserted via
SQL or API without compiler complaint, and switch statements on unknown
types silently fell through.

Changes:

1. New `type ArrType string` in integration package. Existing constants
   are now typed: `const ArrTypeSonarr ArrType = "sonarr"` etc.

2. ParseArrType(s string) (ArrType, error) — the only sanctioned way to
   convert a raw string to ArrType. Used at API write boundaries.

3. (*ArrType).Scan(value any) error — implements sql.Scanner so
   `rows.Scan(&i.Type)` validates at row read time. Unknown DB values
   produce errors rather than propagating silently.

4. ArrType.Value() driver.Value — implements driver.Valuer for symmetric
   parameter passing.

5. ArrInstanceInfo.Type and the internal ArrInstance.Type are now
   typed ArrType. All ~10 comparison and switch sites switch from
   string-vs-string to ArrType-vs-ArrType — compiler enforces matches.

6. handlers_arr.createArrInstance now calls ParseArrType(req.Type)
   immediately after URL validation. Unknown types get 400 with the
   list of accepted values.

7. MediaDetails.ArrType remains a string (it's a JSON DTO going to the
   frontend); assignment sites cast explicitly via string(instance.Type)
   so the boundary is visible.

8. health_monitor.go event payloads cast `string(instance.Type)` at the
   publish site — events are JSON-marshaled, the typed enum is purely
   for internal correctness.

9. getInstanceForPath logs Errorf on row-scan failures (was silent
   continue) — surfaces "ArrType.Scan: unknown ArrType <foo>" so a
   typo'd manual INSERT doesn't silently produce "no instance found"
   downstream.

Tests:
- TestHTTPArrClient_GetFilePath_UnsupportedType updated. Previously
  this asserted "unsupported instance type" because the bare string
  Type="unknown" reached a downstream switch. Now Scan rejects the
  row up front (the better behavior — invalid types never reach
  business logic) and the caller sees "no instance found" instead.
  Test accepts either error pattern for resilience.
- TestHealthMonitorService_checkInstanceHealth_PublishesInstanceHealthyOnRecovery
  was failing because event payload had typed ArrType vs string in
  the test's `!= "sonarr"` comparison. Fixed by the boundary cast in
  health_monitor.go.

This is the first enum in the Phase 2.1 sweep. The same template
applies to ProviderType, ScanStatus, DownloadState, DownloadStatus,
Protocol, DetectionMode — each gets its own focused PR with the
exact same shape (typed const + Parse + Scanner + Valuer + boundary
validation). Doing one enum at a time keeps each PR reviewable.

Phase 2 progress:
- ✅ 2.3 HealthErrorType category map (#190)
- ✅ 2.4 ScanProgressView snapshot (#191)
- 🟡 2.1.a ArrType typed enum (this PR — first of 7 enums)
- 2.2 typed event envelope

Co-authored-by: mescon <mescon@users.noreply.github.com>
mescon added a commit that referenced this pull request May 22, 2026
…idation (#193)

Closes audit finding T2 on the notifier side. Second enum in the
Phase 2.1 sweep, applying the same template as ArrType (#192):
typed string + Scan/Value + ParseProviderType for boundary validation.

ProviderType is 21 constants (Discord, Slack, Telegram, …). Previously
NotificationConfig.ProviderType was a bare string field; JSON
unmarshaling, DB scans, and config imports all accepted arbitrary
values that would later fail in unhelpful ways at urlBuilders lookup.

Changes:

1. `type ProviderType string` with all 21 constants now typed.
2. `validProviderTypes` map for authoritative validation.
3. `ParseProviderType(s string) (ProviderType, error)` for API boundary.
4. `(*ProviderType).Scan(value any) error` — DB read validation.
5. `ProviderType.Value() driver.Value` — symmetric DB write.
6. `NotificationConfig.ProviderType` field is now `ProviderType`.
7. `providerLabels` map keyed by `ProviderType`.
8. `urlBuilders` map keyed by `ProviderType`.
9. `getProviderLabel(providerType ProviderType) string` — typed param.

Boundary validation wired in:
- handlers_notifications.createNotification: ParseProviderType after BindJSON
- handlers_notifications.updateNotification: same
- handlers_config.importNotifications: ParseProviderType per row, skip
  with Errorf on invalid (consistent with the import resilience pattern
  from prior PRs)

Tests:
- TestProviderConstants: changed tt.constant from string to ProviderType
  and compare via string(tt.constant) (the typed enum has no value
  comparison with a raw string).
- TestUrlBuilders_MapCompleteness: expectedProviders slice typed.
- TestGetProviderLabel: provider field typed.
- t.Run() calls cast string(provider) since t.Run takes a name string.
- BenchmarkUrlBuilderLookup: typed slice.

Frontend coordination (matching TS literal unions) still deferred to
the end of the enum sweep.

Phase 2 progress:
- ✅ 2.3 HealthErrorType category map (#190)
- ✅ 2.4 ScanProgressView snapshot (#191)
- ✅ 2.1.a ArrType typed enum (#192)
- 🟡 2.1.b ProviderType typed enum (this PR — second of seven enums)
- 2.1.c-g remaining (ScanStatus, DownloadState, DownloadStatus, Protocol, DetectionMode)
- 2.2 typed event envelope (M)

Co-authored-by: mescon <mescon@users.noreply.github.com>
mescon added a commit that referenced this pull request May 24, 2026
…dation (#194)

Closes audit finding T5 (detection_method / detection_mode accept any
string into the DB). Third enum in the Phase 2.1 sweep — same template
as ArrType (#192) and ProviderType (#193).

DetectionMethod was already a typed string but lacked Parse / Scan /
Value. DetectionMode (quick / thorough) was bare const strings. Both
now have the full typed-enum treatment.

Design choice for DetectionMode constants: they're declared as UNTYPED
string constants (`const ModeQuick = "quick"`). This is deliberate —
30+ existing comparison sites in health_checker.go use `mode ==
ModeThorough` against a bare `string` variable. With typed constants
those sites would each need an explicit conversion; with untyped
constants the comparison works against both `string` and the new
`DetectionMode` type. The typed enum is the boundary primitive
(validation, DB Scan, API parse); internal usage stays as bare strings
until a future PR threads the type through.

Changes:

1. `type DetectionMode string` declared alongside the existing typed
   `DetectionMethod`.
2. ParseDetectionMethod and ParseDetectionMode validators with helpful
   error messages naming the allowed values.
3. Scan / Value methods on both types for DB symmetry.
4. validDetectionMethods and validDetectionModes maps as authoritative
   allowlists.

Boundary validation wired into prepareScanPathRequest (the shared
helper for both createScanPath and updateScanPath in handlers_paths.go).
Defaults are still applied first (so empty fields don't get rejected),
then the Parse calls validate the resulting non-empty values.

Phase 2 progress:
- ✅ 2.3 HealthErrorType category map (#190)
- ✅ 2.4 ScanProgressView snapshot (#191)
- ✅ 2.1.a ArrType (#192)
- ✅ 2.1.b ProviderType (#193)
- 🟡 2.1.c DetectionMethod + DetectionMode (this PR)
- 2.1.d-f remaining (ScanStatus, DownloadState, DownloadStatus, Protocol)
- 2.2 typed event envelope (M)

Co-authored-by: mescon <mescon@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant