fix(integration): categorize HealthErrorType via explicit map, panic on missing (Phase 2.3: T1)#190
Conversation
…on missing
Closes audit finding T1. IsRecoverable and IsTrueCorruption used switch
statements with implicit "default: false" — when a new ErrorType
constant was added without updating both switches, the new type
silently classified as neither recoverable nor true-corruption, making
it invisible to the remediation pipeline. New errors would just be
ignored, with no signal to the operator or developer.
Replace with an authoritative errorCategories map keyed by ErrorType
string:
var errorCategories = map[string]ErrorCategory{
ErrorTypeZeroByte: CategoryTrueCorruption,
// ... every constant must appear here
ErrorTypeMountLost: CategoryRecoverable,
}
IsRecoverable and IsTrueCorruption now delegate to a category()
helper that consults the map. On lookup miss:
- testing.Testing() == true → panic with a message naming the
unregistered type and pointing at the source file to update. This
is the property that closes T1: a new ErrorType added without map
entry will fail every test that exercises remediation routing,
caught at CI rather than in production.
- production binary → log Errorf and fall back to CategoryRecoverable
(the conservative choice — retry rather than delete user data).
Unknown errors no longer silently bypass remediation; they're
treated as transient infrastructure issues with operator-visible
logging.
Tests:
- Existing TestHealthCheckError_IsRecoverable / _IsTrueCorruption
test tables continue to pass unchanged (registered types still
classify the same way).
- New TestHealthCheckError_UnregisteredTypePanics asserts the panic
message contains "unregistered error type" so future refactors
can't accidentally weaken the regression guard.
- New TestHealthCheckError_CategoryConsistency asserts every entry
in errorCategories has a non-Unknown category (a CategoryUnknown
registration would defeat the purpose).
This is the first PR in Phase 2 (Foundational types) of the
remediation plan. Phase 1 (Critical correctness — security & silent
failures) is complete as of PR #189.
📝 WalkthroughWalkthroughThis PR introduces an ChangesError Categorization System
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/integration/health_checker_test.go`:
- Around line 95-108: TestHealthCheckError_CategoryConsistency only validates
keys present in errorCategories and misses any newly declared ErrorType that
isn't registered; add an authoritative list of all declared ErrorType values
(e.g., a new slice AllErrorTypes or reuse an existing constant list of ErrorType
values) and update TestHealthCheckError_CategoryConsistency to iterate over that
list, asserting that each ErrorType has an entry in errorCategories and that its
category is not CategoryUnknown and is either CategoryRecoverable or
CategoryTrueCorruption; reference the existing symbols errorCategories,
CategoryUnknown, CategoryRecoverable, CategoryTrueCorruption, and
TestHealthCheckError_CategoryConsistency when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 96ee107a-f250-4681-a800-d576aa16d14b
📒 Files selected for processing (2)
internal/integration/health_checker_test.gointernal/integration/interfaces.go
| // TestHealthCheckError_CategoryConsistency asserts every error type registered | ||
| // in errorCategories has a NON-Unknown category. Registering with | ||
| // CategoryUnknown defeats the purpose — that value is only returned when an | ||
| // unregistered type is looked up. | ||
| func TestHealthCheckError_CategoryConsistency(t *testing.T) { | ||
| for errType, cat := range errorCategories { | ||
| if cat == CategoryUnknown { | ||
| t.Errorf("error type %q registered as CategoryUnknown; use Recoverable or TrueCorruption", errType) | ||
| } | ||
| if cat != CategoryRecoverable && cat != CategoryTrueCorruption { | ||
| t.Errorf("error type %q has unknown category value %d", errType, cat) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This test still won't catch a newly added ErrorType that's missing from the map.
It only validates entries that already exist in errorCategories. If someone adds ErrorTypeFoo and forgets the registration, CI stays green unless some other test happens to reference it, so T1 is still only partially covered. Use a single authoritative list of all declared error types and assert that every one has a map entry.
As per coding guidelines, **/*_test.go: Focus on: Test coverage for edge cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/integration/health_checker_test.go` around lines 95 - 108,
TestHealthCheckError_CategoryConsistency only validates keys present in
errorCategories and misses any newly declared ErrorType that isn't registered;
add an authoritative list of all declared ErrorType values (e.g., a new slice
AllErrorTypes or reuse an existing constant list of ErrorType values) and update
TestHealthCheckError_CategoryConsistency to iterate over that list, asserting
that each ErrorType has an entry in errorCategories and that its category is not
CategoryUnknown and is either CategoryRecoverable or CategoryTrueCorruption;
reference the existing symbols errorCategories, CategoryUnknown,
CategoryRecoverable, CategoryTrueCorruption, and
TestHealthCheckError_CategoryConsistency when making the change.
…ive mutex state (#191) Closes audit finding T4. The Scanner public surface returned []ScanProgress to handlers, which then JSON-marshaled the values. ScanProgress embeds a sync.Mutex; reading exported fields without holding the mutex races with writers (notably markFileProcessed bumping FilesDone while emitProgress/GetActiveScans is publishing). The previous workaround in GetActiveScans manually copied fields under the per-scan lock and silenced the resulting copylocks warning with //nolint:govet — the type itself was still wrong, the linter was just being told to stop noticing. Changes: 1. New ScanProgressView value type — fields only, no mutex, JSON tags on the right type. This is what the public API now returns. 2. New (*ScanProgress).Snapshot() ScanProgressView method — takes the per-scan lock, copies fields atomically, returns by value. This is the only sanctioned way to read ScanProgress fields from code that doesn't already hold the mutex. 3. Scanner.GetActiveScans() returns []ScanProgressView (was []ScanProgress). Implementation now calls Snapshot() on each entry rather than the nolint:govet-suppressed manual copy. 4. emitProgress uses Snapshot() to build event data instead of the inline mu.Lock/Unlock block. 5. ScanProgress's exported fields are now json:"-" since marshaling should go through ScanProgressView. Internal field access stays (writers already hold the mutex correctly). 6. internal/testutil renames its local ScanProgress mirror to ScanProgressView — same fields, just matches the new public name. Mock and test scanner implementations updated accordingly. The nolint:govet comment is gone; the race-prone pattern is no longer even expressible at the public boundary. Tests: existing scanner + handler tests pass unchanged. No behavior change — just type-level enforcement that no race-prone field reads escape across the service boundary. Phase 2 of the remediation plan, item 2.4. Phase 2 progress: - ✅ 2.3 (HealthErrorType category map, #190 just merged) - ✅ 2.4 (this PR) - 2.1 typed enum sweep (M, coordinated Go+TS) - 2.2 typed event envelope (M) Co-authored-by: mescon <mescon@users.noreply.github.com>
…ation (#192) Closes audit finding T2 (ArrType drift between Go and TS) on the Go side. Previously every site that compared "instance.Type == ArrTypeSonarr" was working with a bare string field; misspelled strings could be inserted via SQL or API without compiler complaint, and switch statements on unknown types silently fell through. Changes: 1. New `type ArrType string` in integration package. Existing constants are now typed: `const ArrTypeSonarr ArrType = "sonarr"` etc. 2. ParseArrType(s string) (ArrType, error) — the only sanctioned way to convert a raw string to ArrType. Used at API write boundaries. 3. (*ArrType).Scan(value any) error — implements sql.Scanner so `rows.Scan(&i.Type)` validates at row read time. Unknown DB values produce errors rather than propagating silently. 4. ArrType.Value() driver.Value — implements driver.Valuer for symmetric parameter passing. 5. ArrInstanceInfo.Type and the internal ArrInstance.Type are now typed ArrType. All ~10 comparison and switch sites switch from string-vs-string to ArrType-vs-ArrType — compiler enforces matches. 6. handlers_arr.createArrInstance now calls ParseArrType(req.Type) immediately after URL validation. Unknown types get 400 with the list of accepted values. 7. MediaDetails.ArrType remains a string (it's a JSON DTO going to the frontend); assignment sites cast explicitly via string(instance.Type) so the boundary is visible. 8. health_monitor.go event payloads cast `string(instance.Type)` at the publish site — events are JSON-marshaled, the typed enum is purely for internal correctness. 9. getInstanceForPath logs Errorf on row-scan failures (was silent continue) — surfaces "ArrType.Scan: unknown ArrType <foo>" so a typo'd manual INSERT doesn't silently produce "no instance found" downstream. Tests: - TestHTTPArrClient_GetFilePath_UnsupportedType updated. Previously this asserted "unsupported instance type" because the bare string Type="unknown" reached a downstream switch. Now Scan rejects the row up front (the better behavior — invalid types never reach business logic) and the caller sees "no instance found" instead. Test accepts either error pattern for resilience. - TestHealthMonitorService_checkInstanceHealth_PublishesInstanceHealthyOnRecovery was failing because event payload had typed ArrType vs string in the test's `!= "sonarr"` comparison. Fixed by the boundary cast in health_monitor.go. This is the first enum in the Phase 2.1 sweep. The same template applies to ProviderType, ScanStatus, DownloadState, DownloadStatus, Protocol, DetectionMode — each gets its own focused PR with the exact same shape (typed const + Parse + Scanner + Valuer + boundary validation). Doing one enum at a time keeps each PR reviewable. Phase 2 progress: - ✅ 2.3 HealthErrorType category map (#190) - ✅ 2.4 ScanProgressView snapshot (#191) - 🟡 2.1.a ArrType typed enum (this PR — first of 7 enums) - 2.2 typed event envelope Co-authored-by: mescon <mescon@users.noreply.github.com>
…idation (#193) Closes audit finding T2 on the notifier side. Second enum in the Phase 2.1 sweep, applying the same template as ArrType (#192): typed string + Scan/Value + ParseProviderType for boundary validation. ProviderType is 21 constants (Discord, Slack, Telegram, …). Previously NotificationConfig.ProviderType was a bare string field; JSON unmarshaling, DB scans, and config imports all accepted arbitrary values that would later fail in unhelpful ways at urlBuilders lookup. Changes: 1. `type ProviderType string` with all 21 constants now typed. 2. `validProviderTypes` map for authoritative validation. 3. `ParseProviderType(s string) (ProviderType, error)` for API boundary. 4. `(*ProviderType).Scan(value any) error` — DB read validation. 5. `ProviderType.Value() driver.Value` — symmetric DB write. 6. `NotificationConfig.ProviderType` field is now `ProviderType`. 7. `providerLabels` map keyed by `ProviderType`. 8. `urlBuilders` map keyed by `ProviderType`. 9. `getProviderLabel(providerType ProviderType) string` — typed param. Boundary validation wired in: - handlers_notifications.createNotification: ParseProviderType after BindJSON - handlers_notifications.updateNotification: same - handlers_config.importNotifications: ParseProviderType per row, skip with Errorf on invalid (consistent with the import resilience pattern from prior PRs) Tests: - TestProviderConstants: changed tt.constant from string to ProviderType and compare via string(tt.constant) (the typed enum has no value comparison with a raw string). - TestUrlBuilders_MapCompleteness: expectedProviders slice typed. - TestGetProviderLabel: provider field typed. - t.Run() calls cast string(provider) since t.Run takes a name string. - BenchmarkUrlBuilderLookup: typed slice. Frontend coordination (matching TS literal unions) still deferred to the end of the enum sweep. Phase 2 progress: - ✅ 2.3 HealthErrorType category map (#190) - ✅ 2.4 ScanProgressView snapshot (#191) - ✅ 2.1.a ArrType typed enum (#192) - 🟡 2.1.b ProviderType typed enum (this PR — second of seven enums) - 2.1.c-g remaining (ScanStatus, DownloadState, DownloadStatus, Protocol, DetectionMode) - 2.2 typed event envelope (M) Co-authored-by: mescon <mescon@users.noreply.github.com>
…dation (#194) Closes audit finding T5 (detection_method / detection_mode accept any string into the DB). Third enum in the Phase 2.1 sweep — same template as ArrType (#192) and ProviderType (#193). DetectionMethod was already a typed string but lacked Parse / Scan / Value. DetectionMode (quick / thorough) was bare const strings. Both now have the full typed-enum treatment. Design choice for DetectionMode constants: they're declared as UNTYPED string constants (`const ModeQuick = "quick"`). This is deliberate — 30+ existing comparison sites in health_checker.go use `mode == ModeThorough` against a bare `string` variable. With typed constants those sites would each need an explicit conversion; with untyped constants the comparison works against both `string` and the new `DetectionMode` type. The typed enum is the boundary primitive (validation, DB Scan, API parse); internal usage stays as bare strings until a future PR threads the type through. Changes: 1. `type DetectionMode string` declared alongside the existing typed `DetectionMethod`. 2. ParseDetectionMethod and ParseDetectionMode validators with helpful error messages naming the allowed values. 3. Scan / Value methods on both types for DB symmetry. 4. validDetectionMethods and validDetectionModes maps as authoritative allowlists. Boundary validation wired into prepareScanPathRequest (the shared helper for both createScanPath and updateScanPath in handlers_paths.go). Defaults are still applied first (so empty fields don't get rejected), then the Parse calls validate the resulting non-empty values. Phase 2 progress: - ✅ 2.3 HealthErrorType category map (#190) - ✅ 2.4 ScanProgressView snapshot (#191) - ✅ 2.1.a ArrType (#192) - ✅ 2.1.b ProviderType (#193) - 🟡 2.1.c DetectionMethod + DetectionMode (this PR) - 2.1.d-f remaining (ScanStatus, DownloadState, DownloadStatus, Protocol) - 2.2 typed event envelope (M) Co-authored-by: mescon <mescon@users.noreply.github.com>
Closes audit finding T1 — the first PR in Phase 2 (Foundational types). With Phase 1 complete (#189), this kicks off the type-design work.
The bug
IsRecoverableandIsTrueCorruptionusedswitchstatements with implicitdefault: false. When a newErrorTypeconstant was added (a new corruption detection method, a new failure mode), the new type silently classified as neither recoverable nor true-corruption — meaning the remediation pipeline silently bypassed it entirely.A scan would hit the new error type, classify as neither, log nothing about the gap, and pretend the file was fine. Operator never knows; CI never catches it.
The fix
IsRecoverableandIsTrueCorruptionnow both delegate tocategory().Why this design
Errorflog + treat as RecoverableThe conservative fallback choice — unknown → Recoverable rather than → TrueCorruption — matters:
IsTrueCorruption=truetriggers the file-deletion-and-redownload pipeline. Defaulting unknown errors to "don't delete the user's data" is the safe failure mode.Tests
HealthCheckError{Type: "ThisErrorTypeIsNotRegistered"}, callsIsRecoverable(), asserts panic with "unregistered error type" in the message. This is the regression guard.errorCategoriesentry has a non-CategoryUnknownvalue.CategoryUnknownis only meant to be returned for unregistered lookups; registering an error type asCategoryUnknownwould defeat the purpose.Backward compatibility
Zero behavior change for any existing ErrorType — every registered type categorizes the same way it did before. The only behavior change is for unregistered types, which previously fell through silently and now either panic (tests) or log + retry (prod).
Test plan
go build ./...— cleangolangci-lint run ./...— 0 issuesgo test ./internal/integration/... -run TestHealthCheckError— 6 subtests + 2 new top-level tests, all passgo test ./...— full suite passesContext
Phase 1 complete (last item was #189 — webhook secret separation). Phase 2 (Foundational types) started. Phase 2 has 4 PRs total:
Summary by CodeRabbit
Tests
Improvements