fix(webhook): per-instance webhook secret separate from master API key (Phase 1.1c — Phase 1 complete)#189
Conversation
Closes the P0 finding S2 from the audit — the LAST remaining Phase 1
item. The webhook handler accepted the master API key (used by the
admin UI and CLI integrations) as the credential for incoming
Sonarr/Radarr webhook calls. Anyone who could read the webhook config
in a compromised *arr instance gained full Healarr admin access; the
two credentials shared one secret with no separation.
Changes:
1. Migration 006_webhook_secret.sql — adds nullable webhook_secret
column to arr_instances. Additive, no breakage for existing rows.
2. createArrInstance — generates a per-instance webhook secret via
auth.GenerateAPIKey (32-byte cryptographic token), stores
encrypted, and returns the plaintext in the response so the user
can paste it into the Sonarr/Radarr webhook config. The instance
ID is also returned so the UI can link to the new instance.
3. getArrInstances — surfaces webhook_secret (decrypted for display)
alongside api_key. Legacy rows with NULL webhook_secret report
webhook_secret=null so the UI can prompt the user to generate one.
4. regenerateWebhookSecret — new endpoint
POST /api/config/arr/:id/webhook-secret that rotates the secret.
Existing webhooks using the old value will start failing until
the user updates their Sonarr/Radarr config; that's intended for
incident response (e.g., if a secret leaked).
5. handleWebhook — authentication path changed:
- If the instance has a webhook_secret, the supplied apikey MUST
match THAT secret. The master key is no longer accepted for
instances that have moved to per-instance secrets — that's the
whole point of the separation.
- If webhook_secret is NULL (legacy instances pre-migration),
fall back to the master API key. A Warnf log line nudges the
operator to generate a per-instance secret to close the gap.
The instance lookup now happens BEFORE credential validation
(previously it was after) because the per-instance secret is the
authoritative credential and we need to know which instance is
being asked about before we can validate.
Tests:
- setupTestDB and the webhook test DB schema include webhook_secret.
- Three existing failure-mode webhook tests
(TestWebhook_InvalidAPIKey, _DBError, _DecryptError) updated to
create an arr instance first; previously they bypassed the lookup
but my reorder made instance lookup the first step. The legacy
master-key fallback path is what they're actually testing now.
- TestWebhook_PerInstanceSecret_Accepted (new): inserts an instance
with a known webhook_secret, verifies that secret authenticates
successfully, AND that the master key is rejected — the regression
guarantee for this PR.
Backward compatibility:
- Existing webhooks (pre-migration) continue to authenticate via the
master API key. The Warnf log indicates which instances need
per-instance secret generation.
- Existing tools (CLI, manual curl) authenticating via the master
API key for the admin UI / protected endpoints are unaffected —
this PR only changes webhook handler auth, not the general
authMiddleware.
- New instances created after this migration get a webhook_secret
automatically and require it for webhook auth — the master key
stops working for them immediately.
Frontend follow-up (deferred):
- UI display of webhook_secret per instance with copy-to-clipboard
and "regenerate" button
- Setup wizard prompting users with legacy instances to generate
per-instance secrets
Addresses Phase 1.1c — the last Phase 1 item. With this merged,
Phase 1 (Critical correctness — security & silent failures) is
COMPLETE.
📝 WalkthroughWalkthroughThe PR adds per-instance webhook secret support to Healarr. A database migration introduces a nullable ChangesPer-Instance Webhook Secrets
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/api/handlers_webhook.go (1)
64-67:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDifferentiate
sql.ErrNoRowsfrom real DB failures.This branch currently turns every query failure into 404. Database faults should return 500 to avoid masking operational issues.
Suggested fix
if err != nil { - logger.Errorf("Webhook rejected: Instance %d not found", instanceID) - c.JSON(http.StatusNotFound, gin.H{"error": "Instance not found"}) + if errors.Is(err, sql.ErrNoRows) { + logger.Errorf("Webhook rejected: Instance %d not found", instanceID) + c.JSON(http.StatusNotFound, gin.H{"error": "Instance not found"}) + } else { + logger.Errorf("Webhook auth lookup failed for instance %d: %v", instanceID, err) + c.JSON(http.StatusInternalServerError, gin.H{"error": "Authentication error"}) + } return }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/handlers_webhook.go` around lines 64 - 67, The current error branch in the webhook handler treats any DB error as "not found"; change it to distinguish sql.ErrNoRows from real DB failures by using errors.Is(err, sql.ErrNoRows) (or equivalent) and only return 404 (c.JSON(http.StatusNotFound,...)) when the error is ErrNoRows for the instance lookup (reference: instanceID, logger.Errorf, the err variable in the webhook handler in internal/api/handlers_webhook.go); for any other error log it as a DB failure and return 500 (http.StatusInternalServerError). Ensure you import/qualify database/sql and/or errors as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/api/handlers_arr.go`:
- Line 222: The code currently ignores errors from result.LastInsertId() (and
similarly result.RowsAffected()) which can mask DB driver failures; update the
handler(s) that call LastInsertId and RowsAffected (search for the statements
using result.LastInsertId() and result.RowsAffected()) to check and handle the
returned error explicitly: if LastInsertId/RowsAffected returns an error, log it
and return an appropriate HTTP error response (or propagate the error) instead
of proceeding with a zero value; ensure you use the actual id/rows only when err
== nil and adjust subsequent success/404 logic accordingly.
In `@internal/api/handlers_webhook_test.go`:
- Line 202: The webhook test builds URLs using string(rune('0'+arrID)) which
only works for single digit IDs; replace that with a numeric string conversion
(e.g., strconv.Itoa(arrID) or fmt.Sprintf("%d", arrID)) wherever used
(specifically in the POST request constructions that reference arrID) so IDs >=
10 produce correct path segments—apply the same replacement at all affected
occurrences mentioned in the comment.
---
Outside diff comments:
In `@internal/api/handlers_webhook.go`:
- Around line 64-67: The current error branch in the webhook handler treats any
DB error as "not found"; change it to distinguish sql.ErrNoRows from real DB
failures by using errors.Is(err, sql.ErrNoRows) (or equivalent) and only return
404 (c.JSON(http.StatusNotFound,...)) when the error is ErrNoRows for the
instance lookup (reference: instanceID, logger.Errorf, the err variable in the
webhook handler in internal/api/handlers_webhook.go); for any other error log it
as a DB failure and return 500 (http.StatusInternalServerError). Ensure you
import/qualify database/sql and/or errors as needed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 206daab2-5011-4b50-b34f-918fd14cf10e
📒 Files selected for processing (6)
internal/api/handlers_arr.gointernal/api/handlers_test.gointernal/api/handlers_webhook.gointernal/api/handlers_webhook_test.gointernal/api/rest.gointernal/db/migrations/006_webhook_secret.sql
| return | ||
| } | ||
| c.Status(http.StatusCreated) | ||
| id, _ := result.LastInsertId() |
There was a problem hiding this comment.
Handle LastInsertId / RowsAffected errors explicitly.
Both DB result calls ignore returned errors. If the driver cannot provide these values, handlers may return incorrect success/404 behavior.
Suggested fix
- id, _ := result.LastInsertId()
+ id, err := result.LastInsertId()
+ if err != nil {
+ logger.Errorf("Failed to get inserted instance ID: %v", err)
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create instance"})
+ return
+ }
- n, _ := result.RowsAffected()
+ n, err := result.RowsAffected()
+ if err != nil {
+ logger.Errorf("Failed to read update result for instance %s: %v", id, err)
+ c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to rotate webhook secret"})
+ return
+ }Also applies to: 260-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/api/handlers_arr.go` at line 222, The code currently ignores errors
from result.LastInsertId() (and similarly result.RowsAffected()) which can mask
DB driver failures; update the handler(s) that call LastInsertId and
RowsAffected (search for the statements using result.LastInsertId() and
result.RowsAffected()) to check and handle the returned error explicitly: if
LastInsertId/RowsAffected returns an error, log it and return an appropriate
HTTP error response (or propagate the error) instead of proceeding with a zero
value; ensure you use the actual id/rows only when err == nil and adjust
subsequent success/404 logic accordingly.
|
|
||
| body := bytes.NewBufferString(`{"eventType": "Download"}`) | ||
| req, _ := http.NewRequest("POST", "/api/webhook/1", body) | ||
| req, _ := http.NewRequest("POST", "/api/webhook/"+string(rune('0'+arrID)), body) |
There was a problem hiding this comment.
Use numeric string conversion for arrID in webhook URLs.
Current URL construction converts arrID to a single rune, so IDs >= 10 generate invalid path segments and can fail for the wrong reason.
Suggested fix
-req, _ := http.NewRequest("POST", "/api/webhook/"+string(rune('0'+arrID)), body)
+req, _ := http.NewRequest("POST", "/api/webhook/"+strconv.FormatInt(arrID, 10), body)Apply the same replacement to all affected webhook test requests.
Also applies to: 512-512, 556-556, 597-597, 607-607
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/api/handlers_webhook_test.go` at line 202, The webhook test builds
URLs using string(rune('0'+arrID)) which only works for single digit IDs;
replace that with a numeric string conversion (e.g., strconv.Itoa(arrID) or
fmt.Sprintf("%d", arrID)) wherever used (specifically in the POST request
constructions that reference arrID) so IDs >= 10 produce correct path
segments—apply the same replacement at all affected occurrences mentioned in the
comment.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…on missing (#190) Closes audit finding T1. IsRecoverable and IsTrueCorruption used switch statements with implicit "default: false" — when a new ErrorType constant was added without updating both switches, the new type silently classified as neither recoverable nor true-corruption, making it invisible to the remediation pipeline. New errors would just be ignored, with no signal to the operator or developer. Replace with an authoritative errorCategories map keyed by ErrorType string: var errorCategories = map[string]ErrorCategory{ ErrorTypeZeroByte: CategoryTrueCorruption, // ... every constant must appear here ErrorTypeMountLost: CategoryRecoverable, } IsRecoverable and IsTrueCorruption now delegate to a category() helper that consults the map. On lookup miss: - testing.Testing() == true → panic with a message naming the unregistered type and pointing at the source file to update. This is the property that closes T1: a new ErrorType added without map entry will fail every test that exercises remediation routing, caught at CI rather than in production. - production binary → log Errorf and fall back to CategoryRecoverable (the conservative choice — retry rather than delete user data). Unknown errors no longer silently bypass remediation; they're treated as transient infrastructure issues with operator-visible logging. Tests: - Existing TestHealthCheckError_IsRecoverable / _IsTrueCorruption test tables continue to pass unchanged (registered types still classify the same way). - New TestHealthCheckError_UnregisteredTypePanics asserts the panic message contains "unregistered error type" so future refactors can't accidentally weaken the regression guard. - New TestHealthCheckError_CategoryConsistency asserts every entry in errorCategories has a non-Unknown category (a CategoryUnknown registration would defeat the purpose). This is the first PR in Phase 2 (Foundational types) of the remediation plan. Phase 1 (Critical correctness — security & silent failures) is complete as of PR #189. Co-authored-by: mescon <mescon@users.noreply.github.com>
Closes P0 finding S2 — the last Phase 1 item. With this merged, Phase 1 of the remediation plan is complete.
The webhook handler accepted the master API key as the credential for incoming Sonarr/Radarr webhook calls — one secret, two roles, no separation. Per the audit: anyone who could read the webhook config in a compromised *arr instance gained full Healarr admin access.
Changes:
Tests: 3 existing webhook tests updated to create instances first (the reorder made instance lookup the first step). New TestWebhook_PerInstanceSecret_Accepted verifies the per-instance secret authenticates AND that the master key is rejected once the instance has one.
Backward-compat: pre-migration instances continue to work via the master-key fallback; new instances get a webhook_secret automatically and require it for webhook auth. Admin UI / CLI auth via master key is unaffected (this PR only changes webhook handler auth).
Phase 1 of the remediation plan is COMPLETE after this PR.
Summary by CodeRabbit