Skip to content

fix(webhook): per-instance webhook secret separate from master API key (Phase 1.1c — Phase 1 complete)#189

Merged
mescon merged 1 commit into
mainfrom
fix/webhook-per-instance-secret
May 21, 2026
Merged

fix(webhook): per-instance webhook secret separate from master API key (Phase 1.1c — Phase 1 complete)#189
mescon merged 1 commit into
mainfrom
fix/webhook-per-instance-secret

Conversation

@mescon
Copy link
Copy Markdown
Owner

@mescon mescon commented May 21, 2026

Closes P0 finding S2 — the last Phase 1 item. With this merged, Phase 1 of the remediation plan is complete.

The webhook handler accepted the master API key as the credential for incoming Sonarr/Radarr webhook calls — one secret, two roles, no separation. Per the audit: anyone who could read the webhook config in a compromised *arr instance gained full Healarr admin access.

Changes:

  • Migration 006 adds a nullable webhook_secret column to arr_instances.
  • createArrInstance generates a per-instance secret (32-byte cryptographic token, same primitive as auth.GenerateAPIKey), stores encrypted, returns plaintext in response so the user can paste it into Sonarr/Radarr.
  • getArrInstances surfaces webhook_secret (decrypted for display); NULL on legacy rows so the UI can prompt to generate one.
  • New endpoint POST /api/config/arr/:id/webhook-secret rotates the secret for incident response.
  • handleWebhook auth: if the instance has a webhook_secret, ONLY that secret authenticates (master key rejected); if NULL (legacy), master key works with a Warnf nudge to generate a per-instance secret.

Tests: 3 existing webhook tests updated to create instances first (the reorder made instance lookup the first step). New TestWebhook_PerInstanceSecret_Accepted verifies the per-instance secret authenticates AND that the master key is rejected once the instance has one.

Backward-compat: pre-migration instances continue to work via the master-key fallback; new instances get a webhook_secret automatically and require it for webhook auth. Admin UI / CLI auth via master key is unaffected (this PR only changes webhook handler auth).

Phase 1 of the remediation plan is COMPLETE after this PR.

Summary by CodeRabbit

  • New Features
    • Introduced per-instance webhook secrets for enhanced security and isolation between instances
    • Added the ability to generate and regenerate webhook secrets for individual arr instances
    • New API endpoint to manage webhook secret regeneration
    • Maintained backward compatibility for instances without configured secrets

Review Change Stack

Closes the P0 finding S2 from the audit — the LAST remaining Phase 1
item. The webhook handler accepted the master API key (used by the
admin UI and CLI integrations) as the credential for incoming
Sonarr/Radarr webhook calls. Anyone who could read the webhook config
in a compromised *arr instance gained full Healarr admin access; the
two credentials shared one secret with no separation.

Changes:

1. Migration 006_webhook_secret.sql — adds nullable webhook_secret
   column to arr_instances. Additive, no breakage for existing rows.

2. createArrInstance — generates a per-instance webhook secret via
   auth.GenerateAPIKey (32-byte cryptographic token), stores
   encrypted, and returns the plaintext in the response so the user
   can paste it into the Sonarr/Radarr webhook config. The instance
   ID is also returned so the UI can link to the new instance.

3. getArrInstances — surfaces webhook_secret (decrypted for display)
   alongside api_key. Legacy rows with NULL webhook_secret report
   webhook_secret=null so the UI can prompt the user to generate one.

4. regenerateWebhookSecret — new endpoint
   POST /api/config/arr/:id/webhook-secret that rotates the secret.
   Existing webhooks using the old value will start failing until
   the user updates their Sonarr/Radarr config; that's intended for
   incident response (e.g., if a secret leaked).

5. handleWebhook — authentication path changed:
   - If the instance has a webhook_secret, the supplied apikey MUST
     match THAT secret. The master key is no longer accepted for
     instances that have moved to per-instance secrets — that's the
     whole point of the separation.
   - If webhook_secret is NULL (legacy instances pre-migration),
     fall back to the master API key. A Warnf log line nudges the
     operator to generate a per-instance secret to close the gap.

   The instance lookup now happens BEFORE credential validation
   (previously it was after) because the per-instance secret is the
   authoritative credential and we need to know which instance is
   being asked about before we can validate.

Tests:
- setupTestDB and the webhook test DB schema include webhook_secret.
- Three existing failure-mode webhook tests
  (TestWebhook_InvalidAPIKey, _DBError, _DecryptError) updated to
  create an arr instance first; previously they bypassed the lookup
  but my reorder made instance lookup the first step. The legacy
  master-key fallback path is what they're actually testing now.
- TestWebhook_PerInstanceSecret_Accepted (new): inserts an instance
  with a known webhook_secret, verifies that secret authenticates
  successfully, AND that the master key is rejected — the regression
  guarantee for this PR.

Backward compatibility:
- Existing webhooks (pre-migration) continue to authenticate via the
  master API key. The Warnf log indicates which instances need
  per-instance secret generation.
- Existing tools (CLI, manual curl) authenticating via the master
  API key for the admin UI / protected endpoints are unaffected —
  this PR only changes webhook handler auth, not the general
  authMiddleware.
- New instances created after this migration get a webhook_secret
  automatically and require it for webhook auth — the master key
  stops working for them immediately.

Frontend follow-up (deferred):
- UI display of webhook_secret per instance with copy-to-clipboard
  and "regenerate" button
- Setup wizard prompting users with legacy instances to generate
  per-instance secrets

Addresses Phase 1.1c — the last Phase 1 item. With this merged,
Phase 1 (Critical correctness — security & silent failures) is
COMPLETE.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Walkthrough

The PR adds per-instance webhook secret support to Healarr. A database migration introduces a nullable webhook_secret column on arr_instances. Instance handlers generate and encrypt secrets at creation; webhook authentication checks per-instance secrets first, then falls back to the master API key for legacy instances. A new regeneration endpoint allows updating secrets. Tests cover both authentication paths.

Changes

Per-Instance Webhook Secrets

Layer / File(s) Summary
Database schema and test setup
internal/db/migrations/006_webhook_secret.sql, internal/api/handlers_test.go, internal/api/handlers_webhook_test.go
Migration adds webhook_secret TEXT column to arr_instances; API and webhook test schemas add the new column to their test table definitions.
Instance management with webhook secrets
internal/api/handlers_arr.go
Imports database/sql and auth packages; updates getArrInstances to query and decrypt webhook_secret in instance listings; extends createArrInstance to auto-generate and encrypt per-instance secrets; adds new regenerateWebhookSecret handler to update and return plaintext secrets.
Webhook authentication with per-instance secret and legacy fallback
internal/api/handlers_webhook.go
Parses instance_id up front, loads instance enabled state and nullable webhook_secret; authenticates using per-instance secret when present (decrypt and constant-time compare); falls back to master settings.api_key for legacy instances (with warning log); removes unconditional master key check from main flow.
API endpoint for webhook secret regeneration
internal/api/rest.go
Registers new authenticated POST /config/arr/:id/webhook-secret route mapped to regenerateWebhookSecret handler.
Webhook tests for legacy and per-instance authentication
internal/api/handlers_webhook_test.go
Updates TestWebhook_InvalidAPIKey, TestWebhook_DBError, and TestWebhook_DecryptError to create and use dynamic legacy instance ids instead of hardcoded paths; adds new TestWebhook_PerInstanceSecret_Accepted test verifying per-instance secret acceptance and master key rejection for configured instances.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Secrets per instance, no more the single key,
Rabbits guard their gardens with authenticity,
Legacy paths fade, yet still the old way works,
In backward-compat magic, the gentle fallback lurks,
Fresh secrets bloom anew at regeneration's call! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing per-instance webhook secrets separate from the master API key, which is the core focus of all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/webhook-per-instance-secret

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/api/handlers_webhook.go (1)

64-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Differentiate sql.ErrNoRows from real DB failures.

This branch currently turns every query failure into 404. Database faults should return 500 to avoid masking operational issues.

Suggested fix
 if err != nil {
-	logger.Errorf("Webhook rejected: Instance %d not found", instanceID)
-	c.JSON(http.StatusNotFound, gin.H{"error": "Instance not found"})
+	if errors.Is(err, sql.ErrNoRows) {
+		logger.Errorf("Webhook rejected: Instance %d not found", instanceID)
+		c.JSON(http.StatusNotFound, gin.H{"error": "Instance not found"})
+	} else {
+		logger.Errorf("Webhook auth lookup failed for instance %d: %v", instanceID, err)
+		c.JSON(http.StatusInternalServerError, gin.H{"error": "Authentication error"})
+	}
 	return
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/api/handlers_webhook.go` around lines 64 - 67, The current error
branch in the webhook handler treats any DB error as "not found"; change it to
distinguish sql.ErrNoRows from real DB failures by using errors.Is(err,
sql.ErrNoRows) (or equivalent) and only return 404
(c.JSON(http.StatusNotFound,...)) when the error is ErrNoRows for the instance
lookup (reference: instanceID, logger.Errorf, the err variable in the webhook
handler in internal/api/handlers_webhook.go); for any other error log it as a DB
failure and return 500 (http.StatusInternalServerError). Ensure you
import/qualify database/sql and/or errors as needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/api/handlers_arr.go`:
- Line 222: The code currently ignores errors from result.LastInsertId() (and
similarly result.RowsAffected()) which can mask DB driver failures; update the
handler(s) that call LastInsertId and RowsAffected (search for the statements
using result.LastInsertId() and result.RowsAffected()) to check and handle the
returned error explicitly: if LastInsertId/RowsAffected returns an error, log it
and return an appropriate HTTP error response (or propagate the error) instead
of proceeding with a zero value; ensure you use the actual id/rows only when err
== nil and adjust subsequent success/404 logic accordingly.

In `@internal/api/handlers_webhook_test.go`:
- Line 202: The webhook test builds URLs using string(rune('0'+arrID)) which
only works for single digit IDs; replace that with a numeric string conversion
(e.g., strconv.Itoa(arrID) or fmt.Sprintf("%d", arrID)) wherever used
(specifically in the POST request constructions that reference arrID) so IDs >=
10 produce correct path segments—apply the same replacement at all affected
occurrences mentioned in the comment.

---

Outside diff comments:
In `@internal/api/handlers_webhook.go`:
- Around line 64-67: The current error branch in the webhook handler treats any
DB error as "not found"; change it to distinguish sql.ErrNoRows from real DB
failures by using errors.Is(err, sql.ErrNoRows) (or equivalent) and only return
404 (c.JSON(http.StatusNotFound,...)) when the error is ErrNoRows for the
instance lookup (reference: instanceID, logger.Errorf, the err variable in the
webhook handler in internal/api/handlers_webhook.go); for any other error log it
as a DB failure and return 500 (http.StatusInternalServerError). Ensure you
import/qualify database/sql and/or errors as needed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 206daab2-5011-4b50-b34f-918fd14cf10e

📥 Commits

Reviewing files that changed from the base of the PR and between f053cc9 and 9983477.

📒 Files selected for processing (6)
  • internal/api/handlers_arr.go
  • internal/api/handlers_test.go
  • internal/api/handlers_webhook.go
  • internal/api/handlers_webhook_test.go
  • internal/api/rest.go
  • internal/db/migrations/006_webhook_secret.sql

return
}
c.Status(http.StatusCreated)
id, _ := result.LastInsertId()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle LastInsertId / RowsAffected errors explicitly.

Both DB result calls ignore returned errors. If the driver cannot provide these values, handlers may return incorrect success/404 behavior.

Suggested fix
- id, _ := result.LastInsertId()
+ id, err := result.LastInsertId()
+ if err != nil {
+ 	logger.Errorf("Failed to get inserted instance ID: %v", err)
+ 	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to create instance"})
+ 	return
+ }

- n, _ := result.RowsAffected()
+ n, err := result.RowsAffected()
+ if err != nil {
+ 	logger.Errorf("Failed to read update result for instance %s: %v", id, err)
+ 	c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to rotate webhook secret"})
+ 	return
+ }
As per coding guidelines, "Handle all errors explicitly in Go - do not ignore error returns".

Also applies to: 260-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/api/handlers_arr.go` at line 222, The code currently ignores errors
from result.LastInsertId() (and similarly result.RowsAffected()) which can mask
DB driver failures; update the handler(s) that call LastInsertId and
RowsAffected (search for the statements using result.LastInsertId() and
result.RowsAffected()) to check and handle the returned error explicitly: if
LastInsertId/RowsAffected returns an error, log it and return an appropriate
HTTP error response (or propagate the error) instead of proceeding with a zero
value; ensure you use the actual id/rows only when err == nil and adjust
subsequent success/404 logic accordingly.


body := bytes.NewBufferString(`{"eventType": "Download"}`)
req, _ := http.NewRequest("POST", "/api/webhook/1", body)
req, _ := http.NewRequest("POST", "/api/webhook/"+string(rune('0'+arrID)), body)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use numeric string conversion for arrID in webhook URLs.

Current URL construction converts arrID to a single rune, so IDs >= 10 generate invalid path segments and can fail for the wrong reason.

Suggested fix
-req, _ := http.NewRequest("POST", "/api/webhook/"+string(rune('0'+arrID)), body)
+req, _ := http.NewRequest("POST", "/api/webhook/"+strconv.FormatInt(arrID, 10), body)

Apply the same replacement to all affected webhook test requests.

Also applies to: 512-512, 556-556, 597-597, 607-607

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/api/handlers_webhook_test.go` at line 202, The webhook test builds
URLs using string(rune('0'+arrID)) which only works for single digit IDs;
replace that with a numeric string conversion (e.g., strconv.Itoa(arrID) or
fmt.Sprintf("%d", arrID)) wherever used (specifically in the POST request
constructions that reference arrID) so IDs >= 10 produce correct path
segments—apply the same replacement at all affected occurrences mentioned in the
comment.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 48.27586% with 45 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/api/handlers_arr.go 28.07% 37 Missing and 4 partials ⚠️
internal/api/handlers_webhook.go 86.20% 3 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@mescon mescon merged commit bca5ff7 into main May 21, 2026
7 of 9 checks passed
@mescon mescon deleted the fix/webhook-per-instance-secret branch May 21, 2026 17:40
mescon added a commit that referenced this pull request May 22, 2026
…on missing (#190)

Closes audit finding T1. IsRecoverable and IsTrueCorruption used switch
statements with implicit "default: false" — when a new ErrorType
constant was added without updating both switches, the new type
silently classified as neither recoverable nor true-corruption, making
it invisible to the remediation pipeline. New errors would just be
ignored, with no signal to the operator or developer.

Replace with an authoritative errorCategories map keyed by ErrorType
string:

	var errorCategories = map[string]ErrorCategory{
		ErrorTypeZeroByte:      CategoryTrueCorruption,
		// ... every constant must appear here
		ErrorTypeMountLost:     CategoryRecoverable,
	}

IsRecoverable and IsTrueCorruption now delegate to a category()
helper that consults the map. On lookup miss:

- testing.Testing() == true → panic with a message naming the
  unregistered type and pointing at the source file to update. This
  is the property that closes T1: a new ErrorType added without map
  entry will fail every test that exercises remediation routing,
  caught at CI rather than in production.

- production binary → log Errorf and fall back to CategoryRecoverable
  (the conservative choice — retry rather than delete user data).
  Unknown errors no longer silently bypass remediation; they're
  treated as transient infrastructure issues with operator-visible
  logging.

Tests:
- Existing TestHealthCheckError_IsRecoverable / _IsTrueCorruption
  test tables continue to pass unchanged (registered types still
  classify the same way).
- New TestHealthCheckError_UnregisteredTypePanics asserts the panic
  message contains "unregistered error type" so future refactors
  can't accidentally weaken the regression guard.
- New TestHealthCheckError_CategoryConsistency asserts every entry
  in errorCategories has a non-Unknown category (a CategoryUnknown
  registration would defeat the purpose).

This is the first PR in Phase 2 (Foundational types) of the
remediation plan. Phase 1 (Critical correctness — security & silent
failures) is complete as of PR #189.

Co-authored-by: mescon <mescon@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant