
fix(destregistry): handle publisher creation errors as delivery failures #828

Merged
alexluong merged 1 commit into main from fix/handle-publisher-creation-errors
Apr 13, 2026

@alexluong
Collaborator

Summary

  • When CreatePublisher failed (e.g. invalid GCP service account credentials missing the type field), the error was wrapped as a PreDeliveryError, which caused a nack → retry loop until messages exhausted the Pub/Sub max delivery attempts and landed in the DLQ, completely invisible to the customer
  • Root cause: a customer configured a GCP Pub/Sub destination with an invalid service_account_json (valid JSON, but missing required GCP fields). The delivery worker couldn't create a Pub/Sub client, nacked every attempt, and the events silently went to the DLQ after 6 retries
  • Events were successfully delivered to the customer's other 3 destinations; only the misconfigured GCP Pub/Sub destination was affected

Changes

destgcppubsub.go — Destination-level fixes:

  • CreatePublisher now returns ErrDestinationPublishAttempt (instead of plain errors) for both resolveMetadata failures (validation_failed) and pubsub.NewClient failures (client_creation_failed), signaling to the registry that these are delivery errors
  • resolveMetadata now validates that service_account_json contains the required type field, catching invalid credentials at destination creation time

registry.go — Registry-level safety net:

  • PublishEvent now checks whether ResolvePublisher returned ErrDestinationPublishAttempt or ErrDestinationValidation and, if so, creates a failed attempt record instead of returning a nil attempt. This ensures the error flows through the AttemptError path (logged to ClickHouse, visible in the dashboard, retry-scheduled, then acked) rather than the PreDeliveryError path (nack → DLQ)

Error flow before

CreatePublisher fails → plain error
  → ResolvePublisher returns (nil, err)
  → PublishEvent returns (nil, err)
  → doHandle sees nil attempt → PreDeliveryError
  → shouldNackError → nack → Pub/Sub retries 6x → DLQ

Error flow after

CreatePublisher fails → ErrDestinationPublishAttempt
  → ResolvePublisher returns (nil, err)
  → PublishEvent detects ErrDestinationPublishAttempt → creates failed Attempt
  → doHandle sees non-nil attempt → AttemptError
  → logged to ClickHouse → retry scheduling → ack
  → customer sees failed attempt in dashboard

Test plan

  • New validation test: missing type field in service_account_json (rejected at Validate() time)
  • New validation test: missing type field - but passes with emulator endpoint (the emulator skips credential validation)
  • New registry test: CreatePublisher returning ErrDestinationPublishAttempt → failed attempt returned
  • New registry test: CreatePublisher returning ErrDestinationValidation → failed attempt returned
  • New registry test: CreatePublisher returning unknown error → nil attempt (existing behavior preserved)
  • All existing destregistry and deliverymq tests pass

🤖 Generated with Claude Code

When CreatePublisher fails (e.g. invalid GCP credentials), the error
was treated as a PreDeliveryError which caused a nack/retry loop until
messages hit the Pub/Sub DLQ — invisible to the customer.

Now:
- GCP Pub/Sub CreatePublisher returns ErrDestinationPublishAttempt for
  validation and client creation errors
- Registry creates a failed attempt when ResolvePublisher returns
  ErrDestinationPublishAttempt or ErrDestinationValidation
- GCP credential JSON is validated for required 'type' field at
  destination creation time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project          Deployment  Actions           Updated (UTC)
outpost-docs     Ready       Preview, Comment  Apr 13, 2026 10:38am
outpost-website  Ready       Preview, Comment  Apr 13, 2026 10:38am

@alexluong alexluong merged commit 003650f into main Apr 13, 2026
5 checks passed
@alexluong alexluong deleted the fix/handle-publisher-creation-errors branch April 13, 2026 15:02