Consumer Exits Permanently on Errors without Auto-Recovery #555

@miekassu

Description

When the SQS consumer encounters an error, the consumer loop exits permanently and never restarts. The service appears healthy (container running, HTTP responding) but stops processing all messages from the queue. This causes silent failures where new messages accumulate but are never processed until someone manually restarts the service.

We experienced this when a message with malformed data caused an unmarshal error (json: cannot unmarshal string into Go struct field PublishedEvent.data of type map[string]interface{}). The handler attempted to Nack the message, and the Nack failed because the sqs:ChangeMessageVisibility IAM permission was missing. That failure made the consumer exit silently: the service still appeared healthy but processed no further messages until it was manually restarted.

To Reproduce

  1. Deploy Outpost with AWS SQS but without the sqs:ChangeMessageVisibility permission
  2. A message arrives that causes any handler error (malformed data, processing failure, etc.)
  3. Handler attempts to Nack the failed message
  4. GoCloud driver tries to call ChangeMessageVisibilityBatch
  5. AWS returns 403 permission denied
  6. Error bubbles up from consumer.Run() to startPublishMQConsumer (api/api.go:253)
  7. Consumer goroutine logs error and exits permanently
  8. Service continues running, appears healthy
  9. All subsequent messages accumulate unprocessed with no alerts or visible indication

Metadata

Labels: bug (Something isn't working)

Status: Done