Question: DeliveryMQ edge case around post-publish errors #151

## Problem

Below is simplified pseudocode of the deliverymq flow, highlighting a few interesting cases:

```
step 1: pre-publish operations (query destination, query event, etc.)
step 2: publish event
step 3: post-publish operations (schedule retry, send event to logmq, idempotency cleanup etc.)
```

We have clear error handling for the pre-publish operations and the publish step. Error handling for the post-publish operations is trickier.

Currently, our error handling doesn't distinguish errors that happen before publish from errors that happen after it. Is this something we should consider?

For example:

```
for event A

1: deliverymq
  a: publish succeeds
  b: logmq fails
  c: nack

2: deliverymq
  a: publish succeeds
  b: logmq fails
  c: nack

...
```

In effect, logmq becomes critical infrastructure: if it fails, we spam every destination with as many redeliveries as the MQ allows until the message ends up in the DLQ.

It's not clear to me whether this is an inherent problem of distributed systems or whether there's a way to limit the impact.

---

Another scenario is when publish fails and logmq also fails.

- Should we simply nack and let the MQ retry?
- Should we schedule a retry via the Redis-based system? That could fail too. If we go this route, should we then nack or ack?
