[Feature request] Dead letter queue for NATS #81

embano1 · 2019-12-02T22:06:10Z

My actions before raising this issue

Followed the troubleshooting guide
Read/searched the docs
Searched past issues

With async invocation, there seems to be no way to tell whether the invocation eventually succeeded. Failure could be caused by API issues, functions being deleted or not accepting connections (SIGTERM), event payload issues causing exceptions, or simple application logic bugs within the function.

For async invocation, this is usually handled with a dead letter queue (DLQ). I could not find any mention of DLQ support in OpenFaaS/NATS (STAN). How is this dealt with today? Is it a concern at all? Does STAN automatically redrive failed invocations? If so, how many times before it gives up?

Expected Behaviour

Failures during async function invocation should be trackable, ideally via a DLQ where events can be inspected and potentially redriven.

Current Behaviour

I tested async invocation via faas-cli and via a connector built with connector-sdk, in both cases targeting a function that does not exist (anymore). No error was reported, leaving the caller to believe that the invocation would eventually succeed. A 202 technically gives no such guarantee, which is why introspection capabilities would be generally useful in any 202-based setup.

A workaround seems to be to provide callbacks where the error status can be inspected. I'm not sure whether this is always possible (e.g. from the CLI) or desired.

For details, see openfaas/faas#1298
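To make the callback workaround concrete, here is a minimal sketch of what a callback receiver could do with the result it is handed. It assumes the callback request carries the headers X-Call-Id, X-Function-Name and X-Function-Status; those names, and the CallbackTracker class itself, are assumptions for illustration and should be checked against the queue-worker in use. The "dead letters" list is only an in-memory stand-in, not a real DLQ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallbackResult:
    call_id: str
    function: str
    status: int


@dataclass
class CallbackTracker:
    """Collects async results; failures go to an in-memory stand-in for a DLQ."""

    succeeded: List[CallbackResult] = field(default_factory=list)
    dead_letters: List[CallbackResult] = field(default_factory=list)

    def record(self, headers: Dict[str, str]) -> CallbackResult:
        # Header names below are assumptions about what the callback carries.
        lowered = {k.lower(): v for k, v in headers.items()}
        result = CallbackResult(
            call_id=lowered.get("x-call-id", ""),
            function=lowered.get("x-function-name", ""),
            # A missing status is treated as 0, i.e. as a failure.
            status=int(lowered.get("x-function-status", "0")),
        )
        if 200 <= result.status < 300:
            self.succeeded.append(result)
        else:
            self.dead_letters.append(result)
        return result
```

A receiver would call record() with the headers of each incoming callback. Whether a callback is fired at all for a function that no longer exists is exactly the open question in this issue.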

Possible Solution

Implement a DLQ capability. Are there already metrics exposed for failed async function invocations?

Steps to Reproduce (for bugs)

Simply run faas-cli invoke -a (or curl) against a non-existent function.
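For anyone reproducing this without the CLI, the request can be sketched roughly as follows. The gateway address http://127.0.0.1:8080, the function name does-not-exist, and the helper build_async_request are assumptions for illustration. The sketch only builds the request, so it can be inspected before being sent.

```python
import urllib.request
from typing import Optional


def build_async_request(
    gateway: str,
    function: str,
    payload: bytes,
    callback_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) an async invocation request for the gateway."""
    url = f"{gateway.rstrip('/')}/async-function/{function}"
    req = urllib.request.Request(url, data=payload, method="POST")
    if callback_url:
        # Optional callback so the result can be inspected later.
        req.add_header("X-Callback-Url", callback_url)
    return req
```

Sending it with urllib.request.urlopen(build_async_request("http://127.0.0.1:8080", "does-not-exist", b"test")) against a running gateway should show the behaviour described above: an accepted response and no error surfaced to the caller.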

Context

I sense potential consistency issues (no error is reported even though the function was not executed at all), leading to hard-to-debug problems. Also, malformed payloads and application logic bugs could be hidden by the current implementation (if my understanding of the issue is correct and complete).

alexellis · 2019-12-10T12:34:21Z

/set title: [Feature request] Dead letter queue for NATS

alexellis · 2019-12-10T12:36:20Z

NATS does not provide a DLQ. I spent some time looking into building a DLQ when building colorisebot, but it's complicated. If the upstream API is failing due to rate-limiting, then retrying N times without an appropriate back-off is counter-productive.

https://github.com/alexellis/mailbox

https://github.com/alexellis/rate-limited-mailbox
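As a rough illustration of the back-off point, the sketch below retries with exponentially growing delays and only parks the payload in a dead-letter list once the retries are exhausted. It is not taken from either repository above. The names backoff_delays and invoke_with_dlq, the default delays, and the in-memory list standing in for a DLQ are all assumptions.

```python
import time
from typing import Any, Callable, List


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delay before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def invoke_with_dlq(
    invoke: Callable[[Any], None],
    payload: Any,
    dead_letters: List[Any],
    max_retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try once plus up to `max_retries` retries; dead-letter the payload on failure."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            invoke(payload)
            return True
        except Exception:
            if attempt == max_retries:
                break
            # Wait longer after each failure so a rate-limited upstream can recover.
            sleep(delays[attempt])
    dead_letters.append(payload)
    return False
```

A real implementation would likely also honour hints from the upstream such as a Retry-After header, and add jitter, which this sketch leaves out.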

derek bot changed the title ~~Support for Dead Letter Queue?~~ [Feature request] Dead letter queue for NATS Dec 10, 2019

alexellis transferred this issue from openfaas/faas Dec 10, 2019

alexellis mentioned this issue Dec 10, 2019

Support a manual acknowledgement mode. #80

Closed

embano1 mentioned this issue Dec 15, 2019

Clarify queue-worker durable queue implementation and delivery semantics #84

Closed

alexellis mentioned this issue May 15, 2020

[Research] Retries for certain HTTP codes #100

Open

embano1 closed this as completed Apr 25, 2022
