[Feature request] Dead letter queue for NATS #81

Closed
3 tasks done
embano1 opened this issue Dec 2, 2019 · 2 comments

embano1 commented Dec 2, 2019

My actions before raising this issue

With async invocation, there seems to be no way to tell whether the invocation eventually succeeded. Failures could be caused by API issues, functions being deleted or not accepting connections (SIGTERM), event payload issues causing exceptions, or simple application logic bugs within the function.

For async invocation this is usually handled with a dead letter queue (DLQ). I could not find any mention of DLQ support in OpenFaaS/NATS (STAN). How is this dealt with today? Is it a concern at all? Does STAN automatically redrive failed invocations? If so, how many times before it gives up?

Expected Behaviour

Failures during async function invocation should be trackable, ideally through a DLQ where events can be inspected and potentially redriven.

Current Behaviour

Tested async invocation via faas-cli and via a connector built with connector-sdk, where the subscribed function does not exist (anymore). No error was reported, leaving the caller to believe that the invocation would eventually succeed. A 202 technically gives no guarantee, so introspection capabilities would be generally useful in a 202 setup.

A workaround seems to be to provide callbacks where the error status can be inspected. I am not sure whether this is always possible (CLI) or desired.

For details, see openfaas/faas#1298

Possible Solution

Implement a DLQ capability. Are there already metrics exposed for failed async function invocations?

Steps to Reproduce (for bugs)

Simply invoke a non-existent function asynchronously, for example with faas-cli invoke -a (or curl).

Context

I sense potential consistency issues (no error is reported even though the function was not executed at all), leading to hard-to-debug problems. Also, malformed payloads and application logic bugs could be hidden by the current implementation (if my understanding of the issue is correct and complete).

alexellis (Member) commented

/set title: [Feature request] Dead letter queue for NATS

derek bot changed the title from "Support for Dead Letter Queue?" to "[Feature request] Dead letter queue for NATS" on Dec 10, 2019

alexellis transferred this issue from openfaas/faas on Dec 10, 2019

alexellis (Member) commented

NATS does not provide a DLQ. I spent some time looking into building a DLQ while working on colorisebot, but it's complicated. If the upstream API is failing due to rate-limiting, then retrying N times without an appropriate back-off is counter-productive.

https://github.com/alexellis/mailbox

https://github.com/alexellis/rate-limited-mailbox
