Limit number of redelivery attempts #728

robertmircea · 2019-01-12T23:51:30Z

What would be the best strategy to limit number of redeliveries of the same message to subscribers which are failing to ack?

For example: if a message is not acked after several delivery attempts to the client, I want it to be redelivered up to x times and then dropped.

Derived question: is it possible to keep a counter on the server of how many times a message has been redelivered to a client?

kozlovic · 2019-01-14T17:11:01Z

> What would be the best strategy to limit number of redeliveries of the same message to subscribers which are failing to ack?

ACK it. I am not being sarcastic ;-). Really, if your application keeps getting a redelivered message and is unable to acknowledge it because, say, processing keeps failing, and the application decides that it no longer wants to deal with it, simply ACK it. The server will stop the redelivery.

Messages are not removed from the message log because they are ack'ed by consumers. Messages are always in the message log since they may have to be delivered to new consumers starting at any sequence in the log: https://github.com/nats-io/nats-streaming-server#message-log

The only advantage of having such a configuration in the server would be to limit the redelivery of a message to an application that has exited without calling Close() (so the server does not know it is gone). But with pings from server to client, the server will ultimately detect that the client is not responding and will suspend redelivery. The default heartbeat interval is 30 seconds. If that's too high, it can be lowered through configuration.

robertmircea · 2019-01-14T18:50:45Z

I understand what you are saying, but this means that the app needs to be aware of, and keep track of, messages for a period of time in order to detect repeated delivery attempts (even the failed ones) up to a maximum. My first thought was to keep the message id in Redis for a while for idempotency checks. This is not very elegant and only scales up to a point. In my opinion, a better mechanism would be to track the number of redeliveries on the server and give either the server or the client a way to decide whether more redeliveries are allowed. This would imply having an explicit Nack command, or allowing the client to instruct the server to deliver based on a specific policy. Probably what I need here is the properties of a queuing system in NATS server.

dansouza · 2019-01-31T04:35:20Z

@robertmircea if you're getting the same message redelivered over and over, chances are, a single subscriber will see it multiple times - so you don't need to create a counter on Redis for every single message you receive, only the ones that you're seeing twice (meaning they're getting redelivered in a loop).

Keep an in-memory record of the messages you've seen with a cuckoo filter, and if you see a message twice, start counting its redeliveries on Redis. This should scale a lot better since it won't hit Redis every time, only when you suspect that a message is getting redelivered.

Once you hit enough redelivery attempts, you ACK the message (to remove it from the server) and delete it from the cuckoo filter and from Redis.

It won't perfectly track the number of redeliveries if you have many subscribers (since each subscriber will have its own counter that needs to see a message twice), but it should be good enough and scale well.
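The two-tier idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Go map stands in for the cuckoo filter, `CounterStore` is a hypothetical interface standing in for Redis (backed here by an in-memory `memStore` so the sketch runs on its own), and the names `Tracker`, `ShouldGiveUp` and `Done` are made up for this example.

```go
package main

import "fmt"

// CounterStore is a hypothetical stand-in for a shared store such as Redis.
type CounterStore interface {
	Incr(key string) int // increments the counter and returns the new value
	Del(key string)
}

// memStore is an in-memory CounterStore so the sketch is self-contained.
type memStore map[string]int

func (s memStore) Incr(key string) int { s[key]++; return s[key] }
func (s memStore) Del(key string)      { delete(s, key) }

// Tracker counts redeliveries in the shared store only for messages
// that this subscriber has already seen at least once.
type Tracker struct {
	seen  map[uint64]struct{} // assumption: a map instead of a cuckoo filter
	store CounterStore
	max   int // redeliveries allowed before giving up
}

func NewTracker(store CounterStore, max int) *Tracker {
	return &Tracker{seen: make(map[uint64]struct{}), store: store, max: max}
}

// ShouldGiveUp records one delivery of seq and reports whether the
// caller should ack the message and stop processing it.
func (t *Tracker) ShouldGiveUp(seq uint64) bool {
	if _, ok := t.seen[seq]; !ok {
		t.seen[seq] = struct{}{} // first sighting: the store is not touched
		return false
	}
	key := fmt.Sprintf("redeliveries:%d", seq)
	if t.store.Incr(key) < t.max {
		return false
	}
	t.Done(seq) // limit reached: clean up both tiers
	return true
}

// Done forgets seq, for example after it was processed successfully.
func (t *Tracker) Done(seq uint64) {
	delete(t.seen, seq)
	t.store.Del(fmt.Sprintf("redeliveries:%d", seq))
}

func main() {
	tr := NewTracker(memStore{}, 3)
	for delivery := 1; delivery <= 4; delivery++ {
		fmt.Printf("delivery %d: give up = %v\n", delivery, tr.ShouldGiveUp(42))
	}
}
```

In a real subscriber the message callback would call `ShouldGiveUp` with the message sequence when processing fails, and ack the message once it returns true.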

vtolstov · 2019-02-04T17:06:59Z

If I have an invalid message, for example one whose payload was created by a buggy service, consumers can never successfully ack it.
What is the preferred way to handle this?

kozlovic · 2019-02-04T17:08:42Z

Again, from a consumer perspective, if you want to stop the server from redelivering this message, simply ack it. When the message callback is invoked and, say, your application cannot process the message, you can still call msg.Ack() on it. What you can't do is remove a "random" message from a message log.

vtolstov · 2019-02-04T17:12:10Z

My question is more from a devops perspective. For example, I see in Prometheus that some service has a large error counter in its subscribe handler. What can I do? I can't rewrite the service right now. So what workaround would temporarily fix this issue while the service is being rewritten and deployed?

robertmircea · 2019-02-04T18:22:56Z

> Again, from a consumer perspective, if you want to stop the server from redelivering this message, simply ack it. When the message callback is invoked and, say, your application cannot process the message, you can still call msg.Ack() on it. What you can't do is remove a "random" message from a message log.

I understand this, but this strategy means that on the first failed processing attempt I have to remove (ack) the message, no matter whether the error is transient or permanent. Usually, for a transient error I should retry processing a number of times before giving up. NATS server could help in this situation by keeping the number of retries instead of a boolean flag like redelivery. It is very easy to deduce whether a message is a redelivery by inspecting the retry count: if it is greater than zero, it was a redelivery. In my case, if I want to retry up to n times, it might be more efficient to have an explicit nack protocol command instead of waiting for the timeout. For example, if the handler throws an exception, it would be more effective to fail fast. Once the redelivery counter reaches n, I would explicitly ack the message from the client so that it is removed from the server.
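The policy described here (ack on success, ack on a permanent error, retry a transient error up to n times) could look something like the sketch below. It assumes the client somehow knows how many redeliveries have already happened; `Decide`, `Action`, `ErrTransient` and `errBadPayload` are hypothetical names for this illustration, and `Retry` simply means leaving the message unacked so the server redelivers it after the ack wait expires.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTransient marks failures that are worth retrying (hypothetical convention).
var ErrTransient = errors.New("transient failure")

// errBadPayload is an example of a permanent failure.
var errBadPayload = errors.New("bad payload")

// Action is what the handler does with the message after processing.
type Action int

const (
	Ack   Action = iota // acknowledge, so the server stops redelivering
	Retry               // do not ack, so the server redelivers later
)

// Decide maps a processing result and the number of redeliveries
// seen so far to an action, allowing at most maxRedeliveries retries.
func Decide(err error, redeliveries, maxRedeliveries uint32) Action {
	switch {
	case err == nil:
		return Ack // processed successfully
	case !errors.Is(err, ErrTransient):
		return Ack // permanent error: retrying cannot help
	case redeliveries >= maxRedeliveries:
		return Ack // retry budget exhausted: give up
	default:
		return Retry
	}
}

func main() {
	wrapped := fmt.Errorf("db timeout: %w", ErrTransient)
	fmt.Println(Decide(nil, 0, 3) == Ack)
	fmt.Println(Decide(wrapped, 1, 3) == Retry)
	fmt.Println(Decide(wrapped, 3, 3) == Ack)
	fmt.Println(Decide(errBadPayload, 0, 3) == Ack)
}
```

Without an explicit nack, the retry path is only as fast as the subscription's ack wait, which is the fail-fast concern raised above.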

vtolstov · 2019-02-04T18:55:13Z

Yes, is it possible to count delivery attempts per durable group?

vtolstov · 2019-02-04T18:56:15Z

I could then check the attempts and take some client-side action, for example serialize the event to a store and ack the message.

sujeshthekkepatt · 2019-11-08T12:23:24Z

Same need. Any updates on this?

kozlovic · 2019-11-08T16:02:28Z

@sujeshthekkepatt I commented on the other issue related to this: #789

bmcustodio · 2019-12-10T13:51:46Z

> NATS server could help in this situation by keeping the number of retries instead of a boolean flag like redelivery.

@kozlovic while I understand that limiting the number of redelivery attempts or implementing any kind of dead-letter queue is out of scope, would it be feasible for NATS Streaming to provide a counter of redeliveries as suggested here (possibly as a new field rather than replacing the existing Redelivery one, for compatibility with existing consumers)?

sujeshthekkepatt · 2019-12-10T14:06:12Z

@bmcstdio that would be great. I have done a workaround using the same concept, where we manually add some metadata, such as the number of retries, and use those fields to push messages to multi-stage poison queues and later to a dead letter queue. But as you said, having this built in would be perfect. I think they are working on a new, revised version of STAN.
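The workaround is not shown in the thread, but a rough sketch of the idea might look like this. Everything here is an assumption made for illustration: the JSON `Envelope` layout, the `Requeue` helper and the subject names are hypothetical. The retry count travels inside the message, each failure moves the message to the next staged retry subject, and once the stages are exhausted it goes to a dead letter subject.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope wraps the application payload with retry metadata
// (a hypothetical layout, not something defined by NATS Streaming).
type Envelope struct {
	Retries int             `json:"retries"`
	Payload json.RawMessage `json:"payload"`
}

// Requeue picks the subject a failed message should be republished to
// and returns the re-encoded envelope with its retry count bumped.
// stages[i] receives messages that have failed i times before;
// anything beyond the last stage goes to deadLetter.
func Requeue(raw []byte, stages []string, deadLetter string) (string, []byte, error) {
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", nil, err
	}
	subject := deadLetter
	if env.Retries < len(stages) {
		subject = stages[env.Retries]
	}
	env.Retries++
	out, err := json.Marshal(env)
	return subject, out, err
}

func main() {
	stages := []string{"orders.retry.1", "orders.retry.2"}
	msg := []byte(`{"retries":0,"payload":{"id":7}}`)
	for i := 0; i < 3; i++ {
		subject, next, err := Requeue(msg, stages, "orders.dead")
		if err != nil {
			panic(err)
		}
		fmt.Println(subject, string(next))
		msg = next
	}
}
```

After republishing to the returned subject, the handler would ack the original message so the server stops redelivering it.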

kozlovic · 2019-12-10T16:49:43Z

@bmcstdio Possibly. The caveat is that this number would likely be "valid" only during the runtime of a server. Meaning that I don't think that it would be feasible to persist the delivery count (but maybe?). Also, in clustering mode, when leader changes, it may not have the redelivery count (again, if that info is not stored/replicated). In this worst case scenario (valid only at runtime/leader election), would that still be valuable?

bmcustodio · 2019-12-10T18:26:18Z

@kozlovic I think it would still be valuable, yes, but of course the absolute best would be to persist that information. I am not familiar at all with the NATS/NATS Streaming code base, but I'd be willing to contribute to this somehow (either implementing, reviewing or testing).

kell18 · 2019-12-16T10:08:07Z

Redeliveries counter would be nice to have for us as well! Even if only during the runtime (not persistent). Thanks!

This is the first step to support server setting a RedeliveryCount in PubMsg when those are redelivered. See nats-io/nats-streaming-server#728 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic · 2019-12-17T01:57:39Z

@bmcstdio @sujeshthekkepatt @kell18 Ok, I have decided to add RedeliveryCount (only runtime at the moment, and won't survive a leader election if leader is different node). Starting with the client's repo that holds the PubMsg protobuf: nats-io/stan.go#295.
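A client could use such a count along these lines. The sketch below does not import the client library: `Msg` is a local stand-in that mirrors only the `Redelivered` flag discussed earlier and the `RedeliveryCount` field named here, and the `uint32` type and the helper names are assumptions. Because the count is runtime-only, it may start over after a server restart or leader change, so the limit is best read as a bound per server runtime rather than a hard guarantee.

```go
package main

import "fmt"

// Msg is a local stand-in for the delivered message; only the fields
// relevant to redelivery are mirrored (the uint32 type is an assumption).
type Msg struct {
	Sequence        uint64
	Redelivered     bool
	RedeliveryCount uint32
}

// redeliveries returns the best available estimate of how often the
// message has been redelivered. If the flag is set but no count was
// reported (for instance by a server that predates the counter),
// at least one redelivery must have happened.
func redeliveries(m Msg) uint32 {
	if m.Redelivered && m.RedeliveryCount == 0 {
		return 1
	}
	return m.RedeliveryCount
}

// shouldDrop reports whether the handler should ack the message
// without further processing because the limit was reached.
func shouldDrop(m Msg, max uint32) bool {
	return redeliveries(m) >= max
}

func main() {
	msgs := []Msg{
		{Sequence: 1},
		{Sequence: 1, Redelivered: true, RedeliveryCount: 2},
		{Sequence: 1, Redelivered: true, RedeliveryCount: 3},
	}
	for _, m := range msgs {
		fmt.Printf("count=%d drop=%v\n", redeliveries(m), shouldDrop(m, 3))
	}
}
```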

kozlovic · 2019-12-17T01:58:31Z

Server PR will be submitted shortly after we agree on the field name/type in the client repo. Thanks!

kozlovic · 2019-12-20T15:58:21Z

@bmcstdio @sujeshthekkepatt @kell18 Question for you guys.. the redelivery count is per subscription, but in the case of a queue group, would you expect the redelivery count to be for the group, or individual member?

bmcustodio · 2019-12-20T15:58:58Z

@kozlovic I'd expect it to be for the group.

vtolstov · 2019-12-20T16:43:51Z

For group.

robertmircea · 2019-12-20T16:53:30Z

For group for sure!

kozlovic · 2019-12-20T16:54:41Z

Ok, I hear you, but in case you did not notice, if you use clustering, as of now, a message is already redelivered to the same member (subject to change in the future, but as of now that is the case), so that will not make much of a difference :-)

kozlovic · 2019-12-20T18:58:30Z

Guys... closing this one in favor of #996

kozlovic mentioned this issue Dec 17, 2019

[ADDED] RedeliveryCount in PubMsg nats-io/stan.go#295

Merged

vm-affekt mentioned this issue Dec 18, 2019

Uncontrolled redelivering on manual ack mode #991

Closed

kozlovic mentioned this issue Dec 20, 2019

Add RedeliveryCount support #996

Closed

kozlovic closed this as completed Dec 20, 2019

liqweed mentioned this issue Jul 25, 2020

Support redeliveryCount in Message object nats-io/stan.java#155

Open
