Durable subscription cleanup #759

Closed
OleksandrBerezianskyi opened this issue Mar 4, 2019 · 27 comments

@OleksandrBerezianskyi

OleksandrBerezianskyi commented Mar 4, 2019

Hi,

I have a question. Let's say we have an AWS autoscale group. Every server has a subscription with a unique durable name (for example, its private IP, like "10.10.0.3") on a single channel "events", so every server receives every message on the channel.

Situation: one of the servers fails and is unrecoverable. A new server with a new durable subscription is started automatically.

Question: is there any cleanup mechanism that would remove the dead "10.10.0.3" subscription from the NATS Streaming server after some inactivity timeout?

@kozlovic
Member

kozlovic commented Mar 4, 2019

You would need to unsubscribe those durables. For instance, using the examples provided in the go-nats-streaming repo, you could do:

go run examples/stan-sub/main.go -c <cluster name> -id <client ID> -durable <durable name> -unsubscribe <subject>

then Ctrl+C, and this will unsubscribe the given durable. Basically, you need to call Subscription.Unsubscribe() after resuming the durable.
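
For reference, here is a rough sketch of doing the same thing from application code (assuming the Go streaming client, published as github.com/nats-io/stan.go, formerly go-nats-streaming; the cluster ID, client ID, channel, and durable name below are placeholders):

    // Minimal sketch: resume the durable, then remove it with Unsubscribe().
    package main

    import (
        "log"

        stan "github.com/nats-io/stan.go"
    )

    func main() {
        // Connect with the same client ID the dead server used.
        sc, err := stan.Connect("test-cluster", "client-10-10-0-3")
        if err != nil {
            log.Fatal(err)
        }
        defer sc.Close()

        // Resume the durable by subscribing with the same durable name...
        sub, err := sc.Subscribe("events", func(m *stan.Msg) {},
            stan.DurableName("10.10.0.3"))
        if err != nil {
            log.Fatal(err)
        }

        // ...then Unsubscribe() removes the durable's state from the server.
        if err := sub.Unsubscribe(); err != nil {
            log.Fatal(err)
        }
    }

Note that calling Close() on the subscription or connection only suspends the durable; Unsubscribe() is what actually removes its state.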

@kozlovic kozlovic closed this as completed Mar 4, 2019
@vtolstov

vtolstov commented Mar 4, 2019

Is it possible to get a list of all subscribers to a durable queue group?

@kozlovic
Member

kozlovic commented Mar 4, 2019

@vtolstov You could with monitoring: https://github.com/nats-io/nats-streaming-server#channelsz
You could get the list of subscriptions on this channel: http://localhost:8222/streaming/channelsz?channel=foo&subs=1 and then look at the queue name and durable name corresponding to your durable queue group.
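
As a rough sketch (not an API call, just HTTP against the monitoring endpoint above), you could filter that output for offline durables programmatically; the JSON field names below mirror the channelsz subscription output (client_id, queue_name, is_durable, is_offline):

    // Sketch: list offline durable/queue subscriptions on channel "foo"
    // via the monitoring endpoint (default monitoring port 8222 assumed).
    package main

    import (
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    type channelz struct {
        Subscriptions []struct {
            ClientID  string `json:"client_id"`
            QueueName string `json:"queue_name"`
            IsDurable bool   `json:"is_durable"`
            IsOffline bool   `json:"is_offline"`
        } `json:"subscriptions"`
    }

    func main() {
        resp, err := http.Get("http://localhost:8222/streaming/channelsz?channel=foo&subs=1")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        var cz channelz
        if err := json.NewDecoder(resp.Body).Decode(&cz); err != nil {
            log.Fatal(err)
        }

        // Print durable subscriptions that are currently offline.
        for _, s := range cz.Subscriptions {
            if s.IsDurable && s.IsOffline {
                fmt.Printf("offline durable: client=%s queue=%s\n", s.ClientID, s.QueueName)
            }
        }
    }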

@vtolstov

vtolstov commented Mar 4, 2019

I'd prefer some API call, but if this is the only solution, OK. Thank you.

@OleksandrBerezianskyi
Author

@kozlovic Thanks for the fast response. I think I have not formulated my question clearly enough.

Is there any automated cleanup mechanism on the side of the NATS Streaming server that would remove a dead durable subscription after some inactivity timeout?
And a related question: are there any problems with having 1, 10, 100, or 1000 dead durable subscriptions?

I fully understand the part of the documentation related to closing a durable subscription and unsubscribing from it (including the queue group case).

@kozlovic
Member

kozlovic commented Mar 4, 2019

@OleksandrBerezianskyi No, there is no automated way: the point of a durable subscription is that the user wants the server to keep state while the subscriber is offline. You will need to explicitly unsubscribe.
Now, you could leave them without unsubscribing, in that the state maintained is very small if they don't have unacknowledged messages, but I would still recommend removing them once you know they are not coming back.

@OleksandrBerezianskyi
Author

@kozlovic thanks for confirming that. Just a few final thoughts:

  • people are using autoscale groups where each server has absolutely the same configuration
  • creating a unique durable subscription name on each such server should be pretty easy using something like timestamp+ip or any other UUID
  • servers crash all the time at large volumes, and no one cares as long as the system has enough self-healing
  • I assume that the majority of devs would not build any additional code to track and store unique durable subscription names and analyze which of them are "dead"

That said, I think it would be nice to have an automated cleanup mechanism for "dead" durable subscriptions that triggers after some period of inactivity (maybe 1 hour or 1 day, depending on the solution). If that sounds logical, please let me know.

@kozlovic
Member

kozlovic commented Mar 5, 2019

The reason for creating a durable subscription is so that state is maintained while the subscriber is offline. With NATS Streaming, messages are stored regardless of interest, so one can start a non-durable subscription after messages have been sent and still receive them (you set the start position in the channel when creating the subscription).
In that sense, creating a durable name that is "random" defeats the point of a durable subscription, whose name should be "static" so it can be reused to resume the durable after the application restarts.
If you don't need to resume a durable, then you should not use a durable, but a regular subscription.
Note that even with a non-durable subscription, if the NATS Streaming server crashes and restarts, delivery will resume where it was before the crash, since subscription state is persisted even for non-durable subscriptions, specifically so that it survives a server crash.
Here is a description of the various types of subscriptions that Streaming has: https://github.com/nats-io/nats-streaming-server#subscriptions
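
As a rough illustration of setting the start position on a regular (non-durable) subscription (option names from the stan.go client; the subject and IDs below are placeholders):

    // Sketch: a regular subscription can still receive messages published
    // before it existed by choosing a start position in the channel.
    package main

    import (
        "log"
        "time"

        stan "github.com/nats-io/stan.go"
    )

    func main() {
        sc, err := stan.Connect("test-cluster", "reader-1")
        if err != nil {
            log.Fatal(err)
        }
        defer sc.Close()

        // Start an hour back in the channel. Other options include
        // stan.DeliverAllAvailable(), stan.StartAtSequence(n), and
        // stan.StartWithLastReceived().
        if _, err := sc.Subscribe("events", func(m *stan.Msg) {},
            stan.StartAtTimeDelta(time.Hour)); err != nil {
            log.Fatal(err)
        }

        select {} // block so the handler keeps receiving messages
    }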

@juandiegopalomino

@kozlovic does this relate to a channel's max inactivity or a message's max age? Suppose we have a channel where no publisher has published in a week and no subscriber has consumed in over a week, the max inactivity is a week, but there are some durable subscriptions that have not been cleaned up. Would the channel (and its messages) have been cleaned up? And suppose the channel has a message over a week old and the max age is 1 week: would the message be cleaned up?

@kozlovic
Member

max_age will remove messages whose timestamp is older than that, but it does not mean that the channel is removed when all messages are gone.
max_inactivity allows the channel to be automatically removed when there has been no new published message or new subscription for more than the specified time. The channel will be removed even if there are still messages (either no expiration (max_age) is set, or not all have expired). Durables that are offline are ignored and the channel can still be removed. If there are active subscriptions (durable or not), the channel can't be removed.
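
As a sketch, both limits are set in the server configuration under store_limits (key names as in the nats-streaming-server README; the values below are just examples, and they can also be overridden per channel):

    store_limits: {
        max_age: "168h"         # messages older than 7 days are removed
        max_inactivity: "168h"  # channels with no new message or new subscription
                                # for 7 days are removed (offline durables do not
                                # prevent removal)
    }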

@juandiegopalomino

@kozlovic I see, that makes sense. And the message's max age is non-negotiable; that is, if a message is over its max age then it will be destroyed no matter what, right?

@kozlovic
Member

Right. max_age will remove a message whose timestamp is older than the defined value, regardless of whether there are subscriptions, etc. Actually, even if that message was sent to a subscription and not ack'ed, it will still be removed, and the server will obviously stop trying to redeliver it.

@juandiegopalomino

Awesome. One last question going back to the channels: so if there is a channel whose sole subscription is a durable that IS offline, then the max inactivity will still be enforced, right? For example:

    {
      "name": "EVENT_SERVICE.1d7ee3d1-995f-4596-8ab7-398161f12ffa",
      "msgs": 3743,
      "bytes": 2872004,
      "first_seq": 1,
      "last_seq": 3743,
      "subscriptions": [
        {
          "client_id": "EventPoller-1d7ee3d1-995f-4596-8ab7-398161f12ffa",
          "inbox": "ecb8ef0eafb929e099f542",
          "ack_inbox": "_INBOX.X9zwFC4V6RqBlCPF1JUyd6",
          "queue_name": "EVENT:EVENT_SERVICE.1d7ee3d1-995f-4596-8ab7-398161f12ffa",
          "is_durable": true,
          "is_offline": true,
          "max_inflight": 10,
          "ack_wait": 60,
          "last_sent": 3743,
          "pending_count": 1,
          "is_stalled": false
        }
      ]
    },

Taken from the http://localhost:8222/streaming/channelsz?subs=1 endpoint.

@kozlovic
Member

Yes, it will be removed. I just ran a quick test with the latest release. Note that there was a fix back in 0.12.0:

Channel not deleted if subscription not explicitly closed (#691)

@juandiegopalomino

@kozlovic huh, then I think I'm seeing something weird. Running the server in cluster mode with debug logs enabled, I see a message like
[1] 2019/04/18 15:01:50.395155 [DBG] STREAM: Channel "X" is being deleted (I believe that comes from

s.log.Debugf("Channel %q is being deleted", c.name)
)
yet the monitoring URL http://localhost:8222/streaming/channelsz still lists channel X.
Is that expected?

@kozlovic
Member

Which version are you running? There were quite a few issues related to clustering and channel deletion in 0.12.2. If the issue persists, please open a new issue in GH. Thanks!

@juandiegopalomino

@kozlovic we are running 0.12.2. Will updating to 0.14 fix it?

@kozlovic
Member

I can't tell, since I don't know what the problem is. Could you put together a reproducible test case that shows the behavior and open an issue? Thanks!

@kozlovic
Member

I tried with a cluster of 3 nodes and max inactivity of 10s. Again, no issue found. In debug mode, here is the type of tracing that you should see:

[97923] 2019/04/18 10:49:14.598161 [DBG] STREAM: Channel "foo" delete timer set to fire in 10s
[97923] 2019/04/18 10:49:14.598172 [DBG] STREAM: [Client:me] Closed (Inbox=_INBOX.ZsmBAh6onqVuglhJOnxtj5)
[97923] 2019/04/18 10:49:24.603579 [DBG] STREAM: Channel "foo" is being deleted
[97923] 2019/04/18 10:49:24.603593 [DBG] STREAM: Channel "foo" delete timer stopped
[97923] 2019/04/18 10:49:24.634422 [INF] STREAM: Channel "foo" has been deleted

I have all nodes exposing the monitoring endpoint, and the channel has disappeared from all 3.

@juandiegopalomino

@kozlovic that is odd. We're in the middle of a deployment release and at 30% message capacity (over 1 week), so we have a bit of a runway. I'll investigate further next week and file an issue accordingly.

@juandiegopalomino

And just to make sure, the channel should NOT be appearing in http://localhost:8222/streaming/channelsz if it's successfully deleted, right?

@kozlovic
Member

Do you see the DBG and INF about that channel being deleted? If not, there may be some issue preventing the deletion.
But yes, if it is deleted, it should not be visible in the monitoring endpoint... unless a new message or new subscription just came in and recreated it. Tracing should tell you that, though.

@juandiegopalomino

➜  ~ kubectl logs -n nats-io nats-streaming-file-nats-streaming-cluster-2 nats-streaming-cluster | grep 'EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88' -B 3 -A 3
[1] 2019/04/18 14:03:02.643886 [DBG] STREAM: [Client:EventPoller-c6f29da0-9bd7-46c7-bf89-ffa8c1cdc1b8] Closed (Inbox=6804872cbb2be72e3b4496)
[1] 2019/04/18 15:01:49.658739 [DBG] STREAM: Channel "EMBEDDED_ACTOR.7dad2925-00dd-40c6-ad37-463e8dfb1e88" is being deleted
[1] 2019/04/18 15:01:49.658766 [DBG] STREAM: Channel "EMBEDDED_ACTOR.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer stopped
[1] 2019/04/18 15:01:50.395155 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" is being deleted
[1] 2019/04/18 15:01:50.395177 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer stopped
[1] 2019/04/18 15:42:14.862788 [INF] STREAM: Channel "QUEUE.92036a18-7263-406e-bf7a-aa0bd0d39293.0ceebc1b-cf9f-4749-88c9-9b78b4a53768" has been created
[1] 2019/04/18 15:42:17.971648 [DBG] STREAM: [Client:QUEUE_SVC---fb3f5b7a-94ed-4069-8645-350757db17f0] Started new durable queue subscription, subject=QUEUE.92036a18-7263-406e-bf7a-aa0bd0d39293.0ceebc1b-cf9f-4749-88c9-9b78b4a53768, inbox=6c41127356d3ae403e4546, queue=durable:QUEUE.92036a18-7263-406e-bf7a-aa0bd0d39293.0ceebc1b-cf9f-4749-88c9-9b78b4a53768, subid=1, sending from sequence, asked_seq=1 actual_seq=1
[1] 2019/04/18 15:42:18.081204 [DBG] STREAM: [Client:QUEUE_SVC---fb3f5b7a-94ed-4069-8645-350757db17f0] Suspended durable queue subscription, subject=QUEUE.92036a18-7263-406e-bf7a-aa0bd0d39293.0ceebc1b-cf9f-4749-88c9-9b78b4a53768, inbox=6c41127356d3ae403e4546, queue=durable:QUEUE.92036a18-7263-406e-bf7a-aa0bd0d39293.0ceebc1b-cf9f-4749-88c9-9b78b4a53768, subid=1
--
--
[1] 2019/04/18 17:00:57.874313 [DBG] STREAM: Channel "QUEUE.d9635131-a5d9-49f7-990b-6785bfa36d9d.b8c5379d-a325-4be6-87c9-4dfdf6bb580f" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874328 [DBG] STREAM: Channel "EMBEDDED_ACTOR.720f0690-2c91-4a4f-b228-33198eb2d79f" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874342 [DBG] STREAM: Channel "QUEUE.13c0e0d6-3fa0-48a9-936b-f318364cdfe4.c0b49ca1-4029-4243-ae80-0da2de929060" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874355 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874370 [DBG] STREAM: Channel "EVENT_SERVICE.c29bacef-5232-4abc-882d-08f73a15085e" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874384 [DBG] STREAM: Channel "EMBEDDED_ACTOR.b1040e9c-6905-4196-91ff-4d80b70c7768" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:00:57.874397 [DBG] STREAM: Channel "EVENT_SERVICE.fd21bdc5-003e-4858-8a27-0b7d987de103" delete timer set to fire in 168h0m0s
--
--
[1] 2019/04/18 17:11:07.876787 [DBG] STREAM: Channel "QUEUE.c34a976a-ec6e-4588-a3e5-70db70538d44.a8cc7535-0ce1-4b5c-a28f-30d40090d839" delete timer stopped
[1] 2019/04/18 17:11:07.876792 [DBG] STREAM: Channel "EVENT_SERVICE.aa5ad653-265c-4ad7-810c-53c19d06deb1" delete timer stopped
[1] 2019/04/18 17:11:07.876797 [DBG] STREAM: Channel "EVENT_SERVICE.a277b536-cdd5-4697-ab9f-5405fd6d8d04" delete timer stopped
[1] 2019/04/18 17:11:07.876802 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer stopped
[1] 2019/04/18 17:11:07.876807 [DBG] STREAM: Channel "EVENT_SERVICE.d4c3a651-c872-42da-a0bb-3091df143c52" delete timer stopped
[1] 2019/04/18 17:11:07.876811 [DBG] STREAM: Channel "EVENT_SERVICE.4c32dc35-e222-4cbd-8af3-27f4ef6cfc1b" delete timer stopped
[1] 2019/04/18 17:11:07.876816 [DBG] STREAM: Channel "EVENT_SERVICE.95742cbf-857b-4cde-9302-228908581a07" delete timer stopped
➜  ~ kubectl logs -n nats-io nats-streaming-file-nats-streaming-cluster-1 nats-streaming-cluster | grep 'EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88' -B 3 -A 3
➜  ~ kubectl logs -n nats-io nats-streaming-file-nats-streaming-cluster-0 nats-streaming-cluster | grep 'EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88' -B 3 -A 3
[1] 2019/04/18 16:59:05.151542 [DBG] STREAM: Channel "QUEUE.25ddf0f1-8484-4673-9a45-007bbf380650.ead9dc5b-de70-4e3d-9034-57ce24f86477" delete timer stopped
[1] 2019/04/18 16:59:05.151551 [DBG] STREAM: Channel "QUEUE.060ff0f1-c812-4764-8f89-51cbc0786115.84aee6a8-03de-4fdd-abd2-9d8f10d9c297" delete timer stopped
[1] 2019/04/18 16:59:05.151559 [DBG] STREAM: Channel "EVENT_SERVICE.6a1259c3-7fdd-49f2-92cc-5c20dc8c427d" delete timer stopped
[1] 2019/04/18 16:59:05.151567 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer stopped
[1] 2019/04/18 16:59:05.151575 [DBG] STREAM: Channel "QUEUE.01a70733-0b33-4827-9e7b-c855955b8ddf.bd0dfdbe-300c-408a-ac39-0f3bfbb3357a" delete timer stopped
[1] 2019/04/18 16:59:05.151583 [DBG] STREAM: Channel "QUEUE.f7cadba3-1fdb-498d-979c-f29e4685afe4.2328b5d4-d9d2-46f7-ad69-e191f0aae959" delete timer stopped
[1] 2019/04/18 16:59:05.151591 [DBG] STREAM: Channel "QUEUE.1a38ff3c-da92-4cc4-9254-b65ac214194b.258065f6-70b6-4605-bcf3-818fd6cdca78" delete timer stopped
--
--
[1] 2019/04/18 17:12:59.804994 [DBG] STREAM: Channel "QUEUE.2b19a2d1-1ad5-406b-958f-a5c853dbd682.e6a29763-b94e-4ff4-afad-641ec9619cc4" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805002 [DBG] STREAM: Channel "QUEUE.465fb3a9-4acd-46e7-9efb-3c51bcc3dcf1.cd929157-241a-4b8b-8c18-895549311efb" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805009 [DBG] STREAM: Channel "EVENT_SERVICE.4ea644ab-e157-4d5b-9469-44e640de25c6" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805018 [DBG] STREAM: Channel "EVENT_SERVICE.7dad2925-00dd-40c6-ad37-463e8dfb1e88" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805027 [DBG] STREAM: Channel "QUEUE.fbd08328-9ca6-4a80-8e70-7eff7d02f2e6.e430e33e-6727-48a0-a084-e0cd8aea09a2" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805034 [DBG] STREAM: Channel "EVENT_SERVICE.7aeee55d-00c9-4c68-8bb6-50411ab18670" delete timer set to fire in 168h0m0s
[1] 2019/04/18 17:12:59.805043 [DBG] STREAM: Channel "EMBEDDED_ACTOR.351362d7-fe8c-4be4-ae83-694bd95a348c" delete timer set to fire in 168h0m0s

@kozlovic does this help? I'm not sure exactly what to look for. nats-io nats-streaming-file-nats-streaming-cluster-2 should be the master.

@kozlovic
Member

Well, which channel do you think should have been deleted but was not? Is it in that log?

@kozlovic
Member

I really think you should open a new issue. :-)

@juandiegopalomino

will do

@juandiegopalomino

(but to answer your question, the one I was grepping for)
