-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADDED] ReplaceDurable option #1136
Conversation
When the subscription request for a durable subscription times out or fail on the client side, but it was accepted in the server, then if the application tries to restart the subscription request again it will fail with a "duplicate durable subscription" error until the connection is closed. This new option allows the user to decide how the server should behave when processing a duplicate durable subscription. If disabled, the default, it behaves as described above, that is, it will reject the second subscription request and return the "duplicate durable" error. If enabled, if the server detects that this is a duplicate, it will close the active one and accept the new one. It is a suspend followed by a resume. From the client perspective, if this is done in the context of #1135, then everything works well since the original subscription in the client was actually not started due to subscription request failure. However, if user try to create multiple duplicate durable subscriptions for subscription requests (Subscribe() calls) that did not fail, then their application will not be notified that the subscriptions that are being replaced are replaced, but they will simply stop receiving messages on those. Resolves #1135 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I did a
and instantiated the server with
With this build and setting, I am no longer able to reproduce the initial request timeout error that led to the subsequent duplicate error. This is not the outcome I expected, though it does seem to be working now in a scenario that it previously did not. |
Thank you for testing it. Some comments:
To be clear, this PR does not solve any timeout that your application may have experienced when creating the subscription. Some users had the same "timeout" issue and consequent "duplicate" error because they were on purpose making their network being very lossy, etc.. (see #1135)
Again, this PR simply allows users to call Subscribe() from the same connection for the same durable without getting the "duplicate durable" error and instead replace the previous with the latest. If this option is enabled, users should make sure that they are not "double" subscribing unless the previous subscription request had failed. The application would NOT receive the message twice though. |
I am basically putting a breakpoint here https://github.com/nats-io/go-nats/blob/73ffc26dfe70cc3b217df38d209c83b6d07eaf3c/nats.go#L2623 to cause a user timeout failure on the initial request, and then on a subsequent call would see the duplicate error. With the new build, I no longer see the initial timeout, and therefore never make a second call. Any reason why that initial call doesn't still timeout as expected? Sounds like this is not a valid repro. |
But it did. That's what's confusing :) ..and I depended on that error to "enable" the duplicate error. |
Your breakpoint is too late. The request has already been sent, and the message is likely to be available when you resume execution. So the select will get the message and return it. How did you get the timeout error in the first place, which lead you to open the issue since the second subscribe would have caused a duplicate? Was it just an error that happened one day - maybe the server was overloaded and did not send reply on time, or there was a network disconnect at the bad time, etc.. |
The "real" error happened on a heavy loaded server that took longer than the timeout to reply. I've since been able to reproduce though using the breakpoint- which sounds like it was a race condition, and I got lucky. Do you have a suggestion as how to generate the initial error? |
If your server is local and you can "play" with it, you could put it in paused state (ctrl+z), and have the client send the request. That request will timeout. You can then resume the server "fg" (for foreground). Or if you can modify the code/debugger runtime, set But to be fair, I was more asking for you guys to make sure that this option is a viable solution and does not break any of the existing deployment that you have. |
Thanks for the suggestions.
Got it - works for me 👍 As a trivial future request, we exclusively use command line arguments and not a config file, so that parity would be useful. |
Doing this on the first request does still cause a duplicate durable registration error on subsequent requests. |
If you set the PS: I am adding the |
Signed-off-by: Ivan Kozlovic <ivan@synadia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When the subscription request for a durable subscription times out
or fail on the client side, but it was accepted in the server, then
if the application tries to restart the subscription request again
it will fail with a "duplicate durable subscription" error until
the connection is closed.
This new option allows the user to decide how the server should behave
when processing a duplicate durable subscription. If disabled, the
default, it behaves as described above, that is, it will reject
the second subscription request and return the "duplicate durable"
error.
If enabled, if the server detects that this is a duplicate, it will
close the active one and accept the new one. It is a suspend followed
by a resume.
From the client perspective, if this is done in the context of #1135,
then everything works well since the original subscription in the
client was actually not started due to subscription request failure.
However, if user try to create multiple duplicate durable subscriptions
for subscription requests (Subscribe() calls) that did not fail, then
their application will not be notified that the subscriptions that are
being replaced are replaced, but they will simply stop receiving messages
on those.
Resolves #1135
Signed-off-by: Ivan Kozlovic ivan@synadia.com