Pubsub was disconnected without closing iterator. #34
Comments
It's always tough with non-reproducing bugs, let me see if I can help.
Broadly speaking, the code you have is reasonable. It's possible to remove the reconnect loop, but whether that is simpler depends on your needs for introspection. The lib already runs a reconnect loop internally, much like the one in your example: see ya-gcp/src/pubsub/streaming_subscription.rs, line 470 at commit 35ca2d7.
The reason this doesn't reconnect forever is that the default retry policy has a retry limit. The PubSub servers return Unavailable when there aren't messages in the subscription (among other reasons). I'm not sure whether this always shows up as code=8a75; maybe that code means "no messages". You could consider increasing the retry limit, or tuning the retry policy to better reflect your message workflow, and then let the lib handle retries for you. If you enable debug logging, you'll likely see some messages like "retrying after error".
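To illustrate why a bounded retry limit eventually ends a reconnect loop while an unbounded one does not, here is a minimal std-only sketch. It is not ya-gcp's actual retry policy or API: `RetryPolicy`, `next_delay`, and `run_with_retries` are hypothetical names made up for this example, and the doubling backoff is an assumption about what a typical policy looks like.

```rust
use std::time::Duration;

/// Hypothetical retry policy for illustration only (not a ya-gcp type).
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    /// Maximum number of retries; `None` means retry forever.
    pub max_retries: Option<u32>,
    /// Delay before the first retry.
    pub initial_delay: Duration,
    /// Upper bound on the delay between retries.
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Returns the delay before retry number `attempt` (0-based),
    /// or `None` once the retry limit is exhausted.
    pub fn next_delay(&self, attempt: u32) -> Option<Duration> {
        if let Some(limit) = self.max_retries {
            if attempt >= limit {
                return None;
            }
        }
        // Double the delay on each attempt; cap the shift to avoid overflow.
        let factor = 1u32.checked_shl(attempt.min(16)).unwrap_or(u32::MAX);
        Some(self.initial_delay.saturating_mul(factor).min(self.max_delay))
    }
}

/// Calls `connect` until it succeeds or the policy gives up.
/// `sleep` is injected so callers (and tests) control how waiting happens.
pub fn run_with_retries<T, E>(
    policy: &RetryPolicy,
    mut connect: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match connect() {
            Ok(value) => return Ok(value),
            Err(err) => match policy.next_delay(attempt) {
                Some(delay) => {
                    sleep(delay);
                    attempt += 1;
                }
                // Limit reached: surface the last error to the caller.
                None => return Err(err),
            },
        }
    }
}
```

With `max_retries: Some(3)` the loop gives up after the fourth failed attempt and returns the error, which is the point at which an outer loop like yours would take over. With `max_retries: None` it keeps retrying, with delays capped at `max_delay`.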
One of our applications, for example, sets an unbounded retry limit:
I know this is a very vague error report and it can't be reproduced, but I want to share our observations in case someone has seen something similar in their projects.
Background
We are running a service deployed on Google Cloud that handles a burst of data from Pub/Sub every four hours. Messages are fetched with a loop like this:
During normal operation we get a recoverable error every 30-50 minutes:
This error is caught by the outer loop, which restarts the subscription.
The issue: messages piling up, not delivered
Last weekend we observed an outage where messages piled up in our Pub/Sub subscription without being handled. The service was manually restarted four times. Checking the logs from the incident, I noticed that during the entire outage, from 2023-06-30 07:12:40.865 UTC to 2023-07-01 11:40:57.218 UTC, not a single code=8a75 occurred, and still no messages were consumed by our loop. After each restart, messages were handled immediately, but after some time the service was silently disconnected again.
One theory is that ya-gcp got disconnected from the Pub/Sub service but was not able to notify the consumer by terminating the iterator.
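If that theory holds, the consumer never sees an error, so an outer restart loop has nothing to react to. One possible consumer-side mitigation is an idle watchdog: treat "no message for longer than expected" as a reason to tear down and resubscribe. The sketch below shows the idea using only the standard library. A `std::sync::mpsc` channel stands in for the message stream, and `StreamEnd` and `consume_with_watchdog` are hypothetical names, not part of ya-gcp. In async code the same pattern would presumably wrap each `next()` call on the stream in a timeout such as `tokio::time::timeout`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Why the consume loop stopped (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    /// The producer side closed, i.e. the stream terminated visibly.
    Closed,
    /// Nothing arrived within the idle timeout; possibly a silent disconnect.
    Stalled,
}

/// Handles messages until the channel closes or stays idle for `idle_timeout`.
/// The caller decides what to do on `Stalled`, e.g. rebuild the subscription.
pub fn consume_with_watchdog<M>(
    rx: &Receiver<M>,
    idle_timeout: Duration,
    mut handle: impl FnMut(M),
) -> StreamEnd {
    loop {
        match rx.recv_timeout(idle_timeout) {
            Ok(msg) => handle(msg),
            Err(RecvTimeoutError::Timeout) => return StreamEnd::Stalled,
            Err(RecvTimeoutError::Disconnected) => return StreamEnd::Closed,
        }
    }
}
```

Since the bursts arrive every four hours, quiet periods are normal. The idle timeout would need to be longer than the expected gap, or a `Stalled` result should simply trigger a cheap resubscribe rather than an alert.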
The questions are:
We have not been able to reproduce the problem. The same code had been running without issues for several months, and we have observed no issues in the last few days.