-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
[馃悰 BUG]: rabbitmq_redial error causes redial/retry logic to prematurely exit #1857
Comments
Hey @haydenm315 馃憢 Thanks for the report 馃憤 |
Just tried with 2023.3.10 and there's messages which indicate an attempt to recover, however I still don't see the queue listed when doing ./rr workers, after bouncing the rabbitmq service to simulate an upgrade or service interruption. Looks like there's an attempt to re-establish the connection now. Upon making rabbitmq unavailable, roadrunner logs the following.
After making rabbitmq available again, roadrunner logs the following
|
Yeah, so, that means that RR successfully redialed. @haydenm315 Is that what you wanted to see, am I right? |
Seems closer than the older release we were using. Still requires a restart of roadrunner to get it to accept jobs and have a ready status. Unless I restart roadrunner after rabbitmq comes back, I don't see jobs like below in the rr workers output.
Maybe something's not quite right with "trying to redeclare queues and subscribers". |
I'm not sure, because I have a test especially for the case when rabbitmq is down. Have you tried to push the job after redial? |
@haydenm315 In your configuration I also see, that you're consuming pipeline with the name |
I renamed the queue in the previous example. It's not a queue naming issue. Jobs are getting pushed because when I restart the roadrunner server the number of Execs goes up. |
But, it's not a queue, it's a pipeline. Could you please share the latest configuration you have? |
Here the .rr.yaml file These are the env var values in the cfg file AMQP_REDIAL_TIMEOUT=99999999 |
Thanks 馃憤
|
Thanks for the recommendations |
Hey @haydenm315 馃憢 I double-checked that behavior. Let me summarize what I did:
Keep in mind, that RR uses an exponential backoff mechanism to redial. It won't reconnect immediately, but after some exponential step. For example: first redial will happen after 1sec after loosing a connection, next - 2s -> 5s -> 15s -> 25s ... |
I also noticed, that your |
Closing as answered. You are welcome to comment here if you still have questions 馃槂 |
No duplicates 馃ゲ.
What happened?
I'm using roadrunner with rabbitmq amqp and jobs to run a pipeline. If there's an interruption in the rabbitmq broker such as restarting it, roadrunner fails to re-establish the connecftion to the rabbitmq broker queue.
I see one redialing attempt when the broker is restarted, however it's never reconnected.
Version (rr --version)
rr version 2023.1.5 (build time: 2023-06-08T14:45:05+0000, go1.20.4), OS: linux, arch: amd64
How to reproduce the issue?
To reproduce, start up a rabbitmq broker and then the roadrunner.
Verify the roadrunner is connected to the broker and is ready for work
./rr workers
Output will include the following at the bottom.
restart your rabbitmq broker
roadrunner will output some errors as pasted below in the log output section.. After these errors are output, there won't be any activity.
Looking at the workers with ./rr workers won't have any job status, even though the rabbitmq broker is available.
Only after restarting the roadrunner process will the amqp connect restore.
Relevant log output
The text was updated successfully, but these errors were encountered: