[BUG] All workers ready, exec 0 #737
Comments
Hey @iluuu1994. That's definitely strange. In the past, I've run load-testing sessions lasting a few days. Let me verify this: I'll start a load-testing session for a few days with your configuration.
Might be some deadlock with the pool and supervised pool.
Debug logs will be super helpful.
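For anyone following along: RoadRunner 2 exposes log verbosity through the `logs` section of `.rr.yaml`. A minimal sketch, assuming defaults everywhere else (the values shown are illustrative, not the reporter's actual config):

```yaml
# .rr.yaml — illustrative logs section only
logs:
  mode: development   # verbose, human-readable output
  level: debug        # include worker lifecycle and pool events
```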
@iluuu1994 Could you please try the latest beta version if that's possible?
Hey @rustatian, thanks for your quick response!
We'll do that.
Depends. How stable is it? Unfortunately we can only reproduce it in the production environment. I guess it's load-dependent.
It's pretty stable; we will release the same version, but without the beta postfix, next week. It also has a few new configuration options.
Have you been using RR1 in the past? If yes, did you see the same behavior there?
@iluuu1994 Could you please share some info about your environment? OS version (`uname -a`), and are you running in Docker?
Can you also provide more info on your load?
When do you …
Sounds like we found a core issue, and it's related to a specific edge case with `exec_ttl`. Without `exec_ttl` it does not reproduce. No need to update the beta; we will fix this issue before the next release (2.3.1, next Tuesday). Thank you for bringing this issue to us, we will solve it at the highest priority.
Yes, we've been using RoadRunner for over a year and experienced the same thing in RoadRunner 1. Unfortunately we also had a bad configuration that led to memory issues (no soft limit for the workers, no …).
No, CPU was virtually at 0%.
```
Linux {{hostname}} 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
```

We're running RoadRunner natively, so no Docker.
Yes, we use lots of libraries, and I'm sure the application also has various places where memory could leak. It's also possible that some endpoints exceed the exec TTL. Usually new workers start at around 100MB and get restarted after several hundred to a few thousand executions because they hit the memory limit. Certainly not great.
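Those soft limits live under the pool's `supervisor` block in `.rr.yaml`. A minimal sketch of the RR2 options touched on in this thread (`max_worker_memory` is in megabytes; all values are illustrative, not the actual production config):

```yaml
# .rr.yaml — illustrative supervisor block only
http:
  address: 0.0.0.0:8080
  pool:
    num_workers: 8
    supervisor:
      watch_tick: 1s           # how often the supervisor checks the limits
      max_worker_memory: 128   # soft memory cap per worker, in MB
      exec_ttl: 60s            # hard cap on a single request's execution time
```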
AFAIK (I'm not directly involved with the project) it happens randomly; there's no specific time frame that passes after deployment.
Unfortunately, no.
I can't say with certainty. Admittedly, we've not done our part here. Debug logging is active now and I'll report back as soon as we have some data. The website has not been down since enabling it, but I see lots of these logs:

…

And a few of these:

…
Note that these logs are still from the stable release. I will switch to beta tonight when there's less traffic.
Just a few seconds.
Got it, thanks. I've reproduced this issue and I'll try to fix it ASAP. Priority number 1 at the moment.
Oh sorry, I just saw that message now. We'll wait for the fix then and report whether it resolves the issue. Thanks for the very quick reaction! We really appreciate it!
These logs are related to the client leaving the connection, and they shouldn't be critical. The second one is disturbing. @rustatian, does it make sense to mark this error as a warning in the future?
Writing twice into a closed connection … @iluuu1994 I found the issue; the fix will land on Tuesday (2.3.1).
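The actual fix isn't shown in the thread, but here is a general Go illustration of the failure class described above (a second write into an already-closed connection), with no assumptions about RoadRunner's internals:

```go
package main

import (
	"fmt"
	"net"
)

// Illustrative only: demonstrates why a second write into a closed
// connection surfaces as an error. net.Pipe stands in for a real
// client connection.
func main() {
	client, server := net.Pipe()

	go func() {
		buf := make([]byte, 16)
		client.Read(buf) // consume the first write...
		client.Close()   // ...then hang up, like a client leaving
	}()

	server.Write([]byte("first\n")) // succeeds
	server.Close()

	// A second write after Close always fails; logging it at error
	// level makes a benign client disconnect look alarming.
	if _, err := server.Write([]byte("second\n")); err != nil {
		fmt.Println("second write:", err) // io: read/write on closed pipe
	}
}
```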
@rustatian @wolfy-j Great! Thanks again to both of you. Hope you have a great weekend!
Have a wonderful weekend too, and welcome to the RR/Spiral community :)
Hey @iluuu1994, you may safely update to the latest 2.3.1 release.
@rustatian Today was a holiday in our city so we were off. We're probably gonna upgrade tomorrow evening. I'll let you know how it goes 🙂
We deployed it yesterday, works great so far. If we encounter any issues we'll let you know. Thanks again!
Cool, thanks 👍🏻
Original issue report:

Hi! First of all, thanks for RoadRunner! ❤️ It's a great tool.
We have some stability issues on one of our big websites. This has occurred many times in the past and we can't identify a clear cause. RoadRunner suddenly stops responding to requests completely after several hours or days. `rr workers` reveals that all of the workers are `ready` with an `EXECS` of `0`. The issue never resolves on its own until we do a `rr reset`.

I'm not sure if maybe the workers sometimes start incompletely, unable to process requests, without RR reloading them because they never reach any of the soft or hard limits. After some time they might accumulate until there are no healthy workers left to process any requests. This is just a theory; the workers are displayed as `ready`.
The version of RR used:

…

My `.rr.yaml` configuration is:

…

We've just added error logging to our config, so unfortunately I can't provide any logs yet. I will update this report as soon as we have anything.
Do you have any suspicions about what might be causing this? Is there anything other than logs that we could provide that might be helpful?
Thanks for your help!