karma hangs after X time with "too many files" again? #2944
Comments
I was able to reproduce a hang when working on #2888, but with that fixed I can't anymore.
Karma stops responding on /metrics when this happens. It then silently leaks sockets, according to lsof, until it reaches the limit and spams "too many files" errors in the logs. Going from not responding to spamming errors took a few days. We have 3 instances, so we can afford to leave a non-working one running to troubleshoot. This is probably a rarer deadlock since it i
I don’t see any leakage, so there’s a chance client connections are not being closed.
Is there a possible deadlock in https://github.com/prymitive/karma/blob/main/internal/alertmanager/models.go#L498?
That seems likely, good catch.
Merged a fix for that, let me know if you still see any issues, thanks!
This might be related to #2888, but this time it took a while longer to stop working. We still have 24 alertmanagers, and the interval is now 30s with a 10s timeout on each one.
Let me know if you need more info.
lsof shows the socket count growing, as last time:
and it's growing. It will probably start responding with errors when it hits 65k.
It's the same symptoms as #2888.
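For anyone wanting to watch the same growth without running lsof by hand, a rough Go sketch is below. It assumes Linux (it reads `/proc/self/fd`), and the function names `countOpenFDs` and `fdLimit` are illustrative, not part of karma.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// countOpenFDs returns the number of entries in /proc/self/fd.
// Linux only. The count includes the descriptor opened to read
// the directory itself.
func countOpenFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// fdLimit returns the soft limit on open files for this process
// (RLIMIT_NOFILE), the ceiling at which "too many files" errors start.
func fdLimit() (uint64, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, err
	}
	return uint64(rl.Cur), nil
}

func main() {
	n, err := countOpenFDs()
	if err != nil {
		fmt.Println("cannot read /proc/self/fd:", err)
		return
	}
	limit, err := fdLimit()
	if err != nil {
		fmt.Println("cannot read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open file descriptors: %d (soft limit %d)\n", n, limit)
}
```

Logging this number periodically would show whether the count climbs steadily between the hang and the first errors.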