root: fix 100% CPU for worker container (#7025) #7762
base: main
Some Linux users (on Arch Linux, for example) run Docker with the default service file, which sets NOFILES to infinite. This causes celery to hang for hours to days at 100% CPU while it closes all fds by enumerating from NOFILES down to 3. This commit overrides the ulimit for the container without touching the user's Docker service configuration. Signed-off-by: DKing <15340687+DKingAlpha@users.noreply.github.com>
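To make the cost described above concrete, here is a minimal Python sketch of how many descriptor numbers a naive "close everything" loop would visit for a given limit. It is only an illustration of the behaviour the description attributes to celery, not celery's actual code; the function name `fds_visited` and the `lowest_fd` parameter are hypothetical, and the sample limits are values mentioned elsewhere in this thread.

```python
def fds_visited(nofile_limit: int, lowest_fd: int = 3) -> int:
    """Count the descriptor numbers a naive close-everything loop would visit.

    Assumption: the loop walks from nofile_limit - 1 down to lowest_fd and
    attempts to close each number, whether or not it is actually open.
    """
    return max(nofile_limit - lowest_fd, 0)


if __name__ == "__main__":
    # 1024 and 524288 are the soft/hard limits reported in this thread;
    # 0x3FFFFFF8 is the value suggested for reproducing the problem.
    for limit in (1024, 524288, 0x3FFFFFF8):
        print(f"nofile={limit:>10}: {fds_visited(limit):>10} close attempts")
```

With a limit of 1024 the loop is negligible; with a limit near 2^30 it has to attempt roughly a billion closes, which is consistent with a worker that appears stuck at 100% CPU.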
Interesting, thanks for digging into this! I'd like to get confirmation from some more people in #7025 if it fixes their issue before merging this. Also we should probably add a similar thing for the helm chart (https://github.com/goauthentik/helm) if possible
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7762      +/-   ##
==========================================
+ Coverage   92.62%   92.64%   +0.02%
==========================================
  Files         588      588
  Lines       29141    29141
==========================================
+ Hits        26991    26997       +6
+ Misses       2150     2144       -6
Tried to ping someone, waiting for feedback. Meanwhile, you can easily reproduce this by setting the ulimit to 0x3ffffff8 (1073741816 in decimal). That should more or less prove it.
For reference: Here's what I have at home:
And we have the same in production at authentik
Fixes the issue for me.
OS: clear-linux
# ulimit -Sn
1024
# ulimit -Hn
524288
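For readers who want to check the same limits from inside the worker's Python process rather than from a shell, the following is a small sketch using the standard library `resource` module (Unix only). The helper name `nofile_limits` is hypothetical; the values it returns should correspond to what `ulimit -Sn` and `ulimit -Hn` report in the same environment.

```python
import resource


def nofile_limits() -> dict:
    """Return the soft and hard RLIMIT_NOFILE values of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def label(value):
        # RLIM_INFINITY is how the kernel reports an unlimited value.
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return {"soft": label(soft), "hard": label(hard)}


if __name__ == "__main__":
    limits = nofile_limits()
    print(f"soft (ulimit -Sn): {limits['soft']}")
    print(f"hard (ulimit -Hn): {limits['hard']}")
```

A very large or unlimited soft value here would match the situation described in this PR, while a value such as 1024 would not be expected to trigger it.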
Does anyone know how I can change this for my Authentik worker on Unraid? It has been running at high CPU usage for days now. Help is appreciated. Reporting back (unraid-solved):
Do you know why setting ulimit to a larger number fixes the issue? Is it an issue in Celery or Authentik?
With #7810, #8440 and #7813 this shouldn't be an issue anymore. Could you check this again with 2024.2.2? @cenkalti @mobiledude @Leptopoda @DKingAlpha
Thanks for coming back.
I still get high CPU usage with the latest 2024.2.2. Adding the ulimit back to compose.yml fixed the issue for me. The live Python profiler py-spy is incompatible with the recent Python 3.12, so I will find another way to identify the issue when I have time.
I still experience high CPU usage with the latest 2024.2.2 version. However, I was able to resolve the issue by adding
I can confirm that it fixed my setup too, so it's IMHO worth merging 🎉
New authentik user here. I tried resetting Redis and setting ulimits in docker-compose, but unfortunately the CPU still spiked to 100%. After some more troubleshooting I increased the RAM allocation for the VM (while leaving the custom ulimits in place), and suddenly it all started to work: no CPU spikes.
Same here with
Adding ulimits back to compose fixed my issue on
For context, the reason why we haven't merged this PR:
I can confirm this fixed my issue for docker running on Oracle Linux 9.4
Just deployed a fresh compose install on current Arch linux using I'm not experiencing this in k8s running on Talos.
Thanks! This fixed the issue for me as well.
Some Linux users (on Arch Linux, for example) run Docker with the default service file, which sets NOFILES to infinite. This causes celery to hang for hours to days at 100% CPU while it closes all fds by enumerating from NOFILES down to 3.
This commit overrides the ulimit for the container without touching the user's Docker service configuration.
For details see #7025