-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-repair high cpu load #312
Comments
@JeffreyDevloo what happened on this env? Why was there so much maintenance work to do? Any logs you can add so we can investigate what it was doing? |
There was no need to do repair work. It's a bug in the detection of when auto-repair should happen (as evidenced by the fact that disabling auto-repair made the load go away). |
@domsj what do you want to to with this bug? Can we fix it in Fargo? Is there a workaround (disable repair?) |
I'm not exactly sure yet where the bug is, so I can't immediately fix it. (Some more code inspection might bring something up though.) |
Please add a higher priority if this would happen again. |
It happened again on a @JeffreyDevloo env, not sure what he's doing wrong ;-) |
Problem descriptionThe maintenance process is hoarding the CPU for his own. I only saw it hoarding cpu on one node though this time.
SetupSteps that I executed
When I returned I found that the maintenance was spiking in cpu usage.
In the logs I found:
This and many more Exn while repairing osd XX Temporary solution
At first the CPU usage spiket back to 350% but after around 10 minutes, the maintenance process was only using 10%. |
It's still unclear why this happened. Added some more logging that should be available in the next version |
@domsj what needs to happen with this ticket? It is in status In Progress but no one is working on it/assigned to it? |
#708 fixes a case where maintenance starts spinning while trying to repair a bucket. |
Problem description
CPU usage spiking on all nodes within a cluster. (see first picture below)
![selection_003](https://cloud.githubusercontent.com/assets/17570109/17520043/8b6c9e5c-5e4e-11e6-8651-a7ded05285cd.png)
![selection_004](https://cloud.githubusercontent.com/assets/17570109/17520182/08f33a0c-5e4f-11e6-9b47-5547bfed5c73.png)
The CPU spike is coming from alba maintenance (see second picture below)
Possible root of the problem
Unknown
Possible solution
Unknown
Temporary solution
Disabling auto-repair: alba update-maintenance-config --config etcd://127.0.0.1:2379/ovs/arakoon/vm-backend-abm/config --disable-auto-repair
Additional information
Complete log file (gzip)
alba-maintenance_vm-backend-wJ4OUV0jLiZe4P9H.log.gz
Setup
Hyperconverged setup
Package information
The text was updated successfully, but these errors were encountered: