Long-running space reclamation task #3404
Comments
Obviously we should attempt to reproduce this and do some profiling, but this part of the code stands out as a potential N+1 query situation: https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/reclaim_space.py#L50-L58
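For illustration, here is a minimal sketch of the N+1 shape versus a prefetch-based fix, written with Django ORM idioms. The model and relation names below (`Content`, `contentartifact_set`, `artifact`) are assumptions for the sketch, not a copy of the code at the linked lines.

```python
# Hypothetical sketch only -- the relation names are assumptions, not the
# actual reclaim_space.py code.
from pulpcore.app.models import Content  # assumed import path

def artifact_sizes_n_plus_one(content_pks):
    # 1 query for the content rows, then 1+ extra queries per row for its
    # related artifacts -- the classic N+1 shape.
    sizes = []
    for content in Content.objects.filter(pk__in=content_pks):
        for ca in content.contentartifact_set.all():  # extra query each iteration
            if ca.artifact is not None:  # on-demand FK access may also hit the DB
                sizes.append(ca.artifact.size)
    return sizes

def artifact_sizes_prefetched(content_pks):
    # Same result, but the related rows are pulled in a constant number of
    # queries up front via prefetch_related.
    qs = Content.objects.filter(pk__in=content_pks).prefetch_related(
        "contentartifact_set__artifact"
    )
    return [
        ca.artifact.size
        for content in qs
        for ca in content.contentartifact_set.all()  # served from the prefetch cache
        if ca.artifact is not None
    ]
```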
@ianballou Just for context, do you remember if this was on a system backed by an SSD or HDD?
@dralley this was a Katello VM running on an SSD.
Also at the time, it seemed like the task was completely locked into some Postgres query. I couldn't even cancel it.
Note: the cancellation issue is filed separately as #3407, and other users have hit it too; it's not a total one-off.
A quick update: after checking some user reports, we found that this was possibly triggered by a low-memory situation.
@ianballou @decko Which user reports led to the conclusion that it may be memory-related? (Partly I just want to make sure they get linked up, because I've only seen one or two that aren't easily visible from this issue, and I don't recall seeing anything there. Not doubting the conclusion.) Do the profiles of where the task is spending its time (regardless of whether it actually takes a long time) show anything interesting?
Oh, also there is a setting you can enable to plot the memory usage of tasks over time. I am not sure if this was just a general memory usage issue or one which was related to this specific task, but it can be useful in cases where you think a task might be problematic: https://github.com/pulp/pulpcore/blob/main/docs/configuration/settings.rst#task_diagnostics

Sidebar: maybe we could extend that to also log system memory consumption, and perhaps even swap, and plot them alongside the task memory consumption? That seems like a useful ability.
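For reference, a hedged sketch of what enabling that setting might look like, plus a rough illustration of the sidebar idea using psutil. The exact form TASK_DIAGNOSTICS takes (a boolean versus a list of diagnostic names) varies by pulpcore release, so check the linked settings.rst for your version.

```python
# In the deployment's settings file (e.g. /etc/pulp/settings.py); the value
# format here is an assumption -- some pulpcore releases expect a list of
# diagnostic names instead of a boolean.
TASK_DIAGNOSTICS = True

# Rough sketch of the sidebar idea: sample system-wide memory and swap with
# psutil so they could be plotted alongside the per-task memory figures.
import psutil

def sample_system_memory():
    vm = psutil.virtual_memory()
    swap = psutil.swap_memory()
    return {"ram_used_pct": vm.percent, "swap_used_pct": swap.percent}
```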
@dralley it occurred in a private discussion; the gist of it was that increasing RAM solved the problem. PM me for more details if you'd like.
Not so far. Also, I just changed a query to use a select_related statement to avoid an N+1 situation, but I didn't see any relevant change in the profiling.
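One way to check whether the select_related change actually removes the extra queries, independent of wall-clock profiling, is to count queries directly. A minimal sketch using Django's CaptureQueriesContext follows; the queryset being exercised is a stand-in, not the actual reclaim_space code.

```python
from django.db import connection
from django.test.utils import CaptureQueriesContext

def count_queries(run):
    # CaptureQueriesContext forces the debug cursor, so every executed SQL
    # statement is recorded even outside the test runner.
    with CaptureQueriesContext(connection) as ctx:
        run()
    return len(ctx.captured_queries)

# Usage sketch: run the relevant queryset with and without select_related over
# the same content set and compare counts; an N+1 fix should drop the number
# of queries from roughly one-per-row to a small constant.
# before = count_queries(lambda: list(queryset_without_select_related))
# after = count_queries(lambda: list(queryset_with_select_related))
```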
Version
Katello nightly (4.7)
Describe the bug
I noticed my reclaim space task was taking over 20 minutes in an environment with 63 repositories and 91,485 RPM content units (to give some perspective). PostgreSQL was being heavily taxed and taking 100% of one CPU core. I tried to cancel the task, but the cancellation was stuck, so I needed to restart Pulpcore to stop the space reclamation.
Here's the task output after it was canceled forcefully:
To Reproduce
Run space reclamation on an environment with a similar amount of repositories and content to what I posted above.
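For anyone reproducing this, a hedged example of kicking off reclaim space through the REST API; the base URL, credentials, and the "*" wildcard below are assumptions to adjust for your installation.

```python
import requests

BASE = "https://localhost/pulp/api/v3"  # assumed base URL for the installation

resp = requests.post(
    f"{BASE}/repositories/reclaim_space/",
    json={"repo_hrefs": ["*"]},  # "*" is intended to mean "all repositories"
    auth=("admin", "password"),  # placeholder credentials
    verify=False,
)
resp.raise_for_status()
print(resp.json())  # includes the spawned task href, useful for timing the run
```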
Expected behavior
After chatting with @dralley, it sounds like this may be slower performance than expected.