Long-running space reclamation task #3404

Closed
ianballou opened this issue Nov 15, 2022 · 10 comments · Fixed by #3557

@ianballou

Version

  "versions": [
    {
      "component": "core",
      "version": "3.21.0",
      "package": "pulpcore"
    },
    {
      "component": "container",
      "version": "2.14.2",
      "package": "pulp-container"
    },
    {
      "component": "rpm",
      "version": "3.18.5",
      "package": "pulp-rpm"
    },
    {
      "component": "python",
      "version": "3.7.2",
      "package": "pulp-python"
    },
    {
      "component": "ostree",
      "version": "2.0.0a6",
      "package": "pulp-ostree"
    },
    {
      "component": "file",
      "version": "1.11.1",
      "package": "pulp-file"
    },
    {
      "component": "deb",
      "version": "2.20.0",
      "package": "pulp_deb"
    },
    {
      "component": "certguard",
      "version": "1.5.5",
      "package": "pulp-certguard"
    },
    {
      "component": "ansible",
      "version": "0.15.0",
      "package": "pulp-ansible"
    }
  ],

Katello nightly (4.7)

Describe the bug

I noticed my reclaim space task was taking over 20 minutes in an environment with 63 repositories and 91485 rpm content units (to give some perspective). PostgreSQL was being heavily taxed, taking 100% of one CPU core. I tried to cancel the task, but the cancellation was stuck, so I had to restart Pulpcore to stop the space reclamation.

Here's the task output after it was canceled forcefully:

{
  "pulp_href": "/pulp/api/v3/tasks/bce46114-a5d9-445a-a898-217210bf1975/",
  "pulp_created": "2022-11-15T16:38:50.639518Z",
  "state": "failed",
  "name": "pulpcore.app.tasks.reclaim_space.reclaim_space",
  "logging_cid": "c658f06c-3b76-49f6-a514-b19dd3bfbe52",
  "started_at": "2022-11-15T16:38:50.688113Z",
  "finished_at": "2022-11-15T17:09:06.918179Z",
  "error": {
    "reason": "Worker has gone missing."
  },
  "worker": "/pulp/api/v3/workers/80173b0a-f731-4c7b-b3ec-ed993369044e/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [],
  "created_resources": [],
  "reserved_resources_record": [
    "shared:/pulp/api/v3/repositories/rpm/rpm/4ad1fb8e-ef06-42e6-a83a-00da97551dce/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/3224e11b-ec85-4e3d-8d7b-fd44dcfd184d/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/d0f49692-31dd-4709-9e52-27be83167a3f/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/bef78a95-9555-467b-9fe6-66650c081757/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/e5838919-ba35-4497-b8a0-98c10af8941b/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/7987e671-61e6-4d07-9c9b-ca7a07367d91/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/acd01e87-640a-4584-b52f-c999e937b55f/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/b01a1f40-c195-48c0-a05c-77b7748d6338/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/504b40fe-5d7f-456e-bc95-683878609791/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/8a1a3998-ff6c-460c-b26b-010ac57023a9/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/a1a44856-a028-4a2e-a539-aa73d3ef9ff3/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/1cde5855-eab1-4ac3-ac2f-f02a22541619/",
    "shared:/pulp/api/v3/repositories/deb/apt/509de38c-7ae7-4f7b-a37c-db8404488a51/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/cdd44804-8324-48ce-9e61-4ae6770d0427/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/dfe18547-f2bf-4c41-9b9e-32d6cb1e2f5e/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/d867837e-c35f-475d-9bb5-9c9bde465b19/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/a0bcd8d6-8e6d-4e05-83d1-8cbfbc28d8d9/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/b0169f69-55cc-4ce1-830c-f444152c6853/"
  ]
}

To Reproduce
Run space reclamation in an environment with a similar number of repositories and content units to what I posted above.
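
For anyone reproducing this outside of Katello, the reclaim task can also be kicked off directly against the pulpcore REST API. Below is a minimal sketch with the requests library, assuming the /pulp/api/v3/repositories/reclaim_space/ endpoint and its repo_hrefs payload; the host, credentials, and CA path are placeholders.

# Illustrative only: trigger space reclamation across all repositories.
import requests

PULP_URL = "https://pulp.example.com"  # placeholder host

response = requests.post(
    f"{PULP_URL}/pulp/api/v3/repositories/reclaim_space/",
    json={"repo_hrefs": ["*"]},  # "*" targets every repository
    auth=("admin", "password"),  # placeholder credentials
    verify="/etc/pki/tls/certs/ca.crt",  # placeholder CA bundle
)
response.raise_for_status()
print(response.json()["task"])  # href of the spawned reclaim_space task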

Expected behavior
After chatting with @dralley it sounds like this may be slower performance than expected.

@dralley
Contributor

dralley commented Nov 15, 2022

Obviously we should attempt to reproduce this and do some profiling, but this part of the query stands out as being a potential N+1 query situation

https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/reclaim_space.py#L50-L58
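
To make the suspected pattern concrete, here is an illustrative Django ORM sketch (the model and field names are used only for illustration, this is not the actual pulpcore code): accessing a foreign key inside the loop issues one extra query per row, whereas select_related pulls the relation in with a single JOIN.

# N+1: one query for the queryset, plus one extra query per row when the
# related object is accessed inside the loop.
for content_artifact in ContentArtifact.objects.filter(artifact__isnull=False):
    process(content_artifact.content)  # each access hits the database again

# Fixed: the related rows are fetched up front via a JOIN.
queryset = ContentArtifact.objects.filter(
    artifact__isnull=False
).select_related("content")
for content_artifact in queryset:
    process(content_artifact.content)  # no additional queries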

@decko decko self-assigned this Jan 11, 2023
@dralley
Contributor

dralley commented Jan 18, 2023

@ianballou Just for context, do you remember if this was on a system backed by SSD or HDD?

@ianballou
Author

@dralley this was a Katello VM running on an SSD.

@ianballou
Author

Also at the time, it seemed like the task was completely locked into some postgres query. I couldn't even cancel it.

@dralley
Contributor

dralley commented Jan 18, 2023

Note: the cancellation issue is filed separately as #3407, and other users have hit it too; it's not a total one-off.

@decko
Member

decko commented Jan 30, 2023

A quick update:
1 - We tested it locally by cloning the Fedora 37 repo (about 60-70GB) and then cloning it into 70 repos. After that, we called the reclaim_space_task. We tried this a couple of times (downloading over 1TB over the week) and the issue did not trigger.
2 - @ianballou started a Katello VM with approx. 35GB of Pulp repos. We monitored resource utilization and called the reclaim_space_task. Again, things ran smoothly.

After checking some user reports, we suspect this was possibly triggered by a low-memory situation.
Does it make sense to open a new issue, or continue with this one, to find what that low-memory threshold was? @ianballou @dralley

@dralley
Contributor

dralley commented Jan 30, 2023

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

Do the profiles of where the task is spending its time (regardless of whether it actually takes a long time) show anything interesting?

@dralley
Contributor

dralley commented Jan 30, 2023

Oh, also: there is a setting you can enable to plot the memory usage of tasks over time. I'm not sure if this was a general memory usage issue or one related to this specific task, but it can be useful in cases where you think a task might be problematic.

https://github.com/pulp/pulpcore/blob/main/docs/configuration/settings.rst#task_diagnostics

Sidebar: maybe we could extend that to also log system memory consumption (and perhaps even swap) and plot them alongside the task memory consumption? That seems like a useful ability.
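
For reference, turning it on is roughly the following in the Pulp Django settings file (e.g. /etc/pulp/settings.py); treat this as a sketch and check the linked docs, since the exact form of the setting can vary between pulpcore versions.

# Illustrative: enable per-task diagnostics so memory usage is recorded
# over the lifetime of each task (exact value per the linked settings docs).
TASK_DIAGNOSTICS = True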

@ianballou
Author

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

@dralley it occurred in a private discussion; the gist was that increasing RAM solved the problem. PM me for more details if you'd like.

@decko
Member

decko commented Feb 6, 2023

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

Do the profiles of where the task is spending its time (regardless of whether it actually takes a long time) show anything interesting?

Not so far. Also, I just changed a query to use a select_related statement to avoid an N+1 situation, but I didn't see any relevant change in the profiling.

decko added commits to decko/pulpcore that referenced this issue Feb 8–13, 2023
ggainey pushed a commit that referenced this issue Feb 13, 2023