Long-running space reclamation task #3404

Closed
ianballou opened this issue Nov 15, 2022 · 10 comments · Fixed by #3557

@ianballou

Version

  "versions": [
    {
      "component": "core",
      "version": "3.21.0",
      "package": "pulpcore"
    },
    {
      "component": "container",
      "version": "2.14.2",
      "package": "pulp-container"
    },
    {
      "component": "rpm",
      "version": "3.18.5",
      "package": "pulp-rpm"
    },
    {
      "component": "python",
      "version": "3.7.2",
      "package": "pulp-python"
    },
    {
      "component": "ostree",
      "version": "2.0.0a6",
      "package": "pulp-ostree"
    },
    {
      "component": "file",
      "version": "1.11.1",
      "package": "pulp-file"
    },
    {
      "component": "deb",
      "version": "2.20.0",
      "package": "pulp_deb"
    },
    {
      "component": "certguard",
      "version": "1.5.5",
      "package": "pulp-certguard"
    },
    {
      "component": "ansible",
      "version": "0.15.0",
      "package": "pulp-ansible"
    }
  ],

Katello nightly (4.7)

Describe the bug

I noticed my reclaim space task was taking over 20 minutes in an environment with 63 repositories and 91485 rpm content units (to give some perspective). PostgreSQL was being heavily taxed, taking 100% of one CPU core. I tried to cancel the task, but the cancellation was stuck, so I had to restart Pulpcore to stop the space reclamation.

Here's the task output after it was canceled forcefully:

{
  "pulp_href": "/pulp/api/v3/tasks/bce46114-a5d9-445a-a898-217210bf1975/",
  "pulp_created": "2022-11-15T16:38:50.639518Z",
  "state": "failed",
  "name": "pulpcore.app.tasks.reclaim_space.reclaim_space",
  "logging_cid": "c658f06c-3b76-49f6-a514-b19dd3bfbe52",
  "started_at": "2022-11-15T16:38:50.688113Z",
  "finished_at": "2022-11-15T17:09:06.918179Z",
  "error": {
    "reason": "Worker has gone missing."
  },
  "worker": "/pulp/api/v3/workers/80173b0a-f731-4c7b-b3ec-ed993369044e/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [],
  "created_resources": [],
  "reserved_resources_record": [
    "shared:/pulp/api/v3/repositories/rpm/rpm/4ad1fb8e-ef06-42e6-a83a-00da97551dce/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/3224e11b-ec85-4e3d-8d7b-fd44dcfd184d/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/d0f49692-31dd-4709-9e52-27be83167a3f/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/bef78a95-9555-467b-9fe6-66650c081757/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/e5838919-ba35-4497-b8a0-98c10af8941b/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/7987e671-61e6-4d07-9c9b-ca7a07367d91/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/acd01e87-640a-4584-b52f-c999e937b55f/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/b01a1f40-c195-48c0-a05c-77b7748d6338/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/504b40fe-5d7f-456e-bc95-683878609791/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/8a1a3998-ff6c-460c-b26b-010ac57023a9/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/a1a44856-a028-4a2e-a539-aa73d3ef9ff3/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/1cde5855-eab1-4ac3-ac2f-f02a22541619/",
    "shared:/pulp/api/v3/repositories/deb/apt/509de38c-7ae7-4f7b-a37c-db8404488a51/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/cdd44804-8324-48ce-9e61-4ae6770d0427/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/dfe18547-f2bf-4c41-9b9e-32d6cb1e2f5e/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/d867837e-c35f-475d-9bb5-9c9bde465b19/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/a0bcd8d6-8e6d-4e05-83d1-8cbfbc28d8d9/",
    "shared:/pulp/api/v3/repositories/rpm/rpm/b0169f69-55cc-4ce1-830c-f444152c6853/"
  ]
}

To Reproduce
Run space reclamation in an environment with a similar number of repositories and content units to what I posted above.
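
For anyone reproducing this outside of Katello, the reclaim task can also be kicked off directly against the pulpcore REST API. Below is a minimal sketch with the requests library, assuming the /pulp/api/v3/repositories/reclaim_space/ endpoint and its repo_hrefs payload; the host, credentials, and CA path are placeholders.

# Illustrative only: trigger space reclamation across all repositories.
import requests

PULP_URL = "https://pulp.example.com"  # placeholder host

response = requests.post(
    f"{PULP_URL}/pulp/api/v3/repositories/reclaim_space/",
    json={"repo_hrefs": ["*"]},  # "*" targets every repository
    auth=("admin", "password"),  # placeholder credentials
    verify="/etc/pki/tls/certs/ca.crt",  # placeholder CA bundle
)
response.raise_for_status()
print(response.json()["task"])  # href of the spawned reclaim_space task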

Expected behavior
After chatting with @dralley it sounds like this may be slower performance than expected.

@dralley
Contributor

dralley commented Nov 15, 2022

Obviously we should attempt to reproduce this and do some profiling, but this part of the query stands out as being a potential N+1 query situation

https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/reclaim_space.py#L50-L58
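
To make the suspected pattern concrete, here is an illustrative Django ORM sketch (the model and field names are used only for illustration, this is not the actual pulpcore code): accessing a foreign key inside the loop issues one extra query per row, whereas select_related pulls the relation in with a single JOIN.

# N+1: one query for the queryset, plus one extra query per row when the
# related object is accessed inside the loop.
for content_artifact in ContentArtifact.objects.filter(artifact__isnull=False):
    process(content_artifact.content)  # each access hits the database again

# Fixed: the related rows are fetched up front via a JOIN.
queryset = ContentArtifact.objects.filter(
    artifact__isnull=False
).select_related("content")
for content_artifact in queryset:
    process(content_artifact.content)  # no additional queries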

@decko decko self-assigned this Jan 11, 2023
@dralley
Contributor

dralley commented Jan 18, 2023

@ianballou Just for context, do you remember if this was on a system backed by SSD or HDD?

@ianballou
Author

@dralley this was a Katello VM running on an SSD.

@ianballou
Author

Also at the time, it seemed like the task was completely locked into some postgres query. I couldn't even cancel it.

@dralley
Contributor

dralley commented Jan 18, 2023

Note: the cancellation issue is filed separately as #3407, and other users have hit it too; it's not a total one-off.

@decko
Member

decko commented Jan 30, 2023

A quick update:
1 - We tested it locally by cloning the Fedora 37 repo (about 60-70GB) and then cloning it into 70 repos. After that, we called the reclaim_space_task. We tried this a couple of times (downloading over 1TB over the week) and the issue did not trigger.
2 - @ianballou started a Katello VM with approx. 35GB of Pulp repos. We monitored resource utilization and called the reclaim_space_task. Again, things ran smoothly.

After checking some user reports, we suspect this was possibly triggered by a low-memory situation.
Does it make sense to open a new issue, or continue with this one, to find what that low-memory threshold was? @ianballou @dralley

@dralley
Contributor

dralley commented Jan 30, 2023

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

Do the profiles of where the task is spending its time (regardless of whether it actually takes a long time) show anything interesting?

@dralley
Contributor

dralley commented Jan 30, 2023

Oh, also: there is a setting you can enable to plot the memory usage of tasks over time. I'm not sure if this was a general memory usage issue or one related to this specific task, but it can be useful in cases where you think a task might be problematic.

https://github.com/pulp/pulpcore/blob/main/docs/configuration/settings.rst#task_diagnostics

Sidebar: maybe we could extend that to also log system memory consumption (and perhaps even swap) and plot them alongside the task memory consumption? That seems like a useful ability.
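
For reference, turning it on is roughly the following in the Pulp Django settings file (e.g. /etc/pulp/settings.py); treat this as a sketch and check the linked docs, since the exact form of the setting can vary between pulpcore versions.

# Illustrative: enable per-task diagnostics so memory usage is recorded
# over the lifetime of each task (exact value per the linked settings docs).
TASK_DIAGNOSTICS = True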

@ianballou
Author

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

@dralley it occurred in a private discussion; the gist was that increasing RAM solved the problem. PM me for more details if you'd like.

@decko
Member

decko commented Feb 6, 2023

@ianballou @decko Which user reports led to the conclusion it may be memory related? (Partly I just want to make sure they get linked up because I've only seen one or two that aren't easily visible from this issue and I don't recall seeing anything there. Not doubting the conclusion.)

Do the profiles of where the task is spending its time (regardless of whether it actually takes a long time) show anything interesting?

Not so far. Also, I just changed a query to use a select_related statement to avoid an N+1 situation, but I didn't see any relevant change in the profiling.

decko added commits to decko/pulpcore that referenced this issue Feb 8–13, 2023
ggainey pushed a commit that referenced this issue Feb 13, 2023