Add cleanup tasks to remove stale objects #3950
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Added periodically executed cleanup tasks for uploads and temporary files. Configure a time | ||
interval in ``UPLOAD_PROTECTION_TIME`` or ``TMPFILE_PROTECTION_TIME`` to activate. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -240,8 +240,10 @@ | |
|
||
WORKER_TTL = 30 | ||
|
||
-# how long to protect orphan content in minutes | ||
+# how long to protect ephemeral items in minutes | ||
especially this comment suggests that having |
||
ORPHAN_PROTECTION_TIME = 24 * 60 | ||
+UPLOAD_PROTECTION_TIME = 0 | ||
+TMPFILE_PROTECTION_TIME = 0 | ||
|
||
REMOTE_USER_ENVIRON_NAME = "REMOTE_USER" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,15 @@ | ||
import gc | ||
|
||
from django.conf import settings | ||
+from django.utils import timezone | ||
|
||
from pulpcore.app.models import ( | ||
     Artifact, | ||
     Content, | ||
     ProgressReport, | ||
     PublishedMetadata, | ||
+    PulpTemporaryFile, | ||
+    Upload, | ||
) | ||
|
||
|
||
|
@@ -41,56 +44,58 @@ def orphan_cleanup(content_pks=None, orphan_protection_time=settings.ORPHAN_PROT | |
content_pks (list): A list of content pks. If specified, only remove these orphans. | ||
|
||
""" | ||
-    progress_bar = ProgressReport( | ||
+    with ProgressReport( | ||
         message="Clean up orphan Content", | ||
-        total=0, | ||
+        total=None, | ||
         code="clean-up.content", | ||
-        done=0, | ||
-        state="running", | ||
-    ) | ||
|
||
-    while True: | ||
-        content = Content.objects.orphaned(orphan_protection_time, content_pks).exclude( | ||
-            pulp_type=PublishedMetadata.get_pulp_type() | ||
-        ) | ||
-        content_count = content.count() | ||
-        if not content_count: | ||
-            break | ||
|
||
-        progress_bar.total += content_count | ||
-        progress_bar.save() | ||
|
||
-        # delete the content | ||
-        for c in queryset_iterator(content): | ||
-            progress_bar.increase_by(c.count()) | ||
-            c.delete() | ||
|
||
-    progress_bar.state = "completed" | ||
-    progress_bar.save() | ||
+    ) as progress_bar: | ||
+        while True: | ||
+            content = Content.objects.orphaned(orphan_protection_time, content_pks).exclude( | ||
+                pulp_type=PublishedMetadata.get_pulp_type() | ||
+            ) | ||
+            content_count = content.count() | ||
+            if not content_count: | ||
+                break | ||
|
||
+            # delete the content | ||
+            for c in queryset_iterator(content): | ||
+                progress_bar.increase_by(c.count()) | ||
+                c.delete() | ||
|
||
# delete the artifacts that don't belong to any content | ||
artifacts = Artifact.objects.orphaned(orphan_protection_time) | ||
|
||
-    progress_bar = ProgressReport( | ||
+    with ProgressReport( | ||
         message="Clean up orphan Artifacts", | ||
         total=artifacts.count(), | ||
-        code="clean-up.content", | ||
-        done=0, | ||
-        state="running", | ||
-    ) | ||
-    progress_bar.save() | ||
|
||
-    counter = 0 | ||
-    interval = 100 | ||
-    for artifact in artifacts.iterator(): | ||
-        # we need to manually call delete() because it cleans up the file on the filesystem | ||
-        artifact.delete() | ||
-        progress_bar.done += 1 | ||
-        counter += 1 | ||
|
||
-        if counter >= interval: | ||
-            progress_bar.save() | ||
-            counter = 0 | ||
|
||
-    progress_bar.state = "completed" | ||
-    progress_bar.save() | ||
+        code="clean-up.artifacts", | ||
+    ) as progress_bar: | ||
+        for artifact in progress_bar.iter(artifacts.iterator()): | ||
+            # we need to manually call delete() because it cleans up the file on the filesystem | ||
+            artifact.delete() | ||
|
||
|
||
+def upload_cleanup(): | ||
+    assert settings.UPLOAD_PROTECTION_TIME > 0 | ||
+    expiration = timezone.now() - timezone.timedelta(minutes=settings.UPLOAD_PROTECTION_TIME) | ||
+    qs = Upload.objects.filter(pulp_created__lt=expiration) | ||
Having |
Not as such. Having it 0 will unschedule the task, but yes, there may be a window where things go south. |
It is now checked to be > 0. |
||
+    with ProgressReport( | ||
+        message="Clean up uploads", | ||
+        total=qs.count(), | ||
+        code="clean-up.uploads", | ||
+    ) as pr: | ||
+        for upload in pr.iter(qs): | ||
+            upload.delete() | ||
|
||
|
||
+def tmpfile_cleanup(): | ||
+    assert settings.TMPFILE_PROTECTION_TIME > 0 | ||
+    expiration = timezone.now() - timezone.timedelta(minutes=settings.TMPFILE_PROTECTION_TIME) | ||
+    qs = PulpTemporaryFile.objects.filter(pulp_created__lt=expiration) | ||
+    with ProgressReport( | ||
+        message="Clean up shared temporary files", | ||
+        total=qs.count(), | ||
+        code="clean-up.tmpfiles", | ||
+    ) as pr: | ||
+        for tmpfile in pr.iter(qs): | ||
+            tmpfile.delete() |
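For readers skimming the diff, the expiry rule shared by both new tasks can be sketched without Django: anything created before `now - protection_time` is eligible for removal. The `Item` class and the `expired` helper below are hypothetical stand-ins for illustration only, not pulpcore API; the actual tasks filter `Upload` and `PulpTemporaryFile` querysets on `pulp_created__lt`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Item:
    """Hypothetical stand-in for an Upload or PulpTemporaryFile row."""

    name: str
    pulp_created: datetime


def expired(items, protection_minutes, now=None):
    """Return the items created before the protection window started."""
    # Mirrors the assert in the tasks: a non-positive value is not a valid window.
    if protection_minutes <= 0:
        raise ValueError("protection time must be > 0 minutes")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=protection_minutes)
    # Roughly what .filter(pulp_created__lt=cutoff) selects in the real tasks.
    return [item for item in items if item.pulp_created < cutoff]


# Arbitrary fixed timestamp so the example is deterministic.
now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
items = [
    Item("old-upload", now - timedelta(hours=3)),
    Item("fresh-upload", now - timedelta(minutes=5)),
]
stale = expired(items, protection_minutes=60, now=now)
print([item.name for item in stale])
```

With a 60 minute window, only the item created three hours earlier falls before the cutoff; the five-minute-old item stays protected.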
Sorry for my late 2c, but this somewhat contradicts (or at least is confusing next to) our similar setting orphan_protection_time. When that one is set to 0, it means the orphans are not protected, not that the setting or the cleanup is disabled.
How can we make this more consistent? The naming of all three settings suggests they work in a similar way.
It's not a contradiction, but I can see it being confusing. And I must confess, the similarity is what inspired me to use similar names in the first place. The biggest difference is probably that we do not have a scheduled task for orphan cleanup (not that this would be impossible...).
Is there any way you think we could improve the documentation here?
I hear where you're coming from.
Unfortunately, I don't see what can be improved in the documentation. The documentation is well written as it stands, and as long as users read it, it is fine for settings with similar names to behave differently.
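To make the difference discussed in this thread concrete, here is a small sketch of how the value 0 is read in each case, as described above: for the orphan protection time it means orphans get no grace period, while for `UPLOAD_PROTECTION_TIME` and `TMPFILE_PROTECTION_TIME` it means the periodic task is not scheduled at all. Both function names are made up for illustration and do not exist in pulpcore.

```python
def orphan_is_protected(age_minutes, orphan_protection_time):
    """With 0, nothing is protected: every orphan is removable right away."""
    return age_minutes < orphan_protection_time


def cleanup_is_scheduled(protection_time):
    """With 0, the periodic upload/tmpfile cleanup task is not scheduled."""
    return protection_time > 0


# Same value, two different readings:
print(orphan_is_protected(age_minutes=5, orphan_protection_time=0))
print(cleanup_is_scheduled(protection_time=0))
print(cleanup_is_scheduled(protection_time=24 * 60))
```

In both cases a positive value is a protection window in minutes; only the meaning of 0 differs.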