Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete associated files from the storage #844

Merged
merged 1 commit into from Aug 20, 2020

Conversation

lubosmj
Copy link
Member

@lubosmj lubosmj commented Aug 12, 2020

closes #7316

@pulpbot
Copy link
Member

pulpbot commented Aug 12, 2020

Attached issue: https://pulp.plan.io/issues/7316

@@ -299,6 +301,14 @@ def storage_path(self, name):

file = models.FileField(null=False, upload_to=storage_path, max_length=255)

@hook(BEFORE_DELETE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be after delete. I'm sorry. You will end in an inconsistent state in case of a rolled_back transaction.

what about:

def delete(self):
    with transaction.atomic() as ta:
        ta.on_commit(lambda: delete_from_storage(self.file.name))
        super().delete()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering django didn't come up with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that when the model is being removed because of ondelete.CASCADE, then the method model.delete() is not invoked: https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.CASCADE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like django-cleanup resolves all the mentioned issues.

@lubosmj lubosmj marked this pull request as draft August 12, 2020 17:16
@lubosmj lubosmj changed the title Catch the pre_delete signal and delete associated files from storage Delete associated files from the storage Aug 13, 2020
@lubosmj lubosmj force-pushed the handle-files-removal-7316 branch 3 times, most recently from 107c69b to acf7266 Compare August 13, 2020 10:50
@lubosmj lubosmj marked this pull request as ready for review August 13, 2020 10:51
@@ -4,6 +4,7 @@ aiofiles
backoff
Django~=2.2.14 # LTS version, switch only if we have a compelling reason to
django-currentuser~=0.5.1
django-cleanup~=5.0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with cascaded delete?
+1 for not reinventing it

Copy link
Member Author

@lubosmj lubosmj Aug 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works in both cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A drawback (or perhaps benefit) of django-cleanup is devs/plugin writers must specify a decorator if they do not want their FileField files deleted. I'm conflicted about this approach because while I don't like the implicit nature of it, I do find it surprising that django doesn't delete files when models get deleted.

CC @pulp/core for more visibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And given @lubosmj 's insight it will also work properly for cascaded deletes.

@@ -65,6 +65,8 @@
"rest_framework",
# pulp core app
"pulpcore.app",
# the cleanup config has to be placed last
'django_cleanup.apps.CleanupConfig',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black .

a temporary file. It also accepts `class:django.core.files.File`.
file (models.FileField): The stored file. This field accepts `class:django.core.files.File`.
The field can be set using an absolute path to a temporary file. Note that the content
of the temporary file will not be automatically copied to the storage backend.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it transferred then?
Maybe:
Note that the content of the temporary file will only copied to the storage backend once save() is called. It will be immutable afterwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not copied. It is not transferred at all.

In [1]: import tempfile                                                                                               

In [2]: with tempfile.NamedTemporaryFile("ab") as tf: 
   ...:     temp_file = PulpTemporaryFile(file=tf.name) 
   ...:     temp_file.save() 
   ...:                                                                                                               

In [3]: PulpTemporaryFile.objects.all()                                                                               
Out[3]: <QuerySet [<PulpTemporaryFile: pk=db5b57ba-1f65-4f0c-85b5-6d01c84b7d77>]>

In [4]: PulpTemporaryFile.objects.all()[0].file                                                                       
Out[4]: <FieldFile: /tmp/tmp7ttftw60>

In [5]: PulpTemporaryFile.objects.all()[0].file.read()
OSError: File does not exist: tmp/tmp7ttftw60

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Isn't this how it's supposed to work in the first place? Also should the file not be in the django-storage location after saving?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i get:
SuspiciousFileOperation: The joined path (/tmp/tmpxko75ex9) is located outside of the base path component (/var/lib/pulp)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, so this code does nothing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use ArtifactFileField then it should move it into place. I'll open an issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to have FileField instead: #793 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the example we provide does not even work?
How is this PulpTemporaryFile supposed to be used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this works if you pass in a File object (or rather a PulpTemporaryUploadedFile). What doesn't work is if you pass in a file path to a file on disk. We need to fix the latter.

@@ -0,0 +1 @@
Added the django-cleanup handlers for removing temporary files from the storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably make this a .removal or .feature. Or better yet, also add a .feature saying that files for FileFields will automatically be cleaned up using django-cleanup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

@daviddavis
Copy link
Contributor

Also, I think we can remove these methods with this change:

https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/upload.py#L49-L58

https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/content.py#L98-L107

@lubosmj lubosmj force-pushed the handle-files-removal-7316 branch 2 times, most recently from 03bf4f7 to 62433a1 Compare August 19, 2020 19:03
@daviddavis daviddavis merged commit f508d6b into pulp:master Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants