Introducing PulpTemporaryFile #793
Attached issue: https://pulp.plan.io/issues/6749
b84c53f to d6de173
pulpcore/app/models/content.py
Outdated
expected_digests = {"sha256": hashers["sha256"].hexdigest()}

if Artifact.objects.filter(**expected_digests).count():
    raise ValidationError(_("Artifact already exists."))
I think that temporary files can be created even if there are Artifacts that already exist.
I did this because of this test:
https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/tests/functional/api/collection/v2/test_upload.py#L51-L52
I can update the test
No, I think the test makes sense but that we should probably leave it up to plugins to check whether an Artifact already exists before creating a PulpTemporaryFile. The thought being that some plugins might want to still create a PulpTemporaryFile in some cases even if the Artifact exists. But maybe this is not a valid use case.
cc @bmbouter
This still needs to be addressed. @bmbouter any thoughts?
Yes, I expect a new `PulpTemporaryFile` would still be created without error even if a corresponding Artifact already exists.
so what should I do with this test?
https://github.com/pulp/pulp_ansible/blob/master/pulp_ansible/tests/functional/api/collection/v2/test_upload.py#L51-L52
I think that test should see an error on the task later, when the task realizes the Artifact sha256 already exists. So it would monitor the task and see "oh, the task failed with a new error message". I think it will fail with a duplicate key on the sha256 when Artifact.save() is called.
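To make the expected failure mode concrete, here is a minimal sketch of a duplicate-key error on a unique sha256 column. It does not use Pulp or Django: the `artifact` table and the `save_artifact` helper are hypothetical stand-ins, and an in-memory SQLite database is assumed only because it ships with Python.

```python
import hashlib
import sqlite3

# Hypothetical stand-in for the Artifact table: one unique sha256 column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact (sha256 TEXT UNIQUE NOT NULL)")


def save_artifact(conn, data):
    """Insert a row keyed by the sha256 of data (stand-in for Artifact.save())."""
    digest = hashlib.sha256(data).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO artifact (sha256) VALUES (?)", (digest,))
    return digest


save_artifact(conn, b"collection tarball")

try:
    # Same content, same sha256: the unique constraint rejects the second save.
    save_artifact(conn, b"collection tarball")
    duplicate_error = None
except sqlite3.IntegrityError as exc:
    duplicate_error = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0]
```

In the scenario discussed here, the equivalent error would surface inside the task, so a test would have to monitor the task and check its error rather than expect a failure at upload time.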
Looks good. Don't forget to add some plugin writer docs.
self.file.delete(save=False)

@staticmethod
def init_and_validate(file, expected_digests=None, expected_size=None):
I didn't imagine we would have this method at all. To me, plugin writers will save a file, use the file, delete the file. If a use case comes up later we could add something then. @fao89 and @daviddavis what do you both think?
That's fine but I think the import code is using this to check the checksum digest. We'll have to move that code into pulp_ansible.
I kept it because it validates the checksum, and I like to raise failures as early as possible.
OK, that's understandable. The internals need to change then, though, because it shouldn't be using `Artifact` anywhere.
pulpcore/app/models/content.py
Outdated
return PulpTemporaryFile(file=file)

def to_artifact(self):
Do we need this method? Maybe we do; I'm not sure. I was thinking users would pass their file handle directly in as `PulpTemporaryFile(file=my_file).save()`. What would happen if we deleted this and `init_and_validate`, and then called it with `PulpTemporaryFile(file=my_file).save()`?
It seems pretty handy to have a method that converts a PulpTemporaryFile into an Artifact.
`init_and_validate` validates checksums and size. Without it, a bad file would still fail at some point when turning the `PulpTemporaryFile` into an `Artifact`; I'm trying to raise the failures earlier.
`to_artifact` is just a convenience method so you don't forget to call `delete` when you move a `PulpTemporaryFile` to an `Artifact`.
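The convenience argument can be sketched with plain Python stand-ins. Nothing below is Pulp code: `FakeArtifact` and `FakeTemporaryFile` are hypothetical classes, and the ordering (save the artifact first, delete the temporary file afterwards) is an assumption made for illustration.

```python
class FakeArtifact:
    """Hypothetical stand-in for Artifact; it only records that it was saved."""

    def __init__(self, file):
        self.file = file
        self.saved = False

    def save(self):
        self.saved = True


class FakeTemporaryFile:
    """Hypothetical stand-in for PulpTemporaryFile."""

    def __init__(self, file):
        self.file = file
        self.deleted = False

    def delete(self):
        self.deleted = True

    def to_artifact(self):
        """Convert to an artifact and clean up in one step."""
        artifact = FakeArtifact(self.file)
        artifact.save()
        # Assumed ordering: delete the temporary file only after the save succeeded.
        self.delete()
        return artifact


temp_file = FakeTemporaryFile(file=b"payload")
artifact = temp_file.to_artifact()
```

With a single call the caller gets the artifact and the temporary object is already cleaned up, which is the "not forgetting to call delete" point made above.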
Yup I didn't understand it; I thought it was part of the internals. I see how it's used in the pulp_ansible PR now, and in seeing the usage I agree.
Suppose it could be a @classmethod on Artifact, a `from_pulp_temporary_file(temp_file)`. Though the code is about equal parts Artifact and PulpTemporaryFile, so it doesn't really matter.
`Artifact` is not accepting `PulpTemporaryFile` yet; this is alikins's suggestion. I can implement it.
Oh, I thought it did from reading this docstring https://github.com/pulp/pulpcore/pull/793/files but I see now in the code it doesn't. +1 to having it take a `PulpTemporaryFile` also, just to make the existing interface even more usable.
Can a new docs section be added indicating to plugin writers that this is available for passing file data to one or more tasks with some examples? I think it would go in a new subsection here. I think that would be named
72a3631 to 3935b7b
Is this temporary file using the Django storage framework? I think it should, because there are use cases without any other shared storage.
032702e to 4608b52
I left a typo, a question and some suggestions.
But generally I really like this!
.. code-block:: python

    # Example 1 - Saving a temporary file:
Should we suggest that this is usually done in the viewset?
I don't know how to express it. Any suggestions?
Maybe like
# Example:
# (in the view)
# variant 1 - Saving a temporary file:
...
# variant 2 - --"-- with additional validation:
...
# (in the task)
# Using the temporary file to create an artifact:
...
while True:
    chunk = f.read(1048576)  # 1 megabyte
    if not chunk:
        break
Suggested change:
- while True:
-     chunk = f.read(1048576)  # 1 megabyte
-     if not chunk:
-         break
+ for chunk in iter(lambda: f.read(1048576), b""):
I think the lambda is less readable
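For comparison, the two spellings of the chunked read can be put side by side. This is a minimal sketch rather than code from the PR: the names `digest_with_while` and `digest_with_iter` are hypothetical, and hashing with SHA-256 is assumed only to give the loop body something to do with each chunk.

```python
import hashlib
import io

CHUNK_SIZE = 1048576  # 1 MiB, the chunk size used in the docs example


def digest_with_while(f):
    """Hash a binary file object using the explicit while-loop form."""
    hasher = hashlib.sha256()
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()


def digest_with_iter(f):
    """Hash a binary file object using the two-argument iter() form."""
    hasher = hashlib.sha256()
    # iter(callable, sentinel) keeps calling f.read() until it returns b"".
    for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Slightly more than one chunk, so both loops iterate more than once.
data = b"x" * (CHUNK_SIZE + 10)
same = digest_with_while(io.BytesIO(data)) == digest_with_iter(io.BytesIO(data))
```

Both forms read the same chunks and produce the same digest, so the choice between them is a readability call for the docs example.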
pulpcore/app/models/content.py
Outdated
file = fields.ArtifactFileField(null=False, upload_to=storage_path, max_length=255)

@staticmethod
def init_and_validate(file, expected_digests=None, expected_size=None, validate_artifact=False):
I am wondering whether this needs to be its own function, or if we can fold these optional validations into the `__init__` method.
My idea was to mimic the `Artifact` method.
What I see is that if both of the `expected_*` parameters are omitted from the call, this function is really the same as just instantiating a `PulpTemporaryFile` the old-fashioned way.
Also, if there are no expected digests, we are reading the whole file for no additional benefit. Can you add an `if expected_digests:` there?
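The guard being requested can be sketched in isolation. This is a hypothetical illustration, not the PR's implementation: `validate_file` and `CountingReader` are made-up names, `ValueError` stands in for whatever exception the real code raises, and the size check via `seek`/`tell` is an assumption.

```python
import hashlib
import io

CHUNK_SIZE = 1048576


class CountingReader(io.BytesIO):
    """BytesIO that counts read() calls, to show when the content is consumed."""

    def __init__(self, data):
        super().__init__(data)
        self.reads = 0

    def read(self, size=-1):
        self.reads += 1
        return super().read(size)


def validate_file(file, expected_digests=None, expected_size=None):
    """Hypothetical validator: only reads the content when digests are expected."""
    if expected_size is not None:
        # The size check needs no read: seek to the end and ask for the position.
        file.seek(0, io.SEEK_END)
        actual_size = file.tell()
        file.seek(0)
        if actual_size != expected_size:
            raise ValueError("size mismatch")

    if expected_digests:  # the requested guard: skip hashing when nothing to compare
        hashers = {name: hashlib.new(name) for name in expected_digests}
        while True:
            chunk = file.read(CHUNK_SIZE)
            if not chunk:
                break
            for hasher in hashers.values():
                hasher.update(chunk)
        file.seek(0)
        for name, expected in expected_digests.items():
            if hashers[name].hexdigest() != expected:
                raise ValueError(f"{name} digest mismatch")
    return file


unchecked = CountingReader(b"abc")
validate_file(unchecked)  # no expectations: the content is never read

checked = CountingReader(b"abc")
validate_file(checked, expected_digests={"sha256": hashlib.sha256(b"abc").hexdigest()})
```

When both `expected_*` arguments are omitted the function does no work at all, which matches the observation that the call is then equivalent to plain instantiation.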
71ebfb1 to a1dad08
docs/glossary.rst
Outdated
@@ -7,6 +7,10 @@ Glossary
A file. They usually belong to a :term:`content unit<Content>` but may be used
elsewhere (e.g. for PublishedArtifacts).

:class:`~pulpcore.app.models.PulpTemporaryFile`
I actually think this glossary is for end-user facing terms and I don't believe they would know about this. So my suggestion is to remove this from the glossary.
This looks good to me. Thank you @fao89 !
pulpcore/app/models/content.py
Outdated
"""
return storage.get_temp_file_path(self.pulp_id)

file = fields.ArtifactFileField(null=False, upload_to=storage_path, max_length=255)
This being `ArtifactFileField` instead of `FileField` will cause this extra functionality to run. Is that important or is `FileField` right?
From the docstring I guess we don't need ArtifactFileField
https://pulp.plan.io/issues/6749
closes #6749