Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing PulpTemporaryFile #793

Merged
merged 1 commit into from
Jul 22, 2020
Merged

Introducing PulpTemporaryFile #793

merged 1 commit into from
Jul 22, 2020

Conversation

fao89
Copy link
Member

@fao89 fao89 commented Jul 17, 2020

https://pulp.plan.io/issues/6749
closes #6749

Please be sure you have read our documentation on creating PRs:
https://docs.pulpproject.org/contributing/pull-request-walkthrough.html

@pulpbot
Copy link
Member

pulpbot commented Jul 17, 2020

Attached issue: https://pulp.plan.io/issues/6749

@fao89 fao89 force-pushed the 6749 branch 2 times, most recently from b84c53f to d6de173 Compare July 18, 2020 04:35
pulpcore/app/models/content.py Outdated Show resolved Hide resolved
expected_digests = {"sha256": hashers["sha256"].hexdigest()}

if Artifact.objects.filter(**expected_digests).count():
raise ValidationError(_("Artifact already exists."))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that temporary files can be created even if there are Artifacts that already exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think the test makes sense but that we should probably leave it up to plugins to check whether an Artifact already exists before creating a PulpTemporaryFile. The thought being that some plugins might want to still create a PulpTemporaryFile in some cases even if the Artifact exists. But maybe this is not a valid use case.

cc @bmbouter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be addressed. @bmbouter any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I expect a new PulpTemporaryFile would still be created without error even there is a corresponding Artifact that already exists also.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that test should receive an error on the task later when it realizes the Artifact sha256 already exists. So it would monitor the task and see "oh the task failed w/ a new error message". I think it will fail with a duplicate key on the sha256 when the Artifact.save() is called.

CHANGES/6749.feature Outdated Show resolved Hide resolved
@daviddavis
Copy link
Contributor

Looks good. Don't forget to add some plugin writer docs.

self.file.delete(save=False)

@staticmethod
def init_and_validate(file, expected_digests=None, expected_size=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't imagine we would have this method at all. To me, plugin writers will save a file, use the file, delete the file. If a user case comes up later we could add something then. @fao89 and @daviddavis what do you both think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine but I think the import code is using this to check the checksum digest. We'll have to move that code into pulp_ansible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it because it validates the checksum, and I like to raise failures the earliest possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that's understandable. The internals need to change then though because it shouldn't be using Artifact in any places.


return PulpTemporaryFile(file=file)

def to_artifact(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method? Maybe we do I'm not sure. I was thinking users would pass their file handle directly in as PulpTemporaryFile(file=my_file).save(). What would happen if we deleted this, and init_and_validate and then called it with PulpTemporaryFile(file=my_file).save()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems pretty handy to have a method that converts a PulpTemporaryFile into an Artifact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

init_and_validate is validating checkums and size, so it would fail at some point when turning PulpTemporaryFile into Artifact, I'm trying to raise the failures earlier.
to_artifact is just a convenient method for not forgetting to call delete when you move PulpTemporaryFile to Artifact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I didn't understand it; I thought it was part of the internals. I see how it's used in the pulp_ansible PR now, and in seeing the usage I agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose it could be a @classmethod on Artifact, a from_pulp_temporary_file(temp_file). Though the code is about equal parts Artifact and PulpTemporaryFile, so doesn't really matter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alikins I was thinking the same thing.

@fao89 am I reading correctly that the Artifact.init_and_validate now accepts a PulpTemporaryFile also? Should we just use that and remove this one? What do you think should be done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Artifact is not accepting PulpTemporaryFile yet, this is alikins suggestion - I can implement it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I thought it did from reading this docstring https://github.com/pulp/pulpcore/pull/793/files but I see now in the code it doesn't. +1 to having it take a PulpTemporaryFile also just to make the existing interface even more usable.

@bmbouter
Copy link
Member

Can a new docs section be added indicating to plugin writers that this is available for passing file data to one or more tasks with some examples? I think it would go in a new subsection here. I think that would be named Making Temporary Files Available to Tasks.

@fao89 fao89 force-pushed the 6749 branch 3 times, most recently from 72a3631 to 3935b7b Compare July 21, 2020 14:04
@mdellweg
Copy link
Member

Is this temporary file using the django storage framework (and i think it should, because there are usecases without any other shared storage.)?

@fao89 fao89 force-pushed the 6749 branch 3 times, most recently from 032702e to 4608b52 Compare July 21, 2020 19:56
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a typo, a question and some suggestions.
But generally i really like this!

docs/glossary.rst Outdated Show resolved Hide resolved

.. code-block:: python

# Example 1 - Saving a temporary file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we suggest that this is usually in the viewset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how can I express it, any suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe like

# Example:
# (in the view)
# variant 1 - Saving a temporary file:
...
# variant 2 - --"-- with additional validation:
...
# (in the task)
# Using the temporary file to create an artifact:
...

docs/plugins/plugin-writer/concepts/index.rst Outdated Show resolved Hide resolved
docs/plugins/plugin-writer/concepts/index.rst Outdated Show resolved Hide resolved
pulpcore/app/models/content.py Outdated Show resolved Hide resolved
pulpcore/app/models/content.py Outdated Show resolved Hide resolved
Comment on lines +323 to +326
while True:
chunk = f.read(1048576) # 1 megabyte
if not chunk:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while True:
chunk = f.read(1048576) # 1 megabyte
if not chunk:
break
for chunk in iter(lambda: f.read(1048576), b""):

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the lambda is less readable

file = fields.ArtifactFileField(null=False, upload_to=storage_path, max_length=255)

@staticmethod
def init_and_validate(file, expected_digests=None, expected_size=None, validate_artifact=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether this need to be it's own function, or if we can fold these optional validations in the __init__ method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my idea was to mimic Artifact method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i see is that if both of the expected_* parameters are omitted from the call this function is really the same as just instanciating a PulpTemporaryFile the old fashioned way.
Also if there is no expected digests, we are consuming reading the whole file for no additional benefit. Can you add a if expected_digests: there?

@fao89 fao89 force-pushed the 6749 branch 2 times, most recently from 71ebfb1 to a1dad08 Compare July 22, 2020 13:14
@@ -7,6 +7,10 @@ Glossary
A file. They usually belong to a :term:`content unit<Content>` but may be used
elsewhere (e.g. for PublishedArtifacts).

:class:`~pulpcore.app.models.PulpTemporaryFile`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this glossary is for end-user facing terms and I don't believe they would know about this. So my suggestion is to remove this from the glossary.

Copy link
Member

@bmbouter bmbouter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thank you @fao89 !

"""
return storage.get_temp_file_path(self.pulp_id)

file = fields.ArtifactFileField(null=False, upload_to=storage_path, max_length=255)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being ArtifactFileField instead of FileField will cause this extra functionality to run. Is that important or is FileField right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the docstring I guess we don't need ArtifactFileField

@bmbouter bmbouter merged commit cbbe573 into pulp:master Jul 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants