Problem: Artifacts can't be created via REST API #3080
Conversation
```python
"""
def storage_path(self, name):
    digest = self.sha256
    return "%s/artifacts/%s/%s" % (settings.MEDIA_ROOT, digest[0:2], digest[2:])
```
I suggest `os.path.join()` would be better.
Also, directories are implicitly plural, so `artifact` would be better and more consistent with Linux directory naming conventions.
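A minimal sketch of that suggestion, with `MEDIA_ROOT` hard-coded as a stand-in for `settings.MEDIA_ROOT` and the singular `artifact` directory name applied:

```python
import os

# Sketch only: build the storage path with os.path.join instead of string
# interpolation. MEDIA_ROOT is a stand-in for settings.MEDIA_ROOT; the
# first two hex characters of the digest become a subdirectory, as in the
# diff above.
MEDIA_ROOT = "/var/lib/pulp"

def storage_path(sha256):
    return os.path.join(MEDIA_ROOT, "artifact", sha256[0:2], sha256[2:])
```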
Also, as we discussed, the storage path should be calculated by something coupled to the actual storage mechanism. The current implementation makes an attempt at this by having ArtifactLocation defined in the storage module but falls a little short of the goal. I suggest adding a method to the FileStorage that can be used to get a deconstructable object that can calculate all paths appropriately for that storage mechanism. Then in Artifact.storage_path() instantiate DefaultStorage, get the path calculator and use it to calculate and return the storage path. We can do the same for the certificate storage.
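A rough, hypothetical sketch of that idea: the storage object hands out a path calculator, so models never hard-code the layout. The names (`FileStorage`, `get_path_calculator`, `ArtifactPathCalculator`) are illustrative, not pulp's actual API.

```python
import os

class ArtifactPathCalculator:
    """Knows how one particular storage backend lays out artifact files."""
    def __init__(self, root):
        self.root = root

    def artifact_path(self, sha256):
        # Same two-character sharding scheme as the diff above.
        return os.path.join(self.root, "artifact", sha256[:2], sha256[2:])

class FileStorage:
    def get_path_calculator(self):
        # The storage mechanism owns the path logic; callers just ask for it.
        return ArtifactPathCalculator("/var/lib/pulp")
```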
Good points. I was generally thinking that this type of logic is a good fit for being handled within the domain of the FileField object, and I think that lines up with your specific suggestions.
```python
due to validation error. This theoretically should never occur since validation is
performed before the task is dispatched.
:param data:
:return:
```
`:param data:` and `:return:` are probably left over from cut-and-paste.
```python
Args:
    data (dict): the id of the importer
```
Extra blank lines.
platform/pulpcore/app/upload.py (outdated):

```python
self.hashers = {}
for hasher in hashlib.algorithms_guaranteed:
    self.hashers[hasher] = getattr(hashlib, hasher)()
super(PulpTemporaryUploadedFile, self).__init__(name, content_type, size, charset, content_type_extra)
```
Line length.
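The hashing bookkeeping quoted above can be sketched without the Django base class. `hashlib.algorithms_guaranteed` is real; the class itself is a simplified stand-in for the upload file object (the long `super()` call flagged for line length would simply be wrapped across two lines):

```python
import hashlib

# Minimal sketch of the hasher setup in the diff above. The real class
# subclasses Django's TemporaryUploadedFile; the base class is omitted here
# so the hashing logic stands alone.
class HashingFile:
    def __init__(self, name):
        self.name = name
        # One hasher per algorithm that hashlib guarantees on every platform.
        self.hashers = {algo: getattr(hashlib, algo)()
                        for algo in hashlib.algorithms_guaranteed}

    def receive_data_chunk(self, raw_data):
        # Feed each incoming chunk to every hasher as the upload streams in.
        for hasher in self.hashers.values():
            hasher.update(raw_data)
```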
platform/pulpcore/app/upload.py (outdated):

```python
Create the file object to append to as data is coming in.
"""
super(PulpFileUploadHandler, self).new_file(file_name, *args, **kwargs)
self.file = PulpTemporaryUploadedFile(self.file_name, self.content_type, 0, self.charset, self.content_type_extra)
```
Line length.
```python
file = serializers.FileField(
    help_text=_("The stored file."),
    read_only=True,
    required=True
```
What does this actually mean for a FileField? That bits must be present? Note that for on-demand content, we will need an Artifact record so that we can publish it and also have Catalog entries, but the file may not yet be present.
Good point. I am making the file not required.
```python
Validate file by size and all checksums which values are provided.

Args:
    data (dict): the id of the importer
```
copy-and-paste gone wrong?
platform/pulpcore/app/upload.py (outdated):

```python
super(PulpTemporaryUploadedFile, self).__init__(name, content_type, size, charset, content_type_extra)


class PulpFileUploadHandler(TemporaryFileUploadHandler):
```
To be more descriptive, consider something like HashingFileUploadHandler as the name.
That's a better name for sure. Thanks!
platform/pulpcore/app/upload.py (outdated):

```python
"""
Create the file object to append to as data is coming in.
"""
super(PulpFileUploadHandler, self).new_file(file_name, *args, **kwargs)
```
In an act of mercy, Python 3 no longer requires us to pass stuff into super(). 😄
Hello @dkliban! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 25, 2017 at 13:16 UTC
Generally LGTM so far. Suggest removing:
Generally LGTM.
There is one interesting effect of this modeling change. If a type of content depends on knowing a particular checksum or the size of any of its artifacts at publish time, that information will need to be duplicated on the Content model subclass. For example, consider a FileContent repo set to on-demand.
Publish requires knowledge of the artifact's size and sha256 checksum. In an on-demand scenario where the file hasn't been downloaded, and thus there is no Artifact, that info would need to be stored on the FileContent model directly so it's available at publish time.
Maybe that's a good thing? Maybe we should declare that publication must be possible based solely on attributes of the Content subclass? That's definitely an important constraint that we need to think through (maybe you already did and I'm catching up) and be explicit about.
Thoughts?
```python
content = models.ForeignKey(Content, related_name='artifacts', on_delete=models.CASCADE)
file = models.FileField(db_index=True, upload_to=storage_path, max_length=255)
size = models.IntegerField(blank=True, null=False)
```
If the file itself is required, why not require this as well?
platform/pulpcore/app/upload.py (outdated):

```python
"""
Create the file object to append to as data is coming in.
"""
super().new_file(file_name, *args, **kwargs)
```
This will create a file whose reference I think will be overwritten by the next line. I also think we don't need or want that file at all. Is there anything that calling the superclass new_file() gets us that we need? Or could that be left out?
We get this: https://docs.djangoproject.com/en/1.11/_modules/django/core/files/uploadhandler/#FileUploadHandler.new_file
So, nothing we can't do here ourselves.
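For reference, the base `new_file()` (per the Django docs linked above) only records the incoming upload's metadata on the handler, which an override could easily do itself if the superclass call were dropped. A rough sketch of that bookkeeping:

```python
# Sketch of what Django's FileUploadHandler.new_file() provides: it just
# stores the new file's metadata on the handler instance. The class name is
# illustrative; only the attribute assignments mirror the Django base class.
class UploadHandlerSketch:
    def new_file(self, field_name, file_name, content_type, content_length,
                 charset=None, content_type_extra=None):
        self.field_name = field_name
        self.file_name = file_name
        self.content_type = content_type
        self.content_length = content_length
        self.charset = charset
        self.content_type_extra = content_type_extra
```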
It seems reasonable to expect that the content (unit) has all metadata needed to publish. Both
Although, after thinking through this further: storing artifact-related information (fields) on the Content (unit), such as size and digest, seems wrong. Instead, I think we should explore keeping the artifact as required on

I suggest the digest fields are the natural key. An artifact could be located by any one of its strong digest fields. Artifacts created with upload (or sync with immediate download policy) will always have at least one digest, because the file exists and they can all be calculated. No problem there. Artifacts created during sync (when downloading is deferred) of content types supported today would probably all have a digest, because a digest is included in the metadata supplied by the remote repository. So, what about edge cases where downloading is deferred and the remote repository does not provide a digest? Perhaps in that case, Artifacts are created without any digest fields populated. Such an artifact cannot be fetched using a natural key, and duplicates could be created. Maybe that is not so bad. As files are downloaded and digests calculated and set, pulp could perform lazy de-duplication. Anyway, I think we should discuss this further.
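A hypothetical sketch of that natural-key lookup: pick whichever strong digest is known, preferring the strongest. The preference order and helper name are illustrative, not pulp's actual API.

```python
# Illustrative only: build a one-field lookup from the strongest known
# digest. Returning None models the deferred-download edge case discussed
# above, where no digest is known and de-duplication has to wait.
STRONG_DIGESTS = ("sha512", "sha384", "sha256")

def natural_key_filter(digests):
    """Return a single-field lookup dict, or None if no strong digest is known."""
    for field in STRONG_DIGESTS:
        if digests.get(field):
            return {field: digests[field]}
    return None
```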
Artifacts are provided by the core as a way for plugins to store files in Pulp. If a plugin doesn't have a file to store, it doesn't need to create an Artifact. If a plugin needs to be able to publish content for which there are no artifacts, it is the plugin's responsibility to store all information required for publishing.
```python
relative_path (models.TextField): The artifact's path relative to the associated
    :class:`Content`. This path is incorporated in the absolute storage path of the file
    and its published path relative to the root publishing directory. At a minimum the path
    will contain the file name but may also include sub-directories.
size (models.IntegerField): The size of the file in bytes.
md5 (models.CharField): The MD5 checksum of the file.
sha1 (models.CharField): The SHA-1 checksum of the file.
sha224 (models.CharField): The SHA-224 checksum of the file.
sha256 (models.CharField): The SHA-256 checksum of the file.
sha384 (models.CharField): The SHA-384 checksum of the file.
sha512 (models.CharField): The SHA-512 checksum of the file.
```
The digest fields need indexes with sha256 and stronger being unique.
I'm assuming that (eventually) Artifact and RemoteArtifact will have a common abstract base model, with all three models defined here, correct? If so, let's refactor that here now, since you know you'll be adding RemoteArtifact eventually. Later, get rid of catalog.py.
We may have different validation rules, and I don't know if that might prevent us from having an ABC. For example, on an Artifact all fields will be required.
Adding another LGTM, but leaving approval to someone who knows more about serializers and viewsets.
```python
class ArtifactViewSet(NamedModelViewSet):
    endpoint_name = 'artifacts'
    lookup_field = 'pk'
```
This lookup_field value is a default for NamedModelViewset, so it's not needed.
The docstring for NamedModelViewSet should be fixed though, since it still mentions the old, no longer valid, default value. My bad.
I didn't re-review, but I'm happy with the changes I previously requested. If there's anything specific you'd like me to look at, let me know, but otherwise I'm happy to leave it to the others.
Solution: add a viewset for Artifacts
The REST API user can now POST to the /api/v3/artifacts endpoint with a file,
size, and any number of checksums. During the upload of the file, each
possible checksum is calculated. If the user provides the size or any checksums,
they are verified after the upload finishes. If one of the values does not
match, a ValidationError is raised.
The user can also list all the Artifacts by performing a GET request
to /api/v3/artifacts. An Artifact can be deleted by making a DELETE
request to /api/v3/artifacts/<pk>.
The user cannot update Artifacts via REST API after one is created.
The 'django.core.files.uploadhandler.MemoryFileUploadHandler' is disabled so
all uploads are handled by 'pulpcore.app.upload.PulpFileUploadHandler' and
are written to '/var/lib/pulp/tmp' as the file is being uploaded.
closes #2843
https://pulp.plan.io/issues/2843
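The post-upload verification the commit message describes might look roughly like this. `ValidationError` here is a local stand-in for the REST framework exception, and `validate_upload` with its arguments is illustrative, not the actual pulp code:

```python
import hashlib

class ValidationError(Exception):
    """Local stand-in for rest_framework.serializers.ValidationError."""

def validate_upload(provided, computed_digests, computed_size):
    # Compare any user-supplied size or checksum against what was computed
    # while the file streamed in; any mismatch rejects the upload.
    if "size" in provided and provided["size"] != computed_size:
        raise ValidationError("The size did not match the uploaded file.")
    for algo, expected in provided.items():
        if algo in computed_digests and expected != computed_digests[algo]:
            raise ValidationError("The %s checksum did not match." % algo)

# A matching size and sha256 pass silently.
content = b"hello"
digests = {"sha256": hashlib.sha256(content).hexdigest()}
validate_upload({"sha256": digests["sha256"], "size": 5}, digests, len(content))
```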