-
Notifications
You must be signed in to change notification settings - Fork 53
As a user, I can sync manifests schema version 2. #187
Conversation
cab2c55
to
584e1f2
Compare
should be merged together with pulp/crane#69 |
docs/user-guide/recipes.rst
Outdated
@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest | |||
sha256:4ecca..., we can point it to the second manifest with the following | |||
command:: | |||
|
|||
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d | |||
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think i will change so we would retrieve the schema_version on server side. it does not make sense to force user to provide schema version when he provided the digest which is the unique identifier for the manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I agree that will be better.
As a general comment, I'm not crazy about the convention of expressing schemas as "schemaX" with no space. The Docker documentation writes it out as "schema 1" or "schema version 1", etc. |
docs/tech-reference/distributor.rst
Outdated
* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker | ||
registry. | ||
* **url** *(string)* - the url for access to the repository's content. | ||
* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON calls this an "object", so we should use that terminology
docs/tech-reference/tags.rst
Outdated
name and repo_id must be unique together so that in any given repository a Tag | ||
name only references a single Manifest. Here is an example tag from MongoDB:: | ||
(the digest of the Manifest that the Tag references), schmea_version(the manifest | ||
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a space after "references"
docs/tech-reference/tags.rst
Outdated
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id | ||
must be unique together so that in any given repository a Tag name only references | ||
a single Manifest(schema1 and/or schema2 version). | ||
Here is an example tag from MongoDB:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think it's much better to show the REST API representation, since that is what users should be interacting with.
docs/user-guide/recipes.rst
Outdated
@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists | |||
however, it will be updated as tag name is unique per repository and can point | |||
to only one manifest. | |||
|
|||
.. note:: | |||
|
|||
Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/schem2/schema 2/
docs/user-guide/recipes.rst
Outdated
@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest | |||
sha256:4ecca..., we can point it to the second manifest with the following | |||
command:: | |||
|
|||
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d | |||
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I agree that will be better.
manifest = models.Manifest.from_json(manifest, digest, tag, upstream_name) | ||
self.parent.available_manifests.append(manifest) | ||
for layer in manifest.fs_layers: | ||
available_blobs.add(layer.blob_sum) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation got a little crazy here.
collection = get_collection('units_docker_tag') | ||
# drop old index due to unit_keys fields change | ||
index_info = collection.index_information() | ||
old_index = 'name_1_repo_id_1' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will pulp-manage-db automatically create the new index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in my understanding mongo will automatically create the new index, at least it happened in my case without doing anything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, i think the new index is created when a new unit is added into the collection after the upgrade, but in case we do just upgrade, without any new content sync we might end up removing old index and not having a new one until sync. i need to check this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i was wrong, the new index is created at this point, once all the migrations are done
https://github.com/pulp/pulp/blob/master/server/pulp/server/db/manage.py#L147
So i think we are safe here.
try: | ||
fs_layers = [FSLayer(blob_sum=layer['digest']) for layer in manifest['layers']] | ||
config_layer = manifest['config']['digest'] | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this specifically catch a KeyError
? Or at least some narrow list of exceptions?
headers, manifest = self._get_path(path) | ||
# set the headers for first request | ||
request_headers['Accept'] = schema2 | ||
headers, manifest = self._get_path(path, headers=request_headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are so many headers flying around, that I think this would be more readable if named something like response_headers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are making a request to the registry so in my understanding those should be request_headers, correct me if i am wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This highlights the confusion exactly. The first word on this line is headers
. Is that not response headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R447
i could rename headers
to request_headers
if that would make it better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are headers sent in the request to the docker registry API:
https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R456
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not self._get_path
return the response headers? Looking at the code, it returns report.headers
, which are from a nectar report. The nectar report documents that attribute as "response headers" here: https://github.com/pulp/nectar/blob/master/nectar/report.py#L28
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i think i got confused about which headers we are talking about, if it is a parameter from _get_path() then those are request headers, and the ones which _get_path returns are response_headers. I think now we are on the same page and i completely agree with your suggestion.
""" | ||
Retrieve a single path within the upstream registry, and return a 2-tuple of the headers and | ||
the response body. | ||
|
||
:param path: a full http path to retrieve that will be urljoin'd to the upstream registry | ||
url. | ||
:type path: basestring | ||
:param headers: headers sent in the request | ||
:type headers: basestring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are correct
9407da1
to
044d4de
Compare
@mhrivnak i think i addressed all the comments |
167cc39
to
27502af
Compare
@@ -14,6 +14,9 @@ | |||
from pulp_docker.plugins.distributors import configuration, v1_publish_steps | |||
|
|||
|
|||
tags_names_redirect = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The risk of having anything in global namespace is that it will persist even when the task is done.
I suggest in V2WebPublisher.__init__
, make the dict
, and pass it as an argument to each of the steps that needs to use it.
# [(S2, digest), (S1, digest)] | ||
# or | ||
# [(S1, digest)] | ||
return manifests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on this?
7624c48
to
87632b3
Compare
077aece
to
e988a96
Compare
f6da3e3
to
ac3ce2b
Compare
There were issues with rsync distributor, thanks to @dkliban he provided a functional patch and helped with testing |
docs/tech-reference/distributor.rst
Outdated
* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker | ||
registry. | ||
* **url** *(string)* - the url for access to the repository's content. | ||
* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little more detail in the description would be helpful. Something like this perhaps:
dictionary with two keys, 'schema1' and 'schema2'. The value of each is a list of tags for manifests of each respective schema type.
docs/tech-reference/tags.rst
Outdated
name and repo_id must be unique together so that in any given repository a Tag | ||
name only references a single Manifest. Here is an example tag from MongoDB:: | ||
(the digest of the Manifest that the Tag references), schmea_version(the manifest | ||
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/schmea_version(the manifest version the Tag referencesand/schema_version (the schema version for the manifest) and/
docs/tech-reference/tags.rst
Outdated
name only references a single Manifest. Here is an example tag from MongoDB:: | ||
(the digest of the Manifest that the Tag references), schmea_version(the manifest | ||
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id | ||
must be unique together so that in any given repository a Tag name only references |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Tag name/Tag/
docs/tech-reference/tags.rst
Outdated
(the digest of the Manifest that the Tag references), schmea_version(the manifest | ||
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id | ||
must be unique together so that in any given repository a Tag name only references | ||
a single Manifest(schema1 and/or schema2 version). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Manifest(schema1 and/or schema2 version)/Manifest that uses either schema version 1 or schema version 2/
docs/user-guide/recipes.rst
Outdated
@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists | |||
however, it will be updated as tag name is unique per repository and can point | |||
to only one manifest. | |||
|
|||
.. note:: | |||
|
|||
Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulp supports manifests that use schema version 1 and schema version 2. The schema version of the manifest needs to be specified when tagging a manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i changed this, so we handle that on server side, it does not make sense to force user to provide schema version if the provided the digest of the manifests( which is the unique identifier)
'repo-registry-id': registry, 'url': redirect_url, | ||
'protected': self.get_config().get('protected', False)} | ||
'protected': self.get_config().get('protected', False), | ||
'schema2_data': list(schema2_data)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like that we are only including schema2_data in here. I think it would be easier to debug pulp_docker/crane if this data was present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know we already discussed that, i do not have any strong preference here, but i think, less data we need to work with and parse better is. I would like to hear @mhrivnak opinion as well on this - do we want to write also schema1_data and make it available to Crane?
@@ -301,6 +302,10 @@ def _import_manifest(conduit, unit, dest_repo): | |||
for layer in unit.fs_layers: | |||
blob_digests.add(layer.blob_sum) | |||
|
|||
# in manifest schema version 2 there is an additional blob layer called config_layer | |||
if unit.config_layer: | |||
blob_digests.add(unit.config_layer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indention looks wrong here.
@@ -427,6 +432,9 @@ def _purge_unlinked_blobs(repo, manifest): | |||
# Find blob digests referenced by removed manifests (orphaned) | |||
orphaned = set() | |||
map((lambda layer: orphaned.add(layer.blob_sum)), manifest.fs_layers) | |||
# in manifest schema version 2 there is an additional blob layer called config_layer | |||
if manifest.config_layer: | |||
orphaned.add(manifest.config_layer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is over-indented.
|
||
def get_symlink_name(self): | ||
""" | ||
Provides the name that should be used when creating a symlink. | ||
:return: file name as it appears in a published repository | ||
:rtype: str | ||
""" | ||
return self.digest | ||
return '/'.join((str(self.schema_version), self.digest)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to make some additional changes to make the rsync distributor work. Here is the commit that fixed it for me: a5b81e2
|
||
Tag model has a new field `schema_version`. Now its unit key is extended to | ||
(repo_id, tag_name, schema_version) to enable the situation when there is manifest V2 | ||
schema version 1 and manifest V2 schema version 2 having same tag name within a repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove lines 11-16 ... This is technical stuff that users are not interested in reading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably would like mention that where have been done changes to Tag so if you want to do copy or remove of tags you need to bear in mind that within a repo there can be 2 tags with same name.
Steps for upgrade testing( was tested with newer docker client):
|
closes #2099
ok test |
docker repo tag
works