As a user, I can sync manifests schema version 2. #187

ipanova · 2017-01-20T13:06:41Z

ipanova · 2017-02-09T10:56:32Z

should be merged together with pulp/crane#69

ipanova · 2017-02-09T14:11:38Z

docs/user-guide/recipes.rst

@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest
 sha256:4ecca..., we can point it to the second manifest with the following
 command::

-    $ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d
+    $ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1


I think I will change this so that we retrieve the schema_version on the server side. It does not make sense to force the user to provide the schema version when they have already provided the digest, which is the unique identifier for the manifest.

Great point. I agree that will be better.

mhrivnak · 2017-02-09T14:24:33Z

As a general comment, I'm not crazy about the convention of expressing schemas as "schemaX" with no space. The Docker documentation writes it out as "schema 1" or "schema version 1", etc.

ipanova · 2017-02-09T16:01:06Z

https://etherpad.net/p/schema2-testing

mhrivnak · 2017-02-09T14:15:46Z

docs/tech-reference/distributor.rst

+* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker
+  registry.
+* **url** *(string)* - the url for access to the repository's content.
+* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests.


JSON calls this an "object", so we should use that terminology

mhrivnak · 2017-02-09T14:16:53Z

docs/tech-reference/tags.rst

-name and repo_id must be unique together so that in any given repository a Tag
-name only references a single Manifest. Here is an example tag from MongoDB::
+(the digest of the Manifest that the Tag references), schmea_version(the manifest
+version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id


Missing a space after "references"

mhrivnak · 2017-02-09T14:19:16Z

docs/tech-reference/tags.rst

+version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
+must be unique together so that in any given repository a Tag name only references
+a single Manifest(schema1 and/or schema2 version).
+Here is an example tag from MongoDB::


Generally I think it's much better to show the REST API representation, since that is what users should be interacting with.

mhrivnak · 2017-02-09T14:20:27Z

docs/user-guide/recipes.rst

@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists
 however, it will be updated as tag name is unique per repository and can point
 to only one manifest.

+.. note::
+
+    Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest,


s/schem2/schema 2/

mhrivnak · 2017-02-09T14:21:24Z

docs/user-guide/recipes.rst

@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest
 sha256:4ecca..., we can point it to the second manifest with the following
 command::

-    $ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d
+    $ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1


Great point. I agree that will be better.

mhrivnak · 2017-02-09T15:56:46Z

plugins/pulp_docker/plugins/importers/sync.py

+        manifest = models.Manifest.from_json(manifest, digest, tag, upstream_name)
+        self.parent.available_manifests.append(manifest)
+        for layer in manifest.fs_layers:
+                    available_blobs.add(layer.blob_sum)


Indentation got a little crazy here.

mhrivnak · 2017-02-09T15:59:44Z

plugins/pulp_docker/plugins/migrations/0003_tag_schema_change.py

+    collection = get_collection('units_docker_tag')
+    # drop old index due to unit_keys fields change
+    index_info = collection.index_information()
+    old_index = 'name_1_repo_id_1'


Will pulp-manage-db automatically create the new index?

In my understanding, MongoDB will create the new index automatically. At least, that is what happened in my case without my doing anything.

Actually, I think the new index is created when a new unit is added to the collection after the upgrade. If we only upgrade, without syncing any new content, we might end up removing the old index and not having a new one until the next sync. I need to check this.

No, I was wrong: the new index is created at this point, once all the migrations are done:
https://github.com/pulp/pulp/blob/master/server/pulp/server/db/manage.py#L147

So I think we are safe here.
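
For reference, the "drop the old index only if it is still there" part of the migration could be sketched as below. This is a minimal sketch, not the migration itself: `drop_index_if_present` is a hypothetical helper name, and `collection` is assumed to behave like a pymongo `Collection` (which provides `index_information()` and `drop_index()`).

```python
def drop_index_if_present(collection, index_name):
    """Drop index_name from collection if it exists; return True if dropped.

    `collection` is assumed to behave like a pymongo Collection.
    Hypothetical helper for illustration; not part of pulp_docker.
    """
    # index_information() returns a dict keyed by index name.
    if index_name in collection.index_information():
        collection.drop_index(index_name)
        return True
    return False
```

Checking first keeps the migration safe to run on a database where the old index was never created or has already been removed.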

mhrivnak · 2017-02-09T16:22:59Z

plugins/pulp_docker/plugins/models.py

+        try:
+            fs_layers = [FSLayer(blob_sum=layer['digest']) for layer in manifest['layers']]
+            config_layer = manifest['config']['digest']
+        except:


Can this specifically catch a KeyError? Or at least some narrow list of exceptions?
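
A minimal sketch of what a narrower handler could look like, assuming the manifest has already been parsed into a dict. `parse_schema2_layers` is a hypothetical helper name, and raising `ValueError` is an illustrative choice rather than what the plugin does; only the `layers`, `config`, and `digest` keys come from the diff.

```python
def parse_schema2_layers(manifest):
    """Return (layer_digests, config_digest) for a schema version 2 manifest dict.

    Hypothetical helper for illustration; not part of pulp_docker.
    """
    try:
        layer_digests = [layer['digest'] for layer in manifest['layers']]
        config_digest = manifest['config']['digest']
    except (KeyError, TypeError) as e:
        # KeyError: an expected key is missing.
        # TypeError: a value is not the expected container type (e.g. None).
        raise ValueError('not a valid schema version 2 manifest: %r' % (e,))
    return layer_digests, config_digest
```

Listing the exceptions explicitly means unrelated failures (a typo in an attribute name, a KeyboardInterrupt) are no longer swallowed by the handler.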

mhrivnak · 2017-02-09T16:47:35Z

plugins/pulp_docker/plugins/registry.py

-        headers, manifest = self._get_path(path)
+        # set the headers for first request
+        request_headers['Accept'] = schema2
+        headers, manifest = self._get_path(path, headers=request_headers)


There are so many headers flying around, that I think this would be more readable if named something like response_headers.

We are making a request to the registry, so in my understanding those should be request_headers. Correct me if I am wrong.

This highlights the confusion exactly. The first word on this line is headers. Is that not response headers?

https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R447
I could rename headers to request_headers if that would make it better.

Those are headers sent in the request to the docker registry API:
https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R456

Does not self._get_path return the response headers? Looking at the code, it returns report.headers, which are from a nectar report. The nectar report documents that attribute as "response headers" here: https://github.com/pulp/nectar/blob/master/nectar/report.py#L28

OK, I think I got confused about which headers we were talking about. The ones passed as a parameter to _get_path() are request headers, and the ones that _get_path() returns are response headers. I think we are now on the same page, and I completely agree with your suggestion.
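
To make the outcome of this thread concrete, here is a minimal sketch of the naming, not the plugin's code. `download` is a hypothetical callable standing in for the real downloader, and `get_path` and `get_manifest` are illustrative free functions rather than the methods in registry.py. The `Accept` value is the Docker media type for schema version 2 manifests.

```python
SCHEMA2_MEDIA_TYPE = 'application/vnd.docker.distribution.manifest.v2+json'


def get_path(download, path, request_headers=None):
    """Fetch path and return a 2-tuple of (response_headers, body).

    `download` is a hypothetical callable taking (path, headers) and
    returning (headers, body); it stands in for the real downloader.
    """
    # Copy so the caller's dict is never mutated.
    request_headers = dict(request_headers or {})
    response_headers, body = download(path, request_headers)
    return response_headers, body


def get_manifest(download, name, reference):
    path = '/v2/%s/manifests/%s' % (name, reference)
    # Headers we send: ask the registry for a schema version 2 manifest first.
    request_headers = {'Accept': SCHEMA2_MEDIA_TYPE}
    # Headers we receive: named separately so the two are never confused.
    response_headers, manifest = get_path(download, path, request_headers)
    return response_headers, manifest
```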

mhrivnak · 2017-02-09T16:50:40Z

plugins/pulp_docker/plugins/registry.py

        """
        Retrieve a single path within the upstream registry, and return a 2-tuple of the headers and
        the response body.

        :param path: a full http path to retrieve that will be urljoin'd to the upstream registry
                     url.
        :type  path: basestring
+        :param headers: headers sent in the request
+        :type headers:  basestring


Isn't this a dict?

You are correct.

ipanova · 2017-02-10T14:05:36Z

@mhrivnak I think I have addressed all the comments.

mhrivnak · 2017-02-13T19:19:18Z

plugins/pulp_docker/plugins/distributors/publish_steps.py

@@ -14,6 +14,9 @@
 from pulp_docker.plugins.distributors import configuration, v1_publish_steps


+tags_names_redirect = {}


I see. The risk of having anything in global namespace is that it will persist even when the task is done.

I suggest in V2WebPublisher.__init__, make the dict, and pass it as an argument to each of the steps that needs to use it.
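
A rough sketch of that suggestion follows. The classes are hypothetical stand-ins, not the real Pulp publish steps, and the shape of the shared dict (schema version mapped to tag names) is an assumption made only for illustration.

```python
class PublishStep(object):
    """Hypothetical stand-in for a publish step; not the real Pulp class."""

    def __init__(self, redirect_data):
        # The step keeps a reference to the shared dict it was handed.
        self.redirect_data = redirect_data


class TagStep(PublishStep):
    def process(self, tag_name, schema_version):
        # Record the tag name under its schema version.
        self.redirect_data.setdefault(schema_version, set()).add(tag_name)


class RedirectFileStep(PublishStep):
    def render(self):
        # Read what the earlier step recorded, in a stable order.
        return dict((version, sorted(names))
                    for version, names in self.redirect_data.items())


class Publisher(object):
    def __init__(self):
        # Created per publisher instance, so nothing outlives the task.
        redirect_data = {}
        self.tag_step = TagStep(redirect_data)
        self.redirect_step = RedirectFileStep(redirect_data)
```

Because the dict is created in `__init__`, a second publisher starts empty instead of inheriting tags from a previous task in the same worker process.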

mhrivnak · 2017-02-13T19:20:14Z

plugins/pulp_docker/plugins/registry.py

+        # [(S2, digest), (S1, digest)]
+        # or
+        # [(S1, digest)]
+        return manifests


Thoughts on this?

ipanova · 2017-03-02T13:38:23Z

There were issues with the rsync distributor. Thanks to @dkliban, who provided a functional patch and helped with testing.

dkliban · 2017-02-09T11:16:32Z

docs/tech-reference/distributor.rst

+* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker
+  registry.
+* **url** *(string)* - the url for access to the repository's content.
+* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests.


A little more detail in the description would be helpful. Something like this perhaps:

dictionary with two keys, 'schema1' and 'schema2'. The value of each is a list of tags for manifests of each respective schema type.
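
To illustrate the structure described above, here is a small sketch. The repository name, URL, and tag names are made up, and `tags_for_schema` is a hypothetical helper; only the key names come from the documentation under review.

```python
# Hypothetical redirect data; the values are made up for illustration.
redirect = {
    'repo-registry-id': 'busybox',
    'url': 'https://example.com/pulp/docker/v2/busybox/',
    'tags': {
        'schema1': ['latest', '1.25'],
        'schema2': ['latest'],
    },
}


def tags_for_schema(redirect_data, schema_version):
    """Return the list of tags for manifests of the given schema version."""
    key = 'schema%d' % schema_version
    return redirect_data['tags'].get(key, [])
```

Here the same tag name, `latest`, appears under both keys, which is the case the two-key layout is meant to support.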

dkliban · 2017-02-09T11:30:13Z

docs/tech-reference/tags.rst

-name and repo_id must be unique together so that in any given repository a Tag
-name only references a single Manifest. Here is an example tag from MongoDB::
+(the digest of the Manifest that the Tag references), schmea_version(the manifest
+version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id


s/schmea_version(the manifest version the Tag referencesand/schema_version (the schema version for the manifest) and/

dkliban · 2017-02-09T11:38:15Z

docs/tech-reference/tags.rst

-name only references a single Manifest. Here is an example tag from MongoDB::
+(the digest of the Manifest that the Tag references), schmea_version(the manifest
+version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
+must be unique together so that in any given repository a Tag name only references


s/Tag name/Tag/

dkliban · 2017-02-09T11:43:05Z

docs/tech-reference/tags.rst

+(the digest of the Manifest that the Tag references), schmea_version(the manifest
+version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
+must be unique together so that in any given repository a Tag name only references
+a single Manifest(schema1 and/or schema2 version).


s/Manifest(schema1 and/or schema2 version)/Manifest that uses either schema version 1 or schema version 2/

dkliban · 2017-02-09T11:45:36Z

docs/user-guide/recipes.rst

@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists
 however, it will be updated as tag name is unique per repository and can point
 to only one manifest.

+.. note::
+
+    Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest,


Pulp supports manifests that use schema version 1 and schema version 2. The schema version of the manifest needs to be specified when tagging a manifest.

I changed this so that we handle it on the server side. It does not make sense to force the user to provide the schema version if they have provided the digest of the manifest (which is the unique identifier).

dkliban · 2017-03-01T16:55:35Z

plugins/pulp_docker/plugins/distributors/publish_steps.py

            'repo-registry-id': registry, 'url': redirect_url,
-            'protected': self.get_config().get('protected', False)}
+            'protected': self.get_config().get('protected', False),
+            'schema2_data': list(schema2_data)}


I still don't like that we are only including schema2_data in here. I think it would be easier to debug pulp_docker/crane if this data was present.

I know we already discussed that, and I do not have a strong preference here, but I think the less data we need to work with and parse, the better. I would also like to hear @mhrivnak's opinion on this: do we want to write schema1_data as well and make it available to Crane?

dkliban · 2017-03-01T16:58:01Z

plugins/pulp_docker/plugins/importers/importer.py

@@ -301,6 +302,10 @@ def _import_manifest(conduit, unit, dest_repo):
        for layer in unit.fs_layers:
            blob_digests.add(layer.blob_sum)

+        # in manifest schema version 2 there is an additional blob layer called config_layer
+        if unit.config_layer:
+                blob_digests.add(unit.config_layer)


The indentation looks wrong here.
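
With the indentation normalized to four spaces, the logic in this hunk could read as in the sketch below. `FSLayer` and `Manifest` here are hypothetical namedtuple stand-ins for the plugin's models, and `referenced_blob_digests` is an illustrative name, not a function in importer.py.

```python
from collections import namedtuple

# Hypothetical stand-ins for the plugin's models, for illustration only.
FSLayer = namedtuple('FSLayer', ['blob_sum'])
Manifest = namedtuple('Manifest', ['fs_layers', 'config_layer'])


def referenced_blob_digests(manifest):
    """Collect every blob digest a manifest references."""
    blob_digests = set()
    for layer in manifest.fs_layers:
        blob_digests.add(layer.blob_sum)
    # In manifest schema version 2 there is an additional blob, the config layer.
    if manifest.config_layer:
        blob_digests.add(manifest.config_layer)
    return blob_digests
```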

dkliban · 2017-03-01T17:01:05Z

plugins/pulp_docker/plugins/importers/importer.py

@@ -427,6 +432,9 @@ def _purge_unlinked_blobs(repo, manifest):
        # Find blob digests referenced by removed manifests (orphaned)
        orphaned = set()
        map((lambda layer: orphaned.add(layer.blob_sum)), manifest.fs_layers)
+        # in manifest schema version 2 there is an additional blob layer called config_layer
+        if manifest.config_layer:
+                orphaned.add(manifest.config_layer)


I think this is over-indented.

dkliban · 2017-03-01T20:59:37Z

plugins/pulp_docker/plugins/models.py


    def get_symlink_name(self):
        """
        Provides the name that should be used when creating a symlink.
        :return: file name as it appears in a published repository
        :rtype: str
        """
-        return self.digest
+        return '/'.join((str(self.schema_version), self.digest))


You will need to make some additional changes to make the rsync distributor work. Here is the commit that fixed it for me: a5b81e2

dkliban · 2017-03-02T14:07:25Z

docs/user-guide/release-notes/2.4.x.rst

+
+Tag model has a new field `schema_version`. Now its unit key is extended to
+(repo_id, tag_name, schema_version) to enable the situation when there is manifest V2
+schema version 1 and manifest V2 schema version 2 having same tag name within a repository.


You can remove lines 11-16 ... This is technical stuff that users are not interested in reading.

I would probably like to mention that changes have been made to Tag, so if you want to copy or remove tags you need to bear in mind that within a repo there can be two tags with the same name.

ipanova · 2017-03-03T16:10:53Z

Steps for upgrade testing (tested with a newer docker client):

1. Check out pulp_docker and crane on master. Create, sync, and publish busybox.
2. Run docker pull by tag 'latest'. The pull should succeed. Make a curl request to the crane API and ensure schema1 is returned.
3. Check out pulp_docker and crane on the schema2 branch (my PRs). Run pulp-manage-db and restart services.
4. Create, sync, and publish a docker repo from Docker Hub alpine (it is a small one).
5. Perform docker pull localhost:5000/alpine:latest. The pull should work. Make a curl request to the crane API and ensure schema2 is returned.
6. Perform docker pull localhost:5000/busybox:latest. The pull should work. Make a curl request to the crane API and ensure schema1 is returned.
7. Re-publish busybox (the publish should be operational, so use the force-full flag). This step is done to test redirect file 3.
8. Perform docker pull localhost:5000/busybox:latest. The pull should work. Make a curl request to the crane API and ensure schema1 is returned.
9. Re-sync busybox with auto-publish.
10. Perform docker pull localhost:5000/busybox:latest. The pull should work. Make a curl request to the crane API and ensure schema2 is returned.
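
The "ensure schemaN is returned" checks in these steps could be scripted once the response body is in hand. The sketch below assumes the body is the manifest JSON, which carries a top-level `schemaVersion` field in both schema versions; `manifest_schema_version` is a hypothetical helper name, and fetching the body is left out.

```python
import json


def manifest_schema_version(body):
    """Return the schemaVersion declared in a manifest JSON document."""
    return json.loads(body)['schemaVersion']
```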

closes #2099

ipanova · 2017-03-08T14:11:51Z

ok test

ipanova added the Work In Progress label Jan 20, 2017

ipanova force-pushed the schema2 branch 4 times, most recently from cab2c55 to 584e1f2 Compare January 20, 2017 16:21

ipanova commented Feb 9, 2017

mhrivnak suggested changes Feb 9, 2017

ipanova force-pushed the schema2 branch 3 times, most recently from 9407da1 to 044d4de Compare February 10, 2017 14:05

ipanova force-pushed the schema2 branch 2 times, most recently from 167cc39 to 27502af Compare February 13, 2017 13:12

mhrivnak suggested changes Feb 13, 2017

ipanova force-pushed the schema2 branch 2 times, most recently from 7624c48 to 87632b3 Compare February 22, 2017 16:08

ipanova force-pushed the schema2 branch 3 times, most recently from 077aece to e988a96 Compare February 28, 2017 14:13

ipanova added enhancement and removed Work In Progress labels Feb 28, 2017

ipanova force-pushed the schema2 branch 4 times, most recently from f6da3e3 to ac3ce2b Compare March 2, 2017 13:20

ipanova force-pushed the schema2 branch from ac3ce2b to b82312e Compare March 2, 2017 13:48

ipanova force-pushed the schema2 branch from b82312e to 338dffb Compare March 2, 2017 13:56

dkliban suggested changes Mar 2, 2017

ipanova force-pushed the schema2 branch from 338dffb to 3cc57c8 Compare March 2, 2017 16:53

ipanova mentioned this pull request Mar 6, 2017

Test schema version 2 manifests published by pulp docker pulp/pulp-smash#555

Closed

ipanova changed the title ~~schema 2~~ As a user, I can sync manifests schema version 2. Mar 6, 2017

As a user, I can sync manifests schema version 2.

35dc19c

closes #2099

ipanova force-pushed the schema2 branch from 3cc57c8 to 35dc19c Compare March 7, 2017 14:41

dkliban approved these changes Mar 7, 2017

ipanova merged commit 35dc19c into pulp:master Mar 8, 2017
