Skip to content
This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

As a user, I can sync manifests schema version 2. #187

Merged
merged 1 commit into from
Mar 8, 2017

Conversation

ipanova
Copy link
Member

@ipanova ipanova commented Jan 20, 2017

  • add documentation
  • add release notes
  • test copy operation works with tags, manifestsm blobs
  • test remove operation works with tags, manifests, blobs
  • test docker repo tag works
  • test docker rsync distributor works
  • test docker export distributor did not break because of new changes
  • update broken tests
  • create some tests for coverage
  • add smash issue with different scenarios
  • test schema2 workflow on upgraded env and fresh install --> sync-publish-crane-docker pull

@ipanova
Copy link
Member Author

ipanova commented Feb 9, 2017

should be merged together with pulp/crane#69

@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest
sha256:4ecca..., we can point it to the second manifest with the following
command::

$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i will change so we would retrieve the schema_version on server side. it does not make sense to force user to provide schema version when he provided the digest which is the unique identifier for the manifest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I agree that will be better.

@mhrivnak
Copy link
Contributor

mhrivnak commented Feb 9, 2017

As a general comment, I'm not crazy about the convention of expressing schemas as "schemaX" with no space. The Docker documentation writes it out as "schema 1" or "schema version 1", etc.

@ipanova
Copy link
Member Author

ipanova commented Feb 9, 2017

* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker
registry.
* **url** *(string)* - the url for access to the repository's content.
* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON calls this an "object", so we should use that terminology

name and repo_id must be unique together so that in any given repository a Tag
name only references a single Manifest. Here is an example tag from MongoDB::
(the digest of the Manifest that the Tag references), schmea_version(the manifest
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a space after "references"

version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
must be unique together so that in any given repository a Tag name only references
a single Manifest(schema1 and/or schema2 version).
Here is an example tag from MongoDB::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think it's much better to show the REST API representation, since that is what users should be interacting with.

@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists
however, it will be updated as tag name is unique per repository and can point
to only one manifest.

.. note::

Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/schem2/schema 2/

@@ -410,10 +416,10 @@ If we have a tag named latest and it points to the first manifest with digest
sha256:4ecca..., we can point it to the second manifest with the following
command::

$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d
$ pulp-admin docker repo tag --repo-id busybox --tag-name latest --manifest-digest sha256:c152ddeda2b828fbb610cb9e4cb121e1879dd5301d336f0a6c070b2844a0f56d --schema-version 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I agree that will be better.

manifest = models.Manifest.from_json(manifest, digest, tag, upstream_name)
self.parent.available_manifests.append(manifest)
for layer in manifest.fs_layers:
available_blobs.add(layer.blob_sum)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation got a little crazy here.

collection = get_collection('units_docker_tag')
# drop old index due to unit_keys fields change
index_info = collection.index_information()
old_index = 'name_1_repo_id_1'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will pulp-manage-db automatically create the new index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my understanding mongo will automatically create the new index, at least it happened in my case without doing anything

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, i think the new index is created when a new unit is added into the collection after the upgrade, but in case we do just upgrade, without any new content sync we might end up removing old index and not having a new one until sync. i need to check this

Copy link
Member Author

@ipanova ipanova Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i was wrong, the new index is created at this point, once all the migrations are done
https://github.com/pulp/pulp/blob/master/server/pulp/server/db/manage.py#L147

So i think we are safe here.

try:
fs_layers = [FSLayer(blob_sum=layer['digest']) for layer in manifest['layers']]
config_layer = manifest['config']['digest']
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this specifically catch a KeyError? Or at least some narrow list of exceptions?

headers, manifest = self._get_path(path)
# set the headers for first request
request_headers['Accept'] = schema2
headers, manifest = self._get_path(path, headers=request_headers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many headers flying around, that I think this would be more readable if named something like response_headers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are making a request to the registry so in my understanding those should be request_headers, correct me if i am wrong

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights the confusion exactly. The first word on this line is headers. Is that not response headers?

Copy link
Member Author

@ipanova ipanova Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R447
i could rename headers to request_headers if that would make it better

Copy link
Member Author

@ipanova ipanova Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are headers sent in the request to the docker registry API:
https://github.com/pulp/pulp_docker/pull/187/files#diff-99c0d67fc4b80f3d190c6b6af1c453b1R456

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not self._get_path return the response headers? Looking at the code, it returns report.headers, which are from a nectar report. The nectar report documents that attribute as "response headers" here: https://github.com/pulp/nectar/blob/master/nectar/report.py#L28

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i think i got confused about which headers we are talking about, if it is a parameter from _get_path() then those are request headers, and the ones which _get_path returns are response_headers. I think now we are on the same page and i completely agree with your suggestion.

"""
Retrieve a single path within the upstream registry, and return a 2-tuple of the headers and
the response body.

:param path: a full http path to retrieve that will be urljoin'd to the upstream registry
url.
:type path: basestring
:param headers: headers sent in the request
:type headers: basestring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are correct

@ipanova ipanova force-pushed the schema2 branch 3 times, most recently from 9407da1 to 044d4de Compare February 10, 2017 14:05
@ipanova
Copy link
Member Author

ipanova commented Feb 10, 2017

@mhrivnak i think i addressed all the comments

@ipanova ipanova force-pushed the schema2 branch 2 times, most recently from 167cc39 to 27502af Compare February 13, 2017 13:12
@@ -14,6 +14,9 @@
from pulp_docker.plugins.distributors import configuration, v1_publish_steps


tags_names_redirect = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The risk of having anything in global namespace is that it will persist even when the task is done.

I suggest in V2WebPublisher.__init__, make the dict, and pass it as an argument to each of the steps that needs to use it.

# [(S2, digest), (S1, digest)]
# or
# [(S1, digest)]
return manifests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on this?

@ipanova
Copy link
Member Author

ipanova commented Mar 2, 2017

There were issues with rsync distributor, thanks to @dkliban he provided a functional patch and helped with testing

* **repo-registry-id** *(string)* - the name that will be used for this repository in the Docker
registry.
* **url** *(string)* - the url for access to the repository's content.
* **tags** *(dict)* - dictionary of tags for schema1 and schema2 manifests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little more detail in the description would be helpful. Something like this perhaps:

dictionary with two keys, 'schema1' and 'schema2'. The value of each is a list of tags for manifests of each respective schema type.

name and repo_id must be unique together so that in any given repository a Tag
name only references a single Manifest. Here is an example tag from MongoDB::
(the digest of the Manifest that the Tag references), schmea_version(the manifest
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/schmea_version(the manifest version the Tag referencesand/schema_version (the schema version for the manifest) and/

name only references a single Manifest. Here is an example tag from MongoDB::
(the digest of the Manifest that the Tag references), schmea_version(the manifest
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
must be unique together so that in any given repository a Tag name only references
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Tag name/Tag/

(the digest of the Manifest that the Tag references), schmea_version(the manifest
version the Tag referencesand a repo_id. A Tag's name, schema_version and repo_id
must be unique together so that in any given repository a Tag name only references
a single Manifest(schema1 and/or schema2 version).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Manifest(schema1 and/or schema2 version)/Manifest that uses either schema version 1 or schema version 2/

@@ -363,6 +363,12 @@ the tag we specify does not exist, it will be created. If the tag exists
however, it will be updated as tag name is unique per repository and can point
to only one manifest.

.. note::

Pulp now supports manifest schema1 and schem2 versions. So when tagging a manifest,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulp supports manifests that use schema version 1 and schema version 2. The schema version of the manifest needs to be specified when tagging a manifest.

Copy link
Member Author

@ipanova ipanova Mar 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed this, so we handle that on server side, it does not make sense to force user to provide schema version if the provided the digest of the manifests( which is the unique identifier)

'repo-registry-id': registry, 'url': redirect_url,
'protected': self.get_config().get('protected', False)}
'protected': self.get_config().get('protected', False),
'schema2_data': list(schema2_data)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like that we are only including schema2_data in here. I think it would be easier to debug pulp_docker/crane if this data was present.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know we already discussed that, i do not have any strong preference here, but i think, less data we need to work with and parse better is. I would like to hear @mhrivnak opinion as well on this - do we want to write also schema1_data and make it available to Crane?

@@ -301,6 +302,10 @@ def _import_manifest(conduit, unit, dest_repo):
for layer in unit.fs_layers:
blob_digests.add(layer.blob_sum)

# in manifest schema version 2 there is an additional blob layer called config_layer
if unit.config_layer:
blob_digests.add(unit.config_layer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indention looks wrong here.

@@ -427,6 +432,9 @@ def _purge_unlinked_blobs(repo, manifest):
# Find blob digests referenced by removed manifests (orphaned)
orphaned = set()
map((lambda layer: orphaned.add(layer.blob_sum)), manifest.fs_layers)
# in manifest schema version 2 there is an additional blob layer called config_layer
if manifest.config_layer:
orphaned.add(manifest.config_layer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is over-indented.


def get_symlink_name(self):
"""
Provides the name that should be used when creating a symlink.
:return: file name as it appears in a published repository
:rtype: str
"""
return self.digest
return '/'.join((str(self.schema_version), self.digest))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need to make some additional changes to make the rsync distributor work. Here is the commit that fixed it for me: a5b81e2


Tag model has a new field `schema_version`. Now its unit key is extended to
(repo_id, tag_name, schema_version) to enable the situation when there is manifest V2
schema version 1 and manifest V2 schema version 2 having same tag name within a repository.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove lines 11-16 ... This is technical stuff that users are not interested in reading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i probably would like mention that where have been done changes to Tag so if you want to do copy or remove of tags you need to bear in mind that within a repo there can be 2 tags with same name.

@ipanova
Copy link
Member Author

ipanova commented Mar 3, 2017

Steps for upgrade testing( was tested with newer docker client):

  1. pulp_docker and crane checkout on master. Create, Sync and publish busybox
  2. docker pull by tag 'latest'. Pull should succeed. Make a curl request to crane API ensure schema1 is returned
  3. checkout pulp_docker and crane on schema2 branch( my PRs). run pulp-manage-db, restart services
  4. create , sync and publish docker repo from docker hub alpine ( it is a small one)
  5. perform docker pull localhost:5000/alpine:latest. Docker pull should work. Make a curl request to crane API ensure schema2 is returned
  6. perform docker pull localhost:5000/busybox:latest. Docker pull should work. Make a curl request to crane API ensure schema1 is returned
  7. re-publish busybox ( publish should be operational, so use force-full flag). This step is done to test redirect file 3
  8. perform docker pull localhost:5000/busybox:latest. Docker pull should work. Make a curl request to crane API ensure schema1 is returned
  9. re-sync busybox with auto-publish
  10. perform docker pull localhost:5000/busybox:latest. Docker pull should work. Make a curl request to crane API ensure schema2 is returned

@ipanova ipanova changed the title schema 2 As a user, I can sync manifests schema version 2. Mar 6, 2017
@ipanova
Copy link
Member Author

ipanova commented Mar 8, 2017

ok test

@ipanova ipanova merged commit 35dc19c into pulp:master Mar 8, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants