58 changes: 58 additions & 0 deletions docs/proposals/docker-registry.md

## Improve docker-registry

### Motivation

OpenShift's docker-registry always resolves the latest digest and stores it in etcd, so we always know which object
we need to request. But we cannot access it without mapping it to the repository through which it was uploaded.

In `docker/distribution` there is no way to read or write an object without using the `linkedBlobStore`.
This module deduplicates objects and links to them from user repositories; the links are made
to reduce disk usage.

We need to stop using the `linkedBlobStore` from `docker/distribution`. The main purpose of this module is to store information about blobs, because `docker/distribution` has no database of its own; it keeps this information in the filesystem.

> **Reviewer:** I'm not sure this problem statement has enough detail for someone not familiar with the registry. I would include info on why we even have a linked blob store and the functionality it provides. Also, please include information about why you're proposing a refactor. What benefit does it give us if the `linkedBlobStore` is already providing the functionality? (It is sort of mentioned in the second paragraph, but there is no justification for why it helps us.)
>
> Take a look at some of the existing proposals in https://github.com/openshift/origin/tree/master/docs/proposals.

> **Reviewer:** Qualify "information".

`docker/distribution` allows you to add middleware for the
[Registry](https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/github.com/docker/distribution/registry/middleware/registry/middleware.go),
[Repository](https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/github.com/docker/distribution/registry/middleware/repository/middleware.go) and
[Storage driver](https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/github.com/docker/distribution/registry/storage/driver/middleware/storagemiddleware.go)
(there are several other kinds of middleware, but they do not fit our needs).
The first two cannot access blobs directly, because the paths to them are built by the `linkedBlobStore`.
The third has access to blobs but cannot build a path; paths are created in a higher layer.

### Solution

OpenShift already uses etcd as a database; all information about images and image streams is there. We can hold information
about the blobs in a repository in the database as well. As a result, we only need to store the blobs' payload in the filesystem,
using our own object layout. In this case, we get complete control over the blobs.

> **Reviewer:** Please detail where and how this information would be stored in etcd. On the existing imagestream object, a new meta object, etc.? Also please detail what pieces would need to change to update this information.

> **Author:** Actually, we don't have to store anything new in etcd. We don't even have to use the database at all right now.

> **Reviewer:** We don't need to replace the existing layout. We can do just fine with its subset. If we really need to use a different one, let's document why.

To do this we will need to implement part of the functionality of the `linkedBlobStore`. We can create our own middleware
for the repository and for the storage driver. The storage-driver middleware will be used for reading and writing in both
the old (`docker/distribution`) layout and the new one.

> **Reviewer:** It would be good to include some pseudo-code later in the document that illustrates this design more completely. Will we be wrapping every storage driver with a proxy implementation?

> **Author:** No. We can use any storage driver from `docker/distribution`.
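As an illustration of that dual-layout behavior, here is a minimal, self-contained Go sketch. The `Driver` interface, the in-memory driver, and the literal path prefixes are hypothetical stand-ins for docker/distribution's storage-driver API, which has many more methods than shown here:

```go
package main

import (
	"errors"
	"fmt"
)

// Driver is a hypothetical stand-in for a docker/distribution storage driver:
// it only knows how to read byte content at a path.
type Driver interface {
	GetContent(path string) ([]byte, error)
}

// memDriver is an in-memory driver used for illustration.
type memDriver struct{ files map[string][]byte }

func (d memDriver) GetContent(path string) ([]byte, error) {
	if data, ok := d.files[path]; ok {
		return data, nil
	}
	return nil, errors.New("path not found: " + path)
}

// layoutMiddleware wraps an existing driver: it tries the new /openshift/v1
// layout first and falls back to the old docker/distribution layout, so blobs
// in both layouts stay readable.
type layoutMiddleware struct{ next Driver }

func (m layoutMiddleware) GetContent(path string) ([]byte, error) {
	if data, err := m.next.GetContent("/openshift/v1" + path); err == nil {
		return data, nil
	}
	return m.next.GetContent("/docker/registry/v2" + path)
}

func main() {
	base := memDriver{files: map[string][]byte{
		"/docker/registry/v2/blobs/sha256/ab/abcd/data": []byte("old blob"),
		"/openshift/v1/blobs/sha256/cd/cdef/data":       []byte("new blob"),
	}}
	d := layoutMiddleware{next: base}

	oldBlob, _ := d.GetContent("/blobs/sha256/ab/abcd/data") // found via fallback
	newBlob, _ := d.GetContent("/blobs/sha256/cd/cdef/data") // found in new layout
	fmt.Println(string(oldBlob), "/", string(newBlob))
}
```

A real middleware would have to implement the full storage-driver interface and wrap whichever concrete driver (filesystem, S3, etc.) the registry is configured with.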

We need to create a new layout because the layout that `docker/distribution` uses is private, and upstream does not
want to export it. The important point is that we do not have to replace the whole old layout; we only need the part
that is used to store blobs.

> **Reviewer:** This is a little unclear. I think you're referring to the pathFor method here. A little more detail would be good. Especially if we have a reference to why they don't want to open it.
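To gauge how small that part is, here is a hedged, self-contained Go sketch of the content-addressed read/write subset we would have to re-implement on top of a storage driver. All type and method names here are illustrative, not the real docker/distribution interfaces:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// driver is a stand-in for a storage driver: raw content at a path.
type driver struct{ files map[string][]byte }

func (d *driver) PutContent(path string, data []byte) { d.files[path] = data }
func (d *driver) GetContent(path string) ([]byte, error) {
	data, ok := d.files[path]
	if !ok {
		return nil, errors.New("blob unknown")
	}
	return data, nil
}

// blobStore is the subset of linkedBlobStore behaviour we would need:
// content-addressed put/get, without per-repository link files.
type blobStore struct{ d *driver }

// Put stores data under its sha256 digest and returns the digest.
func (b *blobStore) Put(data []byte) string {
	sum := sha256.Sum256(data)
	dgst := hex.EncodeToString(sum[:])
	b.d.PutContent("/openshift/v1/blobs/sha256/"+dgst[:2]+"/"+dgst+"/data", data)
	return "sha256:" + dgst
}

// Get resolves a "sha256:<hex>" digest back to the blob payload.
func (b *blobStore) Get(dgst string) ([]byte, error) {
	hexd := dgst[len("sha256:"):]
	return b.d.GetContent("/openshift/v1/blobs/sha256/" + hexd[:2] + "/" + hexd + "/data")
}

func main() {
	b := &blobStore{d: &driver{files: map[string][]byte{}}}
	dgst := b.Put([]byte("hello"))
	data, _ := b.Get(dgst)
	fmt.Println(dgst, string(data))
}
```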

### New objects layout

The layout I propose is similar to `docker/distribution`'s, but simpler:

> **Reviewer:** Please document why we cannot use the old layout.

> **Author:** We aren't able to use it. It is private.

> **Reviewer:** Exactly. You could also link to the pieces of code we'd need to export. Also add a note that we already tried to negotiate with upstream about exporting it.

> **Author:**
> > You could also link to the pieces of code we'd need to export.
>
> Not sure I understand.

> **Author:** Are you asking me to specify the code that would need to be duplicated?
>
> > Also add a note that we already tried to negotiate with upstream about exporting it.
>
> I can't find the link to this discussion :(

```
uploadDataPathSpec: /openshift/v1/repositories/<name>/_uploads/<id>/data
uploadStartedAtPathSpec: /openshift/v1/repositories/<name>/_uploads/<id>/startedat
uploadHashStatePathSpec: /openshift/v1/repositories/<name>/_uploads/<id>/hashstates/<algorithm>/<offset>

blobPathSpec: /openshift/v1/blobs/<algorithm>/<first two hex bytes of digest>/<hex digest>
blobDataPathSpec: /openshift/v1/blobs/<algorithm>/<first two hex bytes of digest>/<hex digest>/data
```
That is all we need. Everything else is stored in the database or in the old layout.
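To make the path specs above concrete, here is a small self-contained Go sketch that builds the proposed paths; the function names are illustrative and not part of any existing code:

```go
package main

import (
	"fmt"
	"strings"
)

const root = "/openshift/v1"

// blobDataPath implements blobDataPathSpec: the payload of a blob lives under
// <root>/blobs/<algorithm>/<first two hex bytes of digest>/<hex digest>/data.
func blobDataPath(dgst string) string {
	// A digest such as "sha256:9f86d0..." splits into algorithm and hex part.
	parts := strings.SplitN(dgst, ":", 2)
	algorithm, hexDigest := parts[0], parts[1]
	return fmt.Sprintf("%s/blobs/%s/%s/%s/data", root, algorithm, hexDigest[:2], hexDigest)
}

// uploadDataPath implements uploadDataPathSpec for an in-progress upload.
func uploadDataPath(repo, id string) string {
	return fmt.Sprintf("%s/repositories/%s/_uploads/%s/data", root, repo, id)
}

func main() {
	fmt.Println(blobDataPath("sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"))
	fmt.Println(uploadDataPath("myproject/app", "f5c0a3d2"))
}
```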

### Benefits

* Cross-repository access to blobs;
* Fewer patches to maintain against `docker/distribution`;
* The possibility of applying quotas at upload time.

### Negative aspects

* We have to implement part of the functionality of the `blobStore` and `linkedBlobStore`.