Copy API #8
The core interface for a
import (
"github.com/containerd/containerd/remotes"
)
type Target interface {
remotes.Resolver
}
As discussed on our call, I ran an experiment eliminating any direct dependencies on containerd as follows:
package target
import (
"context"
"io"
"time"
"github.com/opencontainers/go-digest"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
// Target represents a place to which one can send/push or retrieve/pull artifacts.
// Anything that implements the Target interface can be used as a place to send or
// retrieve artifacts.
type Target interface {
Resolve(ctx context.Context, ref string) (name string, desc ocispec.Descriptor, err error)
Fetcher(ctx context.Context, ref string) (Fetcher, error)
Pusher(ctx context.Context, ref string) (Pusher, error)
}
type Fetcher interface {
Fetch(ctx context.Context, desc ocispec.Descriptor) (io.ReadCloser, error)
}
type Pusher interface {
Push(ctx context.Context, d ocispec.Descriptor) (Writer, error)
}
type Writer interface {
io.WriteCloser
Digest() digest.Digest
Commit(ctx context.Context, size int64, expected digest.Digest, opts ...Opt) error
Status() (Status, error)
Truncate(size int64) error
}
type Status struct {
Ref string
Offset int64
Total int64
Expected digest.Digest
StartedAt time.Time
UpdatedAt time.Time
}
type Opt func(*Info) error
func WithLabels(labels map[string]string) Opt {
return func(info *Info) error {
info.Labels = labels
return nil
}
}
type Info struct {
Digest digest.Digest
Size int64
CreatedAt time.Time
UpdatedAt time.Time
Labels map[string]string
}
and then have a wrapper to get from a containerd one to an oras one:
package target
import (
"context"
"github.com/containerd/containerd/content"
"github.com/containerd/containerd/remotes"
"github.com/opencontainers/go-digest"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
func FromContainerdResolver(resolver remotes.Resolver) Target {
return &ContainerdResolverTarget{resolver: resolver}
}
type ContainerdResolverTarget struct {
resolver remotes.Resolver
}
type containerdPusher struct {
pusher remotes.Pusher
}
type containerdWriter struct {
writer content.Writer
}
func (c *ContainerdResolverTarget) Resolve(ctx context.Context, ref string) (name string, desc ocispec.Descriptor, err error) {
return c.resolver.Resolve(ctx, ref)
}
func (c *ContainerdResolverTarget) Fetcher(ctx context.Context, ref string) (Fetcher, error) {
return c.resolver.Fetcher(ctx, ref)
}
func (c *ContainerdResolverTarget) Pusher(ctx context.Context, ref string) (Pusher, error) {
p, err := c.resolver.Pusher(ctx, ref)
if err != nil {
return nil, err
}
return &containerdPusher{pusher: p}, nil
}
func (c *containerdPusher) Push(ctx context.Context, d ocispec.Descriptor) (Writer, error) {
w, err := c.pusher.Push(ctx, d)
if err != nil {
return nil, err
}
return &containerdWriter{writer: w}, nil
}
func (c *containerdWriter) Write(p []byte) (n int, err error) {
return c.writer.Write(p)
}
func (c *containerdWriter) Close() error {
return c.writer.Close()
}
func (c *containerdWriter) Digest() digest.Digest {
return c.writer.Digest()
}
func (c *containerdWriter) Commit(ctx context.Context, size int64, expected digest.Digest, opts ...Opt) error {
return c.writer.Commit(ctx, size, expected)
}
func (c *containerdWriter) Status() (Status, error) {
s, err := c.writer.Status()
if err != nil {
return Status{}, err
}
return Status{
Ref: s.Ref,
Offset: s.Offset,
Total: s.Total,
Expected: s.Expected,
StartedAt: s.StartedAt,
UpdatedAt: s.UpdatedAt,
}, nil
}
func (c *containerdWriter) Truncate(size int64) error {
return c.writer.Truncate(size)
}
so an existing containerd-compliant one would work. However, I wasn't sure it really was worth all of that duplication.
I see the problem with Go here. Basically, interfaces are not equal even if they have the same methods.
package main
import "fmt"
type Animal interface {
Say()
}
type AnimalFarm interface {
Produce() Animal
}
type Duck interface {
Say()
}
type DuckFarm interface {
Produce() Duck
}
type farm struct{}
func (farm) Produce() Duck {
return duck{}
}
type duck struct{}
func (duck) Say() {
fmt.Println("quak!")
}
func main() {
var animalFarm AnimalFarm = farm{}
duck := animalFarm.Produce()
duck.Say()
}
The above code reports error even if
I think the duplication is temporary. Once we have our own implementation equivalent to |
Agreed @shizhMSFT; I wasn't sure it is worth it, though. I have a commit with all of the above, and it works (I didn't just type it into the PR comments :-) ). I just wasn't sure we wanted all of that duplication.
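The mismatch discussed above comes from Go requiring method signatures to match exactly: a method Produce() Duck does not satisfy an interface that asks for Produce() Animal, even though every Duck is usable as an Animal. A minimal sketch of the adapter workaround follows. The name animalFarmAdapter is hypothetical, and Say() returns a string here (rather than printing) only so the result is easy to check.

```go
package main

import "fmt"

// Animal and AnimalFarm mirror the interfaces from the example above.
type Animal interface {
	Say() string
}

type AnimalFarm interface {
	Produce() Animal
}

// Duck has the same method set as Animal but is a distinct named type.
type Duck interface {
	Say() string
}

type DuckFarm interface {
	Produce() Duck
}

type duck struct{}

func (duck) Say() string { return "quack!" }

type farm struct{}

// Produce returns Duck, so farm satisfies DuckFarm but not AnimalFarm.
func (farm) Produce() Duck { return duck{} }

// animalFarmAdapter (hypothetical name) wraps a DuckFarm and re-declares
// Produce with the return type that AnimalFarm expects.
type animalFarmAdapter struct {
	inner DuckFarm
}

// Produce converts the Duck to an Animal; the conversion is implicit
// because Duck's method set covers Animal's.
func (a animalFarmAdapter) Produce() Animal {
	return a.inner.Produce()
}

func main() {
	var f AnimalFarm = animalFarmAdapter{inner: farm{}}
	fmt.Println(f.Produce().Say())
}
```

This is the same shape as the containerdPusher and containerdWriter wrappers earlier in the thread: one thin type per method whose return type differs.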
// Copy copy a ref from one target.Target to a ref in another target.Target. If toRef is blank, reuses fromRef
// Returns the root
// Descriptor of the copied item. Can use the root to retrieve child elements from target.Target.
Can use the root to retrieve child elements from target.Target
Is there an example or test that displays how to do this? For example, obtain a list of blob descriptors (layers) referenced by a root manifest
I had one in them, removed it for simplicity.
We might want to add a real example. But for now, there is one straight out of containerd, see images.Children().
Needless to say, that takes a content.Provider rather than a remotes.Fetcher, but we actually provide ProviderWrapper.
Eventually, we might just want to replace images.ChildrenHandler with our own; one more reduction in containerd dependency.
Also does the work in #17 help with this?
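As a rough illustration of the question above (listing the blob descriptors referenced by a root manifest), the sketch below parses a manifest and returns its config followed by its layers. It uses local stand-in types instead of ocispec.Descriptor and ocispec.Manifest so it runs without dependencies; the digests are placeholders, and the helper name children is an assumption, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor is a stand-in for ocispec.Descriptor with only the fields used here.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// Manifest is a stand-in for ocispec.Manifest.
type Manifest struct {
	Config Descriptor   `json:"config"`
	Layers []Descriptor `json:"layers"`
}

// sampleManifest uses placeholder digests, not real content hashes.
const sampleManifest = `{
  "config": {"mediaType": "application/vnd.oci.image.config.v1+json", "digest": "sha256:aaa", "size": 2},
  "layers": [
    {"mediaType": "application/vnd.oci.image.layer.v1.tar", "digest": "sha256:bbb", "size": 10},
    {"mediaType": "application/vnd.oci.image.layer.v1.tar", "digest": "sha256:ccc", "size": 20}
  ]
}`

// children (hypothetical helper) returns the config descriptor followed by
// the layer descriptors, in the order the manifest lists them.
func children(manifestJSON []byte) ([]Descriptor, error) {
	var m Manifest
	if err := json.Unmarshal(manifestJSON, &m); err != nil {
		return nil, err
	}
	out := make([]Descriptor, 0, len(m.Layers)+1)
	out = append(out, m.Config)
	out = append(out, m.Layers...)
	return out, nil
}

func main() {
	descs, err := children([]byte(sampleManifest))
	if err != nil {
		panic(err)
	}
	for _, d := range descs {
		fmt.Println(d.Digest, d.MediaType, d.Size)
	}
}
```

In the real code path, the manifest bytes would come from fetching the root descriptor returned by Copy(), for example through the ProviderWrapper mentioned above.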
toParts := strings.SplitN(toStr, ":", 2)
switch fromParts[0] {
case "files":
fromFile := content.NewFile("")
It is indeed the continuation of it. FileStore actually implements content.Provider and content.Ingester, while File implements remotes.Resolver.
/cc @juliusl
Changes seem pretty straightforward, if it's working I say we merge.
@deitch, I'm fully supportive of #8, and defer to @shizhMSFT and @jdolitsky for the impact of merging. I would really like to see the split get complete so we can start adding the |
@shizhMSFT I got this to compile
)

// ProviderWrapper wraps a remote.Fetcher to make a content.Provider, which is useful for things
type ProviderWrapper struct {
You could get rid of this wrapper type, if you do it like this:
type fetcherReaderAt struct {
ctx context.Context
fetcher remotes.Fetcher
desc ocispec.Descriptor
rc io.ReadCloser
offset int64
}
func (f fetcherReaderAt) ReaderAt(ctx context.Context, desc ocispec.Descriptor) (content.ReaderAt, error) {
if f.fetcher == nil {
return nil, errors.New("no Fetcher provided")
}
return &fetcherReaderAt{
ctx: ctx,
fetcher: f.fetcher,
desc: desc,
offset: 0,
}, nil
}
and then in copy.go
handlers = append(handlers,
fetchHandler,
picker,
images.ChildrenHandler(&fetcherReaderAt{fetcher: store}),
)
What does that buy us? You now have a fetcherReaderAt, which has a call ReaderAt(), which returns... another fetcherReaderAt? Too easy to mess things up. I think it is better to have a simple clean structure that has a single call ReaderAt(), which returns what we want.
// Copy copy a ref from one target.Target to a ref in another target.Target. If toRef is blank, reuses fromRef
// Returns the root
// Descriptor of the copied item. Can use the root to retrieve child elements from target.Target.
func Copy(ctx context.Context, from target.Target, fromRef string, to target.Target, toRef string, opts ...CopyOpt) (ocispec.Descriptor, error) {
Sorry for adding more comments, but now that I've had a chance to ramp up I have a bit more context. I think it would be more concise if you remove the fromRef and toRef parameters, since you've already separated the targets into distinct references.
Also, instead of adding this new target interface, semantically you should be able to write this api using a single resolver. If the goal is to enable different storage mediums to resolve to, you just need to teach the resolver how to resolve to those stores before it can get to the base.
For example:
type memoryResolver struct {
remotes.Resolver
}
func (memoryResolver) Fetcher(ctx context.Context, ref string) (remotes.Fetcher, error) {
return &memoryResolver{}, nil
}
func (memoryResolver) Pusher(ctx context.Context, ref string) (remotes.Pusher, error) {
return &memoryResolver{}, nil
}
func (memoryResolver) Resolve(ctx context.Context, ref string) (string, ocispec.Descriptor, error) {
return "", ocispec.Descriptor{}, nil
}
func (m *memoryResolver) Push(ctx context.Context, d ocispec.Descriptor) (content.Writer, error) {
return nil, nil
}
func (m *memoryResolver) Fetch(ctx context.Context, desc ocispec.Descriptor) (io.ReadCloser, error) {
return nil, nil
}
That way you only need to add one file per target, and you would not have to add any new interfaces.
(func (*memoryResolver) works too, I just was typing too fast. XD)
Here is an example of what I mean:
package remotes
import (
"context"
"io"
"github.com/containerd/containerd/remotes"
ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)
// For example say we implemented these
type httpResolver struct {
url string
remotes.Resolver
}
type memoryResolver struct {
memory []byte
remotes.Resolver
}
// The usage would look like this
func example_usage(ctx context.Context) {
Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")(ctx, httpResolver{}, memoryResolver{})
}
func Copy(ctx context.Context, fromRef string, toRef string) func(context.Context, remotes.Resolver, remotes.Resolver) (ocispec.Descriptor, error) {
return router{from: fromRef, to: toRef}.copy
}
type router struct {
from string
to string
}
// This would be the copy function, stays pretty high level, and only really relies on io.Copy, but any io impl could work
func (r router) copy(ctx context.Context, from remotes.Resolver, to remotes.Resolver) (ocispec.Descriptor, error) {
_, fromDesc, err := from.Resolve(ctx, r.from)
if err != nil {
return ocispec.Descriptor{}, err
}
_, toDesc, err := to.Resolve(ctx, r.to)
if err != nil {
return ocispec.Descriptor{}, err
}
fetcher, err := from.Fetcher(ctx, r.from)
if err != nil {
return ocispec.Descriptor{}, err
}
pusher, err := to.Pusher(ctx, r.to)
if err != nil {
return ocispec.Descriptor{}, err
}
reader, err := fetcher.Fetch(ctx, fromDesc)
if err != nil {
return ocispec.Descriptor{}, err
}
writer, err := pusher.Push(ctx, toDesc)
if err != nil {
return ocispec.Descriptor{}, err
}
_, err = io.Copy(writer, reader)
if err != nil {
return ocispec.Descriptor{}, err
}
return fromDesc, nil
}
Going a bit further, it could lead to something like this:
func example_usage(ctx context.Context) (CopyFunc, CopyFunc) {
initialPull := Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")
nextPulls := Copy(ctx, "memory://localcache", "*")
return initialPull, nextPulls
}
func example_runtime(ctx context.Context, initial, next CopyFunc, requests <-chan (remotes.Resolver)) (ocispec.Descriptor, error) {
cache := memoryResolver{}
desc, err := initial(ctx, httpResolver{}, cache)
if err != nil {
return ocispec.Descriptor{}, err
}
for {
select {
case r := <-requests:
next(ctx, cache, r)
case <-ctx.Done():
return desc, nil
}
}
}
Sorry for adding more comments
Why sorry? That is what this is here for.
I got inspired by the conversation https://github.com/juliusl/piperesolver
I totally agree with what you're trying to accomplish. Let me clarify what I am intending to communicate.
interfaces are a good thing, not a bad thing. It provides an API for new implementations (including external users, whether they contribute upstream or not), and makes testing materially easier. I like the concept of having an interface that says, "this is what a target (remote, local, over UUCP, whatever) looks like, anything that matches to it is good." It also makes it much easier for someone to grasp conceptually, reducing time-to-adopt and time-to-ramp-up.
I agree and I was not intending to express an opinion on the value of interfaces. If I am understanding you correctly, you are trying to design a Transport interface, and I feel like that is an essential type. In my opinion an example of a library that executes this well is GRPC. The reason a GRPC server is so easy to use is because you only need to pass it a Listener to get started, which is a built-in interface. On the other end of the spectrum, compare this to something like WCF. WCF is pretty awesome on paper, it has pretty much any knob you can think of for you to tune, and you can achieve some pretty high performance stuff with it. However, the trade-off is the type calculus you have to do in order to derive even one-way communication.
The weakness I'm trying to discuss is:
Target is suspiciously like remotes.Resolver because, as of now, that is precisely what it is. That is likely to change going forward
In its current form it is only acting as a wrapper. Someone taking a dependency on this would still need to implement remotes.Resolver. When reviewing your PR I see no clear path of deriving a new Target under /pkg/target. What I'm trying to demonstrate with my examples above is how you can achieve this. Using remotes.Resolver as your "net.Listener" you can start implementing a library of concrete resolvers. And that would carve out a clear path for implementers and consumers, which becomes a win/win. Any time someone implements a new remotes.Resolver they get a target.Target for free.
To address some of the comments on my examples, excuse me for I was sharing an unfiltered stream of consciousness while I was writing those examples. To nitpick a bit..
I now have something that returns itself, recursively. I admit I did some of it as well in the PR here (I do try to limit it to simple use cases when there is no value in adding a layer, not always successfully).
It's not actually recursive. Defining a function with func (structtype) does not allocate a reference. The code point should be compiled statically. In golang struct{} is zero sized, which is the closest you can get to free. So if you put those two details together and write:
func (structtype) new() *structtype {
return &structtype{}
}
you are only creating a single new reference, which would make this iterative (adding one). This is the idiomatic constructor pattern. In C#, it's the same as:
record Type(string foo) { }
initialPull := Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")
this actually wouldn't fit requirements. What if the path for the above is to reach to a local registry? Or to a file? The identifier reference of an artifact can and should be distinct from the target where we read or write it. It may also have attributes like authentication parameters or proxies or conditions or other things that we haven't thought of.
I would like to reframe this a bit. The cool part I was trying to illustrate wasn't in the strings. What I want to demonstrate is that if you stick to returning a monad from your api, it opens up a lot of possibilities down the line. The content of the strings have no meaning until a concrete implementation does something with it, so whatever strings I choose for my example would be an implementation detail. It's my version of lorem ipsum.
The entire focus here, I believe, is on that one primary UX: func Copy(). How do we make it so it is easy to understand, easy to use, and easy to write additional endpoints that can be passed to it.
If there are ways to simplify it for the consumer, I am 100% for it.
We are on the same page, and my goal in this review is to talk about how to achieve that. By using Target as the main parameter of your Copy API, it means you would need a Target interface before you can use Copy, so it's worth discussing if that on its own is adding friction. Today, when you consume a remotes.Resolver, there is an expectation that it will automagically find the correct host to authenticate/negotiate with. This is the main pain point I observe and from an extensibility standpoint, it's currently trying to do too much at once.
To summarize, from a pragmatic side, I have no problem with getting this in. From a usability standpoint, I would like to see more code in pkg/target that paints a clearer picture on how to derive a target. In the code I am sharing above, my idea is that if you design the right remotes.Resolver, you can have a single implementation which can cover many use cases and be converted to Target. (For example, if you have code that can go from a Listener interface to a Target interface, that would be interesting.) Also, if you add a version of your API that can accept remotes.Resolver as parameters as well as Target parameters, it will make it easier to consume.
I got inspired by the conversation https://github.com/juliusl/piperesolver
Oh, nice. I like that. Being able to just pipe from one to the other. It is conceptually similar.
I totally agree with what you're trying to accomplish. Let me clarify what I am intending to communicate.
Thanks. Your help (and patience) is appreciated.
OK, that was a solid write-up, thank you.
Let's try to focus on your latter points, as they focus on the UX.
By using Target as the main parameter of your Copy API, it means you would need a Target interface before you can use Copy, so it's worth discussing if that on its own is adding friction. Today, when you consume a remotes.Resolver, there is an expectation that it will automagically find the correct host to authenticate/negotiate with. This is the main pain point I observe and from an extensibility standpoint, it's currently trying to do too much at once.
I think you are saying here that adding a Target (or remotes.Resolver, or anything that functionally represents a "target") adds a burden. So I have to derive a target before I can use it. So I cannot do the (quite common):
Copy("docker.io/library/nginx:latest", "my/local/path/nginx:latest")
Instead, I first need to:
- get a Target for docker.io
- get a Target for my local filesystem
- pass those and the refs into Copy()
If that is your objection, I get the point. The Target gives you a lot of options for doing things - new types of targets, authentication per target, even having two different auth schemes for the same target (one for the "from" and one for the "to"), using a specific target even if it is different than the URL (e.g. pulling docker.io/library/nginx:latest from quay.io or some local registry or filesystem). But that is flexibility, which comes with a burden.
I would happily wrap Copy() with a simple CopyRefs() that derives the Target where possible, and then expose a RefToTarget() that basically figures out the "default" target from the ref. Of course, if you want auth, or different hosts, etc, you will need to go to your own Target.
Is that where you were going?
Also, if you add a version of your API that can accept remotes.Resolver as parameters as well as Target parameters, it will make it easier to consume.
But more complicated, as we now have lots of variants. I get why Target "feels" a bit strange, with all of the years of remotes.Resolver under the belt. As long as we are confident that remotes.Resolver is our future, and we are tied to it (as it is in github.com/containerd/containerd), and we won't need functionality beyond what it has / will have, then sure, we can replace Target with just remotes.Resolver. I am not sure that is true, though. We want to get away from it, as far as I understand, while supporting pulling it in. I would ask @jdolitsky and @SteveLasker to weigh in on this specific part.
I'd like to see this get in, and make incremental improvements as issues arise
Some merge conflicts need to be resolved. @shizhMSFT are you ok tracking the changes as additional items on this PR?
@deitch, can you help resolve the merge conflicts so we can add the copy APIs? |
Yup, all taken care of |
Due to the potentially large impact (with greatness abound), @shizhMSFT, can you review as well before we hit the [big green button]? |
This PR is accepted as the basis of future development. Please also resolve the minor comments.
sort.Slice(descriptors, func(i, j int) bool {
return descriptors[i].Digest < descriptors[j].Digest
})
Why do we need to sort descriptors? We may need to keep the original order.
Related: oras-project/oras#304
Consistency. Although, in retrospect, I don't see why you might not want two distinct manifests for two distinct orderings. That is fair enough. If you want to open a PR to remove the sorting, I think that would work.
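The concern about ordering can be shown with a small sketch: sorting by digest can move a descriptor to a different position than the one the caller supplied, which matters whenever layer order is meaningful. Descriptor here is a local stand-in with placeholder digests, and sortedByDigest is a hypothetical helper that sorts a copy so the input order is left untouched.

```go
package main

import (
	"fmt"
	"sort"
)

// Descriptor is a stand-in for ocispec.Descriptor; Title is illustrative only.
type Descriptor struct {
	Digest string
	Title  string
}

// sortedByDigest returns a sorted copy, leaving the caller's slice as it was.
func sortedByDigest(in []Descriptor) []Descriptor {
	out := append([]Descriptor(nil), in...)
	sort.Slice(out, func(i, j int) bool {
		return out[i].Digest < out[j].Digest
	})
	return out
}

func main() {
	// The caller lists "base" first, then "overlay".
	layers := []Descriptor{
		{Digest: "sha256:ccc", Title: "base"},
		{Digest: "sha256:aaa", Title: "overlay"},
	}
	sorted := sortedByDigest(layers)
	fmt.Println("original first:", layers[0].Title)
	fmt.Println("sorted first:  ", sorted[0].Title)
}
```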
With the approves, what’s the next step here? |
Before this was merged, with |
@luisdavim is there a regression or are you looking for a sample on usage? /cc @deitch https://github.com/oras-project/oras-go/blob/main/examples/simple/simple_push_pull.go |
It would seem so. At least from the last stable tag. I've also tried the advanced example and couldn't get it to work. First it requires a config file to be passed, then if the file contains empty json I'm not at my computer but I can post some logs here later.
So, the issue with the example is solved with #23. The regression might not really be a regression, and it's possible that it's my misunderstanding, but the signature for func Pull(ctx context.Context, resolver remotes.Resolver, ref string, ingester content.Ingester, opts ...PullOpt) (ocispec.Descriptor, []ocispec.Descriptor, error) whilst the signature for func Copy(ctx context.Context, from target.Target, fromRef string, to target.Target, toRef string, opts ...CopyOpt) (ocispec.Descriptor, error) The main difference being that
Hey @luisdavim I am catching up on some issues now, so I'm going to ask that you open a new one for that (if you haven't yet, which I might discover in 5 mins). To explain the conceptual difference, the I had thought we had an example like that, but if not, we should include it. We wanted an option in If there isn't an open issue, let's get one open and move the whole discussion there, and fix it correctly.
Yes, both WithPullBaseHandler and WithPullCallbackHandler still exist and, having rechecked the implementation, still are called. Granted, these should be renamed to I wouldn't object to a PR that wraps those so that you might have a |
Thanks for the reply @deitch . I have 2 issues open, one is about discovering what exists on the remote and the other about getting the metadata from the manifests and layers. My use case is a plugin management system, a bit like |
One could make an argument that walking the tree is out of scope for oras. But I wouldn't mind a simple utility function here that does it. Even if it is out of scope, I think the option to grab some data on the fly while copy is happening is 100% within scope. Why not open that PR, and we will approve it and get it in? |
This is an implementation of the "copy API". It solves several outstanding issues with the oras go library, with the intent of making it easier to work with, easier to understand, and more flexible.
- Instead of Push and Pull, there is a single func Copy(). You copy from a ref in one Target to a ref (which may be the same as the first) in another Target.
- Target is suspiciously like remotes.Resolver because, as of now, that is precisely what it is. That is likely to change going forward.
This makes the interface much simpler to use and understand.
This also opens possibilities like using different URLs or different authentication for different targets. You treat a local bunch of files as a target (from or to) just like a remote registry. Memory, file, registry, oci layout, all are just targets.
The directory examples/advanced/ contains some good examples of how this is used.