Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy API #8

Merged
merged 1 commit into from
Sep 8, 2021
Merged

Copy API #8

merged 1 commit into from
Sep 8, 2021

Conversation

deitch
Copy link
Contributor

@deitch deitch commented Jun 27, 2021

This is an implementation of the "copy API". It solves several outstanding issues with the oras go library, with the intent of making it easier to work with, easier to understand, and more flexible.

  • Instead of Push and Pull, there is a single func Copy(). You copy from a ref in one Target to a ref (which may be the same as the first) in another Target
  • Target is suspiciously like remotes.Resolver because, as of now, that is precisely what it is. That is likely to change going forward

This makes the interface much simpler to use and understand.

This also opens possibilities like using different URLs or different authentication for different targets. You treat a local bunch of files as a target (from or to) just like a remote registry. Memory, file, registry, oci layout, all are just targets.

The directory examples/advanced/ contains some good examples of how this is used.

@deitch
Copy link
Contributor Author

deitch commented Jun 28, 2021

The core interface for a Target looks like this:

import (
	"github.com/containerd/containerd/remotes"
)

type Target interface {
	remotes.Resolver
}

As discussed on our call, I ran an experiment eliminating any direct dependencies on containerd as follows:

package target

import (
	"context"
	"io"
	"time"

	"github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// Target represents a place to which one can send/push or retrieve/pull artifacts.
// Anything that implements the Target interface can be used as a place to send or
// retrieve artifacts.
type Target interface {
	Resolve(ctx context.Context, ref string) (name string, desc ocispec.Descriptor, err error)
	Fetcher(ctx context.Context, ref string) (Fetcher, error)
	Pusher(ctx context.Context, ref string) (Pusher, error)
}

type Fetcher interface {
	Fetch(ctx context.Context, desc ocispec.Descriptor) (io.ReadCloser, error)
}

type Pusher interface {
	Push(ctx context.Context, d ocispec.Descriptor) (Writer, error)
}

type Writer interface {
	io.WriteCloser
	Digest() digest.Digest
	Commit(ctx context.Context, size int64, expected digest.Digest, opts ...Opt) error
	Status() (Status, error)
	Truncate(size int64) error
}

type Status struct {
	Ref       string
	Offset    int64
	Total     int64
	Expected  digest.Digest
	StartedAt time.Time
	UpdatedAt time.Time
}

type Opt func(*Info) error

func WithLabels(labels map[string]string) Opt {
	return func(info *Info) error {
		info.Labels = labels
		return nil
	}
}

type Info struct {
	Digest    digest.Digest
	Size      int64
	CreatedAt time.Time
	UpdatedAt time.Time
	Labels    map[string]string
}

and then have a wrapper to get from a containerd one to an oras one:

package target

import (
	"context"

	"github.com/containerd/containerd/content"
	"github.com/containerd/containerd/remotes"
	"github.com/opencontainers/go-digest"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

func FromContainerdResolver(resolver remotes.Resolver) Target {
	return &ContainerdResolverTarget{resolver: resolver}
}

type ContainerdResolverTarget struct {
	resolver remotes.Resolver
}

type containerdPusher struct {
	pusher remotes.Pusher
}

type containerdWriter struct {
	writer content.Writer
}

func (c *ContainerdResolverTarget) Resolve(ctx context.Context, ref string) (name string, desc ocispec.Descriptor, err error) {
	return c.resolver.Resolve(ctx, ref)
}
func (c *ContainerdResolverTarget) Fetcher(ctx context.Context, ref string) (Fetcher, error) {
	return c.resolver.Fetcher(ctx, ref)
}
func (c *ContainerdResolverTarget) Pusher(ctx context.Context, ref string) (Pusher, error) {
	p, err := c.resolver.Pusher(ctx, ref)
	if err != nil {
		return nil, err
	}
	return &containerdPusher{pusher: p}, nil
}

func (c *containerdPusher) Push(ctx context.Context, d ocispec.Descriptor) (Writer, error) {
	w, err := c.pusher.Push(ctx, d)
	if err != nil {
		return nil, err
	}
	return &containerdWriter{writer: w}, nil
}

func (c *containerdWriter) Write(p []byte) (n int, err error) {
	return c.writer.Write(p)
}
func (c *containerdWriter) Close() error {
	return c.writer.Close()
}
func (c *containerdWriter) Digest() digest.Digest {
	return c.writer.Digest()
}
func (c *containerdWriter) Commit(ctx context.Context, size int64, expected digest.Digest, opts ...Opt) error {
	return c.writer.Commit(ctx, size, expected)
}
func (c *containerdWriter) Status() (Status, error) {
	s, err := c.writer.Status()
	if err != nil {
		return Status{}, err
	}
	return Status{
		Ref:       s.Ref,
		Offset:    s.Offset,
		Total:     s.Total,
		Expected:  s.Expected,
		StartedAt: s.StartedAt,
		UpdatedAt: s.UpdatedAt,
	}, nil
}
func (c *containerdWriter) Truncate(size int64) error {
	return c.writer.Truncate(size)
}

so an existing containerd-compliant one would work.

However, I wasn't sure it really was worth all of that duplication.

@shizhMSFT
Copy link
Contributor

I see the problem with Go here. Basically, interfaces are not equal even if they have the same methods.

package main

import "fmt"

type Animal interface {
	Say()
}

type AnimalFarm interface {
	Produce() Animal
}

type Duck interface {
	Say()
}

type DuckFarm interface {
	Produce() Duck
}

type farm struct{}

func (farm) Produce() Duck {
	return duck{}
}

type duck struct{}

func (duck) Say() {
	fmt.Println("quak!")
}

func main() {
	var animalFarm AnimalFarm = farm{}
	duck := animalFarm.Produce()
	duck.Say()
}

The above code reports error even if Duck and Animal should be equivalent.

./prog.go:34:6: cannot use farm (type DuckFarm) as type AnimalFarm in assignment:
	DuckFarm does not implement AnimalFarm (wrong type for Produce method)
		have Produce() Duck
		want Produce() Animal

I think the duplication is temporary. Once we have our own implementation equivalent to remotes, content, images packages, the code will be cleaner and the ContainerdResolverTarget related code can be cleaned.

@deitch
Copy link
Contributor Author

deitch commented Jul 6, 2021

Agreed @shizhMSFT ; I wasn't sure it is worth it, though. I have a commit with all of the above, and it works (I didn't just type it into the PR comments :-) ). I just wasn't sure we wanted all of that duplication.


// Copy copy a ref from one target.Target to a ref in another target.Target. If toRef is blank, reuses fromRef
// Returns the root
// Descriptor of the copied item. Can use the root to retrieve child elements from target.Target.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use the root to retrieve child elements from target.Target

Is there an example or test that displays how to do this? For example, obtain a list of blob descriptors (layers) referenced by a root manifest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one in them, removed it for simplicity.

We might want to add a real example. But for now, there is one straight out of containerd, see images.Children()

Needless to say, that takes a content.Provider rather than a remotes.Fetcher, but we actually provider ProviderWrapper

Eventually, we might just want to replace images.ChildrenHandler with our own; one more reduction in containerd dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also does the work in #17 help with this?

toParts := strings.SplitN(toStr, ":", 2)
switch fromParts[0] {
case "files":
fromFile := content.NewFile("")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deitch a question for my understanding - is this content.NewFile (which creates a new File interface) basically replacing / renaming what is currently called a FileStore? ->

type FileStore struct {
. Are there subtle differences between the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed the continuation of it. FileStore actually implements content.Provider and content.Ingester, while File implements remote.Resolver.

@sajayantony
Copy link
Contributor

/cc @juliusl

Copy link

@juliusl juliusl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem pretty straightforward, if it's working I say we merge.

@SteveLasker
Copy link
Contributor

@deitch, I'm fully supportive of #8, and defer to @shizhMSFT and @jdolitsky for the impact of merging. I would really like to see the split get complete so we can start adding the artfifact.manifest be implemented within the copy apis.

@juliusl
Copy link

juliusl commented Aug 26, 2021

@shizhMSFT I got this to compile

package main

import "fmt"

type Animal interface {
	Say()
}

type AnimalFarm interface {
	Produce() Animal
}

type Duck interface {
	Say()
}

type DuckFarm interface {
	Produce() Duck
}

type farm struct{}

// Was:
// ````
// func (farm) Produce() Duck {
// 	return duck{}
// }
// ````

func (farm) Produce() Animal {
	return farm{}.produce()
}

func (farm) produce() Duck {
	return duck{}
}

type duck struct{}

func (duck) Say() {
	fmt.Println("quak!")
}

func main() {
	var animalFarm AnimalFarm = farm{}
	duck := animalFarm.Produce()
	duck.Say()
}

)

// ProviderWrapper wraps a remote.Fetcher to make a content.Provider, which is useful for things
type ProviderWrapper struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could get rid of this wrapper type, if you do it like this:

type fetcherReaderAt struct {
	ctx     context.Context
	fetcher remotes.Fetcher
	desc    ocispec.Descriptor
	rc      io.ReadCloser
	offset  int64
}

func (f fetcherReaderAt) ReaderAt(ctx context.Context, desc ocispec.Descriptor) (content.ReaderAt, error) {
	if f.fetcher == nil {
		return nil, errors.New("no Fetcher provided")
	}
	return &fetcherReaderAt{
		ctx:     ctx,
		fetcher: f.fetcher,
		desc:    desc,
		offset:  0,
	}, nil
}

and then in copy.go

	handlers = append(handlers,
		fetchHandler,
		picker,
		images.ChildrenHandler(&fetcherReaderAt{fetcher: store}),
	)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that buy us? You now have a fetcherReaderAt, which has a call ReaderAt(), which returns... another fetcherReaderAt? Too easy to mess things up. I think it is better to have a simple clean structure that has a single call ReaderAt(), which returns what we want.

// Copy copy a ref from one target.Target to a ref in another target.Target. If toRef is blank, reuses fromRef
// Returns the root
// Descriptor of the copied item. Can use the root to retrieve child elements from target.Target.
func Copy(ctx context.Context, from target.Target, fromRef string, to target.Target, toRef string, opts ...CopyOpt) (ocispec.Descriptor, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for adding more comments, but now that I've had a chance to ramp up I have a bit more context. I think it would be more concise if you remove the fromRef and toRef parameters, since you've already separated the targets into distinct references.

Also, instead of adding this new target interface, semantically you should be able to write this api using a single resolver. If the goal is to enable different storage mediums to resolve to, you just need to teach the resolver how to resolve to those stores before it can get to the base.

For example:

type memoryResolver struct {
	remotes.Resolver
}

func (memoryResolver) Fetcher(ctx context.Context, ref string) (remotes.Fetcher, error) {
	return &memoryResolver{}, nil
}

func (memoryResolver) Pusher(ctx context.Context, ref string) (remotes.Pusher, error) {
	return &memoryResolver{}, nil
}

func (memoryResolver) Resolve(ctx context.Context, ref string) (string, ocispec.Descriptor, error) {
	return "", ocispec.Descriptor{}, nil
}

func (m *memoryResolver) Push(ctx context.Context, d ocispec.Descriptor) (content.Writer, error) {
	return nil, nil
}

func (m *memoryResolver) Fetch(ctx context.Context, desc ocispec.Descriptor) (io.ReadCloser, error) {
	return nil, nil
}

That way you only need to add one file per target, and you would not have to add any new interfaces.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(func (*memoryResolver) works too I just was typing too fast. XD)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example of what I mean:

package remotes

import (
	"context"
	"io"

	"github.com/containerd/containerd/remotes"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// For example say we implemented these 
type httpResolver struct {
	url string
	remotes.Resolver
}

type memoryResolver struct {
	memory []byte
	remotes.Resolver
}

// The usage would look like this 
func example_usage(ctx context.Context) {
	Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")(ctx, httpResolver{}, memoryResolver{})
}

func Copy(ctx context.Context, fromRef string, toRef string) func(context.Context, remotes.Resolver, remotes.Resolver) (ocispec.Descriptor, error) {
	return router{from: fromRef, to: toRef}.copy
}

type router struct {
	from string
	to   string
}

// This would be the copy function, stays pretty high level, and only really relies on io.Copy, but any io impl could work
func (r router) copy(ctx context.Context, from remotes.Resolver, to remotes.Resolver) (ocispec.Descriptor, error) {
	_, fromDesc, err := from.Resolve(ctx, r.from)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	_, toDesc, err := to.Resolve(ctx, r.to)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	fetcher, err := from.Fetcher(ctx, r.from)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	pusher, err := to.Pusher(ctx, r.to)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	reader, err := fetcher.Fetch(ctx, fromDesc)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	writer, err := pusher.Push(ctx, toDesc)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	_, err = io.Copy(writer, reader)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	return fromDesc, nil
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going a bit further, it could lead to something like this:

func example_usage(ctx context.Context) (CopyFunc, CopyFunc) {
	initialPull := Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")
	nextPulls := Copy(ctx, "memory://localcache", "*")

	return initialPull, nextPulls
}

func example_runtime(ctx context.Context, initial, next CopyFunc, requests <-chan (remotes.Resolver)) (ocispec.Descriptor, error) {
	cache := memoryResolver{}
	desc, err := initial(ctx, httpResolver{}, cache)
	if err != nil {
		return ocispec.Descriptor{}, err
	}

	for {
		select {
		case r := <-requests:
			next(ctx, cache, r)
		case <-ctx.Done():
			return desc, nil
		}
	}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for adding more comments

Why sorry? That is what this is here for.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got inspired by the conversation https://github.com/juliusl/piperesolver

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with what you're trying to accomplish. Let me clarify what I am intending to communicate.

interfaces are a good thing, not a bad thing. It provides an API for new implementations (including external users, whether they contribute upstream or not), and makes testing materially easier. I like the concept of having an interface that says, "this is what a target (remote, local, over UUCP, whatever) looks like, anything that matches to it is good." It also makes it much easier for someone to grasp conceptually, reducing time-to-adopt and time-to-ramp-up.

I agree and I was not intending to express an opinion on the value of interfaces. If I am understanding you correctly, you are trying to design a Transport interface, and I feel like that is an essential type. In my opinion an example of a library that executes this well is GRPC. The reason a GRPC server is so easy to use is because you only need to pass it a Listener to get started, which is a built in interface. On the other end of the spectrum, compare this to something like WCF. WCF is pretty awesome on paper, it has pretty much any knob you can think for you to tune, and you can achieve some pretty high performance stuff with it. However, the trade off is the type calculus you have to do in order to derive even one way communication.

The weakness I'm trying to discuss is:

Target is suspiciously like remotes.Resolver because, as of now, that is precisely what it is. That is likely to change going forward

In it's current form it is only acting as a wrapper. Someone taking a dependency on this would still need to implement remotes.Resolver. When reviewing your PR I see no clear path of deriving a new Target under /pkg/target. What I'm trying to demonstrate with my examples above is how you can achieve this. Using remotes.Resolver as your "net.Listener" you can start implementing a library of concrete resolvers. And that would carve out a clear path for implementers and consumers, which becomes a win/win. Any time someone implements a new remotes.Resolver they get a target,Target for free.

To address some of the comments on my examples, excuse me for I was sharing an unfiltered stream of consciousness while I was writing those examples. To nitpick a bit..

I now have something that returns itself, recursively. I admit I did some of it as well in the PR here (I do try to limit it to simple use cases when there is no value in adding a layer, not always successfully).

It's not actually recursive. Defining a function with func (structtype) does not allocate a reference. The code point should be compiled statically. In golang struct{} is zero sized, which is the closest you can get to free. So if you put those two details together and write:

func (structtype) new() *structtype {
return &structtype{}

you are only creating a single new reference, which would make this iterative (adding one). This is the idiomatic constructor pattern. In C#, it's the same as:

record Type(string foo) { }

initialPull := Copy(ctx, "test.azurecr.io/ubuntu:latest", "memory://localcache")
this actually wouldn't fit requirements. What if the path for the above is to reach to a local registry? Or to a file? The identifier reference of an artifact can and should be distinct from the target where we read or write it. It may also have attributes like authentication parameters or proxies or conditions or other things that we haven't thought of.

I would like to reframe this a bit. The cool part I was trying to illustrate wasn't in the strings. What I want to demonstrate is that if you stick to returning a monad from your api, it opens up a lot of possibilities down the line. The content of the strings have no meaning until a concrete implementation does something with it, so whatever strings I choose for my example would be an implementation detail. It's my version of lorem ipsum.

The entire focus here, I believe, is on that one primary UX: func Copy(). How do we make it so it is easy to understand, easy to use, and easy to write additional endpoints that can be passed to it.

If there are ways to simplify it for the consumer, I am 100% for it.

We are on the same page, and my goal in this review is to talk about how to achieve that. By using Target as the main parameter of your Copy API, it means you would need a Target interface before you can use Copy, so it's worth discussing if that on it's own is adding friction. Today, when you consume a remotes.Resolver, there is an expectation that it will automagically find the correct host to authenticate/negotiate with. This is the main pain point I observe and from an extensibility standpoint, it's currently trying to do too much at once.

To summarize, from a pragmatic side, I have no problem with getting this in. From a usability standpoint, I would like to see more code in pkg/target that paints a clearer picture on how to derive a target. In the code I am sharing above, my idea is that if you design the right remotes.Resolver, you can have a single implementation which can cover many use cases and be converted to Target. (For example if you have code that can go from a Listener interface to a Target interface that would be interesting) Also, if you add a version of your API can accept remotes.Resolver as parameters as well as Target parameters, it will make it easier to consume.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got inspired by the conversation https://github.com/juliusl/piperesolver

Oh, nice. I like that. Being able to just pipe from one to the other. It is conceptually similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with what you're trying to accomplish. Let me clarify what I am intending to communicate.

Thanks. You help (and patience) is appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that was a solid write-up, thank you.

Let's try to focus on your latter points, as they focus on the UX.

By using Target as the main parameter of your Copy API, it means you would need a Target interface before you can use Copy, so it's worth discussing if that on it's own is adding friction. Today, when you consume a remotes.Resolver, there is an expectation that it will automagically find the correct host to authenticate/negotiate with. This is the main pain point I observe and from an extensibility standpoint, it's currently trying to do too much at once.

I think you are saying here that adding a Target (or remotes.Resolver, or anything that functionally represents a "target") adds a burden. So I have to derive a target before I can use it. So I cannot do the (quite common):

Copy("docker.io/library/nginx:latest", "my/local/path/nginx:latest")

Instead, I first need to:

  • get a Target for docker.io
  • get a Target for my local filesystem
  • pass those and the refs into Copy()

If that is your objection, I get the point. The Target gives you a lot of options for doing things - new types of targets, authentication per target, even having two different auth schemes for the same target (one for the "from" and one for the "to"), using a specific target even if it is different than the URL (e.g. pulling docker.io/library/nginx:latest from quay.io or some local registry or filesystem). But that is flexibility, which comes with a burden.

I would happily wrap Copy() with a simple CopyRefs() that derives the Target where possible, and then expose a RefToTarget() that basically figures out the "default" target from the ref. Of course, if you want auth, or different hosts, etc, you will need to go to your own Target.

Is that where you were going?

Also, if you add a version of your API can accept remotes.Resolver as parameters as well as Target parameters, it will make it easier to consume.

But more complicated, as we now have lots of variants. I get why Target "feels" a bit strange, with all of the years of remotes.Resolver under the belt. As long as we are confident that remotes.Resolver is our future, and we are tied to it (as it is in github.com/containerd/containerd), and we won't need functionality beyond what it has / will have, then sure, we can replace Target with just remotes.Resolver. I am not sure that is true, though. We want to get away from it, as far as I understand, while supporting pulling it in. I would ask @jdolitsky and @SteveLasker to weigh in on this specific part.

Copy link
Contributor

@jdolitsky jdolitsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this get in, and make incremental improvements as issues arise

Copy link
Contributor

@sajayantony sajayantony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to resolve some merge conflicts that need to be resolved. @shizhMSFT are you ok tracking the changes as additional items on this PR?

@SteveLasker
Copy link
Contributor

@deitch, can you help resolve the merge conflicts so we can add the copy APIs?

Signed-off-by: Avi Deitcher <avi@deitcher.net>
@deitch
Copy link
Contributor Author

deitch commented Sep 2, 2021

Yup, all taken care of

@SteveLasker
Copy link
Contributor

Due to the potentially large impact (with greatness abound), @shizhMSFT, can you review as well before we hit the [big green button]?

Copy link
Contributor

@shizhMSFT shizhMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is accepted as the basis of future development. Please also resolve the minor comments.

pkg/content/file.go Show resolved Hide resolved
Comment on lines +529 to +531
sort.Slice(descriptors, func(i, j int) bool {
return descriptors[i].Digest < descriptors[j].Digest
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to sort descriptors? We may need to keep the orignal order.

Related: oras-project/oras#304

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency. Although, in retrospect, I don't see why you might not want two distinct manifests for two distinct orderings. That is fair enough. If you want to open a PR to remove the sorting, I think that would work.

pkg/content/memory.go Show resolved Hide resolved
pkg/content/oci.go Show resolved Hide resolved
pkg/content/readerat.go Show resolved Hide resolved
@SteveLasker
Copy link
Contributor

With the approves, what’s the next step here?
Are we merging with issues to resolve the opening questions? Or, will the outstanding items be addressed before merging?

@luisdavim
Copy link
Contributor

Before this was merged, with oras.Pull() I'd get a list of the pulled artefacts, how would I get that with oras.Copy()?

@sajayantony
Copy link
Contributor

sajayantony commented Sep 9, 2021

@luisdavim is there a regression or are you looking for a sample on usage? /cc @deitch

https://github.com/oras-project/oras-go/blob/main/examples/simple/simple_push_pull.go

@luisdavim
Copy link
Contributor

It would seem so. At least from the last stable tag.

I've also tried the advanced example and couldn't get it to work. First it requires a config file to be passed, them if the file contains empty json {} I get a 405 from the registry and one I got passed that I get a 404 after the layer is uploaded. I'm using AWS ECR for my test...

I'm not at my computer but I can post some logs here later.

@luisdavim
Copy link
Contributor

So, the issue with the example is solved with #23

The regression might not really be a regression, and it's possible that is that it's my misunderstanding but, the signature for Pull was:

func Pull(ctx context.Context, resolver remotes.Resolver, ref string, ingester content.Ingester, opts ...PullOpt) (ocispec.Descriptor, []ocispec.Descriptor, error)

whilst the signature for Copy is:

func Copy(ctx context.Context, from target.Target, fromRef string, to target.Target, toRef string, opts ...CopyOpt) (ocispec.Descriptor, error)

The main difference being that Pull was returning the list of pulled layers []ocispec.Descriptor and Copy doesn't return that, so, how do I get access to it now?

@deitch deitch deleted the copy-api branch September 10, 2021 07:54
@deitch
Copy link
Contributor Author

deitch commented Sep 10, 2021

Hey @luisdavim I am catching up on some issues now, so gong to ask that you open a new one for that (if you haven't yet, which I might discover in 5 mins).

To explain the conceptual difference, the Pull() returning layers was always a bit of a weird thing. Why should it return layers and not config? Or if it is an index, why not some or all of its children? The cleaner implementation is to just return the Descriptor for the copied resource, and then use that to walk the tree (via either the From or the To target), if that is what you want.

I had thought we had an example like that, but if not, we should include it.

We wanted an option in CopyOpt to pass the blobs through a handler func that could then send things outward. It is much more flexible and consistent than arbitrarily returning the layers. Working off of top of my head (which rarely is good idea), I cannot recall if we implemented it or not. The only value is to allow you to do it in transit, vs having to go to the From or To target (before or after) and process it there. That doesn't add much if one of them is a Memory or File target, but if both are something remote, it can be helpful.

If there isn't an open issue, let's get one open and move the whole discussion there, and fix it correctly.

@deitch
Copy link
Contributor Author

deitch commented Sep 10, 2021

We wanted an option in CopyOpt to pass the blobs through a handler func

Yes, both WithPullBaseHandler and WithPullCallbackHandler still exist and, having rechecked the implementation, still are called. Granted, these should be renamed to WithCopy* instead of WithPull*, but they still function.

I wouldn't object to a PR that wraps those so that you might have a WithLayerDescriptors(layers []ocispec.Descriptor) that you can pass, and when it is done, layers is populated. It just would wrap the above, but make it convenient for reuse.

@luisdavim
Copy link
Contributor

luisdavim commented Sep 10, 2021

Thanks for the reply @deitch .
Yeah, you are right, I don't need the copy function to return the list of layers but a way to walk the tree, as you mentioned.

I have 2 issues open, one is about discovering what exists on the remote and the other about getting the metadata from the manifests and layers.

My use case is a plugin management system, a bit like krew and I was looking for being able to read the manifest annotations and then loop through the layers and get their annotations as well.

@deitch
Copy link
Contributor Author

deitch commented Sep 10, 2021

walk the tree

One could make an argument that walking the tree is out of scope for oras. But I wouldn't mind a simple utility function here that does it. Even if it is out of scope, I think the option to grab some data on the fly while copy is happening is 100% within scope.

Why not open that PR, and we will approve it and get it in?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants