image: Refactor to use cas/ref engines instead of walkers #5

Open
wants to merge 14 commits into
from

5 participants

@wking
Contributor
wking commented Sep 16, 2016

Abstract out the details of the image-layout format, since the validators and unpackers shouldn't care about that sort of backend stuff. This should make it easy to build backends based on zip files, HTTP, the Docker registry format, etc.

Carried from opencontainers/image-spec#159. The outstanding issue there was what to do about tar handling. I'm fine with this PR as it stands, but I've floated a few alternatives in case the maintainers want to pick an alternative path.

@runcom
Contributor
runcom commented Sep 16, 2016
@wking wking referenced this pull request in opencontainers/image-spec Sep 16, 2016
Closed

image: Refactor to use cas/ref engines instead of walkers #159

@wking
Contributor
wking commented Sep 17, 2016

I've pushed 9687484 → 90c4fea with:

  • An ‘=’ → ‘:=’ fix in cmd/oci-refs/list.go's printName.
  • O_RDWR → O_WRONLY in image/layout/tar.go's CreateTarFile.
  • A new commit adding directory-backed engines (to complement the
    previous tar-backed engines). The directory-backed engines are much
    nicer ;).

On Fri, Sep 16, 2016 at 01:53:40AM -0700, W. Trevor King wrote:

-- Commit Summary --

  • Makefile: Add a pattern rule for oci-* commands
  • image/cas: Add a generic CAS interface
  • image/refs: Add a generic name-based reference interface
  • specs-go: Add ImageLayoutVersion and check oci-layout in tar engines
  • image/layout/tar.go: Add TarEntryByName
  • image/cas/put: Add a PutJSON helper
  • image/cas: Implement Engine.Put
  • vendor: Bundle golang.org/x/net/context
  • image/*/interface: Add unstable warnings to Engines
  • image/layout/tar: Add a CreateTarFile helper
  • image/refs: Implement Engine.Put
  • image: Refactor to use cas/ref engines instead of walkers
  • cmd: Document the cas, refs, and init commands

-- File Changes --

M .gitignore (5)
M Makefile (20)
A cmd/oci-cas/get.go (93)
A cmd/oci-cas/main.go (38)
A cmd/oci-cas/oci-cas-get.1.md (27)
A cmd/oci-cas/oci-cas-put.1.md (27)
A cmd/oci-cas/oci-cas.1.md (47)
A cmd/oci-cas/put.go (83)
M cmd/oci-create-runtime-bundle/main.go (9)
M cmd/oci-create-runtime-bundle/oci-create-runtime-bundle.1.md (4)
A cmd/oci-image-init/image_layout.go (57)
A cmd/oci-image-init/main.go (37)
A cmd/oci-image-init/oci-image-init-image-layout.1.md (27)
A cmd/oci-image-init/oci-image-init.1.md (33)
A cmd/oci-image-tools.7.md (40)
M cmd/oci-image-validate/main.go (14)
M cmd/oci-image-validate/oci-image-validate.1.md (8)
A cmd/oci-refs/get.go (78)
A cmd/oci-refs/list.go (75)
A cmd/oci-refs/main.go (39)
A cmd/oci-refs/oci-refs-get.1.md (27)
A cmd/oci-refs/oci-refs-list.1.md (27)
A cmd/oci-refs/oci-refs-put.1.md (27)
A cmd/oci-refs/oci-refs.1.md (55)
A cmd/oci-refs/put.go (81)
M cmd/oci-unpack/main.go (13)
M cmd/oci-unpack/oci-unpack.1.md (4)
M glide.lock (10)
M image/autodetect.go (3)
A image/cas/interface.go (47)
A image/cas/layout/interface.go (25)
A image/cas/layout/main.go (37)
A image/cas/layout/tar.go (108)
A image/cas/put.go (47)
M image/config.go (69)
M image/descriptor.go (115)
M image/image.go (202)
A image/layout/layout.go (22)
A image/layout/tar.go (267)
M image/manifest.go (119)
M image/manifest_test.go (45)
A image/refs/interface.go (82)
A image/refs/layout/main.go (37)
A image/refs/layout/tar.go (135)
D image/walker.go (118)
A vendor/golang.org/x/net/LICENSE (27)
A vendor/golang.org/x/net/PATENTS (22)
A vendor/golang.org/x/net/context/context.go (156)
A vendor/golang.org/x/net/context/go17.go (72)
A vendor/golang.org/x/net/context/pre_go17.go (300)

-- Patch Links --

https://github.com/opencontainers/image-tools/pull/5.patch
https://github.com/opencontainers/image-tools/pull/5.diff

@wking
Contributor
wking commented Sep 17, 2016

Pushed 90c4fea → 7c9efa7 adding:

  • oci-cas delete …

  • oci-refs delete …

  • DirDelete to avoid the following dupl lint warnings:

    $ make lint
    checking lint
    image/cas/layout/dir.go:108::warning: duplicate of image/refs/layout/dir.go:156-167 (dupl)
    image/refs/layout/dir.go:156::warning: duplicate of image/cas/layout/dir.go:108-119 (dupl)

@xiekeyang
Contributor
xiekeyang commented Sep 22, 2016 edited

@wking

This patch seems too large to be merged soon. Could you please split it and submit one by one commit? We now need this method instead of walking, so that we can move forward with other functionality based on it. I'm working locally on xiekeyang@ff04013, which refers to your implementation; it is a temporary solution and should wait for your patch to be merged. So I hope your commits can be merged. Thanks so much.

@wking
Contributor
wking commented Sep 22, 2016

On Wed, Sep 21, 2016 at 05:27:46PM -0700, xiekeyang wrote:

Could you please split it and submit one by one commit?

First commit submitted separately as #28. If/when that lands, I'll
file a PR for the next commit, etc. (unless this PR has already landed
in its entirety ;).

@wking
Contributor
wking commented Oct 11, 2016

On Wed, Sep 21, 2016 at 10:54:19PM -0700, W. Trevor King wrote:

On Wed, Sep 21, 2016 at 05:27:46PM -0700, xiekeyang wrote:

Could you please split it and submit one by one commit?

First commit submitted separately as #28. If/when that lands, I'll
file a PR for the next commit, etc. (unless this PR has already
landed in its entirety ;).

Rebased onto master with 7c9efa7 → 1926150 now that #28 has landed.
Bottom of that stack submitted separately as #40 in case that
continues to help review.

@wking
Contributor
wking commented Oct 15, 2016

I've pushed 1926150 → ee44a09 which:

  • Checks the result of the DirEngine.Put cleanup Remove, and returns a wrapped error if both the Put and the cleanup Remove fail.
  • Adds a comment describing the public DirDelete function.
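
The first bullet's error handling can be sketched roughly as follows. This is a minimal illustration, not the PR's DirEngine code: `putWithCleanup`, `failWith`, and `succeed` are hypothetical names, and the real Put and Remove calls are replaced with plain callbacks.

```go
package main

import (
	"errors"
	"fmt"
)

// putWithCleanup runs put and, if it fails, runs cleanup. If the cleanup
// fails as well, both failures are reported in a single wrapped error.
// Both callbacks are hypothetical stand-ins for the engine's Put and Remove.
func putWithCleanup(put, cleanup func() error) error {
	err := put()
	if err == nil {
		return nil
	}
	if err2 := cleanup(); err2 != nil {
		return fmt.Errorf("put failed (%v); cleanup also failed: %v", err, err2)
	}
	return err
}

// failWith returns a callback that always fails with msg.
func failWith(msg string) func() error {
	return func() error { return errors.New(msg) }
}

// succeed is a callback that never fails.
func succeed() error { return nil }

func main() {
	fmt.Println(putWithCleanup(failWith("disk full"), failWith("permission denied")))
	fmt.Println(putWithCleanup(failWith("disk full"), succeed))
}
```

When only the Put fails, the caller sees the original error unchanged; the wrapped form appears only when the cleanup fails too.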
@wking
Contributor
wking commented Oct 15, 2016

I've pushed ee44a09 → 4dfe06f fixing a few more lint issues with the
tip commit:

  • Cleanup Remove error wrapping in the cas DirEngine.Put (previously
    I'd just fixed the refs DirEngine.Put).
  • Check the io.Copy error status in cas DirEngine.Put.
  • Don't mask err in cas TarEngine.Put now that it's using blobPath.
@cyphar

This is nice. I've been playing with it (the parts that compile) and it's very promising. The only request I have is that we add some documentation about how all of these tools will fit together to allow for the creation of an image (it's a bit complicated since you need to combine ~5 different tools in order for it to work).

/oci-create-runtime-bundle
-/oci-unpack
@cyphar
cyphar Oct 24, 2016 Member

Why did you remove this from .gitignore?

@wking
wking Oct 24, 2016 Contributor

It's down below, I just alphabetized it.

+ }
+ defer engine.Close()
+
+ return engine.List(ctx, "", -1, 0, state.printName)
@cyphar
cyphar Oct 24, 2016 Member

For me this fails with

../../.local/src/image-tools/cmd/oci-refs/list.go:69: cannot use state.printName (type func("image-tools/vendor/golang.org/x/net/context".Context, string) error) as type refs.ListNameCallback in argument to engine.List

Isn't Go an amazing language? They didn't have vendoring so people rolled their own, then they implemented an incompatible version that breaks if the wind blows in the wrong direction. Absolutely amazing.

@wking
wking Oct 24, 2016 Contributor

Strange. Travis and I both compile this successfully. I'll try and dig into this today.

@cyphar
cyphar Oct 24, 2016 edited Member

In which case, I know why it's happening. It's because I have my gopath setup so that GOPATH=~ and then in ~/src/ I have symlinks like github.com/opencontainers -> ~/src which basically keeps ~/src less insane. Unfortunately this is not the first time I've had problems. sigh Don't worry about it, it's an issue on my end (though I maintain it's a Go compiler issue).

@cyphar
cyphar Oct 25, 2016 Member

I figured it out and opened a PR. #52 fixes the issue.

+ return err
+ }
+
+ return engine.Put(ctx, state.name, &descriptor)
@cyphar
cyphar Oct 24, 2016 Member

This fails with

../../.local/src/image-tools/cmd/oci-refs/put.go:80: cannot use &descriptor (type *"image-tools/vendor/github.com/opencontainers/image-spec/specs-go".Descriptor) as type *"github.com/opencontainers/image-tools/vendor/github.com/opencontainers/image-spec/specs-go".Descriptor in argument to engine.Put
@wking
Contributor
wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 05:58:26AM -0700, Aleksa Sarai wrote:

The only request I have is that we add some documentation about how all of these tools will fit together to allow for the creation of an image (it's a bit complicated since you need to combine ~5 different tools in order for it to work).

We don't have sufficient tools to create an image until something like #8 lands. But I agree that having overview docs is good for providing an entry-point for new users. I have something like that in this PR with oci-image-tools(7) and there are similar README changes in flight with #48. If you want something besides those or you want them extended, can you be more specific about what you're looking for?

@cyphar
Member
cyphar commented Oct 24, 2016

@wking While #48 is good atm, I'd really like to also have just a shell session which shows that you can create an image with just this tooling (including #8 once it's merged):

% oci-create-layer base root | oci-cas put image
% rm -r base; mv root base; cp -r base root # this can be replaced if we implement #8 using go-mtree.
% # some more create-layer commands
% # some oci-refs commands to show adding refs
% # tar up the image to give you an image tar

Then provide some examples of validating and finally unpacking the image we've created (along with the runtime config.json). The only thing I can currently see that might be missing from the above is a way to set runtime metadata for image layers. TBH, I'm not completely familiar with the image-spec but AFAICS that required tooling is still missing. And by having an example script which creates a full image, it's much more clear what tooling needs to be added to the toolkit -- as well as being a very useful guide for people who want to use the tooling.

@wking
Contributor
wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 12:46:14PM -0700, Aleksa Sarai wrote:

While #48 is good atm, I'd really like to also have just a shell session which shows that you can create an image with just this tooling (including #8 once it's merged):

I don't talk about the full image-creation logic (because you can't do that without something like #8), but there are currently a few shell-sessions in the docs (e.g. here and here). Do you have shell lines you'd like to see that aren't covered in those examples?

@cyphar
Member
cyphar commented Oct 24, 2016

@wking We can add the examples after this PR and #8 is merged. I can't give you a verbatim shell session because I can't compile this PR at the moment 😉 but I would like to have a single shell session inside the man page for oci-image-tool which shows the process for going from nothing to a full OCI image using our tooling. Do you understand what I mean? The examples you linked only use single tools, we need to show how the tooling works together.

@wking
Contributor
wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 01:42:11PM -0700, Aleksa Sarai wrote:

We can add the examples after this PR and #8 is merged… I would like to have a single shell session inside the man page for oci-image-tool which shows the process for going from nothing to a full OCI image using our tooling.

That makes sense, and I'm happy to come back and do this after this PR and #8 (or something like them) have landed.

… we need to show how the tooling works together.

Combinatorics make this expensive to do in general, but going from nothing to an image and then to a runtime bundle is a limited enough target (once we can actually complete that process).

image/cas/layout/dir.go
+ return nil, err
+ }
+
+ engine.temp = filepath.Join(path, "tmp")
@cyphar
cyphar Oct 25, 2016 Member

This isn't deleted when I run oci-cas. Should there be a defer statement somewhere (not here obviously but somewhere else)? Maybe add a cleanup method for DirEngine and defer it in the oci-cas caller?

@cyphar
cyphar Oct 25, 2016 Member

Also (bit of a pedantic thing) IMO this should be a randomly generated name because you don't want to intentionally make collisions bad (for example two people modifying an image at the same time). Or if an image setup failed and we didn't clean up, we should avoid messing with broken state.

@wking
wking Oct 25, 2016 edited Contributor

There is a cleanup method (Close), but closing one CAS/ref engine rooted in this directory doesn't mean we have closed all such engines and can remove this directory. If we give the directory a random name we could remove it in Close. Do you mind having multiple random tmp-****** siblings with random blob-****** content?

[edit: Fixed Delete (which removes a blob) → Close (which closes the engine) for the cleanup method].

@cyphar
cyphar Oct 25, 2016 Member

Yeah, I think that tmp-***** siblings is a better way than just tmp. Cheers.

@wking
wking Oct 25, 2016 Contributor

Updated from tmp to tmp-****** with 4dfe06f → 512176d.
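
A rough sketch of the tmp-****** idea from this thread, assuming a hypothetical `dirEngine` type (the PR's actual DirEngine is not reproduced here). Each engine gets its own randomly named scratch directory, so Close can remove it without touching a sibling engine's state. `os.MkdirTemp` is the current standard-library call; Go releases from the time of this PR would have used `ioutil.TempDir`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// dirEngine is a hypothetical stand-in for a directory-backed engine.
type dirEngine struct {
	temp string // scratch directory owned by this engine only
}

// newDirEngine creates a uniquely named tmp-* directory under root.
func newDirEngine(root string) (*dirEngine, error) {
	temp, err := os.MkdirTemp(root, "tmp-")
	if err != nil {
		return nil, err
	}
	return &dirEngine{temp: temp}, nil
}

// Close removes this engine's scratch directory and nothing else.
func (e *dirEngine) Close() error {
	return os.RemoveAll(e.temp)
}

// demo opens two engines on one root. It reports whether their scratch
// directories are distinct tmp-* siblings, and whether closing the first
// engine leaves the second engine's directory in place.
func demo() (bool, bool, error) {
	root, err := os.MkdirTemp("", "image-layout-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	a, err := newDirEngine(root)
	if err != nil {
		return false, false, err
	}
	b, err := newDirEngine(root)
	if err != nil {
		return false, false, err
	}
	defer b.Close()

	distinct := a.temp != b.temp &&
		strings.HasPrefix(filepath.Base(a.temp), "tmp-") &&
		strings.HasPrefix(filepath.Base(b.temp), "tmp-")

	if err := a.Close(); err != nil {
		return distinct, false, err
	}
	_, errA := os.Stat(a.temp)
	_, errB := os.Stat(b.temp)
	isolated := os.IsNotExist(errA) && errB == nil
	return distinct, isolated, nil
}

func main() {
	fmt.Println(demo())
}
```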

@cyphar
Member
cyphar commented Oct 26, 2016 edited

@wking I've been playing with this for a bit and have noticed that oci-refs is a bit ... ugly to deal with. Namely, it gives you the reference JSON not the blob that is being referenced. What do you think of adding a --deref or --follow flag which will take the descriptor in refs/ and retrieve the blob? The current implementation is really just a wrapper around rm, cat and ls -- IMO it should have "smarter" features.

EDIT: If you want, I can implement this after this PR is merged.

@cyphar
Member
cyphar commented Oct 26, 2016

Also, oci-refs really should be able to just take a digest and then figure out what the digest is. I was trying to generate images and this part was quite cumbersome -- in order to create a ref I need to by hand create these two things:

  1. A modified manifest by hand which I feed into oci-cas put. This will be handled by #59.
  2. A modified descriptor by hand, which I feed into oci-ref put. This should be handled by oci-ref. since every object has a mediaType it shouldn't be that hard to figure out what the target media type is. I just realised that there's a bug in the spec: a descriptor has a mediaType that isn't itself (which makes sense but it means that a descriptor referring to a descriptor will not be able to automatically figure out what the target type is from mediaType).

Do you think we should implement 2 before this PR lands or after? As above, I'd be happy to help out with a PR.

@wking
Contributor
wking commented Oct 26, 2016

On Wed, Oct 26, 2016 at 06:53:00AM -0700, Aleksa Sarai wrote:

Also, oci-refs really should be able to just take a digest and then figure out what the digest is.

I disagree, and would much rather avoid baking type-detection heuristics into the tooling. See related discussion in opencontainers/image-spec#411, especially these comments.

  2. A modified descriptor by hand, which I feed into oci-ref put. This should be handled by oci-ref. since every object has a mediaType it shouldn't be that hard to figure out what the target media type is. I just realised that there's a bug in the spec: a descriptor has a mediaType that isn't itself (which makes sense but it means that a descriptor referring to a descriptor will not be able to automatically figure out what the target type is from mediaType).

And the config object doesn't have a media type either. And layer objects aren't even JSON. There may be UIs we can build that make descriptor initialization easier, but I don't think we want media-type guessing based on blob inspection to be one of them.

@cyphar
Member
cyphar commented Oct 26, 2016

I disagree, and would much rather avoid baking type-detection heuristics into the tooling.

I don't really think that "look at the mediaType" counts as heuristics. But if you're planning on removing them my honest opinion is that it's a bad idea because it means that you cannot possibly know what a blob represents unless you have the entire image (everything becomes contextual).

And the config object doesn't have a media type either. And layer objects aren't even JSON. There may be UIs we can build that make descriptor initialization easier, but I don't think we want media-type guessing based on blob inspection to be one of them.

When will there be a reference to a non-descriptor? Is that even valid in the spec?

@wking
Contributor
wking commented Oct 26, 2016

On Wed, Oct 26, 2016 at 10:10:37AM -0700, Aleksa Sarai wrote:

I disagree, and would much rather avoid baking type-detection
heuristics into the tooling.

I don't really think that "look at the mediaType" counts as
heuristics.

It assumes that the blob you're looking at is JSON that can unmarshal
into Versioned (or @vbatt's new MediaTyped,
opencontainers/image-spec#411). I'd rather not make that assumption.

But if you're planning on removing them my honest opinion is that
it's a bad idea because it means that you cannot possibly know what
a blob represents unless you have the entire image (everything
becomes contextual).

You don't need an entire image, you just need a descriptor referencing
the blob in question. We SHOULD use descriptors for all blob references
[1], so I'm comfortable relying on that. Do you have a workflow where
you're likely to come across a digest but not have a full
descriptor?

And the config object doesn't have a media type either. And layer
objects aren't even JSON. There may be UIs we can build that make
descriptor initialization easier, but I don't think we want
media-type guessing based on blob inspection to be one of them.

When will there be a reference to a non-descriptor? Is that even
valid in the spec?

All reference payloads are descriptors [2,3], and those descriptors
can point at whatever you want [4].

@cyphar
Member
cyphar commented Oct 29, 2016

On 10/27/2016 04:23 AM, W. Trevor King wrote:

On Wed, Oct 26, 2016 at 10:10:37AM -0700, Aleksa Sarai wrote:

I disagree, and would much rather avoid baking type-detection
heuristics into the tooling.

I don't really think that "look at the mediaType" counts as
heuristics.

It assumes that the blob you're looking at is JSON that can unmarshal
into Versioned (or @vbatt's new MediaTyped,
opencontainers/image-spec#411). I'd rather not make that assumption.

But tar+gzip has a magic header, which is unique. So there's no issue of
"I don't know what this binary blob is". Not to mention that all JSON
blobs we have start with '{'.
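
For concreteness, the kind of sniffing described here might look like the sketch below. `sniff` is a hypothetical helper, not part of this PR or of image-tools; it only checks the two-byte gzip magic number (0x1f 0x8b) and a leading '{'.

```go
package main

import (
	"bytes"
	"fmt"
)

// sniff guesses a coarse blob kind from its leading bytes.
// It is a hypothetical illustration of blob inspection, nothing more.
func sniff(blob []byte) string {
	// gzip streams begin with the magic bytes 0x1f 0x8b.
	if len(blob) >= 2 && blob[0] == 0x1f && blob[1] == 0x8b {
		return "gzip"
	}
	// A JSON object begins with '{' once leading whitespace is skipped.
	if trimmed := bytes.TrimSpace(blob); len(trimmed) > 0 && trimmed[0] == '{' {
		return "json-object"
	}
	return "unknown"
}

func main() {
	fmt.Println(sniff([]byte{0x1f, 0x8b, 0x08}))
	fmt.Println(sniff([]byte(`{"schemaVersion": 2}`)))
	fmt.Println(sniff([]byte("plain text")))
}
```

Note that this only separates "gzip" from "some JSON object". It does not say which media type a JSON blob carries, which is the part of the question under debate in this thread.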

But if you're planning on removing them my honest opinion is that
it's a bad idea because it means that you cannot possibly know what
a blob represents unless you have the entire image (everything
becomes contextual).

You don't need an entire image, you just need a descriptor referencing
the blob in question. We SHOULD use descriptors for all blob references
[1], so I'm comfortable relying on that. Do you have a workflow where
you're likely to come across a digest but not have a full
descriptor?

"The entire image" == "A section of the image's DAG that is greater than
just the blob". If you can't effectively figure out the type of a JSON
blob you get from oci-cas then the oci-cas tooling is useless.

Maybe I'm misunderstanding the purpose of this tooling, but from my
perspective this should be tooling designed to make the spec not get in
the way of people trying to create images. I'm going to end up writing
a wrapper for all of this, and I'm currently wondering whether I should
just reimplement all of this instead of reusing it (because it looks
like the goals of this project don't really match what I would be
looking for in a useful image toolkit).

And the config object doesn't have a media type either. And layer
objects aren't even JSON. There may be UIs we can build that make
descriptor initialization easier, but I don't think we want
media-type guessing based on blob inspection to be one of them.

When will there be a reference to a non-descriptor? Is that even
valid in the spec?

All reference payloads are descriptors [2,3], and those descriptors
can point at whatever you want [4].

My reading of that line is that currently it points to an
image-manifest. And it may in future change to something else. I
don't read that line as "a valid image can make a reference point to
literally anything" -- because I don't see how that would be useful from
an image PoV (not to mention that oci-create-runtime-bundle would
probably break in those cases).

I'm trying to make the point that in a valid image that this tooling can
deal with, references will point to image manifests. Those image
manifests will point to layers and configs. Now, the config isn't a
self-describing blob but that's not a problem we can solve right now
(well, we actually could -- by adding a new field that skopeo can
remove -- but then the reference hashes would change between OCI and
Docker). But that's all separate issues.

I don't see why we should be going backwards -- self-describing blobs
make things so much easier for consumers.

Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

@wking
Contributor
wking commented Oct 29, 2016

If you can't effectively figure out the type of a JSON blob you get from oci-cas then the oci-cas tooling is useless.

oci-cas abstracts away the details of the CAS engine. It could be a tar-backed image-layout 1.0.0, or a directory-backed image-layout 1.0.0, or anything you like as long as you can implement cas.Engine over it. That seems useful to me in its own right. Folks who want peek-inside media-type detection (or anything else) can build on top of the CAS tooling.

I don't read that line as "a valid image can make a reference point to literally anything" -- because I don't see how that would be useful from an image PoV (not to mention that oci-create-runtime-bundle would probably break in those cases).

I see image-layout as a generic mutable refs + CAS format that happens to be useful for storing images which, along with many other things, fit into the refs + CAS model well. Keeping refs truly generic makes image extension easy, since users can reference docs, or source code, or their custom manifest format, or whatever. I agree that generic OCI tooling will not be able to validate/unpack these extensions, but that's fine. --strict validation could error out if any such extensions are found.

But I think CAS and ref engine abstractions are a useful base on top of which these domain-specific tools can be built.

@wking
Contributor
wking commented Nov 3, 2016

When this PR was opencontainers/image-spec#159, @stevvooe pushed back
against the O(N) tar-entry lookup [1], but pointed out that it's hard
to avoid if we handle compressed image-layout tarballs (which we don't
at the moment, but might want to support in the future). I said I
thought it was fine [2], but the issue is still up in the air [3].
Recently, @cyphar floated some similar tar handling in
containers/image#148 which used an offset cache in tarCache. @mtrmac
liked the idea, but feels that it's too brittle to use with the
current stdlib's archive/tar.Reader implementation [4]. Do any
maintainers want to weigh in on whether the current O(N) tar-entry
lookups are ok here, or if @cyphar's offset approach is sufficient, or
if this is blocked on some other way around the O(N) problem?
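
To make the O(N) cost concrete, here is a minimal sketch of a name-based tar lookup built on the standard archive/tar package. The names `tarEntryByName`, `buildTar`, and `lookup` are illustrative only and are not the PR's TarEntryByName implementation. Every lookup walks the archive from the first header until it finds a match, so the cost grows with the number of entries ahead of the target.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

var errNotFound = errors.New("tar entry not found")

// tarEntryByName scans r from the start and returns the content of the
// first entry whose name matches. Each call is a linear scan.
func tarEntryByName(r io.Reader, name string) ([]byte, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil, errNotFound
		}
		if err != nil {
			return nil, err
		}
		if hdr.Name == name {
			return io.ReadAll(tr)
		}
	}
}

// buildTar writes an in-memory tarball holding the given files.
func buildTar(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for name, body := range files {
		hdr := &tar.Header{Name: name, Mode: 0644, Size: int64(len(body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// lookup builds a tarball from files and fetches one entry by name.
func lookup(files map[string]string, name string) (string, error) {
	data, err := buildTar(files)
	if err != nil {
		return "", err
	}
	content, err := tarEntryByName(bytes.NewReader(data), name)
	return string(content), err
}

func main() {
	files := map[string]string{"oci-layout": "{}", "refs/latest": "descriptor"}
	fmt.Println(lookup(files, "refs/latest"))
}
```

An offset cache would avoid the rescan by remembering where each header starts, at the price of depending on details of how the reader consumes the stream.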

@wking
Contributor
wking commented Nov 3, 2016

On Thu, Nov 03, 2016 at 01:35:59PM -0700, W. Trevor King wrote:

… some other way around the O(N) problem?

Note that this could be “unpack the tarball to a directory and use the
usual directory-based engine on that”. This has a higher disk cost
(unless you're unpacking the original tarball on the fly as you read
it off the network) but makes all the other tar-support-sucks issues
go away ;). I'm willing to do that with the “tar-backed engine” (and
rebuild a new tarball on .Close()?), but again, I'm fine with the
current O(N) tar support and it would be nice to have maintainers
weigh in on the desired approach.

@cyphar
Member
cyphar commented Nov 4, 2016

For those who are interested in imagectl (which I've currently just called umoci as a working name), I've written up a quick design doc on what I expect the final UX to look like (and how it will be implemented). Note that I haven't actually started work on the code yet for multiple reasons (university exams, as well as the fact that go-mtree and this project aren't ready yet).

@wking wking referenced this pull request in cyphar/umoci Nov 6, 2016
@cyphar cyphar image: cas: implement CAS engine
Signed-off-by: Aleksa Sarai <asarai@suse.com>
667271b
wking added some commits Jun 17, 2016
@wking wking image/cas: Add a generic CAS interface
And implement that interface for tarballs based on the spec's
image-layout.  I plan on adding other backends later, but this is
enough for a proof of concept.

Also add a new oci-cas command so folks can access the new read
functionality from the command line.

In a subsequent commit, I'll replace the image/walker.go functionality
with this new API.

The Context interface follows the pattern recommended in [1], allowing
callers to cancel long running actions (e.g. push/pull over the
network for engine implementations that communicate with a remote
store).

blobPath's separator argument will allow us to use
string(os.PathSeparator) once we add directory support.

[1]: https://blog.golang.org/context

Signed-off-by: W. Trevor King <wking@tremily.us>
fbae6d9
@wking wking image/refs: Add a generic name-based reference interface
And implement that interface for tarballs based on the spec's
image-layout.  I plan on adding other backends later, but this is
enough for a proof of concept.

Also add a new oci-refs command so folks can access the new read
functionality from the command line.

The Engine.List interface uses a callback instead of returning
channels or a slice.  Benefits vs. returning a slice of names:

* There's no need to allocate a slice for the results, so calls with
  large (or negative) 'size' values can be made without consuming
  large amounts of memory.

* The name collection and processing can happen concurrently, so:
  * We don't waste cycles collecting names we won't use.
  * Slow collection can happen in the background if/when the consumer
    is blocked on something else.

The benefit of using callbacks vs. returning name and error channels
(as discussed in [1]) is more of a trade-off.  Stephen Day [2] and JT
Olds [3] don't like channels' internal locks.  Dave Cheney doesn't
have a problem with them [4].  Which approach is more efficient for a
given situation depends on how expensive it is for the engine to find
the next key and how expensive it is to act on a returned name.  If
both are expensive, you want goroutines in there somewhere to get
concurrent execution, and channels will help those goroutines
communicate.  When either action is fast (or both are fast), channels
are unnecessary overhead.  By using a callback in the interface, we
avoid baking in the overhead.  Folks who want concurrent execution can
initialize their own channel, launch List in a goroutine, and use the
callback to inject names into their channel.
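
The last point, wrapping a callback-based List in a channel, can be sketched as below. Everything here is an assumption for illustration: `listNameCallback`, `memEngine`, and `listToChannel` are hypothetical names, the List signature is simplified (the PR's version takes additional arguments), and the standard-library context package stands in for golang.org/x/net/context.

```go
package main

import (
	"context"
	"fmt"
)

// listNameCallback mirrors the callback style described above.
type listNameCallback func(ctx context.Context, name string) error

// memEngine is a toy in-memory refs engine.
type memEngine struct{ names []string }

// List calls cb once per name, stopping at the first error or when ctx ends.
func (e *memEngine) List(ctx context.Context, cb listNameCallback) error {
	for _, n := range e.names {
		if err := ctx.Err(); err != nil {
			return err
		}
		if err := cb(ctx, n); err != nil {
			return err
		}
	}
	return nil
}

// listToChannel runs List in a goroutine and forwards names into a channel.
// The error channel receives List's result before the name channel closes.
func listToChannel(ctx context.Context, e *memEngine) (<-chan string, <-chan error) {
	names := make(chan string)
	errc := make(chan error, 1)
	go func() {
		defer close(names)
		errc <- e.List(ctx, func(ctx context.Context, name string) error {
			select {
			case names <- name:
				return nil
			case <-ctx.Done():
				return ctx.Err()
			}
		})
	}()
	return names, errc
}

// collect drains the channel form of List into a slice.
func collect(names []string) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	out, errc := listToChannel(ctx, &memEngine{names: names})
	var got []string
	for n := range out {
		got = append(got, n)
	}
	return got, <-errc
}

func main() {
	fmt.Println(collect([]string{"latest", "v1.0"}))
}
```

Callers that only need to print names can pass a callback directly and skip the goroutine and channel entirely.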

In a subsequent commit, I'll replace the image/walker.go functionality
with this new API.

I'd prefer casLayout for the imported package, but Stephen doesn't
want camelCase for package names [5].

[1]: https://blog.golang.org/pipelines
[2]: opencontainers/image-spec#159 (comment)
[3]: http://www.jtolds.com/writing/2016/03/go-channels-are-bad-and-you-should-feel-bad/
[4]: https://groups.google.com/d/msg/golang-nuts/LM648yrPpck/idyupwodAwAJ
     Subject: Re: [go-nuts] Re: "Go channels are bad and you should feel bad"
     Date: Wed, 2 Mar 2016 16:04:13 -0800 (PST)
     Message-Id: <c8e4433a-53c0-4ee6-9dc5-98f62eea06d2@googlegroups.com>
[5]: opencontainers/image-spec#159 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
1a2c78f
@wking wking specs-go: Add ImageLayoutVersion and check oci-layout in tar engines
Collect the shared stuff in the image/layout utility package.

Signed-off-by: W. Trevor King <wking@tremily.us>
b27e428
@wking wking image/layout/tar.go: Add TarEntryByName
Making the CAS/refs Get implementations more DRY.

Signed-off-by: W. Trevor King <wking@tremily.us>
9263d6e
@wking wking image/cas/put: Add a PutJSON helper
A fair amount of image setup involves pushing JSON objects to CAS, so
provide a convenient wrapper around that.  This implementation could
be improved, with at least:

* Consistent key sorts, etc. to increase the chances of matching an
  existing CAS blob.
* Streaming the marshaled JSON into the engine to avoid serializing it
  in memory before passing it into Engine.Put.

But the API is fine, and we can improve the implementation as we go.
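
A minimal sketch of what a PutJSON-style helper does, assuming a toy in-memory store. `memCAS` and `putJSON` are hypothetical names and do not reproduce the PR's cas.Engine API. As in the commit message, the JSON is marshaled fully in memory before it is stored.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// memCAS is a toy in-memory content-addressable store.
type memCAS struct{ blobs map[string][]byte }

// Put stores data under its SHA-256 digest and returns the digest.
// Content that is already present is not stored again.
func (c *memCAS) Put(data []byte) string {
	sum := sha256.Sum256(data)
	digest := "sha256:" + hex.EncodeToString(sum[:])
	if _, ok := c.blobs[digest]; !ok {
		c.blobs[digest] = data
	}
	return digest
}

// putJSON marshals v in memory and stores the result in the CAS.
func putJSON(c *memCAS, v interface{}) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return c.Put(data), nil
}

func main() {
	c := &memCAS{blobs: map[string][]byte{}}
	a, _ := putJSON(c, map[string]int{"b": 2, "a": 1})
	b, _ := putJSON(c, map[string]int{"a": 1, "b": 2})
	fmt.Println(a == b, len(c.blobs))
}
```

On the "consistent key sorts" point: encoding/json already emits map keys in sorted order, so equal maps hash identically here. Struct fields are emitted in declaration order, so two different Go types describing the same document can still produce different bytes.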

Signed-off-by: W. Trevor King <wking@tremily.us>
fe8f7cd
@wking wking vendor: Bundle golang.org/x/net/context
Generated with:

  $ make install.tools
  $ make update-deps
  $ git checkout HEAD -- vendor/github.com/spf13/pflag
  $ git checkout HEAD -- vendor/github.com/runtime-spec

I rolled back the other changes because I haven't checked for
compatibility issues due to the upgrades.  I also rolled back the hash
bumps for those packages in glide.lock.

Signed-off-by: W. Trevor King <wking@tremily.us>
753408b
@wking wking image/*/interface: Add unstable warnings to Engines 8c55e0c
@wking wking image/cas: Implement Engine.Put
This is a bit awkward.  For writing a tar entry, we need to know both
the name and size of the file ahead of time.  The implementation in
this commit accomplishes that by reading the Put content into a
buffer, hashing and sizing the buffer, and then calling
WriteTarEntryByName to create the entry.  With a filesystem-backed CAS
engine, we could avoid the buffer by writing the file to a temporary
location with rolling hash and size tracking and then renaming the
temporary file to the appropriate path.
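
The filesystem-backed alternative described in this paragraph might be sketched as follows. `putBlob` is a hypothetical function, and the blobs/sha256/<hex> path is an assumption used for illustration rather than the layout this PR writes. The content streams through a hash and into a temporary file at the same time, so no in-memory buffer is needed, and the rename publishes the blob under its digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// putBlob streams r into a temporary file under root while hashing it,
// then renames the file to blobs/sha256/<hex> (an assumed layout).
func putBlob(root string, r io.Reader) (string, int64, error) {
	dir := filepath.Join(root, "blobs", "sha256")
	if err := os.MkdirAll(dir, 0777); err != nil {
		return "", 0, err
	}
	tmp, err := os.CreateTemp(root, "blob-")
	if err != nil {
		return "", 0, err
	}
	// After a successful rename this Remove fails harmlessly.
	defer os.Remove(tmp.Name())

	hash := sha256.New()
	size, err := io.Copy(io.MultiWriter(tmp, hash), r)
	if cerr := tmp.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return "", 0, err
	}
	hexDigest := hex.EncodeToString(hash.Sum(nil))
	if err := os.Rename(tmp.Name(), filepath.Join(dir, hexDigest)); err != nil {
		return "", 0, err
	}
	return "sha256:" + hexDigest, size, nil
}

// demoPut stores content in a throwaway root and reads it back by digest.
func demoPut(content string) (string, int64, string, error) {
	root, err := os.MkdirTemp("", "image-layout-")
	if err != nil {
		return "", 0, "", err
	}
	defer os.RemoveAll(root)
	digest, size, err := putBlob(root, strings.NewReader(content))
	if err != nil {
		return "", 0, "", err
	}
	hexDigest := strings.TrimPrefix(digest, "sha256:")
	stored, err := os.ReadFile(filepath.Join(root, "blobs", "sha256", hexDigest))
	if err != nil {
		return "", 0, "", err
	}
	return digest, size, string(stored), nil
}

func main() {
	fmt.Println(demoPut("hello"))
}
```

Creating the temporary file under the same root keeps the rename on one filesystem, which is what lets the final step be a plain rename instead of a copy.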

WriteTarEntryByName itself has awkward buffering to avoid dropping
anything onto disk.  It reads through its current file and writes the
new tar into a buffer, and then writes that buffer back over its
current file.  There are a few issues with this:

* It's a lot more work than you need if you're just appending a new
  entry to the end of the tarball.  But writing the whole file into a
  buffer means we don't have to worry about the trailing blocks that
  mark the end of the tarball; that's all handled transparently for us
  by the Go implementation.  And this implementation doesn't have to
  be performant (folks should not be using tarballs to back
  write-heavy engines).

* It could leave you with a corrupted tarball if the caller dies
  mid-overwrite.  Again, I expect folks will only ever write to a
  tarball when building a tarball for publishing.  If the caller dies,
  you can just start over.  Folks looking for a more reliable
  implementation should use a filesystem-backed engine.

* It could leave you with dangling bytes at the end of the tarball.  I
  couldn't find a Go invocation to truncate the file.  Go does have an
  ftruncate(2) wrapper [1], but it doesn't seem to be exposed at the
  io.Reader/io.Writer/... level.  So if you write a shorter file with
  the same name as the original, you may end up with some dangling
  bytes.

cas.Engine.Put protects against excessive writes with a Get guard;
after hashing the new data, Put tries to Get it from the tarball and
only writes a new entry if it can't find an existing entry.  This also
protects the CAS engine from the dangling-bytes issue.

The 0666 file modes and 0777 directory modes rely on the caller's
umask to appropriately limit user/group/other permissions for the
tarball itself and any content extracted to the filesystem from the
tarball.

The trailing slash manipulation (stripping before comparison and
injecting before creation) is based on part of libarchive's
description of old-style archives [2]:

  name
    Pathname, stored as a null-terminated string.  Early tar
    implementations only stored regular files (including hardlinks to
    those files).  One common early convention used a trailing "/"
    character to indicate a directory name, allowing directory
    permissions and owner information to be archived and restored.

and POSIX ustar archives [3]:

  name, prefix
    ... The standard does not require a trailing / character on
    directory names, though most implementations still include this
    for compatibility reasons.
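
A minimal sketch of that slash handling, with hypothetical helper names (the PR may well do this inline):

```go
package main

import "strings"

// sameEntry reports whether two tar entry names refer to the same
// path, ignoring a trailing slash on either side.
func sameEntry(a, b string) bool {
	return strings.TrimSuffix(a, "/") == strings.TrimSuffix(b, "/")
}

// dirName returns name with exactly one trailing slash, the form used
// when creating a directory entry.
func dirName(name string) string {
	return strings.TrimSuffix(name, "/") + "/"
}
```

Stripping before comparison lets a lookup for "./refs" match an archive that stored "./refs/", and injecting before creation keeps new archives in the form most implementations emit.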

[1]: https://golang.org/pkg/syscall/#Ftruncate
[2]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#old-style-archive-format
[3]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#posix-ustar-archives

Signed-off-by: W. Trevor King <wking@tremily.us>
7ea54f2
@wking wking image: Refactor to use cas/ref engines instead of walkers
The validation/unpacking code doesn't really care what the reference
and CAS implementations are.  And the new generic interfaces in
image/refs and image/cas will scale better as we add new backends than
the walker interface.  This replaces the simpler interface from
image/reader.go with something more robust.

The old tar/directory distinction between image and imageLayout is
gone.  The new CAS/refs engines don't support directory backends yet
(I plan on adding them once the engine framework lands), but the new
framework will handle tar/directory/... detection inside
layout.NewEngine (and possibly inside a new (cas|refs).NewEngine when
we grow engine types that aren't based on image-layout).

Also replace the old methods like:

  func (d *descriptor) validateContent(r io.Reader) error

with functions like:

  validateContent(ctx context.Context, descriptor *specs.Descriptor, r io.Reader) error

to avoid local types that duplicate the image-spec types.  This saves
an extra instantiation for folks who want to validate (or whatever) a
specs.Descriptor they have obtained elsewhere.

I'd prefer casLayout and refsLayout for the imported packages, but
Stephen doesn't want camelCase for package names [1].

[1]: opencontainers/image-spec#159 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
7bd8bcf
@wking wking .tool/lint: Ignore dupl complaints for cmd/oci-*/get.go
Don't worry about:

  $ make lint
  checking lint
  image/cas/layout/dir.go:37::warning: duplicate of image/refs/layout/dir.go:37-54 (dupl)
  image/refs/layout/dir.go:37::warning: duplicate of image/cas/layout/dir.go:37-54 (dupl)
  cmd/oci-cas/delete.go:41::warning: duplicate of cmd/oci-cas/get.go:43-62 (dupl)
  cmd/oci-refs/delete.go:41::warning: duplicate of cmd/oci-refs/get.go:42-61 (dupl)
  cmd/oci-cas/delete.go:15::warning: duplicate of cmd/oci-refs/delete.go:15-72 (dupl)
  cmd/oci-refs/delete.go:15::warning: duplicate of cmd/oci-cas/delete.go:15-72 (dupl)
  cmd/oci-cas/get.go:43::warning: duplicate of cmd/oci-refs/get.go:42-61 (dupl)
  cmd/oci-refs/get.go:42::warning: duplicate of cmd/oci-cas/get.go:43-62 (dupl)
  make: *** [lint] Error 1

The commands are all similar (open an engine, perform some method,
print the result), but are short enough that extracting out helpers
would be more trouble and indirection than it's worth.

Oddly, dupl seems happy to print:

  "duplicate of oci-cas/get.go:..."

and

  "duplicate of get.go:..."

if I exclude:

  "duplicate of cmd/oci-cas/get.go:..."

or

  "duplicate of .*oci-cas/get.go:..."

I want to get "oci-cas" and "oci-refs" in the exclusion regular
expression somewhere to avoid accidentally skipping dupl checks for
other get.go and similar if they show up somewhere else in the
repository, so I'm matching on the initial filename.

Signed-off-by: W. Trevor King <wking@tremily.us>
2564e3d
@wking wking image/layout/tar: Add a CreateTarFile helper
The NewEngine commands for the tar-backed image-layout engines (both
the CAS and refs engines) open files O_RDWR and expect image-layout
compatible content in the tarball.  That makes sense, but for folks
who *don't* have such a tarball, a helper like CreateTarFile makes it
easy to explicitly create an empty one.

The 0666 file modes and 0777 directory modes rely on the caller's
umask to appropriately limit user/group/other permissions for the
tarball itself and any content extracted to the filesystem from the
tarball.

The trailing slashes are based on part of libarchive's description of
old-style archives [1]:

  name
    Pathname, stored as a null-terminated string.  Early tar
    implementations only stored regular files (including hardlinks to
    those files).  One common early convention used a trailing "/"
    character to indicate a directory name, allowing directory
    permissions and owner information to be archived and restored.

and POSIX ustar archives [2]:

  name, prefix
    ... The standard does not require a trailing / character on
    directory names, though most implementations still include this
    for compatibility reasons.

Expose this new functionality on the command line as:

  $ oci-image-init image-layout PATH

where 'image-layout' is a separate level in case we support
initializing additional types of repositories in the future.

[1]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#old-style-archive-format
[2]: https://github.com/libarchive/libarchive/wiki/ManPageTar5#posix-ustar-archives

Signed-off-by: W. Trevor King <wking@tremily.us>
5c748c9
@wking wking image/refs: Implement Engine.Put
This is pretty straightforward with the new WriteTarEntryByName
helper.

I considered pulling the ref name -> path conversion (%s -> ./refs/%s)
out into a helper function to stay DRY, but the logic is simple enough
that it seemed more intuitive to leave it inline.

Signed-off-by: W. Trevor King <wking@tremily.us>
d2c026b
@wking wking cmd: Document the cas, refs, and init commands
Most of this is new boilerplate, but oci-image-tools.7.md is based on
the old oci-image-tool.1.md removed by fe363aa (*: move to
opencontainers/image-tools, 2016-09-15).  There's a lot going on in
this repo, and it's nice to have a page that outlines everything
provided by the project even though we're no longer providing a single
command.

Signed-off-by: W. Trevor King <wking@tremily.us>
94f5137
@wking wking image: Add image-layout directory based CAS and ref engines
These are much nicer than the tar engines (hooray atomic renames :),
so switch the manifest tests back to using the directory-backed
engines.  I also switched the man-page examples over to
directory-backed layouts, now that they're what oci-image-init
generates by default.  And I added command-line wrappers for the
delete methods, now that we have a backend that implements it.

I do wish there were a paginated, callback-based directory lister we
could use instead of ioutil.ReadDir.  On the other hand, by the time
directories get big enough for that to matter we may be sharding them
anyway.

Signed-off-by: W. Trevor King <wking@tremily.us>
ca90284
@stevvooe
Contributor

@cyphar Sorry for the slow cycle, but it looks like that link is broken.

Shall we take this design into an issue and close this PR?

@wking
Contributor
wking commented Nov 18, 2016

On Fri, Nov 18, 2016 at 12:11:25PM -0800, Stephen Day wrote:

@cyphar Sorry for the slow cycle, but it looks like that link is broken.

He removed it in cyphar/umoci@8d19f36. The last version before that
removal is in 1.

Shall we take this design into an issue and close this PR?

I'm happy to discuss any design differences between what I have here
and the choices @cyphar has made in umoci. I don't think ditching all
of this is the best way to have that discussion ;). Last night saw
some progress on #40, maybe you want to chime in there on any issues
you see with the CAS engine interface or the tar-get implementation?

@cyphar
Member
cyphar commented Nov 19, 2016

@stevvooe I've removed the file because all of the stuff in the design document has since been implemented in umoci and the design document no longer accurately described how everything is implemented.
