specs-go: clarify mediatypes #411

Open

@vbatts
Member
vbatts commented Oct 21, 2016 edited

Due to the conflicting use of the mediaType field across
documents, and after discussion on
#411,
this changeset removes the use of mediaType where it refers
to a document's own type, leaving only the use of mediaType in
descriptors, where it describes the type of a referenced object.
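
To make the descriptor case concrete, here is a minimal sketch in which mediaType names the type of the referenced blob rather than of the descriptor itself. The `Descriptor` struct is a simplified stand-in and `describe` is a hypothetical helper, not part of specs-go.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Descriptor is a simplified stand-in for the spec's descriptor type.
// MediaType names the type of the REFERENCED blob, not of the descriptor.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// describe (hypothetical helper) builds a descriptor for blob. The media
// type is supplied by the caller, who knows what was just authored.
func describe(mediaType string, blob []byte) Descriptor {
	return Descriptor{
		MediaType: mediaType,
		Digest:    fmt.Sprintf("sha256:%x", sha256.Sum256(blob)),
		Size:      int64(len(blob)),
	}
}

func main() {
	config := []byte(`{"architecture":"amd64","os":"linux"}`)
	d := describe("application/vnd.oci.image.config.v1+json", config)
	out, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Note that the config blob itself carries no mediaType here; the type travels only in the descriptor that points at it.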

@@ -75,6 +75,7 @@ type History struct {
}
// Image is the JSON structure which describes some basic information about the image.
+// This provides the `application/vnd.oci.image.config.v1+json` mediatype when marshalled to JSON.
@jonboulle
jonboulle Oct 21, 2016 Contributor

It is annoying that this one is inconsistent. Can't we safely add a media type field to it?

@vbatts
vbatts Oct 21, 2016 Member

I was thinking the same, especially since the field is omitempty for JSON. Perhaps make it optional with a SHOULD, so it stays compatible with older Docker? Thoughts @stevvooe ?

@vbatts
vbatts Oct 21, 2016 Member

(to that end, also the oci-layout file, and it doesn't even have a mediatype assigned for it)

@wking
wking Oct 21, 2016 Contributor

On Fri, Oct 21, 2016 at 07:12:12AM -0700, Vincent Batts wrote:

(to that end, also the oci-layout file, and it doesn't even have a mediatype assigned for it)

+1 to assigning a media type to oci-layout. I don't expect to find it in CAS, but image-layout directories might be accessed over HTTP, and it would be strange to return application/json for it when all of our other schemas have more specific types.

On Fri, Oct 21, 2016 at 06:59:20AM -0700, Jonathan Boulle wrote:

It is annoying that this one is inconsistent. Can't we safely add a media type field to it?

This is going the wrong way. If folks want to look inside the blob and try to guess its type (e.g. by unmarshalling into Versioned), they can do that. But I don't think we should require anyone to look inside the blob to figure out what it is. Descriptor references tell you the media type ahead of time, and we should be using those to identify blob types, and that approach places no restrictions on the blob itself. So while we may want to keep the Versioned fields for backwards compat with Docker, I don't think we want to extend that approach to additional structures.

@stevvooe
stevvooe Oct 21, 2016 Contributor

@jonboulle In general, we should not actually be embedding mediaTypes in the target types. The mediaType is a lens to the data. Really, we should remove them from the types.

@vbatts
vbatts Nov 3, 2016 Member

@stevvooe woah. The mediatype is as necessary as a document providing the version or schema that it represents. Why have any of them, if this is the case?
Regardless, this leaves a mess for implementations, which will need a big case statement with reflection just to discover what kind of document and version they are dealing with.

@wking
wking Nov 3, 2016 edited Contributor

The mediatype is as necessary as a document providing the version or schema that it represents.

Agreed. But if you know the media type from the referencing descriptor, why repeat it in the blob itself? The only case I've seen is for outside-of-CAS reference like verification, where the caller presumably knows the media type they're trying to verify but doesn't feel like typing it out. I'm ok with providing a self-describing mediaType to save them that effort, but don't feel like we have to provide it.

@cyphar
cyphar Nov 3, 2016 edited Member

@wking Because you cannot implement a generic tool to mutate a blob unless there's a self-describing aspect to it (cat blob | some-tool --add-option won't work properly because you cannot deal with multiple versions and also error handling). And since oci-refs and oci-cas appear to be on the path to "images are just a CAS" you need to have that.

@wking
wking Nov 3, 2016 Contributor

cat blob | some-tool --add-option won't work properly because you cannot deal with multiple versions and also error handling

But:

$ DESCRIPTOR="$(oci-refs get some-image v1.0)"
$ DIGEST="$(echo "${DESCRIPTOR}" | jq -r .digest)"
$ MEDIA_TYPE="$(echo "${DESCRIPTOR}" | jq -r .mediaType)"
$ oci-cas get some-image "${DIGEST}" | some-tool --add-option --media-type "${MEDIA_TYPE}"

works fine. And the only case I can think of where you aren't getting to the digest by walking down from a ref is still validating a blob before you push it into CAS. And there you presumably know the type, and just want to be saved from typing it out on the command line (and as I've said, I'm ok keeping a self-describing mediaType if/where we want to support this).
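
The same descriptor-first flow can be sketched in Go: the handler is chosen from the descriptor's mediaType before a single byte of the blob is read. `handlerFor` and the handler names are hypothetical, and the `Descriptor` struct mirrors only the fields used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor mirrors only the fields this sketch needs.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

// handlerFor (hypothetical) picks a handler name from the descriptor
// alone; the blob content is never inspected.
func handlerFor(descriptorJSON []byte) (string, error) {
	var d Descriptor
	if err := json.Unmarshal(descriptorJSON, &d); err != nil {
		return "", err
	}
	switch d.MediaType {
	case "application/vnd.oci.image.manifest.v1+json":
		return "manifest", nil
	case "application/vnd.oci.image.config.v1+json":
		return "config", nil
	case "application/vnd.oci.image.layer.v1.tar+gzip":
		return "layer", nil
	default:
		return "", fmt.Errorf("unsupported media type %q", d.MediaType)
	}
}

func main() {
	desc := []byte(`{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:0123","size":7}`)
	handler, err := handlerFor(desc)
	if err != nil {
		panic(err)
	}
	fmt.Println("dispatching to handler:", handler)
}
```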

And since oci-refs and oci-cas appear to be on the path to "images are just a CAS" you need to have that.

I think “images can be stored in CAS as a Merkle tree” is true, but I don't think they're just a set of Merkle blobs. Otherwise we could have written descriptor.md and image-layout.md and skipped manifest.md, manifest-list.md, layer.md, etc. CAS (and refs) are one level in the implementation, but there are additional, higher levels with more domain-specific spec. I'd rather have image-tools provide APIs and tools for handling images at all of these levels.

@cyphar
cyphar Nov 3, 2016 Member

oci-cas get some-image "${DIGEST}" | some-tool --add-option --media-type "${MEDIA_TYPE}"

IMO that's ridiculous. You now need to carry around the media type of an object separate to the object. Why? What is the benefit? Why not just follow along with literally every other format that exists and make it self-describing? Why has this discussion been going on for so long?

@wking
wking Nov 3, 2016 Contributor

Why not just follow along with literally every other format that exists and make it self-describing?

The only time I can see where we'd want this to be the recommended method for typing a blob is in signed assertions. For everything else, I'd rather have:

The descriptor that sent me here said this was an application/vnd.oci.image.manifest.v1+json, so I'll attach it to my manifest handler…

Instead of:

Let's see, does it have ustar\000 at offset 257? No? Good, because I'm not sure how I would have figured out if that was using the .wh.* whiteout handling or the new [static] whiteout handling (#24). Do the first four bytes match 00 00 00 xx? No? Ok, not UTF-32BE JSON. What about 00 xx 00 xx? … Maybe the first byte is {? Ok, that sounds like UTF-8 JSON. Let me unmarshal it into MediaTyped. That worked! And it has a value from mediaType! It says it is application/vnd.oci.image.manifest.v1+json, so I'll seek back to the beginning and attach it to my manifest handler…

And again, I'm just arguing that we shouldn't be using peek-inside typing for image unpacking, etc. I'm ok with us deciding that we want to use it to save keystrokes on pre-CAS-push validation.
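
For comparison, a rough sketch of what peek-inside typing looks like in Go. `sniff` is a hypothetical function, it covers only a few of the checks listed above, and it shows where content sniffing runs out: tar and gzip magic identify the container format but not the OCI media type.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// sniff (hypothetical) guesses a blob's kind from its content alone.
func sniff(blob []byte) string {
	// ustar tar archives carry the magic "ustar" at offset 257.
	if len(blob) >= 262 && bytes.Equal(blob[257:262], []byte("ustar")) {
		return "tar" // layer media type and whiteout convention stay unknown
	}
	// gzip streams start with the bytes 0x1f 0x8b.
	if len(blob) >= 2 && blob[0] == 0x1f && blob[1] == 0x8b {
		return "gzip" // distributable or not? the header cannot say
	}
	trimmed := bytes.TrimSpace(blob)
	if len(trimmed) > 0 && trimmed[0] == '{' {
		var mt struct {
			MediaType string `json:"mediaType"`
		}
		if err := json.Unmarshal(trimmed, &mt); err == nil && mt.MediaType != "" {
			return mt.MediaType // only works for self-describing JSON
		}
		return "json"
	}
	return "unknown"
}

func main() {
	fmt.Println(sniff([]byte(`{"mediaType":"application/vnd.oci.image.manifest.v1+json"}`)))
	fmt.Println(sniff([]byte(`{"architecture":"amd64"}`)))
	fmt.Println(sniff([]byte{0x1f, 0x8b, 0x08}))
}
```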

@cyphar
cyphar Nov 4, 2016 Member

@wking You're arguing that there's a dichotomy between "free for all, no need to have mediaType" and "heuristic so that we can recognise every blob type". I don't think there is one.

The benefit of being able to know what a JSON blob is meant to represent is entirely separate from "I can tell what every blob in the image is without references". If you're not happy with detecting tar files (like file and libmagic do) that's fine. But please let's not make all of our JSON objects meaningless blobs that require jumping through references in order to even understand what we're looking at (or keeping the type information out-of-band).

@wking
wking Nov 4, 2016 Contributor

But please let's not make all of our JSON objects meaningless blobs that require jumping through references in order to even understand what we're looking at…

You're saying “assuming (for some out-of-band reason) that the blob is a JSON object which contains a self-describing mediaType field, we can use that mediaType field to unambiguously identify the content”. That initial assumption is what I'm worried about. If you see cases where you are comfortable making that assumption (for whatever out-of-band reasons), then great, use peek-inside type detection based on the mediaType value. But I'd strongly recommend consumers use the referencing descriptor's mediaType to avoid having to rely on that assumption.

@cyphar
cyphar Nov 4, 2016 edited Member

But I'd strongly recommend consumers use the referencing descriptor's mediaType to avoid having to rely on that assumption.

Your consistent implication that all consumers will have access to the entire image is getting annoying. If I have a tool like oci-do-something which I pipe a JSON object to, I don't expect that it will be reading the repository. In fact, I might have a service that modifies the JSON objects (and therefore actually cannot access the original repo). So you can't "use the referencing descriptor" because there isn't one (that you can see).

Now, you might argue that we should send the out-of-band media type with it. But why should that be a requirement? What are you gaining by removing mediaType?

@wking
wking Nov 4, 2016 Contributor

Now, you might argue that we should send the out-of-band media type with it. But why should that be a requirement?

That's exactly what I'll argue ;). And unless you implement completely generic peek-inside detection (which I don't think anyone's arguing for), you're going to have to transmit some amount of media-type guidance along with your blob content. I'm suggesting that guidance be the media type.

You seem to be suggesting that that guidance be “this blob is a JSON object which contains a self-describing mediaType field”. Maybe you transmit that information because the tool-caller knows the tool can only handle such media types and therefore only feeds matching blobs into the tool. That's how oci-image-validate works, and I'm comfortable with that from a keystroke-saving perspective. However, I don't think we should pretend that this approach is completely free of out-of-band type guidance.

What are you gaining by removing mediaType?

I'm not suggesting we remove mediaType, because some users (e.g. you with oci-do-something, or a number of people with oci-image-validate's autodetection) can't be bothered to pass media types around. And I'm fine with that (typing out a long media type is not something I'd like to do repeatedly).

I'm just suggesting image-handling tools follow the spec's SHOULD and use descriptors to reference blob content, with peek-inside type detection being reserved for signed-assertions. And having acquired the media type from the referencing descriptor (or because we authored the blob ourselves), I see no need for image-handling tools to use peek-inside type detection.

Perhaps our difference here is that I see (almost) all tooling as being descriptor-based, while you see the tooling as being isolated-blob based. Since I'm fine leaving existing mediaType entries in place, maybe we can just wait a year to see how that plays out and revisit this discussion then?

specs-go/v1/manifest_list.go
type ManifestList struct {
+ specs.MediaTyped
specs.Versioned
@wking
wking Oct 21, 2016 Contributor

If Versioned contains MediaTyped (which is what you currently have), isn't it redundant to list MediaTyped here?

@vbatts
vbatts Oct 21, 2016 Member

yes, technically. Does not change anything and is very apparent that it is media typed
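
A small sketch of how the double embedding behaves with encoding/json, assuming (as this thread says of the PR) that Versioned embeds MediaTyped. The struct definitions below are simplified stand-ins and `decodeManifestList` is a hypothetical helper. By Go's field-resolution rules the shallower field wins, so the JSON output is unchanged, but the copy nested inside Versioned is never populated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the types discussed in this review.
type MediaTyped struct {
	MediaType string `json:"mediaType,omitempty"`
}

type Versioned struct {
	MediaTyped        // assumption: Versioned embeds MediaTyped
	SchemaVersion int `json:"schemaVersion"`
}

type ManifestList struct {
	MediaTyped // listed explicitly, as in the diff above
	Versioned
}

// decodeManifestList (hypothetical helper) unmarshals raw JSON.
func decodeManifestList(raw []byte) (ManifestList, error) {
	var ml ManifestList
	err := json.Unmarshal(raw, &ml)
	return ml, err
}

func main() {
	raw := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.list.v1+json"}`)
	ml, err := decodeManifestList(raw)
	if err != nil {
		panic(err)
	}
	// ml.MediaType selects the shallower field; the nested one stays empty.
	fmt.Printf("outer=%q nested=%q\n", ml.MediaType, ml.Versioned.MediaType)
}
```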

@stevvooe
Contributor

@vbatts Is this to support partial decode for mediaType sniffing? If so, are there other fields that are common to all types that we want in this decode structure?

@jonboulle
Contributor

Why did we ever do this?

On 21 October 2016 at 20:35, Stephen Day notifications@github.com wrote:

@stevvooe commented on this pull request.

In specs-go/v1/config.go
#411:

@@ -75,6 +75,7 @@ type History struct {
}

// Image is the JSON structure which describes some basic information about the image.
+// This provides the application/vnd.oci.image.config.v1+json mediatype when marshalled to JSON.

@jonboulle In general, we should not
actually be embedding mediaTypes in the target types. The mediaType is a
lens to the data. Really, we should remove them from the types.



@wking wking referenced this pull request in opencontainers/image-tools Oct 26, 2016
Open

image: Refactor to use cas/ref engines instead of walkers #5

specs-go/v1/descriptor.go
type Descriptor struct {
- // MediaType contains the MIME type of the referenced object.
- MediaType string `json:"mediaType"`
+ specs.MediaTyped
@jonboulle
jonboulle Oct 28, 2016 Contributor

oh, so this is actually wrong because the mediatype here describes the REFERENCED object, not the descriptor itself.

@vbatts
vbatts Oct 28, 2016 Member

Yeah, same field name, and opposite intention. :-|

@jonboulle
Contributor

@vbatts can we rather remove media type fields?

@vbatts
Member
vbatts commented Oct 28, 2016

@jonboulle like completely? or just this PR?

@jonboulle
Contributor

@vbatts like completely

@vbatts
Member
vbatts commented Oct 28, 2016 edited

well, honestly I like the self describing nature of it. I wish the descriptor field names could be fixed, such that a descriptor could self-describe and have a separate field that describes the object it points to.
Self-describing documents, while slightly more expensive to process, allow for an algorithm like

mt := v1.MediaTyped{}
if err := json.Unmarshal(buf, &mt); err != nil {
  // handle yer biz
}
switch mt.MediaType {
case v1.MediaTypeImageManifest:
  // something specific
default:
  // handle yer biz
}

So we can have discovery without having a map[string]interface{}{}.
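
A self-contained sketch of that two-stage decode: peek at mediaType with a partial unmarshal, then decode the same bytes into the matching concrete type. The `Manifest` and `ManifestList` structs are trimmed-down stand-ins, `decode` is a hypothetical name, and the constants are spelled out inline rather than taken from specs-go.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const (
	MediaTypeImageManifest     = "application/vnd.oci.image.manifest.v1+json"
	MediaTypeImageManifestList = "application/vnd.oci.image.manifest.list.v1+json"
)

// MediaTyped is the partial-decode target.
type MediaTyped struct {
	MediaType string `json:"mediaType"`
}

// Trimmed-down stand-ins for the real document types.
type Manifest struct {
	MediaTyped
	SchemaVersion int `json:"schemaVersion"`
}

type ManifestList struct {
	MediaTyped
	SchemaVersion int `json:"schemaVersion"`
}

// decode (hypothetical) parses buf twice: once to read mediaType,
// once into the concrete type that mediaType selects.
func decode(buf []byte) (interface{}, error) {
	var mt MediaTyped
	if err := json.Unmarshal(buf, &mt); err != nil {
		return nil, err
	}
	switch mt.MediaType {
	case MediaTypeImageManifest:
		var m Manifest
		err := json.Unmarshal(buf, &m)
		return m, err
	case MediaTypeImageManifestList:
		var ml ManifestList
		err := json.Unmarshal(buf, &ml)
		return ml, err
	case "":
		return nil, errors.New("document is not self-describing")
	default:
		return nil, fmt.Errorf("unsupported media type %q", mt.MediaType)
	}
}

func main() {
	v, err := decode([]byte(`{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json"}`))
	fmt.Printf("%T %v\n", v, err)
}
```

The second parse is the "slightly more expensive" part, and the empty-string case is where documents without the field (the image config, for example) fall out.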

@wking
Contributor
wking commented Oct 28, 2016

On Fri, Oct 28, 2016 at 07:07:03AM -0700, Vincent Batts wrote:

well, honestly I like the self describing nature of it.

There was a bit more discussion of this in
opencontainers/image-tools#5 starting with [1]. @cyphar was arguing
for this peek-inside auto-detection in the proposed oci-refs because
tracking media types is tedious. I was arguing against peek-inside
auto-detection because it doesn't always work (it requires a JSON blob
whose mediaType is self-describing, so not the current descriptors or
any non-JSON type, e.g. a layer tarball).

We SHOULD use descriptors for all blob references [2], so if the initial
publisher references the blob they published (and presumably they
would or future garbage collection would remove it), that descriptor
reference will have media type information for the blob. Anyone else
who stumbles across the same blob either:

a. Created it independently (in which case they should know what it is),
b. Found it by following a descriptor (in which case that descriptor
tells them what it is), or
c. Found it by walking the CAS (e.g. during an internal GC pass).

(c) is the only tricky case, and the solution is to have the GC engine
walk the CAS from pinned descriptors (so it's like (b)) instead of
walking the CAS via readdir(3) on the blob filesystem.
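
A toy sketch of that GC approach, assuming an in-memory model in which each blob is reduced to the descriptors it contains. The types and the `reachable` and `garbage` functions are hypothetical. Because every live blob is reached through a descriptor, its media type is known without inspecting content.

```go
package main

import (
	"fmt"
	"sort"
)

// descriptor is a toy typed reference to a blob.
type descriptor struct {
	mediaType string
	digest    string
}

// reachable walks the DAG from pinned descriptors and returns the media
// type of every blob it visits, keyed by digest.
func reachable(cas map[string][]descriptor, pinned []descriptor) map[string]string {
	types := map[string]string{}
	stack := append([]descriptor(nil), pinned...)
	for len(stack) > 0 {
		d := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if _, seen := types[d.digest]; seen {
			continue
		}
		children, ok := cas[d.digest]
		if !ok {
			continue // dangling reference: nothing to mark
		}
		types[d.digest] = d.mediaType
		stack = append(stack, children...)
	}
	return types
}

// garbage lists, in sorted order, the digests no pinned descriptor reaches.
func garbage(cas map[string][]descriptor, pinned []descriptor) []string {
	live := reachable(cas, pinned)
	var dead []string
	for digest := range cas {
		if _, ok := live[digest]; !ok {
			dead = append(dead, digest)
		}
	}
	sort.Strings(dead)
	return dead
}

func main() {
	cas := map[string][]descriptor{
		"sha256:manifest": {{mediaType: "application/vnd.oci.image.config.v1+json", digest: "sha256:config"}},
		"sha256:config":   nil,
		"sha256:orphan":   nil,
	}
	pinned := []descriptor{{mediaType: "application/vnd.oci.image.manifest.v1+json", digest: "sha256:manifest"}}
	fmt.Println(garbage(cas, pinned))
}
```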

@cyphar
Member
cyphar commented Oct 29, 2016

My main concern is that without mediaType things aren't properly self-describing -- you have no clue what each object represents unless you've walked from a ref to the object itself. For me, this is just tedious to deal with (as a consumer I shouldn't be caring about the internals of all of the referencing if there's oci-image-tools that should handle this for me and just output the JSON blob that I am trying to look at).

@wking gave the example that mediaType won't work on the layer tar+gzip archives. But that isn't helping the case against mediaType, because both tar and gzip have unique headers that identify what they are! There's no ambiguity with regards to whether a layer object is a layer object or not. The only thing that's ambiguous is the JSON blobs we're using which aren't adequately expressing what they represent.

In particular, the fact that descriptors don't say that they're descriptors is quite silly (mediaType is broken in that instance).

@wking
Contributor
wking commented Oct 29, 2016

The tar+gzip headers do not distinguish application/vnd.oci.image.layer.v1.tar+gzip from application/vnd.oci.image.layer.nondistributable.v1.tar+gzip or from a future application/vnd.oci.image.layer.v1.1.tar.

Walking the DAG sounds like an integral part of using CAS. Can you provide more details about what you mean by "just output the JSON blob that I am trying to look at"?

@cyphar
Member
cyphar commented Oct 29, 2016

Walking the DAG sounds like an integral part of using CAS. Can you provide more details about what you mean by "just output the JSON blob that I am trying to look at"?

Lets say I want to modify the config for an image. Currently the way I need to do this is:

  1. Get the reference descriptor using oci-refs get. I then have to parse this JSON to get the right blob for the manifest.
  2. Get the manifest using oci-cas get. I then have to parse this JSON to get the right blob for the config.
  3. I now have the config JSON and I can modify it using jq or whatever.

In order to update the image I now need to:

  1. Push the new modified config and get the new digest, which I do with oci-cas put.
  2. Update the manifest to use the new config, then push it to get a new digest with oci-cas put.
  3. Create a new reference descriptor and push it to get the new reference with oci-refs put.
  4. Clean everything up with the (currently non-existent) oci-gc.
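
The update steps above can be sketched as one function over an in-memory CAS. Everything here is hypothetical (`put`, `putManifest`, `replaceConfig`, and the trimmed `Manifest` struct); a real tool would also have to preserve manifest fields this toy struct drops, update the ref, and garbage-collect the old blobs.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// Manifest is a trimmed-down stand-in for the real manifest type.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// put stores blob under its digest and returns a descriptor for it.
func put(cas map[string][]byte, mediaType string, blob []byte) Descriptor {
	digest := fmt.Sprintf("sha256:%x", sha256.Sum256(blob))
	cas[digest] = blob
	return Descriptor{MediaType: mediaType, Digest: digest, Size: int64(len(blob))}
}

// putManifest serializes m and stores it as a manifest blob.
func putManifest(cas map[string][]byte, m Manifest) (Descriptor, error) {
	raw, err := json.Marshal(m)
	if err != nil {
		return Descriptor{}, err
	}
	return put(cas, "application/vnd.oci.image.manifest.v1+json", raw), nil
}

// replaceConfig pushes newConfig, rewrites the manifest behind ref to
// point at it, and returns the descriptor a new ref would have to hold.
func replaceConfig(cas map[string][]byte, ref Descriptor, newConfig []byte) (Descriptor, error) {
	var m Manifest
	if err := json.Unmarshal(cas[ref.Digest], &m); err != nil {
		return Descriptor{}, err
	}
	m.Config = put(cas, m.Config.MediaType, newConfig)
	return putManifest(cas, m)
}

func main() {
	cas := map[string][]byte{}
	cfg := put(cas, "application/vnd.oci.image.config.v1+json", []byte(`{"os":"linux"}`))
	oldRef, err := putManifest(cas, Manifest{SchemaVersion: 2, Config: cfg})
	if err != nil {
		panic(err)
	}
	newRef, err := replaceConfig(cas, oldRef, []byte(`{"os":"linux","architecture":"amd64"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("old manifest:", oldRef.Digest)
	fmt.Println("new manifest:", newRef.Digest)
}
```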

Now, my main point is that the OCI tooling hasn't added anything. The process takes precisely as long as it would using sha256sum, jq, ls, cat and rm! The fact that so many of the things need to be done manually even with having tooling just makes this ridiculous. Imagine if oci-create-runtime-bundle or oci-unpack would only let you extract a single layer at a time -- that would also be useless.

I'm going to end up writing a wrapper for all of this, but I think we should be considering who the target user of these utilities is. Looking at oci-image-init, oci-unpack, oci-validate and oci-create-runtime-bundle I would assume that the target is people who just want to use images without having to understand the intricacies of the specification and how things need to have 15 layers of referencing and double-checking of sizes and so on. But maybe I was mistaken and the intended users are just us? In which case I'll be happy to just go and write the tooling I need and maintain it outside of this project -- it's just a bit of a shame that we're not concrete on who will be using this tooling and what they would want from such tooling.

@wking
Contributor
wking commented Oct 29, 2016

The process takes precisely as long as it would using sha256sum, jq, ls, cat and rm!

But ls, cat, and rm don't abstract away the CAS registry implementation. And the presence of low-level tools like oci-cas doesn't preclude additional tooling like oci-jq-manifest (or whatever your high-level tweaker looks like). In fact, oci-cas makes it much easier to write a CAS-engine agnostic oci-jq-manifest.

I'm missing the connection between "I'll wrap DAG walking because it's tedious" and "I need to discover media types by inspecting the blob content". In your example above you start with oci-refs get, so you'll be following descriptors (which tell you the media type before you have the blob content) all the way down.

@cyphar
Member
cyphar commented Oct 30, 2016

@wking My point will be more clear if I just make my wrapping tool. I'm working on it now, but in order to handle it properly I need to merge this PR into my branch -- so we can just continue with merging this (and we can discuss higher-level handlers later).

@wking
Contributor
wking commented Oct 30, 2016

My point is that I see no need for peek-inside type detection. But the mediaType properties (excepting Descriptor's) will be the same amount of work to remove regardless of this PR, so I don't have a problem with it landing.

@cyphar
Member
cyphar commented Oct 30, 2016 edited

@wking ... You can't have it both ways. In opencontainers/image-tools#5 you want to make the tooling around images as flexible as possible (so that you can support extensions that don't match the spec). That's fine, but you can't then also say that all of the objects should no longer be self-describing. The result is that you cannot use any generic OCI tooling for images, because all of the objects are defined relative to one another and thus you couldn't effectively modify a single object (you'd have to parse the image each time to know what object you're modifying -- and even then you'd need to track the object types implicitly). In other words, you'd have to craft new tooling for every new producer of an OCI image (because they could add extensions that make it impossible to normally parse the image).

Explicit is better than implicit.

@wking
Contributor
wking commented Oct 30, 2016

It's a Merkle DAG, you can never modify a single blob. Peek-inside type detection doesn't change that.

@wking
Contributor
wking commented Oct 30, 2016

Even if you're modifying the root blob, you'd still need to push a new ref.

@stevvooe
Contributor

@cyphar The model for CAS-based engine distribution typically requires referencing an object through type-qualified references, which are called descriptors. I disagree that using CAS in this manner prevents extension, as I don't think OCI could be introduced at all if the concepts defined in manifests and manifest lists didn't work. We have been doing this in compilers and runtime since forever.

The difference is the propagation of the modification. Typically, the process of modifying something requires a walk down the qualified reference path, then anything that was touched, or that references something that was touched, gets updated. This is what happens with this technology (I also suspect this is part of why CAS is not adopted more often). When you touch something, it modifies the "address" of that thing and those changes and their references need to be updated.

I don't know what is going on in opencontainers/image-tools#5, but if CAS access to a tar file is being exposed over CLI, something is going way wrong.

In general, I'd like to see a proposal for the image-tools that are useful. I am not familiar with your work, but it would be helpful if you could identify the problems that have led to a "wrapper" or even how we can incorporate that work into OCI.

@wking
Contributor
wking commented Oct 31, 2016

On Mon, Oct 31, 2016 at 02:22:54PM -0700, Stephen Day wrote:

I don't know what is going on in opencontainers/image-tools#5, but
if CAS access to a tar file is being exposed over CLI, something is
going way wrong.

I disagree, but I think we should probably be discussing this point in
opencontainers/image-tools#5. Can you file your concerns in more
detail over there?

In general, I'd like to see a proposal for the image-tools that are
useful. I am not familiar with your work, but it would be helpful if
you could identify the problems that have led to a "wrapper" or even
how we can incorporate that work into OCI.

+1.

@stevvooe
Contributor

Can you file your concerns in more
detail over there?

I did, then you just opened three more PRs. I have no more time to waste.

@cyphar
Member
cyphar commented Nov 1, 2016 edited

@stevvooe Okay, so here's what I'm working on at the moment (the only reason I haven't published it yet is just because right now I'm doing a bunch of go-mtree related work rather than actually getting imagectl to work at the moment):

The main issue I have with the current incarnation of @wking's oci-cas and oci-refs is that it doesn't appear to me as though the tooling was written with the image-spec in mind. In particular, it looks more like "generic CAS tooling" that just happens to handle the CAS layout of the OCI image spec.

The tooling I would like (and am planning on writing) actually is written around images as a first-class concept. In particular, it lets you modify images rather than blobs. So you can say "modify this field in the config of this image ref" and that will translate to replacing the config, the manifest blob that referenced it and the ref blob that referenced that. Same thing applies to changing the set of layers and so on -- the tooling is actually written around the end-user goal of actually modifying images as a first-class concept.

And in my case, imagectl already knows the types of everything implicitly (because I had to walk through the CAS to get there) -- though that precludes a descriptor telling me what type it points to (or every blob being self-describing). However, the main issue I have with @wking's PR (aside from not being sure why we're exposing tar handling code) is that it exposes a generic CAS interface that isn't actually helpful to anyone unless we're also planning on writing some more tooling that can take our JSON blobs and do useful things with them -- and to do that you need to have self-describing blobs.

@wking
Contributor
wking commented Nov 1, 2016 edited

On Mon, Oct 31, 2016 at 08:22:31PM -0700, Aleksa Sarai wrote:

The main issue I have with the current incarnation of @wking's
oci-cas and oci-refs is that it doesn't appear to me as though
the tooling was written with the image-spec in mind. In particular,
it looks more like "generic CAS tooling" that just happens to handle
the CAS layout of the OCI image spec.

oci-cas is generic CAS tooling. oci-refs is generic mutable-refs
tooling. They're written with image-spec in mind because image-layout
is a generic CAS/refs format (although it's obviously not the only
possible CAS or refs format). The idea in putting a layer of
abstraction between the image-layout tooling and the rest of the
image-spec support is that it lets you adjust image-layout or swap in
some completely different CAS/refs engine without rewriting the
domain-specific logic.
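
A minimal sketch of that layering, assuming a two-method interface. `Engine`, `memEngine`, and `roundTrip` are hypothetical names for illustration and not the API proposed in opencontainers/image-tools#5.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Engine is a hypothetical minimal CAS interface.
type Engine interface {
	Put(blob []byte) (digest string, err error)
	Get(digest string) ([]byte, error)
}

var ErrNotFound = errors.New("blob not found")

// memEngine is an in-memory backend; a directory- or tar-backed engine
// could satisfy the same interface.
type memEngine struct {
	blobs map[string][]byte
}

func newMemEngine() *memEngine {
	return &memEngine{blobs: map[string][]byte{}}
}

func (e *memEngine) Put(blob []byte) (string, error) {
	digest := fmt.Sprintf("sha256:%x", sha256.Sum256(blob))
	e.blobs[digest] = append([]byte(nil), blob...) // store a private copy
	return digest, nil
}

func (e *memEngine) Get(digest string) ([]byte, error) {
	blob, ok := e.blobs[digest]
	if !ok {
		return nil, ErrNotFound
	}
	return blob, nil
}

// roundTrip is written only against Engine, so it does not change when
// the backend is swapped.
func roundTrip(e Engine, blob []byte) (string, []byte, error) {
	digest, err := e.Put(blob)
	if err != nil {
		return "", nil, err
	}
	got, err := e.Get(digest)
	return digest, got, err
}

func main() {
	digest, got, err := roundTrip(newMemEngine(), []byte("hello"))
	fmt.Println(digest, string(got), err)
}
```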

The tooling I would like (and am planning on writing) actually is
written around the spec. In particular, it lets you modify
images rather than blobs. So you can say "modify this field in
the config of this image ref" and that will translate to replacing
the config, the manifest blob that referenced it and the ref blob
that referenced that. Same thing applies to changing the set of
layers and so on -- the tooling is actually written around the
end-user goal of actually modifying images as a first-class
concept.

This sounds useful to me too. I don't see why having this
higher-level code and command-line interface makes oci-cas or
oci-refs a bad idea.

And in my case, imagectl already knows the types of everything
implicitly (because I had to walk through the CAS to get there) --
though that precludes a descriptor telling me what type it points to
(or every blob being self-describing).

So your imagectl doesn't need self-describing blobs or a type-aware
CAS engine? I'm glad, although not particularly surprised, since I
expect most CAS consumers will be publishing content with known types
or walking down from a ref.

However, the main issue I have with @wking's PR (aside from not
being sure why we're exposing tar handling code)…

It's not just exposing tar-handling code, it's exposing a generic
CAS/refs handling API. One backend for those CAS and ref stores is
tar, but it also supports directories, and can be extended in the
future to support zip, HTTP, FTP, rsync, etc. [1]. It's also possible
that it could be extended to support the Docker registry and other
existing CAS/ref stores.

… is that it exposes a generic CAS interface that isn't actually
helpful to anyone unless we're also planning on writing some more
tooling that can take our JSON blobs and do useful things with them…

I completely agree that we don't want oci-cas and oci-refs to be our
highest-level commands. oci-image-validate, oci-unpack, and
oci-create-runtime-bundle are already higher-level commands. Your
imagectl will be another higher-level command. But the CAS and refs
implementations decouple cleanly from the higher-level,
domain-specific logic. I don't think the presence of the high-level
commands is a reason to not expose the low-level CAS/refs API on the
command-line.

… and to do that you need to have self-describing blobs.

I'm still not understanding this leap. Can you present a workflow
that requires self-describing blobs?

@wking
Contributor
wking commented Nov 1, 2016

On Mon, Oct 31, 2016 at 09:45:02PM -0700, W. Trevor King wrote:

… and to do that you need to have self-describing blobs.

I'm still not understanding this leap. Can you present a workflow
that requires self-describing blobs?

The only situation I've seen where self-describing blobs are useful is
outside-of-CAS validation, where we currently attempt to auto-detect
the type when the user doesn't pass it through to us [1]. That is
useful, but it doesn't involve the CAS engine. And I don't think it
is so useful that we need to preserve it going forwards. Do we
expect folks to frequently validate content whose type they don't
know? “Is this something that OCI can autodetect and validate?”
seems like a much less useful question to ask than “Is this a valid
application/vnd.oci.image.manifest.v1+json?”. But still,
autodetection for out-of-CAS validation was useful enough for folks to
add validation support for it. Is that a sufficient use-case for
extending self-describing media types to
application/vnd.oci.image.config.v1+json? How about
application/vnd.oci.descriptor.v1+json or
application/vnd.oci.image.layer.v1.tar+gzip? I don't think so, but I
consider it to be a policy question and not a technical question
(folks do know what they're validating, they just don't want to type
out the media type), so I'm comfortable with image-spec maintainers
ruling either way.

Are there other use-cases for self-describing blobs that do make this
a technical requirement?

@stevvooe
Contributor
stevvooe commented Nov 3, 2016

@cyphar I completely agree. CAS is too low-level to be useful. Thank you for the breakdown!

docker/docker#27455 is a proposal to docker to add a manifest tool which has the same UX problem. Basically, you need to "check out" an image, modify it, then check it back in.

In the spirit of OCI being made up of proven technologies, I think it would be worthwhile to let imagectl inform the work in image tools, if possible.

@vbatts
Member
vbatts commented Nov 3, 2016

lots of philosophy and bike sheds, it seems to me. Is the motivation to remove mediatype from all but the descriptor, @stevvooe @jonboulle ?

@wking wking referenced this pull request in containers/image Nov 3, 2016
Open

Image signature format #59

@jonboulle
Contributor

@vbatts I think I need to convert this PR to an epub and put it on my kindle to have any hope of catching up

@jonboulle
Contributor

@vbatts as I see it, two things going on:
i) inconsistency of whether types have a mediaType field or not
ii) the descriptor type has a mediaType field which has a different meaning than the other types that have it (in other cases it's self-describing, in the descriptor case it's describing the entity the descriptor references)

IMHO ii MUST be fixed. For i, I don't feel incredibly strongly about whether we address that by removing it everywhere or adding it to the places it's missing (see: ad infinitum arguments either way), but would still prefer consistency.

@cyphar
Member
cyphar commented Nov 4, 2016

@jonboulle I'm 👍 on just doing ii in this PR and no longer discussing i (I'm starting to get frustrated). @wking can have his discussion elsewhere.

@wking
Contributor
wking commented Nov 4, 2016

On Fri, Nov 04, 2016 at 04:13:06AM -0700, Jonathan Boulle wrote:

ii) the descriptor type has a mediaType field which has a
different meaning than the other types that have it (in other cases
it's self-describing, in the descriptor case it's describing the
entity the descriptor references)

IMHO ii MUST be fixed. i I don't feel incredibly strongly about
whether we address that by removing it everywhere or adding it to
the places it's missing (see: ad infinitum arguments either way),
but would still prefer consistency.

If you feel (ii) must be fixed, it's because you want to support
peek-inside typing and not confuse the peekers with the different
descriptor semantics. But if you address that by removing mediaType
from non-descriptor types, you're still not supporting the peekers.

If you don't care about peek-inside typing (this is where I'm at), it
doesn't matter if different types have different semantics for
mediaType. You already know the blob type from the referencing
descriptor or because you just created the blob yourself. With this
position, you can leave the existing schemas alone (and land this PR
as it stands or not depending on whether you like having MediaTyped
independent of Versioned).

If you do care about peek-inside typing but don't want descriptor
confusing things, you'd need to rename either descriptor's mediaType
or the non-descriptors' mediaType. I don't think either of those is
possible unless we drop our current “we can parse Docker blobs
directly into our types without semantic changes” requirement. Even
if (ii) is bugging you, I'd be surprised if it is bugging you enough
to push through the current Docker-blob-semantics requirement ;).
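
A rough Go sketch of the two typing approaches contrasted above, in case it helps. The function names `kindFromDescriptor` and `peekMediaType` are made up for illustration and are not part of any existing package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	mediaTypeManifest = "application/vnd.oci.image.manifest.v1+json"
	mediaTypeConfig   = "application/vnd.oci.image.config.v1+json"
)

// kindFromDescriptor decides how to treat a blob using only the media type
// carried by the referencing descriptor. It never looks at the blob.
func kindFromDescriptor(mediaType string) (string, error) {
	switch mediaType {
	case mediaTypeManifest:
		return "manifest", nil
	case mediaTypeConfig:
		return "config", nil
	}
	return "", fmt.Errorf("unsupported media type %q", mediaType)
}

// peekMediaType is the peek-inside alternative: it unmarshals the blob and
// reads a top-level mediaType property, if the blob happens to have one.
func peekMediaType(blob []byte) (string, error) {
	var probe struct {
		MediaType string `json:"mediaType"`
	}
	if err := json.Unmarshal(blob, &probe); err != nil {
		return "", err
	}
	return probe.MediaType, nil
}

func main() {
	// A config blob carries no mediaType property at all.
	config := []byte(`{"architecture":"amd64","os":"linux"}`)

	kind, err := kindFromDescriptor(mediaTypeConfig)
	if err != nil {
		panic(err)
	}
	peeked, err := peekMediaType(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("descriptor says %s; peeking found %q\n", kind, peeked)
}
```

Peeking returns an empty string for the config, so a peeker still has to fall back on the descriptor for that type.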

@cyphar cyphar referenced this pull request in cyphar/umoci Nov 5, 2016
Closed

umoci: implement reference #6

0 of 3 tasks complete
@stevvooe
Contributor

@jonboulle

We should remove or deprecate the mediaType field from types that aren't descriptors.

i) inconsistency of whether types have a mediaType field or not
ii) the descriptor type has a mediaType field which has a different meaning than the other types that have it (in other cases it's self-describing, in the descriptor case it's describing the entity the descriptor references)

i was never supposed to exist, which created the inconsistency described in ii.

Types are supposed to be resolved through qualified references. This both ensures secure access to resources and makes type-equivalence and versioning easy to support.

@jonboulle
Contributor

OK can we just remove it everywhere else then.

@vbatts
Member
vbatts commented Nov 29, 2016

Is the consensus here to remove mediaType from everything except the descriptor?
Will this not break compatibility, or are we saying we're okay with that, @stevvooe ?

@stevvooe
Contributor

Will this not break compatibility, or are we saying we're okay with that, @stevvooe ?

This should be okay.

I think we should reserve the field, so it is not used.

@stevvooe
Contributor
stevvooe commented Dec 7, 2016

@vbatts Do you need help in removing/reserving the mediaType field from other types?

@vbatts vbatts added a commit to vbatts/oci-image-spec that referenced this pull request Dec 9, 2016
@vbatts vbatts specs-go: clarify mediatypes
Due to the conflicting use of the `mediaType` field across
documents, and after discussion on
opencontainers#411,
this changeset removes the use of `mediaType` where it is used to refer
to a document's own type. Leaving only the use of `mediaType` for
descriptors, where it is used to describe the type of a referenced object.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
e517a60
@vbatts
Member
vbatts commented Dec 9, 2016

Updated. Also updated the description on this PR to reflect the commit message.
PTAL.

@wking
Contributor
wking commented Dec 9, 2016

e517a60 still needs Markdown and JSON Schema updates.

@vbatts
Member
vbatts commented Dec 9, 2016

@stevvooe by removing mediaType from places like https://github.com/opencontainers/image-spec/blob/master/manifest.md#image-manifest-property-descriptions it will absolutely make the OCI manifest and manifest-list not expressly compatible with the docker equivalents. Thoughts?

@vbatts vbatts specs-go: clarify mediatypes
Due to the conflicting use of the `mediaType` field across
documents, and after discussion on
opencontainers#411,
this changeset removes the use of `mediaType` where it is used to refer
to a document's own type. Leaving only the use of `mediaType` for
descriptors, where it is used to describe the type of a referenced object.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
5fc84e5
@wking
Contributor
wking commented Dec 9, 2016

... it will absolutely make the OCI manifest and manifest-list not expressly compatible with the docker equivalents.

Removing mediaType from the OCI schema makes things more compatible in the Docker -> OCI direction (because you can just leave the Docker mediaType alone). In the OCI -> Docker direction we'll need mediaType injection (where we previously needed mediaType translation), but that doesn't seem like a major compatibility drop.
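
As a sketch of what "mediaType injection" could look like in the OCI -> Docker direction (the `injectMediaType` helper is hypothetical, and it only touches the top-level property; descriptor media types would still need their own translation):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Docker's schema 2 manifest media type.
const dockerManifestType = "application/vnd.docker.distribution.manifest.v2+json"

// injectMediaType sets the top-level mediaType property of a JSON object and
// leaves every other property as it was. Note that re-marshalling a Go map
// sorts the keys, so the output bytes generally differ from the input bytes.
func injectMediaType(doc []byte, mediaType string) ([]byte, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(doc, &obj); err != nil {
		return nil, err
	}
	if obj == nil {
		return nil, fmt.Errorf("document is not a JSON object")
	}
	encoded, err := json.Marshal(mediaType)
	if err != nil {
		return nil, err
	}
	obj["mediaType"] = encoded
	return json.Marshal(obj)
}

func main() {
	// A minimal OCI-style manifest without a self-describing mediaType.
	oci := []byte(`{"schemaVersion":2,"layers":[]}`)
	out, err := injectMediaType(oci, dockerManifestType)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```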

@cyphar
Member
cyphar commented Dec 10, 2016

@vbatts We discussed this in the call. Basically @philips said that the mediaType was a mistake for Docker to have and that we don't need it. Conversion isn't really that significant a reason: if you look at the conversion code I wrote for skopeo (containers/image#172), the conversion is just a matter of changing mediaTypes. If we remove the mediaType from non-Descriptors then it's just a matter of removing one line from the manifest configuration.

Since the config doesn't have a mediaType we don't need to worry about the sha256sum of objects changing through conversion.

@wking
Contributor
wking commented Dec 10, 2016

If we remove the mediaType from non-Descriptors then it's just a matter of removing one line from the manifest configuration.

Both manifests and manifest lists explicitly allow arbitrary extension properties, so unless we reserve the field (and I don't see a point to reserving it), you wouldn't need to remove the self-describing mediaType. You probably will still need to translate descriptor mediaTypes, but you have to do that already.
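
For what it's worth, a Go consumer already tolerates a leftover property, because `encoding/json` ignores unknown object keys by default. A small sketch, using a hypothetical trimmed-down `manifest` type that has no `MediaType` field:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifest is a hypothetical, trimmed-down manifest type that deliberately
// has no MediaType field, as if the property had been dropped from the schema.
type manifest struct {
	SchemaVersion int `json:"schemaVersion"`
}

// parseManifest unmarshals a document into manifest. Properties the struct
// does not declare are skipped rather than treated as errors.
func parseManifest(doc []byte) (manifest, error) {
	var m manifest
	err := json.Unmarshal(doc, &m)
	return m, err
}

func main() {
	// A document that still carries Docker's self-describing mediaType.
	doc := []byte(`{"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json"}`)
	m, err := parseManifest(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println("schemaVersion:", m.SchemaVersion)
}
```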

Since the config doesn't have a mediaType we don't need to worry about the sha256sum of objects changing through conversion.

Manifest digests are going to change regardless of this PR, since you're translating layers[].mediaType, etc. Config digests are not going to change regardless of this PR, since config contains no mediaType properties.
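
A quick Go sketch of the digest argument. The `translateMediaTypes` helper is a crude, hypothetical byte-level stand-in for real conversion code, and the JSON documents are cut down to the bare minimum:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const (
	dockerLayerType = "application/vnd.docker.image.rootfs.diff.tar.gzip"
	ociLayerType    = "application/vnd.oci.image.layer.v1.tar+gzip"
)

// digest returns the content address of a blob in "sha256:<hex>" form.
func digest(blob []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(blob))
}

// translateMediaTypes rewrites Docker layer media types to their OCI
// counterparts. A real converter would work on parsed JSON; this only
// demonstrates that any such rewrite changes the bytes being hashed.
func translateMediaTypes(blob []byte) []byte {
	return bytes.ReplaceAll(blob, []byte(dockerLayerType), []byte(ociLayerType))
}

func main() {
	manifest := []byte(`{"schemaVersion":2,"layers":[{"mediaType":"` + dockerLayerType + `"}]}`)
	config := []byte(`{"architecture":"amd64","os":"linux"}`)

	// The manifest embeds layers[].mediaType, so translation alters its digest.
	fmt.Println("manifest digest changed:", digest(manifest) != digest(translateMediaTypes(manifest)))
	// The config has no mediaType properties, so its bytes and digest are untouched.
	fmt.Println("config digest changed:", digest(config) != digest(translateMediaTypes(config)))
}
```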

@cyphar
Member
cyphar commented Dec 10, 2016

Manifest digests are going to change regardless of this PR, since you're translating layers[].mediaType, etc. Config digests are not going to change regardless of this PR, since config contains no mediaType properties.

That was my point. In Docker, manifests aren't "objects" in the same sense as in OCI. But configuration is.

- For this version of the specification, this MUST be set to `application/vnd.oci.image.manifest.list.v1+json`.
- For the media type(s) that this is compatible with, see the [matrix](media-types.md#compatibility-matrix).
+ This property is *reserved* for use, to [maintain compatibility][matrix].
+ When used, this field contains the media type of this document, which differs from the [descriptor](descriptor.md#properties) use of `mediaType`.
@wking
wking Dec 15, 2016 Contributor

This seems awkward. I'd expect "reserved" to mean "configs MUST NOT set this property, and we haven't assigned semantics to it". If it is optional and has defined semantics, what do you intend to change by reserving it too?

I think we'll have better Docker compatibility if we drop this property from the spec entirely (except for descriptor.mediaType).
