Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: One Meta Data to Rule Them All => Labels #9882

Merged
merged 7 commits into from Mar 17, 2015

Conversation

Projects
None yet
@ibuildthecloud
Copy link
Contributor

commented Jan 3, 2015

This is yet another meta data PR in an attempt to pull together multiple PRs hopefully into something we can get into Docker 1.5 (seriously, let's move fast).

Background

Some background... (from what I can gather). There is currently #8955 for adding UserData to a Dockerfile. Basically it adds the USERDATA key=value key="long value" syntax to the Docker. Then there is #9013 which looks to add structured JSON data to a Dockerfile and a container in much the same fashion as Kubernete's annotations. Finally there is #9854 which discusses the need for meta data on container to help with Swarm and similar clustering systems. There are probably another 10 threads that @thaJeztah can find that also talk about the need to add meta data to containers.

We need this...

We need meta data, it's clear user want it.

Labels

We already have labels on Hosts (Docker daemon) today. It seems that going forward we should be able to add labels to everything: Hosts, containers, volumes, images, etc. Let's just continue with that approach. Labels are simple key/value pairs in the style of map[string]string. There have been comments to standarize the format of the keys such that we have namespaces or other means to prevent conflicts. Honestly I don't think that is all that necesary at the Docker level to dictate this. It's just good practice to namespace things. Whether you do "foo_bar" or "foo/bar" or "com.foo/bar", who really cares. If your going to use "id=3", well that's a bad name and somebody may clobber your value. I'm assuming some will disagree with me on this one. That's fine, I'll agree to a namespace standard just as long as it doesn't take three weeks and 132 comments to decide that DNS format is far superior to arbitrary strings split by "/".

Labels are not structured data, and as such this PR is different from #9013, so that discussion can happen differently. Honestly, I'm not in favor of adding structured data to objects and having Docker maintain it. But if others disagree, so be it, we can have structured data as something else.

What about ENV vars

Yes, labels are very close to environment variables. You can add ENV to a Dockerfile and you can add them at container create. The basic difference here is that these key/values are not visible to the running processes in the container.

Lookup by Label

Another key attribute of labels is that you should be able to find an object based on it's label. This should initially be kept very simple. You can either say "give me all images/containers that have key foo." Or you can say "give me all images/containers that have foo=bar".

How should we go about doing this?

I think #8955 is the right start. USERDATA should be renamed to LABEL. Now that we have LABEL on images we need to add --label key=value to docker run/create. This is in the same fashion as ENV and -e today.

Now the only thing left do to is to figure out how to query based on labels. We just need to add --label foo or --label foo=bar to docker images/ps.

It's just that simple folks

Okay, good? Alright, let's move forward...

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Jan 3, 2015

Thanks, Darren!

Honestly I don't think that is all that necesary at the Docker level to dictate this. It's just good practice to namespace things. Whether you do "foo_bar" or "foo/bar" or "com.foo/bar", who really cares.

Fully agree on this one. Docker should offer the means to store, search and retrieve the data, but have no opinion on what they are used for, or what (naming)conventions are used. If Docker itself is using meta-data for something, that is just an implementation, just like any other system using the meta-data.

The data stored in a label is just free-form text as well; if an implementation decides to use it for storing JSON, that's fine, but Docker doesn't offer special treatment for those values; no parsing, validation or nested search for JSON properties.

Indexing / performance

To be useful (for example, fetch a container via a "custom" id), querying meta-data should be fast. Useful indexes should probably be present, including "partial" matches or wild-card support, both on "keys" and "values". For example, getting all containers that have a labels with namespace/prefix starting with my.name.space.*

Scope / Visibility

We should probably ask if meta data is only accessible from "outside" containers (in case of meta-data on containers), or also from within a container; I can see use-cases where meta-data can be useful inside a container. How to control access is something to be discussed (also wrt read/read-write)

Inheritance

In case of Image and Container meta-data; should images share the same meta-data as containers running from it? Will they be inherited, but kept separate? Or are they "merged" when creating a container instance?

@phemmer

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2015

I also agree that docker should not dictate how the data gets used or formatted. I personally think docker tries to dictate things a bit too often. We should recommend a best practice, but if someone wants to ignore best practice, they might have a good reason for it.

However I don't like the term 'label'. Most systems I've ever dealt with have treated a label as value-only data, not key/value (one example being github labels). Just to clarify what I mean, a label would be something like --label foo_bar, where there is no =, and thus no presence of a key.

@jessfraz

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2015

I like this FWIW

@erikh

This comment has been minimized.

Copy link
Contributor

commented Jan 5, 2015

Paging @vieux and @aluzzardi, as this may be very relevant to their interests.

I also think it's a really good idea. Chef search is very similar and one of its best features.

@ibuildthecloud

This comment has been minimized.

Copy link
Contributor Author

commented Jan 5, 2015

I want to make it clear that my intention is to quickly move this forward. I want to find what we can implement now that will give the minimally viable value but also put us on clear path to adding more functionality.

@thaJeztah - Comments below

Indexing/Performance

I completely agree that fast lookup is required. That is one of the fundamental differences between environment variables and labels. For containers I think this can be easily achieved by just keeping the labels in memory in the current data structures. Searching for a label will just be iterating over the list of containers in memory. This means at it’s worse searching for a label will be as slow as docker ps. I think this is sufficient at first. We can improve down the line.

Images become more difficult as you have more images than containers, typically. For images a real index should be built. The problem though is that docker networks are coming soon and volumes will probably not be too far. It seems we should find one consistent approach for labels that works for all object types. As such, I would like to defer on search for images by tag. I’ve currently seen a higher demand for fast lookup of containers, but not the same for images. I’m not saying the use cases don’t exist, just that containers are a higher priority.

The way in which one can search is largely based on the underly index. So supporting wildcard, regexp, etc. has real technical implications. I think we need a good query syntax, but for the first pass I think it is safe to support “give me all containers that have label foo” and “give me all containers that have label foo equal to bar”. The syntax would be docker ps --label foo and docker ps --label foo=bar respectively.

Scope/Visibility

First off, scope visibility doesn’t really matter until we have an introspection service. So this is obviously a discussion that will happen elsewhere, but regardless I’ll say what I think it should be. Labels should follow the existing pattern of ports. That being that they are private by default and must be explicitly published. Defining a label is the same as EXPOSE in the Dockerfile and then publishing the label should follow a pattern like -P and -p .... This means all labels will not be seen by the container unless you explicitly --publish-all-labels or --publish-label foo. It seems a wildcard syntax should exist like --publish-label com.example/*.

Inheritance

I hope you notice a trend in my comments in that we should just follow existing patterns. For inheritance I would expect labels follow the same approach as environment variables. I honestly haven’t given a huge amount of thought to this, but I think the ENV approach should be sufficient.

@phemmer I completely agree that label is a bad term. Unfortunately the precedence has already been set with host labels and Kubernetes labels. I have a strong opinion that it's better to be consistently wrong then inconsistently right. I think we should just stick with the ill named “label.”

Implementation

I have every intention of pushing this through as fast as possible. I’m going to code the implementation of this hopefully today based off of @rhatdan’s existing work in #8955. I'm optimistic the community can come to a consensus.

@dnephin

This comment has been minimized.

Copy link
Member

commented Jan 6, 2015

Labels would be awesome for fig/docker compose. Tracking images with tags would allow users to use any name for their images and containers, and would address some of the performance issues with the current version.

Inheritance

It seems to me like inheritance could be entirely client-side. A client which is creating a container from an image should be able to decide which labels to copy over to the container.

@@ -1,3 +1,5 @@
Okay, I'm lazy. I'll write docs if I think this PR doesn't get violently opposed
to at first...

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Jan 6, 2015

Contributor

giggle - you'll want to remove this >:}

@@ -11,7 +11,7 @@ let b:current_syntax = "dockerfile"

syntax case ignore

syntax match dockerfileKeyword /\v^\s*(ONBUILD\s+)?(ADD|CMD|ENTRYPOINT|ENV|EXPOSE|FROM|MAINTAINER|RUN|USER|VOLUME|WORKDIR|COPY)\s/
syntax match dockerfileKeyword /\v^\s*(ONBUILD\s+)?(ADD|CMD|ENTRYPOINT|ENV|EXPOSE|FROM|MAINTAINER|RUN|USER|LABEL|VOLUME|WORKDIR|COPY)\s/

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Jan 6, 2015

Contributor

Ah, this means you need to mention LABEL as on of the ONBUILD instructions in the docs and man page


LABEL Description="This image is used to start the foobar executable" Vendor="ACME Products"
LABEL Version="1.0"

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Jan 6, 2015

Contributor

are these (or could they be) the same kind of labels as are happening for swarm? @vieux ?

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 6, 2015

Author Contributor

@SvenDowideit The idea is that these labels should be conceptually the same as the host labels. Going forward we will just be able to apply labels to everything, on host, containers, images, and future networks and volumes.

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Jan 8, 2015

Contributor

ie, #everythingisawesome! 👍

"Vendor=Acme",
"License=GPL",
"Version=1.0"
],

This comment has been minimized.

Copy link
@phemmer

phemmer Jan 6, 2015

Contributor

Why are the labels stored/accessed as a dictionary/hash everywhere else, but provided as an array through the API?
What happens if we do something like:

"Labels": [
  "foo=bar",
  "foo=baz",
]

I think it would be more appropriate to treat it as a dictionary/hash everywhere.

Also assuming the inspect data would be an array as well, what if I want to do:

docker inspect --format '{{.Config.Labels.foo}}'

...to get a specific value. With an array, you cant.

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 6, 2015

Author Contributor

@phemmer I agree a map is better, I actuallly updated the code as such, but not the documentation yet.

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 6, 2015

Author Contributor

@vieux @phemmer Okay, I've just noticed that host labels are ["key=value"] in the API and not a map. @vieux Why was it chosen to be done this way? It does make the code a bit wonky.

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 6, 2015

Author Contributor

So it appears that ["key=value"] is there to support multiple keys of the same name. We will stick with that syntax.

This comment has been minimized.

Copy link
@phemmer

phemmer Jan 7, 2015

Contributor

If were going with that, then I think we should remove any restriction that the label content contain a =.

Internally the code won't be able to use a map since you can have duplicate keys, so it'll just be an array of strings. If it's an array of strings, I don't see any reason why a = would be required.
Doing this also supports the term 'label' as well, as it's no longer a key/value pair (it can be used as such, but it's up to the client).

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 7, 2015

Author Contributor

We need a notion of key and value to support search as I described. I would rather change the name to Metadata then remove the notion of =

@phemmer

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2015

Do we ever anticipate labels being added/removed on an already existing container? If so, storing them in the config data precludes this possibility.

@aanand

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2015

I agree with pretty much everything in this proposal - it's basically the one I would've written. As @dnephin says, Compose will be a primary consumer of this feature.

@bfirsh bfirsh added the UX label Jan 6, 2015

@rhatdan

This comment has been minimized.

Copy link
Contributor

commented Jan 6, 2015

I can go along with this, just as long as something finally gets merged to allow us to add Meta Data. Even if we can just settle on a name Meta->UserData->Label...

@ibuildthecloud ibuildthecloud force-pushed the ibuildthecloud:labels branch 4 times, most recently from 7bcc076 to 7db7ad1 Jan 6, 2015

"Vendor=Acme",
"License=GPL",
"Version=1.0"
],

This comment has been minimized.

Copy link
@bfirsh

bfirsh Jan 7, 2015

Contributor

Is this a list for a specific reason, or just consistency with Env? Could it be this?

"Labels": {
  "Vendor": "Acme",
  "License": "GPL",
  "Version": "1.0"
},

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jan 7, 2015

Member

Apparently that is to allow multiple keys with the same name (#9882 (comment)).

I wonder (if it should be supported) would be better to store as;

    "Labels": {
      "Vendor": ["Acme"],
      "License": ["GPL"],
      "Version": ["1.0"],
      "foo": ["bar","baz","bam"]
    },

Note; I'm not suggesting that users can directly provide JSON arrays as argument, but only to store multiple --label foo=bar --label foo=baz --label foo=bam

One case that might cause problems here is if one of those labels doesn't have a value (--label foo). Perhaps this should be stored as [NULL,"bar","baz","bam"]? IDK

This comment has been minimized.

Copy link
@ibuildthecloud

ibuildthecloud Jan 7, 2015

Author Contributor

I think this would be a more logical data structure.

This comment has been minimized.

Copy link
@aanand

aanand Jan 7, 2015

Contributor

@thaJeztah Labels "without values" don't make a lot of sense to me. A label with the empty string as its value, sure, but having a distinct NULL value feels messy.

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jan 7, 2015

Member

yes that's better; if we're treating label values as strings, then "no" value should be an empty string

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Jan 7, 2015

Member

Lolz. Reading back your comment, you probably suggested to dont add a value to the array at all?

Should probably be fine as well. There's no need to know how many times I set a label, so --label foo will result in {"foo":[]}` (an empty array), right?

@@ -1501,6 +1504,7 @@ removed before the image is removed.
--ipc="" Default is to create a private IPC namespace (POSIX SysV IPC) for the container
'container:<name|id>': reuses another container shared memory, semaphores and message queues
'host': use the host shared memory,semaphores and message queues inside the container. Note: the host mode gives the container full access to local shared memory and is therefore considered insecure.
-l, --label=[] Set labels

This comment has been minimized.

Copy link
@bfirsh

bfirsh Jan 7, 2015

Contributor

Should probably explain what labels are. E.g. "Set key/value metadata on container"

@jessfraz jessfraz added the Proposal label Jan 7, 2015

@ibuildthecloud ibuildthecloud force-pushed the ibuildthecloud:labels branch from 7db7ad1 to 77c4cd4 Jan 7, 2015


Use the `--label-file` flag to load multiple labels from a file. Delimit each
label in the file with an EOL mark. The example below loads labels from a
labels file in the current directory;

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor

can I have multiple --label-file flags? what happens?

(as i read on i get the answer to my question)

please move the 'You can load multiple....` up to this paragraph, so the reader isn't distracted :)

$ sudo docker run --label-file ./labels ubuntu bash

The label-file format is similar to the format for loading environment variables
(see `--env-file` above). The following example illustrates a label-file format;

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor

can this be a link?

A label is a `<key>` / `<value>` pair. Docker stores the values as *strings*.
You can specify multiple labels but each `<key>` / `<value>` must be unique. If
you specify the same `key` multiple times with different values, each subsequent
value overwrites the previous. Docker applies the last `key=value` you supply.

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor

this sounds like it should be written as each <key> will be unique, and have one value

if you say each key=value must be unique, aren't you saying that a key can be in the store more than once?


These are guidelines and are not enforced. Docker does not *enforce* them.
Failing following these guidelines can result in conflicting labels. If you're
building a tool that uses labels, you *should* use namespaces for your label keys.

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor

can we make this more explicit? conflict can mean several things - when really (if i understand correctly) the last LABEL for a particular key replaces all others. (for that image layer? - but leaving the previous image label alone?)

LABEL com.example.label-with-value="foo"
LABEL version="1.0"
LABEL description="This text illustrates \
that label-values can span multiple lines."

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor

Can we note that the example above will result in 4 image layers - each of which will have different sets of labels?

and if the example as the same key used twice, we can show how the conflict resolution works, and that the subsequent label doesn't affect that of the lower layer?


You can store this struct in a label by serializing it to a string first:

LABEL com.example.image-specs="{\"Description\":\"A containerized foobar\",\"Usage\":\"docker run --rm example\\/foobar [args]\",\"License\":\"GPL\",\"Version\":\"0.0.1-beta\",\"aBoolean\":true,\"aNumber\":0.01234,\"aNestedArray\":[\"a\",\"b\",\"c\"]}"

This comment has been minimized.

Copy link
@SvenDowideit

SvenDowideit Mar 17, 2015

Contributor
LABEL com.example.image-specs="{\
    \"Description\":\"A containerized foobar\",\
    \"Usage\":\"docker run --rm example\\/foobar [args]\",\
    \"License\":\"GPL\",\
    \"Version\":\"0.0.1-beta\",\
    \"aBoolean\":true,\
    \"aNumber\":0.01234,\
    \"aNestedArray\":[\"a\",\"b\",\"c\"]\
}"

might be less painful.

@SvenDowideit

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2015

yup, minor nits - basically, most of my questions are answered elsewhere, but the reader won't know that at the time.

LGTM - though if you address the nits, it'll Look even better :)

@icecrime

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2015

Thanks all, and thanks Darren for your patience! @moxiegirl will take care of the final adjustments in a separate PR.

icecrime pushed a commit that referenced this pull request Mar 17, 2015

Arnaud Porterie
Merge pull request #9882 from ibuildthecloud/labels
Proposal: One Meta Data to Rule Them All => Labels

@icecrime icecrime merged commit b6ac111 into moby:master Mar 17, 2015

1 of 2 checks passed

windows Jenkins build Windows-PRs 469 has failed
Details
janky Jenkins build Docker-PRs 3445 has succeeded
Details
@ibuildthecloud

This comment has been minimized.

Copy link
Contributor Author

commented Mar 17, 2015

@icecrime Thank you so much for helping move this along.

@thaJeztah

This comment has been minimized.

Copy link
Member

commented Mar 17, 2015

Wow! Happy to see this merged. Thanks @ibuildthecloud for finally making this happen!

@wyaeld

This comment has been minimized.

Copy link

commented Mar 17, 2015

huge thank-you to all the people who worked on this and the various discussions that led to it, such a useful building block

One question (not to try and increase scope here though). Do people think it makes sense for the distribution components (docker search) to eventually allow for filtering on the labels?

-- `LABEL <key>[=<value>] [<key>[=<value>] ...]`
The **LABEL** instruction adds metadata to an image. A **LABEL** is a
key-value pair. To include spaces within a **LABEL** value, use quotes and
blackslashes as you would in command-line parsing.

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2015

Contributor

blackslashes → backslashes

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Mar 17, 2015

Member

@TomasTomecek that's "dark matter" I think 😄 Do you want to make a pull-request to fix that?

This comment has been minimized.

Copy link
@TomasTomecek

TomasTomecek Mar 17, 2015

Contributor

@thaJeztah
@icecrime said that @moxiegirl will do final adjustments; I guess this can be included in those

This comment has been minimized.

Copy link
@thaJeztah

thaJeztah Mar 17, 2015

Member

Ah, you're right! Thanks for spotting it nevertheless :)

@rhatdan

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2015

This is huge. Only been waiting on this one for about a year....

@bfirsh

This comment has been minimized.

Copy link
Contributor

commented Mar 17, 2015

Yaaaay. What an excellent birthday present. Thanks all for your help.

We're going to try and get Compose support in for this in the next release: docker/compose#1066

@dreamcat4

This comment has been minimized.

Copy link

commented Mar 17, 2015

@bfirsh Well if you implement Compose support, please consider @thaJeztah's idea earlier on in this thread ^^. So compose can be auto-converting nested data structured into json text blob. So we can have basically annotations-like feature (immutable), and arbitrary or complex data structures saved into labels.

@nicornk

This comment has been minimized.

Copy link

commented Aug 3, 2015

Hi Guys,
is it possible / how is it possible to add a label to an already running container?
Thanks!

@cpuguy83

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2015

@nicornk Not possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.