Proposal: One Meta Data to Rule Them All => Labels #9882
This is yet another meta data PR in an attempt to pull together multiple PRs hopefully into something we can get into Docker 1.5 (seriously, let's move fast).
Some background... (from what I can gather). There is currently #8955 for adding UserData to a Dockerfile. Basically it adds the
We need this...
We need meta data; it's clear users want it.
We already have labels on Hosts (Docker daemon) today. It seems that going forward we should be able to add labels to everything: Hosts, containers, volumes, images, etc. Let's just continue with that approach. Labels are simple key/value pairs in the style of
Labels are not structured data, and as such this PR is different from #9013, so that discussion can happen differently. Honestly, I'm not in favor of adding structured data to objects and having Docker maintain it. But if others disagree, so be it, we can have structured data as something else.
What about ENV vars
Yes, labels are very close to environment variables. You can add ENV to a Dockerfile and you can add them at container create. The basic difference here is that these key/values are not visible to the running processes in the container.
Lookup by Label
Another key attribute of labels is that you should be able to find an object based on its label. This should initially be kept very simple. You can either say "give me all images/containers that have key foo" or "give me all images/containers that have foo=bar".
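To make the two lookup forms concrete, here is a minimal sketch. It is not Docker's implementation: the `CONTAINERS` dictionary and the `find_by_label` helper are hypothetical names invented for illustration, and the search is a plain linear scan over an in-memory mapping.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory store: object name -> labels.
# This is an illustration only, not Docker's real data model.
CONTAINERS: Dict[str, Dict[str, str]] = {
    "web": {"foo": "bar", "tier": "frontend"},
    "db": {"foo": "baz"},
    "cache": {"tier": "backend"},
}


def find_by_label(
    objects: Dict[str, Dict[str, str]],
    key: str,
    value: Optional[str] = None,
) -> List[str]:
    """Return the sorted names of objects that carry `key`.

    With `value` set, only objects whose label equals `value` match
    (the "foo=bar" form); otherwise any value matches (the "foo" form).
    """
    return sorted(
        name
        for name, labels in objects.items()
        if key in labels and (value is None or labels[key] == value)
    )
```

Calling `find_by_label(CONTAINERS, "foo")` covers the "has key foo" case, and `find_by_label(CONTAINERS, "foo", "bar")` covers the "foo=bar" case.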
How should we go about doing this?
I think #8955 is the right start.
Now the only thing left to do is to figure out how to query based on labels. We just need to add
It's just that simple folks
Okay, good? Alright, let's move forward...
Jan 3, 2015
Fully agree on this one. Docker should offer the means to store, search and retrieve the data, but have no opinion on what they are used for, or what (naming)conventions are used. If Docker itself is using meta-data for something, that is just an implementation, just like any other system using the meta-data.
The data stored in a label is just free-form text as well; if an implementation decides to use it for storing JSON, that's fine, but Docker doesn't offer special treatment for those values; no parsing, validation or nested search for JSON properties.
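As a rough illustration of "free-form text, no special treatment", the sketch below stores JSON in a label value and leaves all parsing to the client. The label key `com.example.config` and both helper functions are assumptions made up for this example; nothing here reflects an actual Docker API.

```python
import json
from typing import Any, Dict

# Hypothetical label set; the key name is invented for this example.
LABELS: Dict[str, str] = {
    "com.example.config": json.dumps({"replicas": 3, "zone": "a"}),
}


def label_equals(labels: Dict[str, str], key: str, value: str) -> bool:
    """What the store would do: compare the raw strings, nothing more."""
    return labels.get(key) == value


def read_config(labels: Dict[str, str]) -> Any:
    """What a client may choose to do: interpret the value as JSON itself."""
    return json.loads(labels["com.example.config"])
```

Because matching is a plain string comparison, two JSON documents that are semantically equal but serialized differently would not match each other; that is the trade-off of keeping values opaque.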
Indexing / performance
To be useful (for example, fetch a container via a "custom" id), querying meta-data should be fast. Useful indexes should probably be present, including "partial" matches or wild-card support, both on "keys" and "values". For example, getting all containers that have labels with a namespace/prefix starting with
Scope / Visibility
We should probably ask if meta data is only accessible from "outside" containers (in case of meta-data on containers), or also from within a container; I can see use-cases where meta-data can be useful inside a container. How to control access is something to be discussed (also wrt read/read-write)
In case of Image and Container meta-data; should images share the same meta-data as containers running from it? Will they be inherited, but kept separate? Or are they "merged" when creating a container instance?
I also agree that docker should not dictate how the data gets used or formatted. I personally think docker tries to dictate things a bit too often. We should recommend a best practice, but if someone wants to ignore best practice, they might have a good reason for it.
However I don't like the term 'label'. Most systems I've ever dealt with have treated a label as value-only data, not key/value (one example being github labels). Just to clarify what I mean, a label would be something like
Jan 4, 2015
Jan 5, 2015
I want to make it clear that my intention is to quickly move this forward. I want to find what we can implement now that will give the minimally viable value but also put us on clear path to adding more functionality.
@thaJeztah - Comments below
I completely agree that fast lookup is required. That is one of the fundamental differences between environment variables and labels. For containers I think this can be easily achieved by just keeping the labels in memory in the current data structures. Searching for a label will just be iterating over the list of containers in memory. This means at its worst searching for a label will be as slow as
Images become more difficult as you typically have more images than containers. For images a real index should be built. The problem though is that docker networks are coming soon and volumes will probably not be too far behind. It seems we should find one consistent approach for labels that works for all object types. As such, I would like to defer on search for images by tag. I’ve currently seen a higher demand for fast lookup of containers, but not the same for images. I’m not saying the use cases don’t exist, just that containers are a higher priority.
The way in which one can search is largely based on the underlying index. So supporting wildcard, regexp, etc. has real technical implications. I think we need a good query syntax, but for the first pass I think it is safe to support “give me all containers that have label foo” and “give me all containers that have label foo equal to bar”. The syntax would be
First off, scope visibility doesn’t really matter until we have an introspection service. So this is obviously a discussion that will happen elsewhere, but regardless I’ll say what I think it should be. Labels should follow the existing pattern of ports. That being that they are private by default and must be explicitly published. Defining a label is the same as
I hope you notice a trend in my comments in that we should just follow existing patterns. For inheritance I would expect labels follow the same approach as environment variables. I honestly haven’t given a huge amount of thought to this, but I think the ENV approach should be sufficient.
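One way to read "follow the same approach as environment variables" is sketched below: the container starts with the image's labels, and anything supplied at create time overrides a same-named key. This is only an interpretation of the comment above, and `effective_labels` is a hypothetical helper, not part of Docker.

```python
from typing import Dict


def effective_labels(
    image_labels: Dict[str, str],
    create_labels: Dict[str, str],
) -> Dict[str, str]:
    """Merge labels ENV-style: inherit from the image, let create-time values win."""
    merged = dict(image_labels)   # copy so the image's labels are not mutated
    merged.update(create_labels)  # create-time values override inherited ones
    return merged
```

Under this reading, a key set only on the image is inherited unchanged, and a key set in both places takes the create-time value.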
@phemmer I completely agree that label is a bad term. Unfortunately the precedent has already been set with host labels and Kubernetes labels. I have a strong opinion that it's better to be consistently wrong than inconsistently right. I think we should just stick with the ill-named “label.”
I have every intention of pushing this through as fast as possible. I’m going to code the implementation of this hopefully today based off of @rhatdan’s existing work in #8955. I'm optimistic the community can come to a consensus.
Jan 5, 2015
Labels would be awesome for fig/docker compose. Tracking images with tags would allow users to use any name for their images and containers, and would address some of the performance issues with the current version.
It seems to me like inheritance could be entirely client-side. A client which is creating a container from an image should be able to decide which labels to copy over to the container.
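A sketch of what client-side inheritance could look like, assuming the client can read the image's labels before creating the container. The `select_labels` helper and the `com.example.` prefix are hypothetical, chosen only to show a client-defined copy policy.

```python
from typing import Callable, Dict


def select_labels(
    image_labels: Dict[str, str],
    keep: Callable[[str], bool],
) -> Dict[str, str]:
    """Return only the image labels the client chooses to copy to the container."""
    return {key: value for key, value in image_labels.items() if keep(key)}


# Example policy (an assumption): copy only labels in the client's own namespace.
def in_client_namespace(key: str) -> bool:
    return key.startswith("com.example.")
```

With this shape the daemon needs no inheritance rules at all; each client passes its own `keep` policy and sends the resulting labels at container create.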
As it's implemented right now, it is the following
I've written the code, tests, and some documentation. I feel the approach is pretty solid but there are two remaining issues I see.
Labels is consistent with host labels and Kubernetes. It has already been pointed out that because these labels allow multiple keys with the same name they are already different from Kubernetes. Labels is not the obvious term I believe because most people don't think of key/value pairs. Meta data seems like the more accepted term. Meta data would be inconsistent with host labels, but we could standardize on that name going forward as we apply this same approach to networks and volumes in the future.
The current code is using
Sorry to pile on the design questions, but I'm concerned about multiple labels with the same key. My gut reaction is to not allow it, because then it's impossible to override a label - if an image specifies
I'm guessing this is an artifact of how the ENV parser works?
On Wed, 2015-01-07 at 08:41 -0800, Aanand Prasad wrote:
@aanand I agree, I also don't like multiple labels with different names. My assumption is that there was already a long discussion about this for host labels, as host labels were implemented this way. Maybe @vieux can chime in.
Rancher.io will be a consumer of this API and I would personally prefer to not allow multiple keys as it complicates interactions. Additionally, even though we say you can have multiple keys with the same name it is not possible to do that from a Dockerfile. The Dockerfile is always assuming that you are overriding the value. Come to think of it the way the
Mar 17, 2015
huge thank-you to all the people who worked on this and the various discussions that led to it, such a useful building block
One question (not to try and increase scope here though). Do people think it makes sense for the distribution components (
@bfirsh Well, if you implement Compose support, please consider @thaJeztah's idea from earlier in this thread ^^. So Compose can be auto-converting nested data structures into