Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
Document why we don't use maps in the API in api-conventions.md #2004
Forked from #1980.
The API has a number of lists containing names embedded in objects, such as Volumes, Containers, Ports, Env, and VolumeMounts. Both configuration and field references (e.g., in filter expressions, events) are uglier and/or more verbose when using lists rather than maps. This puts us at a disadvantage compared to other systems with more elegant API and/or configuration schemas (e.g., Fig).
Automatically translating maps into lists of named objects appears to be hard. In JSON and YAML, structures and maps cannot be distinguished without a schema. It looks like we'd need either duplicate fields, duplicate schemas, or a custom parser in order to support both formats in the same API version. Duplicate schemas have proven hard to maintain, both in Kubernetes and internally. I don't think we want to maintain parallel API versions forever, either. The go-yaml parser is around 9k lines of code -- a custom parser is not something we want to own, IMO.
The specific proposal here is to:
Port name is currently optional. It would effectively be required. A convention of "p" would be straightforward for users/tools, however, such as in the case of ports auto-populated from Docker images (e.g., by podex).
This would be a breaking change. We could do it in v1beta3.
The names of top-level objects, in ObjectMeta in v1beta3, would not be changed. Those can be auto-populated in v1beta3 by clients in a straightforward manner.
We've gone back an forth on this in the past in various config discussions. Here is the last time I remember on k8s: #853 (comment)
The crux of this problem is that it isn't clear to the user what "left hand side strings" are "magic keywords" in the config system/API vs. which are user data.
Hoisting the example from that other bug -- compare these two:
ports: - name: www hostPort: 80 containerPort: 80 protocol: tcp
ports: www: hostPort: 80 containerPort: 80 protocol: tcp
While B is obviously shorter than A, I think it is more confusing for the novice user. When copy/pasting examples or reading unfamiliar configs, the novice user won't know what
If we follow your suggestion and a
ports: www: name: www hostPort: 80 containerPort: 80 protocol: tcp
Questions users'll be asking: why is
@jbeda Thanks for pointing me at the previous discussion. I remembered that it had come up before but couldn't remember where.
I see your point regarding distinguishing schema keys and user names. However, we don't have enough user data to be able to really know which they'd prefer.
Also, maps are used in competing solutions, such as Fig:
Admittedly, Fig does this only for their top-level objects -- containers in their case, and I do the same in my configuration generators. Although, they do accept both maps and lists for environment variables. I'll look at how they do this (note: Fig is python, not Go).
Another alternative could be to make all subobject names optional. There are no subobject names in Docker's API, nor in Fig. However, one reason we have names for subobjects is that we've added a parent object around containers and volumes.
With respect to the example above: Mainly subobject names would be needed internally rather than in serialized form. We could potentially omit them from serialized form by specifying "-" so that the fields are ignored during marshaling/unmarshaling.
We should work to reduce the amount of boiler plate, for sure, but I'm not sure that the API is the place to do it. If we want to have a more concise config language/schema -- go for it. There is room in the world to allow for different trade offs.
There are other things we want to avoid here -- significantly, each key should have one and only one form for what it accepts -- this let's us have a strongly typed schema instead of forcing us to interpret the yaml parse tree with custom code.