Proposal: Long term evolution of runtime.Scheme / conversion.Scheme #7111

smarterclayton · 2015-04-21T15:24:54Z

Continued from #2306

A runtime.Scheme combines a few concepts:

A registry of APIVersion -> Kind string -> Go object
A library of conversion routines which includes APIVersion -> Go object as well as Go object -> Go object
An encoding mechanism based around JSON

In order to better support multiple API patterns grouped together (Kubernetes user API, Kubelet API, OpenShift API, future extension API) these assumptions will start to break down.

Known problems:

A 3rd party API may have a "v1". That API may define an object "List", that they want to convert to a Kubernetes "v1" "List". In our current setup, it would be impossible for those two API objects to be compiled into the same runtime.Scheme (because they both want to reserve "v1/List" to different objects). It would also be impossible to convert from the third party API "v1" "List" to a Kubernetes "v1" "List" because they exist in different conversion schemes.

A 3rd party API may want to reuse the Kubernetes "Status" object in their own API schema for their version "v1beta1". It would be impossible for them to use the "v1" Kubernetes status object without having a separate runtime.Scheme, but their client library may want to convert both Kubernetes v1 objects to their internal type.

Finally, we may want to introduce new serialization types for objects - Gob, Protobuf, Msgpack, JSON+snappy, for various use cases (more efficient storage). The coupling of JSON to runtime.Scheme complicates those conversions or makes them less efficient.

Types of problems we want to solve:

It should be possible to take arbitrary serialized data and convert it to a versioned Go struct - call this "marshalling/unmarshalling". The interface for this is a Codec. There may be many codecs.
1. Where possible, we want to reuse the underlying infrastructure for marshaling and unmarshalling that exists in go - "encoding/json" / "encoding/gob" / "protobufs" efficiently.
2. It's likely that there exists one canonical "marshaller" for most objects, but there may be alternate mechanisms. For example, JSON is canonical for Kube, but we may occasionally want to marshal an annotated YAML object that has description comments for each field interleaved.
3. There are places where we want to convert map[string]interface{} (generic JSON object representation) or a map[string][]string (url.Values) into a versioned Go struct, which is dependent on a type of marshaling (i.e. following JSON rules, or Query parameter rules).
It should be possible to convert a versioned Go struct into a different versioned Go struct (typically by going to an intermediate format) on the server. This is "object versioning".
1. "Object versioning" requires the registration of an object Kind into an APIVersion for a specific API group. This was the original intent of runtime.Scheme.
It should be possible to convert arbitrary Go structs to different Go structs. This is "conversion".
1. Object versioning is a specific subset of conversion
2. We should try to make conversion less dependent on scheme?
??

Consumers:

The APIServer needs to take incoming objects with a content-type (marshaling format), determine the Kind (whether it is accepted by the endpoint), verify the provided APIVersion is allowed (or default it), and then unmarshal it to a versioned Go struct. We then want to use "object versioning" to return an internal representation. It also needs to take internal representations, convert them to an external object version, and then marshal that object version to a given content-type.
The Go client needs to read responses from the server (unmarshal), determine the Kind of the response, verify the APIVersion is recognized, and then convert it to an internal representation (object versioning) that the client can use to present information to a user.
The Go command line tool needs to take a representation returned by the server in one version, convert it to another version, and process that other version (talk to API v2, convert to API v1, output to template on CLI).
The Go command line tool needs to read objects on disk in known marshaller formats, be able to transform those objects in specific ways (set namespace for v1beta1 and v1beta3) or perform Swagger specific validation. The Go command line tool must support taking objects that it was not compiled with and be able to transmit them to a compatible API server that is not Kubernetes once we want to support true extensions.
??

Still a WIP, trying to gather thoughts.

The text was updated successfully, but these errors were encountered:

smarterclayton · 2015-04-21T15:34:40Z

@lavalamp @bgrant0607 at minimum.

davidopp · 2015-04-21T20:00:51Z

/sub

wojtek-t · 2015-04-22T06:25:11Z

/sub

lavalamp · 2015-04-22T21:13:13Z

I agree with the problem statement.

One quibble, I think we want the client to by default unmarshall into a versioned type, otherwise it's difficult for non-kubernetes components to write code that stays working for more than a day. Apiserver/scheduler etc still need to work with our unversioned objects. This is already possible today--since we support DecodeInto--but we don't take advantage of it.

smarterclayton · 2015-04-22T21:14:31Z

Agreed - I will clarify.

----- Original Message -----

I agree with the problem statement.

One quibble, I think we want the client to by default unmarshall into a
versioned type, otherwise it's difficult for non-kubernetes components to
write code that stays working for more than a day. Apiserver/scheduler etc
still need to work with our unversioned objects. This is already possible
today--since we support DecodeInto--but we don't take advantage of it.

Reply to this email directly or view it on GitHub:
#7111 (comment)

bgrant0607 · 2015-04-23T03:38:10Z

I agree we need to support a mix of multiple api prefixes and versions.

We should consider moving generic objects like List and Status to their own api prefix and version.

I agree that we should decouple encoding and schema.

Currently our only conversions between versioned structs must transition through the internal representation. I don't want to support O(n^2) conversions, so that seems like a good property to keep.

In general, clients should use a versioned rep and should not round-trip: #4874, #3955. I disagree that get, describe, etc. need to use the internal rep. We should assume the v1 API is stable and just use that until a new stable version is available. Cross-version conversion tools are a special case, not an every-day use case.

smarterclayton · 2015-04-23T03:42:36Z

On Apr 22, 2015, at 11:38 PM, Brian Grant notifications@github.com wrote:

I agree we need to support a mix of multiple api prefixes and versions.

We should consider moving generic objects like List and Status to their own api prefix and version.

I agree that we should decouple encoding and schema.

Currently our only conversions between versioned structs must transition through the internal representation. I don't want to support O(n^2) conversions, so that seems like a good property to keep.

In general, clients should use a versioned rep and should not round-trip: #4874, #3955. I disagree that get, describe, etc. need to use the internal rep. We should assume the v1 API is stable and just use that until a new stable version is available. Cross-version conversion tools are a special case, not an every-day use case.

So to tie that in here - we lack the ability today to detect the object schema version of an object (what fields are available on this particular v1 object). Assuming we wish to expose new properties into v1 over its lifetime, we'll need to take that into account as well, so that newer clients (1.2, 1.3) when talking to servers on 1.0 through v1 are not exposing themselves to flawed assumptions (client can set value, but server ignores it).
—
Reply to this email directly or view it on GitHub.

bgrant0607 · 2015-04-23T04:42:18Z

Server capabilities -- #2953. A version-specific renderer or generator could pull the swagger and check for the presence of the field(s) needed.

smarterclayton · 2015-04-23T13:35:18Z

That seems overly complex for 3rd party clients (i.e., no one will do it for small ruby client, but they still have the problem that their script breaks when it's run against an older server). Is a simple schema version per object type (possibly returned with the request) or per version feasible? This could be worked into conversion (everytime we regenerate the conversions, a number has to be incremented).

----- Original Message -----

Server capabilities -- #2953. A version-specific renderer or generator could
pull the swagger and check for the presence of the field(s) needed.

Reply to this email directly or view it on GitHub:
#7111 (comment)

smarterclayton · 2015-04-27T15:11:50Z

Some other points to argue about:

I think we should distinguish runtime.Object from runtime.VersionedObject
- I think we should add methods for getting Kind from a runtime.Object and methods for getting Kind and Version from runtime.VersionedObject
- We have cases today where a Kind is exposed as multiple names - however, for unversioned internal objects there is the primary name in the scheme (for nodes, "Node" internally) and for versioned objects there is a primary name (for nodes, "Minion" in v1beta1/2 and "Node" in v1beta3), so I do not believe that we need complex logic to represent that - every concrete object has a concrete kind and version.
A runtime.Object is really an object intended for opaque serialization and conversion, not a generic object interface we should be passing around to code outside of apiserver.
- Code outside of serialization and conversion should be using interface{} or a more appropriate specific interface, not runtime.Object.
We know we will have use cases where two concrete internal objects can be passed to the same code. Rather than forcing them to be the same object, we cast them to an interface (via wrappers) and make the code agnostic.
- I.e. type NodeInterface struct { *api.Node }; func (n *NodeInterface) Name() { return n.Node.Name }

smarterclayton · 2015-04-27T16:11:43Z

Also, we should consider adding the API prefix to the API version (or allowing it be optional)

bgrant0607 · 2015-04-28T18:48:06Z

Related: #4855

bgrant0607 · 2015-04-28T18:48:37Z

API prefix: Or make that yet another field? I agree that we need it in some form.

smarterclayton · 2015-04-28T19:22:07Z

I wasn't sure - technically "api/v1beta1", "pods/v1beta1", "kubelet/v1beta1" are all unique versions (nothing says a version can't have a qualifier in it). Or we can do "kubelet-v1beta1" which is easier on the eyes - the slash just seemed somewhat nice because we could tie that to prefix and say that by convention APIs should have a version prefix that matches their path prefix so that you can guarantee you could host multiple API groups on the same server.

----- Original Message -----

API prefix: Or make that yet another field? I agree that we need it in some
form.

Reply to this email directly or view it on GitHub:
#7111 (comment)

bgrant0607 · 2015-04-28T22:34:03Z

I'm fine with slash.

smarterclayton · 2015-04-29T14:57:31Z

Of course, as soon as you say that, I start wondering whether it's the Kind that should be prefixed - "k8s.io/Pod". That way probably lies madness.

----- Original Message -----

I'm fine with slash.

Reply to this email directly or view it on GitHub:
#7111 (comment)

bgrant0607 · 2015-04-29T15:12:24Z

The kind is specific to a particular version at a particular prefix -- version has a collection of kinds and prefix has a collection of versions. Really it's prefix/version/kind that is the identifier of the schema, so it either makes sense for prefix to be a separate field, like version, or as a prefix to version. We could add an even broader collection to prefix, like k8s.io/apiprefix/version/kind, but I wouldn't go there yet.

bgrant0607 · 2015-04-29T15:13:10Z

A little more abstract: domain/apigroup/version/kind

smarterclayton · 2015-04-29T16:59:14Z

That makes sense to me.

----- Original Message -----

A little more abstract: domain/apigroup/version/kind

Reply to this email directly or view it on GitHub:
#7111 (comment)

bgrant0607 · 2015-05-15T00:18:01Z

Related issue: Provider/domain field (e.g., kubernetes vs. openshift) for federated configuration -- https://github.com/projectatomic/atomicapp

smarterclayton · 2015-11-28T21:43:15Z

I'm also going to remove the use of empty string to mean "internal" and replace it with a constant "__internal", and do the same for "__unversioned" (create a special category in scheme which means - this object can be converted to any version as is)

smarterclayton · 2015-11-29T21:26:03Z

Design points to how codecs are retrieved:

Different use points of codecs today and the reqts. of each:

Rest server needs a codec for the endpoint, to handle the runtime.Object returned by the method. It needs to (per request) find one based on content type (in different directions, clients should be able to post json and get proto). It may need to return arbitrary objects (template process). It needs to handle runtime.Unknown cleanly.
It needs to enforce return type (orthogonal to serialization). Rest server needs more info about codec than other places. It needs to a parameter codec.

Kubectl generic handler needs each RestMapping to have a codec that can talk to the server (json for now). It does not need to do conversion, but does need to extract metadata.

Kubectl resource builder needs the catch all decoder that can read anything the client supports (json and proto) and may have additional info about the object such as filename or content type, and the ability to get a codec to write files to output.

Code internally (Kubelet) that reads files from disk needs the same thing resource builder needs (catch all decoder).

Client libraries should hide details of protocol and version negotiation - some clients (templates, lists) may send/receive arbitrary objects the codec is not aware of. Client libraries need a parameter codec which is also hidden.

Misc notes:

Versioned client libraries may need to deal with opaque object types in memory (v1.List) - currently our versioned objects do not allow those types to be worked with. We may need a way to work with runtime.Object in an external version but still guarantee the object gets serialized prior to putting over the wire (which requires a serializer that knows the destination version). Will punt on this for now until we get closer to versioned client. We could simply have an Object in runtime.RawExtension and require the caller to do the heavy lifting of serializing it while they work on it.

Serializers own the responsibility for setting type meta because serialization (how types are represented on the wire) may alter encapsulation of types: ie protobufs are most efficiently wrapped in a type envelope, while json can be efficiently extracted.

Some clients may need to deeply control / wrap serializers - it should be possible for clients to override the native serialization (similar to overriding negotiation).

deads2k · 2015-11-30T13:27:10Z

@lavalamp @deads2k since I'm going to tear up the guts of conversion with this, please let me know if you have other refactors in progress.

I'm currently trying to get the Codec and Scheme type correct with GroupVersion and GroupVersionKind. (See #17216 for a full list)

If you could keep those types in mind as you do your work, I think we'll end up fine.

I'm also going to remove the use of empty string to mean "internal"

Empty string as the version or empty string as got the group and version?

smarterclayton · 2015-11-30T14:55:00Z

Empty string for version. Group would be required for everything except
the original, non-group having API. See #17922 for an example of the new
signatures (uses GVK for most of them)

On Mon, Nov 30, 2015 at 8:27 AM, David Eads notifications@github.com
wrote:

@lavalamp https://github.com/lavalamp @deads2k
https://github.com/deads2k since I'm going to tear up the guts of
conversion with this, please let me know if you have other refactors in
progress.

I'm currently trying to get the Codec and Scheme type correct with
GroupVersion and GroupVersionKind. (See #17216
#17216 for a full list)

If you could keep those types in mind as you do your work, I think we'll
end up fine.

I'm also going to remove the use of empty string to mean "internal"

Empty string as the version or empty string as got the group and version?

—
Reply to this email directly or view it on GitHub
#7111 (comment)
.

lavalamp · 2015-11-30T20:45:44Z

@kubernetes/goog-csi @krousey @caesarxuchao @nikhiljindal

lavalamp · 2015-11-30T20:53:56Z

Also @wojtek-t

Codec needs to additionally be able to go to/from url.Values for query parameter versioning, as discussed elsewhere.

smarterclayton · 2015-11-30T21:46:15Z

I split that into ParameterCodec - I think wire serialization and URL
serialization as sufficiently different to necessitate two use interfaces.
The code that needs Codec and the code that needs ParameterCoder is not 1=1

On Mon, Nov 30, 2015 at 3:54 PM, Daniel Smith notifications@github.com
wrote:

Also @wojtek-t https://github.com/wojtek-t

Codec needs to additionally be able to go to/from url.Values for query
parameter versioning, as discussed elsewhere.

—
Reply to this email directly or view it on GitHub
#7111 (comment)
.

wojtek-t · 2015-12-01T07:31:14Z

I split that into ParameterCodec - I think wire serialization and URL
serialization as sufficiently different to necessitate two use interfaces.
The code that needs Codec and the code that needs ParameterCoder is not 1=1

I agree.

smarterclayton · 2015-12-28T17:48:45Z

After #17922 lands, the next change will be to remove conversion/scheme, and fold that function into runtime/scheme. A scheme will use a Convertor, a DeepCopier, and a set of maps for its type registry to handle anything that deals with Kind, Group, and Version and how Go structs are mapped between them.

lavalamp · 2016-02-10T21:26:05Z

Much of this has been done. @smarterclayton is there anything further you want to do for 1.2 here?

smarterclayton · 2016-02-11T04:34:26Z

Let me write up the new list and close this. I think we did the first cleanup, and later changes will be to enable external API groups, performance, and extracting the core packages.

smarterclayton · 2016-02-12T18:23:56Z

I had a revelation right now. We're trying to leverage native objects too much. We want runtime.Object to be a wrapper, not a native type. We want to do things like:

get a chunk of bytes from the wire
lazily decode it to a version
lazily apply defaults
lazily convert it to an object
get at specific object metadata
lazily convert it to an output version
occasionally, we need to go back and get at the original data

basically, we want runtime.Object to stop being an interface, and start being a smart wrapper runtime.SmartObject. That object would maybe be something like:

type smartObject struct {
  data []byte
  dataKind unversioned.GroupVersionKind
  directDecoded interface{}
  hasBeenDefaulted bool
  activeObject interface{}
  activeKind unversioned.GroupVersionKind
}

type SmartObject interface {
  RawGroupVersion() unversioned.GroupVersionKind
  Into(interface{}) error // attempts the correct decode into
  As(interface{}) (interface{}, error) // does a decode and cast to the specific type
  Raw() []byte
  DefaultedObject() interface{} // returns the object with defaults applied
  Copy() SmartObject // gives us a new handle that won't be changed
  Validate() field.Errors
}

Basically all of our structs would then simply go back to being dumb structs. Codec would be responsible for setting up smart objects. This also helps us achieve our performance goals of avoiding internal types, avoiding conversions, avoiding casts, etc.

smarterclayton · 2016-02-12T18:24:29Z

Tim brought this up a while ago so it's not a unique idea, but I think that's what I would close this issue with - we've reached the limits of what our current approach can do, and we need to take the next step.

deads2k · 2016-02-12T18:29:16Z

I think that would allow us do more cleanly nest and handle arbitrary SmartObjects using something like RawExtension that we have today. To do that, I might want a kind of "default hubbed type" that I could perform type assertions on though. We do that for handling arbitrary nested objects today and it's been pretty clean.

lavalamp · 2016-02-19T01:09:40Z

That's a very interesting idea. I'll think about that some more.

smarterclayton · 2017-06-01T17:20:02Z

I still want to do smart objects. But API server perf is not gating right now, so on the back burner.

fejta-bot · 2017-12-25T21:59:36Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

bgrant0607 · 2018-01-22T21:19:53Z

/remove-lifecycle stale
/lifecycle frozen

MadhavJivrajani · 2021-06-29T16:04:57Z

/remove-kind design
/kind feature

kind/design will soon be removed from k/k in favor of kind/feature. Relevant discussion can be found here: kubernetes/community#5641

smarterclayton added the kind/design Categorizes issue or PR as related to design. label Apr 21, 2015

This was referenced Apr 21, 2015

Our API code is hard to understand #2306

Closed

Proposal: change how api.List is handled internally #7070

Closed

bgrant0607 added area/api Indicates an issue on api area. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/design labels Apr 21, 2015

smarterclayton mentioned this issue Apr 24, 2015

OpenShift master should expose a Kube compatible readiness check openshift/origin#1902

Closed

bgrant0607 mentioned this issue May 4, 2015

Versioned API objects should be in a self-contained package #7584

Closed

bgrant0607 mentioned this issue May 13, 2015

"v2" API (API/client redesign umbrella issue) #8190

Closed

smarterclayton mentioned this issue Nov 29, 2015

Split codec from scheme #17922

Merged

smarterclayton self-assigned this Dec 28, 2015

lavalamp modified the milestones: next-candidate, v1.2-candidate Feb 19, 2016

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 22, 2018

liggitt added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 7, 2019

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/design Categorizes issue or PR as related to design. labels Jun 29, 2021

Proposal: Long term evolution of runtime.Scheme / conversion.Scheme #7111

Proposal: Long term evolution of runtime.Scheme / conversion.Scheme #7111

Comments

smarterclayton commented Apr 21, 2015

Known problems:

Types of problems we want to solve:

Consumers:

smarterclayton commented Apr 21, 2015

davidopp commented Apr 21, 2015

wojtek-t commented Apr 22, 2015

lavalamp commented Apr 22, 2015

smarterclayton commented Apr 22, 2015

bgrant0607 commented Apr 23, 2015

smarterclayton commented Apr 23, 2015

bgrant0607 commented Apr 23, 2015

smarterclayton commented Apr 23, 2015

smarterclayton commented Apr 27, 2015

smarterclayton commented Apr 27, 2015

bgrant0607 commented Apr 28, 2015

bgrant0607 commented Apr 28, 2015

smarterclayton commented Apr 28, 2015

bgrant0607 commented Apr 28, 2015

smarterclayton commented Apr 29, 2015

bgrant0607 commented Apr 29, 2015

bgrant0607 commented Apr 29, 2015

smarterclayton commented Apr 29, 2015

bgrant0607 commented May 15, 2015

smarterclayton commented Nov 28, 2015

smarterclayton commented Nov 29, 2015

deads2k commented Nov 30, 2015

smarterclayton commented Nov 30, 2015

lavalamp commented Nov 30, 2015

lavalamp commented Nov 30, 2015

smarterclayton commented Nov 30, 2015

wojtek-t commented Dec 1, 2015

smarterclayton commented Dec 28, 2015

lavalamp commented Feb 10, 2016

smarterclayton commented Feb 11, 2016 via email

smarterclayton commented Feb 12, 2016

smarterclayton commented Feb 12, 2016

deads2k commented Feb 12, 2016

lavalamp commented Feb 19, 2016

smarterclayton commented Jun 1, 2017

fejta-bot commented Dec 25, 2017

bgrant0607 commented Jan 22, 2018

MadhavJivrajani commented Jun 29, 2021