Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Long term evolution of runtime.Scheme / conversion.Scheme #7111

Open
smarterclayton opened this issue Apr 21, 2015 · 41 comments
Open
Assignees
Labels
area/api Indicates an issue on api area. area/apiserver kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@smarterclayton
Copy link
Contributor

Continued from #2306

A runtime.Scheme combines a few concepts:

  • A registry of APIVersion -> Kind string -> Go object
  • A library of conversion routines which includes APIVersion -> Go object as well as Go object -> Go object
  • An encoding mechanism based around JSON

In order to better support multiple API patterns grouped together (Kubernetes user API, Kubelet API, OpenShift API, future extension API) these assumptions will start to break down.

Known problems:

A 3rd party API may have a "v1". That API may define an object "List", that they want to convert to a Kubernetes "v1" "List". In our current setup, it would be impossible for those two API objects to be compiled into the same runtime.Scheme (because they both want to reserve "v1/List" to different objects). It would also be impossible to convert from the third party API "v1" "List" to a Kubernetes "v1" "List" because they exist in different conversion schemes.

A 3rd party API may want to reuse the Kubernetes "Status" object in their own API schema for their version "v1beta1". It would be impossible for them to use the "v1" Kubernetes status object without having a separate runtime.Scheme, but their client library may want to convert both Kubernetes v1 objects to their internal type.

Finally, we may want to introduce new serialization types for objects - Gob, Protobuf, Msgpack, JSON+snappy, for various use cases (more efficient storage). The coupling of JSON to runtime.Scheme complicates those conversions or makes them less efficient.

Types of problems we want to solve:

  1. It should be possible to take arbitrary serialized data and convert it to a versioned Go struct - call this "marshalling/unmarshalling". The interface for this is a Codec. There may be many codecs.
    1. Where possible, we want to reuse the underlying infrastructure for marshaling and unmarshalling that exists in go - "encoding/json" / "encoding/gob" / "protobufs" efficiently.
    2. It's likely that there exists one canonical "marshaller" for most objects, but there may be alternate mechanisms. For example, JSON is canonical for Kube, but we may occasionally want to marshal an annotated YAML object that has description comments for each field interleaved.
    3. There are places where we want to convert map[string]interface{} (generic JSON object representation) or a map[string][]string (url.Values) into a versioned Go struct, which is dependent on a type of marshaling (i.e. following JSON rules, or Query parameter rules).
  2. It should be possible to convert a versioned Go struct into a different versioned Go struct (typically by going to an intermediate format) on the server. This is "object versioning".
    1. "Object versioning" requires the registration of an object Kind into an APIVersion for a specific API group. This was the original intent of runtime.Scheme.
  3. It should be possible to convert arbitrary Go structs to different Go structs. This is "conversion".
    1. Object versioning is a specific subset of conversion
    2. We should try to make conversion less dependent on scheme?
  4. ??

Consumers:

  1. The APIServer needs to take incoming objects with a content-type (marshaling format), determine the Kind (whether it is accepted by the endpoint), verify the provided APIVersion is allowed (or default it), and then unmarshal it to a versioned Go struct. We then want to use "object versioning" to return an internal representation. It also needs to take internal representations, convert them to an external object version, and then marshal that object version to a given content-type.
  2. The Go client needs to read responses from the server (unmarshal), determine the Kind of the response, verify the APIVersion is recognized, and then convert it to an internal representation (object versioning) that the client can use to present information to a user.
  3. The Go command line tool needs to take a representation returned by the server in one version, convert it to another version, and process that other version (talk to API v2, convert to API v1, output to template on CLI).
  4. The Go command line tool needs to read objects on disk in known marshaller formats, be able to transform those objects in specific ways (set namespace for v1beta1 and v1beta3) or perform Swagger specific validation. The Go command line tool must support taking objects that it was not compiled with and be able to transmit them to a compatible API server that is not Kubernetes once we want to support true extensions.
  5. ??

Still a WIP, trying to gather thoughts.

@smarterclayton smarterclayton added the kind/design Categorizes issue or PR as related to design. label Apr 21, 2015
@smarterclayton
Copy link
Contributor Author

@lavalamp @bgrant0607 at minimum.

@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. priority/design labels Apr 21, 2015
@davidopp
Copy link
Member

/sub

1 similar comment
@wojtek-t
Copy link
Member

/sub

@lavalamp
Copy link
Member

I agree with the problem statement.

One quibble, I think we want the client to by default unmarshall into a versioned type, otherwise it's difficult for non-kubernetes components to write code that stays working for more than a day. Apiserver/scheduler etc still need to work with our unversioned objects. This is already possible today--since we support DecodeInto--but we don't take advantage of it.

@smarterclayton
Copy link
Contributor Author

Agreed - I will clarify.

----- Original Message -----

I agree with the problem statement.

One quibble, I think we want the client to by default unmarshall into a
versioned type, otherwise it's difficult for non-kubernetes components to
write code that stays working for more than a day. Apiserver/scheduler etc
still need to work with our unversioned objects. This is already possible
today--since we support DecodeInto--but we don't take advantage of it.


Reply to this email directly or view it on GitHub:
#7111 (comment)

@bgrant0607
Copy link
Member

I agree we need to support a mix of multiple api prefixes and versions.

We should consider moving generic objects like List and Status to their own api prefix and version.

I agree that we should decouple encoding and schema.

Currently our only conversions between versioned structs must transition through the internal representation. I don't want to support O(n^2) conversions, so that seems like a good property to keep.

In general, clients should use a versioned rep and should not round-trip: #4874, #3955. I disagree that get, describe, etc. need to use the internal rep. We should assume the v1 API is stable and just use that until a new stable version is available. Cross-version conversion tools are a special case, not an every-day use case.

@smarterclayton
Copy link
Contributor Author

On Apr 22, 2015, at 11:38 PM, Brian Grant notifications@github.com wrote:

I agree we need to support a mix of multiple api prefixes and versions.

We should consider moving generic objects like List and Status to their own api prefix and version.

I agree that we should decouple encoding and schema.

Currently our only conversions between versioned structs must transition through the internal representation. I don't want to support O(n^2) conversions, so that seems like a good property to keep.

In general, clients should use a versioned rep and should not round-trip: #4874, #3955. I disagree that get, describe, etc. need to use the internal rep. We should assume the v1 API is stable and just use that until a new stable version is available. Cross-version conversion tools are a special case, not an every-day use case.

So to tie that in here - we lack the ability today to detect the object schema version of an object (what fields are available on this particular v1 object). Assuming we wish to expose new properties into v1 over its lifetime, we'll need to take that into account as well, so that newer clients (1.2, 1.3) when talking to servers on 1.0 through v1 are not exposing themselves to flawed assumptions (client can set value, but server ignores it).

Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member

Server capabilities -- #2953. A version-specific renderer or generator could pull the swagger and check for the presence of the field(s) needed.

@smarterclayton
Copy link
Contributor Author

That seems overly complex for 3rd party clients (i.e., no one will do it for small ruby client, but they still have the problem that their script breaks when it's run against an older server). Is a simple schema version per object type (possibly returned with the request) or per version feasible? This could be worked into conversion (everytime we regenerate the conversions, a number has to be incremented).

----- Original Message -----

Server capabilities -- #2953. A version-specific renderer or generator could
pull the swagger and check for the presence of the field(s) needed.


Reply to this email directly or view it on GitHub:
#7111 (comment)

@smarterclayton
Copy link
Contributor Author

Some other points to argue about:

  • I think we should distinguish runtime.Object from runtime.VersionedObject
    • I think we should add methods for getting Kind from a runtime.Object and methods for getting Kind and Version from runtime.VersionedObject
    • We have cases today where a Kind is exposed as multiple names - however, for unversioned internal objects there is the primary name in the scheme (for nodes, "Node" internally) and for versioned objects there is a primary name (for nodes, "Minion" in v1beta1/2 and "Node" in v1beta3), so I do not believe that we need complex logic to represent that - every concrete object has a concrete kind and version.
  • A runtime.Object is really an object intended for opaque serialization and conversion, not a generic object interface we should be passing around to code outside of apiserver.
    • Code outside of serialization and conversion should be using interface{} or a more appropriate specific interface, not runtime.Object.
  • We know we will have use cases where two concrete internal objects can be passed to the same code. Rather than forcing them to be the same object, we cast them to an interface (via wrappers) and make the code agnostic.
    • I.e. type NodeInterface struct { *api.Node }; func (n *NodeInterface) Name() { return n.Node.Name }

@smarterclayton
Copy link
Contributor Author

Also, we should consider adding the API prefix to the API version (or allowing it be optional)

@bgrant0607
Copy link
Member

Related: #4855

@bgrant0607
Copy link
Member

API prefix: Or make that yet another field? I agree that we need it in some form.

@smarterclayton
Copy link
Contributor Author

I wasn't sure - technically "api/v1beta1", "pods/v1beta1", "kubelet/v1beta1" are all unique versions (nothing says a version can't have a qualifier in it). Or we can do "kubelet-v1beta1" which is easier on the eyes - the slash just seemed somewhat nice because we could tie that to prefix and say that by convention APIs should have a version prefix that matches their path prefix so that you can guarantee you could host multiple API groups on the same server.

----- Original Message -----

API prefix: Or make that yet another field? I agree that we need it in some
form.


Reply to this email directly or view it on GitHub:
#7111 (comment)

@bgrant0607
Copy link
Member

I'm fine with slash.

@smarterclayton
Copy link
Contributor Author

Of course, as soon as you say that, I start wondering whether it's the Kind that should be prefixed - "k8s.io/Pod". That way probably lies madness.

----- Original Message -----

I'm fine with slash.


Reply to this email directly or view it on GitHub:
#7111 (comment)

@bgrant0607
Copy link
Member

The kind is specific to a particular version at a particular prefix -- version has a collection of kinds and prefix has a collection of versions. Really it's prefix/version/kind that is the identifier of the schema, so it either makes sense for prefix to be a separate field, like version, or as a prefix to version. We could add an even broader collection to prefix, like k8s.io/apiprefix/version/kind, but I wouldn't go there yet.

@bgrant0607
Copy link
Member

A little more abstract: domain/apigroup/version/kind

@smarterclayton
Copy link
Contributor Author

That makes sense to me.

----- Original Message -----

A little more abstract: domain/apigroup/version/kind


Reply to this email directly or view it on GitHub:
#7111 (comment)

@bgrant0607
Copy link
Member

Related issue: Provider/domain field (e.g., kubernetes vs. openshift) for federated configuration -- https://github.com/projectatomic/atomicapp

@smarterclayton
Copy link
Contributor Author

I'm also going to remove the use of empty string to mean "internal" and replace it with a constant "__internal", and do the same for "__unversioned" (create a special category in scheme which means - this object can be converted to any version as is)

@smarterclayton
Copy link
Contributor Author

Design points to how codecs are retrieved:

Different use points of codecs today and the reqts. of each:

Rest server needs a codec for the endpoint, to handle the runtime.Object returned by the method. It needs to (per request) find one based on content type (in different directions, clients should be able to post json and get proto). It may need to return arbitrary objects (template process). It needs to handle runtime.Unknown cleanly.
It needs to enforce return type (orthogonal to serialization). Rest server needs more info about codec than other places. It needs to a parameter codec.

Kubectl generic handler needs each RestMapping to have a codec that can talk to the server (json for now). It does not need to do conversion, but does need to extract metadata.

Kubectl resource builder needs the catch all decoder that can read anything the client supports (json and proto) and may have additional info about the object such as filename or content type, and the ability to get a codec to write files to output.

Code internally (Kubelet) that reads files from disk needs the same thing resource builder needs (catch all decoder).

Client libraries should hide details of protocol and version negotiation - some clients (templates, lists) may send/receive arbitrary objects the codec is not aware of. Client libraries need a parameter codec which is also hidden.

Misc notes:

Versioned client libraries may need to deal with opaque object types in memory (v1.List) - currently our versioned objects do not allow those types to be worked with. We may need a way to work with runtime.Object in an external version but still guarantee the object gets serialized prior to putting over the wire (which requires a serializer that knows the destination version). Will punt on this for now until we get closer to versioned client. We could simply have an Object in runtime.RawExtension and require the caller to do the heavy lifting of serializing it while they work on it.

Serializers own the responsibility for setting type meta because serialization (how types are represented on the wire) may alter encapsulation of types: ie protobufs are most efficiently wrapped in a type envelope, while json can be efficiently extracted.

Some clients may need to deeply control / wrap serializers - it should be possible for clients to override the native serialization (similar to overriding negotiation).

@deads2k
Copy link
Contributor

deads2k commented Nov 30, 2015

@lavalamp @deads2k since I'm going to tear up the guts of conversion with this, please let me know if you have other refactors in progress.

I'm currently trying to get the Codec and Scheme type correct with GroupVersion and GroupVersionKind. (See #17216 for a full list)

If you could keep those types in mind as you do your work, I think we'll end up fine.

I'm also going to remove the use of empty string to mean "internal"

Empty string as the version or empty string as got the group and version?

@smarterclayton
Copy link
Contributor Author

Empty string for version. Group would be required for everything except
the original, non-group having API. See #17922 for an example of the new
signatures (uses GVK for most of them)

On Mon, Nov 30, 2015 at 8:27 AM, David Eads notifications@github.com
wrote:

@lavalamp https://github.com/lavalamp @deads2k
https://github.com/deads2k since I'm going to tear up the guts of
conversion with this, please let me know if you have other refactors in
progress.

I'm currently trying to get the Codec and Scheme type correct with
GroupVersion and GroupVersionKind. (See #17216
#17216 for a full list)

If you could keep those types in mind as you do your work, I think we'll
end up fine.

I'm also going to remove the use of empty string to mean "internal"

Empty string as the version or empty string as got the group and version?


Reply to this email directly or view it on GitHub
#7111 (comment)
.

@lavalamp
Copy link
Member

@kubernetes/goog-csi @krousey @caesarxuchao @nikhiljindal

@lavalamp
Copy link
Member

Also @wojtek-t

Codec needs to additionally be able to go to/from url.Values for query parameter versioning, as discussed elsewhere.

@smarterclayton
Copy link
Contributor Author

I split that into ParameterCodec - I think wire serialization and URL
serialization as sufficiently different to necessitate two use interfaces.
The code that needs Codec and the code that needs ParameterCoder is not 1=1

On Mon, Nov 30, 2015 at 3:54 PM, Daniel Smith notifications@github.com
wrote:

Also @wojtek-t https://github.com/wojtek-t

Codec needs to additionally be able to go to/from url.Values for query
parameter versioning, as discussed elsewhere.


Reply to this email directly or view it on GitHub
#7111 (comment)
.

@wojtek-t
Copy link
Member

wojtek-t commented Dec 1, 2015

I split that into ParameterCodec - I think wire serialization and URL
serialization as sufficiently different to necessitate two use interfaces.
The code that needs Codec and the code that needs ParameterCoder is not 1=1

I agree.

@smarterclayton
Copy link
Contributor Author

After #17922 lands, the next change will be to remove conversion/scheme, and fold that function into runtime/scheme. A scheme will use a Convertor, a DeepCopier, and a set of maps for its type registry to handle anything that deals with Kind, Group, and Version and how Go structs are mapped between them.

@smarterclayton smarterclayton self-assigned this Dec 28, 2015
@lavalamp
Copy link
Member

Much of this has been done. @smarterclayton is there anything further you want to do for 1.2 here?

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Feb 11, 2016 via email

@smarterclayton
Copy link
Contributor Author

I had a revelation right now. We're trying to leverage native objects too much. We want runtime.Object to be a wrapper, not a native type. We want to do things like:

  • get a chunk of bytes from the wire
  • lazily decode it to a version
  • lazily apply defaults
  • lazily convert it to an object
  • get at specific object metadata
  • lazily convert it to an output version
  • occasionally, we need to go back and get at the original data

basically, we want runtime.Object to stop being an interface, and start being a smart wrapper runtime.SmartObject. That object would maybe be something like:

type smartObject struct {
  data []byte
  dataKind unversioned.GroupVersionKind
  directDecoded interface{}
  hasBeenDefaulted bool
  activeObject interface{}
  activeKind unversioned.GroupVersionKind
}

type SmartObject interface {
  RawGroupVersion() unversioned.GroupVersionKind
  Into(interface{}) error // attempts the correct decode into
  As(interface{}) (interface{}, error) // does a decode and cast to the specific type
  Raw() []byte
  DefaultedObject() interface{} // returns the object with defaults applied
  Copy() SmartObject // gives us a new handle that won't be changed
  Validate() field.Errors
}

Basically all of our structs would then simply go back to being dumb structs. Codec would be responsible for setting up smart objects. This also helps us achieve our performance goals of avoiding internal types, avoiding conversions, avoiding casts, etc.

@smarterclayton
Copy link
Contributor Author

Tim brought this up a while ago so it's not a unique idea, but I think that's what I would close this issue with - we've reached the limits of what our current approach can do, and we need to take the next step.

@deads2k
Copy link
Contributor

deads2k commented Feb 12, 2016

I think that would allow us do more cleanly nest and handle arbitrary SmartObjects using something like RawExtension that we have today. To do that, I might want a kind of "default hubbed type" that I could perform type assertions on though. We do that for handling arbitrary nested objects today and it's been pretty clean.

@lavalamp
Copy link
Member

That's a very interesting idea. I'll think about that some more.

@smarterclayton
Copy link
Contributor Author

I still want to do smart objects. But API server perf is not gating right now, so on the back burner.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2017
@bgrant0607
Copy link
Member

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 22, 2018
@liggitt liggitt added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Oct 7, 2019
@MadhavJivrajani
Copy link
Contributor

/remove-kind design
/kind feature

kind/design will soon be removed from k/k in favor of kind/feature. Relevant discussion can be found here: kubernetes/community#5641

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed kind/design Categorizes issue or PR as related to design. labels Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Indicates an issue on api area. area/apiserver kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests