Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apiserver: re-architect storage #1451

Closed
lavalamp opened this issue Sep 25, 2014 · 26 comments
Closed

apiserver: re-architect storage #1451

lavalamp opened this issue Sep 25, 2014 · 26 comments
Labels
area/apiserver area/extensibility kind/design Categorizes issue or PR as related to design. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@lavalamp
Copy link
Member

Currently, each RESTStorage implementation has its own storage object, which eventually ends up being the same etcd instance for all components.

We should consider instead having apiserver pass a storage destination into each RESTStorage method. This would make it much more clear how to write plugins, both those that you compile into the master binary, and those that are external binaries.

This would also let us eventually do fancy things like sharding (or isolating) keys across different etcd clusters without having to make sweeping changes to all the storage objects.

(See #1355, which sparked this)

@smarterclayton
Copy link
Contributor

Part of the context maybe. Partition/shard seems like it varies by the context, vs being distinct.

@brendandburns brendandburns added kind/design Categorizes issue or PR as related to design. and removed kind/design Categorizes issue or PR as related to design. labels Sep 29, 2014
@bgrant0607 bgrant0607 added area/api Indicates an issue on api area. area/apiserver priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed area/api Indicates an issue on api area. labels Dec 4, 2014
@lavalamp
Copy link
Member Author

I've had some further thoughts about this.

Work items:

  • Put all "business logic" in one layer. Right now, pods and some other objects have logic both in their RESTStorage object and in their registry object. I would like all that logic to go in the RESTStorage object.
  • tools.EtcdHelper, generic.Registry, and all of the other registries should be combined. What I mean by that:
    • Define a single Storage interface. It should provide CRUD+watch, take keys & api.Objects. generic.Registry is much of the way there, but it could be improved.
    • Provide an implementation of the storage interface that uses etcd. tools.EtcdHelper is most of the way there.
    • Bonus points: provide a nice fake for tests.
  • TBD: Pass this Storage interface to RESTStorage objects either at construction time, or in the ctx parameter.

The motivations for this restructuring:

  • We have an extra layer right now--registries, tools.EtcdHelper, and RESTStorage objects should be two layers, not three.
  • We want to make it easy for the "business logic" to run in a separate process, potentially discovered at runtime (component plugins). This means that we have to have a well-defined interface by which that object can interact with our storage backend. We do not want component plugins to talk directly to etcd-- this makes schema changes or sharding by key/object type incredibly difficult to coordinate.

Ideally, when this is done, we can easily serve a REST endpoint that implements our storage interface for component plugins.

@smarterclayton
Copy link
Contributor

I've started fixing this with #3789 - I'm moving pure API behavior (the set of validation and creation prior to persisting an object into a new package pkg/api/rest, and introducing a common test layer. Only changing create right now, but there's huge holes in our update logic, so that's the next step. Once logic is normalized, we can fold rest into registry or vice versa, and use common update. A few operations that need to be in registry - like atomic merge and batch, can be factored so that the logic to mutate the object is separate from the storage logic.

@bgrant0607
Copy link
Member

IMO, no objects should have "business logic" in the apiserver, as per #2726.

/cc @davidopp

@bgrant0607
Copy link
Member

Well, ok, we need to do something once we add custom verbs, such as resize, clone, and stop.

@smarterclayton
Copy link
Contributor

By business logic I mean validation, and decision about what fields can be written during an update. If we have authz at a field level and fix authz to be able to know about the diff, we don't need the latter as much. Is that the biz logic you're concerned with?

@bgrant0607
Copy link
Member

No, I'm primarily concerned with on-the-fly computation of object status in the apiserver, as opposed to in the controller or other responsible component. ResourceLocation doesn't make me happy, either.

Pluggable defaulting and validation makes sense.

@smarterclayton
Copy link
Contributor

K, same page

@thockin
Copy link
Member

thockin commented Jan 29, 2015

To throw a bit of newly-understood information out:

Any API object must not ever include an unexported field, unless said
object also implements the semantic deep equal and conversion (where both
types are the same!). What we have done is created copy and compare
operators that exist outside of the objects they apply to.

If the API decoded into truly dumb structs - never any private values - and
then those got converted to "real classes" for actual use, we could make
that rule more obvious. As it is, I spent the last hour figuring out how
Quantity actually works, and the answer is "by sprinkling deep knowledge of
its structure in a handful of other places". Not so much of an
abstraction...

On Tue, Jan 27, 2015 at 8:07 PM, Clayton Coleman notifications@github.com
wrote:

K, same page

Reply to this email directly or view it on GitHub
#1451 (comment)
.

@bgrant0607
Copy link
Member

As we discussed, the important thing IMO is to group the code that implements the objects -- it doesn't have to be in classes. I really, really, really want to keep the "dumb structs" in a form that can be used by clients and third-party components without complex logic, with fully explicit desired and current state for common use cases, so that it doesn't require complex objects or client libraries to interpret, in general. However, for the apiserver, it would be desirable to group custom code by object for stripping out readonly fields, authz, defaulting, conversion, validation, custom verbs (which should just imperatively update the desired state and not directly trigger other actions), fuzzing, test-object generation, etc. We may also want some of that in kubectl, where it's useful to be able to decode and encode multiple API versions, but we don't want to make it impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the relevant object controller.

@smarterclayton
Copy link
Contributor

Agree. The further we go here, the more the core objects remain dumb structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation, etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.


Reply to this email directly or view it on GitHub:
#1451 (comment)

@thockin
Copy link
Member

thockin commented Jan 30, 2015

Yeah, I get that the API being defined as dumb structs is super-clean. But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman notifications@github.com
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation,
etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.


Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
#1451 (comment)
.

@smarterclayton
Copy link
Contributor

It's interesting you feel that way. I know we have a couple of hot touch points, but the rest seem not that bad. I.e accessors for lists (we could convert them to maps on conversion) and getting access to object meta and TypeMeta are the two I think of most. What are the ones you rank most highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com wrote:

Yeah, I get that the API being defined as dumb structs is super-clean. But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman notifications@github.com
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation,
etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.


Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
#1451 (comment)
.


Reply to this email directly or view it on GitHub.

@thockin
Copy link
Member

thockin commented Jan 30, 2015

Look at Yu-Ju's PR to move defaulting out of the validation code. A
million tests create the "dumb struct" form of Pods, Services, etc. Now
they all need to know what fields are required to pass validation, whereas
they used to get defaults as part of validation. This goes to why
constructors are a good thing - you can assert things like "the DNS policy
(a string) holds a valid value". I can't assert that with a dumb struct.

If we instead said:

  1. all versioned APIs are defined in terms of dumb structs and trivial types
  2. validation and defaulting is applied to versioned dumb structs
  3. code uses internal "smart structs", which can include non-trivial types
  4. the only way to get "smart" structs is to convert from a versioned
    struct or else use a constructor

I haven't thought this through fully, it's just something that has been
annoying me and boucing around my head.

On Thu, Jan 29, 2015 at 5:33 PM, Clayton Coleman notifications@github.com
wrote:

It's interesting you feel that way. I know we have a couple of hot touch
points, but the rest seem not that bad. I.e accessors for lists (we could
convert them to maps on conversion) and getting access to object meta and
TypeMeta are the two I think of most. What are the ones you rank most
highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com
wrote:

Yeah, I get that the API being defined as dumb structs is super-clean.
But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman <
notifications@github.com>
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really,
really
want to keep the "dumb structs" in a form that can be used by
clients and
third-party components without complex logic, with fully explicit
desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for
the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion,
validation,
custom verbs (which should just imperatively update the desired
state and
not directly trigger other actions), fuzzing, test-object
generation,
etc.
We may also want some of that in kubectl, where it's useful to be
able to
decode and encode multiple API versions, but we don't want to make
it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by
the
relevant object controller.


Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
<
https://github.com/GoogleCloudPlatform/kubernetes/issues/1451#issuecomment-72041056>

.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#1451 (comment)
.

@smarterclayton
Copy link
Contributor

On Jan 29, 2015, at 8:46 PM, Tim Hockin notifications@github.com wrote:

Look at Yu-Ju's PR to move defaulting out of the validation code. A
million tests create the "dumb struct" form of Pods, Services, etc. Now
they all need to know what fields are required to pass validation, whereas
they used to get defaults as part of validation. This goes to why
constructors are a good thing - you can assert things like "the DNS policy
(a string) holds a valid value". I can't assert that with a dumb struct.

We have a "constructor" that could do that in scheme.New. Or we could expose scheme.Default(obj) and change all the places to use it.

If we instead said:

  1. all versioned APIs are defined in terms of dumb structs and trivial types
  2. validation and defaulting is applied to versioned dumb structs

Doesn't this mean validation has to be duplicated across api versions?

  1. code uses internal "smart structs", which can include non-trivial types
  2. the only way to get "smart" structs is to convert from a versioned
    struct or else use a constructor

Doesn't that mean creating a non trivial object in the versioned object stays equally hard?

Agree we do massive refactors in test cases on every trivial change... However, that particular problem wouldn't go away with a smart internal object or a constructor (we exchange go struct init syntax for X lines of setting fields and attributes).

I think the biggest pain is we have no smart refactor tool that works on init structures, so no matter what, any non-trivial object has to be changed in a billion places.

I haven't thought this through fully, it's just something that has been
annoying me and boucing around my head.

On Thu, Jan 29, 2015 at 5:33 PM, Clayton Coleman notifications@github.com
wrote:

It's interesting you feel that way. I know we have a couple of hot touch
points, but the rest seem not that bad. I.e accessors for lists (we could
convert them to maps on conversion) and getting access to object meta and
TypeMeta are the two I think of most. What are the ones you rank most
highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com
wrote:

Yeah, I get that the API being defined as dumb structs is super-clean.
But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman <
notifications@github.com>
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really,
really
want to keep the "dumb structs" in a form that can be used by
clients and
third-party components without complex logic, with fully explicit
desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for
the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion,
validation,
custom verbs (which should just imperatively update the desired
state and
not directly trigger other actions), fuzzing, test-object
generation,
etc.
We may also want some of that in kubectl, where it's useful to be
able to
decode and encode multiple API versions, but we don't want to make
it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by
the
relevant object controller.


Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
<
https://github.com/GoogleCloudPlatform/kubernetes/issues/1451#issuecomment-72041056>

.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#1451 (comment)
.


Reply to this email directly or view it on GitHub.

@bgrant0607
Copy link
Member

Whatever "constructor" we use, it needs to basically take whole dumb API structs as input. There's no point in duplicating the API objects with some kind of builder or options object. That would be unmaintainable.

It is true that our code is more aspect-oriented than object-oriented currently. As I said before, I think the most important thing is to group the code by API object.

Effectively, our "constructor" is DecodeInto:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/apiserver/resthandler.go#L218

Decoding from versioned structs into the internal rep could leverage this.

@smarterclayton I don't think @thockin necessarily meant that we'd apply validation to the versioned structs, but that we'd validate the internal rep before returning it from the "constructor".

If our tests used the versioned structs, we wouldn't need to change them when changing the internal rep.

@bgrant0607
Copy link
Member

We should address #2457 when this is done.

@bgrant0607
Copy link
Member

@bgrant0607
Copy link
Member

Related: #2977

@thockin
Copy link
Member

thockin commented Mar 1, 2015

I actually did mean that validation would run against versioned objects.
Yes it would mean duplicating validation, but it would be correct
validation with correct error messages.

I don't have time to pursue this at the moment, so I'm just watching
everything else shake out, but it's still very much on my mind.

On Fri, Feb 27, 2015 at 11:43 PM, Brian Grant notifications@github.com
wrote:

Related: #2977
#2977

Reply to this email directly or view it on GitHub
#1451 (comment)
.

@davidopp
Copy link
Member

davidopp commented Mar 1, 2015

I just came across this issue. I am not as familiar with the code as all of you, but from my experience so far, I definitely agree with Tim; I think the code would be simpler/more maintainable (or at least easier to understand) if we had true Go objects corresponding to the API objects. I agree that better grouping the code that manipulates each type would be helpful but I'm not sure it would be as helpful as using objects. I don't think using objects precludes use of "dumb structs" by clients, as the objects/smart structs would just be internal to the API server.

@lavalamp
Copy link
Member Author

lavalamp commented Mar 2, 2015

I think we need to version our client libraries; having it work with the unversioned types makes this hard to get behind. Once the client is versioned, then I think a model where we run the versioned types through a constructor to produce the internal object might be a good idea, but I want to consider it some more.

@wojtek-t
Copy link
Member

/subscribe

@nikhiljindal nikhiljindal added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 27, 2015
@bgrant0607 bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015
@lavalamp
Copy link
Member Author

This issue seems to have drifted considerably. It is true now that we have per-resource storage locations. That's a good step. Actually getting alternative storage backends should be easier now, but still challenging. No more will be done on this for 1.2.

@0xmichalis
Copy link
Contributor

@lavalamp I guess this has progressed since last year. Should we also move it to https://github.com/kubernetes/apiserver if there is still work tbd?

@smarterclayton
Copy link
Contributor

I think this can be closed and we can open something new to track.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/apiserver area/extensibility kind/design Categorizes issue or PR as related to design. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

No branches or pull requests

10 participants