apiserver: re-architect storage #1451

lavalamp · 2014-09-25T20:41:19Z

Currently, each RESTStorage implementation has its own storage object, which eventually ends up being the same etcd instance for all components.

We should consider instead having apiserver pass a storage destination into each RESTStorage method. This would make it much more clear how to write plugins, both those that you compile into the master binary, and those that are external binaries.

This would also let us eventually do fancy things like sharding (or isolating) keys across different etcd clusters without having to make sweeping changes to all the storage objects.

(See #1355, which sparked this)

smarterclayton · 2014-09-27T14:23:20Z

Part of the context maybe. Partition/shard seems like it varies by the context, vs being distinct.

lavalamp · 2015-01-16T00:07:23Z

I've had some further thoughts about this.

Work items:

Put all "business logic" in one layer. Right now, pods and some other objects have logic both in their RESTStorage object and in their registry object. I would like all that logic to go in the RESTStorage object.
tools.EtcdHelper, generic.Registry, and all of the other registries should be combined. What I mean by that:
- Define a single Storage interface. It should provide CRUD+watch, take keys & api.Objects. generic.Registry is much of the way there, but it could be improved.
- Provide an implementation of the storage interface that uses etcd. tools.EtcdHelper is most of the way there.
- Bonus points: provide a nice fake for tests.
TBD: Pass this Storage interface to RESTStorage objects either at construction time, or in the ctx parameter.

The motivations for this restructuring:

We have an extra layer right now--registries, tools.EtcdHelper, and RESTStorage objects should be two layers, not three.
We want to make it easy for the "business logic" to run in a separate process, potentially discovered at runtime (component plugins). This means that we have to have a well-defined interface by which that object can interact with our storage backend. We do not want component plugins to talk directly to etcd-- this makes schema changes or sharding by key/object type incredibly difficult to coordinate.

Ideally, when this is done, we can easily serve a REST endpoint that implements our storage interface for component plugins.

smarterclayton · 2015-01-25T22:11:15Z

I've started fixing this with #3789 - I'm moving pure API behavior (the set of validation and creation prior to persisting an object into a new package pkg/api/rest, and introducing a common test layer. Only changing create right now, but there's huge holes in our update logic, so that's the next step. Once logic is normalized, we can fold rest into registry or vice versa, and use common update. A few operations that need to be in registry - like atomic merge and batch, can be factored so that the logic to mutate the object is separate from the storage logic.

bgrant0607 · 2015-01-27T20:49:23Z

IMO, no objects should have "business logic" in the apiserver, as per #2726.

/cc @davidopp

bgrant0607 · 2015-01-27T20:50:09Z

Well, ok, we need to do something once we add custom verbs, such as resize, clone, and stop.

smarterclayton · 2015-01-28T01:29:58Z

By business logic I mean validation, and decision about what fields can be written during an update. If we have authz at a field level and fix authz to be able to know about the diff, we don't need the latter as much. Is that the biz logic you're concerned with?

bgrant0607 · 2015-01-28T03:50:49Z

No, I'm primarily concerned with on-the-fly computation of object status in the apiserver, as opposed to in the controller or other responsible component. ResourceLocation doesn't make me happy, either.

Pluggable defaulting and validation makes sense.

smarterclayton · 2015-01-28T04:07:20Z

K, same page

thockin · 2015-01-29T08:04:15Z

To throw a bit of newly-understood information out:

Any API object must not ever include an unexported field, unless said
object also implements the semantic deep equal and conversion (where both
types are the same!). What we have done is created copy and compare
operators that exist outside of the objects they apply to.

If the API decoded into truly dumb structs - never any private values - and
then those got converted to "real classes" for actual use, we could make
that rule more obvious. As it is, I spent the last hour figuring out how
Quantity actually works, and the answer is "by sprinkling deep knowledge of
its structure in a handful of other places". Not so much of an
abstraction...

On Tue, Jan 27, 2015 at 8:07 PM, Clayton Coleman notifications@github.com
wrote:

K, same page

Reply to this email directly or view it on GitHub
#1451 (comment)
.

bgrant0607 · 2015-01-29T08:21:55Z

As we discussed, the important thing IMO is to group the code that implements the objects -- it doesn't have to be in classes. I really, really, really want to keep the "dumb structs" in a form that can be used by clients and third-party components without complex logic, with fully explicit desired and current state for common use cases, so that it doesn't require complex objects or client libraries to interpret, in general. However, for the apiserver, it would be desirable to group custom code by object for stripping out readonly fields, authz, defaulting, conversion, validation, custom verbs (which should just imperatively update the desired state and not directly trigger other actions), fuzzing, test-object generation, etc. We may also want some of that in kubectl, where it's useful to be able to decode and encode multiple API versions, but we don't want to make it impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the relevant object controller.

smarterclayton · 2015-01-29T15:15:27Z

Agree. The further we go here, the more the core objects remain dumb structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation, etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.

Reply to this email directly or view it on GitHub:
#1451 (comment)

thockin · 2015-01-30T01:29:38Z

Yeah, I get that the API being defined as dumb structs is super-clean. But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman notifications@github.com
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation,
etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.

Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
#1451 (comment)
.

smarterclayton · 2015-01-30T01:33:35Z

It's interesting you feel that way. I know we have a couple of hot touch points, but the rest seem not that bad. I.e accessors for lists (we could convert them to maps on conversion) and getting access to object meta and TypeMeta are the two I think of most. What are the ones you rank most highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com wrote:

Yeah, I get that the API being defined as dumb structs is super-clean. But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman notifications@github.com
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really, really
want to keep the "dumb structs" in a form that can be used by clients and
third-party components without complex logic, with fully explicit desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion, validation,
custom verbs (which should just imperatively update the desired state and
not directly trigger other actions), fuzzing, test-object generation,
etc.
We may also want some of that in kubectl, where it's useful to be able to
decode and encode multiple API versions, but we don't want to make it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by the
relevant object controller.

Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
#1451 (comment)
.

—
Reply to this email directly or view it on GitHub.

thockin · 2015-01-30T01:46:12Z

Look at Yu-Ju's PR to move defaulting out of the validation code. A
million tests create the "dumb struct" form of Pods, Services, etc. Now
they all need to know what fields are required to pass validation, whereas
they used to get defaults as part of validation. This goes to why
constructors are a good thing - you can assert things like "the DNS policy
(a string) holds a valid value". I can't assert that with a dumb struct.

If we instead said:

all versioned APIs are defined in terms of dumb structs and trivial types
validation and defaulting is applied to versioned dumb structs
code uses internal "smart structs", which can include non-trivial types
the only way to get "smart" structs is to convert from a versioned
struct or else use a constructor

I haven't thought this through fully, it's just something that has been
annoying me and boucing around my head.

On Thu, Jan 29, 2015 at 5:33 PM, Clayton Coleman notifications@github.com
wrote:

It's interesting you feel that way. I know we have a couple of hot touch
points, but the rest seem not that bad. I.e accessors for lists (we could
convert them to maps on conversion) and getting access to object meta and
TypeMeta are the two I think of most. What are the ones you rank most
highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com
wrote:

Yeah, I get that the API being defined as dumb structs is super-clean.
But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman <
notifications@github.com>
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really,
really
want to keep the "dumb structs" in a form that can be used by
clients and
third-party components without complex logic, with fully explicit
desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for
the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion,
validation,
custom verbs (which should just imperatively update the desired
state and
not directly trigger other actions), fuzzing, test-object
generation,
etc.
We may also want some of that in kubectl, where it's useful to be
able to
decode and encode multiple API versions, but we don't want to make
it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by
the
relevant object controller.

Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
<
https://github.com/GoogleCloudPlatform/kubernetes/issues/1451#issuecomment-72041056>

.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#1451 (comment)
.

smarterclayton · 2015-01-30T02:33:13Z

On Jan 29, 2015, at 8:46 PM, Tim Hockin notifications@github.com wrote:

Look at Yu-Ju's PR to move defaulting out of the validation code. A
million tests create the "dumb struct" form of Pods, Services, etc. Now
they all need to know what fields are required to pass validation, whereas
they used to get defaults as part of validation. This goes to why
constructors are a good thing - you can assert things like "the DNS policy
(a string) holds a valid value". I can't assert that with a dumb struct.

We have a "constructor" that could do that in scheme.New. Or we could expose scheme.Default(obj) and change all the places to use it.

If we instead said:

all versioned APIs are defined in terms of dumb structs and trivial types

validation and defaulting is applied to versioned dumb structs

Doesn't this mean validation has to be duplicated across api versions?

code uses internal "smart structs", which can include non-trivial types

the only way to get "smart" structs is to convert from a versioned
struct or else use a constructor

Doesn't that mean creating a non trivial object in the versioned object stays equally hard?

Agree we do massive refactors in test cases on every trivial change... However, that particular problem wouldn't go away with a smart internal object or a constructor (we exchange go struct init syntax for X lines of setting fields and attributes).

I think the biggest pain is we have no smart refactor tool that works on init structures, so no matter what, any non-trivial object has to be changed in a billion places.

I haven't thought this through fully, it's just something that has been
annoying me and boucing around my head.

On Thu, Jan 29, 2015 at 5:33 PM, Clayton Coleman notifications@github.com
wrote:

It's interesting you feel that way. I know we have a couple of hot touch
points, but the rest seem not that bad. I.e accessors for lists (we could
convert them to maps on conversion) and getting access to object meta and
TypeMeta are the two I think of most. What are the ones you rank most
highly?

On Jan 29, 2015, at 8:29 PM, Tim Hockin notifications@github.com
wrote:

Yeah, I get that the API being defined as dumb structs is super-clean.
But
having the objects that we use all over the codebase be dumb structs is
painful.

On Thu, Jan 29, 2015 at 7:15 AM, Clayton Coleman <
notifications@github.com>
wrote:

Agree. The further we go here, the more the core objects remain dumb
structs.

----- Original Message -----

As we discussed, the important thing IMO is to group the code that
implements
the objects -- it doesn't have to be in classes. I really, really,
really
want to keep the "dumb structs" in a form that can be used by
clients and
third-party components without complex logic, with fully explicit
desired
and current state for common use cases, so that it doesn't require
complex
objects or client libraries to interpret, in general. However, for
the
apiserver, it would be desirable to group custom code by object for
stripping out readonly fields, authz, defaulting, conversion,
validation,
custom verbs (which should just imperatively update the desired
state and
not directly trigger other actions), fuzzing, test-object
generation,
etc.
We may also want some of that in kubectl, where it's useful to be
able to
decode and encode multiple API versions, but we don't want to make
it
impossible to write clients in other languages, either.

The actual CRUD/REST operations should otherwise be generic.

Status and derived objects (e.g., endpoints) should be filled in by
the
relevant object controller.

Reply to this email directly or view it on GitHub:

#1451 (comment)

Reply to this email directly or view it on GitHub
<
https://github.com/GoogleCloudPlatform/kubernetes/issues/1451#issuecomment-72041056>

.

Reply to this email directly or view it on GitHub.

Reply to this email directly or view it on GitHub
#1451 (comment)
.

—
Reply to this email directly or view it on GitHub.

bgrant0607 · 2015-01-30T05:29:45Z

Whatever "constructor" we use, it needs to basically take whole dumb API structs as input. There's no point in duplicating the API objects with some kind of builder or options object. That would be unmaintainable.

It is true that our code is more aspect-oriented than object-oriented currently. As I said before, I think the most important thing is to group the code by API object.

Effectively, our "constructor" is DecodeInto:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/apiserver/resthandler.go#L218

Decoding from versioned structs into the internal rep could leverage this.

@smarterclayton I don't think @thockin necessarily meant that we'd apply validation to the versioned structs, but that we'd validate the internal rep before returning it from the "constructor".

If our tests used the versioned structs, we wouldn't need to change them when changing the internal rep.

bgrant0607 · 2015-02-06T06:53:01Z

We should address #2457 when this is done.

bgrant0607 · 2015-02-23T20:57:48Z

For the record: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/kubernetes-dev/M66vUmBvBu4/V5mwq3v69P0J

bgrant0607 · 2015-02-28T07:43:24Z

Related: #2977

thockin · 2015-03-01T05:29:01Z

I actually did mean that validation would run against versioned objects.
Yes it would mean duplicating validation, but it would be correct
validation with correct error messages.

I don't have time to pursue this at the moment, so I'm just watching
everything else shake out, but it's still very much on my mind.

On Fri, Feb 27, 2015 at 11:43 PM, Brian Grant notifications@github.com
wrote:

Related: #2977
#2977

Reply to this email directly or view it on GitHub
#1451 (comment)
.

davidopp · 2015-03-01T20:12:12Z

I just came across this issue. I am not as familiar with the code as all of you, but from my experience so far, I definitely agree with Tim; I think the code would be simpler/more maintainable (or at least easier to understand) if we had true Go objects corresponding to the API objects. I agree that better grouping the code that manipulates each type would be helpful but I'm not sure it would be as helpful as using objects. I don't think using objects precludes use of "dumb structs" by clients, as the objects/smart structs would just be internal to the API server.

lavalamp · 2015-03-02T17:43:53Z

I think we need to version our client libraries; having it work with the unversioned types makes this hard to get behind. Once the client is versioned, then I think a model where we run the versioned types through a constructor to produce the internal object might be a good idea, but I want to consider it some more.

wojtek-t · 2015-05-20T16:54:50Z

/subscribe

lavalamp · 2016-02-10T21:01:35Z

This issue seems to have drifted considerably. It is true now that we have per-resource storage locations. That's a good step. Actually getting alternative storage backends should be easier now, but still challenging. No more will be done on this for 1.2.

0xmichalis · 2017-03-20T01:10:48Z

@lavalamp I guess this has progressed since last year. Should we also move it to https://github.com/kubernetes/apiserver if there is still work tbd?

smarterclayton · 2017-06-01T16:44:08Z

I think this can be closed and we can open something new to track.

brendandburns added kind/design Categorizes issue or PR as related to design. and removed kind/design Categorizes issue or PR as related to design. labels Sep 29, 2014

bgrant0607 added area/api Indicates an issue on api area. area/apiserver priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed area/api Indicates an issue on api area. labels Dec 4, 2014

smarterclayton mentioned this issue Jan 25, 2015

Generate names automatically when requested #3789

Merged

bgrant0607 mentioned this issue Feb 3, 2015

Our API code is hard to understand #2306

Closed

thockin added the team/cluster label Feb 6, 2015

This was referenced Feb 26, 2015

Versioned clients #4874

Closed

Pluggable storage backends (was Support for Consul K/V storage) #1957

Closed

davidopp added the team/master label Mar 1, 2015

bgrant0607 mentioned this issue May 4, 2015

Versioned API objects should be in a self-contained package #7584

Closed

bgrant0607 added the area/extensibility label May 12, 2015

bgrant0607 mentioned this issue May 20, 2015

Code to automatically generate functions for DeepCopy #8320

Merged

bgrant0607 mentioned this issue May 20, 2015

Move validation to be per-API-version #8550

Closed

This was referenced Jun 19, 2015

Supporting multiple API prefixes #10009

Closed

"v2" API (API/client redesign umbrella issue) #8190

Closed

nikhiljindal added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 27, 2015

erictune removed team/cluster labels Aug 19, 2015

bgrant0607 added this to the v1.2-candidate milestone Sep 12, 2015

lavalamp modified the milestones: next-candidate, v1.2-candidate Feb 10, 2016

bgrant0607 added the triaged label Mar 9, 2017

smarterclayton closed this as completed Jun 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

apiserver: re-architect storage #1451

apiserver: re-architect storage #1451

lavalamp commented Sep 25, 2014

smarterclayton commented Sep 27, 2014

lavalamp commented Jan 16, 2015

smarterclayton commented Jan 25, 2015

bgrant0607 commented Jan 27, 2015

bgrant0607 commented Jan 27, 2015

smarterclayton commented Jan 28, 2015

bgrant0607 commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

thockin commented Jan 29, 2015

bgrant0607 commented Jan 29, 2015

smarterclayton commented Jan 29, 2015

thockin commented Jan 30, 2015

smarterclayton commented Jan 30, 2015

thockin commented Jan 30, 2015

smarterclayton commented Jan 30, 2015

bgrant0607 commented Jan 30, 2015

bgrant0607 commented Feb 6, 2015

bgrant0607 commented Feb 23, 2015

bgrant0607 commented Feb 28, 2015

thockin commented Mar 1, 2015

davidopp commented Mar 1, 2015

lavalamp commented Mar 2, 2015

wojtek-t commented May 20, 2015

lavalamp commented Feb 10, 2016

0xmichalis commented Mar 20, 2017

smarterclayton commented Jun 1, 2017

apiserver: re-architect storage #1451

apiserver: re-architect storage #1451

Comments

lavalamp commented Sep 25, 2014

smarterclayton commented Sep 27, 2014

lavalamp commented Jan 16, 2015

smarterclayton commented Jan 25, 2015

bgrant0607 commented Jan 27, 2015

bgrant0607 commented Jan 27, 2015

smarterclayton commented Jan 28, 2015

bgrant0607 commented Jan 28, 2015

smarterclayton commented Jan 28, 2015

thockin commented Jan 29, 2015

bgrant0607 commented Jan 29, 2015

smarterclayton commented Jan 29, 2015

thockin commented Jan 30, 2015

smarterclayton commented Jan 30, 2015

thockin commented Jan 30, 2015

smarterclayton commented Jan 30, 2015

bgrant0607 commented Jan 30, 2015

bgrant0607 commented Feb 6, 2015

bgrant0607 commented Feb 23, 2015

bgrant0607 commented Feb 28, 2015

thockin commented Mar 1, 2015

davidopp commented Mar 1, 2015

lavalamp commented Mar 2, 2015

wojtek-t commented May 20, 2015

lavalamp commented Feb 10, 2016

0xmichalis commented Mar 20, 2017

smarterclayton commented Jun 1, 2017