
RFC: all controllers are separate web services. #703

Closed
erictune opened this issue Jul 30, 2014 · 33 comments
Labels
area/api Indicates an issue on api area. area/extensibility kind/design Categorizes issue or PR as related to design. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done.

Comments

@erictune (Member)

This is building on #635, which I agree with, but think does not go far enough.

Premise: There are going to be a lot of third-party controllers

We've talked about several types of controllers. We all know about the replicationController. A run-once controller has been suggested in #662. A cron controller is mentioned in #170. I can't find the issue, but I think there was a suggestion of a run-after feature. My experience at my company is that people like to write their own workflow controllers, and no single one makes everyone happy.

By way of analogy, Mesos lists 19 frameworks:
http://mesos.apache.org/documentation/latest/mesos-frameworks/

Kubernetes core developers won't be able to keep up with all of them and should not.

Premise: Controllers should be separate from the core of Kubernetes

In #635 Clayton explains some reasons for keeping parts of the API separate from each other:

"to enforce boundaries, prevent coupling, allow higher security, allow api versioning at different rates, and allow experimentation for competing implementations"

Given that there will be untrusted controllers, I propose that all controllers, including replication controller, should be treated as untrusted, to keep the playing field level.

I'm thinking of the division between kubernetes proper and controllers as based on the following principle:
Kubernetes-proper supports objects whose creation requires direct interaction with machines or network infrastructure (and objects which are needed to support authentication, auditing, and accounting). So, it would have:
* pods and volumes because these interact with machines.
* services, because these interact with the network infrastructure.
* podTemplates, because these are needed to narrow the scope of delegated authority.

Proposal: all controllers run as separate web services from the core of kubernetes.

Given that controllers are not trusted by the core, they can't be part of the same process. So, they might as well be their own web services.

Sketch of how this might work:

The Kubernetes core and each controller run as separate processes, each with its own deployment cycle and its own IP+port. For example:

  • Kubernetes proper runs at a known location, say https://kubernetes.example.com/...
  • The replicationController might run at say https://kube-repl.example.com/....
  • A user written controller might run at https://my-kron.example.com/...

All use of controllers requires delegation. A user who wants to create a cron job that is managed by the aforementioned cron controller would do:

  1. Create a pod template using PUT https://kubernetes.example.com/podTemplate
  2. Create a cron-spec using PUT https://my-kron.example.com/cronSpec
  3. Creation of the cron-spec would trigger an OAuth flow where the user authorizes https://my-kron.example.com to create pods using that podTemplate at https://kubernetes.example.com/....
  4. Periodically, https://my-kron.example.com runs a cron job, using a token it got from the OAuth flow to start and monitor completion of a pod (a sketch follows below).
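
To make step 4 concrete, here is a minimal sketch in Go of what the cron controller's loop could look like: it uses the OAuth-granted token to create a pod from the user's podTemplate on the core API. Every URL, path, field name, and the fixed schedule here is an illustrative assumption, not a defined API.

```go
// Hypothetical sketch of step 4: the cron controller uses the token it
// obtained in the OAuth flow (step 3) to instantiate a pod from the
// user's podTemplate. All endpoints and payload shapes are illustrative.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

const (
	coreAPI = "https://kubernetes.example.com"
	token   = "TOKEN-FROM-OAUTH-FLOW" // scoped to one podTemplate only
	podReq  = `{"kind": "Pod", "podTemplateRef": "my-cron-template"}`
)

func runOnce(client *http.Client) error {
	req, err := http.NewRequest("POST", coreAPI+"/pods", bytes.NewBufferString(podReq))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("pod creation failed: %s", resp.Status)
	}
	// A real controller would now watch the pod until completion.
	return nil
}

func main() {
	client := &http.Client{Timeout: 30 * time.Second}
	for range time.Tick(time.Hour) { // the "cron" part, fixed schedule for brevity
		if err := runOnce(client); err != nil {
			fmt.Println("cron run failed:", err)
		}
	}
}
```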

If a particular installation wants to provide a single endpoint then they can run a reverse proxy that maps each controller to a controller pseudo-resource path. Extending the above example, the proxy might map:
* https://kube-api.example.com/ to https://kubernetes.example.com/...
* GET https://kube-api.example.com/controllers would return static content with a list of supported controllers.
* https://kube-api.example.com/controllers/replication/... to https://kube-repl.example.com/
* https://kube-api.example.com/controllers/cron/... to https://my-kron.example.com/...

The kubernetes project would control allocation of names under the controllers/ path.
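
A minimal sketch of such a proxy, using Go's standard library; the hostnames are the example ones from above, and path-prefix rewriting plus TLS setup are deliberately elided:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// proxyTo forwards requests to one backend service. A production proxy
// would also rewrite the /controllers/<name>/ prefix; that's elided here.
func proxyTo(raw string) *httputil.ReverseProxy {
	target, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/", proxyTo("https://kubernetes.example.com"))
	mux.Handle("/controllers/replication/", proxyTo("https://kube-repl.example.com"))
	mux.Handle("/controllers/cron/", proxyTo("https://my-kron.example.com"))
	// Static list of supported controllers, as proposed above.
	mux.HandleFunc("/controllers", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"controllers": ["replication", "cron"]}`))
	})
	log.Fatal(http.ListenAndServe(":8443", mux)) // TLS setup elided
}
```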

I haven't shown this above, but each web service can have its own version identifier in its URL. The reverse proxy may provide its own version identifier, which identifies the version of the collection of APIs as a whole and need not bear any relation to the versions of the individual web services. It is distinguished by not having the same hostname as any individual service.

The kubernetes core and various replication controllers could share the same APIserver via use of an APIserver library, as suggested in #700.

Certain controllers, such as replicationController would be made a standard part of the kubernetes-in-the-large API by virtue of being started by kube-up.sh and being distributed as part of the kubernetes source. But all controllers would, by principle of least privilege, run separately from the core apiserver.

@lavalamp (Member)

I am on board with this direction. I'd like to start out by making a plugins/ directory. Once #549 is merged, the current replication controller won't have etcd dependencies and can be moved there without much work. The second thing to go there is the scheduler that I keep threatening to write. We can then start moving these things into their own repos.

We do need to decide if we want to allow controllers to store data in the k8s system, or if they are responsible for their own data.

If yes:

  • IMO, we should require that their data has Version and Kind fields, just like our data. This will force people to think about upgradability, and hopefully they'll just reuse our api/ system, which we will hopefully package up nicely. We may need to add a Namespace field to prevent collisions between Kind fields (see the sketch after this list).
  • We'd need to set some sort of quota so plugins don't store unreasonable quantities of data.

If no:

  • Simpler for k8s, but it seems like nearly everyone will want to store some sort of data and probably they won't do it in the same way, which sounds like a maintainability/compatibility problem.
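
If the answer is yes, a rough sketch of the data shape being suggested might look like the following; the field names and the domain-style Namespace value are assumptions for illustration only.

```go
// Sketch of the suggestion above: plugin-stored data carries Version and
// Kind like core objects, plus a Namespace field to keep third-party Kinds
// from colliding. Nothing here is a settled schema.
package plugindata

type TypeMeta struct {
	Kind      string `json:"kind"`
	Version   string `json:"version"`   // API version, so stored data stays upgradable
	Namespace string `json:"namespace"` // e.g. "my-kron.example.com", scopes the Kind
}

// A third-party cron controller's stored object might then look like:
type CronSpec struct {
	TypeMeta
	Schedule       string `json:"schedule"`       // e.g. "0 * * * *"
	PodTemplateRef string `json:"podTemplateRef"` // template the pod is created from
}
```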

@erictune (Member, Author)

@lavalamp Probably the core kubernetes API should not have a "store my data" interface. Probably the RESTful web service library suggested in #700 should include an etcd client which does namespacing, Version, and Kind, in such a way that several binaries which use that library can all share a common etcd.
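
A hypothetical sketch of what such a library's surface could be: a store that prefixes every key with a per-component namespace and refuses objects that lack Kind or Version. This is not an actual interface from #700, just the idea in code.

```go
package kvstore

import "errors"

// Object is anything the store will persist; Kind and Version are
// mandatory so stored data stays upgradable.
type Object interface {
	GetKind() string
	GetVersion() string
}

// Store namespaces every key so that several binaries sharing one etcd
// cannot collide with each other.
type Store struct {
	namespace string // e.g. "controllers/cron"
	// the underlying etcd client is elided
}

func New(namespace string) *Store { return &Store{namespace: namespace} }

func (s *Store) Set(name string, obj Object) error {
	if obj.GetKind() == "" || obj.GetVersion() == "" {
		return errors.New("objects must carry Kind and Version")
	}
	key := "/" + s.namespace + "/" + name
	_ = key // a real implementation would serialize obj and write it at key
	return nil
}
```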

@smarterclayton (Contributor)

Re: store my data - Someone putting together a set of plugins for an environment may decide that they want to share the same data store - it's not the job of the upstream project to prevent that. If interfaces between components (API controllers) are strongly defined, you get the flexibility to make those choices as a deployer because clients must bake in the assumption that those components may be located separately.

Re: api object isolation - is it sufficient for namespace to be the package of the api type?

Re: Reverse proxy - should clarify that it doesn't have to be a full reverse proxy - it could be integration code provided by a particular deployer / setup that merges those endpoints (I had to reread a few times to be sure that's what you intended). Isolation by path vs port/hostname should be possible as well for the smaller deployment types.

@erictune (Member, Author) commented Aug 1, 2014

@smarterclayton. I agree with you on Reverse Proxy with the caveat that I don't understand how the security and delegation is going to work well enough to know if the integration you suggest will reduce security due to some subtle issue (e.g. around CORS or certificates). Certainly, if the person choosing to do that additional integration trusts all the integrated components, then go for it.

@erictune (Member, Author) commented Aug 1, 2014

@smarterclayton re: api object isolation: haven't thought it through. Don't have an opinion one way or the other.

@thockin (Member) commented Aug 23, 2014

I had been operating on the model that these plugins would be co-scheduled processes on a master node, but still distinct plugins. If we can break the need to be on the same master node, and still present a reasonably coherent API (e.g. Kind, versioning, etc.), great. Looser coupling is better, probably.

But what is the turtle at the bottom? If replication controller is a plugin, how do we keep that plugin running? Perhaps we need $small_num "core" controllers that are for bootstrap. E.g. a singleton controller that starts a pod and re-schedules it in case of death. This sort of limits how complicated controllers can be (e.g. a replication controller can't be replicated, I guess).
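
For illustration, such a bootstrap singleton could reduce to a loop like the following, assuming a hypothetical pod endpoint on the core API; real liveness inspection and authentication are elided.

```go
// A sketch of the "turtle at the bottom": a minimal bootstrap loop that
// keeps one pod (here, the replication controller's own pod) alive by
// polling and recreating it. The endpoint path is illustrative.
package main

import (
	"log"
	"net/http"
	"time"
)

const podURL = "https://kubernetes.example.com/pods/repl-controller"

func podAlive(client *http.Client) bool {
	resp, err := client.Get(podURL)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK // a real check would inspect pod state
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	for range time.Tick(10 * time.Second) {
		if podAlive(client) {
			continue
		}
		// Recreate the pod; request body and auth elided for brevity.
		resp, err := client.Post(podURL, "application/json", nil)
		if err != nil {
			log.Println("re-creating pod failed:", err)
			continue
		}
		resp.Body.Close()
	}
}
```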


@smarterclayton (Contributor)

Obviously, we need to implement a TurtleController which can create itself. I feel like the config discussion at least addresses part of the concern in that you should be able to ensure a set of things are defined as existing repeatedly.

As a deployer / distributor, at small scales there's very little benefit and a lot of cost to treating everything as a bunch of independent processes. Having X things vs 1 thing is just a different cost metric. As size of deployment increases those things do need to be separate, but that number is in the tens of minions.

@thockin (Member) commented Aug 23, 2014

My point was that this "turtle controller" needs to be a guaranteed property of the most minimal k8s master instantiation.

I'm having trouble parsing your second paragraph. Are you saying that cooperating processes are a good thing or a bad thing? And are you distinguishing "processes" from "pods"?


@smarterclayton (Contributor)

Cooperating processes are good things at high scale, and irritating things at low scale. As an exaggeration, if I had to run 20 supporting pods (one for each extra resource and controller) for a cluster with two minions, the operational cost for those 20 pods is likely very high. So having one process / pod serve multiple resources at low scales is very desirable.

@erictune (Member, Author)

What motivates an organization that only has 2 minions to use kubernetes? Isn't kubernetes solving problems that only become evident at larger scales?


@KyleAMathews (Contributor)

All companies want to grow?

Also, even with just a few apps, nice to have standardized, high quality patterns for deployment, logging, HA, etc.

@lavalamp (Member)

IMO, we can just close this with "Yes". Our only controller at the moment is a separate binary, and new controllers will follow this pattern.

Also, I'm not so sure about the premise of lots of third-party controllers. I think if we support sharding and single-job controllers, that's like 95% of what people have asked for.

@smarterclayton (Contributor)

I think you'll be surprised. This is good enough for now though - controllers should be isolated from the core via APIs. Endpoints still needs to be a separate controller, but it needs the watch impl in my other pull... (hint hint)

@thockin (Member) commented Aug 26, 2014

@lavalamp "Our only controller at the moment is a separate binary"

It is a separate binary, but very much not a plugin.

The point of this RFC (as I understood it, anyway) was to ask whether we think plugin modules (most controllers, probably) should exist as distinct services with (e.g.) DNS names of their own, or whether they need to register with a core apiserver and let that thing proxy or redirect to them.

@lavalamp (Member)

> It is a separate binary, but very much not a plugin.

What would it have to do differently for you to consider it a plugin?

> The point of this RFC (as I understood it, anyway) was to ask whether we think plugin modules (most controllers, probably) should exist as distinct services with (e.g.) DNS names of their own, or whether they need to register with a core apiserver and let that thing proxy or redirect to them.

I say the latter. (#991)

@smarterclayton (Contributor)

We should clarify terminology on controller - it took me a while to grok that you guys consider a controller to be the API endpoint + the synchronization loop process (or at least... I think you do?). If you do, that's what I thought was a component in the other thread. If you don't, I'm confused why a controller would register with a core apiserver (since it doesn't expose an API).

@erictune (Member, Author)

replicationController is not the best name for an API object. It holds settings for controlling replication, but it isn't the process that runs the control loop. That process should, IMO, get to be called the replicationController.


@lavalamp (Member)

Yeah, I view the "dial home and register yourself" step as only necessary if you're extending the k8s api.

@lavalamp (Member)

Yeah, it's confusing to have the api object and process have the same name.

@smarterclayton (Contributor)

So what should we call a controller (the process) that extends the k8s api (with a new resource)? Is that a subtype of controller? What if you extend the k8s api but don't add a new controller process?

@lavalamp (Member)

Controllers are a subset of plugins.

"Plugin" is a category that covers extensions to the api objects, master behavior (controller, scheduler), or minion behavior (e.g., kubelet GCE PD mounting).

If your plugin adds an api endpoint, I maintain that you should register this with apiserver so it can proxy/redirect requests to you.

If your plugin is a master component that operates on existing types, congrats, you can do that right now. (The current controller manager is this.)

If your plugin adds new fields to an API object, we need to hash that out 'cause you can't do that now. (GCE PD mount settings should work like this.) Likewise if you need kubelet to do something fancy with your API Objects/extensions.
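
No such registration endpoint existed at the time, but the handshake being argued for might look roughly like this; the /register path and the payload shape are invented purely for illustration.

```go
// Hypothetical sketch of a plugin registering an API endpoint with the
// apiserver, which would then proxy or redirect matching requests to it.
package main

import (
	"bytes"
	"log"
	"net/http"
)

func main() {
	// Tell the apiserver which resource this plugin serves and where.
	reg := []byte(`{"resource": "cronSpecs", "endpoint": "https://my-kron.example.com"}`)
	resp, err := http.Post(
		"https://kubernetes.example.com/register", // hypothetical registration path
		"application/json",
		bytes.NewReader(reg),
	)
	if err != nil {
		log.Fatal("registration failed: ", err)
	}
	defer resp.Body.Close()
	// From here on the apiserver can proxy or redirect /cronSpecs to the plugin.
	log.Println("registered:", resp.Status)
}
```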

@thockin (Member) commented Aug 26, 2014

To be a plugin, in my mind, means that there are NO external references into the plugin's "concept space" from the core. E.g. the apiserver does not have a string "replicationControllers" anywhere in its code. The apiserver would work correctly without the plugin.


@lavalamp (Member)

Yeah, so if that's a requirement, then controller-manager should be handling the replicationController resource, and should register itself with apiserver as doing so. Before we can do that we should split out PodTemplate, though.

@thockin (Member) commented Aug 27, 2014

IMO

Plugin is anything that can be swapped in and out from a framework without that framework knowing. That might be modules within the code (see the work on cloud provider plugins last week), or it might be whole binaries or even services. The main point is that core code knows nothing about the plugin's concepts. Go interfaces are ALMOST plugins except there's always somewhere that knows the real type to instantiate it.

If you want to extend the kubernetes API, that is an API plugin. TBD whether API plugins register with a central API server and pretend to be a single API, or not. If "not", it's less of a plugin and more of an app :) Where this really gets interesting is when you have plugins that layer on each other. Or maybe we just won't allow that.

The ControllerManager as it exists today is a weird hybrid thing. It assumes responsibility for a portion of the API space, but that space is pre-defined by the apiserver.

Agree with your statement about adding fields to existing API objects. Need to sort that out.


@thockin (Member) commented Aug 27, 2014

I think "controller" is the live object that implements some control loop.
A controller is hosted in a processes. today "controller_manager" hosts
all replicationControllers. "Manager" is about as generic as it comes in
terms of terminology, but it works well enough.


@smarterclayton (Contributor)

> Where this really gets interesting is when you have plugins that layer on each other. Or maybe we just won't allow that.

The way we're describing these sounds more like an interface than an implementation. A restful API is an interface you talk to - the controller is part of the implementation (although a controller could bridge two interfaces by reading one and writing to another, in which case it's probably an adaptor). A plugin to me has always meant a particular implementation of a downward interface for the specific purpose of varying behavior. So I naturally assume a GCE volume mount type is a plugin, but volume mounts and replication controllers don't seem to me to be the same sorts of abstractions, and so I balk a bit at saying a ReplController "plugs-in" to the system.

Replication controllers and services are composition objects that expose their own interfaces (upwards) and depend on the pod interface (below). You don't need to talk to the pod interface if you have a repl interface - it does it for you. When you guys say "layer" I hear composition as well (since layering is about hiding details below).

I can imagine several levels of objects or applications that could depend on the interfaces described above. It does seem like composition has diminishing returns when you're dealing with restful objects (I'd rather not have 5 layers of objects that propagate context down whenever the top changes), but at the same time the ability to compose is incredibly powerful. Don't like services for talking about serving http from the edge? Wrap them in a higher level concept that ties DNS, edge visibility, and caching constructs around it.

All IMO of course

@erictune (Member, Author)

@lavalamp @smarterclayton
What do you call a thing that:

  • has code living somewhere other than the k8s github repository (e.g. wants different license)
  • interacts with k8s objects
  • reuses k8s identity, project, and authorization.

Is that a plugin or not? It is the sort of thing I was thinking of when I filed this bug, which I think may come to exist, and which we want to make possible to create.

@lavalamp (Member) commented Sep 2, 2014

@erictune I think it must also inject something into the API to be a plugin. Plain old regular apps should be able to do all the things you list.

@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 3, 2014
@bgrant0607 bgrant0607 added the area/api Indicates an issue on api area. label Feb 28, 2015
@smarterclayton (Contributor)

Do we need this issue still? Most things are controller loops, and almost all run in pods. We still need to make it easy to script a custom controller.

@thockin (Member) commented Sep 12, 2015

I took this to mean that we further break down the monolithic controller manager. Did I misinterpret?


@smarterclayton (Contributor)

Is the monolithic one a problem? There's a reductionist quality to that that I don't think is practical. My read was originally controllers meant "distinct components", and it doesn't seem that we need one process for each API resource or one process for each controller. We should be able to split them, but we don't have to.


@thockin (Member) commented Sep 15, 2015

I'm fine to close this


@erictune (Member, Author)

I don't think we need to go this far anymore. Closing.
