Add design-proposal for discussion. #3
|
After sending out the PR, I feel like it has too much wording? |
| 3. The third approach is essentially a combination of the first two. | ||
| It has the benefits of both worlds: rktlet lives outside of Kubelet, so it could lead to higher iteration speed, smaller Kubelet size, and better modularity. | ||
| On the other side, the 'rkt API service' is compiled with rkt and communicates with the rktlet through gRPC, so we can still use a lot of the existing rkt facilities. | ||
| The potential con is that instead of one gRPC service, we now introduce two, which may or may not cause some overhead. |
This also increases blast radius. I'm 👎 on 3
It'd be nice to know how much overhead there is, just for reference. I think @feiskyer did some latency measurement before?
Yes, there is a measurement of the logging API over a JSON-based HTTP API, but gRPC should have better performance.
|
As a general sidenote, this CRI and the client/server proposals seem to go in the exact opposite direction of the goals we initially set: one (or more) new long-running high-privileged services are being introduced here, becoming a SPOF for the whole node. I'm sure this is not the first time this concern is being raised, but here it becomes quite apparent. |
@lucab A non-privileged service + setuid + exec'ing the rkt binary to solve this? And for listing/introspection tasks, we can use a library, or something like today's rkt api service? |
|
@lucab this should be thought of as an extension of kubelet and potentially could be vendored into kubelet to reduce the SPOF concern; that being said, there's also benefit in decoupling and defining clear API boundaries, which is a trend seen among other Kubernetes components. The kubelet will be a highly privileged SPOF no matter what; that's baked into its design. This is an extension of that, and we can potentially work to drop privileges/re-exec in order to reduce the privileges here, but when we frame this as more a component of kubelet than a component of rkt, it is less clearly against other goals explicitly and more a continuation of the state of things. Additionally, the SPOF should be limited in that the lifecycle of

On to my opinion on what the notable pros / cons of each given approach are. There are two axes to consider: 1) whether it's a separate binary or vendored, and 2) how it interacts with rkt (library vs exec).

Separate process (vs vendored)
Pros
Cons

library-like vendoring (vs process exec)
Pros
Cons

Conclusion
It's possible that I missed details above, and it's also up for interpretation which axes matter. I have a preference for integrating as a service in no small part because the node team, iiuc, encourages that. The improved boundary / separation of responsibilities is nice, though we really get that either way. Library vs exec is a hard one, but I feel like being able to avoid version-skew issues makes a library, or distributing a specific version of the rkt binary along with the rktlet, quite attractive. I worry that some of the values of rkt's design aren't aligned with being used as a library by a long-lived daemon which exclusively manages an owned data-directory, but I can also individually see attractive qualities of each of those choices. The extreme opposite, of conforming to CRI via this code, but re-vendoring into kubelet and then exec-ing rkt for everything, is still afaik a viable option to discuss, @lucab, and seems in line with the current state of things (other than introducing the app-level concept and a related layer of abstraction). |
|
Thanks @euank for the write-up. Since most are leaning towards having a separate process (variant 1), replacing the current rkt api-service, I'd like to throw in another idea for variant 1. Instead of deleting/pruning the rkt api-service from the rkt code base, and migrating it here as
We would also be relieved of the "burden" of providing a rkt-library, as things would continue to function mechanically as with the
Existing distribution models would continue working: CoreOS simply continues shipping rkt, and users having rkt installed on their machines can start the "new" |
|
Thanks for the analysis @euank, I agree that those are the two main decision points. It took me a bit to digest all this, and I also went digging around existing k8s proposals to get a better understanding. As a casual/naive observer, these points raised my original concerns:
But I see they are coming from the server/client proposal and are intended to be designed exactly like this. If we can play outside of that proposal, then my wishlist would be:
|
|
Discussed OOB with @euank. It sounds like setuid bits for rkt might cause some security concerns we are not certain about. Another idea is creating a unix socket that is only accessible by some group (say, a kubernetes group), and having kubelet run with that group ID. Other than that, are there any ideas for letting kubelet contact a separate process/service that has root privileges (in order to fetch images, launch containers), while keeping kubelet non-root? |
|
@euank @yifan-gu maybe we should consider distributing multiple binaries with k8s releases. Since including an additional binary is no different than supporting a runtime and creating a repo for it under the organization |
|
I think the security concerns, while important to consider, aren't the primary focus of this issue. Let's just proceed now with the assumption that any communication can be a root-only unix socket, and if we need to break it down further we can.

Based on @lucab's comments and some further OOB discussion (and an attempt to raise this discussion at sig-node), it seems likely to me that the most productive course of action would be to implement it as a client/server, two-process approach. This does kick distribution issues down the road and I have no doubt there will be some problems to deal with there, but it's definitely the quickest way to start getting something functional out there (especially given the other runtime integrations working in a similar way). I'll assert that converting from a standalone service to a vendored library will also be possible if we implement with that in mind, while the reverse would be quite difficult, so this also isn't a bad path for ending up in either state.

Based on the above, I think that the right course of action is to implement it as a client/server runtime for the moment.

The exec vs library discussion is a more difficult one to pick a horse in. One of the valuable things to consider is what the rkt project thinks is the right approach here because, well, they're going to end up determining how well supported it is (or not).

I'll lean towards an initial implementation of client/server + exec, with the knowledge that coming back and merging in rkt code to end up with a
I'll also propose that our |
This is reasonable and not far from what was already floating around (a root-like group ownership). It keeps things simple, and can be evolved later.
Ack. Do you prefer to start from the current api-service codebase or build from scratch?
Let's start this way, as I think it will encounter fewer blockers on the rkt side. In parallel we'll try to polish the interface/modules exported by rkt a bit and see if it is feasible to have one binary less to distribute. |
|
@lucab I think it's better to start from the work @tmrts did, and disregard the api-service. If we use exec, the api-service has no value for us (since it was an initial version of the 'use rkt as a library' thing really and I'd argue it's less supported than the rkt cli equivalent operations), but it also doesn't save too much in the |
|
closing in favor of #4 |
|
This is a long thread, but let me just say this. This is a decision that can be revisited if need be. I lean towards building it in, but developing out of core-tree. I think this gives the best testing and least version-skew-hell. But that can always be revisited and the rktlet (OMG better name needed) moved out. |
cc @tmrts @euank @alban @iaguis @lucab @s-urbaniak
Try to address #2