New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use shared informers in gc controller if possible #45427
Conversation
@k8s-bot bazel test this |
// add the event to the dependencyGraphBuilder's graphChanges. | ||
AddFunc: func(obj interface{}) { | ||
setObjectTypeMeta(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this was originally needed. Can runtime objects exist w/o a Kind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you don't have this, kind and apiVersion are empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we needed it because TypeMeta is reset when unmarshalled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kubernetes/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/conversion.go
Lines 157 to 163 in db0b0bd
// +k8s:conversion-fn=drop | |
func Convert_v1_TypeMeta_To_v1_TypeMeta(in, out *TypeMeta, s conversion.Scope) error { | |
// These values are explicitly not copied | |
//out.APIVersion = in.APIVersion | |
//out.Kind = in.Kind | |
return nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we doing this? Does it mean that any controller that cares about TypeMeta needs to reset the fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kargakis I don't remember the history. @smarterclayton or @deads2k or @liggitt or @lavalamp probably do?
return nil, fmt.Errorf("expected runtime.Object, got %#v", obj) | ||
} | ||
if clone { | ||
copy, err := api.Scheme.DeepCopy(runtimeObject) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this DeepCopy affect the performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's presumably going to do memory allocations and take some cpu time that it wouldn't otherwise be doing. I haven't measured it yet. I wanted to get this up for review first.
I just looked a bit more, and it only uses the object so it can get access to the meta.Accessor
and meta.TypeAccessor
. Maybe we could be more efficient and not use the full object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could not clone and instead include the GVK as part of the event object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible change that removes the need to clone: https://gist.github.com/ncdc/24b2716d448cd820eac897c4e5ba5521
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the changes in my gist pass unit & integration tests locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed this as the 2nd commit in this PR. PTAL
@k8s-bot bazel test this |
1 similar comment
@k8s-bot bazel test this |
rbac.NewRule("list", "watch").Groups(certificatesGroup).Resources("certificatesigningrequests").RuleOrDie(), | ||
rbac.NewRule("list", "watch").Groups(storageGroup).Resources("storageclasses").RuleOrDie(), | ||
// Needed for all shared informers | ||
rbac.NewRule("list", "watch").Groups("*").Resources("*").RuleOrDie(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of scary. It means the core controllers can see anything in the entire system, whereas before it had to be enumerated. Not terribly different from the loopback controller, but means we don't have a resource that you can't add to garbage collector (like can you use this to fish for token names by doing timing channel attacks on the gc controller?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Loopback client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated doing this vs enumerating. The problem with enumerating is that as new types are added, people likely will forget to add them to the policy, and then you'll see RBAC denials in the logs that may or may not be meaningful (depending on whether or not the new types need to participate in GC).
Also the current GC controller rule is exactly this, but it's scoped to the GC controller and not the controller-manager's client. So this does expand what the controller-manager client can list/watch, but it doesn't broaden the GC controller's powers, as they're already this broad.
We don't need to rush this PR in, so let's discuss how we want to proceed and I'll make the changes once we reach a consensus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton have you had any more time to think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Game theory:
- We add a new resource which is super powerful and the garbage collection controller shouldn't have.
- We close the hole where garbage collector seeing secrets means it's root on the cluster anyway
- Would everyone have to migrate from
*
to a maintained list because 1 now shouldn't be given to GC?
Is there anything more powerful than the GC controller already is (can delete everything in the cluster)? Should the GC controller not control PodSecurityPolicy by default? Should the GC controller control RBAC by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton I would expect 1. the encryption in the storage to be configurable so I can encrypt more than secrets and 2. I could see the etcd write key being stored in its own TPR that I would probably not want the GC to have access to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would everyone have to migrate from * to a maintained list because 1 now shouldn't be given to GC?
So we would expect anyone contributing an aggregated API server or a customresource will create another role and bind the garbage collector to it? It could be done, but we should set the expectation now so we can start working through examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add a new resource which is super powerful and the garbage collection controller shouldn't have.
Add it where - to the core? As a TPR? Via an extension apiserver? Or maybe it doesn't matter and the answer is just "yes" 😄.
To recap where we are today, in the master branch, without my PR:
- The GC controller client is allowed to get, list, watch, patch, update, and delete / (
rbac.NewRule("get", "list", "watch", "patch", "update", "delete").Groups("*").Resources("*").RuleOrDie(), - The GC controller only sets up monitoring resources when it is constructed, and it doesn't refresh (e.g. to pick up new TPRs) unless I'm missing seeing it (
if err := gb.monitorsForResources(deletableResources); err != nil {
My PR grants the controller-manager the ability to list/watch /, so all current and future shared informers can function without having to remember to modify the policy.
Maybe we need to consider looking into a way to restrict which portions of the code (controllers) are allowed to use specific shared informers? Otherwise I don't have a good idea for how to use shared informers in the GC controller without granting list/watch of / to the controller-manager client.
@k8s-bot bazel test this |
}, | ||
} | ||
|
||
shared, err := gb.sharedInformers.ForResource(resource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where you're restarting the shared informers. This could suddenly start watching a previously unwatched resource, so you'll need to be sure that you start the informers again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the informer needs to restart when gc restarts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current flow with this PR looks like this:
- Create GC controller
- Create "monitors" aka controllers and set up event handlers
- Run GC controller
- Call
Run()
on all the monitors. If the monitor is a dummy controller from a shared informer, this will be a no-op. Otherwise, it will start the lister/watcher/reflector from the non-shared controller.
- Call
- Instantiate and run various other controllers
- Start the shared informers
Given that the GC controller doesn't currently reread discovery and create new monitors for newly seen resources (or does it???), can we punt on this for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the GC controller doesn't currently reread discovery and create new monitors for newly seen resources (or does it???), can we punt on this for now?
It will happens in 1.7. I guess it doesn't have to be this pull, but definitely place a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all controllers (gc, replicaset controller, etc.) always restart at the same time? I think it's true, so i think the current PR works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caesarxuchao what are you referring to re controllers restarting? AFAIK all controllers start when the kube-controller-manager starts, and they all stop when that process terminates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ignore the "restarting" part. What i was concerned with is the following situation: the shared informer had started by other controllers, then GC starts, and it misses the earlier events. This case seems to be impossible since GC calls Run() on all monitors when it starts.
But will other controllers call Run() on a shared informer? Will that cause GC see duplicate events (though GC should operate correctly if there are duplicate events)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 3 similar functions that can start these things:
- shared informer factory
Start()
- this starts any initialized but unstarted shared informers in the factory. It's a no-op for ones that are already started. - shared informer
Run()
- this starts running the given shared informer, and is what (1) calls. This is never invoked anywhere in the GC controller code. - controller
Run()
- this runs a controller, and is what the GC controller calls when it's starting monitors. For shared informers, they return a dummy controller whose Run() implementation is a no-op, so we're safe here.
There is a potential danger if multiple callers invoke a shared informer's Run() function, and we should probably guard against that happening, but it's not in the code right now.
Does this answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, @ncdc .
this runs a controller, and is what the GC controller calls when it's starting monitors. For shared informers, they return a dummy controller whose Run() implementation is a no-op, so we're safe here.
Does this mean the gc will miss the ADD events for the objects that are already in the sharedInformer before gc invokes the dummyController.Run()
?
// TODO: consider store in one storage. | ||
glog.V(5).Infof("create storage for resource %s", resource) | ||
client, err := gb.metaOnlyClientPool.ClientForGroupVersionKind(kind) | ||
if err != nil { | ||
return nil, err | ||
} | ||
gb.registeredRateLimiterForControllers.registerIfNotPresent(resource.GroupVersion(), client, "garbage_collector_monitoring") | ||
setObjectTypeMeta := func(obj interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this mutation become unnecessary in the non-shared informer case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I pass down the gvk, which has all the info we need, in the event
// Check if the node already exsits | ||
existingNode, found := gb.uidToNode.Read(accessor.GetUID()) | ||
switch { | ||
case (event.eventType == addEvent || event.eventType == updateEvent) && !found: | ||
newNode := &node{ | ||
identity: objectReference{ | ||
OwnerReference: metav1.OwnerReference{ | ||
APIVersion: typeAccessor.GetAPIVersion(), | ||
Kind: typeAccessor.GetKind(), | ||
APIVersion: event.gvk.GroupVersion().String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build up the string, don't rely on the stringification for display purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC typeAccessor.GetAPIVersion()
is doing exactly this, but I can build it up instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted with @deads2k on irc. typeAccessor.GetAPIVersion() just does this under the covers:
obj.GetObjectKind().GroupVersionKind().GroupVersion().String()
which is a multi-line function that has some special casing for "v1" vs everything else. David said he won't block the PR on this (given that it's not a change in functionality).
All tests are green. Ready for final review, I think. We also still need to resolve @smarterclayton's question about granting the controller manager client permission to list/watch everything. I'll squash once we've resolved the outstanding policy question. |
Restart when? Controllers aren't restarted, are they?
…On Tue, May 16, 2017 at 8:14 PM Chao Xu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/controller/garbagecollector/graph_builder.go
<#45427 (comment)>
:
> + },
+ DeleteFunc: func(obj interface{}) {
+ // delta fifo may wrap the object in a cache.DeletedFinalStateUnknown, unwrap it
+ if deletedFinalStateUnknown, ok := obj.(cache.DeletedFinalStateUnknown); ok {
+ obj = deletedFinalStateUnknown.Obj
+ }
+ event := &event{
+ eventType: deleteEvent,
+ obj: obj,
+ gvk: kind,
+ }
+ gb.graphChanges.Add(event)
+ },
+ }
+
+ shared, err := gb.sharedInformers.ForResource(resource)
Do all controllers (gc, replicaset controller, etc.) always restart at the
same time? I think it's true, so i think the current PR works.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#45427 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAABYov72nKW-NmwlRzlimz7wpDIJJw_ks5r6jvpgaJpZM4NSLxa>
.
|
No, the dummy controller is just to satisfy the interface.
…On Thu, May 18, 2017 at 6:11 PM Chao Xu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/controller/garbagecollector/graph_builder.go
<#45427 (comment)>
:
> + },
+ DeleteFunc: func(obj interface{}) {
+ // delta fifo may wrap the object in a cache.DeletedFinalStateUnknown, unwrap it
+ if deletedFinalStateUnknown, ok := obj.(cache.DeletedFinalStateUnknown); ok {
+ obj = deletedFinalStateUnknown.Obj
+ }
+ event := &event{
+ eventType: deleteEvent,
+ obj: obj,
+ gvk: kind,
+ }
+ gb.graphChanges.Add(event)
+ },
+ }
+
+ shared, err := gb.sharedInformers.ForResource(resource)
Thanks for the explanation, @ncdc <https://github.com/ncdc> .
this runs a controller, and is what the GC controller calls when it's
starting monitors. For shared informers, they return a dummy controller
whose Run() implementation is a no-op, so we're safe here.
Does this mean the gc will miss the ADD events for the objects that are
already in the sharedInformer before gc invokes the dummyController.Run()?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#45427 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAABYh6Y1Z7gu0AkE8AywTShoUOwAhU9ks5r7MIlgaJpZM4NSLxa>
.
|
@ncdc you are right. They are handled by the listener and processor logic. Sorry i should had read the sharedInformer code more patiently. |
Bump, anyone have any other comments? |
I'm fine with |
Squashed |
@k8s-bot pull-kubernetes-federation-e2e-gce test this |
@ncdc: The following test(s) failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Green tests (federation is non-blocking due to flakiness for now) |
I'd like one or two reviewers to chime in |
Definitely |
lgtm |
The gc changes lgtm. |
/lgtm as per reviewers |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ncdc, smarterclayton
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Automatic merge from submit-queue (batch tested with PRs 46201, 45952, 45427, 46247, 46062) |
Modify the garbage collector controller to try to use shared informers for resources, if possible, to reduce the number of unique reflectors listing and watching the same thing.
cc @kubernetes/sig-api-machinery-pr-reviews @caesarxuchao @deads2k @liggitt @sttts @smarterclayton @timothysc @soltysh @Kargakis @kubernetes/rh-cluster-infra @derekwaynecarr @wojtek-t @gmarek