Allow to create shared index informers with custom key functions #114321
Conversation
Please note that we're already in Test Freeze. Fast forwards are scheduled to happen every 6 hours; the most recent run was: Tue Dec 6 21:54:24 UTC 2022.
Force-pushed from aa625f0 to 13d80ad.
Force-pushed from 13d80ad to 41c597c.
I don't understand how this can help non-KCP clusters. I think people need to make an informer per cluster and not mix multiple clusters' objects up in the same cache(s).
/assign @lavalamp
@lavalamp Apart from the KCP project, in SIG Cluster Lifecycle we've been talking about better supporting controllers that have to watch and list objects across every Cluster API cluster. For Cluster API specifically, we're looking to support ways to watch and list multiple clusters that a controller has access to, for example by RBAC permissions or namespace boundary. In turn, in Controller Runtime we'd have to have ways to know where the data is coming from. Note that this might be per-cluster, but the key and cache might also carry other partitioning information. Those are the main use cases. Apart from those, the change in and of itself seems reasonable to avoid forking or copy-pasting these utilities around; there are many precedents of allowing more customizations and passing in options. Is there a specific reason why we'd want to disallow customizing key functions in the informers specifically?
OK but why would such objects get mixed in the same cache such that a better sharding function would be useful? It sounds like a major anti-pattern to me; any bug in the cache handling could cause data from one cluster to be sent to another cluster (ugly CVE potential!). I think there needs to be a more robust separation system.
@lavalamp is there any novel mechanism for CVE there on top of what already exists with cross-namespace items in the cache? FWIW if informers expose …
@lavalamp Allowing different options to customize the internal machinery has been a normal pattern in the community. Can you clarify why such a change doesn't benefit the community, given the use cases provided? cc @fedebongio
If I have the ability from Cluster API to watch and list resources from multiple clusters, with a proxy of sorts through the cluster registry, I have to be able to understand where an object is coming from.
Sure! Although we have multiple needs to do so in Cluster API today, we'd like applications built on top of it to achieve the same in a more unified way through Controller Runtime. For these applications running on what we call a management cluster, they already have access to multiple clusters through the Cluster API's kubeconfig; there is no more and no less access. This is really just a way to improve our controllers watching multiple clusters, potentially build a way to proxy multiple watch/list calls, and streamline our fleet management story.
Just to give some feedback here: setting aside the use cases at hand, why wouldn't we want to allow the customization of a single field in the shared index informer? This is a backward-compatible change that avoids copying code and duplicating machinery.
Looking through the commit history of this package, a similar but far more extensive change was made to add support for Transformer functions.
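To illustrate the precedent being referenced: client-go's shared informers accept a transform hook that runs on every object before it enters the cache. The following is a stdlib-only sketch of that pattern, not the real client-go implementation; `toyStore` and `pod` are illustrative stand-ins.

```go
package main

import "fmt"

// TransformFunc mirrors the shape of client-go's cache.TransformFunc:
// it may modify or replace an object before it is cached.
type TransformFunc func(obj interface{}) (interface{}, error)

// toyStore is an illustrative stand-in for an informer's cache that runs
// an optional transform on every object before storing it.
type toyStore struct {
	transform TransformFunc
	items     map[string]interface{}
}

func (s *toyStore) Add(key string, obj interface{}) error {
	if s.transform != nil {
		transformed, err := s.transform(obj)
		if err != nil {
			return err
		}
		obj = transformed
	}
	s.items[key] = obj
	return nil
}

// pod is a toy object; ManagedFields stands in for metadata.managedFields.
type pod struct {
	Name          string
	ManagedFields string
}

func main() {
	s := &toyStore{
		// Drop managed fields before caching, a common use of the
		// real SetTransform hook to reduce memory usage.
		transform: func(obj interface{}) (interface{}, error) {
			p := obj.(pod)
			p.ManagedFields = ""
			return p, nil
		},
		items: map[string]interface{}{},
	}
	s.Add("default/web", pod{Name: "web", ManagedFields: "large blob"})
	fmt.Println(s.items["default/web"].(pod).ManagedFields == "") // true
}
```

The argument in this thread is that a customizable key function is a smaller change of the same kind: one more hook into the informer's internal pipeline.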
I agree with this but I don't see how this PR is a step on the way to solving this problem.
I don't understand how this change helps. You have some knowledge I don't and are talking as if this is an established fact; I have been asking for an explanation but I keep hearing this repeated as an assertion without additional details. Maybe I am not being clear; "I don't understand X" is literal and means "therefore please explain it to me, link documents, etc"
I am not at all opposed to customization in the abstract. The problem is, every expansion of the surface area increases the burden on people maintaining this (testing, bugs, etc), and on people using it. So, I am obligated to make sure this ongoing cost has a corresponding benefit, to a reasonably large fraction of users.
Perfectly fine thought; we had similar issues in SIG Cluster Lifecycle. As I mentioned to @fedebongio, and as we discussed back in June/July, I'd be more than happy to step up and help in this and other areas, and mentor more people to help. If the maintenance burden is the key point, I'm happy to be added as a reviewer and eventually maintainer in this and every other area that needs help. Does that help with your concerns?
Maintenance help is of course greatly appreciated, but my primary concern is whether this actually helps with cluster API's stated goals or not; please give more detail. I am willing to pay the cost for features that will be broadly used and helpful! |
The data is not actually duplicated; each leaf cluster's object differs at least in UID. The change you're asking for won't help the data get de-duplicated: it will help you put it in one cache, but it won't reduce memory usage, as there will still be N copies of each object (because each object is actually different). If you really intend to hook reflectors/informers from different clusters up to the same cache, I am pretty sure this is not the only code that would need to change to get that to work. The words you're using make the design sound very hypothetical still; I think generally we ask for things to go in the other order: demonstration / proof of concept before upstreaming. Does that seem reasonable? I don't think we'd want to put this in and then have Cluster API discover a better technique and not use it after all.
If we have a controller watching a multitude of clusters, which then flow into a single Controller Runtime cache/reconciler/client, we have a single manager running multiple controllers achieving different goals, all reading the GVKs in question from a shared cache. In the current world, to achieve a similar goal, I'd need to deploy a single manager (in a pod) for each Workload cluster, then rely on hacky ways to also read data from the Management cluster, due to the lack of multi-cluster support in Controller Runtime.
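The collision the multi-cluster use case runs into can be sketched in a few lines of plain Go. The names below (`objectMeta`, `defaultKey`, `clusterKey`) are hypothetical; the real informers key objects via `MetaNamespaceKeyFunc`, whose namespace/name convention `defaultKey` imitates here.

```go
package main

import "fmt"

// objectMeta is a hypothetical stand-in for object metadata; real
// informers derive keys from a metav1.Object's namespace and name.
type objectMeta struct {
	Cluster, Namespace, Name string
}

// defaultKey follows the namespace/name convention of the default key
// function: objects from two clusters can produce the same key.
func defaultKey(o objectMeta) string {
	if o.Namespace == "" {
		return o.Name
	}
	return o.Namespace + "/" + o.Name
}

// clusterKey is the kind of custom key function a multi-cluster
// controller might supply to keep objects from different clusters apart.
func clusterKey(o objectMeta) string {
	return o.Cluster + "|" + defaultKey(o)
}

func main() {
	a := objectMeta{Cluster: "cluster-a", Namespace: "default", Name: "web"}
	b := objectMeta{Cluster: "cluster-b", Namespace: "default", Name: "web"}
	fmt.Println(defaultKey(a) == defaultKey(b)) // true: would collide in one cache
	fmt.Println(clusterKey(a) == clusterKey(b)) // false: disambiguated
}
```

This only shows the keying side of the design; as the reviewer notes, sharing a cache across clusters would likely require changes beyond the key function alone.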
Probably not, but again this is a valid use case that we're exploring and one that would simplify fleet management a lot. You'll have my full support in both reviewing and maintaining any code I contribute going forward, and help with maintenance tasks as well.
I'm confused at this point what else you'd need for this specific change. Regardless of Cluster API, using the Shared Informers with a custom function can be beneficial for other projects as well in the future. Controller Runtime can use this now though. Throughout this PR I've provided plenty of reasons and explanations why such a change would be beneficial, exploratory or not, like I mentioned in the PR description:
In the current state of things, only half the functionality is customizable; the other half isn't. I'd like to point out again that other PRs linked above had similar reasons and more extensive changes than the one proposed, and they didn't go into as much detail as is being asked for in this case; no demonstration, PoC, data, or extreme justification was demanded.
As has come up multiple times over the years from sig-multi-cluster, sig-apimachinery, and sig-arch, the k/k repo (and by extension k/client-go) is focused on a single Kubernetes cluster. Manipulating core primitives to support multi-cluster use cases is an anti-goal and adds complexity to core. If new features are not useful for the mission of k/k, they do not belong in this library and can be built on top of our existing primitives in another library. The idea submitted in this PR appears to add risk to the interface we have today, without providing a clear benefit. If another library wishes to build on top, it can certainly do so, but I don't think embedding here is the appropriate level.
We're not adding or manipulating core primitives to add multi-cluster support within this library. That was the underlying use case that was asked for, but it's beside the point of customizability of an exposed library, which in this case is a cache for clients.
Agreed! And Controller Runtime and Cluster API are ever more popular k/k sub-projects, focused on creating controllers on one side and fleets of clusters on the other. Again, the change proposed doesn't affect k/k in any way, and I'll happily provide support for the code contributed.
The benefit has been clarified over and over throughout the entire PR comments.
Are you suggesting that we as a community should prefer forks of this and other parts of Kubernetes, and the inherent division that comes along with it? And to be very clear: Controller Runtime, which builds on top of client-go, cannot build anything on top to support multi-cluster use cases without the option proposed in this PR. We're not looking to add support for multi-cluster within this library, but adding opt-in options is a very well established pattern in this entire ecosystem; build the 80% use cases and make the 20% possible. That's what an inclusive community looks like.
Force-pushed from 41c597c to fea3c7b.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: vincepri. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from fea3c7b to 6b088ca.
Force-pushed from 72fb76e to cc6d113.
Signed-off-by: Vince Prignano <vince@prigna.com>
Force-pushed from cc6d113 to c6c8df0.
Things consuming informers as implemented by client-go can currently assume name/namespace are sufficient to distinguish one object in the cache from another. Making the key function indeterminate breaks that assumption. It becomes possible to pass an informer into existing controllers which will make their handling of objects retrieved from the cache incorrect. That doesn't seem like a neutral change.
Before adding flexibility/customization, it's reasonable to ask what the benefits and costs are. The costs appear to be less certainty/clarity around what makes an object in the cache unique, and the possibility of resulting bugs/security issues in controllers. If the primary benefit is a newly enabled use case, evaluating whether that use case is one we want to encourage is reasonable. I don't think it's a good idea to expand this implementation to be able to emit objects that aren't sufficiently identified by name/namespace, but it's worth discussing. If that use case was desired, I'd look for an approach that protected against confused-controller bugs, and also probably reconsidered some of the Store interface methods (like …).
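The failure mode described above can be made concrete with a toy cache. This is a hypothetical sketch (not client-go code): a controller written against the default namespace/name keying silently misses objects once the cache is built with a different key function.

```go
package main

import "fmt"

// toyCache is keyed by an arbitrary, caller-supplied key function,
// illustrating the review comment: existing consumers assume keys
// are "namespace/name".
type toyCache struct {
	keyFunc func(ns, name, cluster string) string
	items   map[string]string
}

func (c *toyCache) add(ns, name, cluster, value string) {
	c.items[c.keyFunc(ns, name, cluster)] = value
}

// getByKey is what existing controllers call with "namespace/name".
func (c *toyCache) getByKey(key string) (string, bool) {
	v, ok := c.items[key]
	return v, ok
}

func main() {
	// Cache built with a cluster-prefixed key function.
	c := &toyCache{
		keyFunc: func(ns, name, cluster string) string {
			return cluster + "|" + ns + "/" + name
		},
		items: map[string]string{},
	}
	c.add("default", "web", "cluster-a", "pod-spec")

	// A controller unaware of the custom keying looks up "default/web"
	// and silently misses the object that is actually in the cache.
	_, ok := c.getByKey("default/web")
	fmt.Println(ok) // false: the lookup misses
}
```

This is the "not a neutral change" concern in miniature: the cache still holds the object, but every consumer must agree on the key scheme or lookups quietly fail.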
That makes sense and it's a good point. I'm wondering, though, if the …
Just a small clarification here, the defaults would stay the same, it'd be up to implementers to potentially override behaviors of Informers using in specific clients, if they want to. As mentioned in the PR description, I'm happy to consider different paths/approaches and ultimately come together to an outcome that helps the community at large, some other ideas could be:
These would guarantee that existing implementations remain untouched in terms of key function internals and we'd still expose the ability to customize it, if subprojects might find that necessary. Does that sound like a good path forward?
A possible approach we can consider is the complete decoupling of Controller Runtime from client-go, although to achieve that we still need changes within this library to avoid duplicating the entire client-go package or repository. What I'm personally trying to avoid is creating division and confusion within Kubernetes and how people build controllers. A revisit of the …
/close
@vincepri: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: Vince Prignano vince@prigna.com
What type of PR is this?
/kind feature
What this PR does / why we need it:
In the current state of things, a shared informer can only ever be created with the default KeyFunc, which calls the `DeletionHandlingMetaNamespaceKeyFunc` function. In an effort to allow more customizability of the KeyFunc, this PR adds a new exported function that allows users to pass in a custom KeyFunc, which is then wrapped with the DeletionHandling logic.

This allows better compatibility when using custom indexers and key function logic. For example, if I want to partition the data in the store by regions or availability zones, I can set up custom indexers, but the moment a key is retrieved through `GetByKey`, the partitioning logic no longer works, given that the items are keyed with namespace/name.

/sig api-machinery
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The PR above is needed in Controller Runtime for better support for controllers spanning multiple clusters; see this proposal, which was accepted a while back.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: