controller/operators: label ConfigMaps, don't assume they are #3119
Conversation
/approve
First of all, the reason behind the label filter is to reduce memory usage, which can be a problem for OLM since it caches every ConfigMap on the cluster, and that can be a lot. In practice, OLM doesn't really care about ConfigMaps except for the ones it creates for the bundle unpack operation.
As you pointed out, it is possible for older ConfigMaps not to have that label. However, there is code to handle that situation by adding the label to those ConfigMaps on a cache miss. I wonder if there is a bug somewhere that prevents that logic from working. Did you observe this issue recently?
Either way, I have no problems with this fix as long as the filtering will be applied later to prevent OLM from caching every configmap.
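For context on the filtering discussed above, client-go lets a shared informer factory be restricted by a label selector so that unrelated ConfigMaps never enter the cache. A minimal sketch, assuming an illustrative label key and resync period (the values OLM actually uses may differ):

```go
package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newFilteredFactory returns a factory whose informers only list and watch
// objects carrying the given label selector, which is what keeps every
// unrelated ConfigMap on the cluster out of memory.
func newFilteredFactory(client kubernetes.Interface, selector string) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(
		client,
		30*time.Minute, // resync period; illustrative
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.LabelSelector = selector // e.g. "olm.managed=true"; illustrative key
		}),
	)
}
```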
@dinhxuanvu yes, there must be a bug in the code to label on cache miss, as we're seeing this in the wild.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
cf0c02c to a90ba66
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
7fbf8fd to db9172c
ConfigMaps provided for the internal source type are user-created and won't have our labels, so we need to use a live client to fetch them.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
db9172c to 39202f3
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
/lgtm
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: awgreene, ncdc, stevekuznetsov. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment
/lgtm
the flakes are so horrendous now, holy crap.
Our good friend
Wow! three!
e2b3768
wow!
In the past, OLM moved to using a label selector to filter the informers that track ConfigMaps in the cluster. However, when this was done, ConfigMaps that already existed on the cluster were not labelled. Therefore, on old clusters there is a mix of data - ConfigMaps that OLM created and managed but has now forgotten since they are missing labels, and conformant objects with the label.
We use ConfigMaps to track whether or not Jobs should be labelled - if a Job has an OwnerReference to a ConfigMap and the ConfigMap has an OwnerReference to an OLM GVK, we know that the Job is created and managed by OLM.
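A sketch of that two-hop ownership check follows; this is not OLM's actual helper, and the lister plus the check against the operators.coreos.com API group stand in for whatever the operator really consults:

```go
package main

import (
	"strings"

	batchv1 "k8s.io/api/batch/v1"
	listerscorev1 "k8s.io/client-go/listers/core/v1"
)

// jobManagedByOLM walks the two hops described above: the Job must own-reference a
// ConfigMap, and that ConfigMap must own-reference an object in the OLM API group.
func jobManagedByOLM(job *batchv1.Job, configMaps listerscorev1.ConfigMapLister) (bool, error) {
	for _, ref := range job.OwnerReferences {
		if ref.Kind != "ConfigMap" {
			continue
		}
		cm, err := configMaps.ConfigMaps(job.Namespace).Get(ref.Name)
		if err != nil {
			// with a label-filtered informer, an unlabelled ConfigMap surfaces
			// here as a cache miss - the dead-end described below
			return false, err
		}
		for _, cmRef := range cm.OwnerReferences {
			if strings.Contains(cmRef.APIVersion, "operators.coreos.com") {
				return true, nil
			}
		}
	}
	return false, nil
}
```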
During runtime, the two-hop lookup described above is done through a ConfigMap informer, so we're light on client calls during the labelling phase of startup. However, before the recent labelling work went in, the ConfigMap informer was already filtered by label, so our lookups were dead-ends for the few old ConfigMaps that had never gotten labels in the past. Meanwhile, on startup we use live clients to determine whether there are unlabelled objects we need to handle, so we end up in a state where the live lookup can detect the errant Jobs but the informer-based labellers can't see them as needing labels.
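A sketch of the startup-side check with a live client, assuming an illustrative label key; the point is only that a live List sees objects the filtered informer never cached:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// unlabelledConfigMaps lists ConfigMaps with the live client, then keeps the ones a
// label-filtered informer would never have cached. labelKey is illustrative.
func unlabelledConfigMaps(ctx context.Context, client kubernetes.Interface, namespace, labelKey string) ([]corev1.ConfigMap, error) {
	list, err := client.CoreV1().ConfigMaps(namespace).List(ctx, metav1.ListOptions{}) // no label selector: live view
	if err != nil {
		return nil, err
	}
	var missing []corev1.ConfigMap
	for _, cm := range list.Items {
		if _, ok := cm.Labels[labelKey]; !ok {
			missing = append(missing, cm)
		}
	}
	return missing, nil
}
```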
This commit is technically a performance regression, as it reverts the unequivocal ConfigMap informer filtering - we see all ConfigMaps on the cluster during startup, but continue to filter as expected once everything has labels.
Ideally, we can come up with some policies for cleanup of things like these Jobs and ConfigMaps in the future; at a minimum all of the OLM objects should be labelled and visible to the OLM operators from here on out.