ipcache/md: Handle deletions asynchronously
Now that the IPCache handles metadata updates incrementally, we can
switch the deletion path over to use this same incremental update logic.
This allows the users of the deletion APIs to inject updates into the
ipcache in a similar manner to the users of the add APIs, and have the
updates incrementally triggered into the subsequent subsystems (policy,
datapath) consistently.

This means that updates into the ipcache from both add and delete paths
for the kube-apiserver policy feature will only actually occur from a
single goroutine, triggered by the TriggerLabelInjection call. This
removes the need to reason about multiple concurrent adds or deletions
into the IPCache occurring from the kube-apiserver policy feature, and
it also lays the path to do the same in future for CIDR policy and other
subsystems. This way, each user of the IPCache can propagate the
information that it intends to push into the IPCache, then the IPCache
itself can decide how to handle those updates and how to combine
information from the various subsystems.

Some key observations here:

* The previous commit actually includes 90% of the logic required to
  implement deletions, based partially on the previous 'add' code and
  partially on the code being deleted here.
* What changes here is adding support in InjectLabels() for the case
  where a set of labels is removed from a prefix. If all labels are
  removed, then this results in the 'IPCache.metadata' map having no
  corresponding entry, so in this case the corresponding old identity
  currently in 'IPCache' should be removed from the selectorcache,
  policymaps, and the IPCache.
  * Caveat: This is only the case if _only_ the metadata map has
    references to the identity. At this point in time, CIDR policies for
    instance are not yet converted over to the metadata map approach for
    associating labels with prefixes, so that path may independently
    allocate its own identities. If those are still referenced from CIDR
    policy, then the label injector should simply remove its references
    to the corresponding identities but not remove them from the
    ipcache.
* Another case is when there is some set of labels (eg A, B) associated
  with a prefix, then one set (eg B) is removed. The result is that a
  previous identity with labels (A, B) must be removed, and a new
  identity with labels (A) should be allocated / associated with the
  prefix. In general, this is very similar to the existing case where a
  set of labels is expanded by associating new labels with the prefix
  (already handled in the previous commit).
  * This also has a curly case: Each set of labels has a source
    associated, for instance initially there could be "remote-node"
    (source: custom-resource) and "kube-apiserver" (source:
    kube-apiserver). When previously upserting into the IPCache, the
    source will be kube-apiserver. If the kube apiserver is no longer
    associated with the IP, and hence that label is removed, then the
    resulting set of labels will only be "remote-node", with source
    "custom-resource". Given that the source "custom-resource" has a
    lower priority in pkg/source than the kube-apiserver source, we
    cannot update the ipcache directly with the new set of labels using
    the "custom-resource" source. However, the label removal is still
    legitimate. To work around the clunky APIs, the function here just
    overrides the source check in IPCache.Upsert(). We should be able
    to remove this clunkiness over time when the metadata map is the
    primary source of information for prefixes, but more refactoring is
    necessary to get to that point.
* Now that the label removal doesn't have its own independent logic
  from a separate goroutine, there is no longer a need to use the
  'applyChangesMU' mutex in the metadata cache to ensure safety around
  the critical section. Furthermore, the core InjectLabels() call
  doesn't modify the metadata map. So, we can remove one lock and reduce
  the other lock down to a read mutex rather than holding it for write.

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Joe Stringer <joe@cilium.io>
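
The removal cases described in the message above can be illustrated with a small, self-contained Go sketch. It is not the Cilium implementation: the types `Source`, `metadata` and `Action` and the functions `remove` and `resolve` are hypothetical names invented for this example, and the numeric source priorities only assume that "custom-resource" ranks below "kube-apiserver", as the message states. The sketch shows the three outcomes: delete the old identity when no labels remain, only release the reference when something else (for example CIDR policy) still uses the identity, and otherwise reallocate with the remaining labels while flagging that the source check must be overridden.

```go
package main

import (
	"fmt"
	"sort"
)

// Source is a hypothetical stand-in for a label source; a higher value
// means a higher priority.
type Source int

const (
	CustomResource Source = iota + 1 // assumed lower priority
	KubeAPIServer                    // assumed higher priority
)

// metadata maps prefix -> source -> labels. It is loosely modelled on the
// 'IPCache.metadata' map named in the commit message, not copied from it.
type metadata map[string]map[Source][]string

// Action describes what a label injector would do for one prefix.
type Action struct {
	Delete         bool     // no labels remain and nothing else uses the identity
	ReleaseOnly    bool     // no labels remain, but the identity is used elsewhere
	Labels         []string // labels for the replacement identity
	Source         Source   // highest-priority source still present
	OverrideSource bool     // remaining source ranks below the one in the ipcache
}

// remove drops the labels that one source contributed for a prefix.
func remove(md metadata, prefix string, src Source) {
	delete(md[prefix], src) // deleting from a nil map is a no-op
	if len(md[prefix]) == 0 {
		delete(md, prefix)
	}
}

// resolve computes the action for a prefix. 'current' is the source that
// was recorded for the prefix in the ipcache before the removal, and
// 'referencedElsewhere' says whether another subsystem still holds the
// old identity.
func resolve(md metadata, prefix string, current Source, referencedElsewhere bool) Action {
	bySource := md[prefix]
	if len(bySource) == 0 {
		return Action{Delete: !referencedElsewhere, ReleaseOnly: referencedElsewhere}
	}
	var a Action
	for src, lbls := range bySource {
		a.Labels = append(a.Labels, lbls...)
		if src > a.Source {
			a.Source = src
		}
	}
	sort.Strings(a.Labels) // map iteration order is random; keep output stable
	a.OverrideSource = a.Source < current
	return a
}

func main() {
	const prefix = "10.0.0.1/32"
	md := metadata{prefix: {
		CustomResource: {"remote-node"},
		KubeAPIServer:  {"kube-apiserver"},
	}}

	// The kube-apiserver label goes away; only "remote-node" remains.
	remove(md, prefix, KubeAPIServer)
	fmt.Printf("%+v\n", resolve(md, prefix, KubeAPIServer, false))

	// The last label set goes away; the old identity can be deleted.
	remove(md, prefix, CustomResource)
	fmt.Printf("%+v\n", resolve(md, prefix, CustomResource, false))
}
```

In this sketch the override is only a flag on the returned action; how the real IPCache.Upsert() is told to skip its source check is not shown in the message and is left out here.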