fix(kuma-cp) upsert with retry on conflict #1236
Conversation
Do we really want to retry upserting until the error is gone, given that fresher Insights are more relevant? I'd rather ignore the error and let the next …
Ok, it could work like this when you have a ticker, but what about the case of updating the resource from different parts of the code, like DataplaneInsight cert and stats? I think retry in … Not sure about the insights resyncer, though.
Yes, it makes sense to skip retries for the dataplane/zone sink (and for the insight resyncer as well). Maybe we can introduce an option …
Signed-off-by: Jakub Dyszkiewicz <jakub.dyszkiewicz@gmail.com>
Ok, changed so …
Problem
The problem right now mostly affects Kubernetes. Since we enabled the Kubernetes client cache, the KubernetesStore is no longer consistent: if a single thread executes Get, Update, Get, Update quickly enough, the second Get may return a stale resource whose Version is not fresh enough for optimistic locking.
The problem can technically also appear outside of Kubernetes when we execute Upsert from different parts of the code. Right now we Upsert DataplaneInsights both from the status tracker and from SDS (to update cert times).
I noticed this problem with DataplaneInsights when there are a lot of changes and the dataplane status sink is essentially in a loop. It then happens roughly once every ~100 flushes.
Solution
One option I considered was adding UpdateForce() to ResourceStore#Update, which would ignore optimistic locking. Unfortunately, I could not implement this on Kubernetes: Update cannot ignore it, and the Patch operation cannot bypass it either. The only operation that could potentially bypass it is a Patch of type Apply, but that is only available since Kubernetes 1.18+, so it does not really solve the problem.
I picked the third option, retrying the Upsert on conflict, since it seems the most reasonable. I made it a required argument to Upsert to force users of the API to think about this specific case.
This is a draft to confirm I should proceed with this implementation.
In addition to this change, I want to increase the sink interval for Dataplane Insight and Zone Insight (as a separate PR) so we can avoid situations where the sink is effectively in a loop. The default of 1s is really excessive for this.
Documentation