Duplicate HelmReleaseProxy resources after move - potential race #188
Comments
Looking at the resources that are created again, I see that the [...]

The moved resource:

metadata:
  name: cluster-autoscaler-self-hosted-18gxrd-7q6p4
  namespace: self-hosted-wokpvl
  labels:
    cluster.x-k8s.io/cluster-name: self-hosted-18gxrd
    helmreleaseproxy.addons.cluster.x-k8s.io/helmchartproxy-name: cluster-autoscaler-self-hosted-18gxrd
  ...
  resourceVersion: "2729"

The duplicate resource:

metadata:
  name: cluster-autoscaler-self-hosted-18gxrd-kmc7h
  namespace: self-hosted-wokpvl
  ...
  labels:
    cluster.x-k8s.io/cluster-name: self-hosted-18gxrd
    helmreleaseproxy.addons.cluster.x-k8s.io/helmchartproxy-name: cluster-autoscaler-self-hosted-18gxrd
  resourceVersion: "2730"

Is this a race in the cache? Should this check use a non-caching client perhaps? Should reconciliation detect duplicates and consolidate here?
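To make the "detect duplicates" question concrete, here is a minimal sketch in plain Go of how duplicates could be identified from the two labels shown above. The `proxyMeta` and `ownerKey` types and the `findDuplicates` function are hypothetical stand-ins invented for illustration; they are not the provider's actual types or logic, and the sketch says nothing about how consolidation should pick a survivor.

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys copied from the resources quoted in this issue.
const (
	clusterNameLabel = "cluster.x-k8s.io/cluster-name"
	chartProxyLabel  = "helmreleaseproxy.addons.cluster.x-k8s.io/helmchartproxy-name"
)

// proxyMeta is a hypothetical, simplified stand-in for HelmReleaseProxy metadata.
type proxyMeta struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// ownerKey identifies the (namespace, cluster, HelmChartProxy) triple that
// is expected to map to exactly one HelmReleaseProxy.
type ownerKey struct {
	Namespace, Cluster, ChartProxy string
}

// findDuplicates groups proxies by ownerKey and returns only the groups
// that contain more than one resource, with names sorted for stable output.
func findDuplicates(proxies []proxyMeta) map[ownerKey][]string {
	groups := make(map[ownerKey][]string)
	for _, p := range proxies {
		k := ownerKey{p.Namespace, p.Labels[clusterNameLabel], p.Labels[chartProxyLabel]}
		groups[k] = append(groups[k], p.Name)
	}
	for k, names := range groups {
		if len(names) < 2 {
			delete(groups, k) // not a duplicate, drop the group
			continue
		}
		sort.Strings(names)
	}
	return groups
}

func main() {
	labels := map[string]string{
		clusterNameLabel: "self-hosted-18gxrd",
		chartProxyLabel:  "cluster-autoscaler-self-hosted-18gxrd",
	}
	proxies := []proxyMeta{
		{Name: "cluster-autoscaler-self-hosted-18gxrd-7q6p4", Namespace: "self-hosted-wokpvl", Labels: labels},
		{Name: "cluster-autoscaler-self-hosted-18gxrd-kmc7h", Namespace: "self-hosted-wokpvl", Labels: labels},
	}
	for key, names := range findDuplicates(proxies) {
		fmt.Println(key.ChartProxy, "has duplicates:", names)
	}
}
```

A check like this only reports the symptom. If the underlying cause is a stale cached read, the duplicate would still be created before it could be detected.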
Thinking about this, I think the |
Thanks for bringing this to my attention. At first glance, I'm not sure what's going on with the race condition, but I think it's fair to skip reconciliation for paused Clusters. I figured we'd already be doing that, but it must have been an oversight. Did skipping reconciliation for paused Clusters fix the race condition, or do you think that's a separate issue?
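As a rough illustration of what "skip reconciliation for paused Clusters" could look like, here is a self-contained sketch in plain Go. The `cluster` type and `shouldSkipReconcile` function are hypothetical simplifications, not the controller's real code. The sketch assumes a Cluster counts as paused when its `spec.paused` field is true, and that an individual object can also be paused through the `cluster.x-k8s.io/paused` annotation.

```go
package main

import "fmt"

// pausedAnnotation is the Cluster API annotation that pauses a single object.
const pausedAnnotation = "cluster.x-k8s.io/paused"

// cluster is a hypothetical, simplified stand-in for a Cluster API Cluster.
type cluster struct {
	Name   string
	Paused bool // mirrors spec.paused
}

// shouldSkipReconcile reports whether a reconciler should return early:
// either the owning Cluster is paused (as it is during a move), or the
// object itself carries the paused annotation.
func shouldSkipReconcile(c cluster, objAnnotations map[string]string) bool {
	if c.Paused {
		return true
	}
	_, annotated := objAnnotations[pausedAnnotation]
	return annotated
}

func main() {
	moving := cluster{Name: "self-hosted-18gxrd", Paused: true}
	if shouldSkipReconcile(moving, nil) {
		fmt.Println("cluster is paused, skipping reconciliation")
	}
}
```

With a guard like this at the top of the reconcile loop, no new HelmReleaseProxy would be created while the Cluster is still paused for the move. Whether that fully closes the race is the open question in this thread.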
If the issue is with the name generation, an easy workaround could be to have the HCP controller set HRP.Spec.ReleaseName when it gets created, rather than the HRP controller setting it on its own and updating.
This is not a problem with |
Can you help me understand why this reconciles on ClusterUnpaused OR ClusterControlPlaneInitialized and not on both of those (AND)? I've seen what looks like a race on move where I find multiple HelmReleaseProxies for the same HelmChartProxy and cluster. The labels are set up correctly and the resources are created at the same time (albeit at 1s fidelity from CreationTimestamp, so I can't tell which is created first).

I think that changing this logic to when a cluster is not paused AND the control plane has been initialized would solve this problem.

I see that wouldn't work, because ClusterUnpaused explicitly requires the cluster to have previously been paused in order to continue on update, which wouldn't be the case just after the cluster control plane is initialized.

From the move logs you can see the order that causes this issue:

By the time the HelmReleaseProxy has been moved, a new HelmReleaseProxy has already been created, it seems.
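The OR versus AND question can be worked through with a small model in plain Go. This is a simplified sketch, not the actual Cluster API predicates: it assumes the unpaused predicate fires on an update only when the paused flag flips from true to false, and that the control-plane predicate fires only when the initialized state flips from false to true. All type and function names here are hypothetical.

```go
package main

import "fmt"

// clusterState is a hypothetical snapshot of the two fields that matter here.
type clusterState struct {
	Paused                  bool
	ControlPlaneInitialized bool
}

// updateEvent models an update event carrying the old and new object state.
type updateEvent struct {
	Old, New clusterState
}

// updateUnpaused models "paused changed from true to false".
func updateUnpaused(e updateEvent) bool {
	return e.Old.Paused && !e.New.Paused
}

// controlPlaneJustInitialized models "initialized changed from false to true".
func controlPlaneJustInitialized(e updateEvent) bool {
	return !e.Old.ControlPlaneInitialized && e.New.ControlPlaneInitialized
}

// triggersWithOr is the current behaviour discussed in the issue.
func triggersWithOr(e updateEvent) bool {
	return updateUnpaused(e) || controlPlaneJustInitialized(e)
}

// triggersWithAnd is the alternative proposed, then withdrawn, in the issue.
func triggersWithAnd(e updateEvent) bool {
	return updateUnpaused(e) && controlPlaneJustInitialized(e)
}

func main() {
	cases := []struct {
		name string
		ev   updateEvent
	}{
		{"control plane initialized", updateEvent{
			Old: clusterState{Paused: false, ControlPlaneInitialized: false},
			New: clusterState{Paused: false, ControlPlaneInitialized: true},
		}},
		{"unpaused after move", updateEvent{
			Old: clusterState{Paused: true, ControlPlaneInitialized: true},
			New: clusterState{Paused: false, ControlPlaneInitialized: true},
		}},
	}
	for _, tc := range cases {
		fmt.Printf("%-28s OR=%t AND=%t\n", tc.name, triggersWithOr(tc.ev), triggersWithAnd(tc.ev))
	}
}
```

Under these assumptions, the AND combination never fires for either event, because each update flips only one of the two fields. That matches the reasoning above for why switching to AND would not work.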