Duplicate HelmReleaseProxy resources after move - potential race #188

jimmidyson · 2024-03-13T14:37:44Z

Can you help me understand why this reconciles on ClusterUnpaused OR ClusterControlPlaneInitialized and not on both of those (AND)? I've seen what looks like a race on move where I find multiple HelmReleaseProxies for the same HelmChartProxy and cluster - labels are set up correctly and created at same time (albeit at 1s fidelity from CreationTimestamp so can't tell which is created first).

~~I think that changing this logic to when a cluster is not paused AND the control plane has been initialized would solve this problem.~~ I see that wouldn't work because the ClusterUnpaused explicitly requires the cluster to have previously been in unpaused state before to continue on update, which wouldn't be the case just following cluster control plane initialized.

From the move logs you can see the order that causes this issue:

Creating Cluster="self-hosted-18gxrd" Namespace="self-hosted-wokpvl"
...
Creating HelmChartProxy="cluster-autoscaler-self-hosted-18gxrd" Namespace="self-hosted-wokpvl"
...
Creating HelmReleaseProxy="cluster-autoscaler-self-hosted-18gxrd-7q6p4" Namespace="self-hosted-wokpvl"

By the time the HelmReleaseProxy has been moved, a new HelmReleaseProxy has already been created it seems.

The text was updated successfully, but these errors were encountered:

jimmidyson · 2024-03-13T15:00:41Z

Looking at the resources that are created again, I see that the HelmReleaseProxy was moved before the new HelmReleaseProxy was created. Labels are the same on both resources and as such there should only be one. This is a common issue when using GenerateName

The moved HelmReleaseProxy has an earlier resourceVersion than the duplicate HelmReleaseProxy.

The moved resource:

metadata:
  name: cluster-autoscaler-self-hosted-18gxrd-7q6p4
  namespace: self-hosted-wokpvl
  labels:
    cluster.x-k8s.io/cluster-name: self-hosted-18gxrd
    helmreleaseproxy.addons.cluster.x-k8s.io/helmchartproxy-name: cluster-autoscaler-self-hosted-18gxrd
  ...
  resourceVersion: "2729"

The duplicate resource:

metadata:
  name: cluster-autoscaler-self-hosted-18gxrd-kmc7h
  namespace: self-hosted-wokpvl
  ...
  labels:
    cluster.x-k8s.io/cluster-name: self-hosted-18gxrd
    helmreleaseproxy.addons.cluster.x-k8s.io/helmchartproxy-name: cluster-autoscaler-self-hosted-18gxrd
  resourceVersion: "2730"

Is this a race in the cache? Should this check use a non-caching client perhaps? Should reconciliation detect duplicates and consolidate here?

jimmidyson · 2024-03-13T16:01:01Z

Thinking about this, I think the HelmChartProxy controller should skip clusters that are paused, just as they do for clusters that do not yet have control plane initialized. I will send in a PR - let me know what you think.

Jont828 · 2024-03-13T22:34:43Z

Thanks for bringing this to my attention. At a first glance, I'm not sure what's going on with the race condition, but I think it's fair to skip reconciliation for paused Clusters. I figured we'd already be doing that but it must have been an oversight. Did skipping reconciliation for paused Clusters fix the race condition, or do you think that's a separate issue?

Jont828 · 2024-03-13T22:35:49Z

If the issue is with the name generation, an easy workaround could be to have the HCP controller set HRP.Spec.ReleaseName when it gets created, rather than the HRP controller setting it on its own and updating.

jimmidyson · 2024-03-14T12:20:27Z

This is not a problem with HRP.Spec.ReleaseName but the HRP.Metadata.Name which always uses a generated name. I think move is the only way that this race can be hit (2 clients creating potentially duplicate HRPs at the same time with different names).

jimmidyson mentioned this issue Mar 13, 2024

✨ Wait for control plane to be initialized in HelmReleaseProxy controller #119

Merged

jimmidyson mentioned this issue Mar 13, 2024

🐛 Skip HelmChartProxy reconciliation for paused clusters #189

Merged

k8s-ci-robot closed this as completed in #189 Mar 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicate HelmReleaseProxy resources after move - potential race #188

Duplicate HelmReleaseProxy resources after move - potential race #188

jimmidyson commented Mar 13, 2024 •

edited

Loading

jimmidyson commented Mar 13, 2024 •

edited

Loading

jimmidyson commented Mar 13, 2024

Jont828 commented Mar 13, 2024

Jont828 commented Mar 13, 2024

jimmidyson commented Mar 14, 2024 •

edited

Loading

Duplicate HelmReleaseProxy resources after move - potential race #188

Duplicate HelmReleaseProxy resources after move - potential race #188

Comments

jimmidyson commented Mar 13, 2024 • edited Loading

jimmidyson commented Mar 13, 2024 • edited Loading

jimmidyson commented Mar 13, 2024

Jont828 commented Mar 13, 2024

Jont828 commented Mar 13, 2024

jimmidyson commented Mar 14, 2024 • edited Loading

jimmidyson commented Mar 13, 2024 •

edited

Loading

jimmidyson commented Mar 13, 2024 •

edited

Loading

jimmidyson commented Mar 14, 2024 •

edited

Loading