
Updating the in-cluster autopilot config while a helm chart is being created results in the chart being created twice #4047

Closed
4 tasks done
laverya opened this issue Feb 9, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@laverya
Contributor

laverya commented Feb 9, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too; a fix might've been merged already
  • You're looking at the docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Linux 6.2.0-1019-azure #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.29.1+k0s.1

Sysinfo

k0s sysinfo
Machine ID: "ae13f7395464d07eda6ba68f7e94b56d64b68c4b3527b66ccb7ab4868f9c13b1" (from machine) (pass)
Total memory: 15.6 GiB (pass)
Disk space available for /var/lib/k0s: 24.3 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.2.0-1019-azure (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": available (device filters attachable) (pass)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
CONFIG_NET: Networking support: no kernel config found (warning)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

A helm chart was 'created' twice, resulting in an error. I believe this happened because the clusterconfig object was updated while the helm chart was being initially created. (Dynamic config is being used here)

Steps to reproduce

  1. Install a cluster with dynamic config, and multiple charts in the config object
  2. Modify those charts while creation is in progress
  3. Observe that sometimes one chart will not be successfully installed and will have an error status

Expected behavior

Charts are not applied twice in parallel.

Actual behavior

Charts are applied twice in parallel, resulting in helm errors.

A chart whose status.error is "can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use" cannot be further updated by k0s, because k0s will attempt to recreate the chart once again.

Screenshots and logs

the clusterconfig, with errors:

apiVersion: v1
items:
- apiVersion: k0s.k0sproject.io/v1beta1
  kind: ClusterConfig
  metadata:
    creationTimestamp: "2024-02-09T16:36:34Z"
    generation: 2
    name: k0s
    namespace: kube-system
    resourceVersion: "1489"
    uid: 7bb36730-9c81-4486-bd41-eff56d4f62aa
  spec:
    api:
      address: 10.244.239.133
      k0sApiPort: 9443
      port: 6443
      sans:
      - 10.244.239.133
      - fe80::5c7c:63ff:feb5:4b47
    controllerManager: {}
    extensions:
      helm:
        charts:
        - chartname: oci://registry.replicated.com/library/admin-console
          name: admin-console
          namespace: kotsadm
          order: 3
          timeout: 0
          values: |
            automation:
              appVersionLabel: 0.1.9
              license:
                data: |
                  apiVersion: kots.io/v1beta1
                  kind: License
                  metadata:
                    creationTimestamp: null
                    name: on-prtestcustomer
                  spec:
                    appSlug: embedded-cluster-smoke-test-staging-app
                    channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX
                    channelName: on-pr
                    customerName: on-pr test customer
                    endpoint: https://staging.replicated.app
                    entitlements:
                      expires_at:
                        description: License Expiration
                        title: Expiration
                        value: ""
                        valueType: String
                    isAirgapSupported: true
                    licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N
                    licenseSequence: 1
                    licenseType: dev
                    signature: omitted
                  status: {}
                slug: embedded-cluster-smoke-test-staging-app
            embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
            isHelmManaged: false
            kurlProxy:
              enabled: true
              nodePort: 30000
            minimalRBAC: false
            service:
              enabled: false
          version: 1.107.2
        - chartname: ingress-nginx/ingress-nginx
          name: ingress-nginx
          namespace: ingress-nginx
          order: 4
          timeout: 0
          values: |
            controller:
              service:
                type: NodePort
                nodePorts:
                  http: "80"
                  https: "443"
          version: 4.8.3
        - chartname: openebs/openebs
          name: openebs
          namespace: openebs
          order: 1
          timeout: 0
          values: |
            localprovisioner:
              deviceClass:
                enabled: false
              hostpathClass:
                isDefaultClass: true
            ndm:
              enabled: false
            ndmOperator:
              enabled: false
          version: 3.10.0
        - chartname: oci://registry.replicated.com/library/embedded-cluster-operator
          name: embedded-cluster-operator
          namespace: embedded-cluster
          order: 2
          timeout: 0
          values: |
            embeddedBinaryName: embedded-cluster
            embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
            embeddedClusterK0sVersion: v1.29.1+k0s.1
            embeddedClusterVersion: dev-777b11e
            kotsVersion: 1.107.2
          version: 0.22.5
        concurrencyLevel: 1
        repositories:
        - caFile: ""
          certFile: ""
          insecure: false
          keyfile: ""
          name: openebs
          password: ""
          url: https://openebs.github.io/charts
          username: ""
        - caFile: ""
          certFile: ""
          insecure: false
          keyfile: ""
          name: ingress-nginx
          password: ""
          url: https://kubernetes.github.io/ingress-nginx
          username: ""
      storage:
        create_default_storage_class: false
        type: external_storage
    images:
      calico:
        cni:
          image: quay.io/k0sproject/calico-cni
          version: v3.26.1-1
        kubecontrollers:
          image: quay.io/k0sproject/calico-kube-controllers
          version: v3.26.1-1
        node:
          image: quay.io/k0sproject/calico-node
          version: v3.26.1-1
      coredns:
        image: quay.io/k0sproject/coredns
        version: 1.11.1
      default_pull_policy: IfNotPresent
      konnectivity:
        image: quay.io/k0sproject/apiserver-network-proxy-agent
        version: v0.1.4
      kubeproxy:
        image: quay.io/k0sproject/kube-proxy
        version: v1.28.4
      kuberouter:
        cni:
          image: quay.io/k0sproject/kube-router
          version: v1.6.0-iptables1.8.9-0
        cniInstaller:
          image: quay.io/k0sproject/cni-node
          version: 1.1.1-k0s.1
      metricsserver:
        image: registry.k8s.io/metrics-server/metrics-server
        version: v0.6.4
      pause:
        image: registry.k8s.io/pause
        version: "3.8"
      pushgateway:
        image: quay.io/k0sproject/pushgateway-ttl
        version: 1.4.0-k0s.0
    installConfig:
      users:
        etcdUser: etcd
        kineUser: kube-apiserver
        konnectivityUser: konnectivity-server
        kubeAPIserverUser: kube-apiserver
        kubeSchedulerUser: kube-scheduler
    konnectivity:
      adminPort: 8133
      agentPort: 8132
    network:
      calico:
        flexVolumeDriverPath: /usr/libexec/k0s/kubelet-plugins/volume/exec/nodeagent~uds
        mode: vxlan
        mtu: 0
        overlay: Always
        vxlanPort: 4789
        vxlanVNI: 4096
        wireguard: false
      clusterDomain: cluster.local
      dualStack: {}
      kubeProxy:
        iptables:
          minSyncPeriod: 0s
          syncPeriod: 0s
        ipvs:
          minSyncPeriod: 0s
          syncPeriod: 0s
          tcpFinTimeout: 0s
          tcpTimeout: 0s
          udpTimeout: 0s
        metricsBindAddress: 0.0.0.0:10249
        mode: iptables
      kuberouter:
        autoMTU: true
        hairpin: Enabled
        ipMasq: false
        metricsPort: 8080
        mtu: 0
        peerRouterASNs: ""
        peerRouterIPs: ""
      nodeLocalLoadBalancing:
        envoyProxy:
          apiServerBindPort: 7443
          image:
            image: quay.io/k0sproject/envoy-distroless
            version: v1.29.0
          konnectivityServerBindPort: 7132
        type: EnvoyProxy
      podCIDR: 10.244.0.0/16
      provider: calico
      serviceCIDR: 10.96.0.0/12
    scheduler: {}
    storage:
      etcd:
        peerAddress: 10.244.239.133
      type: etcd
    telemetry:
      enabled: false
kind: List
metadata:
  resourceVersion: ""
operator logs:
2024-02-09T16:39:51Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "20103e8f-bf5e-4055-bcf6-6020532c01c9", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:39:51Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "20103e8f-bf5e-4055-bcf6-6020532c01c9"}
2024-02-09T16:40:32Z	INFO	Reconciling installation	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:40:32Z	INFO	Reconciling addons	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:40:32Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:40:32Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:41:54Z	INFO	Reconciling installation	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}
2024-02-09T16:41:54Z	INFO	Reconciling addons	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}
2024-02-09T16:41:54Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:41:54Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}

the installed helm charts (the ingress-nginx chart has the error):

apiVersion: v1
items:
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-admin-console","namespace":"kube-system"},"spec":{"chartName":"oci://registry.replicated.com/library/admin-console","namespace":"kotsadm","releaseName":"admin-console","timeout":"0s","values":"\nautomation:\n  appVersionLabel: 0.1.9\n  license:\n    data: |\n      apiVersion: kots.io/v1beta1\n      kind: License\n      metadata:\n        creationTimestamp: null\n        name: on-prtestcustomer\n      spec:\n        appSlug: embedded-cluster-smoke-test-staging-app\n        channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX\n        channelName: on-pr\n        customerName: on-pr test customer\n        endpoint: https://staging.replicated.app\n        entitlements:\n          expires_at:\n            description: License Expiration\n            title: Expiration\n            value: \"\"\n            valueType: String\n        isAirgapSupported: true\n        licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N\n        licenseSequence: 1\n        licenseType: dev\n        signature: omitted\n      status: {}\n    slug: embedded-cluster-smoke-test-staging-app\nembeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd\nisHelmManaged: false\nkurlProxy:\n  enabled: true\n  nodePort: 30000\nminimalRBAC: false\nservice:\n  enabled: false\n","version":"1.107.2"}}
      k0s.k0sproject.io/stack-checksum: 88480fbd2e8349d5cf20f6bbd4f4020d
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 2
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-admin-console
    namespace: kube-system
    resourceVersion: "1802"
    uid: 936cbfd5-381c-456e-aa42-d1d2d8885e5f
  spec:
    chartName: oci://registry.replicated.com/library/admin-console
    namespace: kotsadm
    releaseName: admin-console
    timeout: 0s
    values: |2

      automation:
        appVersionLabel: 0.1.9
        license:
          data: |
            apiVersion: kots.io/v1beta1
            kind: License
            metadata:
              creationTimestamp: null
              name: on-prtestcustomer
            spec:
              appSlug: embedded-cluster-smoke-test-staging-app
              channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX
              channelName: on-pr
              customerName: on-pr test customer
              endpoint: https://staging.replicated.app
              entitlements:
                expires_at:
                  description: License Expiration
                  title: Expiration
                  value: ""
                  valueType: String
              isAirgapSupported: true
              licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N
              licenseSequence: 1
              licenseType: dev
              signature: omitted
            status: {}
          slug: embedded-cluster-smoke-test-staging-app
      embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
      isHelmManaged: false
      kurlProxy:
        enabled: true
        nodePort: 30000
      minimalRBAC: false
      service:
        enabled: false
    version: 1.107.2
  status:
    appVersion: 1.107.2
    namespace: kotsadm
    releaseName: admin-console
    revision: 2
    updated: 2024-02-09 16:39:31.016168254 +0000 UTC m=+59.884245154
    valuesHash: 5aab5a4aed2df2490f5369e776d44a432e7f4e07bb8d9f8b5f664f55babbfb2a
    version: 1.107.2
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-embedded-cluster-operator","namespace":"kube-system"},"spec":{"chartName":"oci://registry.replicated.com/library/embedded-cluster-operator","namespace":"embedded-cluster","releaseName":"embedded-cluster-operator","timeout":"0s","values":"\nembeddedBinaryName: embedded-cluster\nembeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd\nembeddedClusterK0sVersion: v1.29.1+k0s.1\nembeddedClusterVersion: dev-777b11e\nkotsVersion: 1.107.2\n","version":"0.22.5"}}
      k0s.k0sproject.io/stack-checksum: 4c93dcd1df9d6de6df2a345537ce4a82
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 2
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-embedded-cluster-operator
    namespace: kube-system
    resourceVersion: "1622"
    uid: 9740d5f0-4796-4a28-922c-b4bb6228d23c
  spec:
    chartName: oci://registry.replicated.com/library/embedded-cluster-operator
    namespace: embedded-cluster
    releaseName: embedded-cluster-operator
    timeout: 0s
    values: |2

      embeddedBinaryName: embedded-cluster
      embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
      embeddedClusterK0sVersion: v1.29.1+k0s.1
      embeddedClusterVersion: dev-777b11e
      kotsVersion: 1.107.2
    version: 0.22.5
  status:
    namespace: embedded-cluster
    releaseName: embedded-cluster-operator
    revision: 2
    updated: 2024-02-09 16:39:06.655903745 +0000 UTC m=+35.523980665
    valuesHash: 9e936e9f6eb13b1fb933921c1a5b1fcd77d61768472bfac6341881455188e139
    version: 0.22.5
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"},"spec":{"chartName":"ingress-nginx/ingress-nginx","namespace":"ingress-nginx","releaseName":"ingress-nginx","timeout":"0s","values":"\ncontroller:\n  service:\n    type: NodePort\n    nodePorts:\n      http: \"80\"\n      https: \"443\"\n","version":"4.8.3"}}
      k0s.k0sproject.io/stack-checksum: 4976759ef6f053e7134c038d493a38cf
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 1
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-ingress-nginx
    namespace: kube-system
    resourceVersion: "2314"
    uid: d27f0ce7-e8c7-416f-a691-8f29f8a478ff
  spec:
    chartName: ingress-nginx/ingress-nginx
    namespace: ingress-nginx
    releaseName: ingress-nginx
    timeout: 0s
    values: |2

      controller:
        service:
          type: NodePort
          nodePorts:
            http: "80"
            https: "443"
    version: 4.8.3
  status:
    error: 'can''t install loadedChart `ingress-nginx`: cannot re-use a name that
      is still in use'
    updated: 2024-02-09 16:41:54.824811545 +0000 UTC m=+203.692888445
    valuesHash: 6183ed64f0eb5e20401df18c636d397fe783bbd76f5684aeaa4f9671fc7fb561
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-openebs","namespace":"kube-system"},"spec":{"chartName":"openebs/openebs","namespace":"openebs","releaseName":"openebs","timeout":"0s","values":"\nlocalprovisioner:\n  deviceClass:\n    enabled: false\n  hostpathClass:\n    isDefaultClass: true\nndm:\n  enabled: false\nndmOperator:\n  enabled: false\n","version":"3.10.0"}}
      k0s.k0sproject.io/stack-checksum: d01c1c23c102be75743e7e6435310d17
    creationTimestamp: "2024-02-09T16:36:37Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 1
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-openebs
    namespace: kube-system
    resourceVersion: "1441"
    uid: eabb4703-4d42-4d2b-8add-a2556491b56b
  spec:
    chartName: openebs/openebs
    namespace: openebs
    releaseName: openebs
    timeout: 0s
    values: |2

      localprovisioner:
        deviceClass:
          enabled: false
        hostpathClass:
          isDefaultClass: true
      ndm:
        enabled: false
      ndmOperator:
        enabled: false
    version: 3.10.0
  status:
    appVersion: 3.10.0
    namespace: openebs
    releaseName: openebs
    revision: 1
    updated: 2024-02-09 16:38:35.936221183 +0000 UTC m=+4.804298103
    valuesHash: 6cc59a285465ad2e97237da6efd7972daa3deeb90613fc8a18c8772739723bf7
    version: 3.10.0
kind: List
metadata:
  resourceVersion: ""

Additional context

No response

@laverya laverya added the bug Something isn't working label Feb 9, 2024
@laverya
Contributor Author

laverya commented Feb 9, 2024

I would expect this to be resolved on single-node if we added a sync.Mutex to

func (cr *ChartReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {

but that wouldn't fully handle the multi-node case. (We're seeing this while bootstrapping a cluster from zero, so there's only one node.)
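The single-process version of that idea can be sketched with nothing but the standard library. This is an illustrative stand-in, not the actual k0s ChartReconciler (the `running`/`overlap` fields are hypothetical instrumentation added just to show the guarantee): the mutex ensures only one reconcile runs at a time within a single controller process, which is exactly the scope of protection discussed above, and also its limit on multi-node clusters.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ChartReconciler is a stand-in for the k0s controller type; only the
// mutex-serialization idea is real, the fields below are for illustration.
type ChartReconciler struct {
	mu      sync.Mutex
	running int  // number of reconciles currently in flight
	overlap bool // set if two reconciles ever run at once
}

// Reconcile holds the mutex for its whole duration, so within one process
// two reconciles can never overlap.
func (cr *ChartReconciler) Reconcile(name string) {
	cr.mu.Lock()
	defer cr.mu.Unlock()
	cr.running++
	if cr.running > 1 {
		cr.overlap = true
	}
	time.Sleep(10 * time.Millisecond) // stand-in for the helm install
	cr.running--
}

func main() {
	cr := &ChartReconciler{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); cr.Reconcile("ingress-nginx") }()
	}
	wg.Wait()
	fmt.Println("overlap:", cr.overlap) // prints "overlap: false"
}
```

Note that this does nothing against a second controller process (or a second node) racing on the same Chart object, since the mutex lives in process memory.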

@laverya
Contributor Author

laverya commented Feb 13, 2024

I tested adding a mutex to the Reconcile function - no improvement, which was a surprise to me! I know that the `cannot re-use a name that is still in use` error comes from calling 'install' when a chart already exists: https://github.com/helm/helm/blob/169561a1b381ae1a6a3974d84c303f19f324ffa0/pkg/action/install.go#L531

But I'm not sure how that could happen other than through a race where Reconcile runs twice for the same chart object, and the mutex should have fixed that if there's only one copy running.

@laverya
Contributor Author

laverya commented Feb 14, 2024

I tried an alternate solution: calling upgrade if a `cannot re-use a name that is still in use` error was returned.

I got `failed to update helm charts: can't upgrade loadedChart 'ingress-nginx': "ingress-nginx" has no deployed releases` instead 😆 (probably because the release secret had been created, but the release was not yet in a deployed state?)
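A toy decision table for that fallback idea, assuming Helm's release status names; this is a sketch, not the k0s implementation, and `nextAction` is a hypothetical helper. It shows why the naive install-then-upgrade fallback trips up: a release whose only revision is still pending-install has a release secret (so install fails with the name-reuse error) but no deployed revision (so upgrade fails too).

```go
package main

import "fmt"

// nextAction is a hypothetical helper: given the status of the existing
// release (empty string meaning "no release secret at all"), pick an action.
// Status names follow Helm's release statuses.
func nextAction(status string) string {
	switch status {
	case "":
		return "install" // nothing exists yet
	case "deployed":
		return "upgrade" // safe: there is a deployed revision to upgrade
	case "pending-install":
		// The trap hit above: the release secret already exists, but the
		// release was never deployed, so "upgrade" fails with
		// "has no deployed releases". Waiting for the in-flight install
		// to settle is the only safe move here.
		return "wait-and-retry"
	default: // failed, pending-upgrade, pending-rollback, ...
		return "wait-and-retry"
	}
}

func main() {
	for _, s := range []string{"", "deployed", "pending-install"} {
		fmt.Printf("status=%q -> %s\n", s, nextAction(s))
	}
}
```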

@laverya
Contributor Author

laverya commented Feb 15, 2024

I'm becoming less sure that this is triggered by making an update to the cluster config while the chart apply is still ongoing - I should look at the list of things that can trigger a reconcile 😄

@laverya
Contributor Author

laverya commented Feb 15, 2024

This may be from running systemctl restart k0scontroller.service immediately after waiting for the first few (but not all!) of the charts to deploy pods.

Edit: it looks like both restarting the service and editing the config can trigger the same bug.

@ricardomaraschini
Contributor

What follows is a theory about a possible cause of this race condition:

An install of a Helm chart only happens when the Status.ReleaseName for the given chart is empty. The update to the chart's Status.ReleaseName only happens at the end of the function (here or here).

What would happen if something patched the Chart's spec section while the Helm install is ongoing? That would trigger a Conflict when attempting to save the Chart status in the cluster, so Status.ReleaseName would not be updated, and on the next reconcile the controller would attempt to install the chart once again.

I inspected the k0s logs and noticed that this error happens quite often, with different charts:

Feb 15 11:40:46 ec k0s[28576]: time="2024-02-15 11:40:46" level=error msg="Failed to update status for chart releasek0s-addon-chart-memcached" component=extensions_controller error="Operation cannot be fulfilled on charts.helm.k0sproject.io \"k0s-addon-chart-memcached\": the object has been modified; please apply your changes to the latest version and try again" extensions_type=he

This seems to indicate that SOMETHING changed the Chart object while it was being reconciled by the controller, so the status could not be updated in the cluster.

Looking at the logic used to update those Chart objects, this situation seems possible, as it just dumps them into a directory and something else loads them into the cluster.

So, in other words:

  1. A new chart is created in the cluster config.
  2. The chart install has started.
  3. Before the end of the chart installation the chart in the cluster config is updated.
  4. This last change dumps a new Chart yaml into the manifests directory.
  5. Something loads the manifest into the cluster.
  6. The install finishes.
  7. The controller attempts to update the Chart status and sees a conflict.
  8. During the next Chart reconcile, it attempts to install the chart again, and we see the issue.

Does this make sense?
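The sequence in the theory above can be modeled with a small, self-contained simulation of Kubernetes-style optimistic concurrency. Everything here (the `chart` struct, `updateStatus`) is an illustrative stand-in for the real API machinery, not k0s code:

```go
package main

import (
	"errors"
	"fmt"
)

// chart is a toy model of the Chart object: a resourceVersion for optimistic
// concurrency, and a release name whose emptiness means "never installed".
type chart struct {
	resourceVersion int
	releaseName     string
}

var errConflict = errors.New("the object has been modified")

// updateStatus succeeds only if the caller read the latest resourceVersion,
// mirroring the "Operation cannot be fulfilled on charts..." error in the log.
func updateStatus(stored *chart, seenRV int, release string) error {
	if seenRV != stored.resourceVersion {
		return errConflict
	}
	stored.releaseName = release
	stored.resourceVersion++
	return nil
}

func main() {
	stored := &chart{resourceVersion: 1}
	installs := 0

	// Reconcile 1: reads the chart, then runs helm install...
	seenRV := stored.resourceVersion
	installs++ // helm install #1 succeeds

	// ...meanwhile the manifest loader re-applies the Chart (spec edit),
	// bumping the resourceVersion out from under the reconciler.
	stored.resourceVersion++

	// Reconcile 1 now fails to record Status.ReleaseName.
	err := updateStatus(stored, seenRV, "ingress-nginx")
	fmt.Println("status update error:", err)

	// Reconcile 2: releaseName is still empty, so it installs again, and
	// helm reports "cannot re-use a name that is still in use".
	if stored.releaseName == "" {
		installs++
	}
	fmt.Println("install attempts:", installs) // prints "install attempts: 2"
}
```

The key point the model captures is that the second install is not a duplicate event but a consequence of the lost status write.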

@ricardomaraschini
Contributor

ricardomaraschini commented Feb 15, 2024

I managed to reproduce this by adding a new chart to the cluster config and then, just after saving it, editing the chart's manifest file in the /var/lib/k0s/manifests directory and changing something. The chart is now in this state:

{
  "error": "can't install loadedChart `memcached`: cannot re-use a name that is still in use",
  "updated": "2024-02-15 13:54:58.207462269 +0000 UTC m=+386.995226471",
  "valuesHash": "201c25ad8c659602ec3934d0b3153586f112da4406a2b683587aef4a76390beb"
}

The k0s log shows:

Feb 15 11:32:32 ec k0s[28576]: time="2024-02-15 11:32:32" level=error msg="Failed to update status for chart releasek0s-addon-chart-memcached" component=extensions_controller error="Operation cannot be fulfilled on charts.helm.k0sproject.io \"k0s-addon-chart-memcached\": the object has been modified; please apply your changes to the latest version and try again" extensions_type=helm

It might be the case that the theory is actually right.

@ricardomaraschini
Contributor

@jnummelin I have raised a possible workaround for the problem in #4064; please let me know what you think.

@ggolin

ggolin commented Mar 13, 2024

(quoting ricardomaraschini's reproduction above)

This is also reproducible with 1.29 and 1.28 by merely adjusting the chart values via the config object and applying k0s.yaml again. For example:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 1.2.3.4
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
    files:
      - name: metallb-crd
        src: manifests/metallb-crd
        dstDir: /var/lib/k0s/manifests/metallb-crd
      - name: wildcard-cert
        src: manifests/wildcard-cert
        dstDir: /var/lib/k0s/manifests/wildcard-cert
  - ssh:
      address: 1.2.3.5
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.6
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.7
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.8
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.9
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  k0s:
    version: v1.28.7+k0s.0
    versionChannel: stable
    dynamicConfig: false
    config:
      spec:
        extensions:
          helm:
            charts:
            - chartname: appscode/gateway-api
              name: gateway-api
              namespace: kube-system
              order: 0
              version: v1.0.0
            - chartname: metallb/metallb
              name: metallb
              namespace: metallb
              order: 1
            - chartname: oci://private-registry-redacted.com/helm-charts/lib
              name: echo-app
              namespace: default
              version: 0.1.0
              order: 3
              values: |2
                image:
                  repository: library/whoami
                  registry: private-registry-redacted.com
                  tag: latest
                ingress:
                  enabled: true
                  hostname: echo.poligon-lb1.redacted.com
                  tls: true
                  existingSecret: wildcard-poligon-lb1-redacted-com-cert
                ports:
                - name: http
                  containerPort: 80
                  protocol: TCP  
            - chartname: oci://private-registry-redacted.com/helm-charts/nginx-ingress-controller
              name: nginx-ingress-controller
              namespace: nginx-ingress
              order: 2
              values: |
                image:
                  registry: private-registry-redacted.com
                  tag: 1.9.6-debian-12-r8
                  repository: registry/nginx-ingress-controller
                defaultBackend:
                  image:
                    registry: private-registry-redacted.com
                    tag: 1.24.0-debian-11-r5
                    repository: library/bitnami/nginx
              version: 10.7.0
            repositories:
            - name: metallb
              url: https://metallb.github.io/metallb
            - name: appscode
              url: https://charts.appscode.com/stable
        network:
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy
          provider: calico

And then adding a value to the echo-app (whoami) chart (ingressClassName was missing above):

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 1.2.3.4
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
    files:
      - name: metallb-crd
        src: manifests/metallb-crd
        dstDir: /var/lib/k0s/manifests/metallb-crd
      - name: wildcard-cert
        src: manifests/wildcard-cert
        dstDir: /var/lib/k0s/manifests/wildcard-cert
  - ssh:
      address: 1.2.3.5
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.6
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.7
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.8
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.9
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  k0s:
    version: v1.28.7+k0s.0
    versionChannel: stable
    dynamicConfig: false
    config:
      spec:
        extensions:
          helm:
            charts:
            - chartname: appscode/gateway-api
              name: gateway-api
              namespace: kube-system
              order: 0
              version: v1.0.0
            - chartname: metallb/metallb
              name: metallb
              namespace: metallb
              order: 1
            - chartname: oci://private-registry-redacted.com/helm-charts/lib
              name: echo-app
              namespace: default
              version: 0.1.0
              order: 3
              values: |2
                image:
                  repository: library/whoami
                  registry: private-registry-redacted.com
                  tag: latest
                ingress:
                  enabled: true
                  hostname: echo.poligon-lb1.redacted.com
                  tls: true
                  existingSecret: wildcard-poligon-lb1-redacted-com-cert
                  ingressClassName: nginx # -- new value!
                ports:
                - name: http
                  containerPort: 80
                  protocol: TCP  
            - chartname: oci://private-registry-redacted.com/helm-charts/nginx-ingress-controller
              name: nginx-ingress-controller
              namespace: nginx-ingress
              order: 2
              values: |
                image:
                  registry: private-registry-redacted.com
                  tag: 1.9.6-debian-12-r8
                  repository: registry/nginx-ingress-controller
                defaultBackend:
                  image:
                    registry: private-registry-redacted.com
                    tag: 1.24.0-debian-11-r5
                    repository: library/bitnami/nginx
              version: 10.7.0
            repositories:
            - name: metallb
              url: https://metallb.github.io/metallb
            - name: appscode
              url: https://charts.appscode.com/stable
        network:
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy
          provider: calico
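For context on how such a change should be picked up: the Chart status carries a `valuesHash` field, so the controller can detect changed values by hashing the raw values string and comparing against the stored hash. A stdlib-only sketch of that idea — SHA-256 is an assumption here (it matches the length of the hash in the status below, but k0s's exact scheme may differ):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// valuesHash sketches change detection for chart values: hash the raw
// values string and compare it with the previously stored hash.
// (SHA-256 is an assumption; the controller's exact scheme may differ.)
func valuesHash(values string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(values)))
}

func main() {
	before := "ingress:\n  enabled: true\n"
	after := "ingress:\n  enabled: true\n  ingressClassName: nginx\n"
	// Adding ingressClassName changes the hash, so an upgrade is needed.
	fmt.Println(valuesHash(before) != valuesHash(after)) // true
}
```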

Applying this config does not trigger the helm upgrade I would expect (since I added a value), and the Chart custom resource now looks like this:

apiVersion: helm.k0sproject.io/v1beta1
kind: Chart
metadata:
  annotations:
    k0s.k0sproject.io/last-applied-configuration: |
      {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-echo-app","namespace":"kube-system"},"spec":{"chartName":"oci://private-registry-redacted.com/helm-charts/lib","namespace":"default","releaseName":"echo-app","timeout":"0s","values":"\nimage:\n  repository: library/whoami\n  registry: private-registry-redacted.com\n  tag: latest\ningress:\n  enabled: true\n  hostname: echo.poligon-lb1.redacted.com\n  tls: true\n  existingSecret: wildcard-poligon-lb1-redacted-com-cert\n  ingressClassName: nginx\nports:\n- name: http\n  containerPort: 80\n  protocol: TCP  \n","version":"0.1.0"}}
    k0s.k0sproject.io/stack-checksum: 49567ef87546fbe43dffc2f201b8db20
  creationTimestamp: "2024-03-12T23:20:21Z"
  finalizers:
  - helm.k0sproject.io/uninstall-helm-release
  generation: 4
  labels:
    k0s.k0sproject.io/stack: helm
  name: k0s-addon-chart-echo-app
  namespace: kube-system
  resourceVersion: "191084"
  uid: b7c950d7-4c42-4319-bb27-f3d72a25e68f
spec:
  chartName: oci://private-registry-redacted.com/helm-charts/lib
  namespace: default
  releaseName: echo-app
  timeout: 0s
  values: "\nimage:\n  repository: library/whoami\n  registry: private-registry-redacted.com\n
    \ tag: latest\ningress:\n  enabled: true\n  hostname: echo.poligon-lb1.redacted.com\n
    \ tls: true\n  existingSecret: wildcard-poligon-lb1-redacted-com-cert\n  ingressClassName:
    nginx\nports:\n- name: http\n  containerPort: 80\n  protocol: TCP  \n"
  version: 0.1.0
status:
  error: 'can''t install loadedChart `lib`: cannot re-use a name that is still in
    use'
  updated: 2024-03-13 08:32:08.958195386 -0700 PDT m=+683.794333516
  valuesHash: f4f589b227ece419b3e9207f90bacc688a83d613b31539fb8980808c7bc824ce
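The `cannot re-use a name that is still in use` error in the status means Helm's install path was taken for a release that already exists. The conventional guard (what `helm upgrade --install` applies) is to check for an existing release first and upgrade instead of installing. A stdlib-only sketch of that decision, with hypothetical `releases` and `installOrUpgrade` names standing in for Helm's storage and the controller logic:

```go
package main

import "fmt"

// releases is a hypothetical stand-in for Helm's release storage,
// mapping release name to revision.
type releases map[string]int

func (r releases) install(name string) error {
	if _, ok := r[name]; ok {
		// This is the failure mode from the status above.
		return fmt.Errorf("can't install: cannot re-use a name that is still in use")
	}
	r[name] = 1
	return nil
}

func (r releases) upgrade(name string) { r[name]++ }

// installOrUpgrade is the guard `helm upgrade --install` applies:
// upgrade when the release exists, install otherwise.
func installOrUpgrade(r releases, name string) {
	if _, ok := r[name]; ok {
		r.upgrade(name)
	} else {
		r[name] = 1
	}
}

func main() {
	r := releases{}
	fmt.Println(r.install("echo-app")) // <nil>
	fmt.Println(r.install("echo-app")) // errors: name still in use
	installOrUpgrade(r, "echo-app")    // succeeds as an upgrade instead
	fmt.Println(r["echo-app"])         // 2
}
```

If the controller re-creates the Chart object (the "created twice" symptom in the title), it can lose track of the existing release and fall into the bare-install path shown here.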

@twz123 (Member) commented Mar 27, 2024

Closing this, as #4064 has been merged and backported. Feel free to ping here or open another issue if the problem persists.

@twz123 twz123 closed this as completed Mar 27, 2024