
Updating the in-cluster autopilot config while a helm chart is being created results in the chart being created twice #4047

Closed
4 tasks done
laverya opened this issue Feb 9, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@laverya
Contributor

laverya commented Feb 9, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too; a fix might've been merged already
  • You're looking at the docs for the released version; "main" branch docs are usually ahead of released versions.

Platform

Linux 6.2.0-1019-azure #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.29.1+k0s.1

Sysinfo

k0s sysinfo
Machine ID: "ae13f7395464d07eda6ba68f7e94b56d64b68c4b3527b66ccb7ab4868f9c13b1" (from machine) (pass)
Total memory: 15.6 GiB (pass)
Disk space available for /var/lib/k0s: 24.3 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
Linux kernel release: 6.2.0-1019-azure (pass)
Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
AppArmor: active (pass)
Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
Executable in PATH: mount: /usr/bin/mount (pass)
Executable in PATH: umount: /usr/bin/umount (pass)
/proc file system: mounted (0x9fa0) (pass)
Control Groups: version 2 (pass)
cgroup controller "cpu": available (is a listed root controller) (pass)
cgroup controller "cpuacct": available (via cpu in version 2) (pass)
cgroup controller "cpuset": available (is a listed root controller) (pass)
cgroup controller "memory": available (is a listed root controller) (pass)
cgroup controller "devices": available (device filters attachable) (pass)
cgroup controller "freezer": available (cgroup.freeze exists) (pass)
cgroup controller "pids": available (is a listed root controller) (pass)
cgroup controller "hugetlb": available (is a listed root controller) (pass)
cgroup controller "blkio": available (via io in version 2) (pass)
CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
CONFIG_NET: Networking support: no kernel config found (warning)
CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

A helm chart was 'created' twice, resulting in an error. I believe this happened because the clusterconfig object was updated while the helm chart was being initially created. (Dynamic config is being used here)

Steps to reproduce

  1. Install a cluster with dynamic config, and multiple charts in the config object
  2. Modify those charts while creation is in progress
  3. Observe that sometimes one chart will not be successfully installed and will have an error status

Expected behavior

Charts are not applied twice in parallel.

Actual behavior

Charts are applied twice in parallel, resulting in helm errors.

A chart whose status.error is "can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use" cannot be further updated by k0s, because k0s will attempt to recreate the chart once again.

Screenshots and logs

the clusterconfig, with errors:

apiVersion: v1
items:
- apiVersion: k0s.k0sproject.io/v1beta1
  kind: ClusterConfig
  metadata:
    creationTimestamp: "2024-02-09T16:36:34Z"
    generation: 2
    name: k0s
    namespace: kube-system
    resourceVersion: "1489"
    uid: 7bb36730-9c81-4486-bd41-eff56d4f62aa
  spec:
    api:
      address: 10.244.239.133
      k0sApiPort: 9443
      port: 6443
      sans:
      - 10.244.239.133
      - fe80::5c7c:63ff:feb5:4b47
    controllerManager: {}
    extensions:
      helm:
        charts:
        - chartname: oci://registry.replicated.com/library/admin-console
          name: admin-console
          namespace: kotsadm
          order: 3
          timeout: 0
          values: |
            automation:
              appVersionLabel: 0.1.9
              license:
                data: |
                  apiVersion: kots.io/v1beta1
                  kind: License
                  metadata:
                    creationTimestamp: null
                    name: on-prtestcustomer
                  spec:
                    appSlug: embedded-cluster-smoke-test-staging-app
                    channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX
                    channelName: on-pr
                    customerName: on-pr test customer
                    endpoint: https://staging.replicated.app
                    entitlements:
                      expires_at:
                        description: License Expiration
                        title: Expiration
                        value: ""
                        valueType: String
                    isAirgapSupported: true
                    licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N
                    licenseSequence: 1
                    licenseType: dev
                    signature: omitted
                  status: {}
                slug: embedded-cluster-smoke-test-staging-app
            embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
            isHelmManaged: false
            kurlProxy:
              enabled: true
              nodePort: 30000
            minimalRBAC: false
            service:
              enabled: false
          version: 1.107.2
        - chartname: ingress-nginx/ingress-nginx
          name: ingress-nginx
          namespace: ingress-nginx
          order: 4
          timeout: 0
          values: |
            controller:
              service:
                type: NodePort
                nodePorts:
                  http: "80"
                  https: "443"
          version: 4.8.3
        - chartname: openebs/openebs
          name: openebs
          namespace: openebs
          order: 1
          timeout: 0
          values: |
            localprovisioner:
              deviceClass:
                enabled: false
              hostpathClass:
                isDefaultClass: true
            ndm:
              enabled: false
            ndmOperator:
              enabled: false
          version: 3.10.0
        - chartname: oci://registry.replicated.com/library/embedded-cluster-operator
          name: embedded-cluster-operator
          namespace: embedded-cluster
          order: 2
          timeout: 0
          values: |
            embeddedBinaryName: embedded-cluster
            embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
            embeddedClusterK0sVersion: v1.29.1+k0s.1
            embeddedClusterVersion: dev-777b11e
            kotsVersion: 1.107.2
          version: 0.22.5
        concurrencyLevel: 1
        repositories:
        - caFile: ""
          certFile: ""
          insecure: false
          keyfile: ""
          name: openebs
          password: ""
          url: https://openebs.github.io/charts
          username: ""
        - caFile: ""
          certFile: ""
          insecure: false
          keyfile: ""
          name: ingress-nginx
          password: ""
          url: https://kubernetes.github.io/ingress-nginx
          username: ""
      storage:
        create_default_storage_class: false
        type: external_storage
    images:
      calico:
        cni:
          image: quay.io/k0sproject/calico-cni
          version: v3.26.1-1
        kubecontrollers:
          image: quay.io/k0sproject/calico-kube-controllers
          version: v3.26.1-1
        node:
          image: quay.io/k0sproject/calico-node
          version: v3.26.1-1
      coredns:
        image: quay.io/k0sproject/coredns
        version: 1.11.1
      default_pull_policy: IfNotPresent
      konnectivity:
        image: quay.io/k0sproject/apiserver-network-proxy-agent
        version: v0.1.4
      kubeproxy:
        image: quay.io/k0sproject/kube-proxy
        version: v1.28.4
      kuberouter:
        cni:
          image: quay.io/k0sproject/kube-router
          version: v1.6.0-iptables1.8.9-0
        cniInstaller:
          image: quay.io/k0sproject/cni-node
          version: 1.1.1-k0s.1
      metricsserver:
        image: registry.k8s.io/metrics-server/metrics-server
        version: v0.6.4
      pause:
        image: registry.k8s.io/pause
        version: "3.8"
      pushgateway:
        image: quay.io/k0sproject/pushgateway-ttl
        version: 1.4.0-k0s.0
    installConfig:
      users:
        etcdUser: etcd
        kineUser: kube-apiserver
        konnectivityUser: konnectivity-server
        kubeAPIserverUser: kube-apiserver
        kubeSchedulerUser: kube-scheduler
    konnectivity:
      adminPort: 8133
      agentPort: 8132
    network:
      calico:
        flexVolumeDriverPath: /usr/libexec/k0s/kubelet-plugins/volume/exec/nodeagent~uds
        mode: vxlan
        mtu: 0
        overlay: Always
        vxlanPort: 4789
        vxlanVNI: 4096
        wireguard: false
      clusterDomain: cluster.local
      dualStack: {}
      kubeProxy:
        iptables:
          minSyncPeriod: 0s
          syncPeriod: 0s
        ipvs:
          minSyncPeriod: 0s
          syncPeriod: 0s
          tcpFinTimeout: 0s
          tcpTimeout: 0s
          udpTimeout: 0s
        metricsBindAddress: 0.0.0.0:10249
        mode: iptables
      kuberouter:
        autoMTU: true
        hairpin: Enabled
        ipMasq: false
        metricsPort: 8080
        mtu: 0
        peerRouterASNs: ""
        peerRouterIPs: ""
      nodeLocalLoadBalancing:
        envoyProxy:
          apiServerBindPort: 7443
          image:
            image: quay.io/k0sproject/envoy-distroless
            version: v1.29.0
          konnectivityServerBindPort: 7132
        type: EnvoyProxy
      podCIDR: 10.244.0.0/16
      provider: calico
      serviceCIDR: 10.96.0.0/12
    scheduler: {}
    storage:
      etcd:
        peerAddress: 10.244.239.133
      type: etcd
    telemetry:
      enabled: false
kind: List
metadata:
  resourceVersion: ""
operator logs:
2024-02-09T16:39:51Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "20103e8f-bf5e-4055-bcf6-6020532c01c9", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:39:51Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "20103e8f-bf5e-4055-bcf6-6020532c01c9"}
2024-02-09T16:40:32Z	INFO	Reconciling installation	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:40:32Z	INFO	Reconciling addons	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:40:32Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:40:32Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "1cd2b51d-19fc-4ccc-8316-dc0b7b57042a"}
2024-02-09T16:41:54Z	INFO	Reconciling installation	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}
2024-02-09T16:41:54Z	INFO	Reconciling addons	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}
2024-02-09T16:41:54Z	INFO	chart errors	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e", "errors": "failed to update helm charts: can't install loadedChart `ingress-nginx`: cannot re-use a name that is still in use"}
2024-02-09T16:41:54Z	INFO	Installation reconciliation ended	{"controller": "installation", "controllerGroup": "embeddedcluster.replicated.com", "controllerKind": "Installation", "Installation": {"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"}, "namespace": "kube-system", "name": "k0s-addon-chart-ingress-nginx", "reconcileID": "86d8d24a-8d35-4040-afc6-ab1f5a8e1d8e"}

the installed helm charts (the ingress-nginx chart has the error):

apiVersion: v1
items:
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-admin-console","namespace":"kube-system"},"spec":{"chartName":"oci://registry.replicated.com/library/admin-console","namespace":"kotsadm","releaseName":"admin-console","timeout":"0s","values":"\nautomation:\n  appVersionLabel: 0.1.9\n  license:\n    data: |\n      apiVersion: kots.io/v1beta1\n      kind: License\n      metadata:\n        creationTimestamp: null\n        name: on-prtestcustomer\n      spec:\n        appSlug: embedded-cluster-smoke-test-staging-app\n        channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX\n        channelName: on-pr\n        customerName: on-pr test customer\n        endpoint: https://staging.replicated.app\n        entitlements:\n          expires_at:\n            description: License Expiration\n            title: Expiration\n            value: \"\"\n            valueType: String\n        isAirgapSupported: true\n        licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N\n        licenseSequence: 1\n        licenseType: dev\n        signature: omitted\n      status: {}\n    slug: embedded-cluster-smoke-test-staging-app\nembeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd\nisHelmManaged: false\nkurlProxy:\n  enabled: true\n  nodePort: 30000\nminimalRBAC: false\nservice:\n  enabled: false\n","version":"1.107.2"}}
      k0s.k0sproject.io/stack-checksum: 88480fbd2e8349d5cf20f6bbd4f4020d
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 2
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-admin-console
    namespace: kube-system
    resourceVersion: "1802"
    uid: 936cbfd5-381c-456e-aa42-d1d2d8885e5f
  spec:
    chartName: oci://registry.replicated.com/library/admin-console
    namespace: kotsadm
    releaseName: admin-console
    timeout: 0s
    values: |2

      automation:
        appVersionLabel: 0.1.9
        license:
          data: |
            apiVersion: kots.io/v1beta1
            kind: License
            metadata:
              creationTimestamp: null
              name: on-prtestcustomer
            spec:
              appSlug: embedded-cluster-smoke-test-staging-app
              channelID: 2bVjTIz1TkO8pCwy6ipVLGwHpGX
              channelName: on-pr
              customerName: on-pr test customer
              endpoint: https://staging.replicated.app
              entitlements:
                expires_at:
                  description: License Expiration
                  title: Expiration
                  value: ""
                  valueType: String
              isAirgapSupported: true
              licenseID: 2bVjmMR5t2QmFMQYOo04v4tsj9N
              licenseSequence: 1
              licenseType: dev
              signature: omitted
            status: {}
          slug: embedded-cluster-smoke-test-staging-app
      embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
      isHelmManaged: false
      kurlProxy:
        enabled: true
        nodePort: 30000
      minimalRBAC: false
      service:
        enabled: false
    version: 1.107.2
  status:
    appVersion: 1.107.2
    namespace: kotsadm
    releaseName: admin-console
    revision: 2
    updated: 2024-02-09 16:39:31.016168254 +0000 UTC m=+59.884245154
    valuesHash: 5aab5a4aed2df2490f5369e776d44a432e7f4e07bb8d9f8b5f664f55babbfb2a
    version: 1.107.2
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-embedded-cluster-operator","namespace":"kube-system"},"spec":{"chartName":"oci://registry.replicated.com/library/embedded-cluster-operator","namespace":"embedded-cluster","releaseName":"embedded-cluster-operator","timeout":"0s","values":"\nembeddedBinaryName: embedded-cluster\nembeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd\nembeddedClusterK0sVersion: v1.29.1+k0s.1\nembeddedClusterVersion: dev-777b11e\nkotsVersion: 1.107.2\n","version":"0.22.5"}}
      k0s.k0sproject.io/stack-checksum: 4c93dcd1df9d6de6df2a345537ce4a82
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 2
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-embedded-cluster-operator
    namespace: kube-system
    resourceVersion: "1622"
    uid: 9740d5f0-4796-4a28-922c-b4bb6228d23c
  spec:
    chartName: oci://registry.replicated.com/library/embedded-cluster-operator
    namespace: embedded-cluster
    releaseName: embedded-cluster-operator
    timeout: 0s
    values: |2

      embeddedBinaryName: embedded-cluster
      embeddedClusterID: 356ddb08-8ac9-4628-bf45-38b6542bb7dd
      embeddedClusterK0sVersion: v1.29.1+k0s.1
      embeddedClusterVersion: dev-777b11e
      kotsVersion: 1.107.2
    version: 0.22.5
  status:
    namespace: embedded-cluster
    releaseName: embedded-cluster-operator
    revision: 2
    updated: 2024-02-09 16:39:06.655903745 +0000 UTC m=+35.523980665
    valuesHash: 9e936e9f6eb13b1fb933921c1a5b1fcd77d61768472bfac6341881455188e139
    version: 0.22.5
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-ingress-nginx","namespace":"kube-system"},"spec":{"chartName":"ingress-nginx/ingress-nginx","namespace":"ingress-nginx","releaseName":"ingress-nginx","timeout":"0s","values":"\ncontroller:\n  service:\n    type: NodePort\n    nodePorts:\n      http: \"80\"\n      https: \"443\"\n","version":"4.8.3"}}
      k0s.k0sproject.io/stack-checksum: 4976759ef6f053e7134c038d493a38cf
    creationTimestamp: "2024-02-09T16:36:38Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 1
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-ingress-nginx
    namespace: kube-system
    resourceVersion: "2314"
    uid: d27f0ce7-e8c7-416f-a691-8f29f8a478ff
  spec:
    chartName: ingress-nginx/ingress-nginx
    namespace: ingress-nginx
    releaseName: ingress-nginx
    timeout: 0s
    values: |2

      controller:
        service:
          type: NodePort
          nodePorts:
            http: "80"
            https: "443"
    version: 4.8.3
  status:
    error: 'can''t install loadedChart `ingress-nginx`: cannot re-use a name that
      is still in use'
    updated: 2024-02-09 16:41:54.824811545 +0000 UTC m=+203.692888445
    valuesHash: 6183ed64f0eb5e20401df18c636d397fe783bbd76f5684aeaa4f9671fc7fb561
- apiVersion: helm.k0sproject.io/v1beta1
  kind: Chart
  metadata:
    annotations:
      k0s.k0sproject.io/last-applied-configuration: |
        {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-openebs","namespace":"kube-system"},"spec":{"chartName":"openebs/openebs","namespace":"openebs","releaseName":"openebs","timeout":"0s","values":"\nlocalprovisioner:\n  deviceClass:\n    enabled: false\n  hostpathClass:\n    isDefaultClass: true\nndm:\n  enabled: false\nndmOperator:\n  enabled: false\n","version":"3.10.0"}}
      k0s.k0sproject.io/stack-checksum: d01c1c23c102be75743e7e6435310d17
    creationTimestamp: "2024-02-09T16:36:37Z"
    finalizers:
    - helm.k0sproject.io/uninstall-helm-release
    generation: 1
    labels:
      k0s.k0sproject.io/stack: helm
    name: k0s-addon-chart-openebs
    namespace: kube-system
    resourceVersion: "1441"
    uid: eabb4703-4d42-4d2b-8add-a2556491b56b
  spec:
    chartName: openebs/openebs
    namespace: openebs
    releaseName: openebs
    timeout: 0s
    values: |2

      localprovisioner:
        deviceClass:
          enabled: false
        hostpathClass:
          isDefaultClass: true
      ndm:
        enabled: false
      ndmOperator:
        enabled: false
    version: 3.10.0
  status:
    appVersion: 3.10.0
    namespace: openebs
    releaseName: openebs
    revision: 1
    updated: 2024-02-09 16:38:35.936221183 +0000 UTC m=+4.804298103
    valuesHash: 6cc59a285465ad2e97237da6efd7972daa3deeb90613fc8a18c8772739723bf7
    version: 3.10.0
kind: List
metadata:
  resourceVersion: ""

Additional context

No response

@laverya laverya added the bug Something isn't working label Feb 9, 2024
@laverya
Contributor Author

laverya commented Feb 9, 2024

I would expect this to be resolved on single-node if we added a sync.Mutex to

func (cr *ChartReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {

but that wouldn't fully handle the multi-node case. (We're seeing this while bootstrapping a cluster from zero, so there's only one node.)
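The single-process version of that idea can be sketched with nothing but the standard library. This is an illustrative stand-in, not the actual k0s ChartReconciler (the `running`/`overlap` fields are hypothetical instrumentation added just to show the guarantee): the mutex ensures only one reconcile runs at a time within a single controller process, which is exactly the scope of protection discussed above, and also its limit on multi-node clusters.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ChartReconciler is a stand-in for the k0s controller type; only the
// mutex-serialization idea is real, the fields below are for illustration.
type ChartReconciler struct {
	mu      sync.Mutex
	running int  // number of reconciles currently in flight
	overlap bool // set if two reconciles ever run at once
}

// Reconcile holds the mutex for its whole duration, so within one process
// two reconciles can never overlap.
func (cr *ChartReconciler) Reconcile(name string) {
	cr.mu.Lock()
	defer cr.mu.Unlock()
	cr.running++
	if cr.running > 1 {
		cr.overlap = true
	}
	time.Sleep(10 * time.Millisecond) // stand-in for the helm install
	cr.running--
}

func main() {
	cr := &ChartReconciler{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); cr.Reconcile("ingress-nginx") }()
	}
	wg.Wait()
	fmt.Println("overlap:", cr.overlap) // prints "overlap: false"
}
```

Note that this does nothing against a second controller process (or a second node) racing on the same Chart object, since the mutex lives in process memory.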

@laverya
Contributor Author

laverya commented Feb 13, 2024

I tested adding a mutex to the Reconcile function - no improvement, which was a surprise to me! I know that the `cannot re-use a name that is still in use` error comes from calling 'install' when a chart already exists: https://github.com/helm/helm/blob/169561a1b381ae1a6a3974d84c303f19f324ffa0/pkg/action/install.go#L531

But I'm not sure how that could happen other than through a race where Reconcile runs twice for the same chart object, and the mutex should have fixed that if there's only one copy running.

@laverya
Contributor Author

laverya commented Feb 14, 2024

I tried an alternate solution: calling upgrade if a `cannot re-use a name that is still in use` error was returned.

I got `failed to update helm charts: can't upgrade loadedChart 'ingress-nginx': "ingress-nginx" has no deployed releases` instead 😆 (probably because the release secret had been created, but the release was not yet in a deployed state?)
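A toy decision table for that fallback idea, assuming Helm's release status names; this is a sketch, not the k0s implementation, and `nextAction` is a hypothetical helper. It shows why the naive install-then-upgrade fallback trips up: a release whose only revision is still pending-install has a release secret (so install fails with the name-reuse error) but no deployed revision (so upgrade fails too).

```go
package main

import "fmt"

// nextAction is a hypothetical helper: given the status of the existing
// release (empty string meaning "no release secret at all"), pick an action.
// Status names follow Helm's release statuses.
func nextAction(status string) string {
	switch status {
	case "":
		return "install" // nothing exists yet
	case "deployed":
		return "upgrade" // safe: there is a deployed revision to upgrade
	case "pending-install":
		// The trap hit above: the release secret already exists, but the
		// release was never deployed, so "upgrade" fails with
		// "has no deployed releases". Waiting for the in-flight install
		// to settle is the only safe move here.
		return "wait-and-retry"
	default: // failed, pending-upgrade, pending-rollback, ...
		return "wait-and-retry"
	}
}

func main() {
	for _, s := range []string{"", "deployed", "pending-install"} {
		fmt.Printf("status=%q -> %s\n", s, nextAction(s))
	}
}
```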

@laverya
Contributor Author

laverya commented Feb 15, 2024

I'm becoming less sure that this is triggered by making an update to the cluster config while the chart apply is still ongoing - I should look at the list of things that can trigger a reconcile 😄

@laverya
Contributor Author

laverya commented Feb 15, 2024

This may be from running systemctl restart k0scontroller.service immediately after waiting for the first few (but not all!) of the charts to deploy pods.

Edit: it looks like both restarting the service and editing the config can trigger the same bug.

@ricardomaraschini
Contributor

What follows is a theory about a possible cause of this race condition:

An install of a Helm chart only happens when the Status.ReleaseName for the given chart is empty. The update to the chart's Status.ReleaseName only happens at the end of the function (here or here).

What would happen if something patched the Chart's spec section while the Helm install is ongoing? That would trigger a Conflict when attempting to save the Chart status in the cluster, so Status.ReleaseName would not be updated, and on the next reconcile the controller would attempt to install the chart once again.

I inspected the k0s logs and noticed that this error happens quite often, with different charts:

Feb 15 11:40:46 ec k0s[28576]: time="2024-02-15 11:40:46" level=error msg="Failed to update status for chart releasek0s-addon-chart-memcached" component=extensions_controller error="Operation cannot be fulfilled on charts.helm.k0sproject.io \"k0s-addon-chart-memcached\": the object has been modified; please apply your changes to the latest version and try again" extensions_type=he

This seems to indicate that SOMETHING changed the Chart object while it was being reconciled by the controller, so the status could not be updated in the cluster.

Looking at the logic used to update those Chart objects, this situation seems possible, as it just dumps them into a directory and something else loads them into the cluster.

So, in other words:

  1. A new chart is created in the cluster config.
  2. The chart install has started.
  3. Before the end of the chart installation the chart in the cluster config is updated.
  4. This last change dumps a new Chart yaml into the manifests directory.
  5. Something loads the manifest into the cluster.
  6. The install finishes.
  7. The controller attempts to update the Chart status and sees a conflict.
  8. During the next Chart reconcile, it attempts to install the chart again, and we see the issue.

Does this make sense?
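The sequence in the theory above can be modeled with a small, self-contained simulation of Kubernetes-style optimistic concurrency. Everything here (the `chart` struct, `updateStatus`) is an illustrative stand-in for the real API machinery, not k0s code:

```go
package main

import (
	"errors"
	"fmt"
)

// chart is a toy model of the Chart object: a resourceVersion for optimistic
// concurrency, and a release name whose emptiness means "never installed".
type chart struct {
	resourceVersion int
	releaseName     string
}

var errConflict = errors.New("the object has been modified")

// updateStatus succeeds only if the caller read the latest resourceVersion,
// mirroring the "Operation cannot be fulfilled on charts..." error in the log.
func updateStatus(stored *chart, seenRV int, release string) error {
	if seenRV != stored.resourceVersion {
		return errConflict
	}
	stored.releaseName = release
	stored.resourceVersion++
	return nil
}

func main() {
	stored := &chart{resourceVersion: 1}
	installs := 0

	// Reconcile 1: reads the chart, then runs helm install...
	seenRV := stored.resourceVersion
	installs++ // helm install #1 succeeds

	// ...meanwhile the manifest loader re-applies the Chart (spec edit),
	// bumping the resourceVersion out from under the reconciler.
	stored.resourceVersion++

	// Reconcile 1 now fails to record Status.ReleaseName.
	err := updateStatus(stored, seenRV, "ingress-nginx")
	fmt.Println("status update error:", err)

	// Reconcile 2: releaseName is still empty, so it installs again, and
	// helm reports "cannot re-use a name that is still in use".
	if stored.releaseName == "" {
		installs++
	}
	fmt.Println("install attempts:", installs) // prints "install attempts: 2"
}
```

The key point the model captures is that the second install is not a duplicate event but a consequence of the lost status write.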

@ricardomaraschini
Contributor

ricardomaraschini commented Feb 15, 2024

I managed to reproduce this by adding a new chart to the cluster config and then, just after saving it, editing the chart's manifest file in the /var/lib/k0s/manifests directory and changing something. The chart is now in this state:

{
  "error": "can't install loadedChart `memcached`: cannot re-use a name that is still in use",
  "updated": "2024-02-15 13:54:58.207462269 +0000 UTC m=+386.995226471",
  "valuesHash": "201c25ad8c659602ec3934d0b3153586f112da4406a2b683587aef4a76390beb"
}

The k0s log shows:

Feb 15 11:32:32 ec k0s[28576]: time="2024-02-15 11:32:32" level=error msg="Failed to update status for chart releasek0s-addon-chart-memcached" component=extensions_controller error="Operation cannot be fulfilled on charts.helm.k0sproject.io \"k0s-addon-chart-memcached\": the object has been modified; please apply your changes to the latest version and try again" extensions_type=helm

It might be the case that the theory is actually right.

@ricardomaraschini
Contributor

@jnummelin I have raised a possible workaround for the problem in #4064; please let me know what you think.

@ggolin

ggolin commented Mar 13, 2024

(quoting ricardomaraschini's reproduction above)

This is also reproducible with 1.29 and 1.28 by merely adjusting the chart values via the config object and applying k0s.yaml again. For example:

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 1.2.3.4
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
    files:
      - name: metallb-crd
        src: manifests/metallb-crd
        dstDir: /var/lib/k0s/manifests/metallb-crd
      - name: wildcard-cert
        src: manifests/wildcard-cert
        dstDir: /var/lib/k0s/manifests/wildcard-cert
  - ssh:
      address: 1.2.3.5
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.6
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.7
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.8
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.9
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  k0s:
    version: v1.28.7+k0s.0
    versionChannel: stable
    dynamicConfig: false
    config:
      spec:
        extensions:
          helm:
            charts:
            - chartname: appscode/gateway-api
              name: gateway-api
              namespace: kube-system
              order: 0
              version: v1.0.0
            - chartname: metallb/metallb
              name: metallb
              namespace: metallb
              order: 1
            - chartname: oci://private-registry-redacted.com/helm-charts/lib
              name: echo-app
              namespace: default
              version: 0.1.0
              order: 3
              values: |2
                image:
                  repository: library/whoami
                  registry: private-registry-redacted.com
                  tag: latest
                ingress:
                  enabled: true
                  hostname: echo.poligon-lb1.redacted.com
                  tls: true
                  existingSecret: wildcard-poligon-lb1-redacted-com-cert
                ports:
                - name: http
                  containerPort: 80
                  protocol: TCP  
            - chartname: oci://private-registry-redacted.com/helm-charts/nginx-ingress-controller
              name: nginx-ingress-controller
              namespace: nginx-ingress
              order: 2
              values: |
                image:
                  registry: private-registry-redacted.com
                  tag: 1.9.6-debian-12-r8
                  repository: registry/nginx-ingress-controller
                defaultBackend:
                  image:
                    registry: private-registry-redacted.com
                    tag: 1.24.0-debian-11-r5
                    repository: library/bitnami/nginx
              version: 10.7.0
            repositories:
            - name: metallb
              url: https://metallb.github.io/metallb
            - name: appscode
              url: https://charts.appscode.com/stable
        network:
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy
          provider: calico

And then adding a value to the echo-app (whoami) chart (ingressClassName was missing above):

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts:
  - ssh:
      address: 1.2.3.4
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
    files:
      - name: metallb-crd
        src: manifests/metallb-crd
        dstDir: /var/lib/k0s/manifests/metallb-crd
      - name: wildcard-cert
        src: manifests/wildcard-cert
        dstDir: /var/lib/k0s/manifests/wildcard-cert
  - ssh:
      address: 1.2.3.5
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.6
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: controller
  - ssh:
      address: 1.2.3.7
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.8
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  - ssh:
      address: 1.2.3.9
      user: root
      port: 22
      keyPath: $HOME/.ssh/id_rsa
    role: worker
    privateInterface: bond0
  k0s:
    version: v1.28.7+k0s.0
    versionChannel: stable
    dynamicConfig: false
    config:
      spec:
        extensions:
          helm:
            charts:
            - chartname: appscode/gateway-api
              name: gateway-api
              namespace: kube-system
              order: 0
              version: v1.0.0
            - chartname: metallb/metallb
              name: metallb
              namespace: metallb
              order: 1
            - chartname: oci://private-registry-redacted.com/helm-charts/lib
              name: echo-app
              namespace: default
              version: 0.1.0
              order: 3
              values: |2
                image:
                  repository: library/whoami
                  registry: private-registry-redacted.com
                  tag: latest
                ingress:
                  enabled: true
                  hostname: echo.poligon-lb1.redacted.com
                  tls: true
                  existingSecret: wildcard-poligon-lb1-redacted-com-cert
                  ingressClassName: nginx # -- new value!
                ports:
                - name: http
                  containerPort: 80
                  protocol: TCP  
            - chartname: oci://private-registry-redacted.com/helm-charts/nginx-ingress-controller
              name: nginx-ingress-controller
              namespace: nginx-ingress
              order: 2
              values: |
                image:
                  registry: private-registry-redacted.com
                  tag: 1.9.6-debian-12-r8
                  repository: registry/nginx-ingress-controller
                defaultBackend:
                  image:
                    registry: private-registry-redacted.com
                    tag: 1.24.0-debian-11-r5
                    repository: library/bitnami/nginx
              version: 10.7.0
            repositories:
            - name: metallb
              url: https://metallb.github.io/metallb
            - name: appscode
              url: https://charts.appscode.com/stable
        network:
          nodeLocalLoadBalancing:
            enabled: true
            type: EnvoyProxy
          provider: calico
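For context on how such a change should be picked up: the Chart status carries a `valuesHash` field, so the controller can detect changed values by hashing the raw values string and comparing against the stored hash. A stdlib-only sketch of that idea — SHA-256 is an assumption here (it matches the length of the hash in the status below, but k0s's exact scheme may differ):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// valuesHash sketches change detection for chart values: hash the raw
// values string and compare it with the previously stored hash.
// (SHA-256 is an assumption; the controller's exact scheme may differ.)
func valuesHash(values string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(values)))
}

func main() {
	before := "ingress:\n  enabled: true\n"
	after := "ingress:\n  enabled: true\n  ingressClassName: nginx\n"
	// Adding ingressClassName changes the hash, so an upgrade is needed.
	fmt.Println(valuesHash(before) != valuesHash(after)) // true
}
```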

Applying this config does not trigger the helm upgrade I would expect (since I added a value), and the Chart custom resource now looks like this:

apiVersion: helm.k0sproject.io/v1beta1
kind: Chart
metadata:
  annotations:
    k0s.k0sproject.io/last-applied-configuration: |
      {"apiVersion":"helm.k0sproject.io/v1beta1","kind":"Chart","metadata":{"finalizers":["helm.k0sproject.io/uninstall-helm-release"],"name":"k0s-addon-chart-echo-app","namespace":"kube-system"},"spec":{"chartName":"oci://private-registry-redacted.com/helm-charts/lib","namespace":"default","releaseName":"echo-app","timeout":"0s","values":"\nimage:\n  repository: library/whoami\n  registry: private-registry-redacted.com\n  tag: latest\ningress:\n  enabled: true\n  hostname: echo.poligon-lb1.redacted.com\n  tls: true\n  existingSecret: wildcard-poligon-lb1-redacted-com-cert\n  ingressClassName: nginx\nports:\n- name: http\n  containerPort: 80\n  protocol: TCP  \n","version":"0.1.0"}}
    k0s.k0sproject.io/stack-checksum: 49567ef87546fbe43dffc2f201b8db20
  creationTimestamp: "2024-03-12T23:20:21Z"
  finalizers:
  - helm.k0sproject.io/uninstall-helm-release
  generation: 4
  labels:
    k0s.k0sproject.io/stack: helm
  name: k0s-addon-chart-echo-app
  namespace: kube-system
  resourceVersion: "191084"
  uid: b7c950d7-4c42-4319-bb27-f3d72a25e68f
spec:
  chartName: oci://private-registry-redacted.com/helm-charts/lib
  namespace: default
  releaseName: echo-app
  timeout: 0s
  values: "\nimage:\n  repository: library/whoami\n  registry: private-registry-redacted.com\n
    \ tag: latest\ningress:\n  enabled: true\n  hostname: echo.poligon-lb1.redacted.com\n
    \ tls: true\n  existingSecret: wildcard-poligon-lb1-redacted-com-cert\n  ingressClassName:
    nginx\nports:\n- name: http\n  containerPort: 80\n  protocol: TCP  \n"
  version: 0.1.0
status:
  error: 'can''t install loadedChart `lib`: cannot re-use a name that is still in
    use'
  updated: 2024-03-13 08:32:08.958195386 -0700 PDT m=+683.794333516
  valuesHash: f4f589b227ece419b3e9207f90bacc688a83d613b31539fb8980808c7bc824ce
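The `cannot re-use a name that is still in use` error in the status means Helm's install path was taken for a release that already exists. The conventional guard (what `helm upgrade --install` applies) is to check for an existing release first and upgrade instead of installing. A stdlib-only sketch of that decision, with hypothetical `releases` and `installOrUpgrade` names standing in for Helm's storage and the controller logic:

```go
package main

import "fmt"

// releases is a hypothetical stand-in for Helm's release storage,
// mapping release name to revision.
type releases map[string]int

func (r releases) install(name string) error {
	if _, ok := r[name]; ok {
		// This is the failure mode from the status above.
		return fmt.Errorf("can't install: cannot re-use a name that is still in use")
	}
	r[name] = 1
	return nil
}

func (r releases) upgrade(name string) { r[name]++ }

// installOrUpgrade is the guard `helm upgrade --install` applies:
// upgrade when the release exists, install otherwise.
func installOrUpgrade(r releases, name string) {
	if _, ok := r[name]; ok {
		r.upgrade(name)
	} else {
		r[name] = 1
	}
}

func main() {
	r := releases{}
	fmt.Println(r.install("echo-app")) // <nil>
	fmt.Println(r.install("echo-app")) // errors: name still in use
	installOrUpgrade(r, "echo-app")    // succeeds as an upgrade instead
	fmt.Println(r["echo-app"])         // 2
}
```

If the controller re-creates the Chart object (the "created twice" symptom in the title), it can lose track of the existing release and fall into the bare-install path shown here.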

@twz123 (Member) commented Mar 27, 2024

Closing this, as #4064 has been merged and backported. Feel free to ping here or open another issue if the problem persists.

@twz123 twz123 closed this as completed Mar 27, 2024