System application cilium stops upgrading because another operation appears to be in progress
#12846
Labels
kind/bug
Categorizes issue or PR as related to a bug.
sig/app-management
Denotes a PR or issue as being assigned to SIG App Management.
sig/networking
Denotes a PR or issue as being assigned to SIG Networking.
What happened?
On our captain environment, a majority of user clusters that use the cilium system application as CNI are failing to upgrade the Cilium Helm chart to deploy any updates (e.g. when trying to update from 1.13.3 to 1.13.8). The user-cluster-controller-manager logs:
Helm shows that the release is in "pending-upgrade" state and has a very high release revision:
I'm not sure what creates this condition (it seems to happen to all clusters sooner or later), but my suspicion is that when the user-cluster-controller-manager gets terminated, it can leave the Helm release stuck in this state, and it won't try to pick the release up again after restart. Since we are constantly running release upgrades (see #12095), the chance of a termination hitting one of those upgrades is quite high.
Expected behavior
KKP does not stop reconciling the CNI system application.
How to reproduce the issue?
Unclear, probably install latest KKP, create a user cluster with Cilium as CNI and run it through a couple of Kubernetes upgrades for both the underlying seed cluster and the user cluster.
How is your environment configured?
Provide your KKP manifest here (if applicable)
# paste manifest here
What cloud provider are you running on?
N/A
What operating system are you running in your user cluster?
N/A
Additional information
Workaround
To unblock, it is possible to delete the Helm release secrets from the kube-system namespace. The user-cluster-controller-manager will then re-deploy after some time.
To find the release secrets, run the following command, then delete the secrets it lists:
$ kubectl -n kube-system get secrets | grep sh.helm.release.v1.kube-system-cilium
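The workaround above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the namespace and secret name prefix are copied from the command above, and the label selector assumes the labels that Helm 3 puts on its release secrets (owner=helm, status=...). It deletes every stored revision of the release, as the workaround describes, so review the listing first.

```shell
#!/bin/sh
# Sketch only: namespace and secret name prefix are taken from the issue text.
NAMESPACE="kube-system"
PREFIX="sh.helm.release.v1.kube-system-cilium"

# Read secret names from stdin and print only those that start with PREFIX.
# The prefix is compared literally, so its dots do not act as wildcards
# (unlike the unescaped dots in the grep pattern).
filter_release_secrets() {
  while IFS= read -r name; do
    name="${name#secret/}"   # strip the "secret/" prefix that "-o name" adds
    case "$name" in
      "$PREFIX"*) printf '%s\n' "$name" ;;
    esac
  done
}

# List all Cilium release secrets (needs kubectl access to the user cluster).
list_release_secrets() {
  kubectl -n "$NAMESPACE" get secrets -o name | filter_release_secrets
}

# Optional inspection: list Helm release secrets marked as pending-upgrade.
# Assumes Helm 3's default secret labels; this covers all releases in the namespace.
list_pending_upgrade_secrets() {
  kubectl -n "$NAMESPACE" get secrets -l 'owner=helm,status=pending-upgrade' -o name
}

# Delete the matching secrets, one at a time.
delete_release_secrets() {
  list_release_secrets | while IFS= read -r name; do
    kubectl -n "$NAMESPACE" delete secret "$name"
  done
}
```

After sourcing the file, run list_release_secrets to check what would be removed, then delete_release_secrets to unblock the release.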