
Unable to update workloads in v1.9.10 which were deployed in <=v1.9.9 due to resourcetracker controller guid #6493

Closed
Kolossi opened this issue Mar 27, 2024 · 3 comments · Fixed by #6507


Kolossi (Contributor) commented Mar 27, 2024

Describe the bug

I upgraded two separate clusters, one AKS and one GKE, from v1.9.9 to v1.9.10 using "helm upgrade --install ....".

"vela up --wait ..." deploys of new builds of existing workloads exhibit the same behaviour in both clusters - vela up hangs (aborted it after an hour!).

To Reproduce

  • On a cluster with v1.9.9 installed via helm, deploy (using vela-cli v1.9.9) an Application, in my case a webservice component, with vela up.
  • Everything works OK.
  • Upgrade the cluster to v1.9.10 with "helm upgrade --install".
  • Deploy (using vela-cli v1.9.10) a new version of the application (e.g. with a newer container image tag).
  • "vela up" hangs - abort with Ctrl-C.
  • Running "vela status" reports, e.g.:
$ vela status -n my-ns my-app
About:

  Name:         my-app
  Namespace:    my-ns
  Created at:   2024-03-27 11:32:59 +0000 GMT
  Status:
$
  • Trying to delete the application with "vela delete" reports that the application does not exist, despite "kubectl get application ..." succeeding.
  • An error is visible on the Application resource:
$ kubectl get application -n my-ns my-app
...
Status:
  Conditions:
    Last Transition Time:  2024-03-27T12:38:37Z
    Message:               failed to create resourceKeeper: failed to load resourcetrackers: resourcetracker my-app-v25-my-ns exists but controlled by another application (uid: 161a9675-33f1-4360-81e9-3580142bf26a), this could probably be cased by some mistakes while garbage collecting outdated resource. Please check this resourcetrakcer and delete it manually
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced
...
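The UID in that message can be checked against the live Application to confirm the mismatch (a sketch; the resourcetracker name my-app-v25-my-ns is taken from the error above, and resourcetrackers are cluster-scoped):

$ kubectl get application -n my-ns my-app -o jsonpath='{.metadata.uid}'
$ kubectl get resourcetracker my-app-v25-my-ns -o yaml | grep -i uid

If the two UIDs differ, the controller refuses to reconcile, which would explain the hang seen from vela up.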

Expected behavior
vela up should proceed as normal, without resourcetracker controller issues, after upgrading to KubeVela v1.9.10.

KubeVela Version

Problem occurs with both the controller and the CLI at v1.9.10.
Reverting to v1.9.9 (plus resourcetracker cleanup) fixed the issue.

Cluster information

GKE cluster - Kubernetes v1.27.7-gke.1121002
AKS cluster - Kubernetes v1.28.5

Additional context
I've had to revert both clusters to v1.9.9 in order to support the business. I don't have a test cluster to perform further investigation.

In one case the resourcetracker was still hanging around after deleting the application with "kubectl delete application ..."; as above, "vela delete ..." could not find the app.

Deleting the resourcetracker also hung, presumably due to its finalizer. I did "kubectl edit resourcetracker ..." and changed it to finalizers: [], after which the resourcetracker was deleted. Without this, deploys did not succeed even after going back to v1.9.9.
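The same finalizer removal can be scripted instead of using kubectl edit (a sketch; the resourcetracker name is the one from the error above, and a JSON merge patch replaces the finalizers list wholesale):

$ kubectl patch resourcetracker my-app-v25-my-ns --type=merge -p '{"metadata":{"finalizers":[]}}'
$ kubectl delete resourcetracker my-app-v25-my-ns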

Once this was done, everything was back to working fine under v1.9.9.
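For completeness, the revert itself is just the same helm command pinned to the previous chart version (same assumptions about chart and release names as the sketch above):

$ helm upgrade --install kubevela kubevela/vela-core -n vela-system --version 1.9.9 --wait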

FogDong (Member) commented Mar 27, 2024

I believe it's caused by the pkg upgrade. I saw these error logs in the controller:

 "Could not capability in ConfigMap" err="cannot create or update capability replication in ConfigMap: the server could not find the requested resource (post configmaps)"

It seems related to the client change in pkg; after I reverted the pkg upgrade commit, the controller works.

Might be related to kubevela/pkg#69 or kubevela/pkg#83.

wonderflow (Collaborator) commented

Here I don't think we need to revert the pkg; it's the same for users to just use 1.9.9 directly.

ping @Somefive, could you please help look into it to see if there is a cheap fix? Thanks.

wonderflow pinned this issue Apr 11, 2024
Kolossi (Contributor, Author) commented Apr 17, 2024

> Here I don't think we need to revert the pkg; it's the same for users to just use 1.9.9 directly.

It's not quite the same though @wonderflow @Somefive, as v1.9.10 includes my PR #6477 allowing service reuse and a different service/container port on the gateway - which I could really do with :-)
