
Unable to update workloads in v1.9.10 which were deployed in <=v1.9.9 due to resourcetracker controller guid #6493

Closed
Kolossi opened this issue Mar 27, 2024 · 3 comments · Fixed by #6507


Kolossi (Contributor) commented Mar 27, 2024

Describe the bug

I upgraded two separate clusters, one AKS and one GKE, from v1.9.9 to v1.9.10 using "helm upgrade --install ....".

"vela up --wait ..." deploys of new builds of existing workloads exhibit the same behaviour in both clusters - vela up hangs (aborted it after an hour!).

To Reproduce

  • On a cluster with v1.9.9 installed via helm, deploy (using vela-cli v1.9.9) an Application, in my case a webservice component, with vela up.
  • Everything works OK.
  • Upgrade the cluster to v1.9.10 with "helm upgrade --install".
  • Deploy (using vela-cli v1.9.10) a new version of the application (e.g. with a newer container image tag).
  • "vela up" hangs - abort with Ctrl-C.
  • Running "vela status" reports, e.g.:
$ vela status -n my-ns my-app
About:

  Name:         my-app
  Namespace:    my-ns
  Created at:   2024-03-27 11:32:59 +0000 GMT
  Status:
$
  • Trying to delete the application with "vela delete" reports that the application does not exist, despite "kubectl get application ..." succeeding.
  • An error is visible on the Application resource:
$ kubectl get application -n my-ns my-app
...
Status:
  Conditions:
    Last Transition Time:  2024-03-27T12:38:37Z
    Message:               failed to create resourceKeeper: failed to load resourcetrackers: resourcetracker my-app-v25-my-ns exists but controlled by another application (uid: 161a9675-33f1-4360-81e9-3580142bf26a), this could probably be cased by some mistakes while garbage collecting outdated resource. Please check this resourcetrakcer and delete it manually
    Reason:                ReconcileError
    Status:                False
    Type:                  Synced
...
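The UID in that message can be checked against the live Application to confirm the mismatch (a sketch; the resourcetracker name my-app-v25-my-ns is taken from the error above, and resourcetrackers are cluster-scoped):

$ kubectl get application -n my-ns my-app -o jsonpath='{.metadata.uid}'
$ kubectl get resourcetracker my-app-v25-my-ns -o yaml | grep -i uid

If the two UIDs differ, the controller refuses to reconcile, which would explain the hang seen from vela up.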

Expected behavior
vela up should proceed as normal, without resourcetracker controller issues, after upgrading to KubeVela v1.9.10.

KubeVela Version

Problem occurs with both the controller and the CLI at v1.9.10.
Reverting to v1.9.9 (plus resourcetracker cleanup) fixed the issue.

Cluster information

GKE cluster - Kubernetes v1.27.7-gke.1121002
AKS cluster - Kubernetes v1.28.5

Additional context
I've had to revert both clusters to v1.9.9 in order to support the business. I don't have a test cluster to perform further investigation.

In one case the resourcetracker was still hanging around after deleting the application with "kubectl delete application ..."; as above, "vela delete ..." could not find the app.

Deleting the resourcetracker also hung, presumably due to its finalizer. I did "kubectl edit resourcetracker ..." and changed it to finalizers: [], after which the resourcetracker was deleted. Without this, deploys did not succeed even after going back to v1.9.9.
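The same finalizer removal can be scripted instead of using kubectl edit (a sketch; the resourcetracker name is the one from the error above, and a JSON merge patch replaces the finalizers list wholesale):

$ kubectl patch resourcetracker my-app-v25-my-ns --type=merge -p '{"metadata":{"finalizers":[]}}'
$ kubectl delete resourcetracker my-app-v25-my-ns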

Once this was done, everything was back to working fine under v1.9.9.
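For completeness, the revert itself is just the same helm command pinned to the previous chart version (same assumptions about chart and release names as the sketch above):

$ helm upgrade --install kubevela kubevela/vela-core -n vela-system --version 1.9.9 --wait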

FogDong (Member) commented Mar 27, 2024

I believe it's caused by the pkg upgrade. I saw these error logs in the controller:

 "Could not capability in ConfigMap" err="cannot create or update capability replication in ConfigMap: the server could not find the requested resource (post configmaps)"

It seems related to the client change in pkg; after I reverted the pkg upgrade commit, the controller works.

Might be related to kubevela/pkg#69 or kubevela/pkg#83.

wonderflow (Collaborator) commented

Here I don't think we need to revert the pkg; it's the same for users to just use 1.9.9 directly.

ping @Somefive, could you please help look into it to see if there is a cheap fix? Thanks.

wonderflow pinned this issue Apr 11, 2024
Kolossi (Contributor, Author) commented Apr 17, 2024

> Here I don't think we need to revert the pkg; it's the same for users to just use 1.9.9 directly.

It's not quite the same though @wonderflow @Somefive, as v1.9.10 includes my PR #6477 allowing service reuse and a different service/container port on the gateway - which I could really do with :-)
