The Cluster object stays in failed state after a credentials problem is resolved.
failureMessage: |
Failure detected from referenced resource infrastructure.cluster.x-k8s.io/v1beta1, Kind=OpenStackCluster with name "rotate-creds-cnrwq": failed to reconcile external network: failed to get external network: Unable to re-authenticate: Expected HTTP response code [200] when accessing [GET https://url/v2.0/networks/54258498-a513-47da-9369-1a644e4be692], but got 401 instead
{"error": {"code": 401, "title": "Unauthorized", "message": "The request you have made requires authentication."}}: Resource not found: [POST https://url/v3/auth/tokens], error message: {"error":{"code":404,"message":"Could not find Application Credential: OLD ID.","title":"Not Found"}}
What steps did you take and what happened:
Delete the application credentials in use and wait for the failure to occur
Create new application credentials and update all resources
Verify that all actions are possible again, both on the management cluster and in the workload cluster
What did you expect to happen:
The Cluster returns to a healthy state once the credentials are fixed
Anything else you would like to add:
It looks like credential errors are handled as terminal errors rather than transient errors.
This seems similar to a problem we had with Machines.
Just for reference, and to help anyone with a broken cluster:
A manual temporary fix is kubectl patch cluster CLUSTERNAME --subresource=status --type='merge' -p'{"status":{"failureReason": null, "failureMessage": null}}'
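For anyone wondering why the null values matter: with --type='merge' the payload is a JSON merge patch (RFC 7386), where a null value removes the key instead of storing a null. Below is a minimal Go sketch of that rule, run against a made-up status object. The helper names applyMergePatch and clearFailure and the sample field values are my own illustration, not code from kubectl or Cluster API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// failurePatch is the payload from the kubectl command above.
const failurePatch = `{"status":{"failureReason": null, "failureMessage": null}}`

// applyMergePatch applies an RFC 7386 JSON merge patch to target.
// A null value in the patch deletes the key; nested objects merge recursively.
// Hypothetical helper for illustration only.
func applyMergePatch(target, patch map[string]any) map[string]any {
	if target == nil {
		target = map[string]any{}
	}
	for key, value := range patch {
		if value == nil {
			delete(target, key) // null means "remove this field"
			continue
		}
		if patchObj, ok := value.(map[string]any); ok {
			// A non-object target value is treated as an empty object.
			existing, _ := target[key].(map[string]any)
			target[key] = applyMergePatch(existing, patchObj)
			continue
		}
		target[key] = value
	}
	return target
}

// clearFailure applies failurePatch to a JSON document and returns the result.
func clearFailure(clusterJSON string) (string, error) {
	var cluster, patch map[string]any
	if err := json.Unmarshal([]byte(clusterJSON), &cluster); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(failurePatch), &patch); err != nil {
		return "", err
	}
	out, err := json.Marshal(applyMergePatch(cluster, patch))
	return string(out), err
}

func main() {
	// Made-up status object; the values are placeholders.
	before := `{"status":{"infrastructureReady":true,"failureReason":"ExampleReason","failureMessage":"example message"}}`
	after, err := clearFailure(before)
	if err != nil {
		panic(err)
	}
	fmt.Println(after)
}
```

Both failure fields are dropped and the rest of the status is left alone, which matches what the patch does to the real object.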
I noticed yesterday that the same behavior seems to occur if the management cluster has DNS problems. I had ~10 clusters in the Failed state after one of our DNS servers experienced problems (the failureMessage indicated a failed DNS lookup).
I have not had time to dig deeper, but I can confirm that the patch above works just fine :)
/kind bug
Code that handles the Machine problem, as an idea for how to fix this: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/7a19fb6f038[…]a31df228877a446b9b99/controllers/openstackmachine_controller.go
Code in the OpenStack cluster controller with the problem: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/7a19fb6f038[…]a31df228877a446b9b99/controllers/openstackcluster_controller.go
Environment:
Cluster API Provider OpenStack version (or git rev-parse HEAD if manually built): v010.3