Controller clearing status from Mig objects (possibly RBAC related) #275
Adding some background from my usage.
I did not do much of anything else; I left the velero Backup and Restore CRs present. I used the cluster to perform migrations several times (~5 migrations in one evening) and it worked fine. I manually deleted a few of the registry pods. I still had a working setup for migrations at this point: I was changing the name of the MigPlan and migrations were happening. The last successful migration was around 1:40 pm on 8/16. The first thing I saw was that the UI was having issues with "check connection", as if it couldn't talk to the backend. I attempted to walk through the wizard and got stuck at PV Discovery, as Erik mentioned. Grabbing some more info below:

$ oc describe clusterrole.rbac &> clusterrole.rbac.logs
Below is from a cluster I recently provisioned; I installed mig-operator ~10 minutes ago.

$ oc describe clusterrole.rbac &> clusterrole.rbac_0817cluster.logs
$ gist clusterrole.rbac_0817cluster.logs
This has been fixed by migtools/mig-operator#40, right?
I'm OK with closing for now; if we see it again, we can re-open.
A bunch of us have hit this issue over the last couple of days (@pranavgaikwad, myself, and just now @jwmatthews). It looks like the UI has locked up during PV discovery, validation, or "check connection" on clusters or storage. All of these operations rely on the status being updated by the controller, or on timing out. Digging into it further, the Mig objects either 1) never get an initial status, or 2) have their existing status wiped, so there is no status object on the Mig object any longer; which one you see just depends on when the issue strikes. After noticing the absence of the status, checking the controller pod's logs reveals the following RBAC error:
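To check whether you are hitting the same symptom, you can inspect the `.status` field on the Mig objects and grep the controller logs for RBAC denials. A rough sketch (the `mig` namespace and `migration-controller` deployment name are assumptions and may differ in your install):

```shell
# List each Mig object and whether it has any status at all;
# objects printed with "status: null" are the ones the controller
# either never reconciled or whose status was wiped.
oc get migcluster,migstorage,migplan -n mig -o json \
  | jq '.items[] | {kind: .kind, name: .metadata.name, status: .status}'

# Look for RBAC "forbidden" errors in the controller logs.
oc logs deployment/migration-controller -n mig | grep -i forbidden
```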
I just helped John debug his cluster, which had been up and working about 20 minutes before this appeared. Here are his controller logs: https://gist.github.com/fefcf7f47d8c9b7e1571b3165cbfa9dd
When I helped Pranav, I just gave his SA cluster-admin; with full permissions it was obviously able to do everything it needed to, so the problem went away.
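For reference, granting the controller's service account cluster-admin as a temporary workaround looks roughly like this (the `migration-controller` SA name and `mig` namespace are assumptions; substitute your own):

```shell
# Bind cluster-admin to the controller's service account.
# -z names a service account in the given namespace.
oc adm policy add-cluster-role-to-user cluster-admin \
  -z migration-controller -n mig
```

This is a blunt instrument for debugging only; it confirms the failure is RBAC-related but shouldn't be left in place once the role is fixed.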
The odd thing about this is that it seems to appear only after everything has been functioning fine for a while. If this were simply a misconfigured role, I would expect nothing to work from the initial deployment.