Controller clearing status from Mig objects (possibly RBAC related) #275

eriknelson · 2019-08-16T19:30:48Z

A bunch of us have hit this issue over the last couple of days (@pranavgaikwad, myself, and just now @jwmatthews). It looks like the UI has locked up during PV discovery, validation, or check connection on clusters or storage. All of these operations rely on the controller updating the status, or on a timeout. After digging into it more, the Mig objects either 1) never get an initial status, or 2) have their existing status wiped, so the object no longer has a status at all. Which one you see just depends on when the issue strikes. Once the status is missing, the controller pod logs show the following RBAC error:

E0816 19:01:01.845757       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to list *v1.Deployment: deployments.apps is forbidden: User "system:serviceaccount:mig:default" cannot list resource "deployments" in API group "apps" at the cluster scope
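
For anyone triaging a similar lockup, here is a minimal sketch for pulling the denied permissions out of the controller logs. The function name `forbidden_summary` is made up for illustration, and the pattern only assumes the message format shown in the line above.

```shell
# Summarize RBAC "forbidden" errors read from stdin.
# Prints one line per distinct tuple: <user> <verb> <resource> <api-group>
# (forbidden_summary is a hypothetical helper, not part of mig-controller.)
forbidden_summary() {
  sed -n 's/.*User "\([^"]*\)" cannot \([a-z]*\) resource "\([^"]*\)" in API group "\([^"]*\)".*/\1 \2 \3 \4/p' | sort -u
}

# Usage (the pod name is a placeholder):
#   oc logs -n mig <controller-pod> | forbidden_summary
```

Each tuple can then be checked directly, for example with `oc auth can-i list deployments.apps --as=system:serviceaccount:mig:default --all-namespaces`, which should answer `no` while the cluster is in the broken state.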

I just helped John debug his cluster, which had been up and working about 20 minutes before this appeared. Here are his controller logs: https://gist.github.com/fefcf7f47d8c9b7e1571b3165cbfa9dd

When I helped Pranav, I just gave his SA cluster-admin and obviously it was able to do everything it needed to do, so the problem went away.

The odd thing is that it seems to appear some time after everything has been functioning fine. If this were simply a misconfigured role, I would expect nothing to work from the initial deployment onward.


jwmatthews · 2019-08-17T11:54:14Z

Adding some background from my usage.
I deployed a cluster on Thursday evening 8/15 ~6pm.
I tested multiple migrations of the same mssql-persistent namespace.
After a migration I would:

- Delete the migmigrations
- Delete the migplan
- Delete the mssql-persistent namespace from destination
- Scale the app back up on source cluster

I did not do much else; I left the Velero Backup and Restore CRs in place.
I also was not closing the plans.

I used the cluster for several migrations that evening (~5), and they worked fine.
Friday morning 8/16, I saw some odd issues with a migration failing if I reused the same MigPlan name. Even though I had deleted the MigPlan/MigMigrations, sometimes I could reuse the same name and sometimes I couldn't. I did see that the registry pods were still present, as I hadn't finalized the plans.

I manually deleted a few of the registry pods.

I still had a working setup for migrations at this point; I was changing the name of the MigPlan and migrations were happening.

The last successful migration was around 1:40pm on 8/16.
I attempted to demo functionality at ~2:30pm 8/16.

The first thing I saw was that the UI was having issues with check connection, as if it couldn't talk to the backend.

I attempted to walk through the wizard and got stuck at PV Discovery, as Erik mentioned.

Grabbing some more info below:
https://gist.github.com/jwmatthews/2531d26af8475617c377fa71cf0d569f

$ oc describe clusterrole.rbac &> clusterrole.rbac.logs
https://gist.github.com/jwmatthews/0e2ab32682dfda0d8e4e48c1d4dc1d20

jwmatthews · 2019-08-18T14:08:37Z

Below is from a cluster I recently provisioned; I installed mig-operator ~10 minutes ago.
I'm about to do some migrations, so I'm grabbing a view of the clusterrole.rbac info in case we want to compare the initial state to a later state.

$ oc describe clusterrole.rbac &> clusterrole.rbac_0817cluster.logs

$ gist clusterrole.rbac_0817cluster.logs
https://gist.github.com/b96602cc1679838cd53e9a4277881ed9
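
To make that comparison later, a minimal sketch along these lines could work. The helper name `rbac_dump_diff` is hypothetical, and since the two dumps in this thread come from different clusters, some of the differences it shows may be unrelated to the bug.

```shell
# Compare two saved `oc describe clusterrole.rbac` dumps.
# $1 = dump taken right after install, $2 = dump taken later (or from a broken cluster).
# diff exits 1 when the files differ, so treat that as success and
# only fail on real errors such as a missing file (exit status 2).
rbac_dump_diff() {
  diff -u "$1" "$2" || [ $? -eq 1 ]
}

# Usage with the file names from this thread:
#   rbac_dump_diff clusterrole.rbac_0817cluster.logs clusterrole.rbac.logs
```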

jortel · 2019-08-23T15:36:13Z

This has been fixed by: migtools/mig-operator#40, right?

jwmatthews · 2019-08-23T16:13:03Z

I'm OK closing this for now; if we see it again, we can re-open.

jortel mentioned this issue Aug 23, 2019

Add apps ApiGroup to controller cluster-role migtools/mig-operator#40

Merged

jwmatthews closed this as completed Aug 23, 2019
