Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.5] Fix the problem of handler registration failure caused by inconsistent context #35172

Merged
merged 1 commit into from Nov 10, 2021

Conversation

JacieChao
Copy link
Contributor

In Rancher HA mode, each replica will create an access control cache for each downstream cluster no matter it's the clusterOwner or not. In access control cache, we start a handler here to handle role revisions change when clusterrole/role changed at the downstream cluster.

For further investigation, I found some inconsistent context to start the access control cache.

  1. https://github.com/rancher/rancher/blob/release/v2.5/pkg/clustermanager/manager.go#L480
  2. https://github.com/rancher/rancher/blob/release/v2.5/pkg/clustermanager/manager.go#L457
  3. https://github.com/rancher/rancher/blob/release/v2.5/pkg/clustermanager/manager.go#L95

Sometimes it using context.Background() to new AccessControl, but sometimes it using ctx which inherited from wrangler context(This context is a HandlerTransaction context). When it using context.Background() to create an access control cache, the handler could start correctly, so that some replicas got the correct result. If it using ctx which inherited from wrangler context(ReigsterHandler type), the handler in access control cache will never start.

Then I found the HandlerTransaction context for steve controller.

  • When we called RegisterHandler here, if the ctx is HandlerTransaction type, will add handler function to todo list, if not, it will start handler function directly.
  • If the ctx is HandlerTransaction type, we can call Commit function to start all handlers here.

So if it using ctx which inherited from wrangler context and has already called Commit function when rancher started, all handlers register after that will never start.

We create a new HandlerTransaction here to start cluster controllers but not using this transaction to create access control. I think it's better to use the same context to register handlers to make sure all handlers can be started consistently.

#31982

@JacieChao JacieChao changed the title Fix the problem of handler registration failure caused by inconsistent context [2.5] Fix the problem of handler registration failure caused by inconsistent context Oct 19, 2021
@niusmallnan
Copy link
Contributor

@ryansann If the accesstore handlers in AccessControl can not be registered, the hash key of AccessControl cache will also be calculated incorrectly. Not sure if this issue is related to #32045 .

@ryansann
Copy link
Contributor

@niusmallnan good point- I'm reviewing and testing this so I'll also see if this change fixes #32045

ryansann
ryansann previously approved these changes Oct 19, 2021
Copy link
Contributor

@ryansann ryansann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deployed HA validated that this is a problem and then upgraded to a custom image that contains the fix and saw the correct behavior. I also was not able to reproduce #32045 with the new changes, so this looks to have fixed that issue as well. Approving, but we'll need one more review on this.

Copy link
Contributor

@rmweir rmweir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic looks fine, however this looks like it was done intentionally. This also is basically "framework related", so my approval is contingent on approval from @ibuildthecloud.

@ryansann ryansann self-requested a review November 9, 2021 23:55
Copy link
Contributor

@rmweir rmweir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants