[2.5] Fix the problem of handler registration failure caused by inconsistent context #35172
@niusmallnan good point. I'm reviewing and testing this, so I'll also see if this change fixes #32045.
I deployed HA, validated that this is a problem, then upgraded to a custom image that contains the fix and saw the correct behavior. I was also not able to reproduce #32045 with the new changes, so this looks to have fixed that issue as well. Approving, but we'll need one more review on this.
The logic looks fine; however, this looks like it was done intentionally. This is also basically "framework related", so my approval is contingent on approval from @ibuildthecloud.
… by inconsistent context (eb63fa6 to 643d392)
LGTM
In Rancher HA mode, each replica creates an access control cache for each downstream cluster, whether or not it is the clusterOwner. In the access control cache, we start a handler here to handle role revision changes when a clusterrole/role changes in the downstream cluster.
On further investigation, I found that the access control cache is started with inconsistent contexts.
Sometimes it uses context.Background() to create the new AccessControl, but sometimes it uses a ctx inherited from the wrangler context (this context is a HandlerTransaction context). When it uses context.Background() to create an access control cache, the handler starts correctly, which is why some replicas get the correct result. If it uses a ctx inherited from the wrangler context (RegisterHandler type), the handler in the access control cache never starts.
Then I found the HandlerTransaction context for the steve controller.
So if it uses a ctx inherited from the wrangler context, and Commit has already been called on that context when Rancher started, all handlers registered after that will never start.
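The failure mode described above can be sketched with a simplified model. This is not the actual HandlerTransaction code: the `transaction` type, its fields, and the handler names are hypothetical, and the only behavior assumed is the one stated in this description, namely that handlers registered after Commit are never started.

```go
package main

import "fmt"

// transaction is a hypothetical stand-in for a HandlerTransaction: handlers
// queue up until Commit, and Commit starts whatever is queued at that moment.
type transaction struct {
	committed bool
	pending   []string
	started   []string
}

// Register queues a handler. In this model, a handler registered after
// Commit is recorded nowhere, so it is never started.
func (t *transaction) Register(name string) {
	if t.committed {
		return // too late: nothing will ever start this handler
	}
	t.pending = append(t.pending, name)
}

// Commit starts every handler queued so far, exactly once.
func (t *transaction) Commit() {
	t.committed = true
	t.started = append(t.started, t.pending...)
	t.pending = nil
}

func main() {
	tx := &transaction{}
	tx.Register("cluster-controller")
	tx.Commit()                         // happens once, at startup
	tx.Register("access-control-cache") // registered after Commit
	fmt.Println(tx.started)             // only the early handler is listed
}
```

In this model the late registration does not fail loudly, which would match the symptom here: the handler simply never runs on the affected replicas.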
We create a new HandlerTransaction here to start the cluster controllers, but do not use this transaction to create the access control. I think it is better to use the same context to register handlers, so that all handlers start consistently.
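A minimal sketch of why the choice of context matters, assuming the transaction is carried as a context value and looked up at registration time. Every name here (`txKey`, `newTransaction`, `addHandler`, `run`) is hypothetical and chosen for illustration; the real registration path in Rancher may differ.

```go
package main

import (
	"context"
	"fmt"
)

type txKey struct{}

// transaction is a hypothetical model of a HandlerTransaction.
type transaction struct {
	committed bool
	pending   []func()
}

// newTransaction returns a context carrying a fresh transaction.
func newTransaction(parent context.Context) (context.Context, *transaction) {
	tx := &transaction{}
	return context.WithValue(parent, txKey{}, tx), tx
}

// Commit starts every handler queued on this transaction.
func (t *transaction) Commit() {
	t.committed = true
	for _, start := range t.pending {
		start()
	}
	t.pending = nil
}

// addHandler starts the handler at once when ctx carries no transaction,
// queues it when the transaction is still open, and (in this model)
// silently drops it when the transaction was already committed.
func addHandler(ctx context.Context, start func()) {
	tx, ok := ctx.Value(txKey{}).(*transaction)
	switch {
	case !ok:
		start()
	case !tx.committed:
		tx.pending = append(tx.pending, start)
	}
}

// run registers handlers through three different contexts and returns the
// names of the handlers that actually started, in start order.
func run() []string {
	var started []string
	record := func(name string) func() {
		return func() { started = append(started, name) }
	}

	// Transaction committed at startup; its context is reused later.
	startupCtx, startupTx := newTransaction(context.Background())
	startupTx.Commit()
	addHandler(startupCtx, record("access-control (startup ctx)"))

	// context.Background() carries no transaction, so the handler starts.
	addHandler(context.Background(), record("access-control (background)"))

	// Proposed approach: register on the same new transaction as the
	// cluster controllers, so everything starts together on Commit.
	clusterCtx, clusterTx := newTransaction(context.Background())
	addHandler(clusterCtx, record("cluster-controller"))
	addHandler(clusterCtx, record("access-control (cluster tx)"))
	clusterTx.Commit()

	return started
}

func main() {
	for _, name := range run() {
		fmt.Println(name)
	}
}
```

Under these assumptions, the handler registered through the already-committed startup context is the only one that never starts, while the one registered on the cluster transaction starts together with the cluster controllers.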
#31982