Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Move webhook setup before manager start #1087

Merged
merged 1 commit into from
Sep 9, 2020

Conversation

yiqigao217
Copy link
Contributor

According to
kubernetes-sigs/controller-runtime#1148,
there's a race condition between manager start and webhook register.
This commit moved webhook setup before manager start.

Tested on GKE cluster. The nil pointer error was not detected for 20
times.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 9, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 9, 2020
@yiqigao217
Copy link
Contributor Author

/assign @adrianludwin
/assign @rjbez17

Copy link
Contributor

@adrianludwin adrianludwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with minor nits. You don't need to test another 20 times after making my suggested changes, once will be fine :)

incubator/hnc/cmd/manager/main.go Show resolved Hide resolved
incubator/hnc/cmd/manager/main.go Outdated Show resolved Hide resolved
incubator/hnc/cmd/manager/main.go Show resolved Hide resolved
According to
kubernetes-sigs/controller-runtime#1148,
there's a race condition between manager start and webhook register.
This commit moved webhook setup before manager start.

Tested on GKE cluster. The nil pointer error was not detected for 20
times.
@yiqigao217
Copy link
Contributor Author

lgtm with minor nits. You don't need to test another 20 times after making my suggested changes, once will be fine :)

Updated. BTW tested 20 times because last time it took about 20 times to reproduce the race condition.

Copy link
Contributor

@adrianludwin adrianludwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@rjbez17 this is a fairly straightforward fix, lmk if you have any concerns about it but I'm just approving for now.

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianludwin, yiqigao217

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2020
@yiqigao217
Copy link
Contributor Author

/retest

looking the object reconciler issue now.

@k8s-ci-robot k8s-ci-robot merged commit ec7fbb5 into kubernetes-retired:master Sep 9, 2020
adrianludwin added a commit to adrianludwin/multi-tenancy that referenced this pull request Sep 21, 2020
This essentially reverts kubernetes-retired#1087, which breaks HNC on new clusters that
haven't previously had HNC installed. It fixes the nondeterministic
crashing problem by patching in
kubernetes-sigs/controller-runtime#1155, which
has been applied to controller-runtime 0.6.3 in @adrianludwin's repo.
This is a temporary hack and will be removed when controller-runtime
releases its own fix - likely 0.6.4.

Tested: with the reversion of kubernetes-retired#1087 (main.go), HNC can be installed on a
fresh cluster again but fails to start up ~50% of the time. With the fix
to controller-runtime, it passes on 20/20 startup attempts. Ran e2e
tests and got the same result as without this change (four failures).
adrianludwin added a commit to adrianludwin/multi-tenancy that referenced this pull request Sep 21, 2020
This essentially reverts kubernetes-retired#1087, which breaks HNC on new clusters that
haven't previously had HNC installed. It fixes the nondeterministic
crashing problem by patching in
kubernetes-sigs/controller-runtime#1155, which
has been applied to controller-runtime 0.6.3 in adrianludwin's repo.
This is a temporary hack and will be removed when controller-runtime
releases its own fix - likely 0.6.4.

Tested: with the reversion of kubernetes-retired#1087 (main.go), HNC can be installed on a
fresh cluster again but fails to start up ~50% of the time. With the fix
to controller-runtime, it passes on 20/20 startup attempts. Ran e2e
tests and got the same result as without this change (four failures).
@adrianludwin adrianludwin added this to the hnc-v0.6 milestone Sep 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants