flake: Integration test reports 403 on /healthz, possibly due to failure to register or report API group available #15648

Closed
smarterclayton opened this issue Aug 6, 2017 · 6 comments · Fixed by #15654
Labels
component/auth kind/test-flake priority/P1

Comments

@smarterclayton
Contributor

smarterclayton commented Aug 6, 2017

Master integration test failed because the master never went healthy (/healthz returned 403). This looks like a failure to initialize some or all of the roles, or an inability to check permissions.

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_integration/5489/testReport/junit/github/com_openshift_origin_test_integration_runner/TestOAuthOIDC/

Attached output from the JUnit file: junit.zip

@smarterclayton smarterclayton added component/auth kind/test-flake labels Aug 6, 2017
@smarterclayton smarterclayton changed the title Integration test fails due to not all roles/bindings being created flake: Integration test fails due to not all roles/bindings being created Aug 6, 2017
@smarterclayton
Contributor Author

Fairly sure this is not a test flake, but a real problem with the server.

@smarterclayton
Contributor Author

smarterclayton commented Aug 6, 2017

Saw a 403 on /healthz a second time in the next run (so 2 out of 200 integration test runs). In the second run I saw:

    I0806 18:31:24.082581   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1.authentication.k8s.io: (753.664µs) 409 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:54696]
    I0806 18:31:24.082876   32400 handler.go:150] kube-aggregator: PUT "/apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta1.authentication.k8s.io/status" satisfied by gorestful with webservice /apis/apiregistration.k8s.io/v1beta1
    E0806 18:31:24.083933   32400 autoregister_controller.go:167] v1.authentication.k8s.io failed with : Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1.authentication.k8s.io": the object has been modified; please apply your changes to the latest version and try again

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_integration/5491/

The larger snippet is:

    I0806 18:31:24.077326   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1.authentication.k8s.io/status: (5.385278ms) 200 [[integra
    I0806 18:31:24.077689   32400 wrap.go:42] GET /api/v1/namespaces/default/resourcequotas: (22.40711ms) 200 [[integration.test/v1.7.0+695f48a16f (linux/amd6
    I0806 18:31:24.078556   32400 wrap.go:42] GET /oapi/v1/clusterroles/system:openshift:controller:deployer-controller: (21.027588ms) 404 [[integration.test/
    I0806 18:31:24.080064   32400 apiservice_controller.go:164] Adding v1.user.openshift.io
    I0806 18:31:24.080077   32400 apiservice_controller.go:170] Updating v1.authentication.k8s.io
    I0806 18:31:24.080088   32400 available_controller.go:310] Adding v1.user.openshift.io
    I0806 18:31:24.080099   32400 available_controller.go:316] Updating v1.authentication.k8s.io
    I0806 18:31:24.082581   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1.authentication.k8s.io: (753.664µs) 409 [[integration.tes
    E0806 18:31:24.083933   32400 autoregister_controller.go:167] v1.authentication.k8s.io failed with : Operation cannot be fulfilled on apiservices.apiregis
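The 409 above is the API server's optimistic concurrency check. Every object carries a resourceVersion, and an update based on a stale version is rejected with "the object has been modified". In the log, a status PUT for the same APIService succeeds just before the autoregister controller's PUT fails, which fits this pattern. Below is a minimal in-memory sketch of the check. The `store` and `object` types are hypothetical simplifications and are not the apiserver's storage layer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errConflict plays the role of a 409 Conflict response.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// object is a hypothetical stand-in for an APIService: a payload plus a version.
type object struct {
	ResourceVersion int
	Available       bool
}

// store is a toy store with compare-and-swap semantics on ResourceVersion.
type store struct {
	mu   sync.Mutex
	objs map[string]object
}

func (s *store) Get(name string) object {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.objs[name]
}

// Update succeeds only if the caller's ResourceVersion matches the stored one;
// a successful write bumps the version.
func (s *store) Update(name string, o object) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.objs[name].ResourceVersion != o.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.objs[name] = o
	return nil
}

func main() {
	s := &store{objs: map[string]object{"v1.authentication.k8s.io": {ResourceVersion: 1}}}

	// Two controllers read the same version...
	a := s.Get("v1.authentication.k8s.io")
	b := s.Get("v1.authentication.k8s.io")

	// ...the first write wins, and the second is rejected as stale.
	a.Available = true
	fmt.Println(s.Update("v1.authentication.k8s.io", a))
	fmt.Println(s.Update("v1.authentication.k8s.io", b))
}
```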

Second JUnit output: junit2.zip

@smarterclayton smarterclayton changed the title flake: Integration test fails due to not all roles/bindings being created flake: Integration test reports 403 on /healthz, possibly due to failure to register or report API group available Aug 6, 2017
@smarterclayton
Contributor Author

In the second run, it looks like the subsequent PUTs may succeed (not positive):

    I0806 18:31:24.082581   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1.authentication.k8s.io: (753.664µs) 409 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:54696]
    I0806 18:31:24.098959   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta1.authentication.k8s.io: (8.871418ms) 409 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:54696]
    I0806 18:31:24.216312   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1.authentication.k8s.io: (41.469709ms) 200 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:54696]
    I0806 18:31:24.229597   32400 wrap.go:42] PUT /apis/apiregistration.k8s.io/v1beta1/apiservices/v1beta1.authentication.k8s.io: (29.416326ms) 200 [[integration.test/v1.7.0+695f48a16f (linux/amd64) kubernetes/d2e5420] 127.0.0.1:54696]

@smarterclayton
Contributor Author

In run 1 I see a 500 being returned by /healthz:

    poststarthook/ca-registration failed: reason withheld
    autoregister-completion failed: reason withheld
    poststarthook/authorization.openshift.io-ensureopenshift-infra failed: reason withheld

@smarterclayton
Copy link
Contributor Author

It happened again (3 / 4 runs):

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_integration/5497/

Haven't seen any other flakes besides this one. There are similar 409s on registration, but nothing related to /healthz.

@smarterclayton
Contributor Author

Ran two healthy runs, so it looks like roughly 1 in 200.

openshift-merge-robot added a commit that referenced this issue Aug 8, 2017
Automatic merge from submit-queue

reconcile cluster roles instead of overwriting

fixes #15648

Moving to post-start hooks introduced a policy creation race. This pull fixes the race by unconditionally reconciling, as we will in 3.7 when we switch to RBAC. I also made cluster role reconciliation respect the annotation we use to protect resources from reconciliation.

@openshift/security fyi
2 participants