UPSTREAM: 88120: add dynamic certificate reloading for kube aggregator #24607

Merged

Conversation

p0lyn0mial
Contributor

@p0lyn0mial p0lyn0mial commented Feb 27, 2020

@openshift-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) on Feb 27, 2020
@p0lyn0mial
Contributor Author

/hold

Please do not review this yet.
I want to test it first; I opened this PR to get CI feedback.

@openshift-ci-robot added the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) and vendor-update (Touching vendor dir or related files) labels on Feb 27, 2020
@p0lyn0mial
Contributor Author

/retest

@p0lyn0mial
Contributor Author

Alright, CI feedback was positive. What's the best way to test this? Create a cluster and leave it running for 24h?

@p0lyn0mial
Contributor Author

/assign @deads2k @sttts

if err != nil {
	panic(err)
}
discoveryClient := &http.Client{
Contributor Author

I think I have an idea (see the upstream PR) for changing this code so that we don't have to create a new HTTP client on every sync (every 30s).
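
One way to avoid rebuilding the client on every sync is to build it once and let the TLS layer ask for the current certificate at handshake time, via the standard library's `tls.Config.GetClientCertificate` hook. The following is a minimal sketch under that assumption, and is not necessarily what the upstream PR does. The names `certStore`, `newClient`, and `fakeCert` are hypothetical and exist only for illustration.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// certStore is a hypothetical holder for the most recently loaded client
// certificate. It is not a type from this PR or from Kubernetes.
type certStore struct {
	current atomic.Value // always holds a *tls.Certificate
}

// Set publishes a newly loaded certificate, for example from a periodic sync.
func (s *certStore) Set(c *tls.Certificate) { s.current.Store(c) }

// Get has the signature required by tls.Config.GetClientCertificate, so the
// TLS stack calls it whenever a server requests a client certificate.
func (s *certStore) Get(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
	c, ok := s.current.Load().(*tls.Certificate)
	if !ok || c == nil {
		return nil, errors.New("no client certificate loaded yet")
	}
	return c, nil
}

// newClient builds the HTTP client once; later handshakes use whatever
// certificate Set stored most recently.
func newClient(s *certStore) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{GetClientCertificate: s.Get},
		},
	}
}

// fakeCert returns a placeholder certificate (illustration only, not valid DER).
func fakeCert(name string) *tls.Certificate {
	return &tls.Certificate{Certificate: [][]byte{[]byte(name)}}
}

func main() {
	store := &certStore{}
	client := newClient(store) // built once, reused across syncs

	store.Set(fakeCert("first"))
	store.Set(fakeCert("rotated")) // simulates a certificate rotation

	cert, err := store.Get(nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client %T now presents %q\n", client, cert.Certificate[0])
}
```

Note that the hook only runs on new TLS handshakes: connections that are already established keep the certificate they negotiated, so a real implementation may also want to call `client.CloseIdleConnections()` after a rotation.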

@p0lyn0mial
Contributor Author

/retest

@mfojtik
Member

mfojtik commented Mar 2, 2020

/lgtm

@deads2k PTAL

@openshift-ci-robot added the lgtm (Indicates that a PR is ready to be merged.) and approved (Indicates a PR has been approved by an approver from all required OWNERS files.) labels on Mar 2, 2020
@p0lyn0mial
Contributor Author

Alright, so I managed to test it more or less.

Because I failed to create a cluster on GCP, I modified the operator to rotate the certificates more quickly and created a cluster on AWS.

I haven't found any suspicious entries in the logs at the time of certificate reloading except:

E0302 14:14:50.341544       1 available_controller.go:415] v1.packages.operators.coreos.com failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1.packages.operators.coreos.com": the object has been modified; please apply your changes to the latest version and try again
E0302 14:14:50.343971       1 available_controller.go:415] v1.packages.operators.coreos.com failed with: failing or missing response from https://10.128.0.107:5443/apis/packages.operators.coreos.com/v1: bad status from https://10.128.0.107:5443/apis/packages.operators.coreos.com/v1: 401
E0302 14:14:54.152041       1 available_controller.go:415] v1.packages.operators.coreos.com failed with: failing or missing response from https://10.129.0.110:5443/apis/packages.operators.coreos.com/v1: bad status from https://10.129.0.110:5443/apis/packages.operators.coreos.com/v1: 401
E0302 14:14:59.129673       1 available_controller.go:415] v1.packages.operators.coreos.com failed with: failing or missing response from https://10.129.0.110:5443/apis/packages.operators.coreos.com/v1: bad status from https://10.129.0.110:5443/apis/packages.operators.coreos.com/v1: 401
E0302 14:15:02.159001       1 available_controller.go:415] v1.packages.operators.coreos.com failed with: Operation cannot be fulfilled on apiservices.apiregistration.k8s.io "v1.packages.operators.coreos.com": the object has been modified; please apply your changes to the latest version and try again
E0302 14:15:12.415322       1 controller.go:114] loading OpenAPI spec for "v1.packages.operators.coreos.com" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: Error trying to reach service: 'x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "Red Hat, Inc.")', Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I0302 14:15:12.415362       1 controller.go:127] OpenAPI AggregationController: action for item v1.packages.operators.coreos.com: Rate Limited Requeue.

These seem to be related to #24607 (comment): in the worst case, the certificate is reloaded only after 30 seconds.

Additionally, @sttts suggested manually updating the secret (kubectl edit secrets -n openshift-kube-apiserver aggregator-client); that change was reflected in the logs.

@p0lyn0mial
Contributor Author

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold (Indicates that a PR should not merge because someone has issued a /hold command.) and lgtm (Indicates that a PR is ready to be merged.) labels on Mar 2, 2020
@p0lyn0mial
Contributor Author

/retest

if err := aggregatorProxyCerts.RunOnce(); err != nil {
	return nil, err
}
aggregatorProxyCerts.AddListener(apiserviceRegistrationController)
Contributor

Is the order correct? I would expect AddListener before RunOnce().

Contributor

Does it do the initial loading? Then it's fine, I guess.

Contributor Author

That is correct; we use the CurrentCertKeyContent function to load the certificate.
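
The ordering question in this thread can be illustrated with a small self-contained sketch. It assumes a provider whose `RunOnce` performs the initial load, whose listeners are notified only when the content later changes, and whose consumers read the current content through `CurrentCertKeyContent`. The `certProvider`, `listener`, and `countingListener` types below are hypothetical stand-ins written for this example, not the actual types touched by this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// listener is a hypothetical interface for anything that wants to be told
// when the certificate content changes.
type listener interface{ Enqueue() }

// certProvider is a hypothetical stand-in for the provider discussed above.
type certProvider struct {
	mu        sync.Mutex
	load      func() (cert, key []byte, err error) // e.g. reads files from disk
	cert, key []byte
	listeners []listener
}

// RunOnce loads the content and notifies the listeners registered so far,
// but only if the content actually changed.
func (p *certProvider) RunOnce() error {
	cert, key, err := p.load()
	if err != nil {
		return err
	}
	p.mu.Lock()
	changed := !bytes.Equal(cert, p.cert) || !bytes.Equal(key, p.key)
	p.cert, p.key = cert, key
	toNotify := append([]listener(nil), p.listeners...)
	p.mu.Unlock()
	if changed {
		for _, l := range toNotify {
			l.Enqueue()
		}
	}
	return nil
}

// AddListener registers a listener for future changes.
func (p *certProvider) AddListener(l listener) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.listeners = append(p.listeners, l)
}

// CurrentCertKeyContent returns whatever was loaded most recently.
func (p *certProvider) CurrentCertKeyContent() ([]byte, []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.cert, p.key
}

// countingListener counts how many change notifications it received.
type countingListener struct{ n int }

func (c *countingListener) Enqueue() { c.n++ }

func main() {
	p := &certProvider{load: func() ([]byte, []byte, error) {
		return []byte("cert-v1"), []byte("key-v1"), nil
	}}
	// Initial load first, listener second: the listener misses the first
	// notification but can still read the loaded content directly.
	if err := p.RunOnce(); err != nil {
		panic(err)
	}
	l := &countingListener{}
	p.AddListener(l)
	cert, _ := p.CurrentCertKeyContent()
	fmt.Printf("initial cert: %s, notifications so far: %d\n", cert, l.n)
}
```

Under these assumptions, calling `RunOnce` before `AddListener` is safe as long as the consumer reads the initial content itself rather than waiting for a notification.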

@sttts
Contributor

sttts commented Mar 3, 2020

/lgtm

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) on Mar 3, 2020
@p0lyn0mial
Contributor Author

/cherrypick release-4.4

@openshift-cherrypick-robot

@p0lyn0mial: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mfojtik, p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-cherrypick-robot

@p0lyn0mial: new pull request created: #24621

In response to this:

/cherrypick release-4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved (Indicates a PR has been approved by an approver from all required OWNERS files.), lgtm (Indicates that a PR is ready to be merged.), size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.), vendor-update (Touching vendor dir or related files)
8 participants