-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor csr approvals #9711
refactor csr approvals #9711
Conversation
def get_ready_nodes(module, oc_bin, oc_conf): | ||
'''Get list of nodes currently ready vi oc''' | ||
# plain output is way less to parse. | ||
command = "{} {} get nodes".format(oc_bin, oc_conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An overengineered version would be:
oc get nodes -o jsonpath="{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}"
to output smth like
ip-172-18-14-252.ec2.internal:OutOfDisk=False;MemoryPressure=False;DiskPressure=False;PIDPressure=False;Ready=True;
split it and filter out lines with Ready=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's incomprehensible and probably just as much splitting and filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to past discussions where David had stated that oc output is not guaranteed stable I think we should either get yaml output or a template like Vadim is suggesting. Perhaps not as complicated, but it still needs to be a format that we're sure of so if they change column ordering we're not busted.
3a08096
to
19d554d
Compare
76cedc7
to
45a59ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have run this in a few different iterations against a local cluster (removed certs of a node, restarted), though not on a fresh cluster. Hopefully CI passes :)
|
||
short_description: Retrieve, approve, and verify node client csrs | ||
|
||
version_added: "2.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.6
from ansible.module_utils.basic import AnsibleModule | ||
|
||
try: | ||
from json.decoder import JSONDecodeError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python2 vs python 3.4+
oc_csr_client: | ||
oc_bin: "/usr/bin/oc" | ||
oc_conf: "/etc/origin/master/admin.kubeconfig" | ||
node_list: "['node1.example.com', 'node2.example.com']" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove double quotes on this line.
roles/lib_openshift/library/t.json
Outdated
@@ -0,0 +1,6 @@ | |||
{"ANSIBLE_MODULE_ARGS": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
@@ -0,0 +1,288 @@ | |||
#!/usr/bin/env python | |||
# pylint: disable=missing-docstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line and add some docstrings :)
|
||
DOCUMENTATION = ''' | ||
--- | ||
module: oc_csr_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure what we should name it though. All the other oc modules actually map pretty much 1:1 with oc foo bar
cli interactions.
supports_check_mode=False, | ||
argument_spec=module_args | ||
) | ||
oc_bin = module.params['oc_bin'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stuff gets passed around a lot. Probably should refactor into a class object, this will help will reporting during failures.
a7cbcec
to
1a96157
Compare
c87511b
to
37f799f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like it's probably functionally correct, just some nits about module name and defaults.
- name: Approve node certificates when bootstrapping | ||
oc_csr_client: | ||
oc_bin: "{{ openshift_client_binary }}" | ||
oc_conf: "{{ openshift.common.config_base }}/master/admin.kubeconfig" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're naming this module oc_ I'd expect it to default this like all the other oc modules do.
when: | ||
- l_nodes_to_join|length > 0 | ||
|
||
- when: approve_out is failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we're sure this is 100% lets leave in debug log generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer necessary. No module provides actually useful failure information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sounds good
|
||
DOCUMENTATION = ''' | ||
--- | ||
module: oc_csr_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not sure what we should name it though. All the other oc modules actually map pretty much 1:1 with oc foo bar
cli interactions.
def get_ready_nodes(module, oc_bin, oc_conf): | ||
'''Get list of nodes currently ready vi oc''' | ||
# plain output is way less to parse. | ||
command = "{} {} get nodes".format(oc_bin, oc_conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to past discussions where David had stated that oc output is not guaranteed stable I think we should either get yaml output or a template like Vadim is suggesting. Perhaps not as complicated, but it still needs to be a format that we're sure of so if they change column ordering we're not busted.
f49176d
to
a9c01f0
Compare
Currently, csr approval process for nodes is quite fragile. This commit creates a new custom module oc_csr_approve which facilitates handling the multiple steps involved for approving pending node certificates. The module attempts to approve all 'client' csrs for any nodes provided via node_list, missing csrs are ignored as long as the missing node is in a 'Ready' status as reported by oc get nodes. Next, the module approves csrs for 'server' certificates. Similar to the client process, missing node csrs are acceptable as long as the node's api endpoint is reachable without error, indicating a server certificate is deployed. In cases of long delay between issuing a csr and csr approval, there may be several outstanding 'server' csrs. This module will approve any outstanding csrs. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1571515
a9c01f0
to
fef0430
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Worked perfectly here
/test install |
@michaelgugino: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
May require tweaking of the retries and time but we can figure that out later. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: michaelgugino, sdodson, vrutkovs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
gcp-crio was an e2e flake. |
/cherrypick release-3.10 |
@michaelgugino: #9711 failed to apply on top of branch "release-3.10":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Currently, csr approval process for nodes is quite
fragile.
This commit creates a new custom module oc_csr_client
which facilitates handling the multiple steps involved
for approving pending node certificates.
The module attempts to approve all 'client' csrs
for any nodes provided via node_list, missing csrs
are ignored as long as the missing node is in a
'Ready' status as reported by oc get nodes.
Next, the module approves csrs for 'server' certificates.
Similar to the client process, missing node csrs
are acceptable as long as the node's api endpoint
is reachable without error, indicating a server
certificate is deployed.
In cases of long delay between issuing a csr and
csr approval, there may be several outstanding
'server' csrs. This module will approve any
outstanding csrs.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1571515