Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO-605: MCO-550: Remove Certificates from MachineConfig #3787

Merged
merged 1 commit into from Aug 16, 2023

Conversation

cdoern
Copy link
Contributor

@cdoern cdoern commented Jul 10, 2023

- What I did

In order to remove certificates from MC, the MCS needs to read the controller config both in the bootstrap server and the cluster server. Once this is in place, removing the certs from machine configs consists of ensuring the rendered configs do not contain the kube data to write into the files of the machine config. Accomplish this by deleting cert templates

- How to verify it

oc describe mc/xxxx the raw config/storage/files should no longer contain the following certs:

/etc/kubernetes/kubelet-ca.crt
/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt

- Description for the changelog

ensure the MCS properly writes certificate data directly to ignition while also ensuring that all MachineConfigs do not have certificates in them.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 10, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 10, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2023
@cdoern cdoern changed the title Remove Certificates from MachineConfig MCO-605: MCO-550: Remove Certificates from MachineConfig Jul 10, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 10, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 10, 2023

@cdoern: This pull request references MCO-605 which is a valid jira issue.

In response to this:

- What I did

In order to remove certificates from MC, the MCS needs to read the controller config both in the bootstrap server and the cluster server. Once this is in place, removing the certs from machine configs consists of ensuring the rendered configs do not contain the kube data to write into the files of the machine config. Accomplish this by deleting cert templates

- How to verify it

oc describe mc/xxxx the raw config/storage/files should no longer contain any certificates

- Description for the changelog

ensure the MCS properly writes certificate data directly to ignition while also ensuring that all MachineConfigs do not have certificates in them.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cdoern
Copy link
Contributor Author

cdoern commented Jul 10, 2023

/jira refresh

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 10, 2023

@cdoern: This pull request references MCO-605 which is a valid jira issue.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member

Some intersection with #3729 I think

@cdoern cdoern marked this pull request as ready for review July 12, 2023 15:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 12, 2023
@cdoern
Copy link
Contributor Author

cdoern commented Jul 19, 2023

/retest-required

@cdoern
Copy link
Contributor Author

cdoern commented Jul 20, 2023

/retest-required

I think CI is fixed now

@cdoern
Copy link
Contributor Author

cdoern commented Jul 26, 2023

/retest-required

@cdoern cdoern force-pushed the removeCerts branch 11 times, most recently from 5a4eb7d to a5f9e7a Compare August 1, 2023 02:42
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2023
@rioliu-rh
Copy link

  • test install scenario: install 4.14 cluster with the image based on this PR and provision rhel8 worker nodes
    installed aws cluster with image registry.build05.ci.openshift.org/ci-ln-41jwz7k/release:latest and provisioned 2 rhel8 worker nodes: pass
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.ci.test-2023-08-04-010040-ci-ln-41jwz7k-latest   True        False         164m    Cluster version is 4.14.0-0.ci.test-2023-08-04-010040-ci-ln-41jwz7k-latest

# rhel node info
node -l node.openshift.io/os_id=rhel -o wide
NAME                                       STATUS   ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                               KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-55-41.us-east-2.compute.internal   Ready    worker   125m   v1.27.3+e123787   10.0.55.41    <none>        Red Hat Enterprise Linux 8.6 (Ootpa)   4.18.0-477.15.1.el8_8.x86_64   cri-o://1.27.1-4.rhaos4.14.gitab7845e.el8
ip-10-0-61-50.us-east-2.compute.internal   Ready    worker   125m   v1.27.3+e123787   10.0.61.50    <none>        Red Hat Enterprise Linux 8.6 (Ootpa)   4.18.0-477.15.1.el8_8.x86_64   cri-o://1.27.1-4.rhaos4.14.gitab7845e.el8

check mc content to verify following certs are removed: pass
/etc/kubernetes/kubelet-ca.crt
/etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
/etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt

mc rendered-worker-a612a055640964b5467a71f7a1179e89 -o yaml | egrep "\.crt|\.pem"
        path: /etc/kubernetes/ca.crt

check above certs on rhel worker node: pass

debug node/ip-10-0-55-41.us-east-2.compute.internal -- chroot /host stat /etc/kubernetes/kubelet-ca.crt /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
  File: /etc/kubernetes/kubelet-ca.crt
  Size: 8332      	Blocks: 24         IO Block: 4096   regular file
Device: ca02h/51714d	Inode: 109058491   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:kubernetes_file_t:s0
Access: 2023-08-04 05:02:04.493771340 +0000
Modify: 2023-08-04 05:02:04.491771358 +0000
Change: 2023-08-04 05:02:04.493771340 +0000
 Birth: 2023-08-04 05:02:04.490771367 +0000
  File: /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: ca02h/51714d	Inode: 92342056    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:kubernetes_file_t:s0
Access: 2023-08-04 05:02:04.493771340 +0000
Modify: 2023-08-04 05:02:04.493771340 +0000
Change: 2023-08-04 05:02:04.494771331 +0000
 Birth: 2023-08-04 05:02:04.493771340 +0000
  File: /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: ca02h/51714d	Inode: 25166966    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:cert_t:s0
Access: 2023-08-04 05:02:04.494771331 +0000
Modify: 2023-08-04 05:02:04.494771331 +0000
Change: 2023-08-04 05:02:04.495771322 +0000
 Birth: 2023-08-04 05:02:04.494771331 +0000

check mcd log to see, it wrote cert on the node according to controllerconfig

I0804 02:34:40.024993    5262 certificate_writer.go:161] Certificate was synced from controllerconfig resourceVersion 40676
I0804 02:35:37.104797    5262 certificate_writer.go:161] Certificate was synced from controllerconfig resourceVersion 41966
  • test upgrade scenario: install 4.13 cluster, provision new rhel8 worker nodes, then upgrade to the image based on this PR
    installed 4.13.7 cluster and provisioned 2 rhel8 worker nodes: pass
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.7    True        False         40m     Cluster version is 4.13.7

node -l node.openshift.io/os_id=rhel -o wide
NAME                                        STATUS   ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                               KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-53-230.us-east-2.compute.internal   Ready    worker   14m   v1.26.6+73ac561   10.0.53.230   <none>        Red Hat Enterprise Linux 8.6 (Ootpa)   4.18.0-477.15.1.el8_8.x86_64   cri-o://1.26.4-3.rhaos4.13.git615a02c.el8
ip-10-0-59-120.us-east-2.compute.internal   Ready    worker   14m   v1.26.6+73ac561   10.0.59.120   <none>        Red Hat Enterprise Linux 8.6 (Ootpa)   4.18.0-477.15.1.el8_8.x86_64   cri-o://1.26.4-3.rhaos4.13.git615a02c.el8

pre upgrade check above certs exist in mc

mc rendered-worker-6700e4fc613c45609666b12da24ebfa4 -o yaml | egrep '\.crt|\.pem'
        path: /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
        path: /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
        path: /etc/kubernetes/kubelet-ca.crt
        path: /etc/kubernetes/ca.crt

upgrade cluster to registry.build05.ci.openshift.org/ci-ln-41jwz7k/release:latest: pass

oc adm upgrade --to-image registry.build05.ci.openshift.org/ci-ln-41jwz7k/release:latest --force --allow-explicit-upgrade
warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
Requested update to release image registry.build05.ci.openshift.org/ci-ln-41jwz7k/release:latest

NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.ci.test-2023-08-04-010040-ci-ln-41jwz7k-latest   True        False         7m14s   Cluster version is 4.14.0-0.ci.test-2023-08-04-010040-ci-ln-41jwz7k-latest

post upgrade check mc content to verify no above certs found: pass

mc rendered-worker-e87b41a1570dbf0b781ff97342f309d9 -o yaml | egrep '\.crt|\.pem'
        path: /etc/kubernetes/ca.crt

check above certs on rhel worker node: pass

debug node/ip-10-0-53-230.us-east-2.compute.internal -- chroot /host stat /etc/kubernetes/kubelet-ca.crt /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
  File: /etc/kubernetes/kubelet-ca.crt
  Size: 8332      	Blocks: 24         IO Block: 4096   regular file
Device: ca02h/51714d	Inode: 109107015   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:kubernetes_file_t:s0
Access: 2023-08-04 06:23:54.246234133 +0000
Modify: 2023-08-04 06:23:54.244234150 +0000
Change: 2023-08-04 06:23:54.246234133 +0000
 Birth: 2023-08-04 06:23:54.244234150 +0000
  File: /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: ca02h/51714d	Inode: 117481765   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:kubernetes_file_t:s0
Access: 2023-08-04 06:23:54.246234133 +0000
Modify: 2023-08-04 06:23:54.246234133 +0000
Change: 2023-08-04 06:23:54.247234125 +0000
 Birth: 2023-08-04 06:23:54.246234133 +0000
  File: /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: ca02h/51714d	Inode: 25166966    Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:cert_t:s0
Access: 2023-08-04 06:23:54.247234125 +0000
Modify: 2023-08-04 06:23:54.247234125 +0000
Change: 2023-08-04 06:23:54.248234117 +0000
 Birth: 2023-08-04 06:23:54.247234125 +0000

@rioliu-rh
Copy link

@cdoern @yuqi-zhang please confirm the test result. thanks

@rioliu-rh
Copy link

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 4, 2023
@cdoern
Copy link
Contributor Author

cdoern commented Aug 4, 2023

/payload 4.14 ci blocking
/payload 4.14 nightly blocking

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 4, 2023

@cdoern: trigger 5 job(s) of type blocking for the ci release of OCP 4.14

  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial
  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/42651f10-32ca-11ee-8506-9ea577b35548-0

trigger 8 job(s) of type blocking for the nightly release of OCP 4.14

  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/42651f10-32ca-11ee-8506-9ea577b35548-1

@yuqi-zhang
Copy link
Contributor

yuqi-zhang commented Aug 4, 2023

Thanks for the tests @rioliu-rh ! I thought they would have failed since in ansible, we are using an oc get on the machineconfig: https://github.com/openshift/openshift-ansible/blob/2ba66d98edb01fb1ff1250053c09fddae0441b12/roles/openshift_node/tasks/apply_machine_config.yml#L23

which no longer contains certs, so the rhel worker node should have in theory gotten no certs to join the cluster with. To double check, is that the ansible script being used?

Edit: upon further inspection, it looks like https://github.com/openshift/openshift-ansible/blob/2ba66d98edb01fb1ff1250053c09fddae0441b12/roles/openshift_node/tasks/config.yml#L179-L184 is what's actually doing the bootstrap once-from, so using ignition and not MC, which is expected to work as Rio tested. Sorry for the confusion

@cdoern
Copy link
Contributor Author

cdoern commented Aug 7, 2023

/payload-job periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 7, 2023

@cdoern: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ab6d7670-3544-11ee-85db-3c439caed440-0

@cdoern
Copy link
Contributor Author

cdoern commented Aug 8, 2023

/payload 4.14 ci blocking

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 8, 2023

@cdoern: trigger 5 job(s) of type blocking for the ci release of OCP 4.14

  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial
  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e7344320-35ec-11ee-9f15-e81b432aba7d-0

@cdoern
Copy link
Contributor Author

cdoern commented Aug 8, 2023

the tests failed last time due to what looked like infrastructure reasons, running payload again

@cdoern
Copy link
Contributor Author

cdoern commented Aug 8, 2023

/payload 4.14 ci blocking

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 8, 2023

@cdoern: trigger 5 job(s) of type blocking for the ci release of OCP 4.14

  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial
  • periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fa5aa2b0-360e-11ee-9490-e5247871e92d-0

In order to remove certificates from MC, the MCS needs to read the controller config
both in the bootstrap server and the cluster server. Once this is in place, removing the certs
from machine configs consists of ensuring the rendered configs do not contain the kube data to write into the
files of the machine config. Accomplish this by deleting cert templates

Signed-off-by: Charlie Doern <cdoern@redhat.com>
@cdoern
Copy link
Contributor Author

cdoern commented Aug 15, 2023

/test e2e-gcp-op

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've done quite a bit of testing and checking against rhel8 workers, hypershift, IBM, windows nodes. The fix for windows is in flight and everything else should work. With that context, I'd like to get this in (windows is broken on ignition spec 3.4 anyways).

/lgtm

Thanks for all the hard work and verification!

@yuqi-zhang
Copy link
Contributor

/hold

Upon further discussion, let's wait for the Windows fix to go through first

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 15, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD cf8e35e and 2 for PR HEAD c2bb862 in total

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 15, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 15, 2023

@cdoern: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn c2bb862 link false /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cdoern
Copy link
Contributor Author

cdoern commented Aug 16, 2023

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 16, 2023
@openshift-merge-robot openshift-merge-robot merged commit e386166 into openshift:master Aug 16, 2023
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants