Additional trusted certificate authorities for image registries #1295

dmage · 2022-12-01T17:10:53Z

No description provided.

dmage · 2022-12-01T17:26:42Z

/assign @mrunalp @sinnykumari
/cc @flavianmissi

There is a intention to make the image registry operator optional, and we are looking for a new way to distribute certificate authorities for registries. It seems MCO is the best fit for this feature. I briefly described my idea how it could work without the registry operator, please let me know if something is unclear or you have other ideas.

wking · 2022-12-01T21:02:03Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+
+### User Stories
+
+* As a <role>, I want to <take some action> so that I can <accomplish a


Possibly:

As a customer, I want to be able to remove the image-registry operator, so I can save some CPU/memory overhead in a cluster that does not need an internal image registry. But I still need external image registry CAs listed in the cluster Images distributed so that...

or some such?

wking · 2022-12-01T21:04:20Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+* Make additionalTrustedCA be managed by machine-config-operator.
+* Create a mechanism for the cluster-image-registry-operator to distribute
+  certificate authorities for the integrated image registry.
+* Remove the node-ca daemon set.


nit: Removing the DaemonSet is a nice simplification, but I think it's an implementation/design choice, and not a goal in its own right. Possibly "minimize redundant pathways for getting data onto node disks" or something is a goal that is motivating the DaemonSet-removal?

wking · 2022-12-01T21:10:27Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+the `openshift-config-managed` namespace and merge its content with the
+user-provided config map. This will allow cluster-image-registry-operator to
+provide certificate authorities for the integated image registry without
+changing the user-provided config map.


Unsaid in the current proposal, but implied by the enhancements/machine-config/ position, is that the output of this controller will be MachineConfigs feeding into whichever MachineConfigPools are necessary to get this new content rolled out over the whole cluster. Can we spell that out in detail here, possibly with feedback from the MachineConfig folks about how to target whichever MachineConfigPools are required?

Yes, I think this would generate MachineConfig.

wking · 2022-12-01T21:11:03Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+    -----END CERTIFICATE-----
+```
+
+The cluster-config-operator should update every node to propagete these


nit: "propagete" -> propagate"

the cluster-config-operator is a dead repo. It contains no code and should contain no code. It has no owners. It holds CRD and CR definitions required to bootstrap the first master, nothing more.

#812 had floated a node operator. Allowing folks to save some CPU/memory by removing the registry operator, while replacing it with a new node operator, doesn't actually win many resources back for the customer. But #812 shows that we have other reasons for wanting the node folks to have their own operator, including avoiding the need to do things like openshift/machine-config-operator#1453. No worries from me if the image-registry folks can find a team with an existing core operator to hand this functionality off to. Or if the node folks can continue to talk other existing core operators into letting them co-maintain node-associated portions. But I'm cross-linking #812 in case folks decide to go that route instead.

I meant machine-config-operator, it already manages files like registries.conf. Updated.

wking · 2022-12-01T21:16:06Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+
+If the config maps are updated, old certificate authorities should be removed
+and new ones should be added. Nodes nor applications shouldn't be restarted to
+apply changes.


Will these changes require a CRI-O reload to take effect? If so, and if MachineConfig end up being used as the delivery mechanism, this will need a bump.

The existing mechanism doesn't reload anything, and it seems to work.

Trevor is right here, if we need to avoid disruption here then we'll need to add these paths to the live-apply set.

It may actually need a restart for these fields. I can't find the special handling for a CRI-O reload for this path

@dmage and @haircommander Did we ever reach conclusion on this? Oleg says it works currently without reloading cri-o but Peter says no.

@mtrmac do we read the /etc/docker/certs.d each time we pull or is it cached at all? I would have expected it to be cached similarly to registries.conf but the current behavior imlies it's not

That’s https://github.com/containers/image/blob/bb66acc37f166c03470cb58d3b8c808c288e1c2a/docker/docker_client.go#L268 , and it’s not cached at all - it’s read on every pull afresh (and in case of CRI-O, actually multiple times per pull). That’s not a documented/committed behavior, but I also can’t see any reason that would motivate us to change.

(And yes, registries.conf is cached. Sadly, the (too many) config files in c/image were added over time and are not very consistent in their file formats, lookup locations, or behaviors.)

cgwalters · 2022-12-08T15:08:16Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+* Make additionalTrustedCA be managed by machine-config-operator.
+* Create a mechanism for the cluster-image-registry-operator to distribute
+  certificate authorities for the integrated image registry.
+* Remove the node-ca daemon set.


Several of us on the MCO/CoreOS side were really surprised to discover this node-ca daemonset a while ago. The fewer things that mutate the node state the better.

cc @jkyros

cgwalters · 2022-12-08T15:09:05Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+the `openshift-config-managed` namespace and merge its content with the
+user-provided config map. This will allow cluster-image-registry-operator to
+provide certificate authorities for the integated image registry without
+changing the user-provided config map.


Yes, I think this would generate MachineConfig.

cgwalters · 2022-12-08T15:10:12Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+
+If the config maps are updated, old certificate authorities should be removed
+and new ones should be added. Nodes nor applications shouldn't be restarted to
+apply changes.


Trevor is right here, if we need to avoid disruption here then we'll need to add these paths to the live-apply set.

openshift-bot · 2023-01-19T01:16:07Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

flavianmissi · 2023-01-24T12:41:18Z

/remove-lifecycle stale

sinnykumari · 2023-01-30T14:39:59Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+    name: registry-ca
+```
+
+If the image registry operator is installed, there should be a config map:


As we are talking here about making image-registry operator optional, what happens when image-registry operator is not installed?

The image-registry-ca config map is required for the image registry to correctly work in a cluster.
When the image registry operator is not installed in the cluster, the image-registry-ca config map will not be present.

The important thing here is that when it is present, the MCO should distribute the CA on the cluster nodes (which the registry operator currently does, but will stop doing).
It's worth noting that this would be the path of least friction. Another option would be to rely on machine configs (created by the registry operator) to distribute the CAs required by the registry, so the MCO would only need a new path for distributing user provided CAs via additionalTrustedCA in the images.config.openshift.io/cluster (to keep backwards compatibility). The downside of this approach is that could potentially disrupt other openshift components (such as builds) that also rely on the integrated registry CAs.

make sense. thanks

sinnykumari

Few questions:

Does node-ca write/update these ca today to all nodes in the cluster?
I am wondering if we really need a new MCO controller for this. Even today, MCO reads and updates nodes with cluster level object(like proxy, infra) as well as configmap (like kube-cloud-config) in openshift-config-managed namespace. We could look for image cluster object and image-registry-ca cm ?
Who will be doing and maintaining MCO related changes?

sinnykumari · 2023-01-30T15:14:33Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+certificate authorities for image registries, so that openshift-apiserver can
+access registries.
+
+For them, the controller should create a merged config map in


Is "the controller" here referring to MCO controller or somethingelse?

yuqi-zhang · 2023-02-08T01:39:59Z

I think the MCO can also manage this like it currently manages kubelet certs, but long term, we should have a separate path for cert management, which we are looking into (https://issues.redhat.com/browse/MCO-467).

+1 to the questions Sinny posted above, but I am also wondering:

how often are these CAs and certificates rotated, and by what mechanism?
it doesn't seem we have a consensus on yet if a reboot or a crio reload is required for this to take effect. Has this been otherwise tested?
similar to Sinny's question, are these additional CAs always present, and is there always a canonical location to read these from, or is it up to the MCO to piece it together based on in-cluster objects?

flavianmissi · 2023-02-09T15:23:04Z

Thanks for reviewing @sinnykumari, tried to answer your questions in line:

Does node-ca write/update these ca today to all nodes in the cluster?

Yes, to /etc/docker/certs.d.

I am wondering if we really need a new MCO controller for this. Even today, MCO reads and updates nodes with cluster level object(like proxy, infra) as well as configmap (like kube-cloud-config) in openshift-config-managed namespace. We could look for image cluster object and image-registry-ca cm ?

I think you and your team are best equipped to make that decision. We should probably rephrase this EP to request a "mechanism" rather than a "controller" specifically. cc @dmage could you reword? This is the first mention of "controller" that I found. There are others.

Who will be doing and maintaining MCO related changes?

We would like the MCO team to do (and maintain) these changes for us, as we are very understaffed in the image registry team 😄

flavianmissi · 2023-02-09T15:39:18Z

@yuqi-zhang thanks for commenting, answers below.

how often are these CAs and certificates rotated, and by what mechanism?

The user provided CAs are well, rotated by the user, so we can't answer that part.
The internal registry CAs AFAIK come from the service CA operator. It seems they're updated once every other year.

it doesn't seem we have a consensus on yet if a reboot or a crio reload is required for this to take effect. Has this been otherwise tested?

I have not tested this. The node-ca daemon set does not reboot anything, and it works.

similar to Sinny's question, are these additional CAs always present, and is there always a canonical location to read these from, or is it up to the MCO to piece it together based on in-cluster objects?

Yes, it is up to the MCO to piece together the internal registry CAs with the user provided CAs.
The internal registry CAs will only be present when the registry operator is installed (right now it's always installed). The location can be changed, the suggested locations here are just suggestions, if you think we can do better then let's discuss it (:
The user provided CAs are in the openshift-config namespace, the name itself comes from the images.config.openshift.io/cluster object, under the .spec.additionalTrustedCA.

yuqi-zhang · 2023-02-09T23:39:48Z

Thanks for the answers! I think then off the top of my head, the MCO will attempt to do something like this:

the MCO reads AdditionalTrustedCA from images.config.openshift.io/cluster, if any exists (we actually do read images.config.openshift.io/cluster for image pull configuration)
the MCO reads the internal registry CAs (place tbd)
the MCO renders them (if they exist) into some field (controllerconfig?)
the MCC takes that and populates a templated file at /etc/docker/certs.d into the rendered config
the MCD lays that down as a disruptionless operation

With the note that long term (maybe not that long) we want certs to be managed differently. Does that mostly make sense?

When release does this need to happen in?

flavianmissi · 2023-02-10T12:11:14Z

Thanks for coming back to us so quickly @yuqi-zhang!

Points 1 and 2 from your list make sense to me, but I can't claim to understand 3-5, I'll leave the internals to you and your team 😄 The result (CAs configured on /etc/docker/certs.d) is what matters to us, and that looks good from where I stand.

With the note that long term (maybe not that long) we want certs to be managed differently. Does that mostly make sense?

The makes complete sense! Is the long[er] term approach something that we want defined in this EP as well? Or is it worth to just note that we need a better solution, and leave the step of defining such solution to its own EP?

When release does this need to happen in?

Our PM would like this to land on 4.14. WDYT?

yuqi-zhang · 2023-02-10T19:28:37Z

The makes complete sense! Is the long[er] term approach something that we want defined in this EP as well? Or is it worth to just note that we need a better solution, and leave the step of defining such solution to its own EP?

I think we don't have to include it. It can be a separate process, no need to increase scope of this 😄

Our PM would like this to land on 4.14. WDYT?

I will bring this to the team as part of planning to see how the timelines would align.

sinnykumari · 2023-02-15T13:43:54Z

Thanks Flavian for answering all the questions. As Jerry said, we will need to discuss within team and see feasible this is for the MCO team.

flavianmissi · 2023-02-27T12:10:45Z

Hi @sinnykumari and @yuqi-zhang, that is great news - thank you! The image registry team will gladly help with reviews.

I'll sync with @dmage this week so that we update this EP with a link to that ticket, as well as the clarifications we discussed here earlier.

flavianmissi · 2023-03-13T11:29:04Z

@sdodson could you help us reviewing and eventually getting this approved?
Thanks!

sdodson · 2023-03-14T20:06:57Z

@sdodson could you help us reviewing and eventually getting this approved?

I should have time to look at this tomorrow, sorry for the delay.

openshift-bot · 2023-04-12T01:15:34Z

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

flavianmissi · 2023-04-12T07:09:15Z

/remove-lifecycle stale

enhancements/machine-config/certificate-authorities-for-image-registries.md

…registries.md

sdodson · 2023-04-13T18:23:40Z

/approve
Light on implementation details but it's also clear that the MCO team are asking all the right questions and they've agreed to inherit this cluster responsibility.

Key for me will be ensuring as, Trevor and Colin point out, that these files are able to be live applied.

@sinnykumari You're up next for final review. If the MCO team has any implementation details they'd like to get recorded lets get those added as suggested updates.

/cc @mrunalp
Mrunal wanted to take a look too

openshift-ci · 2023-04-13T18:24:46Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [sdodson]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sinnykumari · 2023-04-14T11:42:03Z

enhancements/machine-config/certificate-authorities-for-image-registries.md

+creation-date: 2022-12-01
+last-updated: 2022-12-01
+tracking-link:
+  - https://issues.redhat.com/browse/MCO-499


MCO-499 track mainly work from MCO side. I think we should also have here jira card linked here that covers image registries side of work.

enhancements/machine-config/certificate-authorities-for-image-registries.md

sinnykumari · 2023-04-14T12:53:26Z

LGTM
@sdodson MCO team haven't fleshed out yet the overall details of how it will look like. We have some planned spikes for next sprint which will provide more details and follow-up stories into linked jira card MCO-499

sdodson · 2023-04-19T18:06:23Z

/lgtm

openshift-ci-robot · 2023-04-19T19:25:48Z

/retest-required

Remaining retests: 0 against base HEAD 6b3209f and 2 for PR HEAD 77e38ad in total

…registries.md Co-authored-by: Flavian Missi <flaviamissi@gmail.com>

openshift-ci · 2023-04-20T09:56:20Z

@dmage: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

flavianmissi · 2023-04-21T08:21:28Z

/lgtm

openshift-ci bot requested review from mandre and sudhaponnaganti December 1, 2022 17:11

dmage force-pushed the node-ca-replacement branch 2 times, most recently from 682d400 to 498fbcb Compare December 1, 2022 17:22

openshift-ci bot assigned mrunalp and sinnykumari Dec 1, 2022

openshift-ci bot requested a review from flavianmissi December 1, 2022 17:26

wking reviewed Dec 1, 2022

View reviewed changes

Add Additional trusted certificate authorities for image registries

da1c55a

dmage force-pushed the node-ca-replacement branch from 498fbcb to da1c55a Compare December 1, 2022 23:48

cgwalters approved these changes Dec 8, 2022

View reviewed changes

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2023

openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2023

sinnykumari reviewed Jan 30, 2023

View reviewed changes

add reviewer, link MCO epic, s/controller/mechanism

c99f956

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2023

openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2023

sdodson reviewed Apr 13, 2023

View reviewed changes

enhancements/machine-config/certificate-authorities-for-image-registries.md Outdated Show resolved Hide resolved

Update enhancements/machine-config/certificate-authorities-for-image-…

77e38ad

…registries.md

openshift-ci bot requested a review from mrunalp April 13, 2023 18:24

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2023

sinnykumari reviewed Apr 14, 2023

View reviewed changes

flavianmissi reviewed Apr 14, 2023

View reviewed changes

enhancements/machine-config/certificate-authorities-for-image-registries.md Show resolved Hide resolved

openshift-ci bot assigned sdodson Apr 19, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 19, 2023

Fix linter error

30a4e65

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 20, 2023

dmage and others added 2 commits April 20, 2023 11:47

Update enhancements/machine-config/certificate-authorities-for-image-…

eb3dbb6

…registries.md Co-authored-by: Flavian Missi <flaviamissi@gmail.com>

Add api-approvers

d3e0586

openshift-ci bot assigned flavianmissi Apr 21, 2023

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2023

openshift-ci bot merged commit 40a1bc1 into openshift:master Apr 21, 2023
2 checks passed


		### User Stories

		* As a <role>, I want to <take some action> so that I can <accomplish a

Additional trusted certificate authorities for image registries #1295

Additional trusted certificate authorities for image registries #1295

Conversation

dmage commented Dec 1, 2022

dmage commented Dec 1, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-bot commented Jan 19, 2023

flavianmissi commented Jan 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sinnykumari left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yuqi-zhang commented Feb 8, 2023

flavianmissi commented Feb 9, 2023

flavianmissi commented Feb 9, 2023

yuqi-zhang commented Feb 9, 2023

flavianmissi commented Feb 10, 2023

yuqi-zhang commented Feb 10, 2023

sinnykumari commented Feb 15, 2023

flavianmissi commented Feb 27, 2023

flavianmissi commented Mar 13, 2023

sdodson commented Mar 14, 2023

openshift-bot commented Apr 12, 2023

flavianmissi commented Apr 12, 2023

sdodson commented Apr 13, 2023

openshift-ci bot commented Apr 13, 2023

Choose a reason for hiding this comment

sinnykumari commented Apr 14, 2023 • edited by openshift-ci bot

sdodson commented Apr 19, 2023

openshift-ci-robot commented Apr 19, 2023

openshift-ci bot commented Apr 20, 2023

flavianmissi commented Apr 21, 2023

sinnykumari commented Apr 14, 2023 •

edited by openshift-ci bot