WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

omertuc · 2023-12-22T13:00:32Z

(This is currently a WIP PR suggestion)

Background

The scheduler-kubeconfig configmap managed by this operator contains a
YAML formatted kubeconfig string. As part of a particular novel form
of OCP deployment (image based upgrade/installation), we have to
programmatically modify the server: entry in that kubeconfig. This is
easily done by parsing the YAML, editing the server entry and
serializing the YAML back into the configmap.

Issue

Every YAML serializer has its own opinions of what YAML should look
like. When we perform the procedure described above, we get a kubeconfig
that's slightly different than what this operator outputs (different
indentation, different quoting decisions). As a result, it eventually
causes the kube-scheduler-operator to issue a new revision of the
scheduler. Rolling out the new revision takes a few minutes, which we
would prefer to avoid.

Solution

In order to solve this we suggest the change in this commit which would
be to format this kubeconfig as unopinionated compact JSON as opposed to
human readable YAML

Alternative solutions

We could have a textual "search-and-replace" for the server: entry in
our code, but we would prefer to do it right by properly parsing,
editing then serializing it.

openshift-ci · 2023-12-22T13:01:24Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: omertuc
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…JSON (This is currently a WIP PR suggestion) # Background The `scheduler-kubeconfig` configmap managed by this operator contains a YAML formatted `kubeconfig` string. As part of a particular novel form of OCP deployment (image based upgrade/installation), we have to programmatically modify the `server:` entry in that kubeconfig. This is easily done by parsing the YAML, editing the server entry and serializing the YAML back into the configmap. # Issue Every YAML serializer has its own opinions of what YAML should look like. When we perform the procedure described above, we get a kubeconfig that's slightly different then what this operator outputs (different indentation, different quoting decisions). As a result, it eventually causes the kube-scheduler-operator to issue a new revision of the scheduler. Rolling out the new revision takes a few minutes, which we would prefer to avoid. # Solution In order to solve this we suggest the change in this commit which would be to format this kubeconfig as unopinionated compact JSON as opposed to human readable YAML # Alternative solutions We could have a textual "search-and-replace" for the `server:` entry in our code, but we would prefer to do it right by properly parsing, editing then serializing it.

openshift-ci · 2023-12-22T18:20:55Z

@omertuc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/security	`d06b23e`	link	false	`/test security`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

ingvagabund · 2024-01-02T09:56:36Z

The scheduler-kubeconfig configmap managed by this operator contains a
YAML formatted kubeconfig string. As part of a particular novel form
of OCP deployment (image based upgrade/installation),

Would you please elaborate more on this novel form?

we have to programmatically modify the server: entry in that kubeconfig.

Only KS-operator should be allowed to modify the kubeconfig. Otherwise, any subsequent revision roll out will undo your changes. Is there a card tracking your work?

omertuc · 2024-01-02T17:42:12Z

Would you please elaborate more on this novel form?

The tl;dr is we install SNO OCP, take a snapshot of the filesystem, and reuse that snapshot to quickly spin up new clusters, after a bit of reconfiguration

For more information we're at #wg-image-based-upgrades on Red Hat Slack. Internal HLD doc

Only KS-operator should be allowed to modify the kubeconfig

True, for now we're going around changing stuff we "shouldn't", in the long term we're thinking of ways to make this more formal and have the components themselves support the changes we need

Otherwise, any subsequent revision roll out will undo your changes.

That's fine, since we're making the changes we need across the entire cluster's resources, so everything will be consistent and subsequent revisions should be OK

Is there a card tracking your work?

Not for this particular thing, but in general our work is tracked under this Jira label

omertuc · 2024-01-02T17:43:50Z

This change shouldn't make a difference in terms of this operator's functionality, but it will make our life's easier and our solution simpler, so if we could merge it, it would be nice for us. But it's not a must.

And in general I think for anything that doesn't require human interaction YAMLs should be avoided in favor of more deterministic formats, so I guess you could consider this a positive change regardless of why we need it in particular, but that's just my subjective opinion of course

ingvagabund · 2024-01-10T10:40:27Z

When we perform the procedure described above, we get a kubeconfig that's slightly different than what this operator outputs (different indentation, different quoting decisions). As a result, it eventually causes the kube-scheduler-operator to issue a new revision of the scheduler.

I wonder about the "it eventually causes the kube-scheduler-operator to issue a new revision of the scheduler." part. Is there a proof the different formatting is the cause to spin a new revision? Can you provide the operator logs to see what is causing the new revision? You should see something similar to

new revision X triggered by "required secret/localhost-recovery-client-token has changed,required configmap/config has changed,required configmap/sa-token-signing-certs has changed"

in the logs. Normally, content of scheduler-kubeconfig CM (referential content) is copy pasted to scheduler-kubeconfig-X revision CM during each revision rolling. Are you updating both scheduler-kubeconfig and scheduler-kubeconfig-X CMs? https://github.com/openshift/library-go/blob/master/pkg/operator/revisioncontroller/revision_controller.go#L157 is the line where contents of both CMs is compared. So if you edit both CMs with the same formatted yaml (no matter how differently formatted it is) the deep equal comparison will produce a match.

omertuc · 2024-01-10T10:57:16Z

Is there a proof the different formatting is the cause to spin a new revision?

I haven't seen it directly, @mresvanis did

Are you updating both scheduler-kubeconfig and scheduler-kubeconfig-X CMs?

Yes

So if you edit both CMs with the same formatted yaml (no matter how differently formatted it is) the deep equal comparison will produce a match.

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

ingvagabund · 2024-01-10T12:59:37Z

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

That's the other part. In the KSO case (

cluster-kube-scheduler-operator/pkg/operator/targetconfigcontroller/targetconfigcontroller.go

Lines 413 to 417 in 5892968

    
           for pattern, value := range map[string]string{ 
        
           	"$LB_INT_URL": apiServerInternalURL, 
        
           } { 
        
           	cmString = strings.ReplaceAll(cmString, pattern, value) 
        
           }

) we just read assets/kube-scheduler/kubeconfig-cm.yaml as it is and perform string replacement. Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps. We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation. I strongly discourage anyone from modifying the CM through a different solution. This will make debugging more complicated as we don't assume other component modifying KSO owned files behind our back.

omertuc · 2024-01-10T13:08:42Z

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

That's the other part. In the KSO case (

cluster-kube-scheduler-operator/pkg/operator/targetconfigcontroller/targetconfigcontroller.go

Lines 413 to 417 in 5892968

for pattern, value := range map[string]string{

"$LB_INT_URL": apiServerInternalURL,

} {

cmString = strings.ReplaceAll(cmString, pattern, value)

}

) we just read assets/kube-scheduler/kubeconfig-cm.yaml as it is and perform string replacement. Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps. We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation.

Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps

But from my POV all I have is an already "rendered" template, the kubeconfig I operate on doesn't have an easily recognizable $LB_INT_URL string that I can simply replace, so me properly parsing it as a YAML to edit the server URL makes sense. I could instead do some form of regex search and replace to find the existing URL line, but of course I would like to avoid that.

We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation.

The tool we use for editing all of the cluster resources is not written in Go so a Go package would not be very helpful in our case.

I strongly discourage anyone from modifying the CM through a different solution. This will make debugging more complicated as we don't assume other component modifying KSO owned files behind our back.

I agree, this is not great, but we have no other option for now. Also note that I'm not really modifying anything that wouldn't otherwise get reconciled to the exact same state. I just want to avoid that reconciliation because it's expensive so I have to "modify in advance", while the cluster is offline.

Ideally as a long term solution we should find a way to stop scattering the cluster-name and cluster-base-domain and all the URLs that are derived from them all over the place in OCP, so they can be easily changed in few places (ideally one) without causing a flood of reconciliations

Avoid parsing YAML for now until openshift/cluster-kube-scheduler-operator#523 is fixed, use dumb regex instead

openshift-bot · 2024-04-12T09:00:58Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

omertuc · 2024-04-12T09:02:23Z

/remove-lifecycle stale

openshift-bot · 2024-07-12T01:00:22Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 22, 2023

openshift-ci bot requested review from ingvagabund and mfojtik December 22, 2023 13:01

omertuc force-pushed the machine branch 2 times, most recently from 146c101 to dd6cc3c Compare December 22, 2023 13:02

omertuc force-pushed the machine branch from dd6cc3c to d06b23e Compare December 22, 2023 15:45

omertuc mentioned this pull request Jan 30, 2024

Modify scheduler & KCM kubeconfig YAML no parsing rh-ecosystem-edge/recert#83

Merged

omertuc added a commit to omertuc/recert-1 that referenced this pull request Jan 30, 2024

Modify scheduler & KCM kubeconfig YAML no parsing

4a862a9

Avoid parsing YAML for now until openshift/cluster-kube-scheduler-operator#523 is fixed, use dumb regex instead

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2024

openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2024

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

omertuc commented Dec 22, 2023 •

edited

Loading

openshift-ci bot commented Dec 22, 2023

openshift-ci bot commented Dec 22, 2023

ingvagabund commented Jan 2, 2024

omertuc commented Jan 2, 2024 •

edited

Loading

omertuc commented Jan 2, 2024 •

edited

Loading

ingvagabund commented Jan 10, 2024 •

edited

Loading

omertuc commented Jan 10, 2024 •

edited

Loading

ingvagabund commented Jan 10, 2024 •

edited

Loading

omertuc commented Jan 10, 2024 •

edited

Loading

openshift-bot commented Apr 12, 2024

omertuc commented Apr 12, 2024

openshift-bot commented Jul 12, 2024

WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

Are you sure you want to change the base?

WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

Conversation

omertuc commented Dec 22, 2023 • edited Loading

Background

Issue

Solution

Alternative solutions

openshift-ci bot commented Dec 22, 2023

openshift-ci bot commented Dec 22, 2023

ingvagabund commented Jan 2, 2024

omertuc commented Jan 2, 2024 • edited Loading

omertuc commented Jan 2, 2024 • edited Loading

ingvagabund commented Jan 10, 2024 • edited Loading

omertuc commented Jan 10, 2024 • edited Loading

ingvagabund commented Jan 10, 2024 • edited Loading

omertuc commented Jan 10, 2024 • edited Loading

openshift-bot commented Apr 12, 2024

omertuc commented Apr 12, 2024

openshift-bot commented Jul 12, 2024

omertuc commented Dec 22, 2023 •

edited

Loading

omertuc commented Jan 2, 2024 •

edited

Loading

omertuc commented Jan 2, 2024 •

edited

Loading

ingvagabund commented Jan 10, 2024 •

edited

Loading

omertuc commented Jan 10, 2024 •

edited

Loading

ingvagabund commented Jan 10, 2024 •

edited

Loading

omertuc commented Jan 10, 2024 •

edited

Loading