Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Change YAML style of scheduler-kubeconfig from human to compact JSON #523

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

omertuc
Copy link

@omertuc omertuc commented Dec 22, 2023

(This is currently a WIP PR suggestion)

Background

The scheduler-kubeconfig configmap managed by this operator contains a
YAML formatted kubeconfig string. As part of a particular novel form
of OCP deployment (image based upgrade/installation), we have to
programmatically modify the server: entry in that kubeconfig. This is
easily done by parsing the YAML, editing the server entry and
serializing the YAML back into the configmap.

Issue

Every YAML serializer has its own opinions of what YAML should look
like. When we perform the procedure described above, we get a kubeconfig
that's slightly different than what this operator outputs (different
indentation, different quoting decisions). As a result, it eventually
causes the kube-scheduler-operator to issue a new revision of the
scheduler. Rolling out the new revision takes a few minutes, which we
would prefer to avoid.

Solution

In order to solve this we suggest the change in this commit which would
be to format this kubeconfig as unopinionated compact JSON as opposed to
human readable YAML

Alternative solutions

We could have a textual "search-and-replace" for the server: entry in
our code, but we would prefer to do it right by properly parsing,
editing then serializing it.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 22, 2023
Copy link
Contributor

openshift-ci bot commented Dec 22, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: omertuc
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@omertuc omertuc force-pushed the machine branch 2 times, most recently from 146c101 to dd6cc3c Compare December 22, 2023 13:02
…JSON

(This is currently a WIP PR suggestion)

# Background

The `scheduler-kubeconfig` configmap managed by this operator contains a
YAML formatted `kubeconfig` string. As part of a particular novel form
of OCP deployment (image based upgrade/installation), we have to
programmatically modify the `server:` entry in that kubeconfig. This is
easily done by parsing the YAML, editing the server entry and
serializing the YAML back into the configmap.

# Issue

Every YAML serializer has its own opinions of what YAML should look
like. When we perform the procedure described above, we get a kubeconfig
that's slightly different then what this operator outputs (different
indentation, different quoting decisions). As a result, it eventually
causes the kube-scheduler-operator to issue a new revision of the
scheduler. Rolling out the new revision takes a few minutes, which we
would prefer to avoid.

# Solution

In order to solve this we suggest the change in this commit which would
be to format this kubeconfig as unopinionated compact JSON as opposed to
human readable YAML

# Alternative solutions

We could have a textual "search-and-replace" for the `server:` entry in
our code, but we would prefer to do it right by properly parsing,
editing then serializing it.
Copy link
Contributor

openshift-ci bot commented Dec 22, 2023

@omertuc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security d06b23e link false /test security

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ingvagabund
Copy link
Member

The scheduler-kubeconfig configmap managed by this operator contains a
YAML formatted kubeconfig string. As part of a particular novel form
of OCP deployment (image based upgrade/installation),

Would you please elaborate more on this novel form?

we have to programmatically modify the server: entry in that kubeconfig.

Only KS-operator should be allowed to modify the kubeconfig. Otherwise, any subsequent revision roll out will undo your changes. Is there a card tracking your work?

@omertuc
Copy link
Author

omertuc commented Jan 2, 2024

Would you please elaborate more on this novel form?

The tl;dr is we install SNO OCP, take a snapshot of the filesystem, and reuse that snapshot to quickly spin up new clusters, after a bit of reconfiguration

For more information we're at #wg-image-based-upgrades on Red Hat Slack. Internal HLD doc

Only KS-operator should be allowed to modify the kubeconfig

True, for now we're going around changing stuff we "shouldn't", in the long term we're thinking of ways to make this more formal and have the components themselves support the changes we need

Otherwise, any subsequent revision roll out will undo your changes.

That's fine, since we're making the changes we need across the entire cluster's resources, so everything will be consistent and subsequent revisions should be OK

Is there a card tracking your work?

Not for this particular thing, but in general our work is tracked under this Jira label

@omertuc
Copy link
Author

omertuc commented Jan 2, 2024

This change shouldn't make a difference in terms of this operator's functionality, but it will make our life's easier and our solution simpler, so if we could merge it, it would be nice for us. But it's not a must.

And in general I think for anything that doesn't require human interaction YAMLs should be avoided in favor of more deterministic formats, so I guess you could consider this a positive change regardless of why we need it in particular, but that's just my subjective opinion of course

@ingvagabund
Copy link
Member

ingvagabund commented Jan 10, 2024

When we perform the procedure described above, we get a kubeconfig that's slightly different than what this operator outputs (different indentation, different quoting decisions). As a result, it eventually causes the kube-scheduler-operator to issue a new revision of the scheduler.

I wonder about the "it eventually causes the kube-scheduler-operator to issue a new revision of the scheduler." part. Is there a proof the different formatting is the cause to spin a new revision? Can you provide the operator logs to see what is causing the new revision? You should see something similar to

new revision X triggered by "required secret/localhost-recovery-client-token has changed,required configmap/config has changed,required configmap/sa-token-signing-certs has changed"

in the logs. Normally, content of scheduler-kubeconfig CM (referential content) is copy pasted to scheduler-kubeconfig-X revision CM during each revision rolling. Are you updating both scheduler-kubeconfig and scheduler-kubeconfig-X CMs? https://github.com/openshift/library-go/blob/master/pkg/operator/revisioncontroller/revision_controller.go#L157 is the line where contents of both CMs is compared. So if you edit both CMs with the same formatted yaml (no matter how differently formatted it is) the deep equal comparison will produce a match.

@omertuc
Copy link
Author

omertuc commented Jan 10, 2024

Is there a proof the different formatting is the cause to spin a new revision?

I haven't seen it directly, @mresvanis did

Are you updating both scheduler-kubeconfig and scheduler-kubeconfig-X CMs?

Yes

So if you edit both CMs with the same formatted yaml (no matter how differently formatted it is) the deep equal comparison will produce a match.

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

@ingvagabund
Copy link
Member

ingvagabund commented Jan 10, 2024

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

That's the other part. In the KSO case (

for pattern, value := range map[string]string{
"$LB_INT_URL": apiServerInternalURL,
} {
cmString = strings.ReplaceAll(cmString, pattern, value)
}
) we just read assets/kube-scheduler/kubeconfig-cm.yaml as it is and perform string replacement. Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps. We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation. I strongly discourage anyone from modifying the CM through a different solution. This will make debugging more complicated as we don't assume other component modifying KSO owned files behind our back.

@omertuc
Copy link
Author

omertuc commented Jan 10, 2024

True, but I guess some controller (this?) notices the different format and re-applies with the styling from the bindata directory, and that creates a diff

That's the other part. In the KSO case (

for pattern, value := range map[string]string{
"$LB_INT_URL": apiServerInternalURL,
} {
cmString = strings.ReplaceAll(cmString, pattern, value)
}

) we just read assets/kube-scheduler/kubeconfig-cm.yaml as it is and perform string replacement. Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps. We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation.

Given $LB_INT_URL is the only string that gets replaced and KSO is the only component allowed to create/update openshift-kube-scheduler CM I strongly recommend to follow the same steps

But from my POV all I have is an already "rendered" template, the kubeconfig I operate on doesn't have an easily recognizable $LB_INT_URL string that I can simply replace, so me properly parsing it as a YAML to edit the server URL makes sense. I could instead do some form of regex search and replace to find the existing URL line, but of course I would like to avoid that.

We might extend the code base to create a new package that you can import in your solution and directly invoke the same code that renders the configmap. Thus, to avoid any issues with formatting and the config representation.

The tool we use for editing all of the cluster resources is not written in Go so a Go package would not be very helpful in our case.

I strongly discourage anyone from modifying the CM through a different solution. This will make debugging more complicated as we don't assume other component modifying KSO owned files behind our back.

I agree, this is not great, but we have no other option for now. Also note that I'm not really modifying anything that wouldn't otherwise get reconciled to the exact same state. I just want to avoid that reconciliation because it's expensive so I have to "modify in advance", while the cluster is offline.

Ideally as a long term solution we should find a way to stop scattering the cluster-name and cluster-base-domain and all the URLs that are derived from them all over the place in OCP, so they can be easily changed in few places (ideally one) without causing a flood of reconciliations

omertuc added a commit to omertuc/recert-1 that referenced this pull request Jan 30, 2024
Avoid parsing YAML for now until
openshift/cluster-kube-scheduler-operator#523 is
fixed, use dumb regex instead
@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2024
@omertuc
Copy link
Author

omertuc commented Apr 12, 2024

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 12, 2024
@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants