Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USHIFT-706: Make MicroShift's CSI plugin optionally deployable #1640

Closed
wants to merge 9 commits into from

Conversation

copejon
Copy link
Contributor

@copejon copejon commented Jun 10, 2024

No description provided.

Copy link
Contributor

openshift-ci bot commented Jun 10, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign miciah for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@copejon copejon marked this pull request as draft June 10, 2024 20:58
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2024
@copejon
Copy link
Contributor Author

copejon commented Jun 11, 2024

@copejon copejon changed the title [USHIFT-706] Make MicroShift's CSI plugin optionally deployable USHIFT-706 Make MicroShift's CSI plugin optionally deployable Jun 11, 2024
@copejon copejon changed the title USHIFT-706 Make MicroShift's CSI plugin optionally deployable USHIFT-706: Make MicroShift's CSI plugin optionally deployable Jun 11, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jun 11, 2024

@copejon: This pull request references USHIFT-706 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2024
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made a first read through and gave my 2 cents where I had concerns. I realize this is not done though and just feel free to close out / ignore comments if they are irrelevant to the final design.

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
into the MicroShift binary during compilation and are deployed during runtime. This means that even if a user wanted to
disable the driver and/or snapshotter, they would be deployed again when the service restarts. Additionally, the LVMS an
CSI containers required to run LVMS are always packaged with the product and thus will always consume a certain amount
of storage space devices where MicroShift is deployed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This storage space equivalent to ~60kB uncompressed, in the binary it is even less than that due to binary compression. Im not sure if this is even a relevant point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to where the ~60Kb value comes from. Note that the sentence contains a typo and should have said "LVMS and CSI container images", referring to the size of the images on disk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well, I was talking about the manifests, clearly didnt read that at full. Sorry for the confusion :) We will have only 2 images in the future, one for lvms and 1 for csi snapshotter, where the base layers will be shared with the release payload. I at expecting at most 100mi gain, but that is still relevant.

storage provider and reasons for its integration into MicroShift. Configuration of the driver is exposed via a config
file at /etc/microshift/lvmd.yaml. The manifests required to deploy and configure the driver and snapshotter are baked
into the MicroShift binary during compilation and are deployed during runtime. This means that even if a user wanted to
disable the driver and/or snapshotter, they would be deployed again when the service restarts. Additionally, the LVMS an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a problem in the way that microshift handles the apply, and is unrelated (imo) to how LVMS is packaged. There could be easily an uninstall via delete operations in k8s right now with the current method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. The sentence needs to be amended. It should be communicating that this is how it's done right now in MicroShift.

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
## Motivation

Not all users may persistent storage provider or volume snapshot capabilities for their MicroShift deployments and
should be enabled to choose whether to deploy the driver and / or snapshotter. This will afford resource-conscious users
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already possible today through the "enabled" option, I dont really see why this is not possible today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MicroShift doesn't have a config option to enable/disable CSI. Can you point me to what your referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referring to the fact that it is quite easily possible to build a flag or something similar into the config that disables the driver. I guess Im trying to convey that we dont need such a major rework for getting that done. There may be other reasons for the extraction such as modularity of the CSI driver of course.

Copy link
Contributor Author

@copejon copejon Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a config field doesn't solve the goal of separating the LVMS project from microshift. It also poses a new set of complexities where as revoking LVMS's first-class citizen status in MicroShift would provide a much more simplified solution and reduce tech debt and testing requirements.

Also, it makes it even more difficult to add support for other CSI drivers later. The tech debt would grow at a linear rate because each driver would require it's own service manager in microshift to handle deployment and config.

Lastly, using a field implies that the container images must always be present in case the config changed from "off" to "on". We can't assume a network connection is reliable, or even exists. Therefore the container images would have to be installed when MicroShift is installed, regardless of whether the user ever intended to use dynamic storage.

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved

- Provide the LVMS CSI driver and LVMS CSI snapshot controller as optional components of MicroShift
- Reuse the optional-installation pattern implemented by MicroShift for the [Multus CNI
Plugin](./multus-cni-for-microshift.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LVMS / TopoLVM has custom logic paths in the microshift binary right now. If we want to have them in a seperate rpm, we need a new binary that is able to run as root and run these logic paths in its stead, especially for configuring lvmd.yaml with defaults. This will get only more pressing once we incorporate the hot reloading implementation in the LVMS migration from TopoLVM. Whats the idea to build that logic or run that somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is something that would be well suited to a %post-install in the rpm spec. The logic is relatively simple and is mostly process calls wrapped with go code. This script should also include logic to prevent stomping on existing config file during install so that user configs aren't lost after an upgrade/downgrade.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the migration to LVMS, to allow the hot reloading mechanism of lvmd.yaml we need to actively watch and copy over /etc/microshift/lvmd.yaml to /var/lib/microshift/lvms/lvmd.yaml

Also we currently have on every restart a copying of that file, that behavior would need to change if we move to post-install.

Not generally opposed, just want to make sure we

  • dont loose the ability to enable hotreloading in lvms for microshift (allows customers to reconfigure storage without restarting)
  • dont break compatibility for ostree-style commits when upgrading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is compatible with the hotreloading behavior. The post-install script would be applying the changes to the /etc/microshift/lvmd.yaml (if not already present). So if a user were to install LVMS on a running cluster, without providing an lvmd.yaml file, the hot-reload should detect the new file and perform as expected.


#### Single-node Deployments or MicroShift

The changes proposed here only affect MicroShift.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They also affect LVMS due Microshift and LVMS sharing a version dependency to each other. We will no longer need to release LVMS ahead of Microshift technically and can version it separately. That also means the RPMs need to be versioned and released separately. Otherwise I believe there is no gain here for extracting them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gain is also to support another storage provider in the future.
We need to discuss how this change affects LVMS release cycle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the rebase changes, we assume that the RPM is still built in the MicroShift repo, so we maintain the version dependency. In general, change the build procedure and full decoupling should be treated as a separate ER, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think that the first step should remain scoped to the feature work.

- During rebasing, the image digests for LVMS and CSI Snapshotter will be parsed from the LVMS bundle (like they
are now) and the digests will be written to their respective `release-$ARCH.json` files.

- **Mainline Code Changes:**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think this needs to be revisited because this logic still has to exist somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a section called RPM-Spec about this. LMKWYT

copejon and others added 2 commits July 1, 2024 14:44
Co-authored-by: Jakob Möller <contact@jakob-moeller.com>
…snapshotting

tweaks to summary to clarify storage size concerns

added uninstall workflow

amended mainline code changes to cover in-built dynamic defaulting behavior

added RPM Spec section

added up/down grade strat

added version skew strat

added alternatives
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good already, im mostly nitting on details on the LVMS + Snapshotter and the RPM replacing the internals, consider a general LGTM from my side!


## Version Skew Strategy

Risk of version skew is mitigated because the LVMS and CSI components are built from the same release image as the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im a bit confused by this statement. LVMS was never built with the same release image and had to be rebased, what would be the process for that now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean to say is that the LVMS and CSI versions are tracked by the operator bundle, which is then use to correlate the microshift version to the lvms/csi versions.

## Alternatives

Use the MicroShift or LVMD config to determine deployment of LVMS and/or snapshotting. It does not satisfy the goal of
optionalizing the installation of the components. Instead, this solution would simply enable/disable the components.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you quickly explain why a MicroShift config flag would not optionalize the installation of the components (maybe Im just missing something here)

Copy link
Contributor Author

@copejon copejon Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a flag to optionalize storage has some drawbacks (off the top of my head):

  • MicroShift carries some tech-debt for rebasing, deploying, and configuring LVMS.
  • treating lvms as a first class citizen in microshift isn't necessary. it's just a cluster workload like any other CSI driver.
  • should the default be 'on' or 'off'? what should happen if the toggle is switched from 'on' to 'off'? How ever these are answered means more responsibility for microshift, not less.
  • disk consumption is another consideration. for offline deployments, should microshift still include the lvms images just in case users switch it on after starting microshift?
  • Should MicroShift add support for another driver, how should this be implemented? An enum field would be fine in the config, but it'd also mean microshift would have to first-class that driver as well. This can't scale well at all.


## Graduation Criteria

### Dev Preview -> Tech Preview
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we be able to run Dev Preview of the RPM and the old installation method through microshift core in parallel?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need Dev or Tech preview. Let's release it in GA in 4.17.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change IMO is A) too big to handle in one shot and B) will have too much impact on LVMS / MicroShift release procedure in 4.17 together with the LVMS MicroShift migration, it feels like we are forcing a big bang

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing back. From the user perspective, this is a relatively easy change to existing blueprints. Add this to new installs:

[packages]
name = "microshift-lvms"
version = "*"

Layers built without this and which are then pushed to 4.16 clusters are unaffected.

WRT the migration work, where do you see this becoming overly complex?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly in making sure we are not breaking existing users together with the LVMS migration. This is more of a too big to handle for engineering and keep stable kind of thing for me. But Ill wait for some other opinions, Im not gonna be adamant about it :)

- A `microshift-lvms` rpm post-install script that determines what volume group, if any, should be specified.

- **RPM Spec**
- New RPM specs will be written for `microshift-lvms` and `microshift-lvms-snapshotting` packages. These have the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since snapshotting can be used for any CSI driver, WDYT about microshift-snapshotting?

That way anyone who is just interested in snapshotting capability can install that RPM and then install their own CSI driver + VolumeSnapshotClass.

LVMS can already smartly create or ignore VolumeSnapshotClass creation so this would not need any extra logic paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in an earlier comment. LVMS and Snapshotting are considered as separate installs in the EP

- **RPM Spec**
- New RPM specs will be written for `microshift-lvms` and `microshift-lvms-snapshotting` packages. These have the
following characteristics:
- `microshift-lvms-snapshotting` will depend on `microshift-lvms`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could drop that dependency and let LVMS do dynamic adjustment to wether VolumeSnapshotClass Support exists in the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, but it also means this work would be blocked until the LVMS change was accepted and implemented, which doesn't make sense. Instead, let this change and that one be developed in parallel at their own velocities. Whenever the LVMS change is implemented, it'd just require a one-line change in the microshift rpm spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pro: Unblocks engineering on both fronts
Contra: this will mean that the rpm in the future might need to keep that dependency. Maybe Im just wondering why we dont have a microshift-snapshotting rpm with all the snapshotting stuff?

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
file at /etc/microshift/lvmd.yaml. The manifests required to deploy and configure the driver and snapshotter are baked
into the MicroShift binary during compilation and are deployed during runtime. As it is now, even if a user wanted to
disable the driver and/or snapshotter, MicroShift would deploy them again when the service restarts. Additionally, the
LVMS and CSI images are always packaged together with MicroShift and thus always consume a certain amount of storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need an estimate of the consumed resources so that it guides us further in the proposal

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
2. User specifies an ostree blueprint which includes the following sections:
1. Packages: microshift, microshift-greenboot, microshift-networking, microshift-selinux, microshift-lvms,
microshift-lvms-snapshotting
2. File (Optional): lvmd.yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to install a default lvmd.yaml file? If so, specifying it in the blueprint would override the default.
Let's clarify this please

enhancements/microshift/optionally-deploy-csi-plugin.md Outdated Show resolved Hide resolved
4. User checks that there are no LVMS PVs, PVCs, or Snapshots in the cluster. If there are, it is the user's
responsibility to prevent unintentional data loss or LVM volume orphaning before proceeding.
5. User uninstalls microshift-lvms and microshift-lvms-snapshotting rpms.
6. User deletes the relevant cluster API resources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we swap steps 4 and 5? I mean, when is the data orphaned?
Is it after RPM uninstall, or after the actions described in step 6?


#### Single-node Deployments or MicroShift

The changes proposed here only affect MicroShift.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gain is also to support another storage provider in the future.
We need to discuss how this change affects LVMS release cycle.


#### Single-node Deployments or MicroShift

The changes proposed here only affect MicroShift.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the rebase changes, we assume that the RPM is still built in the MicroShift repo, so we maintain the version dependency. In general, change the build procedure and full decoupling should be treated as a separate ER, I think.

## Test Plan

There are already tests for RPM and Ostree install processes. This proposal only slightly alters the installation
process and thus will require only minimal changes to existing tests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to ensure that MicroShift can function normally without storage optional components.
It's true that 100% of tests currently assume the presence of CSI driver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No MicroShift components require dynamically provisioned persistent storage, thus they have no dependency on CSI (LVMS or otherwise).


## Graduation Criteria

### Dev Preview -> Tech Preview
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need Dev or Tech preview. Let's release it in GA in 4.17.

copejon and others added 2 commits July 8, 2024 16:44
add section on default logic in rpm

remove dev -> TP and TP -> GA, as it doesn't fit target version
Co-authored-by: Gregory Giguashvili <85498834+ggiguash@users.noreply.github.com>
@copejon
Copy link
Contributor Author

copejon commented Jul 11, 2024

It's been decided that an RPM-based approach to optional installation of LVMS and CSI components is neither feasible given the total changes required nor compatible with incoming changes (openshift/microshift#3535). This PR is closed.

@copejon copejon closed this Jul 11, 2024
@copejon copejon deleted the ushift-706 branch July 11, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants