USHIFT-706: Make MicroShift's CSI plugin optionally deployable #1640
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@copejon: This pull request references USHIFT-706 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Just made a first read-through and gave my two cents where I had concerns. I realize this is not done yet, so feel free to close out / ignore comments if they are irrelevant to the final design.
> into the MicroShift binary during compilation and are deployed during runtime. This means that even if a user wanted to
> disable the driver and/or snapshotter, they would be deployed again when the service restarts. Additionally, the LVMS an
> CSI containers required to run LVMS are always packaged with the product and thus will always consume a certain amount
> of storage space devices where MicroShift is deployed.
This storage space is equivalent to ~60 kB uncompressed; in the binary it is even less than that due to binary compression. I'm not sure if this is even a relevant point.
I'm confused as to where the ~60Kb value comes from. Note that the sentence contains a typo and should have said "LVMS and CSI container images", referring to the size of the images on disk.
Ah, well, I was talking about the manifests; I clearly didn't read that in full. Sorry for the confusion :) We will have only two images in the future, one for LVMS and one for the CSI snapshotter, where the base layers will be shared with the release payload. I am expecting at most a ~100 MiB gain, but that is still relevant.
> storage provider and reasons for its integration into MicroShift. Configuration of the driver is exposed via a config
> file at /etc/microshift/lvmd.yaml. The manifests required to deploy and configure the driver and snapshotter are baked
> into the MicroShift binary during compilation and are deployed during runtime. This means that even if a user wanted to
> disable the driver and/or snapshotter, they would be deployed again when the service restarts. Additionally, the LVMS an
This is actually a problem in the way MicroShift handles the apply, and is unrelated (IMO) to how LVMS is packaged. An uninstall via delete operations in k8s could easily be done right now with the current method.
That's right. The sentence needs to be amended. It should be communicating that this is how it's done right now in MicroShift.
> ## Motivation
>
> Not all users may persistent storage provider or volume snapshot capabilities for their MicroShift deployments and
> should be enabled to choose whether to deploy the driver and / or snapshotter. This will afford resource-conscious users
This is already possible today through the "enabled" option; I don't really see why this wouldn't be possible today.
MicroShift doesn't have a config option to enable/disable CSI. Can you point me to what you're referring to?
I am referring to the fact that it is quite easily possible to build a flag or something similar into the config that disables the driver. I guess I'm trying to convey that we don't need such a major rework to get that done. There may be other reasons for the extraction, such as modularity of the CSI driver, of course.
Using a config field doesn't solve the goal of separating the LVMS project from MicroShift. It also poses a new set of complexities, whereas revoking LVMS's first-class-citizen status in MicroShift would provide a much more simplified solution and reduce tech debt and testing requirements.
Also, it makes it even more difficult to add support for other CSI drivers later. The tech debt would grow at a linear rate because each driver would require its own service manager in MicroShift to handle deployment and config.
Lastly, using a field implies that the container images must always be present in case the config changes from "off" to "on". We can't assume a network connection is reliable, or even exists. Therefore the container images would have to be installed when MicroShift is installed, regardless of whether the user ever intended to use dynamic storage.
> - Provide the LVMS CSI driver and LVMS CSI snapshot controller as optional components of MicroShift
> - Reuse the optional-installation pattern implemented by MicroShift for the [Multus CNI
>   Plugin](./multus-cni-for-microshift.md).
LVMS / TopoLVM has custom logic paths in the MicroShift binary right now. If we want to have them in a separate RPM, we need a new binary that is able to run as root and run these logic paths in its stead, especially for configuring lvmd.yaml with defaults. This will only get more pressing once we incorporate the hot-reloading implementation in the LVMS migration from TopoLVM. What's the idea for where to build or run that logic?
I believe this is something that would be well suited to a %post-install script in the RPM spec. The logic is relatively simple and is mostly process calls wrapped with Go code. This script should also include logic to prevent stomping on an existing config file during install, so that user configs aren't lost after an upgrade/downgrade.
With the migration to LVMS, to allow the hot-reloading mechanism of lvmd.yaml, we need to actively watch and copy /etc/microshift/lvmd.yaml over to /var/lib/microshift/lvms/lvmd.yaml.
Also, we currently copy that file on every restart; that behavior would need to change if we move to post-install.
Not generally opposed, just want to make sure we:
- don't lose the ability to enable hot reloading in LVMS for MicroShift (allows customers to reconfigure storage without restarting)
- don't break compatibility for ostree-style commits when upgrading
I think this is compatible with the hot-reloading behavior. The post-install script would apply the changes to /etc/microshift/lvmd.yaml (if not already present). So if a user were to install LVMS on a running cluster without providing an lvmd.yaml file, the hot reload should detect the new file and perform as expected.
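The %post defaulting discussed in this thread could be sketched roughly as below. This is a minimal sketch, not the real spec: the `install_default_lvmd` helper name, the device-class contents, and the `rhel` fallback volume-group name are all assumptions; only the /etc/microshift/lvmd.yaml path comes from the proposal.

```shell
# Hypothetical %post logic for a microshift-lvms RPM: write a default
# lvmd.yaml only when no config exists, so user settings survive
# upgrades/downgrades.
install_default_lvmd() {
  conf_dir="$1"
  conf="$conf_dir/lvmd.yaml"
  if [ -f "$conf" ]; then
    # A user-provided config already exists; never overwrite it.
    return 0
  fi
  # Pick the first LVM volume group, if any (vgs is part of lvm2);
  # fall back to a placeholder name when none is found.
  vg="$(vgs --noheadings -o vg_name 2>/dev/null | head -n 1 | tr -d ' ')"
  mkdir -p "$conf_dir"
  cat > "$conf" <<EOF
socket-name: /run/lvmd/lvmd.socket
device-classes:
  - name: default
    volume-group: ${vg:-rhel}
    default: true
EOF
}

# In the spec file, %post would call something like:
#   install_default_lvmd /etc/microshift
```

Because the file is only written when absent, installing the RPM on a system that already carries a user-provided lvmd.yaml leaves that file untouched, which also keeps the hot-reload watcher's view of the config consistent.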
> #### Single-node Deployments or MicroShift
>
> The changes proposed here only affect MicroShift.
They also affect LVMS, due to MicroShift and LVMS sharing a version dependency on each other. Technically, we will no longer need to release LVMS ahead of MicroShift and can version it separately. That also means the RPMs need to be versioned and released separately. Otherwise I believe there is no gain here from extracting them.
The gain is also to support another storage provider in the future.
We need to discuss how this change affects LVMS release cycle.
Looking at the rebase changes, we assume that the RPM is still built in the MicroShift repo, so we maintain the version dependency. In general, changing the build procedure and fully decoupling should be treated as a separate ER, I think.
Agreed, I think that the first step should remain scoped to the feature work.
> - During rebasing, the image digests for LVMS and CSI Snapshotter will be parsed from the LVMS bundle (like they
>   are now) and the digests will be written to their respective `release-$ARCH.json` files.
>
> - **Mainline Code Changes:**
I really think this needs to be revisited because this logic still has to exist somewhere.
I've added a section called RPM-Spec about this. LMKWYT
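The rebase step quoted above ("image digests ... parsed from the LVMS bundle") could look roughly like this. Assumptions: the bundle is a ClusterServiceVersion-style JSON with a `spec.relatedImages` list, `jq` is available, and the output layout of `release-$ARCH.json` is simplified; the real rebase tooling may differ.

```shell
# Hypothetical sketch: extract name -> image-digest pairs from an LVMS
# bundle manifest and emit the image map for a release-$ARCH.json file.
extract_lvms_digests() {
  bundle_json="$1"
  out_json="$2"
  jq '{images: (.spec.relatedImages | map({(.name): .image}) | add)}' \
    "$bundle_json" > "$out_json"
}
```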
Co-authored-by: Jakob Möller <contact@jakob-moeller.com>
…snapshotting
- tweaks to summary to clarify storage size concerns
- added uninstall workflow
- amended mainline code changes to cover in-built dynamic defaulting behavior
- added RPM Spec section
- added up/down grade strat
- added version skew strat
- added alternatives
Looks really good already; I'm mostly nitpicking details on the LVMS + snapshotter and the RPM replacing the internals. Consider this a general LGTM from my side!
> ## Version Skew Strategy
>
> Risk of version skew is mitigated because the LVMS and CSI components are built from the same release image as the
I'm a bit confused by this statement. LVMS was never built with the same release image and had to be rebased; what would be the process for that now?
What I mean to say is that the LVMS and CSI versions are tracked by the operator bundle, which is then used to correlate the MicroShift version to the LVMS/CSI versions.
> ## Alternatives
>
> Use the MicroShift or LVMD config to determine deployment of LVMS and/or snapshotting. It does not satisfy the goal of
> optionalizing the installation of the components. Instead, this solution would simply enable/disable the components.
Could you quickly explain why a MicroShift config flag would not optionalize the installation of the components? (Maybe I'm just missing something here.)
Using a flag to optionalize storage has some drawbacks (off the top of my head):
- MicroShift carries some tech debt for rebasing, deploying, and configuring LVMS.
- Treating LVMS as a first-class citizen in MicroShift isn't necessary; it's just a cluster workload like any other CSI driver.
- Should the default be 'on' or 'off'? What should happen if the toggle is switched from 'on' to 'off'? However these are answered, it means more responsibility for MicroShift, not less.
- Disk consumption is another consideration. For offline deployments, should MicroShift still include the LVMS images just in case users switch it on after starting MicroShift?
- Should MicroShift add support for another driver, how should this be implemented? An enum field would be fine in the config, but it'd also mean MicroShift would have to treat that driver as first-class as well. This can't scale well at all.
> ## Graduation Criteria
>
> ### Dev Preview -> Tech Preview
How would we be able to run Dev Preview of the RPM and the old installation method through microshift core in parallel?
I do not think we need Dev or Tech preview. Let's release it in GA in 4.17.
This change IMO is A) too big to handle in one shot and B) going to have too much impact on the LVMS / MicroShift release procedure in 4.17 together with the LVMS MicroShift migration; it feels like we are forcing a big bang.
Pushing back. From the user perspective, this is a relatively easy change to existing blueprints. Add this to new installs:

```toml
[[packages]]
name = "microshift-lvms"
version = "*"
```
Layers built without this and which are then pushed to 4.16 clusters are unaffected.
WRT the migration work, where do you see this becoming overly complex?
Mainly in making sure we are not breaking existing users together with the LVMS migration. This is more of a "too big for engineering to handle and keep stable" kind of thing for me. But I'll wait for some other opinions; I'm not gonna be adamant about it :)
> - A `microshift-lvms` rpm post-install script that determines what volume group, if any, should be specified.
>
> - **RPM Spec**
>   - New RPM specs will be written for `microshift-lvms` and `microshift-lvms-snapshotting` packages. These have the
Since snapshotting can be used for any CSI driver, WDYT about microshift-snapshotting?
That way anyone who is just interested in snapshotting capability can install that RPM and then install their own CSI driver + VolumeSnapshotClass.
LVMS can already smartly create or ignore VolumeSnapshotClass creation so this would not need any extra logic paths.
Addressed in an earlier comment. LVMS and Snapshotting are considered as separate installs in the EP
> - **RPM Spec**
>   - New RPM specs will be written for `microshift-lvms` and `microshift-lvms-snapshotting` packages. These have the
>     following characteristics:
>     - `microshift-lvms-snapshotting` will depend on `microshift-lvms`.
I think we could drop that dependency and let LVMS dynamically adjust to whether VolumeSnapshotClass support exists in the cluster.
That's a good idea, but it also means this work would be blocked until the LVMS change was accepted and implemented, which doesn't make sense. Instead, let this change and that one be developed in parallel at their own velocities. Whenever the LVMS change is implemented, it'd just require a one-line change in the microshift rpm spec.
Pro: unblocks engineering on both fronts.
Contra: this will mean that the RPM might need to keep that dependency in the future. Maybe I'm just wondering why we don't have a microshift-snapshotting RPM with all the snapshotting stuff?
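For reference, the dependency being debated here would amount to a single `Requires:` line in a (hypothetical) standalone spec file, which is what makes it cheap to drop later:

```spec
# Sketch of a standalone microshift-lvms-snapshotting spec; the package
# names come from the EP, everything else is illustrative.
Name:     microshift-lvms-snapshotting
Summary:  CSI snapshot controller manifests for MicroShift
# The dependency under discussion; removing it is a one-line change.
Requires: microshift-lvms = %{version}
```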
> file at /etc/microshift/lvmd.yaml. The manifests required to deploy and configure the driver and snapshotter are baked
> into the MicroShift binary during compilation and are deployed during runtime. As it is now, even if a user wanted to
> disable the driver and/or snapshotter, MicroShift would deploy them again when the service restarts. Additionally, the
> LVMS and CSI images are always packaged together with MicroShift and thus always consume a certain amount of storage
We need an estimate of the consumed resources so that it can guide us further in the proposal.
> 2. User specifies an ostree blueprint which includes the following sections:
>    1. Packages: microshift, microshift-greenboot, microshift-networking, microshift-selinux, microshift-lvms,
>       microshift-lvms-snapshotting
>    2. File (Optional): lvmd.yaml
Are we planning to install a default lvmd.yaml file? If so, specifying it in the blueprint would override the default.
Let's clarify this, please.
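Putting the two blueprint sections from the quoted step together, a user blueprint might look like the sketch below. The package names come from the EP; the `[[customizations.files]]` stanza and the lvmd.yaml contents are assumptions about one particular user setup, and whether that file overrides a packaged default is exactly the open question in this thread.

```toml
# Hypothetical ostree blueprint installing MicroShift with optional LVMS
# and snapshotting, plus a user-provided lvmd.yaml.
name = "microshift-with-lvms"
version = "0.0.1"

[[packages]]
name = "microshift-lvms"
version = "*"

[[packages]]
name = "microshift-lvms-snapshotting"
version = "*"

# Optional user-provided lvmd.yaml (the "File (Optional)" section above).
[[customizations.files]]
path = "/etc/microshift/lvmd.yaml"
data = """
socket-name: /run/lvmd/lvmd.socket
device-classes:
  - name: default
    volume-group: rhel
    default: true
"""
```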
> 4. User checks that there are no LVMS PVs, PVCs, or Snapshots in the cluster. If there are, it is the user's
>    responsibility to prevent unintentional data loss or LVM volume orphaning before proceeding.
> 5. User uninstalls microshift-lvms and microshift-lvms-snapshotting rpms.
> 6. User deletes the relevant cluster API resources
Should we swap steps 4 and 5? I mean, when is the data orphaned?
Is it after RPM uninstall, or after the actions described in step 6?
> ## Test Plan
>
> There are already tests for RPM and Ostree install processes. This proposal only slightly alters the installation
> process and thus will require only minimal changes to existing tests.
We will need to ensure that MicroShift can function normally without the optional storage components.
It's true that 100% of tests currently assume the presence of a CSI driver.
No MicroShift components require dynamically provisioned persistent storage, thus they have no dependency on CSI (LVMS or otherwise).
add section on default logic in rpm; remove dev -> TP and TP -> GA, as it doesn't fit target version
Co-authored-by: Gregory Giguashvili <85498834+ggiguash@users.noreply.github.com>
It's been decided that an RPM-based approach to optional installation of LVMS and CSI components is neither feasible given the total changes required nor compatible with incoming changes (openshift/microshift#3535). This PR is closed.