Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPLAT-1808: Add vSphere multi disk support #2028

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

vr4manta
Copy link
Contributor

@vr4manta vr4manta commented Sep 16, 2024

SPLAT-1808

Changes

  • Create new feature gate for vSphere multi disk
  • Create initial CRD changes for adding additional disks to vSphere machines

Blocks

Notes

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 16, 2024
Copy link
Contributor

openshift-ci bot commented Sep 16, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

openshift-ci bot commented Sep 16, 2024

Hello @vr4manta! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 16, 2024
@vr4manta
Copy link
Contributor Author

/test all

@kannon92
Copy link

Do you have a Jira ticket for this work? I'm trying to research disk support in openshift and this is of interest to me.

@vr4manta
Copy link
Contributor Author

/retest

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2024
@vr4manta
Copy link
Contributor Author

Do you have a Jira ticket for this work? I'm trying to research disk support in openshift and this is of interest to me.

@kannon92 , We are investigating this as well. I am using our spike to track the work and am getting stories / epic information for you. I'll attach the JIRAs to the PR above as well.

Spike: https://issues.redhat.com/browse/SPLAT-1789

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 18, 2024
@openshift-ci openshift-ci bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 18, 2024
@vr4manta vr4manta changed the title WIP: Add vSphere multi disk support Add vSphere multi disk support Sep 23, 2024
@vr4manta vr4manta marked this pull request as ready for review September 23, 2024 13:36
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 23, 2024
Copy link
Contributor

@rvanderp3 rvanderp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, i left a couple of comments. also, will there be a need for additional unit tests?

// Disks is a list of non OS disks to be attached to the VM.
// +openshift:enable:FeatureGate=VSphereMultiDisk
// +openshift:validation:FeatureGateAwareMaxItems:featureGate="",maxItems=0
// +openshift:validation:FeatureGateAwareMaxItems:featureGate=VSphereMultiDisk,maxItems=15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a max of 15 related to any documented limitation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just put it in place for now since scsi controller can only house up to 16 if I remember correctly (so 1 for OS disk). Current design was to keep it simple and we just auto add to primarly scsi controller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limit will need to be explained in the godoc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update godoc to contain this info.

@@ -172,6 +179,15 @@ type NetworkDeviceSpec struct {
AddressesFromPools []AddressesFromPool `json:"addressesFromPools,omitempty"`
}

// VSphereDisk describes additional disks for vSphere.
type VSphereDisk struct {
// The device name exposed. Used to identify the disk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not defined, what is the behavior? is a disk created for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do not specify a device name, it will still create the disk with the size you defined. The deviceName field is to be used by admin to sort of remind them what disk is meant to be used for (if they so desire).

@vr4manta
Copy link
Contributor Author

overall lgtm, i left a couple of comments. also, will there be a need for additional unit tests?

Unit tests are in the webhook for MAO. The CEL is being ignored based on current behavior of provider specs.

@vr4manta
Copy link
Contributor Author

/assign @JoelSpeed
Ready for review

@vr4manta vr4manta changed the title Add vSphere multi disk support SPLAT-1808: Add vSphere multi disk support Sep 23, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 23, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 23, 2024

@vr4manta: This pull request references SPLAT-1808 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

SPLAT-1808

Changes

  • Create new feature gate for vSphere multi disk
  • Create initial CRD changes for adding additional disks to vSphere machines

Notes

  • This is being done as part of vSphere work for this feature

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rvanderp3
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2024
Copy link
Contributor

openshift-ci bot commented Sep 23, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rvanderp3, vr4manta
Once this PR has been reviewed and has the lgtm label, please ask for approval from joelspeed. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a paired MAO PR that implements validations for this new API field that I can review?

reportProblemsToJiraComponent("splat").
contactPerson("vr4manta").
productScope(ocpSpecific).
enableIn(configv1.DevPreviewNoUpgrade, configv1.TechPreviewNoUpgrade).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you ready for techpreview or would you prefer just to start in dev preview for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can start in Dev if you prefer. I have the initial logic completed for creating extra disks for masters and workers. This is just the first step of the bigger picture, but i'll take your guidance on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR for MAO: openshift/machine-api-operator#1290. I'll also put up in top comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this effort is likely to be long and span multiple releases?

Comment on lines 75 to 76
// +openshift:validation:FeatureGateAwareMaxItems:featureGate="",maxItems=0
// +openshift:validation:FeatureGateAwareMaxItems:featureGate=VSphereMultiDisk,maxItems=15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the whole field featuregated, rather than feature gated limits (though note this isn't actually part of a CRD so these markers are just for show at the moment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can do that. Yep, I had to make all this logic in the webhook of MAPI operator due to provider specs not being proper CRDs.

// Disks is a list of non OS disks to be attached to the VM.
// +openshift:enable:FeatureGate=VSphereMultiDisk
// +openshift:validation:FeatureGateAwareMaxItems:featureGate="",maxItems=0
// +openshift:validation:FeatureGateAwareMaxItems:featureGate=VSphereMultiDisk,maxItems=15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This limit will need to be explained in the godoc

@@ -70,6 +70,13 @@ type VSphereMachineProviderSpec struct {
// When using LinkedClone, if no snapshots exist for the source template, falls back to FullClone.
// +optional
CloneMode CloneMode `json:"cloneMode,omitempty"`
// Disks is a list of non OS disks to be attached to the VM.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Godoc should explain validations, and should start with the serialised form (disks) of the field name.

What does a user need to do for each disk here, do they need to specify any software side configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the logic is to just create the disk and attach to vsphere VMs. What user does after that can be anything. They can use LSO (local storage operator) or the future auto mount that we are currently working on to put these disks into certain locations.

// +openshift:validation:FeatureGateAwareMaxItems:featureGate=VSphereMultiDisk,maxItems=15
// +kubebuilder:validation:XValidation:rule="self == oldSelf | size(self) == size(oldSelf)",message="additional disks cannot be added or removed once set"
// +optional
Disks []*VSphereDisk `json:"disks,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why []*VsphereDisk instead of []VSphereDisk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my preference. I can change to not be pointer if you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed pointer.

// +openshift:enable:FeatureGate=VSphereMultiDisk
// +openshift:validation:FeatureGateAwareMaxItems:featureGate="",maxItems=0
// +openshift:validation:FeatureGateAwareMaxItems:featureGate=VSphereMultiDisk,maxItems=15
// +kubebuilder:validation:XValidation:rule="self == oldSelf | size(self) == size(oldSelf)",message="additional disks cannot be added or removed once set"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this actually mean? This is part of the vSphere providerSpec on the Machine right? So, we don't expect it to be changed once the machine is created anyway, that's true of most fields

When this is present on a MachineSet though, then it should be mutable right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct. I expect machines to be not mutable, but for machineset, I'll follow the patterns of other fields in the machine def section.

type VSphereDisk struct {
// The device name exposed. Used to identify the disk.
// +optional
DeviceName *string `json:"deviceName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't need a pointer here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll update that.

@@ -172,6 +179,15 @@ type NetworkDeviceSpec struct {
AddressesFromPools []AddressesFromPool `json:"addressesFromPools,omitempty"`
}

// VSphereDisk describes additional disks for vSphere.
type VSphereDisk struct {
// The device name exposed. Used to identify the disk.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Godoc should start with the serialised version.

What are the requirements of this string? Max/Min/Characters/Patterns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a generic name for admin to use to identify the disk definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated godoc with these details.

// The device name exposed. Used to identify the disk.
// +optional
DeviceName *string `json:"deviceName,omitempty"`
// SizeGB is the size of the disk (in GB).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serialised version to start.

What's the minimum and maximum values here?

Do we mean GB or GiB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll update that.

// +optional
DeviceName *string `json:"deviceName,omitempty"`
// SizeGB is the size of the disk (in GB).
SizeGB int64 `json:"sizeGb"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explicit +kubebuilder:validation:Required marker please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will add. thanks.

@JoelSpeed
Copy link
Contributor

Is this feature currently supported in Cluster API?

@vr4manta
Copy link
Contributor Author

Is this feature currently supported in Cluster API?

No. CAPI only supports additional disks that are created in the template.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 24, 2024

@vr4manta: This pull request references SPLAT-1808 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

SPLAT-1808

Changes

  • Create new feature gate for vSphere multi disk
  • Create initial CRD changes for adding additional disks to vSphere machines

Blocks

Notes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Sep 24, 2024
Copy link
Contributor

openshift-ci bot commented Sep 24, 2024

New changes are detected. LGTM label has been removed.

@JoelSpeed
Copy link
Contributor

No. CAPI only supports additional disks that are created in the template.

So, if we merge this feature, this will create a feature gap between MAPV and CAPV? How would you suggest we migrate users using this feature?

We likely need to implement this upstream in CAPV before we can do any migration right?

@vr4manta
Copy link
Contributor Author

No. CAPI only supports additional disks that are created in the template.

So, if we merge this feature, this will create a feature gap between MAPV and CAPV? How would you suggest we migrate users using this feature?

We likely need to implement this upstream in CAPV before we can do any migration right?

Yes. We will need to work with CAPV to see what they have thought about in the past for dynamic disk configs for VMs. The installer enhancement for this feature currently uses the postProvision hook after CAPI machines (control plane) are created and adds the disks to the VMs. I have been trying to follow what some of the other providers have for naming conventions, but each platform seems to use slightly different field names. For vSphere specific, it would be good to see if we can use the current additionalDisks field to add them if they do not exist in the template.

@vr4manta
Copy link
Contributor Author

vr4manta commented Oct 2, 2024

/hold
Looking into upstream enhancement (CAPV) vs having a a drift in potential configs.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2024
Copy link
Contributor

openshift-ci bot commented Oct 11, 2024

@vr4manta: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-serial-techpreview acc042b link true /test e2e-aws-serial-techpreview
ci/prow/e2e-upgrade acc042b link true /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants