-
Notifications
You must be signed in to change notification settings - Fork 580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STOR-1803: add vsphere snapshot configuration fields to ClusterCSIDriver #1783
STOR-1803: add vsphere snapshot configuration fields to ClusterCSIDriver #1783
Conversation
@RomanBednar: This pull request references STOR-1803 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hello @RomanBednar! Some important instructions when contributing to openshift/api: |
@RomanBednar: This pull request references STOR-1803 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
b443ef7
to
c12e3f4
Compare
verify-crd-schema error is probably not related to this change as I have not changed |
c12e3f4
to
489a2ec
Compare
489a2ec
to
ed326f9
Compare
// +kubebuilder:validation:Optional | ||
// +openshift:enable:FeatureSets=TechPreviewNoUpgrade | ||
// +optional | ||
GlobalMaxSnapshotsPerBlockVolume *uint32 `json:"globalMaxSnapshotsPerBlockVolume,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all new fields:
- There should be some validation, for example the value should not be negative. Or semantics of negative values should be described.
- Are there any max values?
- Value
0
should have some description - is it a valid value? Does it turn off snapshot support for VSAN / VVOL / globally? - What are the impacts of setting the values high? I would expect higher values mean less performance, otherwise we can set the values to MAXINT by default and we don't need any configuration.
- It could help if there was link to vSphere docs, the questions could be answered there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be some validation, for example the value should not be negative. Or semantics of negative values should be described.
- Correct - currently the operator would fail to unmarshal as the field is uint32 type
- Adding validation for min/max volumes.
W0312 17:09:26.526662 8935 reflector.go:539] k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: failed to list *v1.ClusterCSIDriver: json: cannot unmarshal number -1 into Go struct field VSphereCSIDriverConfigSpec.items.spec.driverConfig.vSphere.globalMaxSnapshotsPerBlockVolume of type uint32
E0312 17:09:26.526720 8935 reflector.go:147] k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: Failed to watch *v1.ClusterCSIDriver: failed to list *v1.ClusterCSIDriver: json: cannot unmarshal number -1 into Go struct field VSphereCSIDriverConfigSpec.items.spec.driverConfig.vSphere.globalMaxSnapshotsPerBlockVolume of type uint32
Are there any max values?
- From VMware documentation:
Maximum of 32 snapshots are **supported** in a chain. However, for a better performance use only 2 to 3 snapshots.
- Adding validation for max value of 32.
- Setting any higher value in the config is not prevented currently:
[Snapshot]
global-max-snapshots-per-block-volume = 33
Value
0
should have some description - is it a valid value? Does it turn off snapshot support for VSAN / VVOL / globally?
- Not documented at all, after testing it seems like
0
value results in disabling the option and default is used. - Snapshots are not disabled when
0
is set - Setting min value to 1 should work fine as there is no reason to set
0
$ oc -n openshift-cluster-csi-drivers get cm/vsphere-csi-config -o yaml | grep global-max-snapshots-per-block-volume
global-max-snapshots-per-block-volume = 0
message: 'Failed to create snapshot: failed to take snapshot of the volume c0e2dc41-33de-4566-9323-9aa422741493: "rpc error: code = FailedPrecondition desc = the number of snapshots on the source volume c0e2dc41-33de-4566-9323-9aa422741493 reaches the configured maximum (3)"'
What are the impacts of setting the values high? I would expect higher values mean less performance, otherwise we can set the values to MAXINT by default and we don't need any configuration.
- This is documented under "Best practices for snapshots": https://kb.vmware.com/s/article/1025279
- 2-3 snapshots are recommended for better performance so we can expect a performance drop when using more than 3 snapshots
It could help if there was link to vSphere docs, the questions could be answered there.
- Do we actually want to link to external documentation from comments here?
- Adding the link to relevant doc: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. I think linking vSphere docs will help users to actually understand what the options are good for and they don't need to guess.
// overrides the global constraint if set, while it falls back to the global constraint if unset. | ||
// +kubebuilder:validation:Optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be better to use overrides globalMaxSnapshotsPerBlockVolume
instead of the global constraint
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack - changed.
204a82d
to
e8fa952
Compare
f9ce480
to
7ea864d
Compare
The API looks good to me, you probably need to fix the tests somehow. |
7ea864d
to
3277e7c
Compare
Yeah, so there are some changes in progress to change the process so new API fields are not enabled by feature set but feature gate, documentation should arrive soon. From current observations it seems there should be 3 separate files now for clustercsidriver CRD, that means we have to change CSO image a bit. Then verification requires a test file for each of those so I've added that, with a small test case for TechPreviewNoUpgrade with new fields - The
|
No. The test works in other PRs without any issues. |
1020a5b
to
3493251
Compare
@deads2k Given the simplicity and potential cadence of similar PRs we want to release this without TechPreview. With the new process which is based on feature gates, can this be achieved by simply including the feature gate in |
/lgtm |
3493251
to
61f5f93
Compare
// +optional | ||
TopologyCategories []string `json:"topologyCategories,omitempty"` | ||
|
||
// globalMaxSnapshotsPerBlockVolume is a global configuration parameter that applies to volumes on all kinds of | ||
// datastores. If unset it defaults to 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something like
If omitted, the platform chooses a default, which is subject to change over time, currently that default is 3.
This matches other fields. Rather than promising a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, changed.
// datastores. If unset it defaults to 3. | ||
// Increasing number of snapshots above 3 can have negative impact on performance, for more details see: https://kb.vmware.com/s/article/1025279 | ||
// Volume snapshot documentation: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html | ||
// +kubebuilder:validation:Minimum=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do I choose: "no snapshots"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already discussed here: #1783 (comment)
vmWare does not document this value but testing showed that it does not disable snapshots, but default is applied instead. I can add a clarifying statement:
Setting this value to 0 does not disable volume snapshots, but results in default value being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already discussed here: #1783 (comment)
Please add a sentence in the documentation indicating that snapshots cannot be disabled. On this item and the others if they cannot be disabled either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, for all 3 options as it's applicable to all of them.
config/v1/feature_gates.go
Outdated
reportProblemsToJiraComponent("Storage / Kubernetes External Components"). | ||
contactPerson("rbednar"). | ||
productScope(ocpSpecific). | ||
enableIn(Default, TechPreviewNoUpgrade). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TechPreview first please. Demonstrate stability (either via automation or QE sign off) and we can promote in 4.16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know TechPreview first is the safest and most common option, however this change is relatively trivial and only allows setting options of an existing driver that are not new. We're expecting more requests similar to this one from customers and would like to see if there's a possibility to ship this in shorter time than two cycles.
Would it be possible to have this enabled by default in 4.16 (without TechPreview) if we have a sign off from QE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before becoming accessible-by-default, we must have evidence of completeness and reliability. The preferred mechanism is to run automated CI tests in our existing TechPreview jobs and see 95+% pass rates over at least 14 runs. The alternative mechanism is to have QE sign off on a promotion PR. That can all happen in a single release. For instance,
- in 4.16
- merge as techpreview
- add functionality and tests
- after there are 14 runs, open PR promoting to GA
- CI automatically checks for the associated tests (see example failure here)
- PR is merged on green OR QE signs off on the PR and we override the automated stability check (we track data on this now in 4.16)
- we ship 4.16
So this doesn't delay availability in 4.16 assuming the feature is tested and is as stable as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k Removed Default cluster profile for this feature, can we merge this now as TechPreview?
@@ -279,8 +279,35 @@ type VSphereCSIDriverConfigSpec struct { | |||
// If cluster Infrastructure object has a topology, values specified in | |||
// Infrastructure object will be used and modifications to topologyCategories | |||
// will be rejected. | |||
// +listType=set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have failed our CRD schema checker. Why would this be a safe change to make for clients?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the checker was failing without listType, I've retried it again now and with the recent changes verification is passing without it - dropping this change.
61f5f93
to
5554b88
Compare
@deads2k PTAL again |
@@ -32146,6 +32146,21 @@ | |||
"description": "VSphereCSIDriverConfigSpec defines properties that can be configured for vsphere CSI driver.", | |||
"type": "object", | |||
"properties": { | |||
"globalMaxSnapshotsPerBlockVolume": { | |||
"description": "globalMaxSnapshotsPerBlockVolume is a global configuration parameter that applies to volumes on all kinds of datastores. If omitted, the platform chooses a default, which is subject to change over time, currently that default is 3. Setting this value to 0 does not disable volume snapshots, but results in default value being used. Increasing number of snapshots above 3 can have negative impact on performance, for more details see: https://kb.vmware.com/s/article/1025279 Volume snapshot documentation: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-E0B41C69-7EEB-450F-A73D-5FD2FF39E891.html", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RomanBednar If we mentioned the value could be set to 0
(TBH, I think we could reject the 0
value safely.), it seems we also need to change the minmum value setting in https://github.com/openshift/api/pull/1783/files#diff-5cb495d65e27aec64145d4043a9e32d029bab426765d92b147556d459be5735bR219 , right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Phaow The comment about 0
value is meant to be rather informative but not needed, and just explains why 0
does not make sense. I can write it better or remove it. The current validation (minimum: 1
) is rejecting 0
which is correct, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RomanBednar Yeah, current validation (minimum: 1) is rejecting 0 looks good to me, just a bit strange that we already reject 0
, just the explain if set to 0
which will still use the default maybe make a bit confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Phaow ok, removing
5554b88
to
8ff768f
Compare
8ff768f
to
d7b9ecd
Compare
@RomanBednar: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, jsafrane, RomanBednar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
c0feb35
into
openshift:master
cc @openshift/storage
Enhancement: openshift/enhancements#1563