
Fix storageclass selection #150

Merged
merged 5 commits into master on Nov 25, 2020

Conversation

csibbitt (Collaborator)

While testing compatibility with OpenShift Container Storage, I found that the existing settings for picking a storage class did not work. I manually tested the storageClassName element in its new location in the Alertmanager and Prometheus templates and it worked as expected, so I wrote this up to adjust.

* selector and class were in the wrong part of the Prometheus and Alertmanager templates (see the sketch after this list)
* They were missing from the Elasticsearch template
* The existing storage_resources var was doing nothing, so I'm removing it
  * No need to deprecate or migrate; if anyone used it, it had no effect
* Ran `operator-sdk generate bundle --channels stable,latest --default-channel stable --version 1.1.1`
* Adjusted CSV UI hints
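
For reference, a minimal sketch of where the two fields belong on the rendered Prometheus object (the Alertmanager object takes the same `spec.storage.volumeClaimTemplate` block); this assumes the prometheus-operator API, and the values shown are purely illustrative:

```yaml
# Sketch only: illustrative values, not the actual template output.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: default
spec:
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: ocs-storagecluster-ceph-rbd   # the "class" setting lands here
        selector:                                        # the "selector" setting lands here (only rendered when set)
          matchLabels:
            tier: fast                                   # hypothetical label
        resources:
          requests:
            storage: 20Gi
```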
@csibbitt (Collaborator Author) commented Nov 24, 2020

Not yet tested

@@ -55,9 +55,6 @@ spec:
        storageClass:
          description: Storage class name used for Alertmanager PVC
          type: string
        storageResources:
csibbitt (Collaborator Author):

This param was being used in the wrong part of the template, so it could never have had any effect. I think it's safe to simply remove it.

@@ -482,6 +481,8 @@ spec:
              value: service-telemetry-operator
            - name: ANSIBLE_GATHERING
              value: explicit
            - name: PROMETHEUS_WEBHOOK_SNMP_IMAGE
csibbitt (Collaborator Author):

This is not my change, but it came in as part of the bundle update. The bundle must not have been regenerated after this was added to build/operator.yaml.

Member:

Yeah, I noticed that I don't get the change unless I run it locally in the branch. I must have added it post-1.1 tag :(

It also didn't show up when I used `--output-dir`, but it worked when the output was in the local repo. Thanks for adding that!

@@ -1,6 +1,6 @@
 annotations:
   operators.operatorframework.io.bundle.channel.default.v1: stable
-  operators.operatorframework.io.bundle.channels.v1: latest
+  operators.operatorframework.io.bundle.channels.v1: stable,latest
csibbitt (Collaborator Author):

@leifmadsen I used the bundle update command you showed me. Is this the right channels setting? (I haven't looked this up yet, so I'm not 100% sure what it means.)

Member:

Oh right, sorry; actually you want `--channels latest` and `--default-channel stable`.

  • `--channels` is a list of channels this bundle can be a part of; we only want HEAD to be in the latest channel
  • `--default-channel` defines the bundle's default preferred channel; we want this to be stable so that we don't auto-install the latest channel
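
If I'm reading that right, the regenerate command from the description would then become something like this (same version flag as before; a sketch only, not verified against the repo's tooling):

```sh
operator-sdk generate bundle \
  --channels latest \
  --default-channel stable \
  --version 1.1.1
```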

@csibbitt (Collaborator Author) commented Nov 24, 2020

Removed the 20Gi to 20G change because it resulted in the following when used against an existing deployment:

# persistentvolumeclaims "elasticsearch-data-elasticsearch-es-default-0" was not valid:
# * spec.resources.requests.storage: Forbidden: field can not be less than previous value
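
For context, the field in that error is `spec.resources.requests.storage` on the claim itself; Kubernetes only lets it grow on an existing PVC (and only when the StorageClass supports expansion), and since 20G (10^9 bytes) is smaller than 20Gi (2^30 bytes), the change reads as a shrink and is rejected. A minimal sketch of the claim shape involved (name and size taken from the error above, access mode assumed):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data-elasticsearch-es-default-0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi   # may only be increased on an existing claim, never decreased
```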

@@ -4,7 +4,7 @@ LABEL operators.operatorframework.io.bundle.mediatype.v1=registry+v1
 LABEL operators.operatorframework.io.bundle.manifests.v1=manifests/
 LABEL operators.operatorframework.io.bundle.metadata.v1=metadata/
 LABEL operators.operatorframework.io.bundle.package.v1=service-telemetry-operator
-LABEL operators.operatorframework.io.bundle.channels.v1=latest
+LABEL operators.operatorframework.io.bundle.channels.v1=stable,latest
Member:

Should probably revert this back to just `latest`, since that's what the HEAD bundle should target.

* Not allowed to supply a blank storageClassName, and there is no natural default
* Even empty selectors cause failures on provisioners that do not support them (see the sketch after this list):
  * 'Failed to provision volume with StorageClass "standard": claim.Spec.Selector is not supported for dynamic provisioning on Cinder'
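
Since this appears to be an Ansible-based operator, the templates are Jinja2, so the guard presumably looks something like the sketch below; the variable names are hypothetical, not the actual template vars:

```yaml
# Hypothetical sketch: only emit the optional fields when they are actually set.
      volumeClaimTemplate:
        spec:
{% if storage_class is defined and storage_class | length > 0 %}
          storageClassName: {{ storage_class }}
{% endif %}
{% if storage_selector is defined and storage_selector %}
          selector: {{ storage_selector | to_json }}
{% endif %}
          resources:
            requests:
              storage: 20Gi
```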
@csibbitt (Collaborator Author)

Testing

  1. Started from Leif's already manually deployed STF (no infrawatch subscriptions to mess with things)

  2. Deleted the existing stf object and all PVCs: `oc delete stf default; oc delete pvc --all`

  3. Built my operator with build.sh

  4. Edited deploy/olm-catalog/service-telemetry-operator/manifests/service-telemetry-operator.clusterserviceversion.yaml to point to my new image and ran `oc apply -f <..sto-csv.yaml>`

  5. Deployed a default STF from the UI, all defaults except that I turned Elasticsearch on

     1. Verified that the "resources" param is removed and that all three had the "selector" and "class" params
  6. Verified the pvcs were created with the default storage class ("standard", Cinder-backed, non-OCS):

    $ oc get pvc
    NAME                                             STATUS       VOLUME                                     CAPACITY   ACCESS MODES       STORAGECLASS   AGE
    alertmanager-default-db-alertmanager-default-0   Bound        pvc-6b74e598-ff3a-4d1d-948f-94d193ae75f7   19Gi       RWO                standard       55s
    elasticsearch-data-elasticsearch-es-default-0    Bound        pvc-db3610dd-4c8a-426d-ac7b-08de2254035f   20Gi       RWO                standard       58s
    prometheus-default-db-prometheus-default-0       Bound        pvc-7054ce9b-4cc4-4368-9b7e-abc5e16d890d   19Gi       RWO                standard       68s
  7. Wiped it out: `oc delete stf default; oc delete pvc --all`

  8. Redeployed from the UI, this time selecting ocs-storagecluster-ceph-rbd from the pre-populated storageClass dropdown

  9. Verified the pvcs were created with the OCS storageclass:

    $ oc get pvc
    NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
    alertmanager-default-db-alertmanager-default-0   Bound    pvc-d568268e-1864-426d-9daa-4a69ade58862   19Gi       RWO            ocs-storagecluster-ceph-rbd   2m18s
    elasticsearch-data-elasticsearch-es-default-0    Bound    pvc-bba7664e-2e6a-4509-a292-c170d2041546   20Gi       RWO            ocs-storagecluster-ceph-rbd   45s
    prometheus-default-db-prometheus-default-0       Bound    pvc-d132e038-7279-4adc-81f9-cfe835d849fa   19Gi       RWO            ocs-storagecluster-ceph-rbd   56s

As of this patch, I'd say that STF supports OCS for persistent storage of metrics and events.

I did not test any scenarios involving changing the class of already provisioned storage.

I did not (really) test the selectors, but they appear to be in the right place now, since they fail when the provisioner does not support them.

@csibbitt csibbitt force-pushed the csibbitt-1176-fix-storageclass-selection branch from a4fad94 to b37660c on November 24, 2020 21:51

@csibbitt csibbitt merged commit 124b0a5 into master Nov 25, 2020
@csibbitt csibbitt deleted the csibbitt-1176-fix-storageclass-selection branch November 25, 2020 14:33