
K8SSAND-1180 ⁃ How do we gracefully increase storage capacity via cass-operator while Cass Datacenter, Statefulset etc are in service with incoming workloads #263

Open
mparikhcloudbeds opened this issue Jan 20, 2022 · 26 comments
Assignees
Labels
assess (Issues in the state 'assess'), question (Further information is requested), zh:Assess/Investigate

Comments

@mparikhcloudbeds

mparikhcloudbeds commented Jan 20, 2022

Following up on the thread below, I wanted to get an update:
https://community.datastax.com/questions/12269/index.html

Environment:

  • AWS EKS and AWS EBS
  • Cass-Operator : 1.9
  • Server Image : DSE 6.8.18 and/or OSS 3.11.11

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1180
┆priority: Medium

@mparikhcloudbeds mparikhcloudbeds added the question Further information is requested label Jan 20, 2022
@sync-by-unito sync-by-unito bot changed the title How do we gracefully increase storage capacity via cass-operator while Cass Datacenter, Statefulset etc are in service with incoming workloads K8SSAND-1180 ⁃ How do we gracefully increase storage capacity via cass-operator while Cass Datacenter, Statefulset etc are in service with incoming workloads Jan 20, 2022
@burmanm
Contributor

burmanm commented Jan 21, 2022

Hi, does your PV provider support PVC volume expansion?

@mparikhcloudbeds
Author

Hi, does your PV provider support PVC volume expansion?

@burmanm - Yes, the storage class that we are using has the following property.
allowVolumeExpansion: true
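(For reference, one way to confirm this is to query the flag directly on the StorageClass; the class name `gp2` below is just a placeholder for your actual class:)

```shell
# Print the allowVolumeExpansion flag of a StorageClass
# (replace "gp2" with your actual StorageClass name).
kubectl get storageclass gp2 -o jsonpath='{.allowVolumeExpansion}'
# PVCs using this class can only be grown if this prints "true".
```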

@mparikhcloudbeds
Author

@burmanm - Following up to see if there's an update on this?

@burmanm
Contributor

burmanm commented Jan 28, 2022

Hey, sorry. The process of expanding a PVC used by a StatefulSet is a bit tricky and involves manual operations (a restriction of Kubernetes). Sadly, my local instance did not support the feature, but I'll try to create an example with documented steps shortly.

@mparikhcloudbeds
Author

Thanks @burmanm.

Is this something on the roadmap of cass-operator project?

@adejanovski adejanovski added zh:To Do Issues in the ZenHub pipeline 'To Do' zh:To-Do and removed zh:To Do Issues in the ZenHub pipeline 'To Do' labels Mar 30, 2022
@bradfordcp
Member

It's a feature we would like to see, but unfortunately it has not been scheduled yet. We have identified the steps to resolve the issue, but it will require a bit of time to implement.

@counter2015

counter2015 commented May 6, 2022

Hey, sorry. The process of expanding a PVC used by a StatefulSet is a bit tricky and involves manual operations (a restriction of Kubernetes). Sadly, my local instance did not support the feature, but I'll try to create an example with documented steps shortly.

@burmanm Could you provide more details about this?
I have a 4-node cluster and its disk usage is almost full.
A workaround is to add nodes to the cluster, let the data rebalance, and then run cleanup, but that wastes CPU and memory resources.

@discostur

@counter2015 you can easily increase your storage manually:

  • set the new storage capacity in your PVCs
  • restart the Cassandra pods one by one

Then your PVCs should automatically be resized by your storage CSI driver.
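Those two steps might look roughly like this (the PVC and pod names below are invented placeholders, and this assumes the StorageClass has allowVolumeExpansion: true and the CSI driver supports expansion):

```shell
# 1. Set the new capacity on the PVC (placeholder name).
kubectl patch pvc server-data-cluster1-dc1-default-sts-0 \
  --patch '{"spec": {"resources": {"requests": {"storage": "100Gi"}}}}'

# 2. Restart the Cassandra pods one at a time so the filesystem
#    resize is picked up (some CSI drivers resize online and do
#    not even need the restart).
kubectl delete pod cluster1-dc1-default-sts-0
```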

@counter2015

counter2015 commented Jun 9, 2022

@discostur I am not sure whether the PVC capacity will be changed by the operator after I edit the datacenter YAML file.
In the end, I increased storage capacity by creating a new datacenter and migrating the data from the old dc1 to the new dc2.

@discostur

@counter2015 No, it does not! I edited my datacenter YAML file and nothing was changed in the PVC/PV. So I edited the PVC manually and the storage was resized.

@jsanda
Contributor

jsanda commented Jun 17, 2022

The process is actually a bit more involved to do it safely.

First, delete the StatefulSet without deleting its pods. This can be done, for example, with kubectl delete --cascade=false.

Next, make sure that persistentVolumeReclaimPolicy on the PV is set to Retain. Remove the claim reference, then delete the PVC.

Now go ahead and expand the volume, and update the capacity in the PV spec.

Create a new PVC that will bind to the PV. The name of the PVC needs to be the same as the name of the old one.

Lastly, recreate the StatefulSet. The StatefulSet controller finds the existing PVCs and pods, and the StatefulSet will immediately move into the ready state (assuming the pods are ready).
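The steps above can be sketched as shell commands. All resource names here are invented placeholders, and this is an untested outline rather than a verified runbook:

```shell
# 1. Delete the StatefulSet but keep its pods running
#    (--cascade=orphan is the modern spelling of --cascade=false).
kubectl delete statefulset cluster1-dc1-default-sts --cascade=orphan

# 2. Make sure the PV will survive PVC deletion.
kubectl patch pv pvc-0000-placeholder \
  --patch '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'

# 3. Remove the claim reference so the PV becomes Available again,
#    then delete the PVC.
kubectl patch pv pvc-0000-placeholder --type json \
  --patch '[{"op": "remove", "path": "/spec/claimRef"}]'
kubectl delete pvc server-data-cluster1-dc1-default-sts-0

# 4. Expand the volume at the storage provider, then record the
#    new capacity in the PV spec.
kubectl patch pv pvc-0000-placeholder \
  --patch '{"spec": {"capacity": {"storage": "100Gi"}}}'

# 5. Recreate a PVC with the SAME name as the old one so it binds
#    to the retained PV, then recreate the StatefulSet (e.g. by
#    re-applying its original manifest).
```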

@counter2015

@jsanda Is there any risk in editing the PVC size directly?

@jsanda
Contributor

jsanda commented Jun 17, 2022

That may work and might be easier than what I prescribed. I would need to do some testing/investigation to be certain.

@okgolove

okgolove commented Jan 5, 2023

prometheus-operator (which also uses a StatefulSet for Prometheus pods) offers this approach. But it does not work for k8ssandra because of the admission webhook:

admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig

My example:

k8ssandracluster:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    serverVersion: "4.0.3"
    serverImage: k8ssandra/cass-management-api:4.0.3
    telemetry:
      prometheus:
        enabled: true
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: gp3-multizone
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
    config:
      jvmOptions:
        heapSize: 512M
    datacenters:
      - metadata:
          name: dc1
        size: 9
        racks:
          - name: r1
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1a
          - name: r2
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1b
          - name: r3
            nodeAffinityLabels:
              onairent.live/node-type: cassandra-node
              topology.kubernetes.io/zone: eu-north-1c
  • Change storage to 150Gi
  • Apply changed manifest
  • Patch PVCs
for p in $(kubectl get pvc -l cassandra.datastax.com/datacenter=dc1 -o jsonpath='{range .items[*]}{.metadata.name} {end}'); do \
  kubectl patch pvc/${p} --patch '{"spec": {"resources": {"requests": {"storage":"150Gi"}}}}'; \
done
  • Delete statefulsets
kubectl delete statefulset -l cassandra.datastax.com/datacenter=dc1 --cascade=orphan

After that, no changes are applied to the Cassandra cluster due to the error mentioned above. Even if I try to resize my cluster, I get the error and nothing happens.

@adziura-ledger

I'm having the same issue described in the previous comment.
What is the procedure to increase storage capacity in this case?

@chandapukiran

I directly edited the PVCs and restarted the pods in my test environment. Nothing is broken; I can see the new size reflected in the PVCs and can still access the test data.

@okgolove

okgolove commented May 31, 2023

Check the prometheus-operator resizing manual. It works fine for k8ssandra as well.

@chandapukiran

@okgolove I tried the steps from the prometheus-operator resizing manual and they worked for me, but when I deleted the cluster and tried again on a new cluster, I got the error you mentioned above. Is it still working for you?

Error from server (CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec): admission webhook "vcassandradatacenter.kb.io" denied the request: CassandraDatacenter write rejected, attempted to change storageConfig.CassandraDataVolumeClaimSpec

@adejanovski adejanovski added the assess Issues in the state 'assess' label Jun 7, 2023
@okgolove

okgolove commented Jun 7, 2023

@chandapukiran did you change the storage size in the cluster manifest before recreating?

@chandapukiran

@okgolove No. Basically, I created a cluster with a default size and later tried to change the size by modifying the Cassandra object.

@okgolove

okgolove commented Jun 7, 2023

@chandapukiran Ah, yes. The admission webhook won't let you make this change. I disabled it temporarily, then made the modification.

@chandapukiran

@okgolove Oh, OK. Could you share the commands to disable/enable the admission webhook?

@okgolove

okgolove commented Jun 7, 2023

@chandapukiran how did you install the operator? If via the Helm chart, then just set

cass-operator:
  admissionWebhooks:
    enabled: false

Or just delete the admission webhook via kubectl.
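For the kubectl route, something like the following should work. The configuration name below is an assumption based on typical cass-operator deployments, so list the configurations first and use the actual name:

```shell
# List validating webhook configurations to find the cass-operator one.
kubectl get validatingwebhookconfigurations

# Delete it (the name here is an assumption; use the one listed above).
kubectl delete validatingwebhookconfiguration \
  cass-operator-validating-webhook-configuration
```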

@chandapukiran

Thanks @okgolove. I see it is already disabled in my Helm chart, but I now understand why it worked for me before and not now: I was playing with k8ssandra-operator in another namespace, and that was causing the issue. Now I am good.

@surajk94

Adding the exact steps to follow, for quick reference:

  • disable admissionWebhooks in the operator and re-deploy it: cass-operator: admissionWebhooks: enabled: false
  • stop the required datacenters and set the new volume size in the K8ssandraCluster: set the stopped: true flag in each of the required datacenters in the datacenters list and apply the YAML file using kubectl apply -f <file>
  • manually edit the PVC for each node in the cluster to the required size: use kubectl edit pvc <pvc-name> -n <namespace> and change the size in the spec section
  • delete the underlying StatefulSets using the orphan deletion strategy: kubectl delete statefulset <sts-name> -n <namespace> --cascade=orphan
  • remove the stopped flag in the K8ssandraCluster YAML file and apply the changes to restart the stopped datacenters
  • re-enable admissionWebhooks in the operator and re-deploy it
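Put together, the middle steps might look roughly like this (the datacenter label value dc1, the size 150Gi, and the manifest file name are placeholders; treat this as a sketch under the assumptions above, not a verified runbook):

```shell
# 1. With the webhook disabled and "stopped: true" plus the new
#    storage size set in the K8ssandraCluster manifest, apply it.
kubectl apply -f k8ssandra-cluster.yaml

# 2. Patch every PVC of the datacenter to the new size.
for p in $(kubectl get pvc -l cassandra.datastax.com/datacenter=dc1 \
    -o jsonpath='{range .items[*]}{.metadata.name} {end}'); do
  kubectl patch pvc "${p}" \
    --patch '{"spec": {"resources": {"requests": {"storage": "150Gi"}}}}'
done

# 3. Delete the StatefulSets but keep their pods and PVCs.
kubectl delete statefulset \
  -l cassandra.datastax.com/datacenter=dc1 --cascade=orphan

# 4. Remove "stopped: true" from the manifest and re-apply it to
#    restart the datacenter, then re-enable the webhook.
kubectl apply -f k8ssandra-cluster.yaml
```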

@burmanm
Contributor

burmanm commented Dec 19, 2023

Implementation ticket: #602
