
Support automatic resizing of volumes #35941

Closed
saad-ali opened this issue Nov 1, 2016 · 33 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@saad-ali
Member

saad-ali commented Nov 1, 2016

This is a feature request to support automatic resizing of volumes. For example, if a GCE PD is created at 50 GB and then resized to 100 GB, the user should be able to update the PV object to reflect the new size, and Kubernetes should automatically handle the resize, doing whatever is necessary to make the extra disk space available to pods. This is non-trivial since it requires repartitioning the disk (see https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd).
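A minimal sketch of what the user-facing side of such a resize might look like: after growing the PD out of band, the user bumps the PV's capacity and Kubernetes handles the rest. The PV name, disk name, and sizes here are hypothetical.

```yaml
# Hypothetical PV manifest fragment: after resizing the GCE PD, the user
# updates spec.capacity.storage and Kubernetes takes care of the resize.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pd-pv        # hypothetical name
spec:
  capacity:
    storage: 100Gi           # updated from 50Gi after the PD resize
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: example-pd       # hypothetical disk name
    fsType: ext4
```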

@saad-ali saad-ali added sig/storage Categorizes an issue or PR as relevant to SIG Storage. team/cluster kind/feature Categorizes issue or PR as related to a new feature. labels Nov 1, 2016
@westwin

westwin commented Feb 3, 2017

Is there any plan to support this?

@wclr

wclr commented Feb 3, 2017

So if I resize the disk manually in the compute console and then change the PV definition, will it take effect?

@westwin

westwin commented Feb 7, 2017

Yes, manually running oc edit pv does work, but it is not automatic resizing.

@wclr

wclr commented Feb 7, 2017

If you don't mind, I'll ask about the manual resizing procedure. I created an 8 GB PV in a cluster. What steps should be taken to resize it to 10 GB?
Following the instructions in https://cloud.google.com/compute/docs/disks/add-persistent-disk:

  • I resized it using the UI in the cloud console.
  • On the node I ran sudo resize2fs /dev/disk/by-id/google-[DISK_NAME].
  • In the cluster the volume still shows a capacity of 8 GB. If I try to change the definition to make the capacity 10G, I get the message: the server could not find the requested resource

What should be done to complete the cluster PV resize procedure?

I think the recommended manual resize procedure should be documented somewhere.

@westwin

westwin commented Feb 9, 2017

@whitecolor, I'm actually using openshift-origin; with the oc command line I can manually resize the PV via oc edit pv pv_id

@kow3ns
Member

kow3ns commented Mar 17, 2017

@saad-ali Are there any thoughts as to how this will interact with the PersistentVolumeController and PersistentVolumeClaims? In particular:

  1. If we resize a PV so that it is larger than a bound PVC, I would assume nothing should happen, as the PVC is bound to a PV that satisfies its storage requirements.
  2. If we resize a PV to a value that is smaller than a bound PVC, will it automatically unbind the PVC, since the storage requirement is no longer met? And if a Pod has mounted this PV via a PVC, will that Pod be evicted?

@Jan-M

Jan-M commented Mar 20, 2017

Are you expecting Kubernetes to run resize2fs?

This would be nice, as it would no longer require privileged pods to run resize2fs, but I can also see how this might not be Kubernetes' responsibility.

At first you could support only increasing the disk space; shrinking is a whole different topic.
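For reference, a hedged sketch of the kind of privileged pod this currently implies: a throwaway container, kept alive so you can exec in and run resize2fs against the attached device. All names here are hypothetical.

```yaml
# Hypothetical helper pod: privileged mode is exactly what makes this
# workaround undesirable, and what in-tree support would avoid.
apiVersion: v1
kind: Pod
metadata:
  name: resize-helper        # hypothetical name
spec:
  containers:
    - name: resizer
      image: alpine:3.5
      command: ["sleep", "9999"]   # keep the pod alive for kubectl exec
      securityContext:
        privileged: true           # required to run resize2fs on the device
```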

@saad-ali
Member Author

Feature request is being tracked here: kubernetes/enhancements#284

@speedplane

Until this feature request is implemented, are there any recommended workarounds to get resizing to work? After resizing the disk, do we need to log into each instance manually and run resize2fs? If we do so, will Kubernetes automatically recognize the newly added space?

@sjdweb

sjdweb commented Sep 6, 2017

+1 on @speedplane's question; has anyone had experience doing this?

@krogon-dp

krogon-dp commented Sep 8, 2017

This procedure is valid until Kubernetes introduces full support for resizing PVs. The feature is being tracked at kubernetes/enhancements#284

Resize Kubernetes Persistent Volume using Amazon Elastic Block Device

Resize EBS with full OS on host (Centos/Debian/Ubuntu)

WARNING The instructions below require root access to the host nodes, with disk utilities (e2fsprogs/e2fsprogs-extra) installed.

  1. Identify the volume in the AWS management console. The tags kubernetes.io/created-for/pvc/namespace and kubernetes.io/created-for/pvc/name are useful if the PV was created to satisfy a claim. Find and copy the tag kubernetes.io/created-for/pv/name (used later as the $PV_NAME variable).

  2. (Optional) Create a snapshot of the volume. This serves as a backup of the data.

  3. (Optional) Create a new volume from the snapshot; this is the only way to move data between AWS availability zones. Skip if not necessary.

  4. Increase the size of the volume using the AWS management console. During the modification you can also change the volume type / requested IOPS. Example CLI below.

     $ aws ec2 modify-volume --region us-east-1 --volume-id vol-11111111111111111 --size 20 --volume-type gp2
    
  5. Wait for the volume modification to complete. Using the CLI, watch the ModificationState and Progress parameters.

     $ aws ec2 describe-volumes-modifications --region us-east-1 --volume-id vol-11111111111111111
    
  6. Identify the node where the volume is attached by checking the pod location. The example below uses example-pod as part of the pod name.

     $ kubectl get node -o wide | grep $(kubectl -n namespace describe $(kubectl -n namespace get po -o name | grep example-pod) | grep Node: | awk -F"[\t/]" '{print }')
     ip-172-20-83-245.eu-central-1.compute.internal   Ready,node     22d       v1.6.4    54.93.226.64    Debian GNU/Linux 8 (jessie)   4.4.65-k8s
    
  7. Log in to the node with ssh and gain root privileges.

  8. Identify the name of the block device using the lsblk command. Note that the output may already show the bigger size of the block device.

     # BD_NAME=$(lsblk | grep "$PV_NAME" | awk '{print $1}')
    
  9. Resize the file system using resize2fs (ext2/3/4) or xfs_growfs (xfs).

     # resize2fs "/dev/$BD_NAME" 
    
  10. Change the specification of the PV within Kubernetes to apply the new size.

     $ kubectl patch pv "$PV_NAME"  -p '{"spec":{"capacity":{"storage":"20Gi"}}}'
    
  11. (Optional) Depending on your application, you may have to restart the pod/container.

  12. (Optional) Once data availability has been verified, you may delete the volume snapshot from AWS.

WARNING You will end up with a bigger Persistent Volume bound to the same, smaller Claim (see Appendixes).

Generic procedure (not tested), for when access to the host OS is limited

  1. Scale the deployment / daemon set / stateful set down to zero replicas.
  2. Wait for the k8s pods to terminate cleanly, and check that the volume has detached via the AWS console or CLI.
  3. Create a snapshot of the volume (it will have the proper AWS tag kubernetes.io/created-for/pvc/name).
  4. Once the snapshot has completed, modify the volume and increase the size as desired.
  5. Edit the deployment (example of the server container section):
    • remove the args key (and any values)
    • change the image to an OS image in which you can install packages, such as alpine:3.5
    • add the key securityContext: { privileged: true }
    • add the key command: [ "sleep", "9999" ]
  6. Once the edit is applied it will launch a new container with the data volume attached/mounted (scale/set replicas back to 1).
  7. Resize the filesystem:
    • install e2fsprogs-extra (on alpine)
    • look up the volume device: df -h | grep '/data'
    • run resize2fs /dev/<device ID>
  8. Edit the PersistentVolume in Kubernetes to change the size to match that from step 4.
  9. Restore the deployment to its previous state (refer to the helm chart or another cluster).
  10. Ensure the old pod terminates cleanly and the new one comes up (it can take a bit for EBS to re-optimize the volume).
    • Prometheus may go through an automatic crash-recovery procedure. If it doesn't, it may be necessary to restart (delete) the pod again and let it come up cleanly.

WARNING You will end up with a bigger Persistent Volume bound to the same, smaller Claim (see Appendixes).
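The container edits in step 5 of the generic procedure might look roughly like this. This is a sketch only; the container and volume names are hypothetical.

```yaml
# Sketch of the temporary container section from step 5: swap the image for
# a plain OS image, make it privileged, and keep it alive with sleep so you
# can exec in and run resize2fs against the mounted data volume.
containers:
  - name: server             # hypothetical container name
    image: alpine:3.5
    command: ["sleep", "9999"]
    securityContext:
      privileged: true
    volumeMounts:
      - name: data           # hypothetical volume name
        mountPath: /data
```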

Appendixes

Clean status of Persistent Volume Claim

As noted in the procedure, at the end of all the steps you are left with a claim (request for storage) for the smaller size, and a persistent volume (of the bigger size) bound to that claim. Everything works fine even though the persistent volume is bigger than the claim.

To get the proper claim size there are two options:

  1. Change the claim definition within etcd and wait for propagation to all etcd nodes; this introduces a little risk (direct DB modification)
    • exec into the etcd-server pod
    • get the current manifest
      $ etcdctl get /registry/persistentvolumeclaims/<namespace>/<pvc name> > json
    • modify the saved json file
    • upload the new version
      $ etcdctl update /registry/persistentvolumeclaims/<namespace>/<pvc name> < json
  2. Recreate the claim
    • Set the reclaimPolicy of the PV to Retain.
    • Delete the pod and PVC.
    • Manually create a new PVC with the PV id specified.
      • If you use label selectors you can just recreate the PVC.
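A sketch of the recreated claim from option 2, pinned to the retained PV via spec.volumeName. The claim name, namespace, PV name, and size are all hypothetical.

```yaml
# Hypothetical PVC that re-binds to an existing (retained) PV by name,
# with the storage request bumped to match the resized volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim           # hypothetical name
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi          # the new, larger size
  volumeName: pvc-11111111-2222-3333-4444-555555555555  # retained PV's name
```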

@Jan-M

Jan-M commented Sep 8, 2017

Since the question above is whether anyone is doing it or has experience with it: we have automated this in our postgresql operator (https://github.com/zalando-incubator/postgres-operator), and users fairly regularly trigger resizes via a user interface.

As our pods/volumes are managed by stateful sets, we also delete those in the process and recreate them to reflect the new size.

We trigger resize2fs from within the pods, which we run as privileged pods to allow it.

@krogon-dp

krogon-dp commented Sep 8, 2017

^up: You could use an init container to limit the privileged flag, but this would require a pod restart.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2018
@sdiaz

sdiaz commented Jan 10, 2018

/remove-lifecycle stale

@discordianfish
Contributor

There is support since 1.8 for some volume implementations: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims

So can this get closed?

@discordianfish
Contributor

Actually.. I'm confused. Running 1.9.3 here, changing my EBS PV size doesn't do anything, and according to https://kubernetes.io/docs/reference/feature-gates/, the feature gate ExpandPersistentVolumes no longer exists in >1.8.

@discordianfish
Contributor

Never mind, it also needs the admission plugin, but then it should work (not tested). So it can probably still be closed, right?

@gnufied
Member

gnufied commented Apr 11, 2018

@discordianfish please try an EBS PVC resize with 1.10. Currently the user experience of resizing volumes with file systems is not ideal. You will have to edit the PVC, wait for the FileSystemResizePending condition to appear on the PVC, and then delete and recreate the pod that was using the PVC. If no pod was using the PVC, then once the FileSystemResizePending condition appears you will have to start a pod using it for the file system resize to finish.
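The flow described above starts with bumping the claim's storage request. Something like this hypothetical edit to the PVC spec (sizes are made up for illustration):

```yaml
# Hypothetical PVC edit that triggers the resize: only the storage request
# changes. The FileSystemResizePending condition then appears in the PVC
# status until a (re)started pod lets the file system resize complete.
spec:
  resources:
    requests:
      storage: 20Gi   # bumped from e.g. 10Gi
```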

@discordianfish
Contributor

Ah, I see, the emphasis is on automatic. Maybe this could be clarified in the issue description; right now it sounds like it's not supported at all.

@gnufied
Member

gnufied commented Apr 12, 2018

We are on track for implementing online resizing in 1.11 - kubernetes/community#1535. So you will no longer have to restart the pod to finish the resizing process; it will be "automatic" once you edit the PVC.

@gnufied
Member

gnufied commented Jul 8, 2019

This should be implemented and work out of the box now. In any 1.15 cluster you can edit the PVC (for volume types that support expansion) and have k8s take care of the rest.

See - https://kubernetes.io/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims
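For the out-of-the-box behavior described above, the claim's StorageClass has to allow expansion. A minimal hypothetical example (the class name is made up):

```yaml
# Hypothetical StorageClass permitting PVC expansion; without
# allowVolumeExpansion: true, edits to a PVC's storage request are rejected.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-gp2       # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
allowVolumeExpansion: true
```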

/close

@k8s-ci-robot
Contributor

@gnufied: Closing this issue.

