
Add azuredisk PV size grow feature #64386

Merged (1 commit) on Jun 5, 2018

Conversation

@andyzhangx (Member) commented May 28, 2018

What this PR does / why we need it:
According to kubernetes/enhancements#284, this adds the size grow (volume expansion) feature for Azure disk.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #56463

Special notes for your reviewer:

  • This feature is only for Azure managed disks. If the disk is already attached to a running VM, the resize will fail as follows:
$ kubectl describe pvc pvc-azuredisk
Events:
  Type     Reason              Age               From           Message
  ----     ------              ----              ----           -------
  Warning  VolumeResizeFailed  51s (x3 over 3m)  volume_expand  Error expanding volume "default/pvc-azuredisk" of plugin kubernetes.io/azure-disk : disk.DisksClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="OperationNotAllowed" Message="Cannot resize disk andy-mg1102-dynamic-pvc-d2d00dd9-6185-11e8-a6c3-000d3a0643a8 while it is attached to running VM /subscriptions/.../resourceGroups/.../providers/Microsoft.Compute/virtualMachines/k8s-agentpool-17607330-0."
  • Note: if you run into this error, the workaround is to delete the pod (or even the whole Deployment/StatefulSet), make sure the disk is in the unattached state, and then run the kubectl edit pvc ... operation again, as sketched below.
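
A minimal sketch of that workaround, assuming the PVC is used by a Deployment named nginx-azuredisk (a hypothetical name for illustration):

# Scale the workload to 0 so the pod terminates and the disk detaches
$ kubectl scale deployment nginx-azuredisk --replicas=0

# Wait until the disk is detached (this can take a few minutes), then retry the resize
$ kubectl edit pvc pvc-azuredisk

# Scale back up once the PVC reports FileSystemResizePending
$ kubectl scale deployment nginx-azuredisk --replicas=1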

How to use the Azure disk size grow feature

  • First, please make sure the Azure disk PVC is created by a kubernetes.io/azure-disk storage class with allowVolumeExpansion: true (the default is false):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdd
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Standard_LRS
  kind: Managed
  cachingmode: None
allowVolumeExpansion: true
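
To double-check that the class really allows expansion before resizing (a quick sanity check; hdd is the class name from the manifest above):

$ kubectl get storageclass hdd -o jsonpath='{.allowVolumeExpansion}'
true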
  • Before running the kubectl edit pvc pvc-azuredisk operation, please make sure this PVC is not mounted by any pod (change the replica count to 0; this terminates the pod and detaches the disk, which can take a few minutes), otherwise you will hit the resize error above. Now run kubectl edit pvc pvc-azuredisk to change the azuredisk PVC size from 6GB to 10GB (an equivalent non-interactive command is shown after the manifest below):
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
...
...
  name: pvc-azuredisk
...
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 6Gi
  storageClassName: hdd
  volumeMode: Filesystem
  volumeName: pvc-d2d00dd9-6185-11e8-a6c3-000d3a0643a8
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 6Gi
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-05-27T08:14:34Z
    message: Waiting for user to (re-)start a pod to finish file system resize of
      volume on node.
    status: "True"
    type: FileSystemResizePending
  phase: Bound
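
If you prefer a non-interactive command over kubectl edit, the same change can be applied with kubectl patch (a sketch; adjust the PVC name and target size to your environment):

$ kubectl patch pvc pvc-azuredisk --type merge \
    -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'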
  • After resizing, run kubectl describe pvc pvc-azuredisk to check the PVC status:
$ kubectl describe pvc pvc-azuredisk
Name:          pvc-azuredisk
Namespace:     default
StorageClass:  hdd
Status:        Bound
...
Capacity:      5Gi
Access Modes:  RWO
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 29 Aug 2018 02:29:52 +0000           Waiting for user to (re-)start a pod to finish file system resize of volume on node.
Events:
  Type       Reason                 Age    From                         Message
  ----       ------                 ----   ----                         -------
  Normal     ProvisioningSucceeded  3m57s  persistentvolume-controller  Successfully provisioned volume pvc-d7d250c1-ab32-11e8-bfaf-000d3a4e76db using kubernetes.io/azure-disk
Mounted By:  <none>
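
For scripting, the pending file system resize can also be checked with a jsonpath query (a sketch; it prints True while the resize is waiting for a pod (re-)start):

$ kubectl get pvc pvc-azuredisk \
    -o jsonpath='{.status.conditions[?(@.type=="FileSystemResizePending")].status}'
True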
  • Create a pod that mounts this PVC, and you will see the new size:
$ kubectl exec -it nginx-azuredisk -- bash
# df -h
Filesystem      Size  Used Avail Use% Mounted on
...
/dev/sdf        9.8G   16M  9.3G   1% /mnt/disk
...
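
Once the pod is running and the file system resize has finished, the PVC capacity should report the new size as well (a sketch; 10Gi matches the resize in this walkthrough):

$ kubectl get pvc pvc-azuredisk -o jsonpath='{.status.capacity.storage}'
10Gi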

Note: the volume expansion feature is beta in v1.11

Release note:

Add azuredisk size grow feature

/sig azure
/assign @feiskyer @karataliu @gnufied
cc @khenidak

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 28, 2018
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 28, 2018
@andyzhangx (Member Author):

/assign @deads2k

@@ -131,3 +135,38 @@ func (c *ManagedDiskController) getDisk(diskName string) (string, string, error)

return "", "", err
}

// ResizeDisk Expand the disk to new size
func (c *ManagedDiskController) ResizeDisk(diskName string, oldSize resource.Quantity, newSize resource.Quantity) (resource.Quantity, error) {
Member:

report error early if oldSize > newSize?

Member Author:

aws & gce won't report error, let's be consistent with that

return oldSize, err
}

return diskController.ResizeDisk(spec.PersistentVolume.Spec.AzureDisk.DiskName, oldSize, newSize)
Member:

nit: Add an info log here with diskName, oldSize, and newSize

Member Author:

I added the logs in ManagedDiskController.ResizeDisk func

Member:

How fast is the resize operation on Azure, btw? If it is slow and the controller crashes while a resize is in progress, I am wondering if there is a way to query the volume with the cloudprovider, find out whether a resize is in progress, and return early from here rather than calling resize again.

I know GCE and AWS do not implement this mechanism currently, but I am going to change it for AWS at least, because resize is pretty slow there.

Member Author:

@gnufied the Azure disk resize operation completes in a few seconds, but Azure does not support resizing while the disk is attached to a running VM. GCE and AWS support that, right?

Member:

good point, shouldn't the ResizeDisk call then check whether the volume is attached somewhere before allowing the resize operation? The expand controller does not have any kind of fencing to prevent volumes from being resized while mounted on nodes.

Member Author:

@gnufied if the disk is already attached to a running VM, the API call will fail with an error, and eventually ResizeDisk will return that error; that's the expected case.

Member:

I don't know much about Azure tbh, but I have found that it is less costly to make a GET request and return early than to make a POST request and handle the error (cloudproviders seem to have different quotas for mutable and immutable API requests).

And as far as I can tell, you are already fetching disk state from the cloudprovider - https://github.com/kubernetes/kubernetes/pull/64386/files#diff-db9a5ad5d2cc7740ca2b73ad9d904fa2R144 - so if we can ascertain that the volume is attached, we should return an error before even attempting to call resize.

This is not a blocker though; I know very little about Azure, so take my comment with a pinch of salt. :-)

Member Author:

@gnufied thanks for the check. However, according to https://docs.microsoft.com/en-us/rest/api/compute/disks/get, the returned DiskProperties.ProvisioningState is not sufficient to tell whether the disk is in the Attached state; we would need another GET request to fetch diskState. So for this PR, I think it's OK to do the resize directly.
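
For reference, the attach state discussed in this thread can be inspected out-of-band with the Azure CLI (a sketch; the resource group and disk names are placeholders, and it assumes the az CLI exposes the diskState property):

$ az disk show --resource-group <resource-group> --name <disk-name> \
    --query diskState --output tsv
Attached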

@feiskyer (Member):

> After the user runs sudo resize2fs /dev/sdf on the agent node, /mnt/disk becomes 10GB now:

Is this user operation required? Why isn't resizeFilesystem being called?

@andyzhangx (Member Author) left a comment:

@feiskyer addressed your comments, PTAL


@andyzhangx (Member Author) left a comment:

@feiskyer little change about pointer operation in golang, PTAL, thanks.


@gnufied (Member) commented May 29, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 29, 2018
@feiskyer (Member) left a comment:

/lgtm
/approve

@andyzhangx (Member Author):

@deads2k PTAL, thanks.
If you don't have time, you may look only at plugin/pkg/admission/storage/persistentvolume/resize/admission.go, thanks.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 2, 2018
fix comments
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 3, 2018
@andyzhangx (Member Author):

Just rebased.
@deads2k @brendandburns PTAL, thanks.
If you don't have time, you may look only at plugin/pkg/admission/storage/persistentvolume/resize/admission.go, thanks.

@feiskyer would you mark this for the v1.11 milestone, thanks.

@feiskyer (Member) commented Jun 4, 2018

/milestone v1.11

@k8s-ci-robot k8s-ci-robot added this to the v1.11 milestone Jun 4, 2018
@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 4, 2018
@feiskyer (Member) commented Jun 4, 2018

/status approved-for-milestone

@gnufied (Member) commented Jun 4, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2018
@andyzhangx (Member Author):

/assign @brendandburns

@k8s-github-robot:

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@andyzhangx @brendandburns @deads2k @feiskyer @gnufied @karataliu

Pull Request Labels
  • sig/azure: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/bug: Fixes a bug discovered during the current release.

@brendandburns (Contributor):

/lgtm
/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx, brendandburns, feiskyer, gnufied

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 5, 2018
@k8s-github-robot:

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot:

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@echo-devnull commented Oct 8, 2020

I'm sorry to say that this does not fully work when combined with operators that take control of pods.

For instance: the Strimzi Kafka operator does not allow me to scale the pods of one StatefulSet down to 0. It will immediately start a new pod because it's trying to keep the cluster "up". That's its job as an operator.

So seamless expansion of volumes should perhaps be next on the list?

Or at least change allowVolumeExpansion to false? That will keep people from making wrong assumptions in the future ;-)

@lebenitza:

@markmaas, I have exactly the same problem with Strimzi. You can essentially do all that by scaling the operator down to 0 too, but that's not a workaround I want to apply on a platform like Azure... So essentially the thing you implemented creates downtime whether you like it or not; I cannot take my pods one by one and roll out the new PVC size.

If a pod is killed and the underlying PVC has a resize pending, postpone the attachment, make sure to detach the volume, resize, and let the pod attach it again...

Like @markmaas is saying, if we had known clearly that PVC resize is not supported - by the storage class not allowing the allowVolumeExpansion flag, instead of defaulting it to true - we would not have deployed such a thing in AKS, or Azure for that matter.

But I guess Azure is known for creating more problems than actually fixing them, so that's on us.

@shan0809:

> (quoting @lebenitza's comment above in full)

Also, I think the expansion feature is pretty smooth in GCP and AWS. Come on Azure, you can do better :)

@Morriz commented Apr 11, 2021

Why was this accepted? This PR does not satisfy the k8s-supported workflow, which expects disk resize to work on a running VM.

@AlexGoris-KasparSolutions:

Ran into the same issue as the last 2-3 comments have pointed out. Having to scale down any Deployments/StatefulSets before the disk resize will go through is really a shame and defeats the purpose of having an orchestrator that is supposed to manage these things for me. In case an operator is involved (in my case the Elastic ECK operator), as @lebenitza has pointed out (thanks for the tip!), you also have to scale down the operator to avoid it scaling the deployment/stateful set up again before the resize has completed.

Essentially this means that resizing a PVC is not possible without complete downtime and unavailability of the involved services. Strictly speaking, that is not stated as a requirement for having allowVolumeExpansion set to true (docs), but I believe that is what the Kubernetes devs intended to communicate with this flag, and I'll open an issue on the kubernetes GitHub to ask whether this is the case and should be updated in the docs.

@andyzhangx (Member Author):

https://docs.microsoft.com/en-us/azure/aks/azure-disk-csi#resize-a-persistent-volume-without-downtime

Currently, the Azure disk CSI driver supports resizing PVCs without downtime in specific regions. Follow this link to register the disk online resize feature. If your cluster is not in the supported region list, you need to delete the application first to detach the disk from the node before expanding the PVC.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add PV resize support for azure disk