-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add azuredisk PV size grow feature #64386
Add azuredisk PV size grow feature #64386
Conversation
/assign @deads2k |
@@ -131,3 +135,38 @@ func (c *ManagedDiskController) getDisk(diskName string) (string, string, error) | |||
|
|||
return "", "", err | |||
} | |||
|
|||
// ResizeDisk Expand the disk to new size | |||
func (c *ManagedDiskController) ResizeDisk(diskName string, oldSize resource.Quantity, newSize resource.Quantity) (resource.Quantity, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
report error early if oldSize > newSize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws & gce won't report error, let's be consistent with that
return oldSize, err | ||
} | ||
|
||
return diskController.ResizeDisk(spec.PersistentVolume.Spec.AzureDisk.DiskName, oldSize, newSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a info log here with diskname, oldSize and newSize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the logs in ManagedDiskController.ResizeDisk
func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How fast is resize operation on Azure btw? If it is slow and controller crashes while resize is in progress, I am wondering if there is a way to query the volume and find out if resize is in progress with cloudprovider and return early from here rather than calling resize again.
I know GCE and AWS does not implement this mechanism currently but I am going to change it for AWS at least because resize is pretty slow there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied azure disk resize operation could be completed in a few seconds, while Azure does not support resize when that disk is attached to a running VM, GCE and AWS supports that, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, shouldn't ResizeDisk
call then check if volume is attached somewhere before allowing volume resize operation? Expand controller does not have any kind of fencing to prevent volumes from being resused when mounted on nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied if that disk is already attached to a running VM, the API call will fail with error, eventually ResizeDisk
will return error, that's expected case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about Azure tbh, but I have found that it is less costly to make a GET
request and return early than make a POST
request and handle error and return (cloudproviders seem to have different quotas for mutable and immutable API requests).
And as far as I can tell, you are already fetching disk state from cloudprovider - https://github.com/kubernetes/kubernetes/pull/64386/files#diff-db9a5ad5d2cc7740ca2b73ad9d904fa2R144, so if we can ascertain that volume is attached, we should return error before even attempting to call resize.
This is not a blocker though, I know very little about Azure - so take my comment with a pinch of salt. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied thanks for the check, while according to https://docs.microsoft.com/en-us/rest/api/compute/disks/get, the returned value DiskProperties.ProvisioningState
is not sufficient to tell whether disk is in Attached
state or not, we need another GET
request to get diskState
. So for this PR, I think it's ok to do resize directly.
Is this user operation required? why not resizeFilesystem being called? |
5344d94
to
f5abe2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feiskyer addressed your comments, PTAL
@@ -131,3 +135,38 @@ func (c *ManagedDiskController) getDisk(diskName string) (string, string, error) | |||
|
|||
return "", "", err | |||
} | |||
|
|||
// ResizeDisk Expand the disk to new size | |||
func (c *ManagedDiskController) ResizeDisk(diskName string, oldSize resource.Quantity, newSize resource.Quantity) (resource.Quantity, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws & gce won't report error, let's be consistent with that
return oldSize, err | ||
} | ||
|
||
return diskController.ResizeDisk(spec.PersistentVolume.Spec.AzureDisk.DiskName, oldSize, newSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the logs in ManagedDiskController.ResizeDisk
func
f5abe2e
to
e280240
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feiskyer little change about pointer operation in golang, PTAL, thanks.
return oldSize, err | ||
} | ||
|
||
return diskController.ResizeDisk(spec.PersistentVolume.Spec.AzureDisk.DiskName, oldSize, newSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied thanks for the check, while according to https://docs.microsoft.com/en-us/rest/api/compute/disks/get, the returned value DiskProperties.ProvisioningState
is not sufficient to tell whether disk is in Attached
state or not, we need another GET
request to get diskState
. So for this PR, I think it's ok to do resize directly.
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
@deads2k PTAL, thanks. |
fix comments fix comments
e280240
to
880b7a3
Compare
just rebased. @feiskyer would you mark as v1.11 milestone, thanks. |
/milestone v1.11 |
/status approved-for-milestone |
/lgtm |
/assign @brendandburns |
[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process @andyzhangx @brendandburns @deads2k @feiskyer @gnufied @karataliu Pull Request Labels
|
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andyzhangx, brendandburns, feiskyer, gnufied The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test all [submit-queue is verifying that this PR is safe to merge] |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here. |
I'm sorry to say that this does not fully work when combined with operators who take control of pods. For instance: Strimzi kafka operator does not allow me to scale down the pods of 1 statefulset to 0. It will immediately start a new pod because it's trying to keep the cluster "up" . That's it's job as an operator. So seamless expansion of volumes perhaps should be next on the list ? Or at least change |
@markmaas , I have the exactly same problem with strimzi, you can essentially do all that by scaling down the operator to 0 too but that's not the workaround I want to do from a platform like Azure... So essentially the thing you guys implemented creates downtime even if you like it or not... I cannot take my pods one by one and rollout the new PVC size. If a pod is killed and the underlying PVC has a resize pending, postpone the attachment, make sure to detach the volume, resize and let the pod attach it again... Like @markmaas is saying, if we knew clearly PVC resize is not supported by not allowing the storage class the allowVolumeExpansion flag and what is more, defaulting it on true, we would have not deploy such a thing in AKS or Azure for that matter. But I guess Azure is known for creating more problems than actual fixing so that's on us. |
also i think expansion feature is pretty smooth in GCP and AWS. Cummon Azure, you can do better :) |
Why was this accepted? This PR does not satisfy k8s supported workflow that supports disk resize on a Running VM. |
Ran into the same issue as the last 2-3 comments have pointed out, having to scale down any deployments/statefulsets before the disk resize will go through is really a shame and defeats the purpose of having an orchestrator which is supposed to manage these things for me. In case an operator is involved (in my case Elastic ECK operator), like @lebenitza has pointed out (thanks for the tip!), you also have to scale down the operator to avoid it scaling the deployment/stateful set up again before the resize has completed. Essentially this means that resizing a PVC is not possible without complete downtime and unavailability of involved services, which strictly is not stated as a requirement for having `` set to |
Currently, Azure disk CSI driver supports resizing PVCs without downtime on specific regions. Follow this link to register the disk online resize feature. If your cluster is not in the supported region list, you need to delete application first to detach disk on the node before expanding PVC. |
What this PR does / why we need it:
According to kubernetes/enhancements#284, add size grow feature for azure disk
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #56463
Special notes for your reviewer:
Deployment
/StatefulSet
, make sure the disk is inunattached
state and runkubectl edit pvc ...
operation again.How to use azure disk size grow feature
kubernetes.io/azure-disk
storage class withallowVolumeExpansion: true
(default is false)kubectl edit pvc pvc-azuredisk
operation, pls make sure this PVC is not mounted by any pod(change the replica count to 0, this will terminate the pod and detach the disk, need wait a few minutes) otherwise there would be resize error. Now runkubectl edit pvc pvc-azuredisk
to change azuredisk PVC size from 6GB to 10GBkubectl describe pvc pvc-azuredisk
to check PVC status:Note: volume expansion feature is beta in v1.11
Release note:
/sig azure
/assign @feiskyer @karataliu @gnufied
cc @khenidak