Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for no-downtime disk resizes. #17245

Merged
merged 21 commits into from Nov 9, 2022

Conversation

kazimierzbudzyk
Copy link
Contributor

@kazimierzbudzyk kazimierzbudzyk commented Jun 14, 2022

Implements #17240.

Enables live disk resizes in all cases where it's supported. Feature flag defaulted to true is left in allowing customers to opt-out if needed.

Copy link
Member

@tombuildsstuff tombuildsstuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @kazimierzbudzyk

Thanks for this PR - taking a look through here whilst this is off to a good start we need to think about how this should work long-term - since it likely makes sense for this to be enabled by default, I've reached out to the Service Team to get some more details on this Preview feature, which'll determine how we can proceed here.

From our side, I think this makes the most sense to be enabled by default, meaning that this may want to wait for GA - but as mentioned above I've reached out to the Service Team which should allow us to determine how to proceed here.

Thanks!

Comment on lines 266 to 269
"no_downtime_resize_enabled": {
Type: pluginsdk.TypeBool,
Optional: true,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a behavioural feature, this'd need to be added to the features block rather than being enabled against the resource - although I think we actually want to enable this functionality by default where possible once this goes GA - as such I'm not sure this makes sense as a behavioural feature here, where we can instead wait for this to hit GA first and then enable this by default?

We'll also need to update the shutDownOnResize method below to check the disk type too, since this is only supported for Data Disks and not OS Disks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @tombuildsstuff! Implementing your feedback...

This feature is something my team could use immediately (and likely a lot of other users). If possible starting with it being opt-in while in preview would be super helpful to us, especially if GA will take a bit more. After GA we definitely should flip the default of the feature flag as it will make a lot of operations much easier.

@github-actions github-actions bot added size/M and removed size/XS labels Jun 15, 2022
func shutDownOnResize(oldSizeGB, newSizeGB int) bool {
func shutDownOnResize(disk compute.Disk, oldSizeGB, newSizeGB int) bool {
// OS disks can't be expanded without downtime.
if disk.OsType != "" {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tombuildsstuff - let me know if you have suggestions for a better way of detecting data disks.

@kazimierzbudzyk kazimierzbudzyk marked this pull request as ready for review June 15, 2022 12:04
@kazimierzbudzyk
Copy link
Contributor Author

Hi @tombuildsstuff,

Did you hear anything back about the GA timeline for the live-resize feature? Really looking forward to this feature being available in the provider, either as preview feature, or default behaviour post GA. I have it deployed through forked terraform-provider-azurerm to some of my test environments and it works very well.

Thanks,
Kaz

@kazimierzbudzyk
Copy link
Contributor Author

@tombuildsstuff Looks like the feature was just moved to GA according to https://azure.microsoft.com/en-us/updates/generally-available-live-resize-for-premium-ssd-and-standard-ssd-disk-storage/. I flipped default of the feature flag to true, but keeping the flag to allow users disable as it's a substantial change of behaviour.

@github-actions github-actions bot added size/L and removed size/M labels Jul 22, 2022
@kazimierzbudzyk
Copy link
Contributor Author

Announcement got pulled back - MicrosoftDocs/azure-docs#96112.

@kazimierzbudzyk
Copy link
Contributor Author

@tombuildsstuff - with GA re-announcement, is this PR good to be merged now?

@@ -150,6 +156,12 @@ The `log_analytics_workspace` block supports the following:

---

The `managed_disk` block supports the following:

* `no_downtime_resize` (Optional) Specifies if no-downtime resizes are enabled for the managed disk resources. Defaults to `true`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the same name as azure does for this feature enable_live_resize or something like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed - digging around this it appears that Expand Without Downtime is the final name for this feature? I'd argue that live_resize_data_disks_where_possible (or something) would be clearer where possible, but since this is only for expanding (and not shrinking, which Azure doesn't support regardless of if the associated VM is online/offline) - I think it's probably worth making this expand_without_downtime for now?

@tombuildsstuff tombuildsstuff self-assigned this Oct 17, 2022
kazimierzbudzyk and others added 3 commits October 17, 2022 09:08
This extends the work done by @kazimierzbudzyk to ensure we check the Virtual Machine and Managed Disk to confirm
that they support Expand Without Downtime (and fallback to shutting the machine down if not).

Unfortunately we're unable to output a warning either way at this time (since that'd require Framework) - as such
this PR includes adding notes to the documentation calling out we'll best-effort this.
Copy link
Member

@tombuildsstuff tombuildsstuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @kazimierzbudzyk

Apologies for the delayed review here, as mentioned in #17240 we believed that this change needed to be threaded through in a number of places (including the Linux/Windows Virtual Machine, the Linux/Orchestrated/Windows Virtual Machine Scale Set resources and the Data Disk Attachment resource) - however digging into this, this doesn't appear to be the case, so the scope of this change is fine.

Digging into the changes in this PR - whilst on the whole this is looking pretty good we're going to need to make 3 changes to be able to get this merged:

  1. Rename this feature Expand Without Downtime (since that appears to be the GA name for this feature)
  2. Based on the documentation for the Linux Virtual Machine and Windows Virtual Machine resources, we need to check the Virtual Machine that we're attached to to ensure that it supports Expand Without Downtime
  3. Add an acceptance test covering this resource so that we can confirm this actually does Expand Without Downtime without causing a restart of the Virtual Machine.

As such I hope you don't mind but I'm going to push a couple of commits to make those changes so that we can get this merged - once that's done I'll ask @manicminer to take a look and then we should be good to go 👍

Apologies again for the delay reviewing this one, since we believed this change was applicable in a number of resources we wanted to ensure this was behaviourally consistent across the provider - but I think we should be good to go with those changes 👍

Thanks!

MaxItems: 1,
Elem: &pluginsdk.Resource{
Schema: map[string]*pluginsdk.Schema{
"no_downtime_resize": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears the product name for this has settled on Expand Without Downtime as such we should make this:

Suggested change
"no_downtime_resize": {
"expand_without_downtime": {

Comment on lines 1007 to 1028
func shutDownOnResize(disk *disks.Disk, oldSizeGB, newSizeGB int) bool {
// OS disks can't be expanded without downtime.
if *disk.Properties.OsType != "" {
return true
}
// Disks smaller than 4 TiB can't be expanded to 4 TiB or larger without downtime.
if oldSizeGB < 4096 && newSizeGB >= 4096 {
return true
}
// Only Premium SSD v1 and Standard SSD disks support live resize
for _, diskType := range []disks.DiskStorageAccountTypes{
disks.DiskStorageAccountTypesPremiumLRS,
disks.DiskStorageAccountTypesPremiumZRS,
disks.DiskStorageAccountTypesStandardSSDLRS,
disks.DiskStorageAccountTypesStandardSSDZRS,
} {
if *disk.Sku.Name == diskType {
return false
}
}
return true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the documentation for both Linux and Windows, we also need to check the VM that we're attached too, including that it supports Expand without Downtime (https://learn.microsoft.com/azure/virtual-machines/linux/expand-disks?tabs=azure-cli%2Cubuntu#expand-without-downtime / https://learn.microsoft.com/azure/virtual-machines/windows/expand-os-disk#expand-without-downtime) - so we'll need to first retrieve the Managed Disk, confirm it's a Virtual Machine, and then retrieve the Virtual Machine to confirm it supports this (else we'll need to fallback as we're doing here)

@@ -150,6 +156,12 @@ The `log_analytics_workspace` block supports the following:

---

The `managed_disk` block supports the following:

* `no_downtime_resize` (Optional) Specifies if no-downtime resizes are enabled for the managed disk resources. Defaults to `true`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed - digging around this it appears that Expand Without Downtime is the final name for this feature? I'd argue that live_resize_data_disks_where_possible (or something) would be clearer where possible, but since this is only for expanding (and not shrinking, which Azure doesn't support regardless of if the associated VM is online/offline) - I think it's probably worth making this expand_without_downtime for now?

@kazimierzbudzyk
Copy link
Contributor Author

@tombuildsstuff thanks for the review! Absolutely feel free to take over, really keen to see this released!

Thanks,
Kaz

Copy link
Member

@manicminer manicminer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 💻

@manicminer
Copy link
Member

Test results

Screen Shot 2022-11-09 at 23 13 46

@manicminer manicminer merged commit 944993e into hashicorp:main Nov 9, 2022
@github-actions github-actions bot added this to the v3.31.0 milestone Nov 9, 2022
manicminer added a commit that referenced this pull request Nov 9, 2022
@kazimierzbudzyk kazimierzbudzyk deleted the kb/live-disk-resize branch November 10, 2022 11:49
@github-actions
Copy link

This functionality has been released in v3.31.0 of the Terraform Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants