New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Availability Zones #586

Open
feiskyer opened this Issue Jul 11, 2018 · 20 comments

Comments

Projects
None yet
10 participants
@feiskyer
Copy link
Member

feiskyer commented Jul 11, 2018

Feature Description

  • One-line feature description (can be used as a release note): Add support of Azure Availability Zones.
  • Primary contact (assignee): @feiskyer
  • Responsible SIGs: sig-azure
  • Design proposal link (community repo): kubernetes/community#2364
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred: @khenidak
  • Approver (likely from SIG/area to which feature belongs): @brendandburns
  • Feature target (which target equals to which milestone):
    • Alpha release target (1.12)
    • Beta release target (1.13)
    • Stable release target (x.y)
@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Jul 11, 2018

/sig azure
/sig storage

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Jul 11, 2018

/milestone v1.12

@justaugustus

This comment has been minimized.

Copy link
Member

justaugustus commented Jul 18, 2018

@feiskyer --

It looks like this feature is currently in the Kubernetes 1.12 Milestone.

If that is still accurate, please ensure that this issue is up-to-date with ALL of the following information:

  • One-line feature description (can be used as a release note):
  • Primary contact (assignee):
  • Responsible SIGs:
  • Design proposal link (community repo):
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred:
  • Approver (likely from SIG/area to which feature belongs):
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y)
    • Beta release target (x.y)
    • Stable release target (x.y)

Set the following:

  • Description
  • Assignee(s)
  • Labels:
    • stage/{alpha,beta,stable}
    • sig/*
    • kind/feature

Once this feature is appropriately updated, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that the Features Freeze is July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

/kind feature
/stage alpha

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Jul 20, 2018

Merge pull request #66242 from feiskyer/instance-az
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add initial availability zones support for Azure nodes

**What this PR does / why we need it**:

The first part of [Azure Availability Zone feature](kubernetes/enhancements#586).

This PR adds initial availability zone (AZ) support for Azure nodes. With this PR, Azure nodes with AZ will have label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`, e.g. `southeastasia-1`.

It also updates instance metadata api-version to 2017-12-01, which is required for AZ.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

VirtualMachineScaleSetVM doesn't have AZ info yet. It will be supported later after new Azure Go SDK releases.

**Release note**:

```release-note
Azure nodes with availability zone now will have label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`.
```

/kind feature
/sig azure

/assign @brendandburns @khenidak @andyzhangx

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Aug 6, 2018

Merge pull request #66553 from feiskyer/azure-disk-availablity-zone
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add availability zones support to Azure managed disks

**What this PR does / why we need it**:

Continue of [Azure Availability Zone feature](kubernetes/enhancements#586).

This PR adds availability zone support for Azure managed disks and its storage class. Zoned managed disks is enabled by default if there are zoned nodes in the cluster.

The zone could also be customized by `zone` or `zones` parameter, e.g.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
  name: managed-disk-zone-1
parameters:
  zone: "southeastasia-1"
  # zones: "southeastasia-1,"southeastasia-2"
  cachingmode: None
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

All zoned AzureDisk PV will also be labeled with its availability zone, e.g.

```sh
$ kubectl get pvc pvc-azuredisk-az-1
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
pvc-azuredisk-az-1   Bound     pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            managed-disk-zone-1   2h

$ kubectl get pv pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c --show-labels
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                        STORAGECLASS          REASON    AGE       LABELS
pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            Delete           Bound     default/pvc-azuredisk-az-1   managed-disk-zone-1             2h        failure-domain.beta.kubernetes.io/region=southeastasia,failure-domain.beta.kubernetes.io/zone=southeastasia-1
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

See also the [KEP](kubernetes/community#2364).

DynamicProvisioningScheduling feature would be added in a following PR.

**Release note**:

```release-note
Azure managed disks now support availability zones and new parameters `zoned`, `zone` and `zones` are added for AzureDisk storage class.
```

/kind feature
/sig azure
/assign @brendandburns @khenidak @andyzhangx

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Aug 8, 2018

Merge pull request #67121 from feiskyer/azdisk-affinity
Automatic merge from submit-queue (batch tested with PRs 67061, 66589, 67121, 67149). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add DynamicProvisioningScheduling and VolumeScheduling support for Azure managed disks

**What this PR does / why we need it**:

Continue of [Azure Availability Zone feature](kubernetes/enhancements#586).

This PR adds `VolumeScheduling` and `DynamicProvisioningScheduling` support to Azure managed disks.

When feature gate `VolumeScheduling` disabled, no NodeAffinity set for PV:

```yaml
kubectl describe pv
Name:              pvc-d30dad05-9ad8-11e8-94f2-000d3a07de8c
Labels:            failure-domain.beta.kubernetes.io/region=southeastasia
                   failure-domain.beta.kubernetes.io/zone=southeastasia-2
Annotations:       pv.kubernetes.io/bound-by-controller=yes
                   pv.kubernetes.io/provisioned-by=kubernetes.io/azure-disk
                   volumehelper.VolumeDynamicallyCreatedByKey=azure-disk-dynamic-provisioner
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      default
Status:            Bound
Claim:             default/pvc-azuredisk
Reclaim Policy:    Delete
Access Modes:      RWO
Capacity:          5Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/region in [southeastasia]
                   failure-domain.beta.kubernetes.io/zone in [southeastasia-2]
Message:
Source:
    Type:         AzureDisk (an Azure Data Disk mount on the host and bind mount to the pod)
    DiskName:     k8s-5b3d7b8f-dynamic-pvc-d30dad05-9ad8-11e8-94f2-000d3a07de8c
    DiskURI:      /subscriptions/<subscription-id>/resourceGroups/<rg-name>/providers/Microsoft.Compute/disks/k8s-5b3d7b8f-dynamic-pvc-d30dad05-9ad8-11e8-94f2-000d3a07de8c
    Kind:         Managed
    FSType:
    CachingMode:  None
    ReadOnly:     false
Events:           <none>
```

When feature gate `VolumeScheduling` enabled, NodeAffinity will be populated for PV:

```yaml
kubectl describe pv
Name:              pvc-0284337b-9ada-11e8-a7f6-000d3a07de8c
Labels:            failure-domain.beta.kubernetes.io/region=southeastasia
                   failure-domain.beta.kubernetes.io/zone=southeastasia-2
Annotations:       pv.kubernetes.io/bound-by-controller=yes
                   pv.kubernetes.io/provisioned-by=kubernetes.io/azure-disk
                   volumehelper.VolumeDynamicallyCreatedByKey=azure-disk-dynamic-provisioner
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      default
Status:            Bound
Claim:             default/pvc-azuredisk
Reclaim Policy:    Delete
Access Modes:      RWO
Capacity:          5Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/region in [southeastasia]
                   failure-domain.beta.kubernetes.io/zone in [southeastasia-2]
Message:
Source:
    Type:         AzureDisk (an Azure Data Disk mount on the host and bind mount to the pod)
    DiskName:     k8s-5b3d7b8f-dynamic-pvc-0284337b-9ada-11e8-a7f6-000d3a07de8c
    DiskURI:      /subscriptions/<subscription-id>/resourceGroups/<rg-name>/providers/Microsoft.Compute/disks/k8s-5b3d7b8f-dynamic-pvc-0284337b-9ada-11e8-a7f6-000d3a07de8c
    Kind:         Managed
    FSType:
    CachingMode:  None
    ReadOnly:     false
Events:           <none>
```

When both  `VolumeScheduling` and `DynamicProvisioningScheduling` are enabled, storage class also supports `allowedTopologies` and `volumeBindingMode: WaitForFirstConsumer` for volume topology aware dynamic provisioning:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
  name: managed-disk-dynamic
parameters:
  cachingmode: None
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - southeastasia-2
    - southeastasia-1
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
DynamicProvisioningScheduling and VolumeScheduling is not supported for Azure managed disks. Feature gates DynamicProvisioningScheduling and VolumeScheduling should be enabled before using this feature.
```

/kind feature
/sig azure
/cc @brendandburns @khenidak @andyzhangx
/cc @ddebroy @msau42 @justaugustus

hh pushed a commit to ii/kubernetes that referenced this issue Aug 15, 2018

Merge pull request kubernetes#67229 from feiskyer/unzoned-disks
Automatic merge from submit-queue (batch tested with PRs 66884, 67410, 67229, 67409). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add node affinity for Azure unzoned managed disks

**What this PR does / why we need it**:

Continue of [Azure Availability Zone feature](kubernetes/enhancements#586).

Add node affinity for Azure unzoned managed disks, so that unzoned disks only scheduled to unzoned nodes.

This is required because Azure doesn't allow attaching unzoned disks to zoned VMs.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

Unzoned nodes would label `failure-domain.beta.kubernetes.io/zone=0` and the value is fault domain ( while availability zone is used for zoned nodes). So fault domain is used to populate unzoned disks.

Since there are at most 3 fault domains in each region, the PR adds 3 terms for them:

```yaml
kubectl describe pv pvc-bdf93a67-9c45-11e8-ba6f-000d3a07de8c
Name:              pvc-bdf93a67-9c45-11e8-ba6f-000d3a07de8c
Labels:            <none>
Annotations:       pv.kubernetes.io/bound-by-controller=yes
                   pv.kubernetes.io/provisioned-by=kubernetes.io/azure-disk
                   volumehelper.VolumeDynamicallyCreatedByKey=azure-disk-dynamic-provisioner
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      azuredisk-unzoned
Status:            Bound
Claim:             default/unzoned-pvc
Reclaim Policy:    Delete
Access Modes:      RWO
Capacity:          5Gi
Node Affinity:     
  Required Terms:  
    Term 0:        failure-domain.beta.kubernetes.io/region in [southeastasia]
                   failure-domain.beta.kubernetes.io/zone in [0]
    Term 1:        failure-domain.beta.kubernetes.io/region in [southeastasia]
                   failure-domain.beta.kubernetes.io/zone in [1]
    Term 2:        failure-domain.beta.kubernetes.io/region in [southeastasia]
                   failure-domain.beta.kubernetes.io/zone in [2]
Message:           
Source:
    Type:         AzureDisk (an Azure Data Disk mount on the host and bind mount to the pod)
    DiskName:     k8s-5b3d7b8f-dynamic-pvc-bdf93a67-9c45-11e8-ba6f-000d3a07de8c
    DiskURI:      /subscriptions/<subscription>/resourceGroups/<rg-name>/providers/Microsoft.Compute/disks/k8s-5b3d7b8f-dynamic-pvc-bdf93a67-9c45-11e8-ba6f-000d3a07de8c
    Kind:         Managed
    FSType:       
    CachingMode:  None
    ReadOnly:     false
Events:           <none>
```

**Release note**:

```release-note
Add node affinity for Azure unzoned managed disks
```

/sig azure
/kind feature

/cc @brendandburns @khenidak @andyzhangx @msau42
@zparnold

This comment has been minimized.

Copy link
Member

zparnold commented Aug 20, 2018

Hey there! @feiskyer I'm the wrangler for the Docs this release. Is there any chance I could have you open up a docs PR against the release-1.12 branch as a placeholder? That gives us more confidence in the feature shipping in this release and gives me something to work with when we start doing reviews/edits. Thanks! If this feature does not require docs, could you please update the features tracking spreadsheet to reflect it?

@msau42

This comment has been minimized.

Copy link
Member

msau42 commented Aug 20, 2018

Doc placeholder for only the topology-aware dynamic provisioning part: kubernetes/website#9939 (not for the entire feature)

@justaugustus

This comment has been minimized.

Copy link
Member

justaugustus commented Sep 5, 2018

@feiskyer --
Any update on docs status for this feature? Are we still planning to land it for 1.12?
At this point, code freeze is upon us, and docs are due on 9/7 (2 days).
If we don't here anything back regarding this feature ASAP, we'll need to remove it from the milestone.

cc: @zparnold @jimangel @tfogo

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Sep 6, 2018

@justaugustus Yep, codes have been merged now and docs will be added to https://github.com/kubernetes/cloud-provider-azure

@justaugustus

This comment has been minimized.

Copy link
Member

justaugustus commented Sep 6, 2018

Thanks for the update!

@claurence

This comment has been minimized.

Copy link

claurence commented Oct 5, 2018

Kubernetes 1.13 is going to be a 'stable' release since the cycle is only 10 weeks. We encourage no big alpha features and only consider adding this feature if you have a high level of confidence it will make code slush by 11/09. Are there plans for this enhancement to graduate to alpha/beta/stable within the 1.13 release cycle? If not, can you please remove it from the 1.12 milestone or add it to 1.13?

We are also now encouraging that every new enhancement aligns with a KEP. If a KEP has been created, please link to it in the original post. Please take the opportunity to develop a KEP

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Oct 5, 2018

It's already part of KEP, kubernetes/community#2364. Let's keep it in v1.13.

/milestone v1.13

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.12, v1.13 Oct 5, 2018

@spiffxp

This comment has been minimized.

Copy link
Member

spiffxp commented Oct 17, 2018

@feiskyer what work is being done in k/k for this as part of v1.13? are there issues or PR's we could link to for tracking?

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Oct 17, 2018

@spiffxp Yep, we're validating the use cases and hoping to fix issues if there are. But there're no PRs yet.

@AishSundar

This comment has been minimized.

Copy link

AishSundar commented Oct 17, 2018

@feiskyer does that mean you are essentially code complete for Alpha in 1.13 and currently validating the flow? or is there any known pending work?

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Oct 17, 2018

does that mean you are essentially code complete for Alpha in 1.13 and currently validating the flow?

Alpha in 1.12 and beta in 1.13.

Seems state doesn't update yet.

/stage beta

@k8s-ci-robot k8s-ci-robot added stage/beta and removed stage/alpha labels Oct 17, 2018

@tfogo

This comment has been minimized.

Copy link
Member

tfogo commented Nov 3, 2018

Hi @feiskyer, will there be any updates to the docs necessary for 1.13? The deadline for placeholder PRs for the 1.13 release is November 8. So it's important to make a docs PR as soon as possible if needed.

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Nov 6, 2018

No docs are required in this release. Thanks.

@kacole2

This comment has been minimized.

Copy link
Contributor

kacole2 commented Nov 7, 2018

@feiskyer is there a k/k PR we should be tracking yet going into code slush?

@AishSundar

This comment has been minimized.

Copy link

AishSundar commented Nov 12, 2018

From offline sync with @feiskyer, all PRs needed to go into k/k are done. Tests are being developed in a standalone repo https://github.com/kubernetes/cloud-provider-azure and are not bound by Code freeze

@claurence

This comment has been minimized.

Copy link

claurence commented Jan 14, 2019

@feiskyer Hello - I’m the enhancement’s lead for 1.14 and I’m checking in on this issue to see what work/status (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th and I want to remind that all enhancements must have a KEP - it looks like this has this KEP https://github.com/kubernetes/enhancements/blob/master/keps/sig-azure/0018-20180711-azure-availability-zones.md - let me know if that is not accurate. Thanks!

@feiskyer

This comment has been minimized.

Copy link
Member

feiskyer commented Jan 15, 2019

yep, it's being tracked by the KEP 18, and it'll still be in beta stage in v1.14.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment