
KEP for Azure Availability Zones #2364

Merged
merged 1 commit into kubernetes:master on Jul 25, 2018

Conversation

@feiskyer
Member

feiskyer commented Jul 11, 2018

The KEP adds Azure Availability Zones support to Kubernetes, which includes:

  • Availability zones are detected automatically when registering new nodes, and the node label failure-domain.beta.kubernetes.io/zone will be set to the AZ instead of the fault domain
  • LoadBalancer and PublicIP will be provisioned as zone redundant
  • The GetLabelsForVolume interface will be implemented for Azure managed disks, and it will also be added to the PersistentVolumeLabel admission controller so as to support DynamicProvisioningScheduling

/sig azure
/sig storage

@feiskyer

Member Author

feiskyer commented Jul 11, 2018


### PersistentVolumeLabel

Besides the PVLabeler interface, the [PersistentVolumeLabel](https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/label/admission.go) admission controller should also be updated with AzureDisk support, so that new PVs get the above labels applied automatically.
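For illustration, the zone labels the admission controller would apply to an AzureDisk PV might look like the following fragment (the PV name and zone value are illustrative, not from the KEP):

```yaml
# Sketch: labels added to a zoned AzureDisk PV (illustrative values)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example-azuredisk
  labels:
    failure-domain.beta.kubernetes.io/region: centralus
    failure-domain.beta.kubernetes.io/zone: centralus-1
```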


@msau42

msau42 Jul 11, 2018

Member

This admission controller is deprecated now and I believe its replacement is in the cloud controller manager


@feiskyer

feiskyer Jul 11, 2018

Author Member

Sorry, I didn't notice this had been deprecated. Cloud controller manager will GA in v1.13 or v1.14. What's the replacement before then?


@msau42

msau42 Jul 11, 2018

Member

I'm not sure if there is a replacement in the meantime, but both methods should be implemented


@feiskyer

feiskyer Jul 12, 2018

Author Member

@msau42 you're right:

  • if cloud-controller-manager is used, then the PVLabeler interface is used by the persistent volume label controller
  • otherwise, this admission controller is used.

Will add this info to both this part and the PVLabeler part.

```yaml
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
  zone: "centralus-1"
```


@msau42

msau42 Jul 11, 2018

Member

The new topology aware dynamic provisioning feature adds a new allowedTopologies field to Storageclass which is intended to replace the zone/zones parameters.
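As a sketch, the allowedTopologies field could express the same zone restriction as the `zone` parameter; the class name and zone value below are illustrative, not a final design:

```yaml
# Sketch: allowedTopologies replacing the zone parameter (illustrative values)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-disk-zoned
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - centralus-1
```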


@feiskyer

feiskyer Jul 11, 2018

Author Member

The topology-aware dynamic provisioning feature is still alpha in v1.12 if I'm not wrong, so these two parameters are still required before its GA, right?

And thanks for the reminder. The allowedTopologies feature should also be added here.


@msau42

msau42 Jul 11, 2018

Member

Correct. Just wanted to make sure you know that this area is changing, and as part of beta we want to get all three in-tree cloud provider plugins to start moving to the new method as well.


## AzureDisk

When Azure managed disks are created, the `PersistentVolumeLabel` admission controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate`) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.


@msau42

msau42 Jul 11, 2018

Member

Eventually the VolumeZone predicate will be deprecated and replaced by PV.NodeAffinity
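For reference, the PV.NodeAffinity mentioned above could look roughly like this fragment of a zoned PV spec (a sketch with an illustrative zone value, not the final design):

```yaml
# Sketch of spec.nodeAffinity on a zoned AzureDisk PV (illustrative values)
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: failure-domain.beta.kubernetes.io/zone
        operator: In
        values:
        - centralus-1
```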


The proposal includes required changes to support availability zones for various functions in Azure cloud provider and AzureDisk volumes:

- Availability zones are detected automatically when registering new nodes, and the node label `failure-domain.beta.kubernetes.io/zone` will be set to the AZ instead of the fault domain


@liggitt

liggitt Jul 11, 2018

Member

rather than having the kubelet self-register this, can the node controller pull this information and add it into the node object?


## Node registration

When nodes are started, kubelet automatically adds labels to them with region and zone information:
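For example, a node in an AZ-enabled region might carry labels like these (hostname and zone values are illustrative):

```yaml
# Sketch: region/zone labels on a zoned node (illustrative values)
metadata:
  labels:
    kubernetes.io/hostname: k8s-node-1
    failure-domain.beta.kubernetes.io/region: centralus
    failure-domain.beta.kubernetes.io/zone: centralus-1
```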


@liggitt

liggitt Jul 11, 2018

Member

allowing nodes to self-label has been problematic (enumerated in #911). is there a reason we wouldn't have the node controller pull this info from the cloud provider and manage the labels on the node objects?


@feiskyer

feiskyer Jul 11, 2018

Author Member

There are actually two ways supported today:

  • Without cloud-controller-manager (kubelet --cloud-provider=azure), kubelet registers itself with zone labels
  • With cloud-controller-manager (kubelet --cloud-provider=external), the node controller adds zone labels

Until cloud-controller-manager GAs (probably v1.13 or v1.14), both ways should be supported. I should probably state this more clearly.


@liggitt

liggitt Jul 11, 2018

Member

Until cloud-controller-manager GAs (probably v1.13 or v1.14), both ways should be supported

I agree we need to support the old way until we can transition away from it, but I don't think we should expand it to new labels if possible


@feiskyer

feiskyer Jul 12, 2018

Author Member

I agree we need to support the old way until we can transition away from it, but I don't think we should expand it to new labels if possible

Agreed. This is only for the existing zone labels.


@liggitt

liggitt Jul 12, 2018

Member

Agreed. This is only for the existing zone labels.

great, thanks for confirming

@thockin
Member

thockin left a comment

It's not clear to me - is this proposing changes to Kubernetes concepts or adding new labels, or is it just changing the Azure implementation with no impact to the rest of the system?

I'm not sure that changes that stay within the azure implementation need a KEP?


The proposal includes required changes to support availability zones for various functions in Azure cloud provider and AzureDisk volumes:

- Detect availability zones automatically when registering new nodes and node's label `failure-domain.beta.kubernetes.io/zone` will be replaced with AZ instead of fault domain


@thockin

thockin Jul 11, 2018

Member

I don't understand the statement. You want to replace a standard label with ... what?


@feiskyer

feiskyer Jul 12, 2018

Author Member

Azure sets the fault domain in the label failure-domain.beta.kubernetes.io/zone today. It will be replaced with the AZ. Fault domain and AZ are different concepts on Azure.


Currently, Azure nodes are registered with label `failure-domain.beta.kubernetes.io/zone=faultDomain`.

Fault domains are numeric (e.g. `1` or `2`), which is the same format as AZs (e.g. `1` or `3`). If AZs used the same format as fault domains, there would be scheduling issues for clusters with both AZ and non-AZ nodes. So AZs will use a different format in Kubernetes: `<region>-<AZ>`, e.g. `centralus-1`.
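The label composition described above can be sketched as follows (this is an illustration, not the actual cloud provider code):

```sh
# Sketch: composing the zone label value (illustrative, not cloud provider code)
region="centralus"
az="1"
fault_domain="2"

# AZ-enabled node: '<region>-<AZ>' keeps AZ values distinct from
# bare fault domain numbers, so the scheduler cannot confuse them.
az_label="${region}-${az}"
echo "failure-domain.beta.kubernetes.io/zone=${az_label}"

# Non-AZ node keeps the bare fault domain number.
echo "failure-domain.beta.kubernetes.io/zone=${fault_domain}"
```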


@thockin

thockin Jul 11, 2018

Member

Maybe you should lead in with some details about what AZ means and what faultDomain means? Why are they not interchangeable terms? (They are both pretty commonly used to mean the same thing.)


@feiskyer

feiskyer Jul 12, 2018

Author Member

Good catch. This should be definitely addressed.

@feiskyer feiskyer force-pushed the feiskyer:kep-azure-az branch 2 times, most recently from 7a9eb71 to f03f856 Jul 12, 2018

@feiskyer

Member Author

feiskyer commented Jul 12, 2018

Added a new commit with topology-aware provisioning and difference between fault domain and availability zones.

@brendandburns

Contributor

brendandburns commented Jul 12, 2018

Hey @thockin good question. The KEP process is somewhat vague on what constitutes a KEP:

https://kubernetes.io/docs/imported/community/keps/#what-type-of-work-should-be-tracked-by-a-kep

I think that it is clearly within the scope of a SIG to track something as a KEP if they want to:

"As the local bodies of governance, SIGs should have broad latitude in describing what constitutes an enhancement which should be tracked through the KEP process."

But it's also clear to me that this doesn't meet the guidelines for things that are required to be a KEP:

"any technical effort (refactoring, major architectural change) that will impact a large section of the development community should also be communicated widely. The KEP process is suited for this even if it will have zero impact on the typical user or operator."

...

"Enhancements that have major impacts on multiple SIGs should use the KEP process."

So I guess given that this KEP has been started, we may as well finish it, but going forward SIG-Azure will develop guidelines for what should and should not be a KEP

"SIGs may find it helpful to enumerate what does not require a KEP rather than what does."

Sound good?

@justaugustus

Member

justaugustus commented Jul 12, 2018

@brendandburns / @thockin --

So from a Features Maintainer perspective, Feature Owners are required to submit design proposals for units of work that qualify as Features i.e., those tracked within k/features. The feature submission for Azure AZs can be found here: kubernetes/enhancements#586
I'd expect to see an uptick in KEPs across releases if contributors are following the Features submission process.

From SIG Azure Chair perspective, anything that moves us along the line of transparency w.r.t. SIG roadmap is a step in the right direction.
One thing that I appreciate about what's currently defined in the KEP metadata is that we consider scope: whether a KEP is Kubernetes-wide, multi-SIG, or single-SIG, which lends itself to considering KEPs for what might be considered a smaller feature.

That said, I'm happy to firm up guidelines on expectations for KEP submission in SIG Azure! :)

@jdumars jdumars added this to Backlog in KEP Tracking Jul 12, 2018

@@ -139,13 +146,13 @@ Note that zonal PublicIPs are not supported. We may add this easily if there’r

## AzureDisk

When Azure managed disks are created, the `PersistentVolumeLabel` admission controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate`) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.
When Azure managed disks are created, the `PersistentVolumeLabel` admission controller or PV controller automatically adds zone labels to them. The scheduler (via `VolumeZonePredicate` or `PV.NodeAffinity` in the future) will then ensure that pods that claim a given volume are only placed into the same zone as that volume, as volumes cannot be attached across zones.


@msau42

msau42 Jul 12, 2018

Member

PV controller is actually the "PV label controller in cloud-controller-manager" (not to be confused with the PV controller in kube-controller-manager).

The admission controller needs to be updated to also add PV.NodeAffinity in addition to the zone labels.


@feiskyer

feiskyer Jul 13, 2018

Author Member

Yep, exactly. Updated with that information.

@feiskyer

Member Author

feiskyer commented Jul 13, 2018

@thockin @brendandburns @justaugustus Thanks for the clarification on the KEP process. Per Brendan's suggestion, could we continue this KEP and develop guidelines in sig-azure?

@justaugustus

Member

justaugustus commented Jul 13, 2018

Absolutely @feiskyer. Let's keep this going!

@feiskyer

Member Author

feiskyer commented Jul 19, 2018

@thockin @brendandburns @justaugustus @ddebroy Addressed comments and squashed commits. PTAL

@ddebroy
Member

ddebroy left a comment

Thank you @feiskyer . /lgtm

@justaugustus

Member

justaugustus commented Jul 20, 2018

Overall LGTM, but I'd like to hold pending a review from Kal or Cole as well.
/lgtm
/hold
/cc @khenidak @colemickens

@khenidak


khenidak commented Jul 20, 2018

/LGTM

@justaugustus

Member

justaugustus commented Jul 20, 2018

Awesome, thanks for the review, Kal!
@feiskyer -- thanks for putting this together! :)

/hold cancel

@justaugustus

Member

justaugustus commented Jul 20, 2018

Ooooooo, I'm going to need to add an OWNERS for the sig-azure folder.
Will handle next week.

@justaugustus

Member

justaugustus commented Jul 21, 2018

Ooooooo, I'm going to need to add an OWNERS for the sig-azure folder.
Will handle next week.

Fixed in #2399

/approve

@justaugustus

Member

justaugustus commented Jul 25, 2018

@jdumars @calebamiles @idvoretskyi -- could you /approve this?
It's approved from the SIG Azure side.

@idvoretskyi
Member

idvoretskyi left a comment

/approve

@k8s-ci-robot

Contributor

k8s-ci-robot commented Jul 25, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: idvoretskyi, justaugustus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit eb02547 into kubernetes:master Jul 25, 2018

3 checks passed:

  • cla/linuxfoundation: feiskyer authorized
  • pull-community-verify: Job succeeded
  • tide: In merge pool
@justaugustus

Member

justaugustus commented Jul 25, 2018

Thanks @idvoretskyi 🎉

@feiskyer feiskyer deleted the feiskyer:kep-azure-az branch Jul 26, 2018

@feiskyer

Member Author

feiskyer commented Jul 26, 2018

Thanks @idvoretskyi and @justaugustus

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 6, 2018

Kubernetes Submit Queue
Merge pull request #66553 from feiskyer/azure-disk-availablity-zone
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add availability zones support to Azure managed disks

**What this PR does / why we need it**:

Continue of [Azure Availability Zone feature](kubernetes/enhancements#586).

This PR adds availability zone support for Azure managed disks and its storage class. Zoned managed disks is enabled by default if there are zoned nodes in the cluster.

The zone could also be customized by `zone` or `zones` parameter, e.g.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
  name: managed-disk-zone-1
parameters:
  zone: "southeastasia-1"
  # zones: "southeastasia-1,southeastasia-2"
  cachingmode: None
  kind: Managed
  storageaccounttype: Standard_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

All zoned AzureDisk PVs will also be labeled with their availability zone, e.g.

```sh
$ kubectl get pvc pvc-azuredisk-az-1
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
pvc-azuredisk-az-1   Bound     pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            managed-disk-zone-1   2h

$ kubectl get pv pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c --show-labels
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                        STORAGECLASS          REASON    AGE       LABELS
pvc-5ad0c7b8-8f0b-11e8-94f2-000d3a07de8c   5Gi        RWO            Delete           Bound     default/pvc-azuredisk-az-1   managed-disk-zone-1             2h        failure-domain.beta.kubernetes.io/region=southeastasia,failure-domain.beta.kubernetes.io/zone=southeastasia-1
```

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

See also the [KEP](kubernetes/community#2364).

DynamicProvisioningScheduling feature would be added in a following PR.

**Release note**:

```release-note
Azure managed disks now support availability zones and new parameters `zoned`, `zone` and `zones` are added for AzureDisk storage class.
```

/kind feature
/sig azure
/assign @brendandburns @khenidak @andyzhangx

calebamiles pushed a commit to calebamiles/community that referenced this pull request Sep 5, 2018

> - Availability Zones are unique physical locations within an Azure region. Each zone is made up of one or more data centers equipped with independent power, cooling, and networking.
>
> An Availability Zone in an Azure region is a combination of a fault domain and an update domain (similar to a fault domain, but for updates: when upgrading a deployment, it is carried out one update domain at a time). For example, if you create three or more VMs across three zones in an Azure region, your VMs are effectively distributed across three fault domains and three update domains.


@krmayankk

krmayankk Jan 22, 2019

Contributor

Is there further classification? For example, does an update domain have multiple fault domains?


@feiskyer

feiskyer Jan 22, 2019

Author Member

No, each VM is assigned a single update domain.
