WIP: Implement OpenStack Nova (Compute) AZs #30757

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
8 participants
Contributor

kiall commented Aug 17, 2016

WIP - Looking for feedback.

  • Remove reading of Instance UUID from /var/lib/cloud, as we'll need
    to call the metadata service anyway to get the availability zone.

Q: I'd love to see this backported to 1.3, but I suspect this removal might prevent that.

  • Update and rename parseMetadataUUID and readInstanceID to handle
    both ID and AZ.

Q: The OpenStack provider is missing more data, e.g. instance type. These feel like they should be updated and implemented similarly to AWS's selfAwsInstance struct. But, again, as we would very much like to see a 1.3 backport, I've held off. If a 1.3 backport is off the table, I make the switch.

  • Update GetZone to set the FailureDomain to the Nova AZ.

We do not yet implement Cinder (Block Storage) AZs, as OpenStack does
not enforce AZ affinity of Compute and Block Storage. Additionally,
many OpenStack deployments use a single block storage AZ, where the
backing storage driver itself implements this failure domain. Any
Block Storage AZ support must take this quirk into account.


This change is Reviewable

@googlebot googlebot added the cla: yes label Aug 17, 2016

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

k8s-bot commented Aug 17, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message may repeat a few times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Otherwise, if this message is too spammy, please complain to ixdy.

Implement OpenStack Nova (Compute) AZs
* Remove reading of Instance UUID from /var/lib/cloud, as we'll need
  to call the metadata service anyway to get the availability zone.
* Update and rename parseMetadataUUID and readInstanceID to handle
  both ID and AZ.
* Update GetZone to set the FailureDomain to the Nova AZ.

We do not yet implement Cinder (Block Storage) AZs, as OpenStack does
not enforce AZ affinity of Compute and Block Storage. Additionally,
many OpenStack deployments use a single block storage AZ, where the
backing storage driver itself implements this failure domain. Any
Block Storage AZ support must take this quirk into account.
Contributor

kiall commented Aug 17, 2016

Patch has now been tested against OpenStack Liberty, where the az happened to be the default of "nova" in a "region1" region:

$ kubectl describe nodes
Name:                   hcp-kubernetes-node-3737ba16-639d-11e6-8b93-fa163eefb5e8
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        failure-domain.beta.kubernetes.io/region=region1
                        failure-domain.beta.kubernetes.io/zone=nova
                        kubernetes.io/hostname=hcp-kubernetes-node-3737ba16-639d-11e6-8b93-fa163eefb5e8
Member

mikedanese commented Aug 17, 2016

@kubernetes/sig-openstack

Contributor

dagnello commented Aug 17, 2016

adding @anguslees

Contributor

anguslees commented Aug 26, 2016

I like the patch, but I think we should hold off until we can read metadata from config-drive too.

Specifically, I'm concerned that removing the bit that reads instanceID from /var/lib/cloud will break a bunch of things for deployments that don't provide metadata server, and that breakage will be worse than not providing az info.

I have some old code for config-drive support that I've been meaning to rebase for a while, let me accelerate that..

Contributor

anguslees commented Aug 26, 2016

For completeness, another alternative for az info would be to pull OS-EXT-AZ:availability_zone info from a nova lookup (possibly remotely). I think this requires the extension to be added to gophercloud first however.

harlowja commented Aug 26, 2016

Just a late comment,

Remove reading of Instance UUID from /var/lib/cloud, as we'll need to call the metadata service anyway to get the availability zone.

Since /var/lib/cloud is created and managed by cloud-init (a tool that I also work on), and you folk seem like you just want to have the AZ from it (which cloud-init actually already has since it typically has extracted that already from the metadata service, be that config-drive or REST) would it be up for consideration to have a new tool that cloud-init provides that would let you read this (already existing and stored) info instead.

The thing most people don't know I think is that obj.pkl @ https://cloudinit.readthedocs.io/en/latest/topics/dir_layout.html is actually a pickled python object that has all the datasource data (captured at fetch time) @ https://git.launchpad.net/cloud-init/tree/cloudinit/sources/__init__.py#n56 (which includes all the metadata, all the ...)

Just seems like what's needed is a better helper-tool to extract that data for usage here?

Contributor

kiall commented Aug 26, 2016

@harlowja - ah. I actually spent a good 30 mins trying to find out what I could expect from that file, but without a defined format, I felt I was barking up the private no guarantee over backwards compatibility tree! Is that considered a stable cloud-init interface?

pickle.load() it :-P

As for a stable interface, hmmm, I wouldn't say the file itself is, but it could be, or a tool that extracts info from it could be IMHO.

Contributor

kiall commented Aug 29, 2016

@harlowja a tool to handle the data (or - a directory structure on disk mirroring the API's you have crawled?) would certainly be useful.

Contributor

kiall commented Aug 29, 2016

For completeness, another alternative for az info would be to pull OS-EXT-AZ:availability_zone info from a nova lookup (possibly remotely). I think this requires the extension to be added to gophercloud first however.

It would also require that each instance has API credentials for the OpenStack APIs, which would be rather annoying given OpenStack's pretty basic authorization policies. Most clouds have "user" or "cloud wide admin" and nothing else.

k8s-bot commented Aug 29, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

harlowja commented Aug 29, 2016

Right, scott (one of the other cloud-init authors) and I have wanted that for a while; maybe this can be the motivating factor ;)

Contributor

anguslees commented Aug 30, 2016

Config drive code is in #31671

I hope you don't mind @kiall, but I also included a version of this FailureDomain PR since it was only a few additional lines on top of the config drive restructure.

Contributor

kiall commented Aug 30, 2016

@anguslees don't mind at all, will test out your PR ASAP

k8s-bot commented Sep 1, 2016

Can one of the admins verify that this patch is reasonable to test? If so, please reply "ok to test".
(Note: "add to whitelist" is no longer supported. Please update configurations in kubernetes/test-infra/jenkins/job-configs/kubernetes-jenkins-pull instead.)

This message will repeat several times in short succession due to jenkinsci/ghprb-plugin#292. Sorry.

Contributor

kiall commented Sep 20, 2016

Closed in favor of #31671

@kiall kiall closed this Sep 20, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment