
pkg/asset/machines/aws: Only return available zones #1210

Merged
merged 1 commit into openshift:master from wking:aws-available-zones
Mar 6, 2019

Conversation

@wking (Member) commented Feb 8, 2019

Zones can also be impaired, unavailable, or "information" (see the AWS and Terraform docs, [1] and [2] below). I dunno what that last one is about. But we only want to put machines in available zones ;).

This is a bit racy, because we could get different responses at both call-sites. Moving forward, we'll want to shift this request into a single available-zones asset.

Fixes #1209.

[1]: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeAvailabilityZones.html
[2]: https://www.terraform.io/docs/providers/aws/d/availability_zones.html#state
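
For reference, here is a minimal Go sketch of the kind of state filtering described above, assuming aws-sdk-go v1; the `availableZones` helper name, the standalone `main`, and the server-side `state` filter are illustrative rather than the PR's exact diff:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// availableZones asks EC2 for the region's availability zones, keeping only
// those whose state is "available" (so impaired, unavailable, and
// "information" zones are dropped).
func availableZones(region string) ([]string, error) {
	sess, err := session.NewSession(aws.NewConfig().WithRegion(region))
	if err != nil {
		return nil, err
	}
	resp, err := ec2.New(sess).DescribeAvailabilityZones(&ec2.DescribeAvailabilityZonesInput{
		Filters: []*ec2.Filter{{
			Name:   aws.String("state"),
			Values: []*string{aws.String("available")},
		}},
	})
	if err != nil {
		return nil, err
	}
	zones := make([]string, 0, len(resp.AvailabilityZones))
	for _, zone := range resp.AvailabilityZones {
		zones = append(zones, *zone.ZoneName)
	}
	return zones, nil
}

func main() {
	zones, err := availableZones("us-east-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(zones)
}
```
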
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 8, 2019
@wking (Member, Author) commented Feb 8, 2019

e2e-aws:

Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs with test deployments [Conformance] should run a deployment to completion and then scale to zero [Suite:openshift/conformance/parallel/minimal]

Failing tests:

[Conformance][Area:Networking][Feature:Router] The default ClusterIngress should support default wildcard reencrypt routes through external DNS [Suite:openshift/conformance/parallel/minimal]

/retest

@wking (Member, Author) commented Feb 8, 2019

All green, when you have a moment @abhinavdahiya, @staebler.

@abhinavdahiya (Contributor)

> Moving forward, we'll want to shift this request into a single available-zones asset.

Do we have a PR open or a tracking issue...

@wking (Member, Author) commented Feb 8, 2019

> Moving forward, we'll want to shift this request into a single available-zones asset.
>
> Do we have a PR open or a tracking issue...

#1121 will get us closer. I can file a ticket later tonight for whatever is still left after that.

@wking (Member, Author) commented Feb 8, 2019

But "zone goes offline mid-asset-graph" is probably going to break automatic zone selection even once we're down to a single zone-listing call (we'll just be internally consistent when we die ;).

@wking (Member, Author) commented Feb 8, 2019

Ok, I think #1045, #1121, and #1212 together will address the consolidation to a single available-zones request in Go. I still think we want this PR to get the filtering in Go (which we'll want forever) and Terraform (which we'll need until #1045 and #1121 together let us drop that Terraform zone request).

@staebler (Contributor)

> Ok, I think #1045, #1121, and #1212 together will address the consolidation to a single available-zones request in Go. I still think we want this PR to get the filtering in Go (which we'll want forever) and Terraform (which we'll need until #1045 and #1121 together let us drop that Terraform zone request).

I don't think we are quite there yet for dropping the Terraform zone request even after #1045 and #1121. I would like to see us send from Go to Terraform the union of the zones used in all of the machine pools. Then, we can get rid of the Terraform zone request.
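
For illustration, that union could be computed along these lines (a hedged sketch; the `[][]string` input and `zoneUnion` name are stand-ins, not the installer's actual machine-pool types):

```go
package main

import (
	"fmt"
	"sort"
)

// zoneUnion returns the sorted union of availability zones across machine
// pools, suitable for handing to Terraform as a single list.
func zoneUnion(poolZones [][]string) []string {
	seen := map[string]bool{}
	var union []string
	for _, zones := range poolZones {
		for _, zone := range zones {
			if !seen[zone] {
				seen[zone] = true
				union = append(union, zone)
			}
		}
	}
	sort.Strings(union) // deterministic ordering keeps the Terraform input stable
	return union
}

func main() {
	master := []string{"us-east-1a", "us-east-1b"}
	worker := []string{"us-east-1b", "us-east-1c"}
	fmt.Println(zoneUnion([][]string{master, worker})) // [us-east-1a us-east-1b us-east-1c]
}
```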

@staebler (Contributor)

Do we need to modify the Go fetch of availability zones as well to only return the zones that have a state of "available"?

wking added a commit to wking/openshift-installer that referenced this pull request Feb 28, 2019
…-release:4.0.0-0.6

Clayton pushed 4.0.0-0.nightly-2019-02-27-213933 to
quay.io/openshift-release-dev/ocp-release:4.0.0-0.6.  Extracting the
associated RHCOS build:

  $ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.6 | grep machine-os-content
    machine-os-content                            registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-27-213933@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b
  $ oc image info registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-02-26-125216@sha256:1262533e31a427917f94babeef2774c98373409897863ae742ff04120f32f79b | grep version
              version=47.330

that's the same machine-os-content image referenced from 4.0.0-0.5,
which we used for installer v0.13.0.

Renaming OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE gets us CI testing
of the pinned release despite openshift/release@60007df2 (Use
RELEASE_IMAGE_LATEST for CVO payload, 2018-10-03,
openshift/release#1793).

Also comment out regions which this particular RHCOS build wasn't
pushed to, leaving only:

  $ curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.330/meta.json | jq -r '.amis[] | .name'
  ap-northeast-1
  ap-northeast-2
  ap-south-1
  ap-southeast-1
  ap-southeast-2
  ca-central-1
  eu-central-1
  eu-west-1
  eu-west-2
  eu-west-3
  sa-east-1
  us-east-1
  us-east-2
  us-west-1
  us-west-2

I'd initially expected to export the pinning environment variables in
release.sh, but I've put them in build.sh here because our continuous
integration tests use build.sh directly and don't go through
release.sh.

Using the slick, new change-log generator from [1], here's everything
that changed in the update payload:

  $ oc adm release info --changelog ~/.local/lib/go/src --changes-from quay.io/openshift-release-dev/ocp-release:4.0.0-0.5 quay.io/openshift-release-dev/ocp-release:4.0.0-0.6
  # 4.0.0-0.6

  Created: 2019-02-28 20:40:11 +0000 UTC
  Image Digest: `sha256:5ce3d05da3bfa3d0310684f5ac53d98d66a904d25f2e55c2442705b628560962`
  Promoted from registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-27-213933

  ## Changes from 4.0.0-0.5

  ### Components

  * Kubernetes 1.12.4

  ### New images

  * [pod](https://github.com/openshift/images) git [2f60da39](openshift/images@2f60da3) `sha256:c0d602467dfe0299ce577ba568a9ef5fb9b0864bac6455604258e7f5986d3509`

  ### Rebuilt images without code change

  * [cloud-credential-operator](https://github.com/openshift/cloud-credential-operator) git [01bbf372](openshift/cloud-credential-operator@01bbf37) `sha256:f87be09923a5cb081722634d2e0c3d0a5633ea2c23da651398d4e915ad9f73b0`
  * [cluster-autoscaler](https://github.com/openshift/kubernetes-autoscaler) git [d8a4a304](openshift/kubernetes-autoscaler@d8a4a30) `sha256:955413b82cf8054ce149bc05c18297a8abe9c59f9d0034989f08086ae6c71fa6`
  * [cluster-autoscaler-operator](https://github.com/openshift/cluster-autoscaler-operator) git [73c46659](openshift/cluster-autoscaler-operator@73c4665) `sha256:756e813fce04841993c8060d08a5684c173cbfb61a090ae67cb1558d76a0336e`
  * [cluster-bootstrap](https://github.com/openshift/cluster-bootstrap) git [05a5c8e6](openshift/cluster-bootstrap@05a5c8e) `sha256:dbdd90da7d256e8d49e4e21cb0bdef618c79d83f539049f89f3e3af5dbc77e0f`
  * [cluster-config-operator](https://github.com/openshift/cluster-config-operator) git [aa1805e7](openshift/cluster-config-operator@aa1805e) `sha256:773d3355e6365237501d4eb70d58cd0633feb541d4b6f23d6a5f7b41fd6ad2f5`
  * [cluster-dns-operator](https://github.com/openshift/cluster-dns-operator) git [ffb04ae9](openshift/cluster-dns-operator@ffb04ae) `sha256:ca15f98cc1f61440f87950773329e1fdf58e73e591638f18c43384ad4f8f84da`
  * [cluster-machine-approver](https://github.com/openshift/cluster-machine-approver) git [2fbc6a6b](openshift/cluster-machine-approver@2fbc6a6) `sha256:a66af3b1f4ae98257ab600d54f8c94f3a4136f85863bbe0fa7c5dba65c5aea46`
  * [cluster-node-tuned](https://github.com/openshift/openshift-tuned) git [278ee72d](openshift/openshift-tuned@278ee72) `sha256:ad71743cc50a6f07eba013b496beab9ec817603b07fd3f5c022fffbf400e4f4b`
  * [cluster-node-tuning-operator](https://github.com/openshift/cluster-node-tuning-operator) git [b5c14deb](openshift/cluster-node-tuning-operator@b5c14de) `sha256:e61d1fdb7ad9f5fed870e917a1bc8fac9ccede6e4426d31678876bcb5896b000`
  * [cluster-openshift-controller-manager-operator](https://github.com/openshift/cluster-openshift-controller-manager-operator) git [3f79b51b](openshift/cluster-openshift-controller-manager-operator@3f79b51) `sha256:8f3b40b4dd29186975c900e41b1a94ce511478eeea653b89a065257a62bf3ae9`
  * [cluster-svcat-apiserver-operator](https://github.com/openshift/cluster-svcat-apiserver-operator) git [547648cb](openshift/cluster-svcat-apiserver-operator@547648c) `sha256:e7c9323b91dbb11e044d5a1277d1e29d106d92627a6c32bd0368616e0bcf631a`
  * [cluster-svcat-controller-manager-operator](https://github.com/openshift/cluster-svcat-controller-manager-operator) git [9261f420](openshift/cluster-svcat-controller-manager-operator@9261f42) `sha256:097a429eda2306fcd49e14e4f5db8ec3a09a90fa29ebdbc98cc519511ab6fb5b`
  * [cluster-version-operator](https://github.com/openshift/cluster-version-operator) git [70c0232e](openshift/cluster-version-operator@70c0232) `sha256:7d59edff68300e13f0b9e56d2f2bc1af7f0051a9fbc76cc208239137ac10f782`
  * [configmap-reloader](https://github.com/openshift/configmap-reload) git [3c2f8572](openshift/configmap-reload@3c2f857) `sha256:32360c79d8d8d54cea03675c24f9d0a69877a2f2e16b949ca1d97440b8f45220`
  * [console-operator](https://github.com/openshift/console-operator) git [32ed7c03](openshift/console-operator@32ed7c0) `sha256:f8c07cb72dc8aa931bbfabca9b4133f3b93bc96da59e95110ceb8c64f3efc755`
  * [container-networking-plugins-supported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:c6434441fa9cc96428385574578c41e9bc833b6db9557df1dd627411d9372bf4`
  * [container-networking-plugins-unsupported](https://github.com/openshift/ose-containernetworking-plugins) git [f6a58dce](openshift/ose-containernetworking-plugins@f6a58dc) `sha256:bb589cf71d4f41977ec329cf808cdb956d5eedfc604e36b98cfd0bacce513ffc`
  * [coredns](https://github.com/openshift/coredns) git [fbcb8252](openshift/coredns@fbcb825) `sha256:2f1812a95e153a40ce607de9b3ace7cae5bee67467a44a64672dac54e47f2a66`
  * [docker-builder](https://github.com/openshift/builder) git [1a77d837](openshift/builder@1a77d83) `sha256:27062ab2c62869e5ffeca234e97863334633241089a5d822a19350f16945fbcb`
  * [etcd](https://github.com/openshift/etcd) git [a0e62b48](openshift/etcd@a0e62b4) `sha256:e4e9677d004f8f93d4f084739b4502c2957c6620d633e1fdb379c33243c684fa`
  * [grafana](https://github.com/openshift/grafana) git [58efe0eb](openshift/grafana@58efe0e) `sha256:548abcc50ccb8bb17e6be2baf050062a60fc5ea0ca5d6c59ebcb8286fc9eb043`
  * [haproxy-router](https://github.com/openshift/router) git [2c33f47f](openshift/router@2c33f47) `sha256:c899b557e4ee2ea7fdbe5c37b5f4f6e9f9748a39119130fa930d9497464bd957`
  * [k8s-prometheus-adapter](https://github.com/openshift/k8s-prometheus-adapter) git [815fa76b](openshift/k8s-prometheus-adapter@815fa76) `sha256:772c1b40b21ccaa9ffcb5556a1228578526a141b230e8ac0afe19f14404fdffc`
  * [kube-rbac-proxy](https://github.com/openshift/kube-rbac-proxy) git [3f271e09](openshift/kube-rbac-proxy@3f271e0) `sha256:b6de05167ecab0472279cdc430105fac4b97fb2c43d854e1c1aa470d20a36572`
  * [kube-state-metrics](https://github.com/openshift/kube-state-metrics) git [2ab51c9f](openshift/kube-state-metrics@2ab51c9) `sha256:611c800c052de692c84d89da504d9f386d3dcab59cbbcaf6a26023756bc863a0`
  * [libvirt-machine-controllers](https://github.com/openshift/cluster-api-provider-libvirt) git [7ff8b08f](openshift/cluster-api-provider-libvirt@7ff8b08) `sha256:6ab8749886ec26d45853c0e7ade3c1faaf6b36e09ba2b8a55f66c6cc25052832`
  * [multus-cni](https://github.com/openshift/ose-multus-cni) git [61f9e088](https://github.com/openshift/ose-multus-cni/commit/61f9e0886370ea5f6093ed61d4cfefc6dadef582) `sha256:e3f87811d22751e7f06863e7a1407652af781e32e614c8535f63d744e923ea5c`
  * [oauth-proxy](https://github.com/openshift/oauth-proxy) git [b771960b](openshift/oauth-proxy@b771960) `sha256:093a2ac687849e91671ce906054685a4c193dfbed27ebb977302f2e09ad856dc`
  * [openstack-machine-controllers](https://github.com/openshift/cluster-api-provider-openstack) git [c2d845b](openshift/cluster-api-provider-openstack@c2d845b) `sha256:f9c321de068d977d5b4adf8f697c5b15f870ccf24ad3e19989b129e744a352a7`
  * [operator-registry](https://github.com/operator-framework/operator-registry) git [0531400c](operator-framework/operator-registry@0531400) `sha256:730f3b504cccf07e72282caf60dc12f4e7655d7aacf0374d710c3f27125f7008`
  * [prom-label-proxy](https://github.com/openshift/prom-label-proxy) git [46423f9d](openshift/prom-label-proxy@46423f9) `sha256:3235ad5e22b6f560d447266e0ecb2e5655fda7c0ab5c1021d8d3a4202f04d2ca`
  * [prometheus](https://github.com/openshift/prometheus) git [6e5fb5dc](openshift/prometheus@6e5fb5d) `sha256:013455905e4a6313f8c471ba5f99962ec097a9cecee3e22bdff3e87061efad57`
  * [prometheus-alertmanager](https://github.com/openshift/prometheus-alertmanager) git [4617d550](openshift/prometheus-alertmanager@4617d55) `sha256:54512a6cf25cf3baf7fed0b01a1d4786d952d93f662578398cad0d06c9e4e951`
  * [prometheus-config-reloader](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:244fc5f1a4a0aa983067331c762a04a6939407b4396ae0e86a1dd1519e42bb5d`
  * [prometheus-node-exporter](https://github.com/openshift/node_exporter) git [f248b582](openshift/node_exporter@f248b58) `sha256:390e5e1b3f3c401a0fea307d6f9295c7ff7d23b4b27fa0eb8f4017bd86d7252c`
  * [prometheus-operator](https://github.com/openshift/prometheus-operator) git [f8a0aa17](openshift/prometheus-operator@f8a0aa1) `sha256:6e697dcaa19e03bded1edf5770fb19c0d2cd8739885e79723e898824ce3cd8f5`
  * [service-catalog](https://github.com/openshift/service-catalog) git [b24ffd6f](openshift/service-catalog@b24ffd6) `sha256:85ea2924810ced0a66d414adb63445a90d61ab5318808859790b1d4b7decfea6`
  * [service-serving-cert-signer](https://github.com/openshift/service-serving-cert-signer) git [30924216](openshift/service-serving-cert-signer@3092421) `sha256:7f89db559ffbd3bf609489e228f959a032d68dd78ae083be72c9048ef0c35064`
  * [telemeter](https://github.com/openshift/telemeter) git [e12aabe4](openshift/telemeter@e12aabe) `sha256:fd518d2c056d4ab8a89d80888e0a96445be41f747bfc5f93aa51c7177cf92b92`

  ### [aws-machine-controllers](https://github.com/openshift/cluster-api-provider-aws)

  * client: add cluster-api-provider-aws to UserAgent for AWS API calls [openshift#167](openshift/cluster-api-provider-aws#167)
  * Drop the yaml unmarshalling [openshift#155](openshift/cluster-api-provider-aws#155)
  * [Full changelog](openshift/cluster-api-provider-aws@46f4852...c0c3b9e)

  ### [cli, deployer, hyperkube, hypershift, node, tests](https://github.com/openshift/ose)

  * Build OSTree using baked SELinux policy [#22081](https://github.com/openshift/ose/pull/22081)
  * NodeName was being cleared for `oc debug node/X` instead of set [#22086](https://github.com/openshift/ose/pull/22086)
  * UPSTREAM: 73894: Print the involved object in the event table [#22039](https://github.com/openshift/ose/pull/22039)
  * Publish CRD openapi [#22045](https://github.com/openshift/ose/pull/22045)
  * UPSTREAM: 00000: wait for CRD discovery to be successful once before [#22149](https://github.com/openshift/ose/pull/22149)
  * `oc adm release info --changelog` should clone if necessary [#22148](https://github.com/openshift/ose/pull/22148)
  * [Full changelog](openshift/ose@c547bc3...0cbcfc5)

  ### [cluster-authentication-operator](https://github.com/openshift/cluster-authentication-operator)

  * Add redeploy on serving cert and operator pod template change [openshift#75](openshift/cluster-authentication-operator#75)
  * Create the service before waiting for serving certs [openshift#84](openshift/cluster-authentication-operator#84)
  * [Full changelog](openshift/cluster-authentication-operator@78dd53b...35879ec)

  ### [cluster-image-registry-operator](https://github.com/openshift/cluster-image-registry-operator)

  * Enable subresource status [openshift#209](openshift/cluster-image-registry-operator#209)
  * Add ReadOnly flag [openshift#210](openshift/cluster-image-registry-operator#210)
  * do not setup ownerrefs for clusterscoped/cross-namespace objects [openshift#215](openshift/cluster-image-registry-operator#215)
  * s3: include operator version in UserAgent for AWS API calls [openshift#212](openshift/cluster-image-registry-operator#212)
  * [Full changelog](openshift/cluster-image-registry-operator@0780074...8060048)

  ### [cluster-ingress-operator](https://github.com/openshift/cluster-ingress-operator)

  * Adds info log msg indicating ns/secret used by DNSManager [openshift#134](openshift/cluster-ingress-operator#134)
  * Introduce certificate controller [openshift#140](openshift/cluster-ingress-operator#140)
  * [Full changelog](openshift/cluster-ingress-operator@1b4fa5a...09d14db)

  ### [cluster-kube-apiserver-operator](https://github.com/openshift/cluster-kube-apiserver-operator)

  * bump(*): fix installer pod shutdown and rolebinding [openshift#307](openshift/cluster-kube-apiserver-operator#307)
  * bump to fix early status [openshift#309](openshift/cluster-kube-apiserver-operator#309)
  * [Full changelog](openshift/cluster-kube-apiserver-operator@4016927...fa75c05)

  ### [cluster-kube-controller-manager-operator](https://github.com/openshift/cluster-kube-controller-manager-operator)

  * bump(*): fix installer pod shutdown and rolebinding [openshift#183](openshift/cluster-kube-controller-manager-operator#183)
  * bump to fix empty status [openshift#184](openshift/cluster-kube-controller-manager-operator#184)
  * [Full changelog](openshift/cluster-kube-controller-manager-operator@95f5f32...53ff6d8)

  ### [cluster-kube-scheduler-operator](https://github.com/openshift/cluster-kube-scheduler-operator)

  * Rotate kubeconfig [openshift#62](openshift/cluster-kube-scheduler-operator#62)
  * Don't pass nil function pointer to NewConfigObserver [openshift#65](openshift/cluster-kube-scheduler-operator#65)
  * [Full changelog](openshift/cluster-kube-scheduler-operator@50848b4...7066c96)

  ### [cluster-monitoring-operator](https://github.com/openshift/cluster-monitoring-operator)

  * *: Clean test invocation and documenation [openshift#267](openshift/cluster-monitoring-operator#267)
  * pkg/operator: fix progressing state of cluster operator [openshift#268](openshift/cluster-monitoring-operator#268)
  * jsonnet/main.jsonnet: Bump Prometheus to v2.7.1 [openshift#246](openshift/cluster-monitoring-operator#246)
  * OWNERS: Remove ironcladlou [openshift#204](openshift/cluster-monitoring-operator#204)
  * test/e2e: Refactor framework setup & wait for query logic [openshift#265](openshift/cluster-monitoring-operator#265)
  * jsonnet: Update dependencies [openshift#269](openshift/cluster-monitoring-operator#269)
  * [Full changelog](openshift/cluster-monitoring-operator@94b701f...3609aea)

  ### [cluster-network-operator](https://github.com/openshift/cluster-network-operator)

  * Update to be able to track both DaemonSets and Deployments [openshift#102](openshift/cluster-network-operator#102)
  * openshift-sdn: more service-catalog netnamespace fixes [openshift#108](openshift/cluster-network-operator#108)
  * [Full changelog](openshift/cluster-network-operator@9db4d03...15204e6)

  ### [cluster-openshift-apiserver-operator](https://github.com/openshift/cluster-openshift-apiserver-operator)

  * bump to fix status reporting [openshift#157](openshift/cluster-openshift-apiserver-operator#157)
  * [Full changelog](openshift/cluster-openshift-apiserver-operator@1ce6ac7...0a65fe4)

  ### [cluster-samples-operator](https://github.com/openshift/cluster-samples-operator)

  * use pumped up rate limiter, shave 30 seconds from startup creates [openshift#113](openshift/cluster-samples-operator#113)
  * [Full changelog](openshift/cluster-samples-operator@4726068...f001324)

  ### [cluster-storage-operator](https://github.com/openshift/cluster-storage-operator)

  * WaitForFirstConsumer in AWS StorageClass [openshift#12](openshift/cluster-storage-operator#12)
  * [Full changelog](openshift/cluster-storage-operator@dc42489...b850242)

  ### [console](https://github.com/openshift/console)

  * Add back OAuth configuration link in kubeadmin notifier [openshift#1202](openshift/console#1202)
  * Normalize display of <ResourceIcon> across browsers, platforms [openshift#1210](openshift/console#1210)
  * Add margin spacing so event info doesn't run together before truncating [openshift#1170](openshift/console#1170)
  * [Full changelog](openshift/console@a0b75bc...d10fb8b)

  ### [docker-registry](https://github.com/openshift/image-registry)

  * Bump k8s and OpenShift, use new docker-distribution branch [openshift#165](openshift/image-registry#165)
  * [Full changelog](openshift/image-registry@75a1fbe...afcc7da)

  ### [installer](https://github.com/openshift/installer)

  * data: route53 A records with SimplePolicy should not use health check [openshift#1308](openshift#1308)
  * bootkube.sh: do not hide problems with render [openshift#1274](openshift#1274)
  * data/bootstrap/files/usr/local/bin/bootkube: etcdctl from release image [openshift#1315](openshift#1315)
  * pkg/types/validation: Drop v1beta1 backwards compat hack [openshift#1251](openshift#1251)
  * pkg/asset/tls: self-sign etcd-client-ca [openshift#1267](openshift#1267)
  * pkg/asset/tls: self-sign aggregator-ca [openshift#1275](openshift#1275)
  * pkg/types/validation/installconfig: Drop nominal v1beta2 support [openshift#1319](openshift#1319)
  * Removing unused/deprecated security groups and ports. Updated AWS doc [openshift#1306](openshift#1306)
  * [Full changelog](openshift/installer@0208204...563f71f)

  ### [jenkins, jenkins-agent-maven, jenkins-agent-nodejs](https://github.com/openshift/jenkins)

  * recover from jenkins deps backleveling workflow-durable-task-step fro… [openshift#806](openshift/jenkins#806)
  * [Full changelog](openshift/jenkins@2485f9a...e4583ca)

  ### [machine-api-operator](https://github.com/openshift/machine-api-operator)

  * Rename labels from sigs.k8s.io to machine.openshift.io [openshift#213](openshift/machine-api-operator#213)
  * Remove clusters.cluster.k8s.io CRD [openshift#225](openshift/machine-api-operator#225)
  * MAO: Stop setting statusProgressing=true when resyincing same version [openshift#217](openshift/machine-api-operator#217)
  * Generate clientset for machine health check API [openshift#223](openshift/machine-api-operator#223)
  * [Full changelog](openshift/machine-api-operator@bf95d7d...34c3424)

  ### [machine-config-controller, machine-config-daemon, machine-config-operator, machine-config-server, setup-etcd-environment](https://github.com/openshift/machine-config-operator)

  * daemon: Only print status if os == RHCOS [openshift#495](openshift/machine-config-operator#495)
  * Add pod image to image-references [openshift#500](openshift/machine-config-operator#500)
  * pkg/daemon: stash the node object [openshift#464](openshift/machine-config-operator#464)
  * Eliminate use of cpu limits [openshift#503](openshift/machine-config-operator#503)
  * MCD: add ign validation check for mc.ignconfig [openshift#481](openshift/machine-config-operator#481)
  * [Full changelog](openshift/machine-config-operator@875f25e...f0b87fc)

  ### [operator-lifecycle-manager](https://github.com/operator-framework/operator-lifecycle-manager)

  * fix(owners): remove cross-namespace and cluster->namespace ownerrefs [openshift#729](operator-framework/operator-lifecycle-manager#729)
  * [Full changelog](operator-framework/operator-lifecycle-manager@1ac9ace...9186781)

  ### [operator-marketplace](https://github.com/operator-framework/operator-marketplace)

  * [opsrc] Do not delete csc during purge [openshift#117](operator-framework/operator-marketplace#117)
  * Remove Dependency on Owner References [openshift#118](operator-framework/operator-marketplace#118)
  * [Full changelog](operator-framework/operator-marketplace@7b53305...fedd694)

[1]: openshift/origin#22030
@wking (Member, Author) commented Mar 5, 2019

> Do we need to modify the Go fetch of availability zones as well to only return the zones that have a state of "available"?

Like this?

@staebler (Contributor) commented Mar 5, 2019

> Do we need to modify the Go fetch of availability zones as well to only return the zones that have a state of "available"?
>
> Like this?

Yeah, just like that. It boggles my mind that I made that comment. I can't understand how I could have missed that, particularly given that it is the majority of the changes in the PR.

@@ -3,7 +3,9 @@
 data "aws_region" "current" {}
 
 // Fetch a list of available AZs
-data "aws_availability_zones" "azs" {}
+data "aws_availability_zones" "azs" {
Contributor

Maybe I'm wrong, but with #1121 the available-zone check in Go will make sure that the master AZs chosen were available, and the az_to_<public/private>subnet maps will make sure we only create machines in AZs chosen by the installer?

Member Author

> Maybe I'm wrong, but with #1121 the available-zone check in Go...

Yup, the Terraform part of this is code that is hopefully going away soon. But we want the Go part of this long-term, and the Terraform code is still there now, so I think we might as well fix it too.

Contributor

Well, technically Terraform should use the AZs given by the installer, i.e. if the AZ goes out of service between asset generation and the Terraform call, it would be better to fail with an error that the AZ is out of service rather than that the AZ was not found in the az_to_<public/private>subnet maps.

Member Author

> Well, technically Terraform should use the AZs given by the installer...

Agreed.

> ... i.e. if the AZ goes out of service between asset generation and the Terraform call...

I think using Go-provided zones means we can drop this data "aws_availability_zones" section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.
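
As a sketch of what that hand-off might look like on the Go side (hedged; the `aws_availability_zones` tfvars key is hypothetical, not the installer's real variable name), the already-validated zone list would simply be serialized into the tfvars JSON that Terraform consumes:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zonesTFVars serializes a Go-validated zone list into a tfvars-style JSON
// document so Terraform can consume it without re-querying AWS.
// The "aws_availability_zones" key is illustrative only.
func zonesTFVars(zones []string) ([]byte, error) {
	return json.MarshalIndent(map[string]interface{}{
		"aws_availability_zones": zones,
	}, "", "  ")
}

func main() {
	out, err := zonesTFVars([]string{"us-east-1a", "us-east-1b"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```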

Contributor

> I think using Go-provided zones means we can drop this data "aws_availability_zones" section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.

I can get behind that change. :)

Member Author

> I can get behind that change. :)

Did you want me to pick that up in this PR? Or submit it as a follow-up once this lands?

Contributor

> I think using Go-provided zones means we can drop this data "aws_availability_zones" section altogether. Guarding against a zone going down is always going to be racy up through actually creating resources in the zone, so I'd rather save a few lines and the mental overhead and just have Terraform assume Go has already done any required validation and blindly consume what it was given.
>
> I can get behind that change. :)

But that would need some changes, especially https://jira.coreos.com/browse/CORS-902, which would push this PR to 4.1.

Contributor

> I can get behind that change. :)
>
> Did you want me to pick that up in this PR? Or submit it as a follow-up once this lands?

Follow-up is fine.

@abhinavdahiya (Contributor)

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 6, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d8e7559 into openshift:master Mar 6, 2019
@wking wking deleted the aws-available-zones branch March 6, 2019 02:06
wking added a commit to wking/openshift-release that referenced this pull request Apr 8, 2021
This was originally part of avoiding broken zones, see e8921c3
(ci-operator/templates/openshift: Get e2e-aws out of us-east-1b,
2019-03-22, openshift#3204) and b717933
(ci-operator/templates/openshift/installer/cluster-launch-installer-*:
Random AWS regions for IPI, 2020-01-23, openshift#6833).  But the installer has
had broken-zone avoidance since way back in
openshift/installer@71aef620b6 (pkg/asset/machines/aws: Only return
available zones, 2019-02-07, openshift/installer#1210).  I dunno how
reliably AWS sets 'state: impaired' and similar; it didn't seem to
protect us from e8921c3.  But we're getting ready to pivot to using
multiple AWS accounts, which creates two issues with hard-coding
region names in the step:

1. References by name are not stable between accounts.  From the AWS
   docs [1]:

     To ensure that resources are distributed across the Availability
     Zones for a Region, we independently map Availability Zones to
     names for each AWS account. For example, the Availability Zone
     us-east-1a for your AWS account might not be the same location as
     us-east-1a for another AWS account.

   So "aah, us-east-1a is broken, let's use b and c instead" might
   apply to one account but not the other.  And the installer does not
   currently accept zone IDs.

2. References by name may not exist in other accounts.  From the AWS
   docs [1]:

     As Availability Zones grow over time, our ability to expand them
     can become constrained. If this happens, we might restrict you
     from launching an instance in a constrained Availability Zone
     unless you already have an instance in that Availability
     Zone. Eventually, we might also remove the constrained
     Availability Zone from the list of Availability Zones for new
     accounts. Therefore, your account might have a different number
     of available Availability Zones in a Region than another account.

   And it turns out that for some reason they sometimes don't name
   sequentially, e.g. our new account lacks us-west-1a:

     $ AWS_PROFILE=ci aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
     us-west-1a usw1-az3 available
     us-west-1b usw1-az1 available
     $ AWS_PROFILE=ci-2 aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
     us-west-1b usw1-az3 available
     us-west-1c usw1-az1 available

   I have no idea why they decided to do that, but we have to work
   with the world as it is ;).

Removing the us-east-1 overrides helps reduce our exposure, although
we are still vulnerable to (2) with the a/b default line.  We'll do
something about that in follow-up work.

Leaving the "which zones?" decision up to the installer would cause it
to try to set up each available zone, and that causes more API
contention and resource consumption than we want.  Background on that
in 51c4a37 (ci-operator/templates/openshift: Explicitly set AWS
availability zones, 2019-03-28, openshift#3285) and d87fffb
(ci-operator/templates/openshift: Drop us-east-1c, 2019-04-26, openshift#3615),
as well as the rejected/rotted-out [2].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
[2]: openshift/installer#1487