
pkg/asset/machines/worker: Default to MachineSets only where we need them #1487

Closed
wants to merge 1 commit into master from fewer-default-MachineSets

Conversation

@wking (Member) commented Mar 28, 2019

Builds on #1481; I'll rebase after that lands.

And not for additional zones. This will keep us from consuming more NAT gateways and associated EIPs than we need, once subsequent work ties those resources to Machine(Set) consumption. This means that pools where replicas is unset or zero will receive no MachineSets (unless the user explicitly configured zones for the pool), but creating MachineSets later on is something we want to be easy anyway, so I don't see a need to require the installer to inject a template MachineSet into the cluster.
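
For context on "creating MachineSets later on": a day-2 MachineSet for an extra zone is mostly a copy-and-edit of one the installer already created. A minimal sketch, with illustrative resource names that are not from this PR:

    # List the MachineSets the installer created (one per zone it defaulted to).
    oc get machinesets -n openshift-machine-api

    # Copy an existing worker MachineSet, then edit the name, the availability
    # zone, and the subnet reference before applying the new manifest.
    oc get machineset INFRA_ID-worker-us-east-1a -n openshift-machine-api -o yaml > worker-us-east-1e.yaml
    oc apply -f worker-us-east-1e.yaml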

The "In each private subnet" -> "For each private subnet" change is because the NAT gateways currently live in the public subnet, but their purpose is to handle egress from machines in the private subnets.

Spun off from #1481, where this change was contentious. The difference is mostly for folks who run openshift-install create cluster without providing an install-config.yaml. With this approach, that would work in us-east-1, although the users will have to add additional zone infra later if they decide they actually do want something in, say, us-east-1e. With the approach taken in #1481, those users will need to get their EIP limits bumped (or change to using an install-config.yaml) before the install will go through, but they won't have to worry about adding more zone infra on day-2. Personally, I'd rather prioritize out-of-the-box success, and leave it to folks who for some reason need many zones to explicitly configure them (or set them up as day-2 operations).
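
For folks who do want extra zones at install time, the pool's zones can also be set explicitly in an install-config.yaml before running the installer. A rough sketch of that flow; the field path and the extra zone name here are my reading of the install-config schema rather than anything from this PR:

    # Generate an install-config.yaml, pin the worker pool's zones, then install.
    openshift-install create install-config --dir mycluster
    # edit mycluster/install-config.yaml, e.g.:
    #   compute[0].platform.aws.zones: [us-east-1a, us-east-1b, us-east-1e]
    openshift-install create cluster --dir mycluster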

@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Mar 28, 2019
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 28, 2019
@wking force-pushed the fewer-default-MachineSets branch from d21aa5c to f71ba29 on March 28, 2019 at 16:55
@wking force-pushed the fewer-default-MachineSets branch from f71ba29 to a9c7453 on March 28, 2019 at 19:11
@openshift-ci-robot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) and removed the size/L label on Mar 28, 2019
@wking (Member, Author) commented Mar 28, 2019

Rebased onto master with f71ba29 -> a9c7453, now that #1481 has landed.

@wking (Member, Author) commented Mar 28, 2019

The images job hit a test-platform flake:

release "release-latest" failed: pod release-latest was already deleted

/retest

@smarterclayton (Contributor)

What risk exists in this?

@abhinavdahiya (Contributor)

What risk exists in this?

/hold

This restricts us to deploying/setting up AZs based on the machine count chosen at install time, even though users didn't know they were making that choice, and there is no way to expand the subnets to new AZs on day 2. That's why I'm against it.

Users who specifically choose fewer AZs already deploy only to those AZs, which I think is the better solution, and it's one we already have in place.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 1, 2019
@openshift-ci-robot (Contributor)

@wking: The following tests failed, say /retest to rerun them all:

Test name                Commit   Details  Rerun command
ci/prow/e2e-aws-upgrade  a9c7453  link     /test e2e-aws-upgrade
ci/prow/e2e-aws          a9c7453  link     /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@abhinavdahiya (Contributor)

closing due to inactivity. Please reopen if needed.

/close

@openshift-ci-robot (Contributor)

@abhinavdahiya: Closed this PR.

In response to this:

closing due to inactivity. Please reopen if needed.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking added a commit to wking/openshift-release that referenced this pull request Apr 8, 2021
This was originally part of avoiding broken zones, see e8921c3
(ci-operator/templates/openshift: Get e2e-aws out of us-east-1b,
2019-03-22, openshift#3204) and b717933
(ci-operator/templates/openshift/installer/cluster-launch-installer-*:
Random AWS regions for IPI, 2020-01-23, openshift#6833).  But the installer has
had broken-zone avoidance since way back in
openshift/installer@71aef620b6 (pkg/asset/machines/aws: Only return
available zones, 2019-02-07, openshift/installer#1210).  I dunno how
reliably AWS sets 'state: impaired' and similar; it didn't seem to
protect us from e8921c3.  But we're getting ready to pivot to using
multiple AWS accounts, which creates two issues with hard-coding
availability-zone names in the step:

1. References by name are not stable between accounts.  From the AWS
   docs [1]:

     To ensure that resources are distributed across the Availability
     Zones for a Region, we independently map Availability Zones to
     names for each AWS account. For example, the Availability Zone
     us-east-1a for your AWS account might not be the same location as
     us-east-1a for another AWS account.

   So "aah, us-east-1a is broken, let's use b and c instead" might
   apply to one account but not the other.  And the installer does not
   currently accept zone IDs (a zone-ID-to-name lookup sketch follows
   after this commit message).

2. References by name may not exist in other accounts.  From the AWS
   docs [1]:

     As Availability Zones grow over time, our ability to expand them
     can become constrained. If this happens, we might restrict you
     from launching an instance in a constrained Availability Zone
     unless you already have an instance in that Availability
     Zone. Eventually, we might also remove the constrained
     Availability Zone from the list of Availability Zones for new
     accounts. Therefore, your account might have a different number
     of available Availability Zones in a Region than another account.

   And it turns out that for some reason they sometimes don't name
   sequentially, e.g. our new account lacks us-west-1a:

     $ AWS_PROFILE=ci aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
     us-west-1a usw1-az3 available
     us-west-1b usw1-az1 available
     $ AWS_PROFILE=ci-2 aws --region us-west-1 ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | .ZoneName + " " + .ZoneId + " " + .State' | sort
     us-west-1b usw1-az3 available
     us-west-1c usw1-az1 available

   I have no idea why they decided to do that, but we have to work
   with the world as it is ;).

Removing the us-east-1 overrides helps reduce our exposure, although
we are still vulnerable to (2) with the a/b default line.  We'll do
something about that in follow-up work.

Leaving the "which zones?" decision up to the installer would cause it
to try to set up each available zone, and that causes more API
contention and resource consumption than we want.  Background on that
in 51c4a37 (ci-operator/templates/openshift: Explicitly set AWS
availability zones, 2019-03-28, openshift#3285) and d87fffb
(ci-operator/templates/openshift: Drop us-east-1c, 2019-04-26, openshift#3615),
as well as the rejected/rotted-out [2].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
[2]: openshift/installer#1487
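
Since item (1) above points out that the installer only accepts zone names, the describe-availability-zones zone-id filter can translate a stable zone ID into whatever name it carries in a given account. A minimal sketch, reusing the ci/ci-2 profiles and the usw1-az1 ID from the output above:

    # Look up what usw1-az1 is called in each account; it is the resulting
    # ZoneName, not the ID, that gets fed to the installer.
    AWS_PROFILE=ci aws --region us-west-1 ec2 describe-availability-zones \
      --filters Name=zone-id,Values=usw1-az1 \
      --query 'AvailabilityZones[0].ZoneName' --output text
    AWS_PROFILE=ci-2 aws --region us-west-1 ec2 describe-availability-zones \
      --filters Name=zone-id,Values=usw1-az1 \
      --query 'AvailabilityZones[0].ZoneName' --output text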