Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to using ec2.DescribeInstanceTypes for building the MachineTypes list #8847

Merged
merged 1 commit into from
Apr 6, 2020

Conversation

rifelpet
Copy link
Member

@rifelpet rifelpet commented Apr 4, 2020

ec2.DescribeInstanceTypes provides us with all of the information needed to build the machine types list. I consider this an intermediate PR: we can switch to this in the hack script and validate its output, then later switch to using the call in upup/pkg/fi/cloudup/awsup/machine_types.go directly, removing the hardcoded list altogether which would decouple instance type support from kops releases.

Marking WIP until I get a chance to review the differences in upup/pkg/fi/cloudup/awsup/machine_types.go since ideally there would be none.

In the case of the missing instance types (cr1.8xlarge, hs1.8xlarge, m6g.*), I'm guessing the AWS account I ran this on doesn't have those instance types available: the cr1 and hs1 might have been deprecated before my account was created and I know my account hasnt opted in to the m6g preview yet. We can figure out a short term workaround for those given that once the followup PR would land, the account running ec2.DescribeInstanceTypes will always be the same account the instance lives in, so I'd be confident that the instance type in use will be in the API response.

I still need to review the Memory, Disk, and ENI differences and determine which values are correct.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 4, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 4, 2020
@hakman
Copy link
Member

hakman commented Apr 4, 2020

This is super cool :).

InstanceENIs: 15,
InstanceIPsPerENI: 50,
InstanceENIs: 4,
InstanceIPsPerENI: 15,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -949,14 +912,14 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
MemoryGB: 192,
Cores: 96,
InstanceENIs: 15,
InstanceIPsPerENI: 50,
InstanceIPsPerENI: 30,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -1170,7 +1133,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
MemoryGB: 256,
Cores: 64,
InstanceENIs: 15,
InstanceIPsPerENI: 30,
InstanceIPsPerENI: 50,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -1252,7 +1215,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
MemoryGB: 256,
Cores: 64,
InstanceENIs: 15,
InstanceIPsPerENI: 30,
InstanceIPsPerENI: 50,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -1380,7 +1343,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
MemoryGB: 256,
Cores: 64,
InstanceENIs: 15,
InstanceIPsPerENI: 30,
InstanceIPsPerENI: 50,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -2394,7 +2284,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
Name: "t3a.small",
MemoryGB: 2,
Cores: 2,
InstanceENIs: 3,
InstanceENIs: 2,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct change based on the table here

@@ -837,7 +800,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
{
Name: "i3.metal",
MemoryGB: 512,
Cores: 64,
Cores: 72,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table here states "i3.metal provides 72 logical processors on 36 physical cores". So this is correct.

@@ -1697,7 +1587,7 @@ var MachineTypes []AWSMachineTypeInfo = []AWSMachineTypeInfo{
// r3 family
{
Name: "r3.large",
MemoryGB: 15.25,
MemoryGB: 15,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table here gives the 15.25 number.

@rifelpet
Copy link
Member Author

rifelpet commented Apr 5, 2020

Regarding some of these differences, I wouldn't be surprised if AWS is inconsistent in their many sources of this information.

I'm tempted to round the MemoryGB fields to 2 decimal places. That would reduce the differences to just the r3.large and t1.micro. Though grep reveals to me that MemoryGB doesnt seem to be used anywhere. I suspect it could eventually be used to provide default values for kubelet's reserved resources fields but thats not the case today.

With that, I'm not concerned about changes to the MemoryGB fields. That leaves these discrepancies:

  • Missing instance types: I'm tempted to hardcode the missing instance types into the hack script so that they'll be included by default, still with 0 values for ENI and IP information.
  • EphemeralDisks changes: I'm still searching but I don't believe the size is actually used anywhere, only the number of disks is used for device mapping.

If someone could confirm that MemoryGB and EphemeralDisks sizes aren't actually used, then I think this is about ready to merge. We could consider removing the fields (replacing EphemeralDisks with a single count integer rather than a slice of sizes) as well.

@johngmyers
Copy link
Member

Code crossreference shows the fields as being unreferenced, except that hack/machine_types uses MemoryGB to sort the instance types. Unless something is referenced through reflection.

I was thinking of suggesting the rounding.

@johngmyers
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2020
@rifelpet rifelpet changed the title [WIP] Switch to using ec2.DescribeInstanceTypes for building the MachineTypes list Switch to using ec2.DescribeInstanceTypes for building the MachineTypes list Apr 6, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2020
@justinsb
Copy link
Member

justinsb commented Apr 6, 2020

Yes, I'm reasonably sure we aren't using Memory or EphemeralDisk sizes 👍

This is great - thank you!

@k8s-ci-robot k8s-ci-robot merged commit a06cebc into kubernetes:master Apr 6, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Apr 6, 2020
@rifelpet rifelpet deleted the machine-types branch August 6, 2020 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants