Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Managed Node Group launch template disk size fix #1229

Conversation

JustASquid
Copy link
Contributor

Proposed changes

Correctly handle disk size changes when a launch template is created by ManagedNodeGroup

Related issues (optional)

Resolves #1228

Copy link

github-actions bot commented Jul 3, 2024

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

Copy link
Contributor

@flostadler flostadler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR!
Could you please add a test for this? Let me know if you need help with that; https://github.com/pulumi/pulumi-eks/blob/master/examples/examples_nodejs_test.go#L415 is a good example of how to create an additional test.

nodejs/eks/nodegroup.ts Outdated Show resolved Hide resolved
Copy link

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

@JustASquid
Copy link
Contributor Author

Hey @flostadler, I had a crack at making a test (I haven't run it yet), does this look like what you'd be looking for?

Copy link
Contributor

@flostadler flostadler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your contribution! Mostly some small nits.
Once you've added the new tests to the test matrix I'll start an integration test run :)

examples/tests/managed-ng-disk-size/index.ts Outdated Show resolved Hide resolved
With(integration.ProgramTestOptions{
Dir: path.Join(getCwd(t), "tests", "managed-ng-disk-size"),
ExtraRuntimeValidation: func(t *testing.T, info integration.RuntimeValidationStackInfo) {
utils.RunEKSSmokeTest(t,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the issue before that the nodes where booted with the wrong disk size or that the creation of the managed node group failed?

If it's the former we'd need to add additional verification to ensure the disk size is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They booted with the wrong disk size, so yeah, would be good to check. Is there another test I can look at that does similar verification, so I can use that as a basis?

Copy link
Contributor

@flostadler flostadler Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could create a similar func like this one here that's checking if the node is ready:

func IsNodeReady(t *testing.T, clientset *kubernetes.Clientset, node *corev1.Node) (bool, *corev1.Node) {
// Attempt to retrieve Node.
o, err := clientset.CoreV1().Nodes().Get(context.TODO(), node.Name, metav1.GetOptions{})
if err != nil {
return false, nil
}
// Check the returned Node's conditions for readiness.
for _, condition := range o.Status.Conditions {
if condition.Type == corev1.NodeReady {
t.Logf("Checking if Node %q is Ready | Condition.Status: %q | Condition: %v\n", node.Name, condition.Status, condition)
return condition.Status == corev1.ConditionTrue, o
}
}
return false, o
}

It could do something like:

var desiredSizeGB int64 = 50
nodeEphemeralStorage := o.Status.Capacity[corev1.ResourceEphemeralStorage]
return nodeEphemeralStorage.CmpInt64(desiredSizeGB * 1_000_000_000) >= 0

Alternatively we could check the EC2 instances directly using AWS APIs, but I feel like the kube only approach is cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed a change to use the utils.ValidateNodes function. It's my first time using Go and I don't have a way to actually run the code so I'm sure it's missing something but hopefully it's a start?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll kick off the tests :)

examples/examples_nodejs_test.go Outdated Show resolved Hide resolved
examples/examples_nodejs_test.go Show resolved Hide resolved
@@ -1836,9 +1839,19 @@ Content-Type: text/x-shellscript; charset="us-ascii"
? { httpTokens: "required", httpPutResponseHopLimit: 2, httpEndpoint: "enabled" }
: undefined;

const blockDeviceMappings = [
{
deviceName: "/dev/xvda",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mental note for myself: This works only for AL2 and AL2023 based AMIs, which is fine right now because we only support AL2. Will need to be taken into account when working on #1195

Copy link

github-actions bot commented Aug 9, 2024

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

Copy link

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

@flostadler
Copy link
Contributor

/run-acceptance-tests

@pulumi-bot
Copy link
Contributor

Please view the PR build - https://github.com/pulumi/pulumi-eks/actions/runs/10366894620

Copy link

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

@flostadler
Copy link
Contributor

@JustASquid I had to merge in some changes from the master branch because they were required for the build to work. I'll re-run the tests now

@flostadler
Copy link
Contributor

/run-acceptance-tests

@pulumi-bot
Copy link
Contributor

Please view the PR build - https://github.com/pulumi/pulumi-eks/actions/runs/10368370295

@flostadler
Copy link
Contributor

@JustASquid this seems to have broken managed node groups:

      	* creating EKS Node Group (example-managed-nodegroups-eksCluster-b3afc1b:example-managed-ng0-f56bd8b): operation error EKS: CreateNodegroup, https response error StatusCode: 400, RequestID: 2f56b595-7e06-428e-ba3c-cc309d169e6a, InvalidRequestException: The request must contain the parameter ebs

https://github.com/pulumi/pulumi-eks/actions/runs/10368370295/job/28702007770#step:21:519

I can have a look later today to see what exactly is going wrong here

@flostadler
Copy link
Contributor

@JustASquid After debugging this, it seems that we need an additional check whether disksize is defined here: https://github.com/pulumi/pulumi-eks/pull/1229/files#diff-fa11b07a99c4eb8e85f987e265e143389230bafb7636b3b6bab10cf6158a120eR1847
If it's not defined we shouldn't pass blockDeviceMappings to the launch template.

Do you want to add this change or should I add it on top of this PR?

@JustASquid
Copy link
Contributor Author

@flostadler I just pushed a fix.

Copy link

PR is now waiting for a maintainer to run the acceptance tests. This PR will only perform build and linting.
Note for the maintainer: To run the acceptance tests, please comment /run-acceptance-tests on the PR

@flostadler
Copy link
Contributor

/run-acceptance-tests

@pulumi-bot
Copy link
Contributor

Please view the PR build - https://github.com/pulumi/pulumi-eks/actions/runs/10472336069

@flostadler
Copy link
Contributor

@JustASquid The existing tests are happy now, but I found that newly added tests aren't being executed for community PRs. I ran the MNG_DiskSize test locally and there were some minor issues in the test setup. I fixed those and created a new PR with the combined changes #1305. But don't worry, you're gonna get all the credit for this fix and excellent tests!

Thanks again for this contribution

@flostadler flostadler closed this Aug 20, 2024
flostadler added a commit that referenced this pull request Aug 21, 2024
This fixes the handling of disk size changes when a custom launch
template is created for the MNG.
It is based on the Community PR #1229 by @JustASquid, I just added some
test fixes on top.

Fixes #1228

---------

Co-authored-by: Daniel West <daniel.west.279@gmail.com>
flostadler added a commit that referenced this pull request Sep 4, 2024
This fixes the handling of disk size changes when a custom launch
template is created for the MNG.
It is based on the Community PR #1229 by @JustASquid, I just added some
test fixes on top.

Fixes #1228

---------

Co-authored-by: Daniel West <daniel.west.279@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot specify Managed Node Group DiskSize when a launch template is created
3 participants