
fix: New LKE clusters get stuck provisioning when HA price is undefined #9558

Merged
jdamore-linode merged 4 commits into linode:develop from jdamore-linode:fix-lke-create-no-ha-available on Sep 5, 2023

@jdamore-linode
Contributor

Description 📝

This fixes an issue where creating an LKE cluster in an environment where REACT_APP_LKE_HIGH_AVAILABILITY_PRICE is undefined produces a cluster whose nodes are stuck in the "provisioning" state. The only way to get them unstuck is to enable high availability, which is irreversible. The fix is to pass false for control_plane.high_availability whenever the environment variable is undefined.

I don't think this is strictly a Cloud Manager issue: the API does not respond with a 400, and the API docs state that control_plane.high_availability defaults to false. Regardless, explicitly passing false produces clusters that provision successfully, whereas omitting the field appears to produce clusters that get stuck.
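To make the omitted-versus-explicit difference concrete, here is a minimal TypeScript sketch of a defensive normalizer that always writes the field before a request is sent. `CreateClusterPayload` and `withExplicitHighAvailability` are hypothetical names, the payload is trimmed to the fields relevant here, and the label, region, and version values are placeholders; this is not the Cloud Manager or API client code.

```typescript
// Hypothetical, trimmed-down request shape: only the fields relevant to this fix.
interface ControlPlaneOptions {
  high_availability?: boolean;
}

interface CreateClusterPayload {
  label: string;
  region: string;
  k8s_version: string;
  control_plane?: ControlPlaneOptions;
}

// Returns a copy of the payload in which control_plane.high_availability is
// always present, filling in the documented default (false) when it is absent.
function withExplicitHighAvailability(
  payload: CreateClusterPayload
): CreateClusterPayload {
  return {
    ...payload,
    control_plane: {
      ...payload.control_plane,
      high_availability: payload.control_plane?.high_availability ?? false,
    },
  };
}

// Field omitted: the request shape that appeared to leave nodes provisioning.
const omitted: CreateClusterPayload = {
  label: "example-cluster", // placeholder values
  region: "us-east",
  k8s_version: "1.27",
};

console.log(JSON.stringify(withExplicitHighAvailability(omitted)));
// → {"label":"example-cluster","region":"us-east","k8s_version":"1.27","control_plane":{"high_availability":false}}
```

Filling in the default on the client costs nothing if the API already treats an absent value as false, and it avoids depending on server-side defaulting that, in this case, appeared not to behave as documented.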

Major Changes 🔄

  • Pass false to control_plane.high_availability explicitly when the HA control plane prompt is not present
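A minimal sketch of the decision described above, assuming the HA price comes from REACT_APP_LKE_HIGH_AVAILABILITY_PRICE and the prompt's state is a plain boolean. `getHighAvailabilityFlag`, its parameters, and `examplePrice` are hypothetical names for illustration, not the code merged in this PR.

```typescript
// Hypothetical helper: decides what to send for control_plane.high_availability.
// haPrice mirrors REACT_APP_LKE_HIGH_AVAILABILITY_PRICE (undefined when unset).
// haSelected is the user's choice in the HA prompt, which is only rendered
// when a price is configured.
function getHighAvailabilityFlag(
  haPrice: string | undefined,
  haSelected: boolean
): boolean {
  // No price configured: the HA prompt is hidden, so send an explicit false
  // instead of leaving the field out of the request. Treating an empty string
  // like undefined is an assumption of this sketch.
  if (haPrice === undefined || haPrice === "") {
    return false;
  }
  // Price configured: respect the user's choice in the prompt.
  return haSelected;
}

// Illustrative value only, not the real HA price.
const examplePrice = "60.00";

console.log(getHighAvailabilityFlag(undefined, true)); // false
console.log(getHighAvailabilityFlag(examplePrice, true)); // true
console.log(getHighAvailabilityFlag(examplePrice, false)); // false
```

The three calls line up with the scenarios in the test plan: no price configured, then a price configured with HA enabled and with HA disabled.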

How to test 🧪

To Reproduce the Issue

  1. Check out develop, remove the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable from your .env file if necessary, and build Cloud Manager
  2. Create an LKE cluster, observe that you're redirected to the cluster's details page
  3. Wait ~20 minutes and confirm that the cluster's nodes are still in the "provisioning" state

To Verify These Changes

  1. Check out this branch, remove the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable from your .env file if necessary, and build Cloud Manager
  2. Create an LKE cluster, observe that you're redirected to the cluster's details page
  3. Confirm that the cluster finishes provisioning after a few minutes
  4. Re-add the REACT_APP_LKE_HIGH_AVAILABILITY_PRICE environment variable, create an LKE cluster with HA enabled and an LKE cluster with HA disabled, and confirm that both clusters finish provisioning after a few minutes
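Step 3 of each procedure is a wait-and-check loop. If you want to script it, a minimal TypeScript sketch could poll node statuses until every node is ready or a timeout elapses. `fetchStatuses` is a hypothetical injected function (for example, one wrapping a request for the cluster's node pools), and the `"ready"`/`"not_ready"` status values are an assumption about how node state is reported; check the API reference before relying on them.

```typescript
// Assumed node status values; verify against the API's node pool documentation.
type NodeStatus = "ready" | "not_ready";

// Hypothetical fetcher: returns the status of every node in the cluster.
type FetchNodeStatuses = () => Promise<NodeStatus[]>;

interface PollOptions {
  intervalMs: number; // delay between polls
  timeoutMs: number; // give up and report "stuck" after this long
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves true once every node reports "ready", or false if the timeout
// elapses first (e.g. nodes that stay in "provisioning").
async function waitForClusterReady(
  fetchStatuses: FetchNodeStatuses,
  { intervalMs, timeoutMs }: PollOptions
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const statuses = await fetchStatuses();
    // An empty list means no nodes have been reported yet, so keep waiting.
    if (statuses.length > 0 && statuses.every((s) => s === "ready")) {
      return true;
    }
    if (Date.now() >= deadline) {
      return false;
    }
    await sleep(intervalMs);
  }
}
```

With a real fetcher, a timeout well above the provisioning times reported in this thread (a few minutes, or about 12 minutes for an HA cluster) would separate a slow cluster from a stuck one.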

@jdamore-linode
Contributor Author

I don't think we need to worry about fitting this into the 1.100 release since this issue does not impact cloud.linode.com.

@bnussman-akamai added the "Add'tl Approval Needed" label and removed the "Ready for Review" label on Aug 17, 2023
@TylerWJ
Contributor

TylerWJ commented Aug 22, 2023

It took about 12 minutes to create an HA cluster with 3 shared 2 GB instances

@bnussman-akamai added the "Approved" label and removed the "Add'tl Approval Needed" label on Aug 22, 2023
@jdamore-linode merged commit 76fc65e into linode:develop on Sep 5, 2023
abailly-akamai pushed a commit that referenced this pull request Sep 7, 2023
…ed (#9558)

* Pass `false` for `control_plane.high_availability` when creating LKE cluster when no HA price is defined

* Added changeset: Fix stuck LKE node pools when HA Control Plane is unavailable

---------

Co-authored-by: mjac0bs <mjacobs@akamai.com>

3 participants