Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider all possible cluster states before passing them to StateChangeConf #1114

Conversation

furkatgofurov7
Copy link
Contributor

@furkatgofurov7 furkatgofurov7 commented May 4, 2023

Issue:

#1003
rancher/eks-operator#84

Problem

When importing an EKS cluster using the terraform provider, sometimes rancher cluster reconciliation loop can be fast and go ahead with cluster creation (going from "pending" to "active" too fast) and the provider can miss it, resulting on provider waiting for the cluster to be in the "pending" state even though it went pass that state long before and provider simply missed that

Solution

Instead of predefining the expectedState beforehand and overwriting it later if need to be, define it as slice literal with keeping the same state ("active") and append the new states later if need to be (i.e. if cluster.Driver == clusterDriverImported || (cluster.Driver == clusterDriverEKSV2 && cluster.EKSConfig.Imported) == True) and pass it to
StateChangeConf struct. Anyways, StateChangeConf struct expects Target to be []string so nothing is changed behaviour wise but just all possible cluster states are considered before passing them.

Copy link
Contributor

@mjura mjura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@furkatgofurov7 furkatgofurov7 self-assigned this May 16, 2023
Copy link
Contributor

@a-blender a-blender left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@furkatgofurov7 I see that you were unable to repro this bug after multiple attempts #1003 (comment) and the original author of the issue saw a 30 sec timeout in terraform apply but the cluster still became active in about a minute. This was with using the latest terraform v3.0.0.

  1. Is it still an issue for them?

  2. What evidence is there that not predefining the expectedState in the provider fixes this particular issue? I'm fine to have it go in as a noted enhancement but I don't see how this explicitly resolves Intermittently imports of EKS clusters never finish #1003.

@furkatgofurov7
Copy link
Contributor Author

furkatgofurov7 commented May 19, 2023

  1. Is it still an issue for them?

Hey @a-blender thanks for review, as per #1003 (comment) it should be still an issue

  1. What evidence is there that not predefining the expectedState in the provider fixes this particular issue? I'm fine to have it go in as a noted enhancement but I don't see how this explicitly resolves Intermittently imports of EKS clusters never finish #1003.

Based on the logs from the linked issue description:

Error: [ERROR] waiting for cluster (c-xfbkg) to be created: timeout while waiting for state to become 'pending' (last state: 'active', timeout: 30m0s)

and this comment, considering all possible cluster states should not hurt and in case of provider missing to catch the state of the rancher it will still be able to move on.
Also, I forgot to mention that, the piece of code where expectedState is predefined beforehand appears only in this case, in all other cases, i.e clusterUpdate the target state is a slice with bunch of different states, so from that perspective this should align also with that.

Since the reproduction was not possible, maybe we should take this in as an improvement to the provider, WDYT?

@mbologna mbologna requested a review from a-blender May 23, 2023 10:25
@furkatgofurov7
Copy link
Contributor Author

@a-blender can we merge this, based on #1003 (comment) it helps to fix the issue in #1003 ?

@furkatgofurov7 furkatgofurov7 merged commit 20adef8 into rancher:master Jul 11, 2023
1 check passed
@furkatgofurov7 furkatgofurov7 deleted the fix-cluster-create-target-state-logic branch July 11, 2023 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants