Race condition in WaitForGuestNet? #1127

Open
Itxaka opened this issue Jul 9, 2020 · 0 comments
Labels
acknowledged Status: Issue or Pull Request Acknowledged area/guest Area: Guest Operating System bug Type: Bug community/contribution Community: Contribution Help Wanted size/s Relative Sizing: Small
Itxaka commented Jul 9, 2020

Terraform Version

0.12.28

vSphere Provider Version

v1.18.3

Affected Resource(s)

  • vsphere_virtual_machine

Terraform Configuration Files

N/A

Debug Output

This debug output was obtained by running a forked version with extra debug logging added.
The fork is available here; the only changes are the added log output: https://github.com/Itxaka/terraform-provider-vsphere/blob/v1.18.3.debug/vsphere/internal/helper/virtualmachine/virtual_machine_helper.go#L275

Output from a failed run below:
https://gist.github.com/Itxaka/007124e6f6be589a7279ed39cd8eb0dd

Panic Output

None

Expected Behavior

WaitForGuestNet behaves correctly when the gateway info arrives later than the IP address.

Actual Behavior

If the IP info arrives before the gateway info, the routability check never completes because of how the code is written.

ArrayOfGuestNicInfo provides the IP info, and the code tries to compare it against the gateway using the IP's mask, but both gateways (IPv4 and IPv6) are still empty because that info has not arrived yet. That stalls the whole WaitForGuestNet process: the check only fires when a property changes, and ArrayOfGuestNicInfo will not change again. After a single failed mask comparison, it just stays in limbo.

Let's go step by step.

  • 1 - WaitForGuestNet runs and launches client.PropertyCollector().wait()[0] to watch for changes to the VM's guest.net and guest.ipStack properties, so it can detect IP and route changes
  • 2 - Each detected change goes into a switch statement[1] and is handled differently depending on whether it is ArrayOfGuestStackInfo or ArrayOfGuestNicInfo
  • 3A - For ArrayOfGuestStackInfo[2], we read IpRouteConfig.IpRoute.Network looking for either 0.0.0.0 or ::, and if it is found, we parse IpRouteConfig.IpRoute.Gateway.IpAddress to obtain the gateway address (IPv4/IPv6)[3][4]
  • 3B - For ArrayOfGuestNicInfo[5], we read IpConfig.IpAddress to obtain the IP; once we have it, we get the mask for that IP[6] and check that the IP and the gateway fall in the same network under that mask[7]
  • 4 - If they match, we return true, meaning the IP is present AND routable; otherwise we keep waiting until it matches or we time out (5 minutes)
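Steps 3A to 4 can be sketched in standalone Go as below. This is not the provider's code: `route`, `defaultGateways` and `sameSubnet` are hypothetical helpers, routes are flattened to plain strings instead of the vSphere types, and the /20 prefix is an assumption for illustration (the issue does not show the real prefix length). The point it shows is that while the gateway is still empty, the mask comparison can only fail.

```go
package main

import (
	"fmt"
	"net"
)

// route is a hypothetical, flattened view of one IpRouteConfig.IpRoute entry.
type route struct {
	network string // destination network, e.g. "0.0.0.0" or "::"
	gateway string // Gateway.IpAddress for that route
}

// defaultGateways sketches step 3A: take the gateway of the IPv4 and IPv6
// default routes. Both results stay "" until such routes are reported.
func defaultGateways(routes []route) (v4, v6 string) {
	for _, r := range routes {
		switch r.network {
		case "0.0.0.0":
			v4 = r.gateway
		case "::":
			v6 = r.gateway
		}
	}
	return v4, v6
}

// sameSubnet sketches steps 3B and 4: apply the IP's prefix length to both
// the IP and the gateway and compare the resulting network addresses.
func sameSubnet(ip, gateway string, prefixLen int) bool {
	addr := net.ParseIP(ip)
	gw := net.ParseIP(gateway) // nil while the gateway is still ""
	if addr == nil || gw == nil {
		return false
	}
	bits := 128
	if addr.To4() != nil {
		bits = 32
	}
	mask := net.CIDRMask(prefixLen, bits)
	return addr.Mask(mask).Equal(gw.Mask(mask))
}

func main() {
	// No routes reported yet: v4 is "", so the IP is not considered routable.
	v4, _ := defaultGateways(nil)
	fmt.Println(sameSubnet("10.164.93.152", v4, 20)) // false

	// After the default route shows up, the same check passes.
	v4, _ = defaultGateways([]route{{network: "0.0.0.0", gateway: "10.164.80.1"}})
	fmt.Println(sameSubnet("10.164.93.152", v4, 20)) // true
}
```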

Now, according to the logs above, the problem is that step 3A does not obtain a gateway at first, so v4gw and v6gw are empty. Step 3B then runs and fails because those values are empty, but it never retries... because why would it?
client.PropertyCollector().wait() only fires when there is a change, and a given path only reruns if the type of that change matches. The values that would trigger the mask check again no longer change, so that path is never executed again, leading to the timeout.
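To make the ordering problem concrete, here is a small simulation, assuming property updates can be modelled as a plain slice of hypothetical `change` values rather than real PropertyCollector results. `waitCheckOnNIC` is a hypothetical stand-in that, like the flow described above, only runs the mask check when a NIC change arrives; the /20 prefix is again only for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// change is a hypothetical stand-in for one update delivered by the property
// collector: either NIC info (IP and prefix) or IP stack info (gateway).
type change struct {
	kind      string // "nic" or "stack"
	ip        string
	prefixLen int
	gateway   string
}

// sameSubnet compares the network addresses of ip and gateway under prefixLen.
func sameSubnet(ip, gateway string, prefixLen int) bool {
	addr, gw := net.ParseIP(ip), net.ParseIP(gateway)
	if addr == nil || gw == nil {
		return false
	}
	bits := 128
	if addr.To4() != nil {
		bits = 32
	}
	mask := net.CIDRMask(prefixLen, bits)
	return addr.Mask(mask).Equal(gw.Mask(mask))
}

// waitCheckOnNIC mirrors the shape described above: the routability check
// only runs in the "nic" branch, so a gateway that arrives afterwards is
// stored but never compared against the IP.
func waitCheckOnNIC(changes []change) bool {
	var gw string
	for _, c := range changes {
		switch c.kind {
		case "stack":
			gw = c.gateway
		case "nic":
			if sameSubnet(c.ip, gw, c.prefixLen) {
				return true
			}
		}
	}
	// Out of changes: in the provider, this is where the wait would sit
	// until the timeout fires.
	return false
}

func main() {
	nic := change{kind: "nic", ip: "10.164.93.152", prefixLen: 20}
	stack := change{kind: "stack", gateway: "10.164.80.1"}

	fmt.Println(waitCheckOnNIC([]change{stack, nic})) // true: gateway first
	fmt.Println(waitCheckOnNIC([]change{nic, stack})) // false: IP first, as in the log
}
```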

This is perfectly reflected in the log:

ArrayOfGuestNicInfo is processed first, while the IPv4/IPv6 gateways are still empty:

2020-07-08T16:24:47.300+0200 [DEBUG] plugin.terraform-provider-vsphere_v1.18.3_x4: 2020/07/08 16:24:47 [DEBUG]["itxaka-master-14"] IP "10.164.93.152" checked against gateways: ipv4 -> "" and ipv6 -> ""

ArrayOfGuestStackInfo is processed afterwards and fills in the IPv4 gateway, but the mask check has already run once and will not run again:

2020-07-08T16:25:17.058+0200 [DEBUG] plugin.terraform-provider-vsphere_v1.18.3_x4: 2020/07/08 16:25:17 [DEBUG]["itxaka-master-14"] Got ipv4 gateway: "10.164.80.1"

[0] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L292
[1] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L298
[2] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L312
[3] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L305
[4] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L307
[5] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L312
[6] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L328
[7] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L332

I'm guessing the mask check should be moved out of this wait into a different wait that... waits for either of the gateway vars AND the IP values to be filled. Otherwise, checking values without knowing whether they are ever going to be filled is just asking for trouble :D
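One way to get the behavior suggested here, sketched with the same kind of hypothetical `change` type: instead of a second wait, remember the latest IP and gateway and re-run the check after every change, no matter which property triggered it. This is a variation on the suggestion above, not a tested patch for the provider; `waitRecheckAlways`, the single tracked gateway (IPv6 omitted for brevity) and the /20 prefix are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// change is a hypothetical stand-in for one property-collector update.
type change struct {
	kind      string // "nic" or "stack"
	ip        string
	prefixLen int
	gateway   string
}

// sameSubnet compares the network addresses of ip and gateway under prefixLen.
func sameSubnet(ip, gateway string, prefixLen int) bool {
	addr, gw := net.ParseIP(ip), net.ParseIP(gateway)
	if addr == nil || gw == nil {
		return false
	}
	bits := 128
	if addr.To4() != nil {
		bits = 32
	}
	mask := net.CIDRMask(prefixLen, bits)
	return addr.Mask(mask).Equal(gw.Mask(mask))
}

// waitRecheckAlways keeps the latest IP and gateway across updates and
// re-runs the routability check after every change, whichever property
// produced it, so the result no longer depends on arrival order.
func waitRecheckAlways(changes []change) bool {
	var ip, gw string
	var prefixLen int
	for _, c := range changes {
		switch c.kind {
		case "stack":
			gw = c.gateway
		case "nic":
			ip, prefixLen = c.ip, c.prefixLen
		}
		// Only decide once both halves are known.
		if ip != "" && gw != "" && sameSubnet(ip, gw, prefixLen) {
			return true
		}
	}
	return false
}

func main() {
	nic := change{kind: "nic", ip: "10.164.93.152", prefixLen: 20}
	stack := change{kind: "stack", gateway: "10.164.80.1"}

	fmt.Println(waitRecheckAlways([]change{stack, nic})) // true
	fmt.Println(waitRecheckAlways([]change{nic, stack})) // true
}
```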

Steps to Reproduce

We can usually trigger this when our vCenter is launching several instances at the same time. Basically, it happens whenever the gateway info arrives later than the IP info.

Important Factoids

Nope

References

  • #0000

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@Itxaka Itxaka added the bug Type: Bug label Jul 9, 2020
@bill-rich bill-rich added acknowledged Status: Issue or Pull Request Acknowledged size/s Relative Sizing: Small community/contribution Community: Contribution Help Wanted labels Jul 23, 2020
@tenthirtyam tenthirtyam added the area/guest Area: Guest Operating System label Feb 22, 2022
@tenthirtyam tenthirtyam added this to the Backlog milestone Mar 21, 2022