Race condition in WaitForGuestNet? #1127
Labels
acknowledged, area/guest, bug, community/contribution, size/s
Terraform Version
0.12.28
vSphere Provider Version
v1.18.3
Affected Resource(s)
vsphere_virtual_machine
Terraform Configuration Files
N/A
Debug Output
This debug output was obtained by running a forked version with extra debug logging added.
The fork is available here; the only changes are the added log statements: https://github.com/Itxaka/terraform-provider-vsphere/blob/v1.18.3.debug/vsphere/internal/helper/virtualmachine/virtual_machine_helper.go#L275
Output from a failed run:
https://gist.github.com/Itxaka/007124e6f6be589a7279ed39cd8eb0dd
Panic Output
None
Expected Behavior
`WaitForGuestNet` behaves correctly when the gateway is obtained later than the IP address.
Actual Behavior
If the IP info arrives before the gateway info, the routable check never completes because of how the code is written. The `ArrayOfGuestNicInfo` handler obtains the IP info and tries to compare it against the gateway mask, but both gateways are `nil` because we haven't received that info yet. That stalls the whole `WaitForGuestNet` processing: it only fires when there is a change, and there won't be any more changes to `ArrayOfGuestNicInfo`. It just stays in limbo after one failed mask comparison.

Let's go step by step.
1. `WaitForGuestNet` runs and launches a `client.PropertyCollector().wait()` [0] to detect changes in the VM's `guest.net` and `guest.ipStack` properties, that is, IP/route changes.
2. Depending on whether the change is an `ArrayOfGuestStackInfo` or an `ArrayOfGuestNicInfo`, it is handled differently.
3A. For `ArrayOfGuestStackInfo` [2], we access the data at `IpRouteConfig.IpRoute.Network` to look for either `0.0.0.0` or `::`, and if it is found we parse `IpRouteConfig.IpRoute.Network.Gateway.IpAddress` to obtain the gateway address (IPv4/IPv6) [3][4].
3B. For `ArrayOfGuestNicInfo` [5], we access the data at `IpConfig.IpAddress` to obtain the IP, and once we have it we get the mask for that IP [6] and check that it matches the gateway mask [7].

Now the problem, according to the logs above, is that step 3A does not obtain a gateway at first, so the values of v4gw and v6gw are empty. Step 3B then runs and fails because those values are empty, and it is never retried... because why would it?
This `client.PropertyCollector().wait()` only fires when there is a change, and it reruns a given path only if the change type matches. The values that would trigger the check again no longer change, so that path is never executed again, leading to the timeout.

This is perfectly reflected in the log:

`ArrayOfGuestNicInfo` is handled first, but the IPv4/IPv6 gateways are empty:
2020-07-08T16:24:47.300+0200 [DEBUG] plugin.terraform-provider-vsphere_v1.18.3_x4: 2020/07/08 16:24:47 [DEBUG]["itxaka-master-14"] IP "10.164.93.152" checked against gateways: ipv4 -> "" and ipv6 -> ""

`ArrayOfGuestStackInfo` is handled afterwards and fills in the proper IPv4 gateway, but the mask check has already run once and will not run again:
2020-07-08T16:25:17.058+0200 [DEBUG] plugin.terraform-provider-vsphere_v1.18.3_x4: 2020/07/08 16:25:17 [DEBUG]["itxaka-master-14"] Got ipv4 gateway: "10.164.80.1"
[0] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L292
[1] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L298
[2] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L312
[3] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L305
[4] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L307
[5] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L312
[6] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L328
[7] hashicorp/terraform-provider-vsphere:vsphere/internal/helper/virtualmachine/virtual_machine_helper.go@v1.18.3#L332
I'm guessing that the mask check should be outside of the `wait`, in a different `wait` that... waits for either of the gateway vars to be filled AND the IP values. Otherwise, checking values without knowing whether they are ever going to be filled is just asking for trouble :D
Steps to Reproduce
We can usually trigger this when our vCenter is launching several instances at the same time. But basically it's due to getting the gateway later than the IP info.
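One possible way to make the check independent of arrival order, in the spirit of the guess above, is to remember everything seen so far and re-run the routable check after either kind of update. The Go sketch below illustrates that idea only; `guestState` and its methods are hypothetical names, it covers IPv4 only, the `/20` prefix is assumed, and it is not a patch against the provider.

```go
package main

import (
	"fmt"
	"net"
)

// guestState accumulates what has been reported so far, regardless of
// whether the gateway or the addresses arrived first. Hypothetical type.
type guestState struct {
	v4gw  net.IP
	addrs []net.IPNet
}

// setGateway records the default IPv4 gateway from a stack update.
func (s *guestState) setGateway(gw string) {
	s.v4gw = net.ParseIP(gw).To4()
}

// addAddress records an IPv4 address and prefix length from a NIC update.
// Invalid addresses or prefix lengths are ignored.
func (s *guestState) addAddress(ip string, prefix int) {
	ip4 := net.ParseIP(ip).To4()
	mask := net.CIDRMask(prefix, 32)
	if ip4 == nil || mask == nil {
		return
	}
	s.addrs = append(s.addrs, net.IPNet{IP: ip4, Mask: mask})
}

// routableIP returns the first stored address that shares a network with
// the gateway, or nil if the gateway or a matching address is still missing.
func (s *guestState) routableIP() net.IP {
	if s.v4gw == nil {
		return nil
	}
	for _, a := range s.addrs {
		if a.IP.Mask(a.Mask).Equal(s.v4gw.Mask(a.Mask)) {
			return a.IP
		}
	}
	return nil
}

func main() {
	s := &guestState{}

	// The NIC update arrives first: nothing is routable yet.
	s.addAddress("10.164.93.152", 20)
	fmt.Println("after NIC update:  ", s.routableIP())

	// The stack update arrives later: re-running the check now succeeds.
	s.setGateway("10.164.80.1")
	fmt.Println("after stack update:", s.routableIP())
}
```

The design point is that the callback would call something like `routableIP` after every update, not only after NIC updates, so a late gateway still completes the wait.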
Important Factoids
Nope