azurerm_private_endpoint fails intermittently with retriable error #2 #21293
Labels
service/network
upstream/microsoft
v/3.x
Terraform Version
1.2.3
AzureRM Provider Version
3.50.0
Affected Resource(s)/Data Source(s)
azurerm_private_endpoint
Terraform Configuration Files
Debug Output/Panic Output
`Error: waiting for creation of Private Endpoint "mwe-test-1db-pe" (Resource Group "mwe-test-01-rg"): Code="RetryableError" Message="A retryable error occurred." Details=[{"code":"ReferencedResourceNotProvisioned","message":"Cannot proceed with operation because resource /subscriptions/***/resourceGroups/mwe-test-01-rg/providers/Microsoft.Network/virtualNetworks/mwe-test-01-vnet/subnets/mwe-test-1db-subnet used by resource /subscriptions/***/resourceGroups/mwe-test-01-rg/providers/Microsoft.Network/networkInterfaces/mwe-test-1db-pe.nic.771d4852-46ce-48b3-80ed-f98344f7f778 is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is PutSubnetOperation."}]`
Expected Behaviour
Either Terraform should automatically retry retryable errors instead of failing, or the Private Endpoint should interact with the subnet only once the subnet is in a Succeeded state (possibly a dependency issue?).
Actual Behaviour
Provisioning a Private Endpoint intermittently fails with the error shown above. In the Azure portal, the Private Endpoint exists and is working. However, we cannot simply run `terraform apply` again, because the Private Endpoint is not in the Terraform state. We must first delete it manually (or import it into the state).
Terraform logs show that the subnet resource creation was completed before the creation of the Private Endpoint.
We first encountered the issue with azurerm version 3.39.1, and it is still present in 3.50.0, the latest version at the time of writing.
Steps to Reproduce
The issue appears randomly, both when creating multiple Private Endpoints and when creating a single one.
Important Factoids
As mentioned in issue #16182, the error is more likely when multiple Private Endpoints are created in parallel, but it also occurs when creating a single Private Endpoint. We are trying to work around the issue by deploying a `time_sleep` resource that depends on the subnet resource and adding a `depends_on` argument to the Private Endpoint resource.
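For reference, a minimal sketch of that workaround. It is not our actual configuration: the resource labels, variable names, address prefix, and the 60-second delay are all assumptions made for illustration, and `time_sleep` comes from the `hashicorp/time` provider. Whether a fixed delay is long enough for the subnet to leave the Updating state is not guaranteed.

```hcl
# Hypothetical inputs; none of these names come from the failing configuration.
variable "location" {
  type = string
}

variable "target_resource_id" {
  type        = string
  description = "ID of the resource the Private Endpoint connects to (assumed input)."
}

variable "subresource_names" {
  type        = list(string)
  description = "Sub-resources to connect to, e.g. for a database server (assumed input)."
}

resource "azurerm_subnet" "db" {
  name                 = "mwe-test-1db-subnet"
  resource_group_name  = "mwe-test-01-rg"
  virtual_network_name = "mwe-test-01-vnet"
  address_prefixes     = ["10.0.1.0/24"] # assumed address range
}

# Pause after the subnet is created so that any in-flight subnet update
# has a chance to finish. The duration is an arbitrary guess.
resource "time_sleep" "wait_for_subnet" {
  depends_on      = [azurerm_subnet.db]
  create_duration = "60s"
}

resource "azurerm_private_endpoint" "db" {
  name                = "mwe-test-1db-pe"
  location            = var.location
  resource_group_name = "mwe-test-01-rg"
  subnet_id           = azurerm_subnet.db.id

  private_service_connection {
    name                           = "mwe-test-1db-psc" # hypothetical name
    private_connection_resource_id = var.target_resource_id
    subresource_names              = var.subresource_names
    is_manual_connection           = false
  }

  # Explicit dependency: create the endpoint only after the delay has elapsed.
  depends_on = [time_sleep.wait_for_subnet]
}
```

The delay only reduces the likelihood of the race; it does not make Terraform retry the `RetryableError` itself.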
References
The bug is essentially the same as the one described in the already closed issue #16182.