fix(test): make Test_TerraformRecipe_AzureResourceGroup safe under concurrent CI #12062
Dependency Review: ✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found. Scanned files: None
Pull request overview
This PR addresses flakiness in the cloud functional test Test_TerraformRecipe_AzureResourceGroup caused by deterministic Azure Resource Group naming that collides across concurrent CI runs in a shared subscription. It introduces a per-run seed that is mixed into the RG name and updates the scheduled purge workflow to include the new test RG prefix.
Changes:
- Add a `uniqueSeed` Bicep parameter and include it in the `uniqueString(...)` used to name the Terraform-provisioned Azure resource group.
- Pass `UNIQUE_ID` from the CI environment into the deployment as `uniqueSeed` for the affected cloud test.
- Expand the Azure purge workflow allowlist to include `tfrg*` resource groups (note: see inline comment for a regex issue).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `test/functional-portable/corerp/cloud/resources/testdata/corerp-resources-terraform-azurerg.bicep` | Adds `uniqueSeed` and uses it to make the Azure RG name vary per run. |
| `test/functional-portable/corerp/cloud/resources/recipe_terraform_test.go` | Passes `UNIQUE_ID` into the template parameters to avoid cross-run collisions in CI. |
| `.github/workflows/purge-azure-test-resources.yaml` | Attempts to allowlist `tfrg*` resource groups for cleanup by the scheduled purge job. |
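The overview mentions a regex issue with the `tfrg*` allowlist entry, but the inline comment itself is not reproduced here. As a hedged illustration of one common pitfall (which may or may not be the one flagged), the sketch below shows how `tfrg*` behaves when read as a regular expression rather than a glob. The resource group names and the `matches` helper are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether name matches the given regular expression.
// Hypothetical helper for illustration; the real purge workflow may
// filter resource groups in a completely different way.
func matches(pattern, name string) bool {
	return regexp.MustCompile(pattern).MatchString(name)
}

func main() {
	// Hypothetical resource group names.
	names := []string{"tfrg-abc123", "tfr-unrelated"}

	for _, n := range names {
		// As a regex, "g*" means "zero or more g", so ^tfrg* also
		// accepts names that merely start with "tfr".
		fmt.Printf("%-14s ^tfrg*=%v ^tfrg.*=%v\n",
			n, matches(`^tfrg*`, n), matches(`^tfrg.*`, n))
	}
}
```

If the allowlist is evaluated as a regex, a pattern such as `^tfrg.*` (or simply `^tfrg`) expresses the glob-style intent of "starts with tfrg".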
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #12062 +/- ##
=======================================
Coverage 52.13% 52.14%
=======================================
Files 734 734
Lines 46704 46704
=======================================
+ Hits 24350 24352 +2
+ Misses 20017 20016 -1
+ Partials 2337 2336 -1
…ncurrent CI

The recipe used by Test_TerraformRecipe_AzureResourceGroup creates an Azure resource group whose name was derived from uniqueString(resourceGroup().id), where resourceGroup() is the Radius RG hard-coded to "kind-radius" in every CI run. That made the Azure RG name deterministic across runs and across PRs sharing the test subscription, so concurrent runs (or one run firing inside the ARM eventual-consistency window of another's destroy) collide with azurerm's RequiresImport preflight and fail with "A resource with the ID ... already exists".

Mix a per-run seed (UNIQUE_ID, already produced by the workflow) into the uniqueString call so each run gets a distinct Azure RG name.

Also expand the purge-azure-test-resources sweeper allowlist to include the tfrg* prefix so any RG that does leak (e.g. recipe destroy killed mid-apply) is cleaned up by the scheduled job instead of living forever in the test subscription.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
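The collision described in the commit message can be sketched in Go. This is a minimal illustration, not the real thing: `stableName` is a hypothetical stand-in for ARM's `uniqueString()` (it uses SHA-256, which is not ARM's algorithm), and the resource ID and run IDs are made up. It only shows that a deterministic function of a constant input yields a constant name, while mixing in a per-run seed yields distinct names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stableName is a hypothetical stand-in for ARM's uniqueString(): it is NOT
// the real algorithm, only a deterministic hash of its inputs.
func stableName(prefix string, inputs ...string) string {
	h := sha256.New()
	for _, in := range inputs {
		h.Write([]byte(in))
		h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	}
	return prefix + hex.EncodeToString(h.Sum(nil))[:13]
}

func main() {
	// Illustrative ID only; every CI run uses the same "kind-radius" group.
	const radiusRG = "/resourceGroups/kind-radius"

	// Before: two concurrent runs derive the same Azure RG name.
	fmt.Println(stableName("tfrg", radiusRG) == stableName("tfrg", radiusRG))

	// After: a per-run seed (hypothetical values) separates the names.
	fmt.Println(stableName("tfrg", radiusRG, "run-1") ==
		stableName("tfrg", radiusRG, "run-2"))
}
```

The first comparison is always true and the second is false for these inputs, which mirrors why unseeded concurrent runs end up targeting the same resource group.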
593c027 to
6372b5d
Radius functional test overview
Test Status: ⌛ Building Radius and pushing container images for functional tests...
Why
`Test_TerraformRecipe_AzureResourceGroup` started flaking on cloud PRs (most recently the dependabot PR #12037) with an "A resource with the ID ... already exists" error.

Root cause is not a leaked RG; it is a deterministic name collision between concurrent CI runs.

The Terraform recipe deployed by the test derives its Azure resource group name from `uniqueString(resourceGroup().id)`. `resourceGroup()` here is the Radius RG, which is hard-coded to `kind-radius` in every functional test cluster. So `uniqueString(resourceGroup().id)` resolves to the same value (`t2fpfmmhcn44c`) on every PR, every run, in every CI invocation against the shared test subscription. The Azure RG name is therefore globally constant.

That is fine when only one cloud job runs at a time. It breaks the moment two cloud jobs overlap (or one starts inside the ARM eventual-consistency window of another's destroy) because the `azurerm` provider's `RequiresImport` preflight `GET` sees a still-tearing-down ghost RG and refuses to create.

Verified from the failing run's `applications-rp` log on PR #12037: the failing terraform create at 18:07:45 immediately followed a successful destroy from the `fix/recipe-tag-version` cloud job that completed at 18:07:01 in the same subscription.

What this PR changes
- `testdata/corerp-resources-terraform-azurerg.bicep`: add a new `uniqueSeed` Bicep param and mix it into the recipe parameter.
- `recipe_terraform_test.go`: pass the workflow-provided `UNIQUE_ID` env var as `uniqueSeed`. `UNIQUE_ID` is already exposed on the cloud-test job by `.github/workflows/functional-test-cloud.yaml`, so every CI invocation now gets a distinct Azure RG name. Local runs (no `UNIQUE_ID` set) fall back to an empty seed and behave exactly as today.
- `.github/workflows/purge-azure-test-resources.yaml`: extend the sweeper allowlist to include the `tfrg` prefix so any RG that does leak (e.g. recipe destroy killed mid-apply) is reaped by the scheduled purge job instead of accumulating in the test subscription.

Test plan
- `gofmt -d`, `go build`, `go vet` of the test package are clean locally.
- … `main` so its corerp-cloud run picks up the fix and demonstrates the failure is gone.

Risk
Low. The Bicep param defaults to an empty string, so the behavior change is opt-in via the test executor. The only callers that pass a non-empty seed are CI runs, which should already have been getting distinct names. In the worst case the seed is empty and we are back to today's deterministic name.
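The seed plumbing and its empty-string fallback can be sketched as follows. This is an assumption-laden illustration, not the code in `recipe_terraform_test.go`: the `seedParam` helper and the `key=value` parameter format are hypothetical, and only the `UNIQUE_ID` and `uniqueSeed` names come from the PR description.

```go
package main

import (
	"fmt"
	"os"
)

// seedParam builds a "uniqueSeed=<value>" deployment parameter from the
// UNIQUE_ID environment variable. The helper and the key=value format are
// hypothetical; getenv is injected so the lookup can be faked in tests.
func seedParam(getenv func(string) string) string {
	// os.Getenv returns "" for an unset variable, so local runs without
	// UNIQUE_ID pass an empty seed, matching the Bicep param's default.
	return "uniqueSeed=" + getenv("UNIQUE_ID")
}

func main() {
	fmt.Println(seedParam(os.Getenv))
}
```

Injecting the lookup function keeps the fallback path checkable without mutating the process environment.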