
fix(test): make Test_TerraformRecipe_AzureResourceGroup safe under concurrent CI#12062

Merged
sylvainsf merged 1 commit into main from fix/terraform-azurerg-concurrency
Jun 5, 2026

@sylvainsf

Contributor

Why

Test_TerraformRecipe_AzureResourceGroup started flaking on cloud PRs (most recently the dependabot PR #12037) with:

Error: A resource with the ID "/subscriptions/.../resourceGroups/tfrgt2fpfmmhcn44c" already exists.
To be managed via Terraform this resource needs to be imported into the State.

The root cause is not a leaked RG; it is a deterministic name collision between concurrent CI runs.

The terraform recipe deployed by the test names its Azure resource group:

name: 'tfrg${uniqueString(resourceGroup().id)}'

resourceGroup() here is the Radius RG, which is hard-coded to kind-radius in every functional test cluster. So uniqueString(resourceGroup().id) resolves to the same value (t2fpfmmhcn44c) on every PR, every run, in every CI invocation against the shared test subscription. The Azure RG name is therefore globally constant.
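A minimal Go sketch of why the name never changes between runs. Bicep's real `uniqueString` algorithm is not reproduced here: `stableName` is a hypothetical SHA-256 stand-in that only shares the property that matters (same inputs, same 13-character output), and the Radius RG ID string is likewise a hypothetical placeholder.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// stableName is a hypothetical stand-in for Bicep's uniqueString(). It is NOT
// the ARM algorithm; like uniqueString it is a pure function of its inputs
// and returns a 13-character lowercase token.
func stableName(parts ...string) string {
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
	return strings.ToLower(enc[:13])
}

func main() {
	// Hypothetical ID: the Radius RG is kind-radius in every CI cluster, so
	// the only input to the hash is identical on every run and every PR.
	const radiusRGID = "/planes/radius/local/resourceGroups/kind-radius"

	names := map[string]bool{}
	for run := 1; run <= 3; run++ {
		name := "tfrg" + stableName(radiusRGID)
		names[name] = true
		fmt.Printf("run %d -> %s\n", run, name)
	}
	// Every simulated run lands on the same Azure RG name.
	fmt.Println("distinct names:", len(names))
}
```

Because nothing run-specific feeds the hash, any two runs that overlap in the shared subscription target the same RG.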

That is fine when only one cloud job runs at a time. It breaks the moment two cloud jobs overlap (or one starts inside the ARM eventual-consistency window of another's destroy): the azurerm provider's RequiresImport preflight GET still sees the ghost RG that is being torn down and refuses to create it.

Verified from the failing run's applications-rp log on PR #12037: the failing terraform create at 18:07:45 came immediately after a successful destroy from the fix/recipe-tag-version cloud job, which completed at 18:07:01 in the same subscription.

What this PR changes

  1. testdata/corerp-resources-terraform-azurerg.bicep — add a new uniqueSeed bicep param and mix it into the recipe parameter:

    name: 'tfrg${uniqueString(resourceGroup().id, uniqueSeed)}'
  2. recipe_terraform_test.go — pass the workflow-provided UNIQUE_ID env var as uniqueSeed:

    step.NewDeployExecutor(template, ..., "appName="+appName, "uniqueSeed="+os.Getenv("UNIQUE_ID"))

    UNIQUE_ID is already exposed on the cloud-test job by .github/workflows/functional-test-cloud.yaml, so every CI invocation now gets a distinct Azure RG name. Local runs (no UNIQUE_ID set) fall back to an empty seed and behave exactly as today.

  3. .github/workflows/purge-azure-test-resources.yaml — extend the sweeper allowlist to include the tfrg prefix so any RG that does leak (e.g. recipe destroy killed mid-apply) is reaped by the scheduled purge job instead of accumulating in the test subscription.

Test plan

Risk

Low. The Bicep param defaults to an empty string, so the behavior change is opt-in via the test executor. The only callers that pass a non-empty seed are CI runs, which should already have been getting distinct names. In the worst case the seed is empty and we are back to today's deterministic name.

Copilot AI review requested due to automatic review settings June 5, 2026 21:21
@sylvainsf sylvainsf requested review from a team as code owners June 5, 2026 21:21
@github-actions

github-actions Bot commented Jun 5, 2026


Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses flakiness in the cloud functional test Test_TerraformRecipe_AzureResourceGroup caused by deterministic Azure Resource Group naming that collides across concurrent CI runs in a shared subscription. It introduces a per-run seed that is mixed into the RG name and updates the scheduled purge workflow to include the new test RG prefix.

Changes:

  • Add a uniqueSeed Bicep parameter and include it in the uniqueString(...) used to name the Terraform-provisioned Azure resource group.
  • Pass UNIQUE_ID from the CI environment into the deployment as uniqueSeed for the affected cloud test.
  • Expand the Azure purge workflow allowlist to include tfrg* resource groups (note: see inline comment for a regex issue).
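A minimal Go sketch of a prefix allowlist for the purge job. The source does not show the workflow's actual regex or describe the issue flagged inline, so this is only an illustration of the general technique: quote each prefix and anchor the alternation with `^` so that only names starting with an allowed prefix match. `buildAllowlist` and the `radtest-` prefix are hypothetical; `tfrg` is the prefix this PR adds.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildAllowlist compiles an anchored alternation of resource-group name
// prefixes. The leading ^ means a name must START with a prefix; a name that
// merely contains it elsewhere is not matched.
func buildAllowlist(prefixes []string) *regexp.Regexp {
	quoted := make([]string, len(prefixes))
	for i, p := range prefixes {
		quoted[i] = regexp.QuoteMeta(p)
	}
	return regexp.MustCompile("^(?:" + strings.Join(quoted, "|") + ")")
}

func main() {
	// "radtest-" is a hypothetical existing prefix; "tfrg" is the new one.
	allow := buildAllowlist([]string{"radtest-", "tfrg"})
	for _, rg := range []string{"tfrgt2fpfmmhcn44c", "radtest-abc", "prod-tfrg-data", "kind-radius"} {
		fmt.Printf("%-20s purge=%v\n", rg, allow.MatchString(rg))
	}
}
```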

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
test/functional-portable/corerp/cloud/resources/testdata/corerp-resources-terraform-azurerg.bicep | Adds uniqueSeed and uses it to make the Azure RG name vary per run.
test/functional-portable/corerp/cloud/resources/recipe_terraform_test.go | Passes UNIQUE_ID into the template parameters to avoid cross-run collisions in CI.
.github/workflows/purge-azure-test-resources.yaml | Attempts to allowlist tfrg* resource groups for cleanup by the scheduled purge job.

Comment thread .github/workflows/purge-azure-test-resources.yaml
@github-actions

github-actions Bot commented Jun 5, 2026


Unit Tests

    2 files  ±0    438 suites  ±0   7m 20s ⏱️ -8s
5 322 tests ±0  5 320 ✅ ±0  2 💤 ±0  0 ❌ ±0 
6 476 runs  ±0  6 474 ✅ ±0  2 💤 ±0  0 ❌ ±0 

Results for commit 6372b5d. ± Comparison against base commit a75a40e.

♻️ This comment has been updated with latest results.

@codecov

codecov Bot commented Jun 5, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.14%. Comparing base (a75a40e) to head (6372b5d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #12062   +/-   ##
=======================================
  Coverage   52.13%   52.14%           
=======================================
  Files         734      734           
  Lines       46704    46704           
=======================================
+ Hits        24350    24352    +2     
+ Misses      20017    20016    -1     
+ Partials     2337     2336    -1     


…ncurrent CI

The recipe used by Test_TerraformRecipe_AzureResourceGroup creates an Azure
resource group whose name was derived from uniqueString(resourceGroup().id),
where resourceGroup() is the Radius RG hard-coded to "kind-radius" in every CI
run. That made the Azure RG name deterministic across runs and across PRs
sharing the test subscription, so concurrent runs (or one run firing inside
the ARM eventual-consistency window of another's destroy) collide with
azurerm's RequiresImport preflight and fail with "A resource with the ID ...
already exists".

Mix a per-run seed (UNIQUE_ID, already produced by the workflow) into the
uniqueString call so each run gets a distinct Azure RG name.

Also expand the purge-azure-test-resources sweeper allowlist to include the
tfrg* prefix so any RG that does leak (e.g. recipe destroy killed mid-apply)
is cleaned up by the scheduled job instead of living forever in the test
subscription.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
@sylvainsf sylvainsf force-pushed the fix/terraform-azurerg-concurrency branch from 593c027 to 6372b5d Compare June 5, 2026 22:13
@radius-functional-tests

radius-functional-tests Bot commented Jun 5, 2026


Radius functional test overview

Name | Value
--- | ---
Repository | radius-project/radius
Commit ref | 6372b5d
Unique ID | funcb53fcf4ac3
Image tag | pr-funcb53fcf4ac3
  • gotestsum 1.13.0
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-funcb53fcf4ac3
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-funcb53fcf4ac3
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-funcb53fcf4ac3
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-funcb53fcf4ac3
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-funcb53fcf4ac3
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ ucp-cloud functional tests succeeded
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

@sylvainsf sylvainsf merged commit 217df75 into main Jun 5, 2026
60 checks passed
@sylvainsf sylvainsf deleted the fix/terraform-azurerg-concurrency branch June 5, 2026 22:48

3 participants