Summary
The Show DNS Configuration step in .github/workflows/_deploy-infrastructure.yml (and the matching block in cloud-infrastructure/cluster/deploy-cluster.sh) uses the az containerapp extension to read the Container Apps Environment, then parses the result with jq. This produces two failure modes that both prevent the operator from seeing the DNS records they need:
- Crash (exit 5). The step runs under
bash -e. az containerapp is an extension that dynamic-installs on first use, printing a
version-prefixed progress banner to stderr. The code captured output with 2>&1, merging that banner into the JSON variable, so jq failed:
jq: parse error: Invalid numeric literal at line 1, column 6
Error: Process completed with exit code 5.
- Silent false-negative. Guarding the
jq call (validate JSON first) stops the crash but exposes a second bug: when the extension command returns
empty/errors on the runner, the step falls through to "DNS configuration instructions will be shown after the Container Apps Environment is created" —
even when the environment already exists. The records are silently hidden.
Root cause
The az containerapp extension is the wrong tool here:
- It dynamic-installs (banner → crash vector under
2>&1).
- Its availability on the runner is non-deterministic (install can fail/be skipped → false-negative).
The data we need lives on the plain ARM resource and is readable with the core CLI — no extension, no dynamic-install, no banner:
Microsoft.App/managedEnvironments → .properties.customDomainConfiguration.customDomainVerificationId, .properties.defaultDomain
Microsoft.App/containerApps → .properties.configuration.ingress.customDomains
Suggested fix
Replace az containerapp env show / az containerapp show with core az resource show --resource-type …, keeping stdout-only capture (-o json 2>/dev/null) and jq empty validation as defense-in-depth. Affected calls:
_deploy-infrastructure.yml: env_details, app_gateway_details, back_office_details
deploy-cluster.sh: the env_details read in the InvalidCustomHostNameValidation / FailedCnameValidation branch
Verified locally with no extension installed: core az resource show returns valid JSON and the step prints the correct TXT/CNAME records and exits
0.
Reproduction (no Azure required, for the crash)
Run as a real bash -e script (don't wrap in &&/|| — that disables set -e and hides it):
cat > /tmp/repro.sh <<'EOF'
set -e
env_details="0.2.3 The command requires the extension containerapp" # az dynamic-install banner via 2>&1
[[ "$env_details" != "" ]] && echo "$env_details" | jq -r '.properties.defaultDomain'
EOF
bash /tmp/repro.sh; echo "exit: $?"
# -> jq: parse error: Invalid numeric literal at line 1, column 6 ; exit: 5
Environment
- GitHub-hosted runner,
bash -e
- Azure CLI with
az containerapp extension (dynamic-install)
AI prompt
In .github/workflows/_deploy-infrastructure.yml ("Show DNS Configuration" step,
runs under bash -e) and cloud-infrastructure/cluster/deploy-cluster.sh, the code
reads the Container Apps Environment with the az containerapp EXTENSION and parses
it with jq. This has two bugs:
- Crash:
az containerapp dynamic-installs on first use and prints a
version-prefixed banner to stderr; the old 2>&1 merged it into the JSON, so jq
died ("Invalid numeric literal ... column 6") and bash -e aborted with exit 5.
- False-negative: when the extension command returns empty on the runner, the step
falls through to "instructions will be shown after the environment is created"
even though the environment exists, hiding the DNS records.
Fix: replace the az containerapp extension calls with the CORE az resource show,
which needs no extension and no dynamic install. Same ARM property paths:
- managedEnvironments:
az resource show --name "$RG" --resource-group "$RG" \
--resource-type Microsoft.App/managedEnvironments -o json
-> .properties.customDomainConfiguration.customDomainVerificationId
-> .properties.defaultDomain
- containerApps (app-gateway / back-office):
az resource show --name app-gateway --resource-group "$RG" \
--resource-type Microsoft.App/containerApps -o json
-> .properties.configuration.ingress.customDomains
Apply to these reads:
- _deploy-infrastructure.yml: env_details, app_gateway_details, back_office_details
- deploy-cluster.sh: env_details in the InvalidCustomHostNameValidation /
FailedCnameValidation branch
Keep stdout-only capture (-o json 2>/dev/null || echo "") and validate with
[[ -n "$x" ]] && echo "$x" | jq empty 2>/dev/null before parsing, falling through
to the existing "not created yet" message when missing/invalid. Do not add any
az extension add. Leave az containerapp revision list alone (it uses --query/-o
tsv, not jq). Keep printed record formats identical.
Verify: run the env/app-gateway branch as a standalone bash -e script (NOT wrapped
in &&/||) against a real environment with the containerapp extension NOT installed;
confirm it prints the TXT/CNAME records and exits 0.
Severity
Low
Is this bug security related?
Code of Conduct
Summary
The Show DNS Configuration step in
.github/workflows/_deploy-infrastructure.yml(and the matching block incloud-infrastructure/cluster/deploy-cluster.sh) uses theaz containerappextension to read the Container Apps Environment, then parses the result withjq. This produces two failure modes that both prevent the operator from seeing the DNS records they need:bash -e.az containerappis an extension that dynamic-installs on first use, printing aversion-prefixed progress banner to stderr. The code captured output with
2>&1, merging that banner into the JSON variable, sojqfailed:jqcall (validate JSON first) stops the crash but exposes a second bug: when the extension command returnsempty/errors on the runner, the step falls through to "DNS configuration instructions will be shown after the Container Apps Environment is created" —
even when the environment already exists. The records are silently hidden.
Root cause
The
az containerappextension is the wrong tool here:2>&1).The data we need lives on the plain ARM resource and is readable with the core CLI — no extension, no dynamic-install, no banner:
Microsoft.App/managedEnvironments→.properties.customDomainConfiguration.customDomainVerificationId,.properties.defaultDomainMicrosoft.App/containerApps→.properties.configuration.ingress.customDomainsSuggested fix
Replace
az containerapp env show/az containerapp showwith coreaz resource show --resource-type …, keeping stdout-only capture (-o json 2>/dev/null) andjq emptyvalidation as defense-in-depth. Affected calls:_deploy-infrastructure.yml:env_details,app_gateway_details,back_office_detailsdeploy-cluster.sh: theenv_detailsread in theInvalidCustomHostNameValidation/FailedCnameValidationbranchVerified locally with no extension installed: core
az resource showreturns valid JSON and the step prints the correct TXT/CNAME records and exits0.
Reproduction (no Azure required, for the crash)
Run as a real
bash -escript (don't wrap in&&/||— that disablesset -eand hides it):Environment
bash -eaz containerappextension (dynamic-install)AI prompt
In .github/workflows/_deploy-infrastructure.yml ("Show DNS Configuration" step,
runs under
bash -e) and cloud-infrastructure/cluster/deploy-cluster.sh, the codereads the Container Apps Environment with the
az containerappEXTENSION and parsesit with jq. This has two bugs:
az containerappdynamic-installs on first use and prints aversion-prefixed banner to stderr; the old
2>&1merged it into the JSON, so jqdied ("Invalid numeric literal ... column 6") and
bash -eaborted with exit 5.falls through to "instructions will be shown after the environment is created"
even though the environment exists, hiding the DNS records.
Fix: replace the
az containerappextension calls with the COREaz resource show,which needs no extension and no dynamic install. Same ARM property paths:
Apply to these reads:
- _deploy-infrastructure.yml: env_details, app_gateway_details, back_office_details
- deploy-cluster.sh: env_details in the InvalidCustomHostNameValidation /
FailedCnameValidation branch
Keep stdout-only capture (
-o json 2>/dev/null || echo "") and validate with[[ -n "$x" ]] && echo "$x" | jq empty 2>/dev/nullbefore parsing, falling throughto the existing "not created yet" message when missing/invalid. Do not add any
az extension add. Leaveaz containerapp revision listalone (it uses --query/-otsv, not jq). Keep printed record formats identical.
Verify: run the env/app-gateway branch as a standalone
bash -escript (NOT wrappedin &&/||) against a real environment with the containerapp extension NOT installed;
confirm it prints the TXT/CNAME records and exits 0.
Severity
Low
Is this bug security related?
Code of Conduct