feat(commons/azure): Workload Identity for cert-manager and external-dns, with Service Principal fallback #361
Merged
Adds azure_client_secret variable for environments using service principal auth instead of workload identity, and wires it into the Azure DNS01 solver configuration in the ClusterIssuer.
Adds azure_* variables and wires Azure credentials into the external-dns helm values. Supports both service principal and workload identity flows.
…rains The istiod chart installs a PDB with minAvailable=1. A single replica istiod blocks node drains (e.g. during EKS AMI upgrades). Setting both replicaCount and autoscaleMin to 2 prevents the HPA from scaling back to 1.
…configurable Add azure_workload_identity_enabled (default: true) to cert_manager and external_dns. When false, WI annotations and pod labels are omitted, azure_client_id is not required, and useWorkloadIdentityExtension is set to false in the azure.json secret.
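A minimal sketch of how a conditionally built azure.json could look, assuming the key names from external-dns's Azure provider config file (tenantId, subscriptionId, resourceGroup, aadClientId, aadClientSecret, useWorkloadIdentityExtension). The variables azure_tenant_id, azure_subscription_id and azure_resource_group are hypothetical, and the module's real field set is not shown in this PR.

```hcl
# Sketch only: variable names other than azure_workload_identity_enabled,
# azure_client_id and azure_client_secret are assumptions.
variable "azure_workload_identity_enabled" {
  type    = bool
  default = true
}

variable "azure_tenant_id" {
  type = string
}

variable "azure_subscription_id" {
  type = string
}

variable "azure_resource_group" {
  type = string
}

variable "azure_client_id" {
  type    = string
  default = null
}

variable "azure_client_secret" {
  type      = string
  default   = null
  sensitive = true
}

locals {
  # Service Principal fields are set to null in WI mode so they can be filtered out.
  azure_config = {
    tenantId                     = var.azure_tenant_id
    subscriptionId               = var.azure_subscription_id
    resourceGroup                = var.azure_resource_group
    useWorkloadIdentityExtension = var.azure_workload_identity_enabled
    aadClientId                  = var.azure_workload_identity_enabled ? null : var.azure_client_id
    aadClientSecret              = var.azure_workload_identity_enabled ? null : var.azure_client_secret
  }

  # Keep only the keys that apply to the selected auth mode.
  azure_json = jsonencode({
    for key, value in local.azure_config : key => value if value != null
  })
}
```

Filtering nulls in a single for expression avoids mixing differently shaped objects in a conditional, which Tofu can reject as inconsistent types.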
… is disabled cert_manager and external_dns now support both auth modes for Azure:
- WI (default): azure_workload_identity_enabled=true, uses SA annotation + pod label
- SP (opt-out): azure_workload_identity_enabled=false, requires azure_client_secret

azure_client_id is now always required for Azure regardless of auth mode.
…y is enabled When azure_workload_identity_enabled=true (default), callers must pass azure_federated_credential_id from module.iam. This enforces that the Azure AD federation exists before the Helm release runs, and creates the correct apply-time dependency ordering automatically via Tofu references.
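For illustration, a caller in the default WI mode might look roughly like this. The module source paths are assumptions and other required inputs are omitted. The reference to module.iam_cert_manager.id is what gives Tofu the implicit dependency on the federated credential.

```hcl
# Hypothetical caller; source paths are placeholders, not the repo's real layout.
variable "azure_client_id" {
  type = string
}

module "iam_cert_manager" {
  source = "./modules/azure/iam" # assumed path
  # Inputs that create the federated identity credential are omitted here.
}

module "cert_manager" {
  source = "./modules/commons/cert_manager" # assumed path

  azure_workload_identity_enabled = true
  azure_client_id                 = var.azure_client_id

  # Referencing the iam module output orders the Helm release after the federation.
  azure_federated_credential_id = module.iam_cert_manager.id
}
```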
cert_manager only manages helm_release resources and a terraform_data for validation — no kubernetes_* resources. Removing the unused provider keeps the dependency graph minimal and drops the kubernetes provider lock hash from the module.
sebastiancorrea81 approved these changes on May 20, 2026
gdrojas added a commit that referenced this pull request on May 20, 2026
Make the change backwards-compatible: existing Azure callers passing azure_client_secret keep working without any code change. Workload Identity is now opt-in by setting azure_workload_identity_enabled = true (which then requires azure_federated_credential_id). This diverges from the cert_manager / external_dns default in #361 (those default to WI = true) on purpose — the agent module has more deployed callers, so the conservative default avoids breaking them at plan time on the first apply after the bump.
gdrojas added a commit that referenced this pull request on May 21, 2026
…back The agent module unconditionally required azure_client_secret and injected AZURE_CLIENT_SECRET into the Helm Secret, even when callers wanted to use Workload Identity. The downstream service scripts in nullplatform/services don't actually consume the secret today (the SP auth path in azure-cosmos-db/scripts/azure/resolve_azure_context is commented out and the azurerm provider relies on ARM_USE_OIDC / ARM_USE_MSI env wiring), so the over-validation was a strict regression for WI users.

Mirrors the PR #361 pattern applied to cert_manager and external_dns:
- azure_workload_identity_enabled (default true) gates the auth mode
- azure_federated_credential_id is required when WI is enabled: pass the id output of an infrastructure/azure/iam module instance to enforce ordering between the federated identity credential and the agent SA
- azure_client_secret stays optional unless the caller opts out of WI
- ServiceAccount gets azure.workload.identity/client-id annotation and the pod gets azure.workload.identity/use=true label only in WI mode, so the Azure Workload Identity webhook injects the federated token env vars at runtime
- AZURE_CLIENT_SECRET is dropped from the Secret in WI mode

Locals are null-tolerant so the cross_variable_validation preconditions fire with a clear message instead of templatefile blowing up on missing inputs at graph-construction time. Tests cover both auth modes plus AWS / GCP / OCI to prevent the WI wiring from leaking into non-Azure paths.

BREAKING CHANGE: callers passing cloud_provider = "azure" now hit a plan error unless they either pass azure_federated_credential_id (preferred, recommended path) or set azure_workload_identity_enabled = false and keep passing azure_client_secret.
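The null-tolerant locals and cross_variable_validation preconditions described above could be sketched along these lines. Only the variable names and the resource name come from the commit message. The local names and the error messages are illustrative assumptions, not the module's actual code.

```hcl
variable "cloud_provider" {
  type = string
}

variable "azure_workload_identity_enabled" {
  type    = bool
  default = true
}

variable "azure_client_id" {
  type    = string
  default = null
}

variable "azure_federated_credential_id" {
  type    = string
  default = null
}

variable "azure_client_secret" {
  type      = string
  default   = null
  sensitive = true
}

locals {
  is_azure = var.cloud_provider == "azure"
  wi_mode  = local.is_azure && var.azure_workload_identity_enabled
  sp_mode  = local.is_azure && !var.azure_workload_identity_enabled

  # Null-tolerant fallback: templating sees "" instead of null, so the
  # preconditions below are what report the missing input.
  azure_client_id = var.azure_client_id == null ? "" : var.azure_client_id
}

resource "terraform_data" "cross_variable_validation" {
  lifecycle {
    precondition {
      condition     = !local.is_azure || var.azure_client_id != null
      error_message = "azure_client_id is required when cloud_provider is \"azure\"."
    }

    precondition {
      condition     = !local.wi_mode || var.azure_federated_credential_id != null
      error_message = "azure_federated_credential_id is required when azure_workload_identity_enabled is true."
    }

    precondition {
      condition     = !local.sp_mode || var.azure_client_secret != null
      error_message = "azure_client_secret is required when azure_workload_identity_enabled is false."
    }
  }
}
```

Each condition is written as "not in this mode, or the input is present", so non-Azure providers pass every check without supplying any Azure inputs.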
gdrojas added a commit that referenced this pull request on May 21, 2026
Make the change backwards-compatible: existing Azure callers passing azure_client_secret keep working without any code change. Workload Identity is now opt-in by setting azure_workload_identity_enabled = true (which then requires azure_federated_credential_id). This diverges from the cert_manager / external_dns default in #361 (those default to WI = true) on purpose — the agent module has more deployed callers, so the conservative default avoids breaking them at plan time on the first apply after the bump.
Summary
- `azure_federated_credential_id` is enforced at plan time when WI is enabled: the caller must pass `module.iam_cert_manager.id`, which also creates the correct Tofu dependency ordering.
- … `azure_workload_identity_enabled = false`.
- … `azure.json` secret is built conditionally to include only the fields required by the selected auth method.
- `terraform_data` preconditions reject misconfigured inputs at `tofu plan` time (missing client ID, missing federated credential when WI=true, missing client secret when WI=false).
- … `azure_federated_credential_id` required by the new validation.
- … `istiod` to 2 replicas to avoid the PDB blocking node drains on single-replica setups.

Auth modes
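| `azure_workload_identity_enabled` | `true` | `false` |
| --- | --- | --- |
| `azure_client_id` | required | required |
| `azure_federated_credential_id` | required | not required |
| `azure_client_secret` | not required | required |

Breaking changes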
- Azure consumers of `cert_manager` and `external_dns` must now pass `azure_federated_credential_id` (default WI mode). The failure mode is a clear `tofu plan` error pointing to the fix. Consumers using Service Principal can opt out with `azure_workload_identity_enabled = false` and pass `azure_client_secret` instead.
- All consumers of `istio`: the `istiod_replicas` default goes from `1` → `2`. Existing clusters will redeploy `istiod` with an extra replica. Override with `istiod_replicas = 1` to preserve the old behavior (not recommended: single-replica istiod blocks node drains due to its PDB).
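As a rough migration sketch, a caller that wants to keep Service Principal auth and the previous single-replica istiod could set the two overrides as below. Module source paths are placeholders and other required inputs are omitted.

```hcl
# Hypothetical caller-side overrides; source paths are assumptions.
variable "azure_client_id" {
  type = string
}

variable "azure_client_secret" {
  type      = string
  sensitive = true
}

module "external_dns" {
  source = "./modules/commons/external_dns" # assumed path

  # Opt out of Workload Identity and keep Service Principal credentials.
  azure_workload_identity_enabled = false
  azure_client_id                 = var.azure_client_id
  azure_client_secret             = var.azure_client_secret
}

module "istio" {
  source = "./modules/commons/istio" # assumed path

  # Preserves the old default; a single replica still blocks node drains via the PDB.
  istiod_replicas = 1
}
```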