Add Azure Kubernetes Service (AKS) hosting support#16088

Open
mitchdenny wants to merge 73 commits into main from feature/aks-support

@mitchdenny
Member

Description

WIP — Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package.

Motivation

Aspire's Aspire.Hosting.Kubernetes package supports end-to-end deployment to any conformant Kubernetes cluster via Helm charts, but it has no awareness of Azure-specific capabilities. Users who deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

What's here so far (Phase 1)

  • New Aspire.Hosting.Azure.Kubernetes package with dependencies on Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure
  • AzureKubernetesEnvironmentResource — unified resource that extends AzureProvisioningResource and implements IAzureComputeEnvironmentResource, internally wrapping a KubernetesEnvironmentResource for Helm deployment
  • AddAzureKubernetesEnvironment() entry point (mirrors AddAzureContainerAppEnvironment() pattern)
  • Configuration extensions: WithVersion, WithSkuTier, WithNodePool, AsPrivateCluster, WithContainerInsights, WithAzureLogAnalyticsWorkspace
  • AzureKubernetesInfrastructure eventing subscriber
  • Implementation spec at docs/specs/aks-support.md

What's planned next

  • Workload identity (federated credentials + ServiceAccount YAML generation)
  • VNet integration (WithDelegatedSubnet)
  • Full Bicep provisioning (pending Azure.Provisioning.ContainerService package availability in internal feeds)
  • Unit tests with Bicep snapshot verification
  • E2E deployment tests

Validation

  • Package builds successfully with dotnet build /p:SkipNativeBuild=true
  • Follows established patterns from Aspire.Hosting.Azure.AppContainers

Fixes # (issue)

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
    • No
  • Does the change require an update in our Aspire docs?

@github-actions
Contributor

github-actions bot commented Apr 12, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16088

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16088"

@github-actions
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentExtensions.cs Outdated
Comment thread docs/specs/aks-support.md
@github-actions
Contributor

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Member Author

@mitchdenny mitchdenny left a comment

Code review: found 7 issues — 2 bugs (deadlock + missing exit code check), 1 security concern (credential file leak), 2 correctness issues (orphaned resources, redundant allocation), 1 behavioral concern (FindNodePoolResource identity), 1 documentation gap (region-locked VM sizes).

Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs Outdated
Comment thread src/Aspire.Hosting.Kubernetes/KubernetesResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/tools/GenVmSizes.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs Outdated
Comment thread src/Aspire.Hosting.Azure.Kubernetes/AksNodeVmSizes.Generated.cs
@mitchdenny mitchdenny marked this pull request as ready for review April 15, 2026 02:33
@mitchdenny mitchdenny self-assigned this Apr 15, 2026
@mitchdenny mitchdenny added this to the 13.3 milestone Apr 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package, integrating Azure provisioning (Bicep via Azure.Provisioning) with the existing Helm-based Kubernetes publishing pipeline.

Changes:

  • Introduces AddAzureKubernetesEnvironment() and AKS resource types (AKS cluster, node pools, subnet/workload identity wiring) plus an AKS-specific infrastructure subscriber.
  • Extends Kubernetes publishing to support parent compute environments (AKS wrapping an inner Kubernetes environment), node pool scheduling, kubeconfig targeting, and deploy-time IValueProvider resolution for Helm values.
  • Adds tests (including Bicep snapshot verification) and automation to periodically update VM size constants.
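
The node pool scheduling mentioned in the list above can be pictured as merging a `nodeSelector` into a pod spec. A minimal sketch, assuming workloads are pinned with the `kubernetes.azure.com/agentpool` node label that AKS applies to its nodes; the `PodSpec` shape and the `withNodePool` helper are hypothetical and are not taken from this PR:

```typescript
// Hypothetical, simplified pod spec; the real Kubernetes model has many more fields.
interface PodSpec {
  containers: { name: string; image: string }[];
  nodeSelector?: Record<string, string>;
}

// Assumption: AKS labels each node with its pool name under this key.
const AGENT_POOL_LABEL = "kubernetes.azure.com/agentpool";

// Returns a copy of the spec that schedules only onto nodes of the given pool,
// preserving any selectors that were already present.
function withNodePool(spec: PodSpec, poolName: string): PodSpec {
  return {
    ...spec,
    nodeSelector: { ...(spec.nodeSelector ?? {}), [AGENT_POOL_LABEL]: poolName },
  };
}

const pinnedSpec = withNodePool(
  { containers: [{ name: "apiservice", image: "example/apiservice:latest" }] },
  "workload",
);
console.log(JSON.stringify(pinnedSpec.nodeSelector));
```

Returning a copy rather than mutating the input keeps the original spec reusable for environments that have no node pools.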

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Snapshots/AzureKubernetesEnvironmentExtensionsTests.AddAzureKubernetesEnvironment_WithVersion.verified.bicep | Adds verified Bicep snapshot for AKS version configuration. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Snapshots/AzureKubernetesEnvironmentExtensionsTests.AddAzureKubernetesEnvironment_BasicConfiguration.verified.bicep | Adds verified Bicep snapshot for basic AKS provisioning outputs. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/AzureKubernetesInfrastructureTests.cs | Adds tests for AKS infra behavior (default user pool, affinity, deployment target/registry flow). |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/AzureKubernetesEnvironmentExtensionsTests.cs | Adds API and configuration tests for AKS environment extensions and Bicep generation. |
| tests/Aspire.Hosting.Azure.Kubernetes.Tests/Aspire.Hosting.Azure.Kubernetes.Tests.csproj | Introduces new test project for AKS hosting package. |
| src/Aspire.Hosting.Kubernetes/Resources/ServiceAccountV1.cs | Adds Kubernetes ServiceAccount resource model for YAML generation (workload identity). |
| src/Aspire.Hosting.Kubernetes/KubernetesResource.cs | Adds support for deferring Helm value resolution via IValueProvider and normalizes expression keys. |
| src/Aspire.Hosting.Kubernetes/KubernetesPublishingContext.cs | Adds parent-environment matching, node pool nodeSelector application, and capturing deploy-time value providers. |
| src/Aspire.Hosting.Kubernetes/KubernetesNodePoolResource.cs | Introduces node pool resource abstraction for scheduling workloads in Kubernetes environments. |
| src/Aspire.Hosting.Kubernetes/KubernetesNodePoolAnnotation.cs | Adds annotation to associate compute resources with a node pool for scheduling. |
| src/Aspire.Hosting.Kubernetes/KubernetesInfrastructure.cs | Enables parent environment targeting and sets deployment target annotations accordingly. |
| src/Aspire.Hosting.Kubernetes/KubernetesEnvironmentResource.cs | Adds kubeconfig path support, parent compute env linkage, and captured Helm value providers list. |
| src/Aspire.Hosting.Kubernetes/KubernetesEnvironmentExtensions.cs | Adds public node pool API (AddNodePool, WithNodePool) for Kubernetes environments. |
| src/Aspire.Hosting.Kubernetes/Deployment/HelmDeploymentEngine.cs | Adds deploy-time IValueProvider resolution, and passes --kubeconfig to helm/kubectl when set. |
| src/Aspire.Hosting.Kubernetes/Aspire.Hosting.Kubernetes.csproj | Exposes internals to AKS package and its tests. |
| src/Aspire.Hosting.Azure.Kubernetes/tools/GenVmSizes.cs | Adds tool to generate AKS VM size constants from Azure SKU data. |
| src/Aspire.Hosting.Azure.Kubernetes/README.md | Adds initial package README and basic usage snippet. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs | Adds AKS event subscriber to fetch kubeconfig, ensure user pool, and wire workload identity/SA resources. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs | Adds unified AKS provisioning + compute environment resource wrapping inner Kubernetes environment. |
| src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentExtensions.cs | Adds AddAzureKubernetesEnvironment plus configuration extensions (version, tier, subnet, node pools, registry, identity). |
| src/Aspire.Hosting.Azure.Kubernetes/Aspire.Hosting.Azure.Kubernetes.csproj | Introduces new AKS hosting package project and dependencies. |
| src/Aspire.Hosting.Azure.Kubernetes/AksSubnetAnnotation.cs | Adds annotation to track AKS subnet references for Bicep wiring. |
| src/Aspire.Hosting.Azure.Kubernetes/AksSkuTier.cs | Adds AKS SKU tier enum. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodeVmSizes.Generated.cs | Adds generated VM size constants consumed by node pool configuration. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodePoolResource.cs | Adds AKS-specific node pool resource extending Kubernetes node pool abstraction. |
| src/Aspire.Hosting.Azure.Kubernetes/AksNodePoolConfig.cs | Adds node pool config record + pool mode enum (System/User). |
| src/Aspire.Hosting.Azure.Kubernetes/AksNetworkProfile.cs | Adds internal network profile model for AKS network settings. |
| docs/specs/aks-support.md | Adds implementation spec describing architecture, phases, and design decisions. |
| Directory.Packages.props | Adds Azure.Provisioning.ContainerService dependency version. |
| Aspire.slnx | Adds AKS package and test projects to the solution. |
| .github/workflows/update-azure-vm-sizes.yml | Adds scheduled workflow to regenerate VM size constants and open a PR. |

Comment on lines +248 to +252
// Get the actual provisioned cluster name from the Bicep output.
// The Azure.Provisioning SDK may add a unique suffix to the name
// (e.g., take('aks-${uniqueString(resourceGroup().id)}', 63)).
var clusterName = await environment.NameOutputReference.GetValueAsync(context.CancellationToken).ConfigureAwait(false)
?? environment.Name;

Copilot AI Apr 15, 2026

environment.Name is the Aspire resource name (e.g., "aks"), but the provisioned AKS cluster name in Bicep is generated (see snapshots: take('aks-${uniqueString(resourceGroup().id)}', 63)). This will cause az aks get-credentials (and the resource group lookup) to target a non-existent cluster. Use the provisioned cluster name output instead (e.g., resolve environment.NameOutputReference after provisioning) and pass that resolved name through to GetResourceGroupAsync / az aks get-credentials.

// Scope the Helm chart name to this AKS environment to avoid
// conflicts when multiple environments deploy to the same cluster
// or when re-deploying with different environment names.
k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToLowerInvariant().Replace(' ', '-');

Copilot AI Apr 15, 2026

Helm chart names have stricter character constraints than just lowercasing and replacing spaces; application names can include underscores or other characters that may result in an invalid chart/release name. Prefer using the existing Helm/Kubernetes naming helper used elsewhere in the repo (e.g., a ToHelmChartName()-style sanitizer) to guarantee a valid name.

Suggested change
k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToLowerInvariant().Replace(' ', '-');
k8sEnvBuilder.Resource.HelmChartName = $"{builder.Environment.ApplicationName}-{name}".ToHelmChartName();
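
A sanitizer of the kind suggested here might look like the following. This is a minimal TypeScript sketch, not the repo's helper: the name `toHelmChartName` is borrowed from the comment above, and the rules it enforces (lowercase alphanumerics separated by single dashes, capped at 53 characters, the limit Helm applies to release names) are assumptions about what such a helper would guarantee.

```typescript
// Hypothetical sanitizer: turns an arbitrary application/environment name
// into a string that is safe to use as a Helm chart or release name.
function toHelmChartName(raw: string, maxLength: number = 53): string {
  const cleaned = raw
    .toLowerCase()
    // Collapse every run of disallowed characters (underscores, spaces, dots...) into one dash.
    .replace(/[^a-z0-9]+/g, "-")
    // Names must start and end with an alphanumeric character.
    .replace(/^-+|-+$/g, "");

  // Truncate, then trim again in case the cut landed right after a dash.
  const truncated = cleaned.slice(0, maxLength).replace(/-+$/g, "");

  if (truncated.length === 0) {
    throw new Error(`Cannot derive a Helm chart name from "${raw}".`);
  }
  return truncated;
}

console.log(toHelmChartName("My_App Host-aks"));
```

Throwing on an empty result, instead of returning a placeholder, surfaces a bad application name at configuration time rather than at `helm install`.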

Comment on lines +171 to +172
var defaultPool = new AksNodePoolResource("workload", defaultConfig, environment);
appModel.Resources.Add(defaultPool);

Copilot AI Apr 15, 2026

This adds an AksNodePoolResource directly to the model, but node pools added via AddNodePool(...) are excluded from the manifest in publish mode. Adding the default pool without excluding it can make manifests/publishing output inconsistent (and can expose an implementation detail users didn’t declare). Consider marking this default node pool as excluded-from-manifest using the same mechanism used by the AddNodePool(...) path, or avoid adding it as a standalone resource if it’s only needed to attach KubernetesNodePoolAnnotation.

## Usage example

Then, in the _AppHost.cs_ file of `AppHost`, add an AKS environment and deploy services to it:

Copilot AI Apr 15, 2026

The usage example doesn’t show how to actually target AKS for deployment. As written, myService is not assigned to the AKS compute environment (it likely needs .WithComputeEnvironment(aks) or similar), so users may copy/paste a non-working example.

Comment thread .github/workflows/update-azure-vm-sizes.yml Outdated
mitchdenny and others added 8 commits April 15, 2026 22:10
These are infrastructure configuration concerns better handled via
ConfigureInfrastructure(...) customization. The internal properties
and Bicep generation logic remain for users who customize directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
End-to-end test that deploys the Aspire starter template to AKS using
the full aspire deploy pipeline (Bicep provisioning + container build +
ACR push + Helm deploy). Follows the ACA deployment test pattern.

Test flow:
1. Create starter project via aspire new
2. Add Aspire.Hosting.Azure.Kubernetes package
3. Modify AppHost to use AddAzureKubernetesEnvironment + WithComputeEnvironment
4. aspire deploy --clear-cache (provisions AKS + ACR + deploys)
5. Verify pods running via kubectl
6. Port-forward and verify HTTP endpoints
7. aspire destroy for cleanup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
T1.2: WithVersion - deploys with K8s 1.30, verifies via kubectl
T1.3: NodePool - custom pool with nodeSelector verification
T1.4: VNet - subnet integration with VNet IP verification
T1.5: WorkloadIdentity - Azure Storage ref with WI SA/pod labels
T1.6: ExplicitRegistry - bring-your-own ACR
T1.7: PerPoolSubnet - different subnets per node pool

All tests use aspire deploy --clear-cache (full provisioning pipeline)
and follow the same pattern as T1.1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TypeScript variants of the AKS Tier 1 tests using ExpressReact template:
- TypeScriptAksDeploymentTests: basic AKS deploy from TS AppHost
- TypeScriptAksNodePoolDeploymentTests: custom node pool from TS
- TypeScriptAksVnetDeploymentTests: VNet/subnet integration from TS

Uses addAzureKubernetesEnvironment(), addNodePool(), withSubnet() from
the auto-generated TypeScript SDK (via AspireExport attributes).
Follows TypeScriptExpressDeploymentTests pattern with bundle install.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same rationale as AsPrivateCluster and WithSkuTier: Kubernetes version
is an infrastructure configuration concern. The internal property and
Bicep generation remain for ConfigureInfrastructure customization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Workspace

These APIs set internal properties but the ConfigureAksInfrastructure
callback never emits the corresponding Bicep (addonProfiles.omsagent,
azureMonitorProfile, data collection rules). Shipping non-functional
APIs is misleading.

Follow-up issue #16150 will add these back when Bicep generation is
implemented. Internal properties remain for future use.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: string replacement (content.Replace(oldCode, newCode))
failed silently due to line ending mismatches between the template
output and our raw string literals. aspire deploy completed in 97ms
as a no-op because the AKS environment was never added.

Fix: Write the ENTIRE AppHost.cs content instead of patching it.
This is immune to line ending, whitespace, and template changes.
Each test now has a self-documenting raw string literal showing
exactly what AppHost code is being tested.

TypeScript tests: added guard checks to throw if the apphost.ts
replacement didn't change anything.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
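
The guard described in this commit can be sketched as follows. This is a minimal illustration rather than the test code from this PR: `replaceOrThrow` is a hypothetical helper that normalizes CRLF to LF on both sides before matching and throws when the expected snippet is absent, so a failed patch cannot silently turn into a no-op deploy.

```typescript
// Hypothetical helper, not from the PR: patch a source file and fail loudly
// if the snippet to replace is not found.
function replaceOrThrow(content: string, oldCode: string, newCode: string): string {
  // Normalize CRLF to LF so template output and string literals compare equal.
  const normalize = (text: string): string => text.replace(/\r\n/g, "\n");

  const source = normalize(content);
  const needle = normalize(oldCode);

  if (!source.includes(needle)) {
    throw new Error("Replacement target not found; the file was left unchanged.");
  }

  // A replacer function avoids special handling of "$" sequences in newCode.
  return source.replace(needle, () => normalize(newCode));
}

// The template uses CRLF while the search snippet uses LF; the patch still applies.
const templateSource = "const builder = createBuilder();\r\nawait builder.build().run();\r\n";
const patchedSource = replaceOrThrow(
  templateSource,
  "const builder = createBuilder();\n",
  "const builder = createBuilder();\n// AKS environment would be added here\n",
);
console.log(patchedSource);
```

The returned text uses LF line endings throughout, which is one more reason the commit's approach of writing the whole file is the simpler option when the full content is known.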
Default system pool: Standard_D4s_v5 (4 vCPUs) -> Standard_D2s_v5 (2 vCPUs)
Default workload pool: Standard_D4s_v5 (4 vCPUs) -> Standard_D2s_v5 (2 vCPUs)
Max workload pool: 10 -> 3 (reduces quota reservation)

Total minimum vCPU: 8 -> 4 (fits within CI subscription quota)

The deployment tests were failing with:
  ErrCode_InsufficientVCPUQuota: Insufficient vcpu quota requested 8,
  remaining 0 for family standardDSv5Family for region westus3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
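
The quota arithmetic in this commit can be checked with a short calculation. A minimal sketch, assuming each pool starts with a minimum of one node (which is what the quoted totals of 8 and 4 imply); the `NodePool` shape and `minimumVcpus` helper are hypothetical:

```typescript
// Hypothetical model of a node pool, just enough to reason about vCPU quota.
interface NodePool {
  name: string;
  vcpusPerNode: number;
  minCount: number; // assumption: 1 for both default pools
}

// Minimum vCPUs the cluster reserves: per-node vCPUs times minimum node count, summed over pools.
function minimumVcpus(pools: NodePool[]): number {
  return pools.reduce((total, pool) => total + pool.vcpusPerNode * pool.minCount, 0);
}

// Standard_D4s_v5 has 4 vCPUs per node; Standard_D2s_v5 has 2.
const poolsBefore: NodePool[] = [
  { name: "system", vcpusPerNode: 4, minCount: 1 },
  { name: "workload", vcpusPerNode: 4, minCount: 1 },
];
const poolsAfter: NodePool[] = [
  { name: "system", vcpusPerNode: 2, minCount: 1 },
  { name: "workload", vcpusPerNode: 2, minCount: 1 },
];

console.log(minimumVcpus(poolsBefore), minimumVcpus(poolsAfter)); // prints "8 4"
```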
@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

🚀 Deployment tests starting on PR #16088...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

<ItemGroup>
<InternalsVisibleTo Include="Aspire.Hosting.Azure.Tests" />
<InternalsVisibleTo Include="Aspire.Hosting.Azure.ContainerRegistry" />
<InternalsVisibleTo Include="Aspire.Hosting.Azure.Kubernetes" />
Member

We should try really hard not to do this. What is causing us to use IVT?

Member Author

Fixed — removed the IVT. Reverted to linking ProcessSpec/ProcessUtil/ProcessResult directly (same pattern as Aspire.Hosting.Azure itself uses).

</ItemGroup>

<ItemGroup>
<InternalsVisibleTo Include="Aspire.Hosting.Azure.Kubernetes" />
Member

We should try not to use IVT. The issue is if someone ever uses different versions, and we changed the internal method, they will get errors.

Member Author

This IVT is for the test project (Aspire.Hosting.Azure.Kubernetes.Tests) — same pattern as every other package's IVT to its test project. The cross-package IVT (Azure -> Azure.Kubernetes) has been removed per your other comment.

/// <c>WithComputeEnvironment(aksEnv)</c> but the inner <c>KubernetesEnvironmentResource</c>
/// needs to process the resource.
/// </remarks>
public IComputeEnvironmentResource? ParentComputeEnvironment { get; set; }
Member

What do you think of

Suggested change
public IComputeEnvironmentResource? ParentComputeEnvironment { get; set; }
public IComputeEnvironmentResource? OwningComputeEnvironment { get; set; }

? The description says "that owns this Kubernetes environment."

Member Author

Done — renamed to OwningComputeEnvironment. Checked Docker Compose for precedent — it doesn't have this pattern (flat architecture). This is unique to AKS's layered design where an AzureKubernetesEnvironmentResource wraps a KubernetesEnvironmentResource.

@eerhardt
Member

I tried using this on my machine, and I got:

Steps Summary:
                Step timeline:                               0s                      7m 49s
                                                             │───────┬──────┬─────┬───────│
      7m 49s  ✗ pipeline-execution                           │╶──────────────────────────╴│
     14.95ms  ✓ process-parameters                           │╴                           │
      6.48ms  ✓   deploy-prereq                              │╴                           │
       0.94s  ✓     validate-azure-login                     │╴                           │
      46.24s  ✓       create-provisioning-context            │╶──╴                        │
      14.77s  ✓         provision-aks-acr                    │   ╴                        │
      6m 32s  ✓           provision-aks                      │   ╶───────────────────────╴│
      0.69ms  ✓             provision-azure-bicep-resources  │                          ╴ │
      13.88s  ✓               push-webfrontend               │                           ╴│
      14.78s  ✓               push-apiservice                │                           ╴│
       5.14s  ✓               aks-get-credentials-aks        │                          ╴ │
      4.19ms  ✓                 prepare-aks-k8s              │                          ╴ │
     13.35ms  ✗                   helm-deploy-aks-k8s        │                           ╴│ — Helm deployment failed: 'helm' was not found.
Please install 'helm' and ensure it is available on your PATH to deploy to Kubernetes.
       4.22s  ✓           login-to-acr-aks-acr               │    ╴                       │
      0.77ms  ✓             push-prereq                      │    ╴                       │
      1.00ms  ✓   publish-prereq                             │╴                           │
     14.33ms  ✓     publish-azureb2fa7                       │╴                           │
      1.04ms  ✓       publish                                │╴                           │
      1.23ms  ✓   build-prereq                               │╴                           │
      28.49s  ✓     build-webfrontend                        │╶─╴                         │
      11.77s  ✓     build-apiservice                         │╴                           │
      0.56ms  ✓       build                                  │  ╴                         │
     79.74ms  ✓ publish-aks-k8s                              │╴                           │
       1.32s  ✓ fetch-tenant                                 │╴                           │
       1.15s  ✓ fetch-subscription                           │ ╴                          │
       1.79s  ✓ fetch-resource-groups                        │ ╴                          │
       2.71s  ✓ fetch-regions                                │  ╴                         │

❌ Pipeline failed
For more details, add --log-level debug/trace to the command.
------------------------------------------------------------

It took 7 minutes before I got an error saying I needed helm.

@eerhardt
Member

Following the deployment, I opened the dashboard and it has a warning at the top:

[Screenshot: dashboard warning banner]

mitchdenny and others added 2 commits April 16, 2026 11:30
1. Remove IVT from Aspire.Hosting.Azure -> Aspire.Hosting.Azure.Kubernetes.
   Revert to linking ProcessSpec/ProcessUtil/ProcessResult directly
   (same pattern as Aspire.Hosting.Azure itself).

2. Rename ParentComputeEnvironment -> OwningComputeEnvironment per Eric's
   suggestion. Better describes the ownership relationship.

3. Remove all 9 new AKS E2E deployment tests due to capacity issues in
   the deployment test subscription. The existing AksStarter* tests remain.
   Will re-add verification tests in a follow-up.

4. Add Helm CLI prerequisite check pipeline step. Fails fast with clear
   error message if helm is not on PATH, before any deployment steps run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
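
A fail-fast prerequisite check like the one in item 4 can be sketched as a PATH lookup that runs before any long provisioning step. This is an illustration under stated assumptions, not the pipeline step added in this PR: `findOnPath` and `assertPrerequisites` are hypothetical names, the file-existence check is injected so the logic stays testable, and Windows executable extensions are left to the caller.

```typescript
// Injected file-existence check, so the lookup can be exercised without touching the disk.
type Exists = (candidate: string) => boolean;

// Hypothetical helper: returns the first matching path for `command`, or undefined.
function findOnPath(
  command: string,
  pathVar: string,
  exists: Exists,
  delimiter: string = ":",
  extensions: string[] = [""], // on Windows a caller might pass [".exe", ".cmd"]
): string | undefined {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH entries
    for (const ext of extensions) {
      const candidate = `${dir}/${command}${ext}`;
      if (exists(candidate)) return candidate;
    }
  }
  return undefined;
}

// Throws before any expensive work starts if a required tool is missing.
function assertPrerequisites(commands: string[], pathVar: string, exists: Exists): void {
  const missing = commands.filter((c) => findOnPath(c, pathVar, exists) === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing required tools on PATH: ${missing.join(", ")}`);
  }
}

// Example with a fake file system where only kubectl is installed.
const installedTools = new Set(["/usr/local/bin/kubectl"]);
try {
  assertPrerequisites(["helm", "kubectl"], "/usr/bin:/usr/local/bin", (p) => installedTools.has(p));
} catch (error) {
  console.log((error as Error).message);
}
```

Running the check as the first step means a missing tool costs milliseconds instead of a full cluster provisioning cycle.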
vmSize defaults to Standard_D2s_v5, minCount to 1, maxCount to 3.
ARM/Bicep requires vmSize (no Azure default), so we provide a sensible
default. Users can now call just aks.AddNodePool("workload") for the
common case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

🚀 Deployment tests starting on PR #16088...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@mitchdenny
Member Author

> Following the deployment, I opened the dashboard and it has a warning at the top:

Parity with Docker Compose - although I think I'll leave auth on.

@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

🚀 Deployment tests starting on PR #16088...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@github-actions
Contributor

🎬 CLI E2E Test Recordings — 72 recordings uploaded (commit ef225ea)

View recordings
Test Recording
AddPackageInteractiveWhileAppHostRunningDetached ▶️ View Recording
AddPackageWhileAppHostRunningDetached ▶️ View Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_DefaultSelection_InstallsSkillOnly ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
AspireAddPackageVersionToDirectoryPackagesProps ▶️ View Recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
Banner_NotDisplayedWithNoLogoFlag ▶️ View Recording
CertificatesClean_RemovesCertificates ▶️ View Recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate ▶️ View Recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate ▶️ View Recording
ConfigSetGet_CreatesNestedJsonFormat ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunAspireStarterProjectWithBundle ▶️ View Recording
CreateAndRunEmptyAppHostProject ▶️ View Recording
CreateAndRunJavaEmptyAppHostProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateAndRunTypeScriptEmptyAppHostProject ▶️ View Recording
CreateAndRunTypeScriptStarterProject ▶️ View Recording
CreateJavaAppHostWithViteApp ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DashboardRunWithOtelTracesReturnsNoTraces ▶️ View Recording
DeployK8sBasicApiService ▶️ View Recording
DeployK8sWithGarnet ▶️ View Recording
DeployK8sWithMongoDB ▶️ View Recording
DeployK8sWithMySql ▶️ View Recording
DeployK8sWithPostgres ▶️ View Recording
DeployK8sWithRabbitMQ ▶️ View Recording
DeployK8sWithRedis ▶️ View Recording
DeployK8sWithSqlServer ▶️ View Recording
DeployK8sWithValkey ▶️ View Recording
DeployTypeScriptAppToKubernetes ▶️ View Recording
DescribeCommandResolvesReplicaNames ▶️ View Recording
DescribeCommandShowsRunningResources ▶️ View Recording
DetachFormatJsonProducesValidJson ▶️ View Recording
DetachFormatJsonProducesValidJsonWhenRestartingExistingInstance ▶️ View Recording
DoListStepsShowsPipelineSteps ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
GlobalMigration_HandlesCommentsAndTrailingCommas ▶️ View Recording
GlobalMigration_HandlesMalformedLegacyJson ▶️ View Recording
GlobalMigration_PreservesAllValueTypes ▶️ View Recording
GlobalMigration_SkipsWhenNewConfigExists ▶️ View Recording
GlobalSettings_MigratedFromLegacyFormat ▶️ View Recording
InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot ▶️ View Recording
InvalidAppHostPathWithComments_IsHealedOnRun ▶️ View Recording
LegacySettingsMigration_AdjustsRelativeAppHostPath ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
OtelLogsReturnsStructuredLogsFromStarterApp ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
PsFormatJsonOutputsOnlyJsonToStdout ▶️ View Recording
PublishWithConfigureEnvFileUpdatesEnvOutput ▶️ View Recording
PublishWithDockerComposeServiceCallbackSucceeds ▶️ View Recording
PublishWithoutOutputPathUsesAppHostDirectoryDefault ▶️ View Recording
RestoreGeneratesSdkFiles ▶️ View Recording
RestoreRefreshesGeneratedSdkAfterAddingIntegration ▶️ View Recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes ▶️ View Recording
RunFromParentDirectory_UsesExistingConfigNearAppHost ▶️ View Recording
SecretCrudOnDotNetAppHost ▶️ View Recording
SecretCrudOnTypeScriptAppHost ▶️ View Recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels ▶️ View Recording
StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets ▶️ View Recording
StopAllAppHostsFromAppHostDirectory ▶️ View Recording
StopAllAppHostsFromUnrelatedDirectory ▶️ View Recording
StopNonInteractiveMultipleAppHostsShowsError ▶️ View Recording
StopNonInteractiveSingleAppHost ▶️ View Recording
StopWithNoRunningAppHostExitsSuccessfully ▶️ View Recording
UnAwaitedChainsCompileWithAutoResolvePromises ▶️ View Recording

📹 Recordings uploaded automatically from CI run #24493257211

@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

🚀 Deployment tests starting on PR #16088...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

🚀 Deployment tests starting on PR #16088...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

Stop setting DASHBOARD__FRONTEND__AUTHMODE and DASHBOARD__OTLP__AUTHMODE
to 'Unsecured' on the Aspire dashboard deployed to Kubernetes. This
matches the Docker Compose behavior where the dashboard uses its default
auth mode (BrowserToken).

Update snapshot tests for environment resource tests. Publisher test
snapshots need regeneration (dashboard ConfigMap removed, file numbering
shifts).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

Deployment E2E Tests failed — 26 passed, 4 failed, 0 cancelled

View test results and recordings

View workflow run

| Test | Result | Recording |
| --- | --- | --- |
| Deployment.EndToEnd-AcaCompactNamingDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-VnetSqlServerConnectivityDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-TypeScriptVnetSqlServerInfraDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-VnetKeyVaultConnectivityDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-VnetKeyVaultInfraDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureKeyVaultDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AksStarterDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcaCustomRegistryDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcaManagedRedisDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureLogAnalyticsDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AksStarterWithRedisDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcaDeploymentErrorOutputTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcaStarterDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureContainerRegistryDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-NspStorageKeyVaultDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcaExistingRegistryDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-TypeScriptExpressDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureAppConfigDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureServiceBusDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AuthenticationTests | ✅ Passed | |
| Deployment.EndToEnd-VnetSqlServerInfraDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-VnetStorageBlobConnectivityDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureEventHubsDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AppServiceReactDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AzureStorageDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-VnetStorageBlobInfraDeploymentTests | ✅ Passed | ▶️ View Recording |
| Deployment.EndToEnd-AcrPurgeTaskDeploymentTests | ❌ Failed | ▶️ View Recording |
| Deployment.EndToEnd-PythonFastApiDeploymentTests | ❌ Failed | ▶️ View Recording |
| Deployment.EndToEnd-AcaCompactNamingUpgradeDeploymentTests | ❌ Failed | ▶️ View Recording |
| Deployment.EndToEnd-AppServicePythonDeploymentTests | ❌ Failed | ▶️ View Recording |

Member

@JamesNK left a comment

Review of AKS hosting support — 9 issues found:

  • 1 stale fix (AllowUnsafeBlocks still present despite prior review)
  • 2 public API concerns (KubeConfigPath, OwningComputeEnvironment should be internal)
  • 1 missing public API (WithVersion/WithSkuTier/AsPrivateCluster extensions not exposed despite spec claiming implemented)
  • 1 security concern (command injection risk in az CLI argument interpolation)
  • 1 bug (FindNodePoolResource fallback missing ManifestPublishingCallbackAnnotation.Ignore)
  • 1 risk (beta dependency on Azure.Provisioning.ContainerService)
  • 1 config concern (ServiceCidr default overlaps with spec examples)
  • 1 docs issue (README example missing WithComputeEnvironment)

<PropertyGroup>
<TargetFramework>$(DefaultTargetFramework)</TargetFramework>
<IsPackable>true</IsPackable>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
Member

Still present: AllowUnsafeBlocks and linked Process files.

A previous review thread (by @JamesNK) flagged AllowUnsafeBlocks and linked ProcessSpec.cs/ProcessUtil.cs/ProcessResult.cs as unnecessary. @mitchdenny replied "Fixed — removed AllowUnsafeBlocks and the linked Process files. Now uses IProcessRunner from Aspire.Hosting.Azure via IVT." However, the current code still has both AllowUnsafeBlocks and the Compile Include links to the Process files. The fix either wasn't pushed or was reverted.

Comment thread docs/specs/aks-support.md

#### Private DNS Zone auto-linking
- 🔲 When backing services have private endpoints in same VNet as AKS, Private DNS Zones should be auto-linked

Member

Spec says WithVersion(), WithSkuTier(), AsPrivateCluster() are "✅ Implemented" but no public extension methods exist.

The spec lists these in the "✅ Implemented" section and the resource has internal properties (KubernetesVersion, SkuTier, IsPrivateCluster), but there are no public extension methods for users to set them. The ConfigureAksInfrastructure callback reads these properties, so the Bicep generation supports them — but they're only settable via internals/tests. Either the extension methods need to be added or the spec's "Implemented" status should be updated.

azPath,
$"aks get-credentials --resource-group \"{resourceGroup}\" --name \"{clusterName}\" --file -",
context.Logger).ConfigureAwait(false);

Member

Command injection risk: clusterName and resourceGroup are interpolated into shell arguments without escaping.

clusterName comes from a Bicep output (NameOutputReference.GetValueAsync) and resourceGroup from IConfiguration or an az resource list response. A value containing " (double quote) could break argument parsing. The values are quote-wrapped but quotes within the values are not escaped.

Consider validating that the values match expected patterns (alphanumeric, hyphens, underscores) before interpolation, or use ProcessSpec with structured argument handling.
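A minimal sketch of both mitigations suggested above, combined: an allow-list check followed by structured argument passing. This is not the PR's code. `IsSafeAzureName` and `BuildGetCredentialsStartInfo` are hypothetical names, the 90-character limit is an assumption, and the allow-list (letters, digits, hyphens, underscores) is deliberately stricter than what Azure actually permits in resource group names. The only framework feature relied on is `ProcessStartInfo.ArgumentList`, which passes each entry as its own argument so no manual quoting is involved.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical allow-list taken from the review suggestion: alphanumerics, hyphens, underscores.
// \A and \z anchor the whole string, so a trailing newline is rejected as well.
static bool IsSafeAzureName(string value) =>
    Regex.IsMatch(value, @"\A[A-Za-z0-9_-]{1,90}\z");

// Builds the start info for `az aks get-credentials` without composing a single argument string.
static ProcessStartInfo BuildGetCredentialsStartInfo(string azPath, string resourceGroup, string clusterName)
{
    if (!IsSafeAzureName(resourceGroup))
    {
        throw new ArgumentException("Unexpected characters in resource group name.", nameof(resourceGroup));
    }

    if (!IsSafeAzureName(clusterName))
    {
        throw new ArgumentException("Unexpected characters in cluster name.", nameof(clusterName));
    }

    var startInfo = new ProcessStartInfo(azPath)
    {
        UseShellExecute = false,
        RedirectStandardOutput = true,
    };

    // Each entry becomes one argument; the runtime handles any escaping that is needed.
    foreach (var arg in new[] { "aks", "get-credentials", "--resource-group", resourceGroup, "--name", clusterName, "--file", "-" })
    {
        startInfo.ArgumentList.Add(arg);
    }

    return startInfo;
}

// Usage: the values here are placeholders, not names from the PR.
var psi = BuildGetCredentialsStartInfo("az", "my-rg", "my-aks");
Console.WriteLine(string.Join(" ", psi.ArgumentList));
```

Either measure alone would address the quoting problem; doing both means a malformed Bicep output fails fast with a clear error instead of reaching the CLI.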

/// Gets or sets the service CIDR.
/// </summary>
public string ServiceCidr { get; set; } = "10.0.4.0/22";

Member

Hardcoded ServiceCidr default (10.0.4.0/22) overlaps with the spec's example subnets.

This CIDR overlaps with the spec's example GPU subnet (10.0.4.0/24). While AksNetworkProfile is currently internal and only used when explicitly set, if the network profile is ever auto-applied alongside subnet configuration, this default will silently conflict with user-configured VNet address spaces. Consider using a non-overlapping default (e.g., 172.16.0.0/16) or documenting the constraint prominently.
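To make the overlap concrete, here is a small sketch of an IPv4 CIDR overlap check. `ParseIpv4Cidr` and `CidrsOverlap` are hypothetical helpers written for this comment, not part of the PR or of Aspire; the CIDR values are the ones quoted above.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Parses "a.b.c.d/n" into the inclusive [First, Last] address range as unsigned integers.
static (uint First, uint Last) ParseIpv4Cidr(string cidr)
{
    var parts = cidr.Split('/');
    if (parts.Length != 2)
    {
        throw new FormatException($"'{cidr}' is not in address/prefix form.");
    }

    var address = IPAddress.Parse(parts[0]);
    if (address.AddressFamily != AddressFamily.InterNetwork)
    {
        throw new FormatException($"'{cidr}' is not an IPv4 CIDR.");
    }

    var prefix = int.Parse(parts[1]);
    if (prefix < 0 || prefix > 32)
    {
        throw new FormatException($"'{cidr}' has an invalid prefix length.");
    }

    // GetAddressBytes returns the address in network (big-endian) order.
    var bytes = address.GetAddressBytes();
    uint value = ((uint)bytes[0] << 24) | ((uint)bytes[1] << 16) | ((uint)bytes[2] << 8) | (uint)bytes[3];

    // A shift by 32 would wrap to a shift by 0 in C#, so /0 is handled explicitly.
    uint mask = prefix == 0 ? 0u : uint.MaxValue << (32 - prefix);
    uint first = value & mask;
    uint last = first | ~mask;
    return (first, last);
}

// Two ranges overlap when each one starts at or before the other one ends.
static bool CidrsOverlap(string a, string b)
{
    var (aFirst, aLast) = ParseIpv4Cidr(a);
    var (bFirst, bLast) = ParseIpv4Cidr(b);
    return aFirst <= bLast && bFirst <= aLast;
}

Console.WriteLine(CidrsOverlap("10.0.4.0/22", "10.0.4.0/24"));   // True: the /22 spans 10.0.4.0 to 10.0.7.255
Console.WriteLine(CidrsOverlap("172.16.0.0/16", "10.0.4.0/24")); // False
```

A check along these lines could also run at configuration time to reject a service CIDR that collides with a configured subnet, though whether that belongs in this PR is a separate question.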

{
return existing;
}

Member

FindNodePoolResource fallback creates AksNodePoolResource without ManifestPublishingCallbackAnnotation.Ignore.

The EnsureDefaultUserNodePool method correctly adds ManifestPublishingCallbackAnnotation.Ignore to the default pool (line 171). But the FindNodePoolResource fallback path (line 195) creates a new AksNodePoolResource and adds it to appModel.Resources without ManifestPublishingCallbackAnnotation.Ignore. This is inconsistent with both the default pool path and the AddNodePool() extension method (which calls .ExcludeFromManifest()). Pools created through this fallback would leak into publishing output.

/// </summary>
/// <remarks>
/// This is used by Azure Kubernetes Service (AKS) integration to isolate credentials
/// fetched via <c>az aks get-credentials</c> from the user's default kubectl context.
Member

KubeConfigPath is public but appears to be an implementation detail that should be internal.

This property is only set by AzureKubernetesInfrastructure.GetAksCredentialsAsync and consumed by HelmDeploymentEngine. It exposes the path to a temp file containing cluster credentials. Making it part of the public API surface commits to maintaining this property and invites misuse. Consider making this internal.

/// </summary>
/// <remarks>
/// This is used by Azure Kubernetes Service (AKS) integration where the user calls
/// <c>WithComputeEnvironment(aksEnv)</c> but the inner <c>KubernetesEnvironmentResource</c>
Member

OwningComputeEnvironment is public with a public setter but only used internally by the AKS integration.

This is only set in AddAzureKubernetesEnvironment() and consumed within KubernetesInfrastructure, KubernetesPublishingContext, and KubernetesEnvironmentResource itself. Making this public commits to a new public API surface for a single internal use case. Consider making both the getter and setter internal.

Comment thread Directory.Packages.props
<PackageVersion Include="Azure.Provisioning.AppService" Version="1.3.1" />
<PackageVersion Include="Azure.Provisioning.ApplicationInsights" Version="1.1.0" />
<PackageVersion Include="Azure.Provisioning.ContainerRegistry" Version="1.1.0" />
<PackageVersion Include="Azure.Provisioning.ContainerService" Version="1.0.0-beta.3" />
Member

Pre-release dependency: Azure.Provisioning.ContainerService at 1.0.0-beta.3 will ship to customers.

The ConfigureAksInfrastructure method now uses Azure.Provisioning SDK types directly (ContainerServiceManagedCluster, ManagedClusterAgentPoolProfile, etc.). This means customers who install Aspire.Hosting.Azure.Kubernetes will transitively depend on a beta package. Beta packages can have breaking changes between releases. Is there a plan to get a stable release of this package before shipping, or is the risk accepted for this preview?

var aks = builder.AddAzureKubernetesEnvironment("aks");

var myService = builder.AddProject<Projects.MyService>()
.WithComputeEnvironment(aks);
Member

README usage example is missing .WithComputeEnvironment(aks).

Without .WithComputeEnvironment(aks), myService won't be deployed to the AKS cluster. Users copying this example will get a non-working setup.

Suggested change
.WithComputeEnvironment(aks);
var aks = builder.AddAzureKubernetesEnvironment("aks");
var myService = builder.AddProject<Projects.MyService>()
.WithComputeEnvironment(aks);
