
Surface silent integration-assembly load failures (#16729) #16733

Draft
IEvangelist wants to merge 1 commit into microsoft:main from IEvangelist:dapine/fix-resolver-silent-loader-failures-16729


@IEvangelist
Member

IEvangelist commented May 4, 2026

Description

This PR hardens the diagnostic chain so that when the integration-assembly load path does fail — for any reason: build-pipeline version skew, stale local NuGet cache, transitive dependency mismatch, partially populated probe directory, etc. — the failure is loud and self-explanatory instead of surfacing only as:

No code generator found for language: TypeScript
No language support found for: typescript/nodejs

The original report (#16729) was a cryptic CLI failure on aspire new with TypeScript templates. After investigation, that specific symptom traced to a stale local NuGet cache anomaly on the reporter's machine (a mix of package contents from switching between PR / staging / stable 13.3.0 builds) and is not reproducible after clearing ~/.nuget and ~/.aspire. So this PR no longer attempts a runtime binding fix: it is diagnostic hardening only, which is the part that has lasting value regardless of which upstream condition produced the resolver miss.

Changes

  • CodeGeneratorResolver / LanguageSupportResolver now log ReflectionTypeLoadException at Warning level and include the LoaderExceptions text in the message (previously it was logged with LogDebug, which is below the file logger's default Information threshold, so it never reached the user log).
  • When an assembly named like Aspire.Hosting.CodeGeneration.* is loaded but contributes zero ICodeGenerator / ILanguageSupport types, log a Warning so the silent-failure case is visible.
  • AssemblyLoader.LoadAssemblies performs an Aspire.TypeSystem version sanity check at startup against the libs directory and warns when the bundled and probed versions diverge — surfacing this lets us catch build-pipeline regressions early even if the runtime tolerates the skew.
  • LanguageService / CodeGenerationService error messages now list the available languages or, when zero are discovered, point at the apphost-server log and a possible binary mismatch.
  • PrebuiltAppHostServer promotes spawned-process stdout/stderr capture from Trace to Debug/Information, so warnings emitted by the apphost server actually reach the default file log.
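The first two changes above (surfacing LoaderExceptions and warning on zero-contributor assemblies) can be illustrated with a minimal sketch. It is not the PR's resolver code: the helper names DescribeTypeLoadFailure and FindContributors are hypothetical, and the `warn` callback stands in for a Warning-level ILogger call. Only the Aspire.Hosting.CodeGeneration.* naming convention comes from the description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper (not the PR's code): formats a ReflectionTypeLoadException so the
// individual LoaderExceptions appear in a Warning-level message instead of being hidden.
static string DescribeTypeLoadFailure(string assemblyName, ReflectionTypeLoadException ex)
{
    // LoaderExceptions can contain null entries; keep distinct, non-null messages only.
    var details = ex.LoaderExceptions
        .Where(e => e is not null)
        .Select(e => e!.Message)
        .Distinct();
    return $"Failed to load types from '{assemblyName}': {string.Join("; ", details)}";
}

// Hypothetical helper: collects concrete types assignable to contributorInterface
// (an ICodeGenerator-like interface) and reports problems through `warn` instead of swallowing them.
static List<Type> FindContributors(Assembly assembly, Type contributorInterface, Action<string> warn)
{
    string name = assembly.GetName().Name ?? "<unknown>";
    Type?[] types;
    try
    {
        types = assembly.GetTypes();
    }
    catch (ReflectionTypeLoadException ex)
    {
        warn(DescribeTypeLoadFailure(name, ex));
        types = ex.Types; // types that failed to load appear as null entries
    }

    List<Type> contributors = types
        .Where(t => t is not null && !t.IsAbstract && contributorInterface.IsAssignableFrom(t))
        .Select(t => t!)
        .ToList();

    // A code-generation assembly that loads but contributes nothing is the "silent failure" case.
    if (contributors.Count == 0 &&
        name.StartsWith("Aspire.Hosting.CodeGeneration.", StringComparison.Ordinal))
    {
        warn($"Assembly '{name}' was loaded but contributed no {contributorInterface.Name} types.");
    }

    return contributors;
}
```

The partially loaded types from `ex.Types` are still used, so one broken type does not hide the rest of the assembly. The difference from the old behavior is only that the failure is reported.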

Internal-only API additions

  • CodeGeneratorResolver.GetSupportedLanguages() so the service can list available languages in error messages.
  • An internal test-only constructor on both resolvers that takes a Func<IReadOnlyList<Assembly>>, used by the new tests.
  • IntegrationLoadContext.GetSharedAssemblyNames() so the assembly loader's version sanity check stays in sync with the actual load context policy.
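A minimal sketch of the startup version sanity check, assuming the shared names arrive as a plain list (as GetSharedAssemblyNames() would provide) and that each one maps to `<name>.dll` in the libs directory. CheckSharedAssemblyVersions and the `getHostVersion` callback are hypothetical names, not the PR's AssemblyLoader code. AssemblyName.GetAssemblyName reads the identity from the file's metadata without loading the assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical sketch (not the PR's AssemblyLoader code): for each shared assembly name,
// compare the version the host already uses with the copy found in the libs directory.
static List<string> CheckSharedAssemblyVersions(
    IEnumerable<string> sharedNames,
    string libsDirectory,
    Func<string, Version?> getHostVersion)
{
    var warnings = new List<string>();
    foreach (string name in sharedNames)
    {
        string probedPath = Path.Combine(libsDirectory, name + ".dll");
        if (!File.Exists(probedPath))
        {
            continue; // no probed copy, nothing to compare
        }

        // Reads the assembly identity from the file's metadata without loading the assembly.
        Version? probedVersion = AssemblyName.GetAssemblyName(probedPath).Version;
        Version? hostVersion = getHostVersion(name);
        if (hostVersion is not null && probedVersion is not null && hostVersion != probedVersion)
        {
            warnings.Add($"{name}: host uses {hostVersion}, libs directory has {probedVersion}.");
        }
    }

    return warnings;
}
```

Driving the check from the load context's own list of shared names, rather than a hard-coded "Aspire.TypeSystem" string, is what keeps the warning aligned with the actual sharing policy.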

Tests

New coverage in tests/Aspire.Hosting.RemoteHost.Tests/:

  • ResolverDiagnosticsTests (4) — proves the resolvers log Warnings on zero-contributor assemblies and on ReflectionTypeLoadException.
  • ServiceErrorMessageTests (4) — proves the user-visible error messages list available languages or point at the binary-mismatch scenario.

All Aspire.Hosting.RemoteHost.Tests pass locally.

Notes

  • Re-targeted from release/13.3 to main. The branch was reset to upstream/main and the diagnostic commit was cherry-picked clean.
  • An earlier revision of this PR also contained an IntegrationLoadContext.Load change that resolved Aspire.TypeSystem by simple name from the default ALC, intended to tolerate a 42.42.42.42 vs. 13.3.0.0 version skew. That change has been dropped along with its regression test, since the originally-reported failure turned out to be a local cache anomaly rather than a build-pipeline skew, and shipping a binding-policy change without a confirmed root cause to defend against would be premature. The hardening here keeps that scenario visible if it ever does occur.

Closes #16729

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes — ResolverDiagnosticsTests (4) and ServiceErrorMessageTests (4) covering the new diagnostic paths.
  • Did you add public API?
    • No — CodeGeneratorResolver.GetSupportedLanguages(), the additional resolver constructors, and IntegrationLoadContext.GetSharedAssemblyNames() are all internal; the resolvers themselves are internal sealed.
  • Does the change make any security assumptions or guarantees?
    • No
  • Does the change require an update in our Aspire docs?
    • No

@github-actions
Contributor

github-actions Bot commented May 4, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

```bash
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16733
```

Or

  • Run remotely in PowerShell:

```powershell
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16733"
```

```csharp
// (Excerpt from the reviewed diff hunk; truncated in the review view.)
// across the ALC boundary.
try
{
    return Default.LoadFromAssemblyName(new AssemblyName(SharedAssemblyName));
```
Member

I don't think this is correct. The TypeSystem assembly needs to be loaded into the default ALC. See the comment that was deleted.

Member Author

IEvangelist commented May 4, 2026

Thanks Eric — want to make sure I understand which invariant you're flagging, because I think we may actually agree and the comment in the diff is what's misleading.

The "TypeSystem must live in Default ALC" invariant is satisfied here. AssemblyLoadContext.Default.LoadFromAssemblyName(...) is an instance method on the Default ALC, so the assembly returned is owned by Default — it is not loaded into IntegrationLoadContext. The new regression test asserts this directly:

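```csharp
// Excerpt from the regression test; alc, mismatched, and hostTypeSystem are set up earlier in the test.
var resolved = alc.LoadFromAssemblyName(mismatched);
Assert.Same(hostTypeSystem, resolved);
Assert.Same(AssemblyLoadContext.Default, AssemblyLoadContext.GetLoadContext(resolved));
```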

The test is AspireTypeSystem_IsSharedFromDefaultContext_EvenWhenRequestedVersionDoesNotMatch in tests/Aspire.Hosting.RemoteHost.Tests/IntegrationLoadContextTests.cs. Same Assembly instance, same Default ALC ownership, so type identity for ICodeGenerator / ILanguageSupport / AtsContext is preserved across the ALC boundary just like before.

The only behavioral difference between this and the old return null path is the version constraint the runtime applies when it asks Default to resolve:

| Path | What happens for an integration that references Aspire.TypeSystem, Version=42.42.42.42 while the bundled copy is 13.3.0.0 |
| --- | --- |
| Old (return null) | CLR re-runs the bind through Default using the requested AssemblyName (carrying Version=42.42.42.42). Default's binder enforces strict version match → FileLoadException → ReflectionTypeLoadException from GetTypes() → silently swallowed by the resolver. This is exactly #16729. |
| New (explicit Default.LoadFromAssemblyName(new AssemblyName("Aspire.TypeSystem"))) | Default resolves by simple name only and returns its own 13.3.0.0 copy. Same Default ALC, same type identity, just no strict version gate. Canonical "host-shared assembly" pattern for plugin loaders. |

So the change is not "load TypeSystem somewhere other than Default" — it's "stop letting Default's strict version match veto a bind we know is safe by simple name."

Could you confirm which of the following is the actual concern, so I can address it correctly?

  1. You read it as loading into IntegrationALC — in that case the test above should resolve it, and I'll rewrite the inline comment to make that explicit (frame it like the original deleted comment did).
  2. You want strict version match preserved as an invariant and prefer fixing the upstream build that ships Aspire.Hosting.CodeGeneration.* libs stamped against 42.42.42.42 while the bundled Aspire.TypeSystem.dll is 13.3.0.0. If that's the position, I'm happy to drop this commit, keep only the diagnostic hardening (warnings, better error messages, WarnIfSharedAssemblyMismatch, log-level promotions), and file a separate build-infra issue. That would still convert #16729 from a silent failure into a loud, self-explanatory one without the loader compensating for build drift.
  3. Something else entirely — happy to be told.

Either way I'll wait for your call before pushing more.
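The simple-name resolution debated in this thread (later dropped from the PR, per the status update below) can be sketched as follows. This is an illustration only: LoadSharedFromDefault is a hypothetical helper, and System.Collections stands in for Aspire.TypeSystem because it is always present in the Default ALC. The sketch shows that rebuilding the AssemblyName from the simple name drops the requested version, so Default returns its own copy.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper illustrating the "host-shared assembly" idea from the thread:
// keep only the simple name, so the Default ALC resolves without a version constraint.
static Assembly LoadSharedFromDefault(AssemblyName requested)
{
    // new AssemblyName("Name") carries no Version, Culture, or public key token.
    var simpleNameOnly = new AssemblyName(requested.Name!);
    return AssemblyLoadContext.Default.LoadFromAssemblyName(simpleNameOnly);
}
```

In a plugin loader, such a call would sit inside an AssemblyLoadContext.Load override and be used only for names in the shared set, returning null for everything else. The returned assembly is owned by Default, which is the type-identity property the regression test excerpt asserts.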

@joperezr
Member

joperezr commented May 4, 2026

I'm still not fully following how someone reaches this in real life. What is the main scenario here? What realistic app would be broken by this?

IEvangelist changed the base branch from release/13.3 to main May 4, 2026 20:30

When `aspire new` is run with a TypeScript template against a CLI bundle whose
`Aspire.TypeSystem` version doesn't match the integration assemblies on disk,
the `ReflectionTypeLoadException` thrown while the resolver enumerates types is
swallowed at LogDebug level and the user sees only:

  No code generator found for language: TypeScript
  No language support found for: typescript/nodejs

This makes the binary mismatch invisible. This change improves the diagnostic
chain at every layer without altering the success path:

* `CodeGeneratorResolver` / `LanguageSupportResolver` now log
  `ReflectionTypeLoadException` at Warning level and include
  `LoaderExceptions` text in the message.
* When an assembly named `Aspire.Hosting.CodeGeneration.*` produces zero
  contributors, log a Warning so the silent-failure case is visible.
* `AssemblyLoader.LoadAssemblies` performs an `Aspire.TypeSystem` version
  sanity check at startup against the libs directory and warns on mismatch.
* `LanguageService` / `CodeGenerationService` error messages now list the
  available languages or, when zero are discovered, point at the apphost
  server log and a possible binary mismatch.
* `PrebuiltAppHostServer` promotes apphost-server stdout/stderr capture from
  Trace to Debug/Information, so warnings emitted by the apphost server reach
  the default file log.

Adds `CodeGeneratorResolver.GetSupportedLanguages()` and an internal
test-only constructor on both resolvers that takes a synthetic assembly list,
along with covering tests.

Closes microsoft#16729

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
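The improved error messages described in the commit message can be sketched as a small formatting function. The function name BuildNoGeneratorMessage and the exact wording are hypothetical, not the strings the PR emits. The input would be something like the result of CodeGeneratorResolver.GetSupportedLanguages().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the improved "no code generator" error: list what was discovered or,
// when nothing was discovered, point at the apphost server log and a possible binary mismatch.
static string BuildNoGeneratorMessage(string requestedLanguage, IReadOnlyCollection<string> supportedLanguages)
{
    if (supportedLanguages.Count > 0)
    {
        string available = string.Join(", ", supportedLanguages.OrderBy(l => l, StringComparer.OrdinalIgnoreCase));
        return $"No code generator found for language: {requestedLanguage}. Available languages: {available}.";
    }

    return $"No code generator found for language: {requestedLanguage}. No code generators were discovered; " +
           "check the apphost server log for assembly load warnings, which can indicate a binary version mismatch.";
}
```

Distinguishing "some languages exist, just not this one" from "nothing was discovered at all" is the key design point: the second case points at a loading problem rather than a typo in the language name.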
@IEvangelist
Member Author

IEvangelist commented May 4, 2026

Status update — rebased onto main and scope reduced to diagnostic hardening only.

The originally-reported 13.3.0 failure is no longer reproducible after clearing ~/.nuget and ~/.aspire on my machine. The investigation we did in this thread (the 42.42.42.42 vs. 13.3.0.0 Aspire.TypeSystem skew, the AssemblyRef/AssemblyDef metadata dumps, etc.) was real for the cached state on disk, but the underlying cause was a stale local NuGet cache anomaly (likely from switching between PR / staging / stable 13.3.0 builds) rather than a bug in any shipped build.

So I've:

  1. Re-targeted from release/13.3 to main.
  2. Dropped the IntegrationLoadContext.Load change (the simple-name resolution for Aspire.TypeSystem) and its regression test. Without a confirmed shipped-build failure to defend against, that's a runtime binding-policy change without a justified payload.
  3. Kept the diagnostic hardening: Warning-level logging on ReflectionTypeLoadException, zero-contributor warnings, the Aspire.TypeSystem version sanity check at startup, error messages that list available languages, and promotion from Trace to Debug/Information for apphost-server stdout/stderr.

Net effect: the branch is now a single commit on top of main (ac298d4), 10 files, +530/-9, no behavior change on the success path. PR description updated; #16729 retitled and re-scoped to track the hardening rather than the original 13.3 cache report.

Thanks @eerhardt for pushing on the binding question: the back-and-forth is what made the "this isn't a real shipped-build bug" conclusion clear.


Development

Successfully merging this pull request may close these issues.

Harden diagnostics when integration-assembly resolver discovers zero contributors

3 participants