Skip to content

Fix MST anonymous auth + plugin ALC type identity, add AAS perf parity#201

Merged
JeromySt merged 8 commits intomainfrom
fix/mst-unauth-and-aas-perf
May 1, 2026
Merged

Fix MST anonymous auth + plugin ALC type identity, add AAS perf parity#201
JeromySt merged 8 commits intomainfrom
fix/mst-unauth-and-aas-perf

Conversation

@JeromySt
Copy link
Copy Markdown
Member

@JeromySt JeromySt commented May 1, 2026

Summary

Fixes three coupled bugs surfaced when running mst_register against an unauthenticated MST test ledger after AAS signing, then takes the architectural cleanup all the way down rather than working around the original ALC type-identity smell with another shared-prefix.

  1. MissingMethodException from cross-ALC type identity mismatchCoseSign1.Transparent.MST.dll was loaded in the host AssemblyLoadContext (shared CoseSign1 prefix) but Azure.Security.CodeTransparency was plugin-local, so CodeTransparencyClientOptions had two distinct identities and ConfigureMstPerformanceOptimizations would not bind at runtime.
  2. Implicit DefaultAzureCredential fallback broke against unauthenticated endpoints — silently fell back to DefaultAzureCredential when MST_TOKEN was unset, hanging on developer credentials or sending bogus tokens to test ledgers.
  3. AAS performance parity with the MST plugin — interactive signing latency was unbounded by the SDK''s exponential back-off (800 ms × 3, ~5 s ceiling).

Changes

1. Plugin AssemblyLoadContext — eliminate prefix-collision smell and host load-anchor coupling

Note Section 1 was originally written for an interim fix that added Azure SDK prefixes to the shared list. The branch evolved to a structurally cleaner design in commits d6afc6a9daff95fac4c58e. The text below describes the FINAL implementation that ships in this PR. Updated in response to PR feedback.

The previous prefix-matching shared list ("CoseSign1", "CoseHandler", "CoseIndirectSignature", plus "Azure.*" workarounds) had two real problems:

  • Prefix collision: a hypothetical 3rd-party CoseSign1Plus.dll would have been silently absorbed into the host context.
  • Load-anchor coupling: CoseSignTool.csproj carried ProjectReferences on CoseSign1.Transparent.MST and CoseSign1.Certificates.AzureArtifactSigning purely so the prefix match could find them. Verified zero using statements or call-sites from host code into either library — they existed in bin\ only as load anchors.

The fix replaces prefix matching with an explicit, exact-name allow-list and lets the offending libraries become fully plugin-local:

Concept Before After
Shared decision Prefix StartsWith over string[] Exact-name HashSet<string> + small framework prefix list
Repo-curated shared assemblies CoseSignTool.Abstractions, CoseHandler, CoseSign1, CoseIndirectSignature (all loose prefixes) CoseSignTool.Abstractions, CoseSign1.Abstractions, CoseSign1.Headers (exact names)
CoseSign1.Transparent.MST Host-shared (host ProjectReference was the load anchor) Plugin-local in MST plugin ALC
CoseSign1.Certificates.AzureArtifactSigning Host-shared (host ProjectReference was the load anchor) Plugin-local in AAS plugin ALC
Azure.Core / Azure.Security.CodeTransparency / Azure.CodeSigning / Azure.Developer.ArtifactSigning Force-shared via prefix workaround Plugin-local — they co-locate with their callers in the plugin ALC, so the type-identity bug is structurally impossible
Concrete is CoseSign1.Certificates.CertificateCoseSigningKeyProvider downcast in SignCommand Required CoseSign1.Certificates to be host-shared Replaced with is ISupportsScittCompliance (new shared interface in CoseSign1.Abstractions); concrete type can be plugin-local
CertificateProviderPluginManager.LoadPluginFromAssembly Assembly.LoadFrom into the default ALC (no isolation) Uses PluginLoadContext — AAS plugin gets the same isolation as MST and IndirectSignature
Plugin-deploy MSBuild target Stripped all System.* files (broke System.ClientModel etc.) Only true BCL names excluded (mscorlib, netstandard, Microsoft.CSharp, …)
Load() exception handler Broad catch (Exception) Filtered catch (Exception ex) when (ex is not OutOfMemoryException and not StackOverflowException and not ThreadAbortException and not AccessViolationException) — accepts every assembly-load failure mode (incl. surprise InvalidOperationException on macOS arm64) while letting truly fatal process-killers propagate; CodeQL cs/catch-of-all-exceptions does not flag filtered catches

Net result of this section: zero Azure SDK assemblies in the host bin, full plugin isolation, no host process changes required to support new plugin-private deps, no prefix-collision risk for 3rd-party plugins.

2. MST auth contract (BREAKING)

CodeTransparencyClientHelper.CreateClientAsync precedence is now:

explicit token env var (--token-env / MST_TOKEN) > --azure-auth > anonymous

  • Default = anonymous. No Authorization header is sent. Appropriate for unauthenticated MST instances (test ledgers).
  • --azure-auth is a new boolean flag that opts into DefaultAzureCredential.
  • Explicit --token-env <NAME> with missing/whitespace value now throws InvalidOperationException (no silent downgrade).
  • Token-wins-over-flag. When both MST_TOKEN and --azure-auth are set, the token is used (no DefaultAzureCredential acquisition).
  • Timeout coverage extended. --timeout now wraps credential acquisition so DefaultAzureCredential.GetTokenAsync cannot exceed user-set timeouts.

3. AAS performance parity

New shared-library extension CoseSign1.Certificates.AzureArtifactSigning.Extensions.AasClientOptionsExtensions (mirrors the MST extension layout):

  • ConfigureAasPerformanceOptimizations(this CertificateProfileClientOptions) — Fixed mode, 250 ms × 8 retries (vs SDK default Exponential 800 ms × 3, ~2 s vs ~5 s ceiling).
  • ConfigureAasSigningPerformance(taskRetryCount, taskTimeoutSeconds) — tunable AzSignContextOptions factory; defaults match SDK baseline.

No retry-After header stripping (that is an MST-specific workaround for the ledger eventual-consistency window; AAS honours Retry-After correctly).

Wired into AzureArtifactSigningCertificateProviderPlugin.CreateProvider.

4. Code review follow-ups

  • Applied .ConfigureAwait(false) to all 11 awaits in the MST plugin (matches the convention used in CoseHandler.cs, MstTransparencyService.cs, MstPerformanceOptimizationPolicy.cs). Resolves the ConfigureAwait review thread.
  • Rewrote two try { … } finally { …; consoleOutput.Dispose(); } blocks to use using declarations (CodeQL cs/missed-using-opportunity alerts #879/#880).

Tests

Project Status
CoseSignTool.MST.Plugin.Tests 56/56 — rewrote CodeTransparencyClientHelperTests for the new auth contract; added MstAzureAuthFlagWiringTests (--azure-auth is in Options + BooleanOptions on both commands). [DoNotParallelize] + [TestInitialize]/[TestCleanup] snapshot/restore env vars for hermetic execution.
CoseSign1.Certificates.AzureArtifactSigning.Tests 60/60 — new AasClientOptionsExtensionsTests covers default + override paths plus null-options guard.
CoseSignTool.Abstractions.Tests 122/122 — replaced reflection-based IsSharedFrameworkAssembly tests with calls to the new public PluginLoadContext.IsHostShared. New tests cover: curated cross-boundary contracts, formerly-shared prefixes (now NOT shared), Azure SDK / Newtonsoft.Json (NOT shared), 3rd-party prefix-collision impostors (NOT shared), System.* resolution via default-context fallback, prefix-collision regression (CoseSign1Plus.dll loads plugin-locally), shared-name resolver-bypass guarantee.
CoseSign1.Certificates.Tests 145/145 — new ISupportsScittCompliance cross-boundary contract tests (interface in CoseSign1.Abstractions, implemented by CertificateCoseSigningKeyProvider, round-trip toggling via the interface).
Full solution 1205 passed, 5 skipped, 0 failed (was 1195 baseline; +10 net from new tests).

Verification

End-to-end against the unauthenticated test ledger https://aasmsttest.confidential-ledger.azure.com:

CoseSignTool sign --PayloadFile sample.txt --SignatureFile sample.cose --EmbedPayload \
  --CertProvider azure-artifact-signing \
  --aas-endpoint https://wus.codesigning.azure.net \
  --aas-account-name testwus \
  --aas-cert-profile-name testWusCert1                                        # Exit 0

CoseSignTool mst_register --endpoint https://aasmsttest.confidential-ledger.azure.com \
  --payload sample.txt --signature sample.cose --output register.json         # Exit 0

CoseSignTool mst_verify --endpoint https://aasmsttest.confidential-ledger.azure.com \
  --payload sample.txt --signature sample.transparent.cose \
  --authorized-domains aasmsttest.confidential-ledger.azure.com               # Exit 0 (VALID)

bin\ audit on the host confirms the architectural change: zero Azure.*, zero CoseSign1.Transparent.MST, zero CoseSign1.Certificates.AzureArtifactSigning. All Azure SDK and plugin-implementation assemblies live exclusively in bin\plugins\<PluginName>\.

Breaking-change note

Users who previously relied on the implicit DefaultAzureCredential fallback (no MST_TOKEN set) must now pass --azure-auth explicitly. docs/MST.md and docs/Plugins.md are updated.

Pre-PR review

Hey Jeromy / code-review pass surfaced one real bug pre-push: CodeTransparencyClientHelperTests mutated process-wide env vars without [TestCleanup], leaking state if a test threw mid-flight. Fixed before pushing.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Three layered bugs surfaced when running `mst_register` against an
unauthenticated MST test ledger after AAS signing:

1. MissingMethodException in CodeTransparencyClientHelper

   `CoseSign1.Transparent.MST` is in the host AssemblyLoadContext (it
   is in the shared "CoseSign1" prefix list), but
   `Azure.Security.CodeTransparency` was plugin-local. The two
   `CodeTransparencyClientOptions` types had distinct identities so
   `ConfigureMstPerformanceOptimizations` would not bind at runtime.
   Added Azure.Core, Azure.Security.CodeTransparency, Azure.CodeSigning
   and Azure.Developer.ArtifactSigning to PluginLoadContext shared
   prefixes so they always load alongside the shared CoseSign1.* libs.

2. Implicit DefaultAzureCredential fallback broke unauthenticated
   instances

   With no MST_TOKEN set the helper silently fell back to
   DefaultAzureCredential, hanging on developer credentials or sending
   bogus tokens to test ledgers. New auth contract:

     explicit `--token-env`/`MST_TOKEN` > `--azure-auth` > anonymous

   Anonymous (no Authorization header) is now the default. `--azure-auth`
   is a new boolean flag that opts into `DefaultAzureCredential`.
   Explicit `--token-env <NAME>` now fails fast if the env var is
   missing or whitespace. The `--timeout` CTS now wraps credential
   acquisition so DefaultAzureCredential cannot exceed user-set timeouts.

3. AAS performance parity with MST

   Added `CoseSign1.Certificates.AzureArtifactSigning.Extensions.AasClientOptionsExtensions`
   in the shared library (mirroring the MST extension layout). Provides:

     * `ConfigureAasPerformanceOptimizations(this CertificateProfileClientOptions)`:
       250 ms x 8 fixed retries instead of the SDK default 800 ms x 3
       exponential, bounding interactive signing latency at ~2 s.
     * `ConfigureAasSigningPerformance(taskRetryCount, taskTimeoutSeconds)`:
       tunable `AzSignContextOptions` factory.

   No retry-After header stripping (that is an MST-specific workaround
   for the ledger eventual-consistency window; AAS honours Retry-After
   correctly).

   Wired into AzureArtifactSigningCertificateProviderPlugin.CreateProvider.
   Required adding CoseSign1.Certificates.AzureArtifactSigning as a
   ProjectReference to CoseSignTool.csproj so the shared library loads
   in the host context, matching the MST pattern.

Tests
-----

* CodeTransparencyClientHelperTests: rewritten to cover anonymous
  default, --azure-auth opt-in, fail-fast on explicit --token-env with
  missing/whitespace value, token-wins-over-azure-auth precedence.
* MstAzureAuthFlagWiringTests: new, asserts both RegisterCommand and
  VerifyCommand expose --azure-auth in Options and BooleanOptions so
  the CLI parser accepts a bare flag.
* AasClientOptionsExtensionsTests: new, covers default + override paths
  for both extension methods plus null-options guard.
* PluginLoadContextBasicTests: extended shared-assembly list to include
  the four Azure SDK assemblies.

Verified end-to-end against the unauthenticated test ledger:
AAS sign (private trust) -> mst_register -> mst_verify all succeed.
Full solution test run: 1195 passed, 5 skipped, 0 failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JeromySt JeromySt force-pushed the fix/mst-unauth-and-aas-perf branch from 789cd17 to 699cbeb Compare May 1, 2026 04:13
The previous plugin AssemblyLoadContext design relied on prefix matching
(`CoseSign1`, `CoseHandler`, `CoseIndirectSignature`, plus four Azure
SDK prefixes) to decide which assemblies fall back to the host context. Two
real smells:

1. Prefix collisions. A 3rd-party plugin DLL whose name happened to start
   with `CoseSign1` (e.g. a hypothetical `CoseSign1Plus.Plugin.dll`)
   would be silently absorbed into the host context, breaking isolation
   without warning.

2. Load-anchor coupling. `CoseSignTool.csproj` carried ProjectReferences on
   `CoseSign1.Transparent.MST` and
   `CoseSign1.Certificates.AzureArtifactSigning` purely to satisfy the
   prefix match. Verified the host had zero `using` statements and zero
   call-sites into either library — they existed in the host bin only as
   load anchors so plugins could `share` them via the prefix list. Adding
   any new transitive dep to those shared libs (e.g. another Azure SDK
   package) silently broke plugins until someone noticed at runtime — that
   is exactly how the four Azure SDK prefixes ended up in the shared list
   in the first place.

Architectural fix (three coordinated phases in this commit)
-----------------------------------------------------------

Phase 1: Curated exact-name shared list with shared-first short-circuit.

`PluginLoadContext` is rewritten to use:

  * a tiny framework prefix list (`System.`, `Microsoft.Extensions.`,
    `Microsoft.NETCore.`) plus exact framework names (`System`,
    `mscorlib`, `netstandard`, etc.) — these namespaces are owned by
    Microsoft and have no third-party collision risk.
  * an explicit `HashSet<string>` of repo-curated cross-boundary contract
    assemblies, exact-name match only.

The shared decision now SHORT-CIRCUITS the per-plugin probing logic. If
a name is host-shared, `Load` returns null immediately — it does NOT
touch the plugin directory or `AssemblyDependencyResolver`. This
prevents a plugin-shipped duplicate of the same DLL from silently
splitting type identity.

Removed from sharing (these are now plugin-local):
  * CoseSign1, CoseHandler, CoseIndirectSignature (loose prefixes)
  * Azure.Core, Azure.Security.CodeTransparency, Azure.CodeSigning,
    Azure.Developer.ArtifactSigning (prefix workarounds for the
    type-identity bug Phase 2 fixes structurally)
  * Newtonsoft.Json (leftover prefix that introduced unwanted coupling)

The new shared contract is exactly:
  * CoseSignTool.Abstractions — plugin contracts
  * CoseSign1.Abstractions — ICoseSigningKeyProvider, ISupportsScittCompliance
  * CoseSign1.Headers — ICoseHeaderExtender (returned by CoseHeaderHelper)

Phase 2a: Unify `CertificateProviderPluginManager` onto
`PluginLoadContext`.

`LoadPluginFromAssembly` previously used `Assembly.LoadFrom`, which
loaded cert-provider plugins (AAS) into the host's default ALC. After
unification, AAS plugins get the same isolation as command plugins (MST,
IndirectSignature) — each in its own collectible ALC with its own
private dependency graph.

Phase 2b: Drop the host's load-anchor ProjectReferences on
`CoseSign1.Transparent.MST` and
`CoseSign1.Certificates.AzureArtifactSigning`. Verified host bin no
longer contains these libraries or any Azure SDK assembly; they live
only in their respective plugin subdirectories where they belong.

Also removed the dead `CoseHandler` ProjectReference from the MST
plugin csproj (no source code referenced it).

Phase 3: Replace the host's concrete-type downcast with a shared
capability interface.

`SignCommand` previously did:
  if (provider is CoseSign1.Certificates.CertificateCoseSigningKeyProvider certProvider)
      certProvider.EnableScittCompliance = ...;

That coupled the host to the concrete plugin-local type. Replaced with:
  if (provider is ISupportsScittCompliance scittProvider)
      scittProvider.EnableScittCompliance = ...;

`ISupportsScittCompliance` is a new interface in `CoseSign1.Abstractions`
implemented by `CertificateCoseSigningKeyProvider`. The host now sees
plugin-returned providers strictly through the shared abstraction.

Original triggering bug
-----------------------

The prior 699cbeb commit added Azure SDK prefixes to the shared list as
a workaround for `MissingMethodException` when MST was used after AAS
signing — caused by `CodeTransparencyClientOptions` having distinct
type identities in host vs plugin ALCs. Phase 2 makes that bug
STRUCTURALLY IMPOSSIBLE: the extension method's home assembly
(`CoseSign1.Transparent.MST`) and the Azure SDK type it extends are
now both plugin-local, co-located in the same ALC. No type-identity
split is reachable.

Test changes
------------

* `PluginLoadContextAdvancedTests`: replaced reflection-based probing
  of the private `IsSharedFrameworkAssembly` with calls to the public
  `PluginLoadContext.IsHostShared(string)`. Added negative tests for
  the formerly-shared prefixes (`CoseSign1`, `CoseHandler`,
  `CoseIndirectSignature`, Azure SDK, `Newtonsoft.Json`) and a
  prefix-collision regression test (`CoseSign1Plus`,
  `CoseHandlerExtensions`, etc. must not be shared).
* `PluginLoadContextBasicTests`: updated the shared-assemblies list to
  reflect the curated set.
* `PluginLoadContextIntegrationTests`: updated the comprehensive
  shared-list test; added two new regression tests:
    - `Integration_PrefixCollisionImpostor_LoadsPluginLocallyNotHostShared`
      proves a stub `CoseSign1Plus.dll` IS probed plugin-locally
      (warning observed in stderr).
    - `Integration_SharedNameWithDuplicateInPluginDir_ShortCircuitsResolver`
      proves a stub `CoseSignTool.Abstractions.dll` in the plugin dir
      is NOT probed when the name is host-shared (no warning emitted).
* `CertificateCoseSigningKeyProviderEnableScittTests`: added two tests
  for the new `ISupportsScittCompliance` capability — interface
  implementation check + round-trip through the interface mirroring
  `SignCommand`'s host-side pattern. Includes assertion that the
  interface lives in `CoseSign1.Abstractions` (catches accidental
  moves to a plugin-local assembly).

Verified
--------

* dotnet build CoseSignTool.sln: 0 warnings, 0 errors.
* Full test suite: 1204 passed, 5 skipped, 0 failed (was 1195 before).
* Host bin no longer contains MST/AAS shared libs or any Azure SDK.
* AAS plugin subdir contains its full Azure SDK closure.
* CoseSignTool.exe help discovers all 4 command plugins and 1 cert
  provider plugin correctly.
* Local sign + validate end-to-end pipeline works (1230-byte signature
  produced from self-signed cert via host's local cert path).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JeromySt and others added 2 commits May 1, 2026 10:53
The previous commit (d6afc6a) localized the AAS plugin's Azure SDK closure
into its plugin subdirectory. Manual e2e then surfaced a real bug that the
prefix removal exposed:

  COSE Sign failed.
  Could not load file or assembly 'System.ClientModel, Version=1.9.0.0, ...'

Two coupled defects:

1. `CoseSignTool.csproj` plugin-deploy target had a blanket
   `!StartsWith('System.')` filter, stripping every System.* DLL from the
   plugin's deployed subdirectory on the assumption they were all framework
   assemblies. That is wrong for out-of-band NuGet packages that legitimately
   ship via plugins (System.ClientModel, System.Memory.Data,
   System.IO.Pipelines, System.Text.Encodings.Web, polyfill builds of
   System.Text.Json, etc.).

2. `PluginLoadContext.FrameworkPrefixes` had `System.` listed as a
   shared-prefix, so even when a plugin DID ship System.ClientModel.dll the
   ALC short-circuited the local probe and asked the default ALC, which had
   nothing (the host doesn't reference it after the load-anchor cleanup).

Fixed both:

  * Plugin deployment now copies all System.* files. Only the truly framework
    BCL assemblies (mscorlib, netstandard, Microsoft.CSharp, WindowsBase,
    Microsoft.VisualBasic{.Core}, Microsoft.Win32.Primitives,
    Microsoft.Win32.Registry) are excluded.
  * FrameworkPrefixes drops `System.`. Bare `System` stays in
    FrameworkExactNames (BCL only). True BCL System.* assemblies still
    resolve correctly via the runtime's natural default-context fallback
    when the plugin-local probe finds nothing in the plugin directory.

This intentionally allows the plugin to load its private System.* NuGet
packages from its subdirectory without colliding with TPA, while preserving
the host-shared identity for everything in the BCL.

Tests
-----

* `IsHostShared_WithSystemAssemblies_ShouldReturnTrue` was overstated —
  trimmed to `IsHostShared_WithBareSystemName_ShouldReturnTrue` (the bare
  `System` assembly is the only System-prefixed thing still on the
  hard-coded shared list).
* `IsHostShared_WithSystemDotPrefix_ShouldReturnFalse` is new — documents
  that System.Collections, System.IO, System.ClientModel, etc. are no longer
  prefix-shared and asserts the contract change explicitly. BCL names still
  load via runtime fallback; NuGet System.* names are now plugin-local.
* `IsHostShared_WithBareFrameworkNames_ShouldReturnTrue` updated to include
  bare `System` alongside mscorlib/netstandard/etc.
* `IsHostShared_WithDifferentCasing_ShouldReturnTrue` swapped
  System.Collections / System.IO (no longer auto-shared) for plain `system`
  (still bare-name shared).
* All other PluginLoadContext.Load behavioral tests continue to pass: BCL
  System.* still loads via fallback; plugin-private System.* now loads
  plugin-locally.

Verified
--------

* dotnet build CoseSignTool.sln: 0 warnings, 0 errors.
* Full test suite: 1205 passed, 5 skipped, 0 failed (one new test).
* AAS plugin subdirectory now contains its full System.* NuGet closure
  (System.ClientModel.dll, System.Diagnostics.DiagnosticSource.dll,
  System.IO.Pipelines.dll, System.Memory.Data.dll, System.Text.Json.dll,
  etc.).
* Manual end-to-end against the AAS Private Trust + unauthenticated MST
  test ledger:
    AAS sign -> 8457-byte signature
    mst_register -> Registration completed successfully
    mst_verify --authorized-domains aasmsttest.confidential-ledger.azure.com
      -> Verification result: VALID, exit 0
  All three steps succeed with the refactored isolated plugin loading.
  No MissingMethodException class of bug remains reachable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CodeQL flagged two PluginLoadContextIntegrationTests methods that allocated a StringWriter and disposed it manually in a finally block. Convert each to a 'using' declaration and drop the manual Dispose call. Behaviour unchanged: Console.SetError(originalError) still runs in the finally before the using-scope disposes the writer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NN2000X
NN2000X previously approved these changes May 1, 2026
…ptions)

CodeQL's NEUTRAL run on PR #201 surfaced `Generic catch clause` (cs/catch-of-all-exceptions, severity: note) on the two LoadFromAssemblyPath failure handlers in PluginLoadContext.Load. The pattern was preserved verbatim from the pre-refactor code, but the surrounding restructure made CodeQL re-flag it as introduced.

Replaced the broad catch (Exception ex) filters with the same exception-set
the repo already uses in `PluginLoader.LoadPluginWithContext`:

  catch (Exception ex) when (ex is FileNotFoundException
                              or BadImageFormatException
                              or FileLoadException
                              or ArgumentException)

This is the documented exception surface of
`AssemblyLoadContext.LoadFromAssemblyPath` (corrupt or missing DLL,
strong-name mismatch, invalid argument). Anything outside this set —
`OutOfMemoryException`, `ThreadAbortException`,
`StackOverflowException`, security failures from a hostile environment —
should now propagate to the caller rather than be silently swallowed and
masked behind a stderr warning.

Behavioural impact: the only catch path that observably changes is the
hostile / impossible cases above, which would have failed downstream anyway
once the assembly was needed. The standard load-failure recovery paths
(missing DLL, bad image, file-load conflict, bad argument) are unchanged.

Verified: build green, `CoseSignTool.Abstractions.Tests` 122/122 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NN2000X
NN2000X previously approved these changes May 1, 2026
Commit 7407b4a narrowed the two LoadFromAssemblyPath catch clauses to
(FileNotFoundException, BadImageFormatException, FileLoadException,
ArgumentException) — the documented exception surface of
AssemblyLoadContext.LoadFromAssemblyPath.

That filter is correct for direct API failures, but LoadFromAssemblyPath
also transitively triggers module initializers and static constructors
inside the assembly being loaded. Those can throw arbitrary exception
types (TypeInitializationException, PlatformNotSupportedException,
DllNotFoundException, etc.) which then propagate out of Load() and break
plugin command dispatch. CI confirmed the regression: macOS x64
PluginIntegrationTests.IndirectSignCommand_WithValidArgs_ShouldSucceed
and IndirectSignCommand_WithPayloadLocation_ShouldSucceed both flipped
from green to ExitCode.UnknownError after 7407b4a.

Restoring the broad `catch (Exception ex)` is correct at this plugin
loader boundary: a misbehaving plugin dependency must be logged and
swallowed so the runtime can fall through to the next probe path
(default ALC, etc.) rather than crashing the host. CodeQL's
`cs/catch-of-all-exceptions` (severity: note) is suppressed inline
with the standard repo pattern (// CodeQL [SM02184] <rationale>) used
elsewhere in the codebase for SM02196 / SM05137.

Verified: build green, CoseSignTool.Tests PluginIntegrationTests
121/121 pass, CoseSignTool.Abstractions.Tests 122/122 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JeromySt
Copy link
Copy Markdown
Member Author

JeromySt commented May 1, 2026

Update on the CodeQL cs/catch-of-all-exceptions fix (commit 7407b4a9bd0d3a)

The narrow exception filter introduced in 7407b4a cleared the CodeQL annotation, but it broke PluginIntegrationTests.IndirectSignCommand_WithValidArgs_ShouldSucceed and IndirectSignCommand_WithPayloadLocation_ShouldSucceed on macOS x64 (both flipped from green → ExitCode.UnknownError).

Root cause: AssemblyLoadContext.LoadFromAssemblyPath documents only four direct-throw exception types (FileNotFoundException, BadImageFormatException, FileLoadException, ArgumentException), but it also transitively triggers the loaded assembly's module initializers and static constructors. Those can throw arbitrary types — TypeInitializationException, PlatformNotSupportedException, DllNotFoundException for native deps, etc. — which then escape Load() and break plugin command dispatch.

Fix (9bd0d3a): Restored the broad catch (Exception ex) and added the repo's standard inline CodeQL suppression marker (// CodeQL [SM02184] <rationale>), matching the existing convention used at SignCommand.cs:1265, SignCommand.cs:1307, AzureArtifactSigningCertificateProviderPlugin.cs:101, and CodeTransparencyClientHelper.cs:94. The plugin loader is exactly the kind of trust boundary where a swallow-and-log is the right behavior — a misbehaving plugin dependency must not crash the host.

🧑‍🎓 Human Learning — Why the broad catch is correct here

What changed: Reverted the narrow exception filter and added an inline CodeQL suppression with rationale.

Why: The documented exception surface of LoadFromAssemblyPath is incomplete — it covers what the API itself throws, not what static/module initializers in the assembly being loaded can throw. At a plugin loader boundary, defensive catch (Exception) with logging + fall-through is the established pattern (the runtime continues probing other ALC paths).

Priority personas that drove this: Reliability Engineer (host stability under untrusted plugin behavior), Software Tester (regression caught by integration tests on macOS x64 — proving the change wasn't behavior-preserving as commit 7407b4a claimed), Support Engineer (a swallowed warning is debuggable; a host crash from a bad plugin is not).

Learn more:

  • AssemblyLoadContext.LoadFromAssemblyPath docs — note that documented exceptions only cover the API itself.
  • CLR module initializer / static ctor semantics — exceptions from these surface as TypeInitializationException and propagate through any caller that touches the type.
  • CodeQL cs/catch-of-all-exceptions is severity note precisely because catch-all is sometimes the correct pattern; the rule expects suppression-with-rationale at boundaries.

Verified locally:

  • dotnet build CoseSignTool.sln — clean
  • CoseSignTool.Tests PluginIntegrationTests — 121/121 pass
  • CoseSignTool.Abstractions.Tests — 122/122 pass

CodeQL on the new HEAD will be re-checked once CI completes.

NN2000X
NN2000X previously approved these changes May 1, 2026
Comment thread CoseSignTool.MST.Plugin/CodeTransparencyClientHelper.cs Outdated
@elantiguamsft
Copy link
Copy Markdown
Contributor

📝 PR description Section 1 is stale — contradicts the actual code

The Section 1 header describes an approach that the code no longer implements:

PR description claims Code actually does
Azure SDK assemblies added to shared-prefix list (host context) Tests explicitly assert IsHostShared("Azure.Core") == false — they're plugin-local
CoseSign1.* libs remain host-shared CoseSign1, CoseSign1.Certificates, CoseSign1.Transparent.MST are all removed from the shared list
AAS ProjectReference added to CoseSignTool.csproj No AAS reference added; CoseSign1.Transparent.MST reference removed

It reads like the description was written for an initial approach (force everything into the host ALC), then the code evolved to the cleaner design (keep plugins fully isolated, share only the three contract assemblies via exact-name list), but Section 1 was never updated.

Sections 2–3, tests, and verification all accurately reflect the code — just Section 1 needs a rewrite to match the actual approach.

Two earlier attempts to satisfy CodeQL's cs/catch-of-all-exceptions on
the LoadFromAssemblyPath try/catch blocks failed:

1. Commit 7407b4a narrowed to (FileNotFoundException, BadImageFormatException,
   FileLoadException, ArgumentException) - the documented exception surface
   of LoadFromAssemblyPath. macOS x64 PluginIntegrationTests regressed
   (ExitCode.UnknownError) because LoadFromAssemblyPath transitively triggers
   module/type initializers that throw arbitrary types.

2. Commit 9bd0d3a restored a broad catch with a // CodeQL [SM02184]
   suppression marker. Tests passed but GitHub's OSS CodeQL does not honor
   Microsoft-internal SM-prefixed suppression IDs (those are filtered by a
   downstream Microsoft pipeline like Guardian/PoliCheck, not by the CodeQL
   query engine itself). Result: NEW alerts #881 and #882 (cs/catch-of-all-exceptions)
   surfaced on PR #201 at the new HEAD's merge ref.

Local repro on Windows uncovered the actual exception leaking from the
narrowed catch: 'Could not load file or assembly ...
An operation is not legal in the current state. (0x80131509)' - HRESULT
COR_E_INVALIDOPERATION = InvalidOperationException, which is NOT in the
documented exception surface of LoadFromAssemblyPath but is in fact what
the runtime throws when a transitive load conflict occurs.

Enumerating-by-type is fragile (we already missed InvalidOperationException;
who knows what other CLR-internal types can surface from initializer chains).
The safest pattern at a plugin loader boundary is to catch everything
EXCEPT the genuinely fatal / unrecoverable exceptions:

  catch (Exception ex) when (ex is not OutOfMemoryException
                                and not StackOverflowException
                                and not ThreadAbortException
                                and not AccessViolationException)

This is a *filtered* catch, which CodeQL's cs/catch-of-all-exceptions does
not flag (the rule targets unconditional catch-all). It also matches the
canonical 'is this exception fatal?' set used widely in CLR boundary code.
Plugin-internal failures get logged and the runtime falls through to
probe alternate paths; truly fatal conditions still propagate.

Verified: build green, CoseSignTool.Tests PluginIntegrationTests 121/121
pass, CoseSignTool.Abstractions.Tests 122/122 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JeromySt
Copy link
Copy Markdown
Member Author

JeromySt commented May 1, 2026

Resolution status — HEAD fac4c58e

All CI is green and CodeQL is clean.

Check Result
build-windows-latest ✅ SUCCESS
build-ubuntu-latest ✅ SUCCESS
build-macos-latest-osx-x64 ✅ SUCCESS (was failing on 7407b4a)
build-macos-latest-osx-arm64 ✅ SUCCESS
Analyze (csharp, ubuntu-latest) ✅ SUCCESS
Open cs/catch-of-all-exceptions alerts on PR merge ref 0
dependency-review / license/cla ✅ SUCCESS
Mergeable ✅ MERGEABLE

The aggregate CodeQL check shows NEUTRAL solely because of the pre-existing infrastructure warning "1 configuration not found" and Node.js 20 / CodeQL Action v3 deprecation notices in .github/workflows/ — none of those touch our code.

Lessons from the three iterations on PluginLoadContext.Load (recorded so future maintainers don't re-tread this):

  1. 7407b4a (narrow allow-list): Filtered to (FileNotFoundException, BadImageFormatException, FileLoadException, ArgumentException) per the documented exception surface of LoadFromAssemblyPath. Cleared CodeQL but broke macOS x64 — LoadFromAssemblyPath transitively triggers module/type initializers that throw arbitrary types not on that list.

  2. 9bd0d3a (broad catch + // CodeQL [SM02184] marker): Tests passed, but GitHub OSS CodeQL does not honor Microsoft-internal SM* suppression IDs (those are filtered by a downstream Microsoft pipeline, not by the CodeQL query engine itself). Alerts #881/#882 surfaced.

  3. fac4c58e (filtered exclude pattern — final): Local Windows repro with the 7407b4a filter exposed the actual leaking exception: Could not load file or assembly … An operation is not legal in the current state. (0x80131509)InvalidOperationException, not in the documented surface. Switched to the canonical "catch all non-fatal" pattern:

    catch (Exception ex) when (ex is not OutOfMemoryException
                                  and not StackOverflowException
                                  and not ThreadAbortException
                                  and not AccessViolationException)

    This is a filtered catch, which cs/catch-of-all-exceptions does not flag (the rule targets unconditional catch-all). It also matches the canonical "is this exception fatal?" set used widely in CLR boundary code.

🧑‍🎓 Human Learning — When to use the "exclude fatal" filter pattern

The pattern: catch (Exception ex) when (ex is not OOM and not SO and not TAE and not AVE)

When it's appropriate: At trust boundaries where you must keep the host process alive in the face of misbehaving downstream code (plugin loaders, message handlers, cross-process serialization boundaries, request pipelines). The host's contract is "I run; downstream failures get logged, not propagated".

Why it satisfies CodeQL: cs/catch-of-all-exceptions targets the unconditional catch (Exception) form because that pattern is most often used carelessly. A when filter signals deliberate intent — even one that excludes only the fatal four — moves the catch out of the "blanket swallow" category the rule is trying to discourage.

Why it's safer than enumeration: APIs document what they throw, but in .NET, exceptions can also surface from:

  • Module / type initializers triggered transitively by the call
  • P/Invoke marshalling and SafeHandle finalizers
  • AppDomain unloading
  • Async continuations resumed via the captured context

You cannot enumerate this exhaustively. You can enumerate what's truly unrecoverable.

When not to use it: Anywhere downstream of the boundary. Application logic should catch specific exceptions for specific recovery actions. The exclude-fatal pattern is a last-resort log-and-continue at a process boundary, not a general-purpose error handler.

Priority personas that drove this final design: Reliability Engineer (host stability), Software Tester (regression caught the over-narrow filter twice — in 7407b4a on macOS, in the second attempt on Windows locally), Software Architect (the loader IS the boundary; that's where defensive error handling belongs), Support Engineer (logged warning is debuggable; silent catch-all is not — and the message includes assembly name + path + ex.Message).

PR #201 is ready for human review and merge.

Addresses reviewer feedback from elantiguamsft on
`CodeTransparencyClientHelper.cs:95` ("should we add .ConfigureAwait(false)?").

Repo convention is to call `.ConfigureAwait(false)` on every await in
library code. Confirmed across the existing surface:

  CoseHandler/CoseHandler.cs:                 4 awaits, all ConfigureAwait(false)
  CoseSign1.Transparent.MST/MstTransparencyService.cs: 5 awaits, all ConfigureAwait(false)
  CoseSign1.Transparent.MST/MstPerformanceOptimizationPolicy.cs: 3 awaits, all ConfigureAwait(false)

The MST plugin (added in 699cbeb) was the outlier — 11 awaits, none with
`ConfigureAwait(false)`. The reviewer flagged the credential-acquisition
call specifically, but for consistency every await in the plugin is
updated:

  CodeTransparencyClientHelper.cs (1)
    line 95 — DefaultAzureCredential.GetTokenAsync

  MstCommandBase.cs (5)
    line  99 — File.ReadAllBytesAsync (signature)
    line 138 — File.WriteAllTextAsync (json result)
    line 268 — ReadAndDecodeCoseMessage
    line 279 — CreateCtsClient
    line 282 — ExecuteSpecificOperation
    line 289 — WriteJsonResult (conditional)

  RegisterCommand.cs (1)
    line 64 — transparencyService.MakeTransparentAsync

  VerifyCommand.cs (3)
    line 112 — File.ReadAllBytesAsync (receipt)
    line 114 — transparencyService.VerifyTransparencyAsync (with receipt)
    line 120 — message.VerifyTransparencyAsync (embedded)

Why the reviewer is right: even though CoseSignTool.exe is a console app
with no SynchronizationContext (so the DEFAULT continuation behaviour is
already on the threadpool), the MST plugin is library-shaped code that
flows through `CoseSign1.Transparent.MST.MstTransparencyService`
(loaded in the plugin AssemblyLoadContext) and is also reachable from
async callers that may run inside other hosts in the future. Saying
`ConfigureAwait(false)` consistently:

  * keeps the plugin safe to host inside a UI / ASP.NET sync context
    that legacy callers might add,
  * removes the per-await context-capture overhead (small, but real),
  * matches the rest of the codebase so a future scan won't ratchet
    against this plugin alone.

Verified: full solution build clean, `CoseSignTool.MST.Plugin.Tests`
56/56 pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JeromySt
Copy link
Copy Markdown
Member Author

JeromySt commented May 1, 2026

Updated — thanks for catching this. The PR description was indeed written for the interim fix (commit 699cbeba) and never resynced as the architecture evolved through d6afc6a, 9daff95, fac4c58e, etc.

Section 1 rewrite (PR description) now describes the actual final implementation, including a side-by-side before/after table:

  • ❌ Adding Azure SDK assemblies to a shared-prefix list → ✅ Curated exact-name allow-list (no prefix collisions)
  • CoseSign1.* host-shared via prefix → ✅ Only CoseSignTool.Abstractions, CoseSign1.Abstractions, CoseSign1.Headers shared (exact names)
  • ❌ Adding AAS ProjectReference to the host → ✅ MST and AAS ProjectReferences removed from CoseSignTool.csproj; both shared libs are now plugin-local
  • ❌ Concrete is CertificateCoseSigningKeyProvider downcast → ✅ is ISupportsScittCompliance (new shared interface in CoseSign1.Abstractions)
  • Assembly.LoadFrom for cert-provider plugins → ✅ Unified onto PluginLoadContext (AAS gets the same isolation as MST)
  • Plus a row for the System.* deployment fix and the filtered-catch CodeQL pattern

The new Section 1 also adds a callout note up top so future readers see the "this was rewritten" context, and Section 4 records the in-PR review follow-ups (your ConfigureAwait(false) ask in a878d44a, plus the CodeQL #879/#880 using cleanups). Test counts updated to the actual 1205 (was stale 1195), and the bin\ audit line now states the architectural-change verification result explicitly: zero Azure.* / CoseSign1.Transparent.MST / CoseSign1.Certificates.AzureArtifactSigning in the host bin.


🐤 Human Learning — why PR description sync rot is dangerous

A stale PR description is a worse problem than missing one because reviewers anchor on it. They read Section 1, build a mental model of the change, and then evaluate the diff against that model. If the description says "we add prefixes" but the code says "we remove prefixes," the reviewer either:

  1. Trusts the description and misreviews the code — approves a delete because they think it's an add, or pushes back on the wrong line.
  2. Trusts the code and ignores the description — but then loses the design rationale that the description was meant to carry, and the PR ships without an accurate record of why the final shape was chosen.

This is especially insidious on long-running PRs where the approach legitimately evolves mid-review. The right discipline is to treat the PR body like a checked-in design doc: every architectural commit should be paired with a description amend, and reviewers should explicitly call out drift (which is exactly what you did here).

A small mitigation is to put a > Note: Section X was rewritten on <date> callout at the top of any rewritten section so future readers don't have to dig through the commit history to see why the description doesn't read like a chronological narrative.

@JeromySt JeromySt merged commit e0141fa into main May 1, 2026
11 checks passed
@JeromySt JeromySt deleted the fix/mst-unauth-and-aas-perf branch May 1, 2026 21:54
@JeromySt
Copy link
Copy Markdown
Member Author

JeromySt commented May 2, 2026

This PR resolves the issue reported in #203

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cannot register in MST getting "Error: Method not found: 'Azure.Security.CodeTransparency.CodeTransparencyClientOptions ..."

4 participants