Roslyn-based .NET assembly walker that turns compiled .dll + .pdb + .xml triples into a strongly-typed API catalog (types, members, signatures, XML docs, inheritdoc, SourceLink) and hands it to a pluggable emitter for rendering.
The catalog is format-neutral. Emitters decide how to render it — Markdown for Zensical / mkdocs Material, or YAML for docfx ManagedReference, with room for other targets.
| Package | What it does |
|---|---|
| SourceDocParser | Core walker, merger, source-link resolution. Defines IAssemblySource, IDocumentationEmitter, IMetadataExtractor, the ICrefResolver cross-link seam, and the shared CatalogIndexes rollup (derived classes / extension methods / inherited members). |
| SourceDocParser.NuGet | IAssemblySource that fetches packages from nuget.org by owner / explicit list and exposes the per-TFM lib/ trees. |
| SourceDocParser.Zensical | IDocumentationEmitter that writes Markdown tuned for Zensical / mkdocs Material (admonitions, content tabs, mermaid). |
| SourceDocParser.Docfx | IDocumentationEmitter that writes docfx ManagedReference YAML pages (drop-in replacement for dotnet docfx metadata output) plus the docfx.json config-file shim that lets an existing docfx site drive the parser pipeline. |
Logging flows through Microsoft.Extensions.Logging.Abstractions source-generated [LoggerMessage] partials, so any host (Serilog, Console, NLog, …) plugs in without the libraries taking a dependency on a specific backend.
```csharp
using Microsoft.Extensions.Logging;
// Also import the SourceDocParser namespaces that declare the types used below.

// Any Microsoft.Extensions.Logging host works; the console provider is just an example.
var loggerFactory = LoggerFactory.Create(b => b.AddConsole());

var source = new NuGetAssemblySource(
    rootDirectory: "/path/to/repo",   // contains nuget-packages.json
    apiPath: "/path/to/api",          // where lib/ + refs/ get extracted
    logger: loggerFactory.CreateLogger<NuGetAssemblySource>());

var emitter = new ZensicalDocumentationEmitter();

var result = await new MetadataExtractor().RunAsync(
    source,
    outputRoot: "/path/to/markdown-output",
    emitter,
    loggerFactory.CreateLogger<MetadataExtractor>());

Console.WriteLine($"Emitted {result.PagesEmitted} pages across {result.CanonicalTypes} types.");
```

The walker resolves NuGet packages against frameworks that the active .NET SDK still understands and that Microsoft is still shipping fixes for:
- Modern .NET (5.0+): `net5.0`, `net6.0`, `net7.0`, `net8.0`, `net9.0`, `net10.0`, plus the `net*-android`, `net*-ios`, `net*-maccatalyst`, `net*-windows` workload variants. See the official .NET and .NET Core support policy.
- netstandard: `netstandard1.0` through `netstandard2.1`. Sticks around because the BCL targets it, even though no future netstandard releases are planned.
- .NET Framework, net462 and newer: `net462`, `net47`, `net471`, `net472`, `net48`, `net481`. net462 is the floor that supports `netstandard2.0` type forwards and ships ref packs in modern SDKs. See the .NET Framework support policy for which of those are still in mainstream / extended support.
Out of scope (legacy, not supported):
| Family | Examples | Why |
|---|---|---|
| Xamarin | `xamarinios*`, `xamarinmac*`, `xamarintvos*`, `xamarinwatchos*` | Support ended 1 May 2024; the workloads moved to .NET MAUI under the modern `net*-android` / `net*-ios` / `net*-maccatalyst` / `net*-tvos` TFMs. |
| Legacy Mono profiles | `MonoAndroid*`, `MonoTouch*` | Predecessors of the Xamarin workloads. Same end-of-support story. |
| .NET Framework < 4.6.2 | `net20`, `net35`, `net40`, `net45`, `net451`, `net46`, `net461` | Out of mainstream support, and pre-net462 doesn't carry netstandard 2.0 type forwards so the resolver can't reuse modern surface against them. |
| Silverlight | `sl*` | Microsoft retired Silverlight on 12 October 2021. |
| Windows Phone | `wp*`, `wpa*` | Windows Phone 8.1 end-of-support was 11 July 2017; the platform itself was discontinued. |
| Windows Store / UAP | `win8`, `winrt*`, `uap*` | UWP apps are now expected to migrate to Windows App SDK / WinUI 3. |
| Portable Class Libraries | `portable-*` profiles | Replaced by netstandard a decade ago. |
Packages that ship only legacy TFMs are skipped at fetch time. The fetcher used to log each skip at warning level; on real-world walks (ReactiveUI / Avalonia / Splat surfaces) that meant tens of warnings per run for packages like System.Net.Primitives or System.Globalization.Extensions (whose entire TFM list is MonoAndroid10, MonoTouch10, net45, xamarinios10, xamarinmac20, ...). Those skips are now logged at information level, so genuine TFM mismatches stay visible instead of being buried under legacy-skip noise. Set your logger's minimum level to Information if you want to see the legacy-skip list during a build.
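
The out-of-scope table can be read as a simple classification rule. The sketch below is only an illustration of that rule, not the fetcher's actual logic (the library lists NuGet.Frameworks among its dependencies for real TFM handling); `LegacyTfm` and `IsLegacy` are hypothetical names, and the prefix list is taken directly from the table above.

```csharp
using System;

// Hypothetical helper: classifies a TFM string as legacy per the table above.
static class LegacyTfm
{
    // Prefixes of the out-of-scope families (Xamarin, Mono, Silverlight,
    // Windows Phone, Windows Store / UAP, Portable Class Libraries).
    private static readonly string[] LegacyPrefixes =
    {
        "xamarin", "monoandroid", "monotouch", "sl", "wpa", "wp", "winrt", "uap", "portable-",
    };

    public static bool IsLegacy(string tfm)
    {
        string t = tfm.ToLowerInvariant();
        if (t == "win8")
        {
            return true;
        }

        foreach (string prefix in LegacyPrefixes)
        {
            if (t.StartsWith(prefix, StringComparison.Ordinal))
            {
                return true;
            }
        }

        // .NET Framework monikers are "net" followed by digits only (net45, net462).
        // Modern .NET monikers contain a dot (net8.0), so they fall through.
        if (t.StartsWith("net", StringComparison.Ordinal) && t.Length > 3)
        {
            string digits = t.Substring(3);
            foreach (char c in digits)
            {
                if (c < '0' || c > '9')
                {
                    return false; // netstandard2.0, net8.0, net8.0-android, ...
                }
            }

            // Pad to three digits so net45 compares as 450 and net20 as 200.
            return int.Parse(digits.PadRight(3, '0')) < 462;
        }

        return false;
    }
}
```

With this rule, `net461` and `MonoAndroid10` classify as legacy while `net462`, `netstandard2.0` and `net8.0-android` do not. A package would be skipped only when every TFM it ships is legacy.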
Benchmark workload. Numbers below are from the BenchmarkDotNet suite under src/benchmarks/SourceDocParser.Benchmarks/, run on a Ryzen 7 5800X / .NET 10. The workload extracts three NuGet packages from nuget.org -- pulling each package's lib/ and ref/ trees and the matching reference assemblies, walking every public symbol across ~19 target-framework groups, parsing the shipped XML doc files, resolving <inheritdoc/> chains, and emitting roughly 600 canonical type pages after cross-TFM merge. The local NuGet cache is warmed once during global setup so per-iteration timings measure the walk + merge + emit pipeline, not the network leg.
End-to-end (MetadataExtractor.RunAsync):
| Phase | Wall time | Allocated |
|---|---|---|
| Full pipeline (RunAsync) | ~1.5 s | ~525 MB |
| Discover (NuGet config + cache scan) | ~990 ms | ~258 MB |
| Load + walk (parallel, all groups) | ~509 ms | ~236 MB |
| Merge (cross-TFM dedup) | ~1 ms | ~380 KB |
| Emit (Zensical Markdown) | ~139 ms | ~39 MB |
The walk phase walks one Roslyn compilation per package -- one canonical TFM per equivalence class. Other TFMs whose public-API surface is a subset of the canonical's are folded in via a MetadataReader probe that only enumerates type tokens, no symbol tree, no constructed types. The merger then broadcasts the canonical's walked types into each subset TFM so ApiType.AppliesTo still records every TFM the type applies to.
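
To make the subset probe concrete, here is a rough sketch of how a metadata-only check could look using `System.Reflection.Metadata`. It is an assumption about the general shape, not the library's implementation: `PublicTypeProbe`, `ReadPublicTypeNames` and `IsSubsetOf` are hypothetical names, and the sketch only compares top-level public type names (it ignores nested types and type forwards).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

// Hypothetical helper: compares public type sets without building a Roslyn compilation.
static class PublicTypeProbe
{
    // Reads the full names of top-level public types straight from the metadata tables.
    public static HashSet<string> ReadPublicTypeNames(string assemblyPath)
    {
        using FileStream stream = File.OpenRead(assemblyPath);
        using var pe = new PEReader(stream);
        MetadataReader reader = pe.GetMetadataReader();

        var names = new HashSet<string>(StringComparer.Ordinal);
        foreach (TypeDefinitionHandle handle in reader.TypeDefinitions)
        {
            TypeDefinition type = reader.GetTypeDefinition(handle);

            // Keep only top-level public types; nested and non-public types are skipped.
            if ((type.Attributes & TypeAttributes.VisibilityMask) != TypeAttributes.Public)
            {
                continue;
            }

            string ns = reader.GetString(type.Namespace);
            string name = reader.GetString(type.Name);
            names.Add(ns.Length == 0 ? name : ns + "." + name);
        }

        return names;
    }

    // True when every public type in the candidate also exists in the canonical assembly.
    public static bool IsSubsetOf(string candidatePath, string canonicalPath) =>
        ReadPublicTypeNames(candidatePath).IsSubsetOf(ReadPublicTypeNames(canonicalPath));
}
```

The point of the design is that nothing here binds symbols: the probe reads names from the type-definition table and stops, which is why it is so much cheaper than a full walk.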
Per-call hotspots:
| Operation | Time | Allocated |
|---|---|---|
| XmlDocToMarkdown.Convert -- plain summary | ~24 ns | 176 B |
| XmlDocToMarkdown.Convert -- tagged with `<see>` / `<c>` / `<paramref>` | ~916 ns | 456 B |
| XmlDocToMarkdown.Convert -- code block + bullet list | ~1.2 µs | 440 B |
| TfmResolver.FindBestRefsTfm -- exact match | ~3 ns | 0 B |
| TfmResolver.FindBestRefsTfm -- platform-suffix strip | ~11 ns | 0 B |
| TfmResolver.FindBestRefsTfm -- netstandard fallback | ~496 ns | 1 KB |
| TypeMerger.Merge -- 600 types x 3 TFMs | ~115 µs | 358 KB |
Emitter cost per type page (no I/O, just markup formatting; baseline = Zensical Markdown):
| Workload (types x members/type) | Zensical Markdown (time / alloc) | DocFx YAML (time / alloc) | Time ratio (DocFx / Zensical) | Alloc ratio (DocFx / Zensical) |
|---|---|---|---|---|
| 100 x 5 | 72 µs / 288 KB | 618 µs / 1,366 KB | 8.6x | 4.7x |
| 100 x 30 | 263 µs / 763 KB | 5,432 µs / 6,338 KB | 20.7x | 8.3x |
| 600 x 5 | 437 µs / 1,730 KB | 3,605 µs / 8,198 KB | 8.3x | 4.7x |
| 600 x 30 | 1,505 µs / 4,580 KB | 17,122 µs / 38,025 KB | 11.4x | 8.3x |
DocFx YAML is heavier by design -- every member duplicates uid / commentId / parent / name / nameWithType / fullName, and the page-level references: list adds another mapping per cross-referenced type. The emitter hand-writes YAML through StringBuilder (no YamlDotNet runtime dependency), with a single-allocation fast path for qualified-name composites that round-trip identifiers as plain scalars when escape-safe.
- MetadataReader probe + canonical-only Roslyn walk. The walker only spins up one Roslyn compilation per package -- the canonical TFM picked by descending rank. Other TFMs whose public type set is a subset of the canonical's are detected via a `System.Reflection.Metadata.MetadataReader` probe (no symbol binding, no constructed-type allocation) and folded into `ApiType.AppliesTo` via a synthetic broadcast catalog that reuses the canonical's already-walked types. TFMs whose surface is not a subset still get a full Roslyn walk so removed-in-newer-TFM types stay in the catalog.
- Custom span-based XML scanner. A `ref struct DocXmlScanner` walks `///` doc fragments directly over `ReadOnlySpan<char>`, implementing just the XML grammar doc comments use. `XmlReader`'s `XmlTextReaderImpl` allocates multi-KB internal buffers (`NodeData[]`, `NamespaceManager`, char buffers) per construction; the scanner avoids that. Both the per-symbol parser and the Markdown renderer drive it, so per-element XML processing is allocation-free apart from the result string.
- Build-once-then-read-many `XmlDocSource`. Each `.xml` doc file is read once via `File.ReadAllBytes` + `Encoding.UTF8.GetString` and indexed by per-member `(offset, length)` ranges; substrings materialise only when a consumer calls `Get(memberId)`. Safe for concurrent reads from the parallel walker.
- Eager per-group loader disposal. Each TFM group's `CompilationLoader` holds memory-mapped views of every reference DLL. An interlocked counter retires the loader as soon as its last assembly finishes; peak working set scales with the slowest-finishing group, not the total number of groups times their references.
- Streaming type merger. The parallel walk feeds `ApiCatalog`s into `StreamingTypeMerger` one at a time and immediately drops the reference. Catalogs don't accumulate in a `ConcurrentBag` waiting for the walk phase to finish.
- Capture-free parallel dispatch. The `Parallel.ForEachAsync` lambda is `static`; every dependency it touches is bundled into a `WalkContext` record attached to each work item, so dispatch never allocates a closure object per assembly.
- Lazy `RenderedDoc` facade for emit-time conversion. Walker output carries raw inner-XML fragments. Each emitter constructs an `XmlDocToMarkdown(ICrefResolver)` and wraps each symbol's documentation in a `RenderedDoc` that converts each text-shaped field on first read, caches the result, and skips fields the page doesn't consume. Zensical and docfx pick their own cref form (`[name][uid]` autoref vs `<xref:uid>` / Microsoft Learn URL) without the walker baking either in.
- Thread-static `PageBuilderPool`. Each emit thread reuses one `StringBuilder` across page composition calls via a `using`-scoped rental; pages clear the builder between uses instead of allocating fresh.
- `PageWriter` streams chunks to disk. The composed `StringBuilder` flushes via `GetChunks()` straight through a UTF-8 encoder + `ArrayPool<byte>` buffer into an unbuffered `FileStream`. The whole-page string and the 64 KB BufferedFileStreamStrategy buffer never need to exist.
- Shared `CatalogIndexes` rollup. Derived-class lookup, reverse extension-method lookup, and per-type inherited-member uid lists are built once per emit run in a single O(N) sweep and frozen via `FrozenDictionary`. Each emitter passes its own `System.Object` baseline UIDs (docfx bare names, Zensical `M:`-prefixed commentIds) so the algorithm stays shared while the wire format stays per-emitter.
- Pre-sized buffers and stackalloc paths. nupkg zip entries size their backing `byte[]` to the known uncompressed length up front. SourceLink URL rewriting and `ZensicalCrefResolver`'s Microsoft Learn link composer build their result strings via `stackalloc` + `new string(span)` so the only heap allocation is the returned string itself.
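
As an illustration of the chunk-streaming idea described for `PageWriter` above, the following sketch writes a `StringBuilder` to disk without ever materialising the whole page as a string. It uses only BCL calls, but it is a guess at the general shape rather than the library's code: `ChunkedPageWriter`, its `Write` method and the 4 KB rented buffer are all hypothetical.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;

// Hypothetical helper: streams a StringBuilder to a file chunk by chunk.
static class ChunkedPageWriter
{
    public static void Write(StringBuilder page, string path)
    {
        // bufferSize: 1 disables FileStream's internal buffering layer.
        using var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize: 1);

        // A stateful encoder carries surrogate pairs that straddle two chunks.
        Encoder encoder = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false).GetEncoder();
        byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            foreach (ReadOnlyMemory<char> chunk in page.GetChunks())
            {
                ReadOnlySpan<char> chars = chunk.Span;
                while (!chars.IsEmpty)
                {
                    // Encode as much of the chunk as fits in the rented buffer.
                    encoder.Convert(chars, buffer, flush: false, out int charsUsed, out int bytesUsed, out _);
                    stream.Write(buffer, 0, bytesUsed);
                    chars = chars.Slice(charsUsed);
                }
            }

            // Flush any state the encoder is still holding after the last chunk.
            encoder.Convert(ReadOnlySpan<char>.Empty, buffer, flush: true, out _, out int tailBytes, out _);
            stream.Write(buffer, 0, tailBytes);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

The only allocations on this path are the `FileStream` and the encoder; the byte buffer is rented and returned, and the builder's internal chunks are read in place.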
```
SourceDocParserLib/
  src/
    SourceDocParser/
    SourceDocParser.NuGet/
    SourceDocParser.Docfx/
    SourceDocParser.Zensical/
    tests/
      SourceDocParser.Tests/             unit tests (TUnit)
      SourceDocParser.IntegrationTests/  end-to-end + Zensical render-smoke
    Directory.Build.props                shared lib config
    Directory.Packages.props             central package versions
    SourceDocParserLib.slnx
  Directory.Build.props
  version.json                           Nerdbank.GitVersioning
  .editorconfig
  stylecop.json
```
`dotnet build` from `src/` packs every non-test project into `artifacts/packages/` automatically (`<GeneratePackageOnBuild>true</GeneratePackageOnBuild>`). Consumers in other repos can wire that directory up as a local feed via `nuget.config` until the libraries are published.
The metadata extraction pipeline is inspired by — and lifts patterns from — dotnet/docfx (MIT licensed). docfx's Roslyn-based assembly walker, inheritdoc resolution, and overall metadata model shaped this library's design. See LICENSE for the original docfx attribution.
Built on:
- Roslyn (Microsoft.CodeAnalysis.CSharp) for compilation + symbol model
- ICSharpCode.Decompiler for transitive reference resolution
- NuGet.Frameworks + NuGet.Versioning for proper TFM compatibility and SemVer ordering
- Polly v8 for HTTP retry/rate-limit pipelines
MIT — see LICENSE for the full text and the docfx attribution.