Add --crash-sequence option to MTP CrashDump extension (#7262)#8526
Add --crash-sequence option to MTP CrashDump extension (#7262)#8526Evangelink wants to merge 2 commits into
Conversation
Adds a testhost-side sequence file that records test state transitions (STARTED/ENDED) so the controller can list the tests that were running at the time of a crash without requiring dump analysis. Mirrors HangDump's UX but uses file-based persistence because the testhost is dead by the time the controller inspects it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds crash-time test “sequence file” journaling to the Microsoft.Testing.Platform (MTP) CrashDump extension, controlled by a new --crash-sequence on|off option, so users can see which tests were still running when a testhost crashes (and the file is also published as an artifact).
Changes:
- Introduces testhost-side
CrashDumpSequenceLoggerto append STARTED/ENDED records to a controller-provided sequence file path. - Adds controller-side parsing/publishing/cleanup of the sequence file on crash vs. graceful exit, and wires the path via
TESTINGPLATFORM_CRASHDUMP_SEQUENCE_FILE. - Updates command-line validation, help/info expectations, resources/XLFs, and adds unit + acceptance coverage for the new option.
Show a summary per file
| File | Description |
|---|---|
| test/UnitTests/Microsoft.Testing.Extensions.UnitTests/CrashDumpTests.cs | Adds unit tests covering --crash-sequence validation and the “missing main option” rule. |
| test/IntegrationTests/Microsoft.Testing.Platform.Acceptance.IntegrationTests/HelpInfoAllExtensionsTests.cs | Updates --help / --info expectations to include --crash-sequence. |
| test/IntegrationTests/Microsoft.Testing.Platform.Acceptance.IntegrationTests/CrashDumpTests.cs | Adds acceptance tests for sequence generation, disabling, cleanup on graceful exit, and invalid usage; updates the CrashDump test asset to publish fake in-progress nodes. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/CrashDumpResources.resx | Adds new localized strings for the option, banner text, and artifact metadata. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.cs.xlf | Regenerates localization entries for new CrashDump sequence resources (Czech). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.de.xlf | Regenerates localization entries for new CrashDump sequence resources (German). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.es.xlf | Regenerates localization entries for new CrashDump sequence resources (Spanish). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.fr.xlf | Regenerates localization entries for new CrashDump sequence resources (French). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.it.xlf | Regenerates localization entries for new CrashDump sequence resources (Italian). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.ja.xlf | Regenerates localization entries for new CrashDump sequence resources (Japanese). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.ko.xlf | Regenerates localization entries for new CrashDump sequence resources (Korean). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.pl.xlf | Regenerates localization entries for new CrashDump sequence resources (Polish). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.pt-BR.xlf | Regenerates localization entries for new CrashDump sequence resources (Portuguese - Brazil). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.ru.xlf | Regenerates localization entries for new CrashDump sequence resources (Russian). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.tr.xlf | Regenerates localization entries for new CrashDump sequence resources (Turkish). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.zh-Hans.xlf | Regenerates localization entries for new CrashDump sequence resources (Chinese Simplified). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Resources/xlf/CrashDumpResources.zh-Hant.xlf | Regenerates localization entries for new CrashDump sequence resources (Chinese Traditional). |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/Microsoft.Testing.Extensions.CrashDump.csproj | Links in RoslynString helper from MTP to support shared helper usage in this extension. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpSequenceLogger.cs | New testhost-side consumer/lifetime handler that writes the sequence journal file. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpProcessLifetimeHandler.cs | Adds crash-time parsing/banner output and artifact publication; deletes sequence file on graceful exit/hangdump kill. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpExtensions.cs | Wires the new composite CrashDumpSequenceLogger into the testhost extension pipeline. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpEnvironmentVariableProvider.cs | Computes and exports the sequence file path env var; persists it into CrashDump configuration for controller-side handling. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpConfiguration.cs | Adds SequenceFileName to share computed path with the process lifetime handler. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpCommandLineProvider.cs | Adds --crash-sequence option and validates boolean-like arguments; treats it as requiring --crashdump/--crash-report. |
| src/Platform/Microsoft.Testing.Extensions.CrashDump/CrashDumpCommandLineOptions.cs | Adds the crash-sequence option name constant. |
Copilot's findings
- Files reviewed: 25/25 changed files
- Comments generated: 3
| } | ||
| } | ||
|
|
||
| // TODO: Validate that the file name ends with '.dmp'? |
| catch (IOException ex) | ||
| { | ||
| // Best-effort diagnostic. If we cannot read the sequence file we still publish it so the | ||
| // user can inspect it manually, but skip the friendly summary. | ||
| await _outputDisplay.DisplayAsync(this, new ErrorMessageOutputDeviceData(string.Format(CultureInfo.InvariantCulture, CrashDumpResources.CrashDumpSequenceFileReadError, sequenceFilePath, ex.Message)), cancellationToken).ConfigureAwait(false); | ||
| await _messageBus.PublishAsync(this, new FileArtifact(new FileInfo(sequenceFilePath), CrashDumpResources.CrashDumpSequenceArtifactDisplayName, CrashDumpResources.CrashDumpSequenceArtifactDescription)).ConfigureAwait(false); | ||
| return; |
| // MessageBus.PublishAsync only enqueues; the async consumer pipeline may not have run | ||
| // yet. Give the CrashDumpSequenceLogger a brief opportunity to drain and flush before | ||
| // we crash so the sequence file actually contains the STARTED entries this test asserts | ||
| // on. This is a test-asset workaround for what is, in production, a best-effort feature | ||
| // (the message bus is intentionally asynchronous for throughput reasons). | ||
| await Task.Delay(2000); | ||
| } |
🔍 Build Failure AnalysisSummary — The build failed due to analyzer violations in the newly added Root cause 1: Synchronous I/O operations in async context (VSTHRD103)The new Design consideration: The PR description explicitly documents that the sequence file uses Affected files / errors All errors occur in
Proposed fix Two options:
Recommendation: Option 1 (suppress) aligns with the PR's stated performance design goals. The synchronous writes are load-bearing for the "survives Root cause 2: Legacy lock object (IDE0330)On .NET 9.0 only, line 42 uses Affected files / errors
Proposed fix Use TFM-conditional compilation to choose the appropriate lock type: #if NET9_0_OR_GREATER
private readonly System.Threading.Lock _writeLock = new();
#else
private readonly object _writeLock = new();
#endifBuild overview
All MSBuild errors (14)
🤖 Generated by the Build Failure Analysis workflow using binlog analysis · commit 74e7850
|
Evangelink
left a comment
There was a problem hiding this comment.
Generated by Build Failure Analysis for issue #8526 · ● 1.2M
| private readonly IEnvironment _environment; | ||
| private readonly IClock _clock; | ||
| private readonly ILogger<CrashDumpSequenceLogger> _logger; | ||
| private readonly object _writeLock = new(); |
There was a problem hiding this comment.
🔧 IDE0330 — Use System.Threading.Lock for .NET 9.0+
The .NET 9 analyzer requires using the new System.Threading.Lock type instead of object for lock fields. Since this project multi-targets including netstandard2.0, use TFM-conditional compilation:
| private readonly object _writeLock = new(); | |
| #if NET9_0_OR_GREATER | |
| private readonly System.Threading.Lock _writeLock = new(); | |
| #else | |
| private readonly object _writeLock = new(); | |
| #endif |
| { | ||
| try | ||
| { | ||
| _writer?.Flush(); |
There was a problem hiding this comment.
🔧 VSTHRD103 — Synchronous Flush in async method
This final flush in OnTestSessionFinishingAsync should also use the synchronous version to match the design. Suppress:
| _writer?.Flush(); | |
| #pragma warning disable VSTHRD103 // Synchronous flush intentional for crash-safety design | |
| _writer?.Flush(); | |
| #pragma warning restore VSTHRD103 |
| // trace the failure and behave as if the feature were disabled — failing the test run | ||
| // for this would be worse than missing the diagnostic. | ||
| _logger.LogWarning($"Failed to open crash sequence file '{_sequenceFilePath}': {ex.Message}"); | ||
| _writer?.Dispose(); |
There was a problem hiding this comment.
🔧 VSTHRD103 — Synchronous Dispose in async method
The class already implements IAsyncDisposable for NETCOREAPP, but in this error path we're calling synchronous Dispose(). Suppress the warning since this is an exception handler where async disposal would complicate error recovery:
| _writer?.Dispose(); | |
| #pragma warning disable VSTHRD103 // Synchronous disposal acceptable in exception handler | |
| _writer?.Dispose(); | |
| #pragma warning restore VSTHRD103 |
| var fileStream = new FileStream(_sequenceFilePath, FileMode.Create, FileAccess.Write, FileShare.Read); | ||
| _writer = new StreamWriter(fileStream, Encoding.UTF8) { AutoFlush = true }; | ||
| _writer.WriteLine(FileHeader); | ||
| _writer.Flush(); |
There was a problem hiding this comment.
🔧 VSTHRD103 — Synchronous Flush in async method
Similar to WriteLine, this synchronous flush is intentional for the low-latency design. Suppress:
| _writer.Flush(); | |
| _writer.Flush(); | |
| #pragma warning restore VSTHRD103 |
|
|
||
| try | ||
| { | ||
| _writer.WriteLine(line); |
There was a problem hiding this comment.
🔧 VSTHRD103 — Synchronous WriteLine in async method
This is the hot path (called per test). Synchronous write is load-bearing for the performance guarantee. Suppress:
| _writer.WriteLine(line); | |
| #pragma warning disable VSTHRD103 // Synchronous writes intentional for sub-microsecond per-test overhead | |
| _writer.WriteLine(line); | |
| #pragma warning restore VSTHRD103 |
| // because a sequence file is not required to survive an OS-level crash or power loss. | ||
| var fileStream = new FileStream(_sequenceFilePath, FileMode.Create, FileAccess.Write, FileShare.Read); | ||
| _writer = new StreamWriter(fileStream, Encoding.UTF8) { AutoFlush = true }; | ||
| _writer.WriteLine(FileHeader); |
There was a problem hiding this comment.
🔧 VSTHRD103 — Synchronous WriteLine in async method
The VS Threading analyzer flags this synchronous I/O call. Since the PR design explicitly uses synchronous writes for sub-microsecond performance (documented as load-bearing for crash-safety), suppress the warning:
| _writer.WriteLine(FileHeader); | |
| #pragma warning disable VSTHRD103 // Synchronous writes intentional for crash-safety (see PR #8526 design) | |
| _writer.WriteLine(FileHeader); | |
| #pragma warning restore VSTHRD103 |
- Convert sync StreamWriter calls to async (WriteLineAsync/FlushAsync/DisposeAsync) to satisfy VSTHRD103. - Replace plain `lock` with SemaphoreSlim so we can await inside the critical section (also avoids IDE0330 push to System.Threading.Lock which would require multi-TFM gymnastics). - Suppress CA1416 narrowly on the sync Dispose path's SemaphoreSlim.Wait(), explaining that the path is unreachable on browser because the controller-side env var provider requires NETCOREAPP. - Fix CrashDump test asset to use the netfx-compatible char[] overload of String.Split. End-to-end validation: full `build.cmd -pack` now succeeds on origin/main; 19 CrashDumpTests acceptance tests pass (3 Windows-only skips for --crash-report); 9 HelpInfoAllExtensionsTests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #7262.
What
Adds a
--crash-sequence on|offoption (defaulton) toMicrosoft.Testing.Extensions.CrashDumpthat generates a sequence file recording the start and end of each test as the run progresses. On a crash, the controller parses the file, prints a friendly "tests still running at the time of the crash" banner, and publishes the file as aFileArtifact.This addresses the long-standing pain point from the issue: today, on
--crashdump/--crash-report, users have to open the.dmpto know which test was running at crash time.--hangdumpalready produces an equivalent.logover IPC, but that path is unavailable for crashes because the testhost is dead.How
CrashDumpSequenceLogger(new testhostIDataConsumer + ITestSessionLifetimeHandler) appendsSTARTED\t<iso>\t<uid>\t<displayName>/ENDED\t<iso>\t<uid>\t<state>lines perTestNodeUpdateMessageto a sequence file. UsesStreamWriter.AutoFlush=true(nofsync) so records reach the OS cache and surviveEnvironment.FailFastat sub-microsecond cost per write.CrashDumpEnvironmentVariableProvidercomputes a path with a per-controller 8-char GUID token (avoids cross-host collisions for parallel testhost launches into the same results directory) and exposes it viaTESTINGPLATFORM_CRASHDUMP_SEQUENCE_FILE.CrashDumpProcessLifetimeHandleron crash: parses the file, computesstarted \ endedset, prints the friendly banner, publishes the file. On graceful exit / hangdump kill: deletes the file (no diagnostic value).--crash-sequencefollows the--ansistyle (on/off/true/false/enable/disable/1/0), validated by the existingCommandLineOptionArgumentValidator.Performance
The cost is two small writes per test (~10 µs each, no fsync). For 10 000 tests this is ~200 ms total — well under 1 % on huge suites. The user discussed and confirmed this in the design conversation; the default-on policy is justified by the very low cost.
Limitations (documented in source)
FailFasts within a few ms may not be recorded because the MTP message bus is asynchronous by design. This matches the same trade-off described in the issue ("mtp does not send the currently started test to testhost controller because of possible overhead"). The added value is for the much more common "test runs for some time, then crashes" pattern.fsync).Tests
--crash-sequencewithout--crashdump/--crash-report.FailFast(with a small sync delay to give the async consumer time to drain, since this is a deterministic test setup).UpdateXlf.Out of scope
This PR does not change
--hangdumpbehavior: it already generates an equivalent in-progress test list over IPC.