.NET: Python: Add dotnet integration test report to CI #5515
Merged
Conversation
Pull request overview
This PR adds CI visibility for .NET integration test outcomes by publishing JUnit XML from the existing dotnet-test matrix legs and reusing the existing Python trend-aggregation script to generate a Job Summary report with cached history.
Changes:
- Update the .NET integration test step to emit JUnit XML into a dedicated `IntegrationTestResults/` directory and upload those XML files as per-matrix artifacts.
- Add a new `dotnet-integration-test-report` job that downloads the artifacts, aggregates them into a trend report, posts it to the GitHub Actions Job Summary, and caches history.
- Refactor `python/scripts/flaky_report/aggregate.py` to discover both `pytest.xml` and `*.junit.xml`, derive dotnet "provider" labels, and avoid nodeid collisions across providers.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `python/scripts/flaky_report/aggregate.py` | Extends report discovery/parsing to support dotnet xUnit JUnit XML and multi-provider collision handling. |
| `.github/workflows/dotnet-build-and-test.yml` | Generates/uploads dotnet integration JUnit XML and adds a reporting job to aggregate and publish a trend report. |
- Add `--report-junit` flag to dotnet integration test step to generate JUnit XML alongside TRX, with explicit `--results-directory` to centralize output in `IntegrationTestResults/`
- Upload JUnit XML artifacts from each matrix leg (net10.0/ubuntu, net472/windows) as `dotnet-test-results-{framework}-{os}`
- Add `dotnet-integration-test-report` job that downloads artifacts, runs the existing `aggregate.py` script, posts markdown to Job Summary, and saves trend history via `actions/cache`
- Refactor `aggregate.py` to discover JUnit XML files recursively, supporting both pytest (`pytest.xml`) and xunit (`*.junit.xml`) layouts
- Handle provider name derivation for the dotnet artifact naming convention
- Fix nodeid collision when the same test runs under multiple frameworks by qualifying keys with provider when collisions are detected
- Improve module extraction for dotnet C# classnames (recognizes IntegrationTests/UnitTests namespace segments)
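The module-extraction heuristic in the last bullet might look roughly like this in Python. This is an illustrative sketch only: the function name, fallback behavior, and sample classnames are assumptions, not the actual `aggregate.py` implementation.

```python
def extract_module(classname: str) -> str:
    """Derive a short module label from a C# fully-qualified classname.

    Sketch of the heuristic: if a namespace segment such as
    'IntegrationTests' or 'UnitTests' appears, the segment after it
    (when one exists before the class name) is used as the module;
    otherwise fall back to the namespace segment containing the class.
    """
    parts = classname.split(".")
    for marker in ("IntegrationTests", "UnitTests"):
        if marker in parts:
            idx = parts.index(marker)
            # A segment must exist between the marker and the class name itself
            if idx + 1 < len(parts) - 1:
                return parts[idx + 1]
            return marker
    # Fallback: the namespace segment directly containing the class
    return parts[-2] if len(parts) >= 2 else classname


# Hypothetical classnames, for illustration only:
print(extract_module("Contoso.Agents.IntegrationTests.Samples.ConsoleAppSamplesValidation"))
# → Samples
```

The point of the marker-based lookup is that dotnet classnames carry deep namespaces, so taking the raw namespace prefix as the "module" would produce long, noisy labels in the report.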
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xUnit v3 generates files with .junit extension, not .junit.xml. Update upload glob and aggregate.py discovery to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
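The corrected discovery logic could be sketched with a `pathlib` walk. The helper name mirrors the `_discover_xml_files()` mentioned later in the description, but the body here is an assumption, not the real script:

```python
from pathlib import Path


def discover_xml_files(root: str) -> list[Path]:
    """Recursively find JUnit-style reports under 'root':
    pytest's 'pytest.xml' plus xUnit v3's '*.junit' files
    (v3 emits a bare .junit extension, not .junit.xml)."""
    base = Path(root)
    # Sort each group so results are deterministic across runs
    return sorted(base.rglob("pytest.xml")) + sorted(base.rglob("*.junit"))
```

A glob of `*.junit.xml` would silently match nothing against xUnit v3 output, which is why the upload glob and the discovery pattern both had to change together.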
Always prefix dotnet test keys with provider (e.g. net10.0 (ubuntu)::TestName) to ensure stable, comparable counts across runs regardless of file parse order. Also show Executed (passed+failed) instead of Total in summary table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
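The key-prefixing and "Executed" changes can be sketched as follows. The helper names are illustrative; only the naming convention (`dotnet-test-results-{framework}-{os}`) and the `provider::test` key shape come from the PR itself:

```python
def provider_label(artifact_dir: str) -> str:
    """Map an artifact directory name like 'dotnet-test-results-net10.0-ubuntu'
    to a provider label like 'net10.0 (ubuntu)'."""
    rest = artifact_dir.removeprefix("dotnet-test-results-")
    framework, _, os_name = rest.rpartition("-")
    return f"{framework} ({os_name})"


def qualified_key(provider: str, test_name: str) -> str:
    """Always prefix with the provider so keys are stable and comparable
    across runs, regardless of file parse order."""
    return f"{provider}::{test_name}"


def executed(passed: int, failed: int) -> int:
    """'Executed' column: passed + failed (skipped tests excluded)."""
    return passed + failed


print(qualified_key(provider_label("dotnet-test-results-net10.0-ubuntu"),
                    "LongRunningToolsSampleValidationAsync"))
# → net10.0 (ubuntu)::LongRunningToolsSampleValidationAsync
```

Prefixing unconditionally, rather than only on detected collisions, avoids a subtle instability: with collision-only qualification, the same test could be keyed differently depending on which report file happened to be parsed first.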
The LongRunningToolsSampleValidationAsync test in the AzureFunctions integration tests was failing in CI with TimeoutException at the 'Content published notification is logged' step. The 90-second timeouts are too tight for CI environments where LLM calls and orchestration overhead can be slow. Increased all three WaitForConditionAsync timeouts from 90s to 180s:
- Waiting for human feedback notification
- Waiting for publish notification (the step that was failing)
- Waiting for orchestration completion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
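The polling pattern behind a helper like `WaitForConditionAsync` can be sketched in Python. The C# helper itself is not shown in this PR excerpt, so the name, signature, and defaults below are assumptions; only the 180s figure comes from the commit:

```python
import time


def wait_for_condition(condition, timeout_s: float = 180.0, poll_s: float = 0.5) -> None:
    """Poll 'condition' until it returns True or 'timeout_s' elapses.
    Python analog of the C# WaitForConditionAsync pattern; the 180s
    default mirrors the relaxed CI timeout described above."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(poll_s)
    raise TimeoutError(f"Condition not met within {timeout_s}s")
```

With this shape, raising the timeout only changes how long a *failing* run waits; a passing run returns as soon as the condition holds, so the change costs nothing on green runs.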
Merge upstream/main which renamed scripts/flaky_report/ to scripts/integration_test_report/ (from Python PR #5454). Update the dotnet-build-and-test workflow to reference the new path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These tests interact with LLMs via stdin/stdout (DurableTask) or HTTP (AzureFunctions) and are inherently non-deterministic. Unlike the Python side, which uses pytest-retry, the dotnet tests had no retry mechanism, and a single transient failure would fail the entire CI run. Changes:
- Switch [Fact] to [RetryFact(2, 5000)] on all LLM-dependent tests across ConsoleAppSamplesValidation, ExternalClientTests, WorkflowConsoleAppSamplesValidation, and AzureFunctions SamplesValidation
- Add re-prompt mechanism to LongRunningToolsSampleValidationAsync: if the LLM doesn't invoke the tool within 60s, re-send the prompt (up to 2 retries) instead of burning the full timeout
- Reduce LongRunningTools timeout from 240s to 180s (re-prompt makes the extra buffer unnecessary)
- Leave simple/deterministic tests as [Fact] (SingleAgent, unit tests)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
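The retry semantics of `[RetryFact(2, 5000)]` (two attempts, 5000ms delay between them) can be sketched as a Python analog. This is a hand-rolled illustration, not the xUnit attribute's actual implementation:

```python
import time


def run_with_retry(test_fn, max_attempts: int = 2, delay_s: float = 5.0):
    """Analog of [RetryFact(2, 5000)]: re-run a flaky test up to
    'max_attempts' times total, sleeping 'delay_s' between attempts.
    Assumes max_attempts >= 1; re-raises the last failure if all
    attempts fail."""
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return test_fn()
        except Exception as exc:  # e.g. a transient LLM failure
            last_exc = exc
            if attempt < max_attempts:
                time.sleep(delay_s)
    raise last_exc
```

The re-prompt mechanism described above applies the same idea at a finer granularity: instead of restarting the whole test after a full timeout, it re-sends only the prompt when the tool call fails to appear within a shorter window.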
Matches the convention used by other checkout steps in this workflow to avoid leaving GITHUB_TOKEN credentials in the local git config. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lokitoth
approved these changes
May 5, 2026
westey-m
reviewed
May 6, 2026
westey-m
approved these changes
May 6, 2026
peibekwe
approved these changes
May 6, 2026
Motivation and Context
Improve .NET integration test reliability and add visibility into test results across CI runs. This PR adds retry attributes (`[RetryFact]`) to flaky LLM-dependent integration tests to reduce spurious failures.

Description
Integration Test Report (`dotnet-build-and-test.yml`)
- Added `--report-junit` flag to integration test steps to generate JUnit XML output
- Added `--results-directory ../IntegrationTestResults/` to centralize output (separate from unit test TRX results)
- Uploads per-matrix artifacts (net10.0/ubuntu, net472/windows)
- Added a `dotnet-integration-test-report` job that aggregates results, generates a trend report, and posts to Job Summary
- Added `persist-credentials: false` to the checkout step for security consistency

The report job is not in the merge gate (`dotnet-build-and-test-check` doesn't depend on it) and only runs on non-PR events.

RetryFact for Flaky Integration Tests
- DurableTask (`ConsoleAppSamplesValidation.cs`, `ExternalClientTests.cs`, `WorkflowConsoleAppSamplesValidation.cs`): All LLM-dependent tests now use `[RetryFact(2, 5000)]` — retries once after 5s delay on transient LLM failures
- AzureFunctions (`SamplesValidation.cs`): Same retry pattern applied to all 7 active tests; LongRunningTools timeouts increased from 90s to 180s
python/scripts/integration_test_report/aggregate.py)pytest.xml) and xunit (*.junit) layouts via_discover_xml_files()dotnet-test-results-{framework}-{os})Key Design Decisions
- Used `--report-junit` instead of parsing TRX — xunit v3 supports native JUnit generation, allowing reuse of the existing Python report script
- A separate cache key prefix (`dotnet-integration-report-history-`) prevents dotnet/Python history from interleaving

Contribution Checklist