ci: split DDS coverage shards and parallelize merge-tree farm tests by frankmueller-msft · Pull Request #26586 · microsoft/FluidFramework

frankmueller-msft · 2026-02-27T17:12:08Z

Summary

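Split the single DDS mocha coverage shard (17m 04s) into 4 parallel DDS jobs, which run alongside the non-DDS shard for 5 mocha coverage shards in total. The slowest of the new shards takes 8m 08s, a 52% reduction in the longest coverage test job. The DDS shard was the primary bottleneck on the build-client pipeline's wall-clock time.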

Changes:

Split ci:test:mocha:dds into 4 DDS shards (tree, merge-tree:farm, merge-tree:unit, other); together with non-dds this gives 5 mocha coverage shards
Add mocha --parallel to the merge-tree farm shard (farm/fuzz tests with 5-minute timeouts dominate execution time)
Add dedicated .mocharc.farm.cjs and .mocharc.unit.cjs configs in packages/dds/merge-tree/
Add timeoutInMinutes and timing budget enforcement to coverage jobs (consistent with test jobs)
Skip coverage artifact publishing and Merge Coverage Reports job in the Internal project (where testCoverage is disabled)
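
For illustration, a dedicated farm config could look roughly like the following. This is a minimal sketch, not the file from this PR: the spec globs and the exact timeout value are assumptions based on this description (farm/fuzz test files, 5-minute timeouts), while `spec`, `parallel`, and `timeout` are standard mocha config keys.

```javascript
// Hypothetical sketch of packages/dds/merge-tree/.mocharc.farm.cjs.
// The glob patterns below are assumed; the real test output paths may differ.
const farmConfig = {
  // Only pick up the long-running farm/fuzz suites.
  spec: ["lib/test/**/*Farm*.spec.js", "lib/test/**/beastTest*.spec.js"],
  // Run spec files in separate worker processes (mocha's --parallel mode).
  parallel: true,
  // Farm tests are described as having 5-minute timeouts (value in milliseconds).
  timeout: 5 * 60 * 1000,
};

module.exports = farmConfig;
```

A matching unit config would presumably use the same shape with the farm globs excluded and `parallel` left off.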

Performance Results

Pipeline total: 41m 19s → 36m 15s (12% faster)

Baseline pipeline (critical path in bold)

gantt
    title Baseline — 41m 19s
    dateFormat mm-ss
    axisFormat %M:%S

    section Build
    Build ~18m                       :done, b, 00-00, 18m

    section Coverage
    MochaTestDds 17m 04s             :crit, done, c1, after b, 17m
    MochaTestNonDds 9m 08s           :done, c2, after b, 9m
    RealsvcLocalTest 6m 08s          :done, c3, after b, 6m
    Merge Coverage 5m 27s            :crit, done, m, after c1, 5m

    section Task Tests
    RealsvcTinyliciousTest 16m 47s   :done, t1, after b, 17m
    JestTest ~5m 30s                 :done, t2, after b, 6m
    StressTinyliciousTest ~5m        :done, t3, after b, 5m

Optimized pipeline (build #380986, critical path in bold)

gantt
    title Optimized — 36m 15s
    dateFormat mm-ss
    axisFormat %M:%S

    section Build
    Build 17m 48s                         :done, b, 00-00, 18m

    section Coverage (changed)
    MochaTestDdsMergeTreeUnit 8m 08s      :crit, done, c1, after b, 8m
    MochaTestDdsTree 6m 51s               :done, c2, after b, 7m
    MochaTestNonDds 6m 41s                :done, c3, after b, 7m
    MochaTestDdsMergeTreeFarm 6m 26s      :done, c4, after b, 6m
    RealsvcLocalTest 6m 08s               :done, c5, after b, 6m
    MochaTestDdsOther 4m 45s              :done, c6, after b, 5m
    Merge Coverage 5m 37s                 :crit, done, m, after c1, 6m

    section Task Tests (unchanged)
    RealsvcTinyliciousTest 14m 42s        :done, t1, after b, 15m
    JestTest 4m 52s                       :done, t2, after b, 5m
    StressTinyliciousTest 3m 29s          :done, t3, after b, 4m

Coverage shard breakdown

Baseline Shard	Time		Optimized Shard(s)	Time
MochaTestDds (all 16 DDS packages)	17m 04s	→	MochaTestDdsMergeTreeUnit	8m 08s
			MochaTestDdsTree	6m 51s
			MochaTestDdsMergeTreeFarm (--parallel)	6m 26s
			MochaTestDdsOther (13 packages)	4m 45s
MochaTestNonDds	9m 08s	→	MochaTestNonDds	6m 41s
RealsvcLocalTest	6m 08s	→	RealsvcLocalTest	6m 08s
Merge Coverage Reports	5m 27s	→	Merge Coverage Reports	5m 37s
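
The headline percentages can be re-derived from the durations in this table. The snippet below is only a sketch for checking the arithmetic; the helper names `toSeconds` and `percentReduction` are made up for this example.

```javascript
// Convert a duration such as "17m 04s" into seconds.
function toSeconds(text) {
  const match = /^(\d+)m (\d+)s$/.exec(text);
  if (match === null) {
    throw new Error(`Unrecognized duration: ${text}`);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// Percentage reduction from `before` to `after`, rounded to the nearest integer.
function percentReduction(before, after) {
  const b = toSeconds(before);
  const a = toSeconds(after);
  return Math.round(((b - a) / b) * 100);
}

// Longest DDS coverage job: MochaTestDds vs. MochaTestDdsMergeTreeUnit.
const ddsReduction = percentReduction("17m 04s", "8m 08s");
// Whole pipeline wall-clock time.
const pipelineReduction = percentReduction("41m 19s", "36m 15s");

console.log(`DDS: ${ddsReduction}%, pipeline: ${pipelineReduction}%`);
// prints "DDS: 52%, pipeline: 12%"
```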

Why not split tinylicious tests?

The single tinylicious shard (14m 42s) runs in parallel with the coverage path (8m 08s + 5m 37s merge = 13m 45s after build). On these job durations alone the two are within about a minute of each other (the tinylicious shard is 57s longer), so splitting it would add complexity for at most a small reduction in pipeline time.
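
The comparison can be reproduced from the job durations of the optimized run. The sketch below is illustrative only: it assumes the coverage path is the slowest coverage job followed by the merge job, and it ignores queueing and setup time, which the figures in this description do not break out.

```javascript
// Convert a duration such as "8m 08s" into seconds.
function toSeconds(text) {
  const match = /^(\d+)m (\d+)s$/.exec(text);
  if (match === null) {
    throw new Error(`Unrecognized duration: ${text}`);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// Format seconds back into the "13m 45s" style used in this description.
function formatDuration(seconds) {
  return `${Math.floor(seconds / 60)}m ${String(seconds % 60).padStart(2, "0")}s`;
}

// Durations from the optimized run (build #380986).
const coverageJobs = {
  MochaTestDdsMergeTreeUnit: "8m 08s",
  MochaTestDdsTree: "6m 51s",
  MochaTestNonDds: "6m 41s",
  MochaTestDdsMergeTreeFarm: "6m 26s",
  RealsvcLocalTest: "6m 08s",
  MochaTestDdsOther: "4m 45s",
};
const mergeCoverage = "5m 37s";
const tinylicious = "14m 42s";

// The merge job waits for every coverage job, so the slowest one gates it.
const slowestCoverage = Math.max(...Object.values(coverageJobs).map(toSeconds));
const coveragePath = slowestCoverage + toSeconds(mergeCoverage);
const tinyliciousPath = toSeconds(tinylicious);
const gapSeconds = tinyliciousPath - coveragePath;

console.log(`coverage path after build: ${formatDuration(coveragePath)}`);
console.log(`tinylicious after build:   ${formatDuration(tinyliciousPath)}`);
console.log(`difference: ${gapSeconds}s`);
```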

Internal project behavior

This pipeline runs in both the Public and Internal ADO projects. Coverage instrumentation (c8) is only enabled in Public (testCoverage: ${{ eq(variables['System.TeamProject'], 'public') }}).

In the Internal project:

The 5 coverage test jobs still run the tests (without c8), so tests execute faster without instrumentation overhead
Coverage artifact publishing is skipped (no nyc/.nyc_output data to publish)
The Merge Coverage Reports job is skipped entirely (no coverage data to merge)

This avoids wasting an agent slot on the Merge Coverage job in Internal, where it would only do setup and then fail on empty data.
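
The per-project behavior can be summarized as a small decision function. This is a model of the behavior described above, written in JavaScript rather than pipeline YAML; the function and property names are hypothetical, and the case-insensitive comparison reflects how Azure Pipelines' `eq()` is understood to treat strings.

```javascript
// Model of the compile-time expression
//   testCoverage: ${{ eq(variables['System.TeamProject'], 'public') }}
// and of the steps/jobs that this PR conditions on it.
function planCoverageStages(teamProject) {
  // eq() is modeled as ignoring case when comparing strings.
  const testCoverage = teamProject.toLowerCase() === "public";
  return {
    // The coverage test jobs always run the tests themselves.
    runCoverageTestJobs: true,
    // c8 instrumentation is only applied when coverage is enabled.
    instrumentWithC8: testCoverage,
    // No coverage output exists without instrumentation, so publishing is skipped.
    publishCoverageArtifacts: testCoverage,
    // The merge job is skipped entirely when there is nothing to merge.
    runMergeCoverageJob: testCoverage,
  };
}

console.log(planCoverageStages("public"));
console.log(planCoverageStages("internal"));
```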

Files changed

File	Change
`package.json`	Add DDS shard scripts
`packages/dds/merge-tree/package.json`	Add farm/unit mocha scripts
`packages/dds/merge-tree/.mocharc.farm.cjs`	New: farm/fuzz test config with `parallel: true`
`packages/dds/merge-tree/.mocharc.unit.cjs`	New: unit test config (excludes farm tests)
`tools/pipelines/build-client.yml`	1 coverage entry → 5 entries
`tools/pipelines/templates/build-npm-client-package.yml`	Add timeouts, timing budgets; condition coverage artifacts + merge job on `testCoverage`

Coverage verification

The shard split is purely organizational — it changes which CI job runs which tests, not which tests are run. Every test that ran before still runs exactly once:

- All 16 DDS packages are covered across the 4 DDS shards with no gaps or overlaps:
  - tree shard: @fluidframework/tree (184 test files — the single largest DDS package)
  - merge-tree:farm shard: 9 farm/fuzz test files (*Farm*, beastTest*) run with --parallel
  - merge-tree:unit shard: 47 unit test files (everything in merge-tree except farm tests)
  - other shard: remaining 13 DDS packages (cell, counter, map, matrix, sequence, etc.)
- Non-DDS packages (non-dds shard) and real-service local tests are unchanged
- The pnpm --filter expressions are complementary: ./packages/dds/tree + ./packages/dds/merge-tree (farm + unit configs) + ./packages/dds/** !tree !merge-tree = all of ./packages/dds/**
- Merge Coverage Reports successfully merges all 5 shard artifacts (confirmed in build #380986)
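
The "no gaps or overlaps" property is a partition check, which can be sketched as follows. This is not how the PR verified coverage; it is an illustrative helper with a hypothetical name (`verifyPartition`), and the unit list is abbreviated to the packages named above rather than the full DDS set.

```javascript
// Check that every test unit is assigned to exactly one shard.
function verifyPartition(allUnits, shards) {
  const counts = new Map(allUnits.map((unit) => [unit, 0]));
  const unknown = [];
  for (const units of Object.values(shards)) {
    for (const unit of units) {
      if (counts.has(unit)) {
        counts.set(unit, counts.get(unit) + 1);
      } else {
        unknown.push(unit); // assigned to a shard but not in the expected set
      }
    }
  }
  const missing = [...counts].filter(([, n]) => n === 0).map(([unit]) => unit);
  const duplicated = [...counts].filter(([, n]) => n > 1).map(([unit]) => unit);
  const ok = missing.length === 0 && duplicated.length === 0 && unknown.length === 0;
  return { ok, missing, duplicated, unknown };
}

// merge-tree is split by config, so it appears as two units (farm and unit).
const allUnits = [
  "tree",
  "merge-tree:farm",
  "merge-tree:unit",
  "cell",
  "counter",
  "map",
  "matrix",
  "sequence",
];
const shards = {
  tree: ["tree"],
  "merge-tree:farm": ["merge-tree:farm"],
  "merge-tree:unit": ["merge-tree:unit"],
  other: ["cell", "counter", "map", "matrix", "sequence"],
};

const result = verifyPartition(allUnits, shards);
console.log(result.ok); // true
```

Dropping an entry from `other` would surface it in `missing`, and listing a package in two shards would surface it in `duplicated`.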

Test plan

CI pipeline passes with all 5 coverage shards running in parallel
Merge Coverage Reports successfully merges all shard artifacts
Pipeline wall-clock time reduced vs baseline (41m 19s → 36m 15s)
No test coverage gaps — all 16 DDS packages covered exactly once across shards
Verify Merge Coverage Reports job is skipped in Internal project

Supersedes #26559, #26562, #26571.

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR combines three pipeline parallelization optimizations (#26559, #26562, #26571) to significantly reduce the client build pipeline execution time. The changes introduce parallel coverage test jobs, shard mocha tests into DDS and non-DDS groups, move post-build work (docs, bundle analysis, devtools) into parallel jobs, and add timing budget enforcement to catch performance regressions.

Changes:

Parallelized coverage tests with individual jobs for each test type and a merge job to combine results
Sharded mocha tests into DDS (packages/dds/**) and non-DDS (!packages/dds/**) groups using pnpm filters
Extracted docs build, bundle analysis, and devtools build from the main build job into parallel post-build jobs that run concurrently with test jobs
Folded AreTheTypesWrong check into the build job (eliminating a separate test job)
Added timing budget enforcement template that warns when jobs exceed their expected duration
Increased npm pack concurrency from 1 to 4

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`tools/pipelines/templates/include-steps-timing-budget.yml`	New template for recording job start time and checking if elapsed time exceeds budget, emitting ADO warnings for performance regressions
`tools/pipelines/templates/build-npm-client-package.yml`	Major refactoring: parallelized coverage jobs with merge step; extracted docs/bundle/devtools into parallel jobs; added timing budgets to build, merge_coverage, test, and post-build jobs; integrated AreTheTypesWrong into build job
`tools/pipelines/build-client.yml`	Updated coverage test configuration to use sharded mocha tests (dds/non-dds); enabled taskCheckAreTheTypesWrong parameter
`scripts/pack-packages.sh`	Increased flub exec concurrency from 1 to 4 for pack operations to speed up npm pack step
`package.json`	Added 6 new scripts for DDS/non-DDS mocha test variants (test:mocha:dds, test:mocha:non-dds, and their CI/coverage equivalents)

Copilot · 2026-02-27T17:17:54Z

tools/pipelines/templates/build-npm-client-package.yml

+          - job: Merge_coverage
+            displayName: "Merge Coverage Reports"
+            dependsOn:
+              - build
+              - ${{ each test in parameters.coverageTests }}:
+                - Coverage_${{ test.jobName }}
+            condition: succeededOrFailed()


The Merge_coverage job is missing a timeoutInMinutes setting. Other jobs in the pipeline have explicit timeouts: the build job has 120 minutes, Test jobs have 45 minutes, and parallel post-build jobs have 30 minutes. Consider adding timeoutInMinutes: 45 to ensure the merge job doesn't hang indefinitely if something goes wrong.

Fixed — added timeoutInMinutes: 45 to the Merge_coverage job.

Copilot · 2026-02-27T17:17:54Z

tools/pipelines/templates/build-npm-client-package.yml

+          - ${{ each test in parameters.coverageTests }}:
+            - job: Coverage_${{ test.jobName }}
+              displayName: "Coverage ${{ test.jobName }}"
+              dependsOn: build


The individual Coverage jobs are missing a timeoutInMinutes setting. The Test jobs have timeoutInMinutes: 45 (line 777), and the parallel post-build jobs have timeoutInMinutes: 30. Coverage jobs should also have an explicit timeout to prevent them from running indefinitely if something goes wrong. Based on the PR description's timing budget table showing coverage tests at 35 minutes, consider adding timeoutInMinutes: 45 to align with other test jobs.

Suggested change

-              dependsOn: build
+              dependsOn: build
+              timeoutInMinutes: 45

Fixed — added timeoutInMinutes: 45 to the Coverage jobs.

Copilot · 2026-02-27T17:17:55Z

tools/pipelines/templates/build-npm-client-package.yml

+              steps:
+                # Setup
+                - checkout: self
+                  path: $(FluidFrameworkDirectory)
+                  clean: true
+                  lfs: '${{ parameters.checkoutSubmodules }}'
+                  submodules: '${{ parameters.checkoutSubmodules }}'
+
+                - script: |
+                    echo "commit: $(COMMIT_SHA)"
+                    git fetch origin $(COMMIT_SHA)
+                    git checkout $(COMMIT_SHA)
+                  displayName: "Checkout build commit"
+
+                - template: /tools/pipelines/templates/include-use-node-version.yml@self
+
+                - template: /tools/pipelines/templates/include-install.yml@self
+                  parameters:
+                    packageManager: '${{ parameters.packageManager }}'
+                    buildDirectory: '${{ parameters.buildDirectory }}'
+                    packageManagerInstallCommand: '${{ parameters.packageManagerInstallCommand }}'
+
+                - task: DownloadPipelineArtifact@2
+                  inputs:
+                    artifact: build_output_archive
+                    targetPath: $(Build.StagingDirectory)
+
+                - script: |
+                    echo "Extracting build output archive contents..."
+                    tar --extract --gzip --file $(Build.StagingDirectory)/build_output_archive.tar.gz --directory $(Pipeline.Workspace)/${{ parameters.buildDirectory }}
+                  displayName: Extract Build Output Contents
+
+                # Set variable startTest if everything is good so far and we'll start running tests,
+                # so that the steps to process/upload test coverage results only run if we got to the point of actually running tests.
+                - script: |
+                    echo "##vso[task.setvariable variable=startTest]true"
+                  displayName: Start Test
+
+                - template: /tools/pipelines/templates/include-test-task.yml@self
+                  parameters:
+                    taskTestStep: '${{ test.name }}'
+                    buildDirectory: '${{ parameters.buildDirectory }}'
+                    testCoverage: '${{ parameters.testCoverage }}'
+
+                - task: Npm@1
+                  displayName: 'npm run test:copyresults'
+                  condition: and(succeededOrFailed(), eq(variables['startTest'], 'true'))
+                  inputs:
+                    command: custom
+                    workingDir: '$(Pipeline.Workspace)/${{ parameters.buildDirectory }}'
+                    customCommand: 'run test:copyresults'
+
+                # Process test result, include publishing and logging
+                - template: /tools/pipelines/templates/include-process-test-results.yml@self
+                  parameters:
+                    buildDirectory: '${{ parameters.buildDirectory }}'
+                    testResultDirs: '${{ parameters.testResultDirs }}'
+


The individual Coverage jobs are missing timing budget enforcement steps. The Merge_coverage job has timing budget checks (lines 609-613 for start, 755-759 for check), and all Test jobs have them (lines 794-798 for start, 914-918 for check). For consistency and to catch performance regressions in individual coverage shards, consider adding timing budget steps to the Coverage jobs. Based on the PR description, a budget of 35 minutes would be appropriate.

Fixed — added timing budget start/check steps to Coverage jobs with a 25-minute budget, matching the Test jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
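
The timing budget template itself is not shown on this page. As a rough sketch of the kind of check described (record a start time, compare elapsed time to a budget, emit a warning), the logic could look like the following. The function name and message wording are assumptions; `##vso[task.logissue type=warning]` is the Azure Pipelines logging command for surfacing a warning, and the job name is assumed from the `Coverage_${{ test.jobName }}` pattern.

```javascript
// Returns the log line a budget check step would print,
// or null when the job finished within its budget.
function checkTimingBudget(startMs, nowMs, budgetMinutes, jobName) {
  const elapsedMinutes = (nowMs - startMs) / 60000;
  if (elapsedMinutes <= budgetMinutes) {
    return null;
  }
  return (
    `##vso[task.logissue type=warning]${jobName} took ` +
    `${elapsedMinutes.toFixed(1)} min, exceeding its ${budgetMinutes} min budget`
  );
}

const start = Date.UTC(2026, 1, 27, 17, 0, 0);
const jobName = "Coverage_MochaTestDdsTree"; // assumed job name

// 8 minutes against a 25-minute budget: no warning.
const withinBudget = checkTimingBudget(start, start + 8 * 60000, 25, jobName);
// 30 minutes against a 25-minute budget: warning line.
const overBudget = checkTimingBudget(start, start + 30 * 60000, 25, jobName);

console.log(withinBudget);
console.log(overBudget);
```

A warning rather than a failure keeps the budget advisory, while `timeoutInMinutes` remains the hard limit.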

frankmueller-msft · 2026-03-03T04:45:27Z

Closing in favor of #26624, which takes a simpler approach: enabling mocha parallel mode on the merge-tree suite (4 lines) instead of splitting CI jobs. That change achieved a 36% reduction in coverage test time (22 min → 14 min) without adding pipeline complexity.

Copilot AI review requested due to automatic review settings February 27, 2026 17:12

Copilot started reviewing on behalf of frankmueller-msft February 27, 2026 17:12

Copilot AI reviewed Feb 27, 2026

frankmueller-msft force-pushed the ci/combined-pipeline-parallelization branch 7 times, most recently from 5924b5c to 65dc86f Compare February 28, 2026 06:00

frankmueller-msft changed the title ~~ci: combined pipeline parallelization (#26559 + #26562 + #26571)~~ ci: split DDS coverage shards and parallelize merge-tree farm tests Feb 28, 2026

This was referenced Feb 28, 2026

Parallelize coverage tests in client build pipeline #26559

Closed

Shard mocha coverage into DDS and non-DDS shards #26562

Closed

ci: optimize client build pipeline with parallelization #26571

Closed

frankmueller-msft force-pushed the ci/combined-pipeline-parallelization branch 2 times, most recently from 1cf21d3 to 3e4234a Compare February 28, 2026 06:56

Split DDS coverage shards and parallelize merge-tree farm tests

8ba9b8a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

frankmueller-msft force-pushed the ci/combined-pipeline-parallelization branch from 3e4234a to 8ba9b8a Compare February 28, 2026 07:24

frankmueller-msft requested review from alexvy86 and tylerbutler February 28, 2026 15:33

This was referenced Feb 28, 2026

test: increase ReconnectFarm timeout from 30s to 60s #26583

Closed

ci: skip redundant CJS api-extractor lint checks in ci:build #26592

Open

Enable incremental build caching in CI pipeline #26593

Closed

ci: parallelize merge-tree mocha tests #26624

Open

frankmueller-msft closed this Mar 3, 2026

frankmueller-msft wants to merge 1 commit into microsoft:main from frankmueller-msft:ci/combined-pipeline-parallelization
