Testing reliability & perf 2026 — workstream #25978

Tracking the user-reported testing regressions and performance complaints that emerged after the project-structure / `[test-by-project]` rework, plus the older follow-ups that ride alongside them.

The work is grouped into four problem areas. Each item below points at the existing user-filed issue (or a new engineering issue where there is no good match) so progress is visible in one place.

---

## Problem areas (what users are reporting)

1. **Test discovery is much slower than CLI pytest.**
   Reported by multiple users with large parametrized suites: 30k tests take 40s in the Test Explorer vs ~2s on the CLI, and 328k tests take 66s vs ~10s. Profiling traces the gap to `O(n²)` list scans plus an oversized JSON payload.

2. **Test tree is rebuilt from scratch on every change.**
   Saving any `.py` file re-discovers the whole workspace and wipes the existing tree. While re-discovery is in flight, users can't re-run or debug a test because the items have been cleared. "Debug: Restart" breaks for the same reason.

3. **Run / debug pipeline regressions.**
   Tests appear as "skipped" even though they ran; debug runs lose results because the result pipe is cancelled the moment the subprocess exits; `pytest-subtests` failures get reported as success; the env selected via the Python Environments API is not always honored.

4. **Hard discovery failures still open.**
   Smaller correctness bugs in the pytest plugin (`HIDDEN_PARAM`, pipe writer broken by `mock.patch("builtins.open")`).

---

## Order of execution (impact × effort)

- [ ] **Narrow `python.testing.autoTestDiscoverOnSavePattern` default** so saving a non-test file does not trigger a full re-discovery. Default today is `**/*.py` (verified in `package.json`); should match test files only. — #25866
- [ ] **Replace list-membership dedup with dict/set in `vscode_pytest`.** `process_parameterized_test` and `build_test_tree` use `if x not in children` against plain lists, which is `O(n²)` as parametrize cases pile up under one function. Reporter on #25973 has a profile + a candidate PR. — #25973
- [ ] **Drain the result pipe before disposing it on cancellation.** `startRunResultNamedPipe` in `common/utils.ts` calls `disposable.dispose()` from `onCancellationRequested`, which closes the reader while data is still buffered. The debug path triggers cancellation as soon as the debug session terminates, so any results not yet drained are lost. — #25872
- [ ] **Surface an error payload when the env-extension subprocess exits non-zero.** Legacy path already does this in `pytestExecutionAdapter.ts`; the env-extension path resolves the deferred silently. With no results and no error, every test defaults to skipped. — #25892
- [ ] **Fix `pytest-subtests` dedup in `pytest_report_teststatus`.** The `collected_tests_so_far` set in `vscode_pytest/__init__.py` keys on `nodeid` and drops every report after the first. `pytest-subtests` emits multiple `call`-phase reports for the same `nodeid`, so the first one wins and any later failure (or correction) is silently lost. Community workarounds exist on the issue. — #25824
- [ ] **Slim the pytest discovery JSON payload.** Today every node carries the full absolute file path independently; for a 328k-test suite that's tens of MB of redundant strings flowing through the pipe and back into the test controller. Store the root path once and use relative paths underneath. — #25948
- [ ] **Incremental tree updates in `populateTestTree`.** Today `processDiscovery` does `testItemIndex.clear()` and `populateTestTree` always rebuilds. Diff old vs new test trees and only insert / remove / update changed items. Largest item; biggest user-visible fix; should land last. — #25822
- [ ] **`pytest.HIDDEN_PARAM` discovery crash.** `process_parameterized_test` does `parent_part, parameterized_section = test_node["name"].split("[", 1)`, which raises `ValueError` when pytest emits a node id without `[...]` (i.e. when `HIDDEN_PARAM` is used). One-line guard. — #25795
- [ ] **Telemetry step 2 (Python-side).** Add a `meta` block to the `vscode_pytest` discovery payload with `subprocessDurationMs`, `pluginDurationMs`, `payloadBytes`, `parametrizedTestCount`, and fold it into the new `UNITTEST.DISCOVERY.DONE` event so we can split slow discoveries into "subprocess vs plugin vs JS overhead". — *new issue*
- [ ] **Make `vscode_pytest` pipe writer immune to mocked `open`.** When user test code does `mock.patch("builtins.open")`, the pipe-writer `open()` call gets intercepted and serialization breaks (surfaces as `unsupported operand type(s) for +=: 'int' and 'NoneType'`). Capture `open` at import time before any test code can monkeypatch it. (#25793 closed without the fix landing.) — *new issue or reopen #25793*
- [ ] **Verify `useEnvExtension()` actually fires in VS Code 1.106+.** Reporter pins a regression to that VS Code release. Telemetry-first: add an `envSource` field to the discovery/run events, look at the live data, then fix. — #25718
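
For the `autoTestDiscoverOnSavePattern` item, the distinction between a test file and a non-test file can be sketched in Python as below. This is an illustration only. It assumes pytest's default `python_files` patterns (`test_*.py` and `*_test.py`), while the extension setting itself is a VS Code glob whose new default is not decided here. `is_test_file` is a hypothetical helper, not part of the extension.

```python
from fnmatch import fnmatch
from pathlib import PurePath
from typing import Tuple

# pytest's default `python_files` patterns; projects can override them in config.
DEFAULT_TEST_PATTERNS: Tuple[str, ...] = ("test_*.py", "*_test.py")


def is_test_file(path: str, patterns: Tuple[str, ...] = DEFAULT_TEST_PATTERNS) -> bool:
    """Return True if the file's basename matches any test-file pattern."""
    name = PurePath(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for saved in ("src/app/models.py", "tests/test_models.py", "tests/models_test.py"):
        kind = "test" if is_test_file(saved) else "non-test"
        print(f"{saved}: {kind}")
```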
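
The list-vs-dict difference behind the `vscode_pytest` dedup item is sketched below. It assumes, for illustration, that each node dict carries a unique id under an `id_` key; `add_child_list` and `add_child_indexed` are hypothetical helpers, not the plugin's actual functions.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def add_child_list(children: List[Node], node: Node) -> None:
    # `in` on a list compares against every element, so each insert is O(k)
    # and adding n parametrize cases under one function is O(n²) overall.
    if node not in children:
        children.append(node)


def add_child_indexed(children: List[Node], index: Dict[str, Node], node: Node) -> Node:
    # Membership is checked against a dict keyed by id: O(1) on average.
    # The list is kept only to preserve insertion order for serialization.
    existing = index.get(node["id_"])
    if existing is not None:
        return existing
    index[node["id_"]] = node
    children.append(node)
    return node


if __name__ == "__main__":
    children: List[Node] = []
    index: Dict[str, Node] = {}
    for case in range(3):
        # Each case is added twice to show that duplicates are skipped.
        for _ in range(2):
            add_child_indexed(children, index, {"id_": f"test_a.py::test_f[{case}]"})
    print(len(children))
```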
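
The drain-before-dispose idea from the result-pipe item is shown here with a plain OS pipe in Python rather than the extension's TypeScript named-pipe reader, so it illustrates the ordering only. `drain_then_close` is a hypothetical helper. EOF arrives only after every write end is closed, which in the real flow happens when the test subprocess exits.

```python
import os
from typing import List


def drain_then_close(read_fd: int, chunk_size: int = 4096) -> bytes:
    """Read everything still buffered in the pipe until EOF, then close it.

    Closing the read end first would discard whatever the writer had already
    flushed but the reader had not yet consumed.
    """
    chunks: List[bytes] = []
    try:
        while True:
            chunk = os.read(read_fd, chunk_size)
            if not chunk:  # EOF: all write ends closed and the buffer is empty
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)
    return b"".join(chunks)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # Simulate a test subprocess that writes its results and then exits.
    os.write(write_fd, b'{"result": "passed"}\n{"result": "failed"}\n')
    os.close(write_fd)  # the "process exited" moment that triggers cancellation
    print(drain_then_close(read_fd).decode())
```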
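
The env-extension exit-code item can be sketched as follows. The payload shape (`status`, `error`, `result`) and `run_and_collect` are assumptions for illustration; the real adapter reads results from the pipe rather than receiving them as an argument.

```python
import subprocess
import sys
from typing import Any, Dict, List


def run_and_collect(cmd: List[str], results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run the test subprocess and always return a payload the caller can act on."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0 and not results:
        # Non-zero exit with nothing reported: surface an error instead of an
        # empty success, which would otherwise leave every test "skipped".
        return {
            "status": "error",
            "error": f"subprocess exited with code {proc.returncode}",
            "stderr": proc.stderr[-2000:],  # keep only the tail of stderr
        }
    return {"status": "success", "result": results}


if __name__ == "__main__":
    crash = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_and_collect(crash, [])["status"])
```

The check requires both a non-zero exit and no results because pytest itself exits with code 1 when tests fail, so a non-zero code on its own is not an error.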
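
For the `pytest-subtests` item, one possible replacement for the first-report-wins set is an aggregator that keeps the worst outcome per `nodeid`. Inside `pytest_report_teststatus` it would be fed `report.nodeid` and `report.outcome`. That wiring, the `OutcomeAggregator` name, and the severity order are assumptions of this sketch, not necessarily the approach taken for #25824.

```python
from typing import Dict, Optional

# Assumed severity order for this sketch: a failure outranks a skip or a pass.
SEVERITY: Dict[str, int] = {"passed": 0, "skipped": 1, "failed": 2}


class OutcomeAggregator:
    """Keep the worst outcome seen per nodeid instead of only the first one."""

    def __init__(self) -> None:
        self._outcomes: Dict[str, str] = {}

    def record(self, nodeid: str, outcome: str) -> str:
        current = self._outcomes.get(nodeid)
        if current is None or SEVERITY[outcome] > SEVERITY[current]:
            self._outcomes[nodeid] = outcome
        return self._outcomes[nodeid]

    def outcome(self, nodeid: str) -> Optional[str]:
        return self._outcomes.get(nodeid)


if __name__ == "__main__":
    agg = OutcomeAggregator()
    # Several call-phase reports for one nodeid, as pytest-subtests produces.
    for outcome in ("passed", "failed", "passed"):
        agg.record("test_mod.py::test_many", outcome)
    print(agg.outcome("test_mod.py::test_many"))
```

Because it takes the maximum severity, the result does not depend on the order in which the reports arrive.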
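
The root-plus-relative-paths layout from the payload item could look like this. The payload keys (`root`, `tests`, `path`) and both helpers are hypothetical; the plugin's real schema is not shown here. On Windows, `os.path.relpath` raises `ValueError` when the two paths are on different drives, so a real implementation would need a fallback for that case.

```python
import json
import os
from typing import Any, Dict, List


def slim_payload(root: str, tests: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Store the root once and make each node's path relative to it."""
    return {
        "root": root,
        "tests": [{**t, "path": os.path.relpath(t["path"], root)} for t in tests],
    }


def expand_path(payload: Dict[str, Any], node: Dict[str, Any]) -> str:
    """Consumer side: rebuild the absolute path from root + relative path."""
    return os.path.normpath(os.path.join(payload["root"], node["path"]))


if __name__ == "__main__":
    root = os.path.abspath("project")
    tests = [
        {"name": f"test_f[{i}]", "path": os.path.join(root, "tests", "test_mod.py")}
        for i in range(1000)
    ]
    full = json.dumps({"tests": tests})
    slim = json.dumps(slim_payload(root, tests))
    print(len(full), len(slim))
```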
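
The core of the incremental-update item is a diff between the previous and the new discovery result. This sketch assumes both trees have been flattened into `{test_id: node_data}` maps. `diff_trees` and `TreeDiff` are hypothetical names, and mapping the result onto `TestItem` insertions and removals in the TypeScript controller is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TreeDiff:
    added: List[str] = field(default_factory=list)
    removed: List[str] = field(default_factory=list)
    updated: List[str] = field(default_factory=list)


def diff_trees(old: Dict[str, Any], new: Dict[str, Any]) -> TreeDiff:
    """Compare two flattened {test_id: node_data} maps; report only changes."""
    diff = TreeDiff()
    for test_id, node in new.items():
        if test_id not in old:
            diff.added.append(test_id)
        elif old[test_id] != node:  # e.g. the test moved to a different line
            diff.updated.append(test_id)
    diff.removed = [test_id for test_id in old if test_id not in new]
    return diff


if __name__ == "__main__":
    before = {"t.py::a": {"lineno": 1}, "t.py::b": {"lineno": 5}}
    after = {"t.py::a": {"lineno": 1}, "t.py::b": {"lineno": 7}, "t.py::c": {"lineno": 9}}
    print(diff_trees(before, after))
```

Items that appear in none of the three lists would keep their existing tree entries, so a re-run or debug request against them would not hit a cleared item while discovery is in flight.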
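
The `HIDDEN_PARAM` guard can be expressed with `str.partition`, which never raises when the separator is missing. `split_parameterized_name` is a hypothetical wrapper, and treating a bracket-less name as a plain test is an assumption about the desired behavior.

```python
from typing import Optional, Tuple


def split_parameterized_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'test_x[case]' into ('test_x', 'case]') like split("[", 1),
    but return (name, None) instead of raising when there is no '['."""
    parent_part, sep, parameterized_section = name.partition("[")
    if not sep:
        # No "[...]" suffix (e.g. an id hidden via pytest.HIDDEN_PARAM).
        return name, None
    return parent_part, parameterized_section


if __name__ == "__main__":
    try:
        parent_part, parameterized_section = "test_x".split("[", 1)
    except ValueError as exc:
        print(f"original unpacking fails: {exc}")
    print(split_parameterized_name("test_x"))
```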
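
A possible shape for the Python-side `meta` block is sketched below, using the field names listed in the telemetry item. How `subprocessDurationMs` is measured is not specified above, so here the caller passes it in. `add_discovery_meta` is hypothetical, and counting parametrized tests by a `[` in the name is a heuristic that misses ids hidden with `HIDDEN_PARAM`.

```python
import json
import time
from typing import Any, Dict, List


def add_discovery_meta(
    payload: Dict[str, Any],
    tests: List[Dict[str, Any]],
    plugin_started: float,
    subprocess_duration_ms: int,
) -> Dict[str, Any]:
    """Attach a telemetry `meta` block to a discovery payload (in place)."""
    # Measure the size before adding `meta`, so the number is not self-referential.
    payload_bytes = len(json.dumps(payload).encode("utf-8"))
    payload["meta"] = {
        "subprocessDurationMs": subprocess_duration_ms,
        "pluginDurationMs": round((time.perf_counter() - plugin_started) * 1000),
        "payloadBytes": payload_bytes,
        # Heuristic: parametrized ids normally end in "[...]".
        "parametrizedTestCount": sum(1 for t in tests if "[" in t["name"]),
    }
    return payload


if __name__ == "__main__":
    started = time.perf_counter()  # e.g. captured when the plugin starts collecting
    tests = [{"name": "test_f[1]"}, {"name": "test_f[2]"}, {"name": "test_g"}]
    # 1234 is a placeholder; the real value would be measured elsewhere.
    payload = add_discovery_meta({"tests": tests}, tests, started, subprocess_duration_ms=1234)
    print(payload["meta"])
```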
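
The import-time capture from the mocked-`open` item works because the plugin module binds the real function object before any test runs, so a later `mock.patch("builtins.open")` replaces the name in `builtins` but not the already-captured reference. `write_to_pipe` is a hypothetical stand-in for the plugin's writer and writes to a regular file rather than a named pipe.

```python
import builtins
import os
import tempfile
from unittest import mock

# Bound once, when this module is imported, before any test can patch builtins.
_real_open = builtins.open


def write_to_pipe(path: str, data: str) -> None:
    # Uses the captured function, so mock.patch("builtins.open") in user tests
    # does not intercept the plugin's own writes.
    with _real_open(path, "a", encoding="utf-8") as pipe:
        pipe.write(data)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with mock.patch("builtins.open", mock.mock_open()):
        write_to_pipe(path, '{"status": "success"}')  # still reaches the real file
    with open(path, encoding="utf-8") as f:
        print(f.read())
    os.remove(path)
```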

---

## Closing as already resolved

- **#25802** — Unittest discovery on Python 3.15. User confirmed fixed in extension 2026.4.0 on May 9.
- **#25807** — Unwanted unittest debug output. The offending `print` has already been removed from `python_files/unittestadapter/execution.py`.

---

## How we measure progress

Baseline telemetry has already been wired up (TS-side). Each fix above has a corresponding metric so dashboards can verify the change actually moves the needle (and catch any unintended regressions):

| Area | Primary metric | What "fixed" looks like |
|---|---|---|
| Discovery perf | `UNITTEST.DISCOVERY.DONE.totalDurationMs` p50/p90 sliced by `testCount` bucket × `mode` | Large-suite p90 drops by an order of magnitude; `mode='project'` converges to `mode='legacy'`. |
| Tree rebuilt every save | `UNITTEST.DISCOVERY.TRIGGER.fileKind='non-test'` share; `UNITTEST.TREE.UPDATE.rebuiltFromScratch` share; `msSinceLastTrigger` p50 | `non-test` share drops to ~0%; `rebuiltFromScratch=false` share grows as incremental updates land. |
| Run / debug pipeline | `UNITTEST.RUN.DONE.missingCount > 0` share; `pipeClosedEarly` share; `failureCategory` distribution | `missingCount>0` and `pipeClosedEarly` shares drop to near-0 on `mode='project'` and `debugging=true`. |
| Discovery hard failures | `UNITTEST.DISCOVERY.DONE.failureCategory` distribution | Each individual fix shrinks its corresponding bucket. |
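
The p50/p90 slicing in the discovery-perf row can be reproduced offline with a nearest-rank percentile, as sketched below. The event field names follow the table, but the `testCount` bucket edges and the choice of nearest-rank (rather than an interpolating percentile) are assumptions; the dashboards may compute percentiles differently.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nearest_rank(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with >= pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bucket(test_count: int) -> str:
    # Illustrative bucket edges; the real dashboard's edges are not specified.
    for edge, label in ((1_000, "<1k"), (10_000, "1k-10k"), (100_000, "10k-100k")):
        if test_count < edge:
            return label
    return ">=100k"


def summarize(events: Iterable[Dict]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Group DISCOVERY.DONE-style events by (bucket, mode); return (p50, p90)."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for event in events:
        key = (bucket(event["testCount"]), event["mode"])
        groups[key].append(event["totalDurationMs"])
    return {key: (nearest_rank(v, 50), nearest_rank(v, 90)) for key, v in groups.items()}
```

Nearest-rank always returns an observed value, which keeps small buckets easy to sanity-check against raw events.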

Per-area success criteria are checked off as each fix ships and the telemetry confirms the change.

---

## Out of scope (deliberately, for now)

- Per-test or per-file names in telemetry — privacy-sensitive, not needed for the questions above.
- True `added` / `removed` counts in `UNITTEST.TREE.UPDATE` — needs an `O(n)` set diff per discovery; revisit only if `beforeCount`/`afterCount` + `rebuiltFromScratch` aren't enough signal.
- Migrating off named pipes entirely — out of scope for this workstream.
