Extremely slow test discovery when using pytest for large test suite #25973

@vaclavHala

Type: Bug

Behaviour

We are trying to use vscode-python with our existing pytest test suite, which relies heavily on parameterized tests.
A single file can generate over 50_000 test cases, and these take extremely long (on the order of several minutes) to load in the test explorer.

The test collection done by pytest itself completes in a few seconds; the rest of the time is spent in the vscode-python discovery integration code.

Steps to reproduce:

  1. Clone the following repro repo https://github.com/vaclavHala/pytest-discovery-perf
  2. Open the test explorer

Expected

Test collection takes about 3 seconds when run outside of VS Code on my PC, so I expect the explorer to show the tests in about that time:

>> python -m pytest  --collect-only tests

...
======================= 30000 tests collected in 2.21s =======================

Actual

The explorer takes about 40 seconds to show the tests (from the Python output channel):

2026-05-25 11:22:45.966 [info] Started pytest discovery subprocess (environment extension) for workspace /home/hala/git/vscode-repros/pytest-discovery-perf
...
======================= 30000 tests collected in 39.91s ========================
...
2026-05-25 11:23:27.053 [info] Pytest discovery completed for workspace /home/hala/git/vscode-repros/pytest-discovery-perf

You can easily control how many tests the repro simulates in tests/conftest.py. For the reproduction I picked 30_000 so that the problem is apparent but discovery still finishes in a reasonable time. If you go up to about 50_000 to simulate our real environment, discovery takes minutes to complete.
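For readers without the repro repo at hand, a conftest.py that generates a configurable number of parameterized cases could look roughly like the sketch below. This is a hypothetical illustration, not the actual file from the repro: the `NUM_CASES` constant and the `case` argument name are assumptions, and only the standard `pytest_generate_tests` hook is relied on.

```python
# Hypothetical sketch of a tests/conftest.py; NOT the actual file from the repro repo.
# NUM_CASES and the "case" argument name are assumptions made for illustration.

NUM_CASES = 30_000  # raise to about 50_000 to approximate the larger suite


def pytest_generate_tests(metafunc):
    """Standard pytest hook: parametrize every test that asks for `case`."""
    if "case" in metafunc.fixturenames:
        metafunc.parametrize("case", range(NUM_CASES))
```

With this in place, any test function declared as `def test_something(case): ...` would be collected `NUM_CASES` times.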

Investigation:

After some investigation I narrowed the problem down to python_files/vscode_pytest/__init__.py, which uses plain lists to hold child nodes.
It traverses these lists every time a new child node is added to the same parent, which is O(N^2) overall and starts causing problems as N (the number of child nodes of a given parent) grows large.

I created a PR that proposes a fix by replacing the lists with dicts keyed by the id_ of each test item. This takes the complexity of the expensive operation from O(N^2) to O(N). The repro repo contains instructions for how I measured this with py-spy and demonstrates the impact of the change. For posterity I'm also attaching the before and after graphs here:

[Images: before and after graphs]
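To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It is not the extension's actual code: the function names and the node shape (`{"id_": ..., "name": ...}`) are assumptions chosen only to show why scanning a list on every insert is quadratic, while a dict keyed by `id_` keeps each insert at roughly constant time.

```python
import time


def add_children_list(ids):
    """List-based variant: scan the parent's children before every insert (O(N^2) total)."""
    children = []
    for id_ in ids:
        # Linear scan over all existing children to check for a duplicate.
        if not any(child["id_"] == id_ for child in children):
            children.append({"id_": id_, "name": id_})
    return children


def add_children_dict(ids):
    """Dict-based variant: look up by id_ in average constant time (O(N) total)."""
    children = {}
    for id_ in ids:
        if id_ not in children:
            children[id_] = {"id_": id_, "name": id_}
    # Dicts preserve insertion order, so the result matches the list variant.
    return list(children.values())


if __name__ == "__main__":
    # Hypothetical test ids resembling parametrized pytest node ids.
    ids = [f"test_file.py::test_case[{i}]" for i in range(5000)]
    for fn in (add_children_list, add_children_dict):
        start = time.perf_counter()
        result = fn(ids)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {len(result)} children in {elapsed:.3f}s")
```

Both functions return the same children in the same order; only the cost of the duplicate check differs, and the gap widens quickly as the number of children under one parent grows.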

Extension version: 2026.4.0
VS Code version: Code 1.100.2 (848b80aeb52026648a8ff9f7c45a9b0a80641e2e, 2025-05-14T21:47:40.416Z)
OS version: Linux x64 6.1.0-18-amd64
