python-lsp-compare is a small benchmark and regression harness for Python Language Server Protocol implementations.
This repository provides a vendor-neutral comparison harness and test corpus for Python language servers. It is intended to surface behavioral and performance differences across implementations and does not define a specification or normative behavior.
It focuses on four things:
- Running LSP servers over stdio with raw JSON-RPC messages.
- Executing repeatable scenarios against those servers.
- Capturing request/notification timings, payload sizes, and results.
- Producing machine-readable reports that are easy to diff across servers.
Benchmark suites are package-oriented, not just API-oriented. That means testing LSP behavior against realistic dependency surfaces like SQLAlchemy-heavy code, web frameworks, and data-science imports.
Benchmark runs are intentionally deterministic: each suite creates or reuses its own .venv, installs the suite requirements there, writes temporary workspace configuration for language servers, and then runs every selected server against that same suite-local environment.
- Pure Python implementation of LSP framing over stdio.
- Built-in Python scenarios for hover, completion, and document symbols.
- Config-driven benchmark suites under `benchmarks/` with package-specific fixtures and `requirements.txt` files.
- Per-call metrics including latency, bytes sent, bytes received, success, and errors.
- Aggregate stats for benchmark points including mean, median, min, max, and p95.
- JSON report output for later aggregation.
- MIT licensed from the start.
Create a virtual environment, install the package in editable mode, and point it at an LSP server command.
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .
python -m python_lsp_compare list-scenarios
python -m python_lsp_compare list-benchmarks
python -m python_lsp_compare list-servers
python -m python_lsp_compare run --server-command pylsp --scenario hover --scenario completion
python -m python_lsp_compare run --server-command pyright-langserver --server-arg=--stdio
python -m python_lsp_compare run-benchmark --server-command pyright-langserver --server-arg=--stdio
python -m python_lsp_compare run-servers --scenario hover --scenario completion
python -m python_lsp_compare bench-servers
python -m python_lsp_compare render-report --summary results/bench-servers/summary-20260319T000000Z.json --baseline-server pyright

The default report path is created under `results/`.
List the bundled scenarios:
python -m python_lsp_compare list-scenarios

List the benchmark suites:

python -m python_lsp_compare list-benchmarks

List the locally configured servers:
python -m python_lsp_compare list-servers
python -m python_lsp_compare list-servers --config path/to/lsp_servers.json

Run one or more scenarios:
python -m python_lsp_compare run \
--server-command pyright-langserver \
--server-arg=--stdio \
--scenario hover \
--scenario completion \
  --output results/pyright.json

Run one or more package-oriented benchmark suites:
python -m python_lsp_compare run-benchmark \
--server-command pyright-langserver \
--server-arg=--stdio \
  --output results/pyright-benchmarks.json

Run the same scenarios across all servers (downloads from GitHub releases by default):
python -m python_lsp_compare run-servers \
--scenario hover \
--scenario completion \
  --output-dir results/servers

Run against servers from a local config file:
python -m python_lsp_compare run-servers \
--config .python-lsp-compare/lsp_servers.json \
--scenario hover \
--scenario completion \
  --output-dir results/servers

Run benchmark suites across all servers (downloads from GitHub releases by default):
python -m python_lsp_compare bench-servers \
--baseline-server pyright \
  --output-dir results/bench-servers

Render or re-render a markdown comparison report from an existing multi-server summary JSON file:
python -m python_lsp_compare render-report \
--summary results/bench-servers/summary-20260319T000000Z.json \
--baseline-server pyright \
  --output results/bench-servers/comparison.md

Arguments:
- `--server-command`: executable to launch.
- `--server-arg`: additional argument, repeatable.
- `--scenario`: scenario name, repeatable. If omitted, all scenarios run.
- `--timeout-seconds`: per-request timeout.
- `--output`: JSON report path.
Configured server arguments:
- `--config`: path to a local server config JSON file. When omitted, servers are automatically downloaded from GitHub releases.
- `--server`: server id to run, repeatable. If omitted, all servers run.
- `--output-dir`: directory for per-server JSON reports and the summary JSON file.
- `--summary-output`: optional path for the combined multi-server summary file.
- `--markdown-output`: optional path for the combined markdown comparison report. If omitted, a markdown report is written next to the summary JSON.
- `--csv-output`: optional path for the combined CSV comparison report. If omitted, a CSV report is written next to the summary JSON.
- `--baseline-server`: configured server id or display name to use as the comparison baseline in markdown and CSV reports.
Configured benchmark runner arguments:
- `--timeout-seconds`: override the per-request timeout for all benchmark calls.
- `--output-dir`: directory for per-server JSON reports and the combined summary/report outputs.
- `--summary-output`, `--markdown-output`, `--csv-output`: override report destinations.
- `--baseline-server`: choose the comparison baseline for the rendered reports.
Benchmark arguments:
- `run-benchmark` runs all bundled suites under `benchmarks/`.
- `bench-servers` runs that same full bundled suite set for each selected server.
Benchmark runs always use an isolated suite environment and install each suite's declared requirements before launching the server. That keeps runs reproducible and avoids depending on whatever happens to be installed in the caller's active environment.
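A sketch of what that suite isolation amounts to, assuming a conventional `.venv` layout (the helper name and exact pip invocation are illustrative, not the project's actual code):

```python
import sys
from pathlib import Path

def suite_env_commands(suite_dir: Path) -> list[list[str]]:
    """Illustrative sketch: the commands an isolated suite setup would run.

    Creates a suite-local .venv (if missing) and installs the suite's
    declared requirements using that venv's own interpreter.
    """
    venv_dir = suite_dir / ".venv"
    # Windows venvs put the interpreter under Scripts/, POSIX under bin/.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    venv_python = venv_dir / bin_dir / exe
    commands = []
    if not venv_dir.exists():
        commands.append([sys.executable, "-m", "venv", str(venv_dir)])
    commands.append([
        str(venv_python), "-m", "pip", "install",
        "-r", str(suite_dir / "requirements.txt"),
    ])
    return commands
```

Installing with the venv's own interpreter (rather than the caller's `pip`) is what guarantees the packages land in the suite environment regardless of which shell or virtualenv launched the harness.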
Reports include:
- Server command and run timestamp.
- One entry per scenario.
- One entry per benchmark suite when using
run-benchmark. - One summary JSON file plus one report per server when using
run-servers. - One markdown comparison report when using
run-serversorbench-servers. - One CSV comparison report when using
run-serversorbench-servers. - One metric per LSP call, including initialize/shutdown.
- Scenario success/failure and any captured error message.
- Aggregate duration summaries for each benchmark point and method.
- Structured result summaries per request, including whether the result was empty and method-specific counts such as completion items, symbol count, hover text length, or location count.
- Benchmark-point validation results, including whether semantic result checks passed and how many measured iterations failed validation.
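The aggregate duration summary can be computed in several ways; this sketch pairs the standard-library `statistics` module with a nearest-rank p95 (an assumption — the harness may interpolate percentiles differently):

```python
import statistics

def aggregate_stats(durations_ms: list[float]) -> dict:
    """Summarize call durations: mean, median, min, max, and p95.

    p95 uses the nearest-rank method (illustrative; the harness may
    compute percentiles with interpolation instead).
    """
    ordered = sorted(durations_ms)
    rank = max(0, round(0.95 * len(ordered)) - 1)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[rank],
    }
```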
The markdown comparison report is intended for check-ins, PRs, or ongoing benchmark notes. It shows both total wall-clock time and average request time, and highlights semantic result differences for matching benchmark points, such as hover length, completion count, and definition count relative to the baseline server.
The CSV comparison report flattens the same run into spreadsheet-friendly rows with server id, suite or scenario name, point label, preferred result metric, delta versus the chosen baseline server, and validation status.
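A minimal sketch of that flattening with the standard-library `csv` module; the column names below are illustrative, not the harness's exact schema:

```python
import csv
import io

def rows_to_csv(rows: list[dict]) -> str:
    """Flatten per-point comparison rows into spreadsheet-friendly CSV text."""
    # Hypothetical column set: one row per (server, point) pair.
    fields = ["server", "suite", "point", "avg_ms", "delta_vs_baseline_ms", "validation"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```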
By default, run-servers, bench-servers, and list-servers automatically download the latest server binaries from GitHub releases. No config file is needed for this default mode.
To use custom local builds instead, pass --config with a path to a server config JSON file:
python -m python_lsp_compare bench-servers --config .python-lsp-compare/lsp_servers.json

You can create a config file by copying the example:
- Copy `configs/lsp_servers.example.json` to `.python-lsp-compare/lsp_servers.json`.
- Fill in the local executable paths for each server.
- The `.python-lsp-compare/` directory is ignored by Git.
The local config is intentionally minimal: it identifies where each server executable or launcher lives on the current machine. Scenario selection, benchmark selection, benchmark environment creation, and package installation are handled by the runner so the same suite runs the same way for everyone.
By default, run-servers, bench-servers, and list-servers automatically download the latest releases of Pyright, Ty, and Pyrefly from GitHub, and install pylsp-mypy from PyPI. Just run:
python -m python_lsp_compare bench-servers

GitHub release binaries are cached under `.python-lsp-compare/servers/` so subsequent runs skip the download. PyPI-based servers are installed into isolated venvs under the same cache directory. The only prerequisite for Pyright is that Node.js is installed and `node` is on your PATH.
You can also pre-download servers explicitly:
python -m python_lsp_compare download-servers
python -m python_lsp_compare download-servers --server ty
python -m python_lsp_compare download-servers --force

Download-servers arguments:
- `--server`: download a specific server id (`pyright`, `ty`, `pyrefly`, `pylsp-mypy`). Repeatable. Downloads all if omitted.
- `--force`: re-download even if already cached.
If you prefer to use local builds or custom server paths, pass --config to point at a JSON config file:
- Copy `configs/lsp_servers.example.json` to `.python-lsp-compare/lsp_servers.json`.
- Replace each placeholder path with the local path to that server on your machine.
- Keep only the servers you actually want to run, or set `"enabled": false` on entries you want to leave in the file but skip.
- Run `python -m python_lsp_compare list-servers` to confirm the config is valid and the servers are visible.
Example config shape:
{
"version": 1,
"baselineServer": "pyright",
"servers": [
{
"id": "pyright",
"displayName": "Pyright",
"enabled": true,
"sourcePath": "C:/path/to/pyright/packages/pyright/langserver.index.js",
"launch": {
"command": "C:/Program Files/nodejs/node.exe",
"args": [
"C:/path/to/pyright/packages/pyright/langserver.index.js",
"--stdio"
]
}
},
{
"id": "ty",
"displayName": "Ty",
"enabled": true,
"sourcePath": "C:/path/to/ty.exe",
"launch": {
"command": "C:/path/to/ty.exe",
"args": ["server"]
}
},
{
"id": "pyrefly",
"displayName": "Pyrefly",
"enabled": true,
"sourcePath": "C:/path/to/pyrefly.exe",
"launch": {
"command": "C:/path/to/pyrefly.exe",
"args": [
"lsp",
"--indexing-mode",
"lazy-blocking",
"--build-system-blocking"
]
}
}
]
}

Linux/macOS example config shape:
{
"version": 1,
"baselineServer": "pyright",
"servers": [
{
"id": "pyright",
"displayName": "Pyright",
"enabled": true,
"sourcePath": "/home/you/src/pyright/packages/pyright/langserver.index.js",
"launch": {
"command": "node",
"args": [
"/home/you/src/pyright/packages/pyright/langserver.index.js",
"--stdio"
]
}
},
{
"id": "ty",
"displayName": "Ty",
"enabled": true,
"sourcePath": "/home/you/bin/ty",
"launch": {
"command": "/home/you/bin/ty",
"args": ["server"]
}
},
{
"id": "pyrefly",
"displayName": "Pyrefly",
"enabled": true,
"sourcePath": "/home/you/bin/pyrefly",
"launch": {
"command": "/home/you/bin/pyrefly",
"args": [
"lsp",
"--indexing-mode",
"lazy-blocking",
"--build-system-blocking"
]
}
}
]
}

Notes on the fields:

- `id`: stable identifier used by `--server` and `--baseline-server`.
- `displayName`: friendly label shown in reports.
- `enabled`: optional; defaults to enabled if omitted.
- `sourcePath`: optional but useful in generated reports so you can see which build or binary was measured.
- `launch.command`: the actual executable to start.
- `launch.args`: extra arguments passed to that executable. Relative paths inside `args` are resolved relative to the config file directory.
- On Linux and macOS, `launch.command` may be either an absolute path such as `/home/you/bin/ty` or a command on `PATH` such as `node`.
- `launch.benchmarkArgs`: optional advanced field for servers that need extra arguments only during `bench-servers`. Most setups should omit it.
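The loading rules described above (entries default to enabled, relative `args` resolved against the config directory) can be sketched as follows; the function name and the existence check used to decide which args are path-like are assumptions, not the project's actual loader:

```python
import json
from pathlib import Path

def load_enabled_servers(config_path: str) -> list[dict]:
    """Illustrative config loader: keep enabled entries and rebase
    relative, path-like launch args onto the config file's directory."""
    config_dir = Path(config_path).resolve().parent
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    servers = []
    for entry in config.get("servers", []):
        if not entry.get("enabled", True):  # "enabled" defaults to true
            continue
        launch = entry["launch"]
        resolved_args = []
        for arg in launch.get("args", []):
            candidate = config_dir / arg
            # Only args that resolve to an existing file are rebased;
            # flags such as --stdio pass through unchanged.
            resolved_args.append(str(candidate) if candidate.exists() else arg)
        servers.append({"id": entry["id"],
                        "command": launch["command"],
                        "args": resolved_args})
    return servers
```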
Pyright is a Node-hosted server that supports --stdio natively. You need:
- a local `node.exe` (or `node` on Linux/macOS)
- the Pyright `langserver.index.js` entry point
In the config:
- `launch.command` should point to `node.exe`
- `launch.args` should include the path to `langserver.index.js` followed by `--stdio`
Ty is configured as a native executable. Point both sourcePath and launch.command at the built ty.exe, and use:
"args": ["server"]because Ty exposes LSP mode through the server subcommand.
Pyrefly is also configured as a native executable. Point both sourcePath and launch.command at pyrefly.exe, and use:
"args": [
"lsp",
"--indexing-mode",
"lazy-blocking",
"--build-system-blocking"
]

Those flags match the settings used in the benchmark runs documented in this repository.
pylsp-mypy is not a standalone type checker with its own LSP server. It is the python-lsp-server (pylsp) with the pylsp-mypy plugin enabled. This means:
- Hover and completion results come from pylsp's built-in Jedi provider, not from mypy.
- Diagnostics (errors, warnings) are provided by mypy through the pylsp-mypy plugin.
- Benchmark results for hover, completion, and document symbols reflect Jedi's performance through the pylsp layer, not mypy's type analysis.
This server is included to show how a Jedi-based LSP endpoint compares in latency and result quality to purpose-built type-checker LSP servers.
pylsp-mypy is installed automatically from PyPI into an isolated venv under .python-lsp-compare/servers/pylsp-mypy/.
The automated tests do not load your checked-in example config or your personal local config. The test suite builds temporary config JSON files inline inside the test cases and passes them with --config, which is why tests continue to pass even if the example file and your local file drift apart.
In practice:
- tests/test_server_configs.py creates temporary config files and verifies the loader and CLI behavior directly.
- tests/test_cli.py and tests/test_reporting.py also create temporary config files for isolated test runs.
That means the example config is documentation, not a fixture that the test suite executes verbatim.
After editing the config, use these commands to verify everything is wired correctly:
python -m python_lsp_compare list-servers --config .python-lsp-compare/lsp_servers.json
python -m python_lsp_compare run-servers --config .python-lsp-compare/lsp_servers.json --scenario hover
python -m python_lsp_compare bench-servers --config .python-lsp-compare/lsp_servers.json

If `list-servers` shows the expected ids and `run-servers` can complete a small scenario run, the same config is ready for `bench-servers`.
On Linux or macOS, the same verification commands apply:
python -m python_lsp_compare list-servers --config .python-lsp-compare/lsp_servers.json
python -m python_lsp_compare run-servers --config .python-lsp-compare/lsp_servers.json --scenario hover
python -m python_lsp_compare bench-servers --config .python-lsp-compare/lsp_servers.json

Each suite folder follows the same internal structure:

- `config.json` describes request points and iteration counts.
- `requirements.txt` describes the dependency surface to benchmark.
- `src/` contains the Python files to open and query.
- `.venv/` is created automatically per suite and reused across servers.
Each benchmark point can also define an optional validation block in config.json to enforce semantic expectations on measured results:
{
"label": "query completion",
"file": "src/models.py",
"line": 16,
"character": 22,
"validation": {
"minCompletionItems": 1,
"requireNonEmpty": true
}
}

Supported validation keys:

- `requireNonEmpty`
- `minCompletionItems`
- `minHoverTextChars`
- `minSymbolCount`
- `minLocationCount`
- `minSizeChars`
If a measured iteration fails these checks, the benchmark point is marked failed and the report records the validation failure details.
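A sketch of how such a validation block could be applied to a measured result summary; the summary field names used here (`completion_items`, `hover_text_chars`, and so on) are illustrative, not the harness's internal schema:

```python
def validate_point(result_summary: dict, validation: dict) -> list[str]:
    """Check a measured result summary against a validation block.

    Returns a list of human-readable failure messages; an empty list
    means the point passed all configured checks.
    """
    # Map validation keys to hypothetical result-summary fields.
    checks = {
        "minCompletionItems": "completion_items",
        "minHoverTextChars": "hover_text_chars",
        "minSymbolCount": "symbol_count",
        "minLocationCount": "location_count",
        "minSizeChars": "size_chars",
    }
    failures = []
    if validation.get("requireNonEmpty") and result_summary.get("empty", True):
        failures.append("result was empty")
    for key, field in checks.items():
        if key in validation and result_summary.get(field, 0) < validation[key]:
            failures.append(
                f"{field}={result_summary.get(field, 0)} below {key}={validation[key]}"
            )
    return failures
```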
Bundled examples:
- `benchmarks/sqlalchemy`
- `benchmarks/web`
- `benchmarks/data_science`
- `benchmarks/django`
- `benchmarks/pandas`
- `benchmarks/transformers`
Benchmark suites run inside per-suite virtual environments. This keeps third-party dependencies isolated from the interpreter used for development or testing and ensures every selected server sees the same package set.
Example:
python -m python_lsp_compare run-benchmark \
--server-command python \
--server-arg=-m \
  --server-arg=pylsp

When the server is launched with a Python executable, the runner swaps that executable for the suite's virtual environment interpreter. For non-Python launchers such as Node-based servers, the runner still isolates `PATH`, `VIRTUAL_ENV`, and related Python environment variables for the server process, writes a temporary `pyrightconfig.json` at the suite root, and serves workspace configuration requests so Pyright-style servers can resolve the suite interpreter consistently.
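The interpreter swap described above can be sketched like this, assuming a conventional venv layout (illustrative, not the runner's exact code):

```python
import sys
from pathlib import Path

def swap_python_command(command: list[str], suite_venv: Path) -> list[str]:
    """If the launch command starts with a Python executable, replace it with
    the suite venv's interpreter so the server sees the suite's packages.
    Non-Python launchers (e.g. node) are returned unchanged."""
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    venv_python = suite_venv / bin_dir / exe
    # Match "python", "python3", "python.exe", a full path to python, etc.
    if Path(command[0]).name.lower().startswith("python"):
        return [str(venv_python)] + command[1:]
    return command
```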
The repository uses Python's built-in unittest runner and a fake stdio LSP server in tests/fixtures/fake_lsp_server.py to exercise the CLI, runner, reporting, configuration loading, and benchmark environment behavior without depending on a real language server during test runs.
Run the full suite:
python -m unittest discover -s tests -v

That command is the main test entry point for the repository. It currently runs all test modules under `tests/` and validates the following areas:
- `tests/test_benchmarks.py`: benchmark suite discovery, benchmark execution, validation failures on empty results, suite-local virtual environment creation, temporary `pyrightconfig.json` generation, current-mode `python.pythonPath` handling, and `workspace/configuration` logging/round-trips.
- `tests/test_cli.py`: basic CLI behavior for `list-scenarios`, `list-benchmarks`, `list-servers`, and `run`.
- `tests/test_reporting.py`: markdown and CSV report generation, `latest-results.md` updates, report re-rendering from summary JSON, result-difference reporting, and markdown table sorting by fastest average time.
- `tests/test_runner.py`: built-in scenario execution through the raw JSON-RPC runner and fast failure for unknown scenarios.
- `tests/test_server_configs.py`: local server config loading, relative argument resolution, baseline selection, `run-servers`, and `bench-servers` behavior when using configured servers.
Run an individual test module when you only need one area:
python -m unittest tests.test_benchmarks -v
python -m unittest tests.test_cli -v
python -m unittest tests.test_reporting -v
python -m unittest tests.test_runner -v
python -m unittest tests.test_server_configs -v

The benchmark-oriented tests use the fixture suite in `tests/fixtures/benchmark_suite/`. Those tests intentionally create or reuse a suite-local `.venv` inside the fixture directory so they can verify the same isolated-environment behavior used by real benchmark runs.
If you only changed markdown or CSV rendering, tests.test_reporting is usually the fastest targeted check. If you changed benchmark environment setup, suite discovery, or benchmark-point execution, start with tests.test_benchmarks and then run the full suite before finishing.