
Tighten benchmark methodology and publish corrected results #77

Merged
justrach merged 37 commits into main from pg-bench on Mar 23, 2026
Conversation

@justrach
Owner

This PR tightens the benchmark methodology and keeps the published numbers aligned with the earlier clean 3-run median benchmark pass, not the later single rerun.

What is included:

  • fixes to the pgbench Turbo runner and validator
  • corrected pgbench docs/results
  • corrected HTTP+DB benchmark docs/frontend copy based on the prior 3-run median run
  • native execute_many plumbing and benchmark helper
  • CI benchmark workflow cleanup so Docker state is reset before and after runs

Notes:

  • branch docs intentionally keep the older 3-run median HTTP+DB results
  • the latest one-off rerun was not used to rewrite the published numbers
  • benchmark workflows upload raw logs as artifacts in CI

justrach and others added 30 commits March 21, 2026 22:52
Dockerized pgbench runner comparing asyncpg, psycopg3-async, and
TurboAPI+pg.zig on MagicStack's standard queries (Postgres 18).

Run: cd benchmarks/pgbench && docker compose up --build

Results (concurrency=10, 30s):
  SELECT 1+1: asyncpg 95k q/s, psycopg3 35k q/s, turbopg 20k q/s (including HTTP overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 3 queries now run successfully:
- SELECT 1+1: 19,659 q/s
- pg_type (619 rows): 19,599 q/s (cached)
- generate_series (1000): 217 q/s (JSON serialization bottleneck)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. TURBO_DISABLE_DB_CACHE=1 disables the DB result cache at runtime.
   Enables honest benchmarking where every request hits Postgres.

2. _db_query_raw now accepts Python list params directly (no json.dumps)
   and returns Python list[dict] directly (no json.loads). Eliminates
   double JSON serialization overhead.

3. Warmup in bench.py reduced from 200 requests to 3 (pool priming only).

Known issue: _db_query_raw holds the GIL during I/O, blocking concurrent
threads. Next step: release GIL around pg.zig query, reacquire for dict
building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… results

- _db_query_raw builds Python list[dict] directly in Zig (no JSON)
- Params passed as Python list (no json.dumps/json.loads)
- Results buffered into flat array for future GIL release
- TURBO_DISABLE_DB_CACHE=1 env var for honest benchmarking
- Proper column name handling via PyUnicode_FromStringAndSize

TODO: Release GIL around Postgres I/O (PyEval_SaveThread needs C shim
for opaque PyThreadState type in Zig cimport)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added C shim (py_gil_shim.c) for PyEval_SaveThread/RestoreThread to
work around opaque PyThreadState in Zig cimport.

_db_query_raw now releases GIL during:
- Connection acquire from pool
- pg.zig query execution
- Result row buffering

GIL reacquired before building Python dicts.
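A minimal sketch of what such a shim can look like (exported names here are illustrative, not necessarily the ones in py_gil_shim.c): returning the thread state as a void* keeps PyThreadState fully opaque on the Zig side.

```c
/* py_gil_shim.c (sketch): hide PyEval_SaveThread/RestoreThread behind
 * void* so Zig's cimport never needs PyThreadState's definition.
 * Function names are illustrative. */
#include <Python.h>

void *turbo_gil_release(void) {
    /* Drop the GIL; caller must hand the returned state back later. */
    return (void *)PyEval_SaveThread();
}

void turbo_gil_acquire(void *state) {
    /* Re-acquire the GIL before touching any Python objects. */
    PyEval_RestoreThread((PyThreadState *)state);
}
```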

Results:
  Before: 10 threads = 1,358 q/s (no scaling, GIL blocked)
  After:  10 threads = 10,416 q/s (7.7x scaling)
  vs asyncpg: 1.25x faster concurrent (10,416 vs 8,302)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrote pgbench_zig to use TurboPG's native path directly:
- db.query() calls _db_query_raw (pg.zig binary protocol)
- ThreadPoolExecutor for concurrency (GIL released during I/O)
- No HTTP server, no aiohttp client, no JSON serialization
- Apples-to-apples comparison with asyncpg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Native driver comparison (no HTTP, Postgres 18, Docker):
  SELECT 1+1:           123,944 vs 90,430 (1.37x faster)
  pg_type (619 rows):    45,712 vs  5,856 (7.8x faster)
  generate_series(1000): 20,356 vs  8,339 (2.4x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python's str(True) produces "True" (capital T) but pg.zig's bool
handler only matches lowercase "true". The pg_type benchmark was
returning 0 rows because the bool param wasn't matching.

Now checks Py_IsTrue/Py_IsFalse before str() conversion and
writes lowercase "true"/"false" directly.
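Roughly, the check looks like this (helper name hypothetical; the real code is Zig calling the same C API):

```c
/* Sketch: emit the lowercase literal pg.zig's bool handler expects
 * instead of Python's str(True) == "True". Helper name hypothetical. */
#include <Python.h>

static const char *bool_param_text(PyObject *param) {
    if (Py_IsTrue(param))  return "true";
    if (Py_IsFalse(param)) return "false";
    return NULL;  /* not a bool: caller falls back to str() conversion */
}
```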

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arams

- flat_buf: 256KB -> 2MB (pg_type 619 rows * 12 cols overflowed)
- MAX_RAW_CELLS: 4096 -> 32768 (619*12 = 7428 > 4096)
- Handle Python True/False -> lowercase "true"/"false" for pg.zig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2MB flat_buf + 32k RawCell array on the stack caused crashes with
10 concurrent threads (20MB+ stack). Now heap-allocated with proper
cleanup on error paths.
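A sketch of the shape of the fix (buffer names and sizes from the commit; the RawCell layout is hypothetical, and the real code is Zig rather than C):

```c
/* Sketch: move the big per-query buffers off the thread stack.
 * 10 threads * (2MB flat_buf + 32k RawCell array) was 20MB+ of stack. */
#include <stdlib.h>

typedef struct { const char *data; int len; unsigned int oid; } RawCell;

enum { FLAT_BUF_SIZE = 2 * 1024 * 1024, MAX_RAW_CELLS = 32768 };

static int alloc_query_buffers(char **flat_buf, RawCell **cells) {
    *flat_buf = malloc(FLAT_BUF_SIZE);
    *cells = malloc(MAX_RAW_CELLS * sizeof(RawCell));
    if (!*flat_buf || !*cells) {
        free(*flat_buf);   /* free(NULL) is a no-op */
        free(*cells);
        *flat_buf = NULL;
        *cells = NULL;
        return -1;         /* caller raises MemoryError and cleans up */
    }
    return 0;
}
```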

Verified: pg_type (619 rows * 12 cols) and generate_series(1000) both
return correct row counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SELECT 1+1: 124,944 vs 93,961 (1.33x, confirmed no caching)
pg_type: crashes under sustained 30s concurrent load in Docker
  (works for individual queries, issue tracked at pg.zig#59)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pg_type columns like typdelim contain raw bytes (0xe8) that aren't
valid UTF-8. PyUnicode_FromStringAndSize crashed on these.

Now uses PyUnicode_DecodeUTF8 with "replace" error handler which
substitutes invalid bytes with U+FFFD instead of crashing.
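The fix is essentially a one-call change (sketch; the real code is Zig calling the same C API):

```c
#include <Python.h>

/* Sketch: "replace" maps invalid bytes (like typdelim's 0xe8) to
 * U+FFFD, so decoding can no longer crash on non-UTF-8 cells. */
static PyObject *decode_text_cell(const char *data, Py_ssize_t len) {
    return PyUnicode_DecodeUTF8(data, len, "replace");
}
```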

Verified: 30s sustained, 10 threads, 57k queries, 0 errors, 1.9k q/s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MagicStack pgbench suite (Postgres 18, Docker, concurrency=10, 30s):
  SELECT 1+1:           128,309 vs 88,351 (1.45x faster)
  pg_type (619 rows):     4,543 vs  5,803 (0.78x, writeJsonValue overhead)
  generate_series(1000): 20,665 vs  8,160 (2.53x faster)

Wins 2/3 queries. No caching, no tricks. Honest numbers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of: binary -> writeJsonValue -> string -> parse -> PyObject
Now:         binary -> OID switch -> PyLong/PyFloat/PyBool/PyUnicode

Handles OIDs: int2(21), int4(23), int8(20), float4(700), float8(701),
bool(16), oid(26), text(25), varchar(1043), name(19), char(1042).
Everything else decoded as UTF-8 string with 'replace' error handler.
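In C terms (the real decoder is Zig; this subset sketch assumes a little-endian host and skips length validation), the switch is roughly:

```c
/* Sketch of the OID switch: Postgres binary cells are big-endian, so
 * integers are byte-swapped and floats byte-swapped then bit-cast. */
#include <Python.h>
#include <stdint.h>
#include <string.h>

static uint64_t be64(const char *d) {
    uint64_t v; memcpy(&v, d, 8);
    return __builtin_bswap64(v);          /* little-endian host assumed */
}

static PyObject *decode_cell(unsigned int oid, const char *d, int len) {
    switch (oid) {
    case 21: {                            /* int2 */
        uint16_t v; memcpy(&v, d, 2);
        return PyLong_FromLong((int16_t)__builtin_bswap16(v));
    }
    case 23: case 26: {                   /* int4, oid (signed for brevity) */
        uint32_t v; memcpy(&v, d, 4);
        return PyLong_FromLong((int32_t)__builtin_bswap32(v));
    }
    case 20:                              /* int8 */
        return PyLong_FromLongLong((int64_t)be64(d));
    case 701: {                           /* float8: swap, then bit-cast */
        uint64_t bits = be64(d); double f; memcpy(&f, &bits, 8);
        return PyFloat_FromDouble(f);
    }
    case 16:                              /* bool: one byte, 0 or 1 */
        return PyBool_FromLong(d[0]);
    case 25: case 1043: case 19: case 1042:  /* text, varchar, name, char */
    default:                              /* fallback: UTF-8 + "replace" */
        return PyUnicode_DecodeUTF8(d, (Py_ssize_t)len, "replace");
    }
}
```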

Also copies column names into owned storage (fixes dangling pointer
after result.deinit()).

Local benchmark: pg_type 619 rows, 1909 -> 2551 q/s (1.34x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Direct OID-based binary decode skips writeJsonValue entirely:
  int2/4/8 -> readInt -> PyLong
  float4/8 -> bitCast -> PyFloat
  bool -> data[0] -> Py_True/Py_False
  text/varchar -> PyUnicode_DecodeUTF8

Also fixes column name dangling pointer (owned copy into storage).

Results (Postgres 18, Docker, concurrency=10, 30s):
  SELECT 1+1:     125,964 vs 88,888 (1.42x)
  pg_type:          4,954 vs  5,812 (0.85x, up from 0.78x)
  generate_series: 20,285 vs  7,867 (2.58x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two optimizations for wide result sets:
1. Column name PyUnicode keys created once, reused for all rows
   (saves 7,416 PyUnicode_FromStringAndSize calls for 619 rows * 12 cols)
2. _PyDict_NewPresized(num_cols) avoids dict rehashing

Local: 2,551 -> 3,432 q/s (34% faster on pg_type)
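A combined sketch of both optimizations (ownership and error handling simplified; the real code is Zig, and _PyDict_NewPresized is a private CPython API, as the commit notes):

```c
/* Sketch: per-result-set key array + presized row dicts. keys[] holds
 * one PyUnicode per column, created once and reused for every row. */
#include <Python.h>

static PyObject *build_rows(PyObject **keys, PyObject **cells,
                            Py_ssize_t nrows, Py_ssize_t ncols) {
    PyObject *rows = PyList_New(nrows);
    if (!rows) return NULL;
    for (Py_ssize_t r = 0; r < nrows; r++) {
        PyObject *row = _PyDict_NewPresized(ncols);  /* never rehashes */
        if (!row) { Py_DECREF(rows); return NULL; }
        for (Py_ssize_t c = 0; c < ncols; c++) {
            /* SetItem adds its own refs; caller still owns keys/cells */
            PyDict_SetItem(row, keys[c], cells[r * ncols + c]);
        }
        PyList_SET_ITEM(rows, r, row);   /* steals the row reference */
    }
    return rows;
}
```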

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final results (Postgres 18, Docker, concurrency=10, 30s, no caching):
  SELECT 1+1:     124,951 vs 92,415 (1.35x)
  pg_type:          7,124 vs  5,838 (1.22x) -- was 0.85x before optimizations
  generate_series: 21,259 vs  8,131 (2.61x)

Includes optimization history table and full decode method docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Queries 3 (large_object) and 4 (arrays) need setup tables created
before benchmarking. Now reads setup/teardown from query JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added _db_exec_raw C function that uses pg.zig simple query protocol
for DDL and multi-statement SQL (CREATE TABLE...; INSERT INTO...;).
Used by pgbench_zig runner for setup/teardown of test tables.

Now all 7 pgbench queries can run (3-4 need setup tables, 5-6 are
COPY/batch which turbopg skips with exit code 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justrach merged commit c716d7d into main on Mar 23, 2026
6 checks passed
justrach deleted the pg-bench branch on April 14, 2026 at 01:42