
Tighten benchmark methodology and publish corrected results #77

Merged
justrach merged 37 commits into main from pg-bench on Mar 23, 2026
Conversation

@justrach
Owner

This PR tightens the benchmark methodology and keeps the published numbers aligned with the earlier clean 3-run median benchmark pass, not the later single rerun.

What is included:

  • fixes to the pgbench Turbo runner and validator
  • corrected pgbench docs/results
  • corrected HTTP+DB benchmark docs/frontend copy based on the prior 3-run median run
  • native execute_many plumbing and benchmark helper
  • CI benchmark workflow cleanup so Docker state is reset before and after runs

Notes:

  • branch docs intentionally keep the older 3-run median HTTP+DB results
  • the latest one-off rerun was not used to rewrite the published numbers
  • benchmark workflows upload raw logs as artifacts in CI

justrach and others added 30 commits March 21, 2026 22:52
Dockerized pgbench runner comparing asyncpg, psycopg3-async, and
TurboAPI+pg.zig on MagicStack's standard queries (Postgres 18).

Run: cd benchmarks/pgbench && docker compose up --build

Results (concurrency=10, 30s):
  SELECT 1+1: asyncpg 95k q/s, psycopg3 35k q/s, turbopg 20k q/s (including HTTP overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 3 queries now run successfully:
- SELECT 1+1: 19,659 q/s
- pg_type (619 rows): 19,599 q/s (cached)
- generate_series (1000): 217 q/s (JSON serialization bottleneck)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. TURBO_DISABLE_DB_CACHE=1 disables the DB result cache at runtime.
   Enables honest benchmarking where every request hits Postgres.

2. _db_query_raw now accepts Python list params directly (no json.dumps)
   and returns Python list[dict] directly (no json.loads). Eliminates
   double JSON serialization overhead.

3. Warmup in bench.py reduced from 200 requests to 3 (pool priming only).

Known issue: _db_query_raw holds the GIL during I/O, blocking concurrent
threads. Next step: release GIL around pg.zig query, reacquire for dict
building.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… results

- _db_query_raw builds Python list[dict] directly in Zig (no JSON)
- Params passed as Python list (no json.dumps/json.loads)
- Results buffered into flat array for future GIL release
- TURBO_DISABLE_DB_CACHE=1 env var for honest benchmarking
- Proper column name handling via PyUnicode_FromStringAndSize

TODO: Release GIL around Postgres I/O (PyEval_SaveThread needs C shim
for opaque PyThreadState type in Zig cimport)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added C shim (py_gil_shim.c) for PyEval_SaveThread/RestoreThread to
work around opaque PyThreadState in Zig cimport.

_db_query_raw now releases GIL during:
- Connection acquire from pool
- pg.zig query execution
- Result row buffering

GIL reacquired before building Python dicts.
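A minimal sketch of what such a shim can look like (exported names here are illustrative, not necessarily the ones in py_gil_shim.c): returning the thread state as a void* keeps PyThreadState fully opaque on the Zig side.

```c
/* py_gil_shim.c (sketch): hide PyEval_SaveThread/RestoreThread behind
 * void* so Zig's cimport never needs PyThreadState's definition.
 * Function names are illustrative. */
#include <Python.h>

void *turbo_gil_release(void) {
    /* Drop the GIL; caller must hand the returned state back later. */
    return (void *)PyEval_SaveThread();
}

void turbo_gil_acquire(void *state) {
    /* Re-acquire the GIL before touching any Python objects. */
    PyEval_RestoreThread((PyThreadState *)state);
}
```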

Results:
  Before: 10 threads = 1,358 q/s (no scaling, GIL blocked)
  After:  10 threads = 10,416 q/s (7.7x scaling)
  vs asyncpg: 1.25x faster concurrent (10,416 vs 8,302)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrote pgbench_zig to use TurboPG's native path directly:
- db.query() calls _db_query_raw (pg.zig binary protocol)
- ThreadPoolExecutor for concurrency (GIL released during I/O)
- No HTTP server, no aiohttp client, no JSON serialization
- Apples-to-apples comparison with asyncpg

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Native driver comparison (no HTTP, Postgres 18, Docker):
  SELECT 1+1:           123,944 vs 90,430 (1.37x faster)
  pg_type (619 rows):    45,712 vs  5,856 (7.8x faster)
  generate_series(1000): 20,356 vs  8,339 (2.4x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python's str(True) produces "True" (capital T) but pg.zig's bool
handler only matches lowercase "true". The pg_type benchmark was
returning 0 rows because the bool param wasn't matching.

Now checks Py_IsTrue/Py_IsFalse before str() conversion and
writes lowercase "true"/"false" directly.
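Roughly, the check looks like this (helper name hypothetical; the real code is Zig calling the same C API):

```c
/* Sketch: emit the lowercase literal pg.zig's bool handler expects
 * instead of Python's str(True) == "True". Helper name hypothetical. */
#include <Python.h>

static const char *bool_param_text(PyObject *param) {
    if (Py_IsTrue(param))  return "true";
    if (Py_IsFalse(param)) return "false";
    return NULL;  /* not a bool: caller falls back to str() conversion */
}
```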

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arams

- flat_buf: 256KB -> 2MB (pg_type 619 rows * 12 cols overflowed)
- MAX_RAW_CELLS: 4096 -> 32768 (619*12 = 7428 > 4096)
- Handle Python True/False -> lowercase "true"/"false" for pg.zig

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2MB flat_buf + 32k RawCell array on the stack caused crashes with
10 concurrent threads (20MB+ stack). Now heap-allocated with proper
cleanup on error paths.
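A sketch of the shape of the fix (buffer names and sizes from the commit; the RawCell layout is hypothetical, and the real code is Zig rather than C):

```c
/* Sketch: move the big per-query buffers off the thread stack.
 * 10 threads * (2MB flat_buf + 32k RawCell array) was 20MB+ of stack. */
#include <stdlib.h>

typedef struct { const char *data; int len; unsigned int oid; } RawCell;

enum { FLAT_BUF_SIZE = 2 * 1024 * 1024, MAX_RAW_CELLS = 32768 };

static int alloc_query_buffers(char **flat_buf, RawCell **cells) {
    *flat_buf = malloc(FLAT_BUF_SIZE);
    *cells = malloc(MAX_RAW_CELLS * sizeof(RawCell));
    if (!*flat_buf || !*cells) {
        free(*flat_buf);   /* free(NULL) is a no-op */
        free(*cells);
        *flat_buf = NULL;
        *cells = NULL;
        return -1;         /* caller raises MemoryError and cleans up */
    }
    return 0;
}
```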

Verified: pg_type (619 rows * 12 cols) and generate_series(1000) both
return correct row counts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SELECT 1+1: 124,944 vs 93,961 (1.33x, confirmed no caching)
pg_type: crashes under sustained 30s concurrent load in Docker
  (works for individual queries, issue tracked at pg.zig#59)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pg_type columns like typdelim contain raw bytes (0xe8) that aren't
valid UTF-8. PyUnicode_FromStringAndSize crashed on these.

Now uses PyUnicode_DecodeUTF8 with "replace" error handler which
substitutes invalid bytes with U+FFFD instead of crashing.
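The fix is essentially a one-call change (sketch; the real code is Zig calling the same C API):

```c
#include <Python.h>

/* Sketch: "replace" maps invalid bytes (like typdelim's 0xe8) to
 * U+FFFD, so decoding can no longer crash on non-UTF-8 cells. */
static PyObject *decode_text_cell(const char *data, Py_ssize_t len) {
    return PyUnicode_DecodeUTF8(data, len, "replace");
}
```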

Verified: 30s sustained, 10 threads, 57k queries, 0 errors, 1.9k q/s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MagicStack pgbench suite (Postgres 18, Docker, concurrency=10, 30s):
  SELECT 1+1:           128,309 vs 88,351 (1.45x faster)
  pg_type (619 rows):     4,543 vs  5,803 (0.78x, writeJsonValue overhead)
  generate_series(1000): 20,665 vs  8,160 (2.53x faster)

Wins 2/3 queries. No caching, no tricks. Honest numbers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of: binary -> writeJsonValue -> string -> parse -> PyObject
Now:         binary -> OID switch -> PyLong/PyFloat/PyBool/PyUnicode

Handles OIDs: int2(21), int4(23), int8(20), float4(700), float8(701),
bool(16), oid(26), text(25), varchar(1043), name(19), char(1042).
Everything else decoded as UTF-8 string with 'replace' error handler.
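In C terms (the real decoder is Zig; this subset sketch assumes a little-endian host and skips length validation), the switch is roughly:

```c
/* Sketch of the OID switch: Postgres binary cells are big-endian, so
 * integers are byte-swapped and floats byte-swapped then bit-cast. */
#include <Python.h>
#include <stdint.h>
#include <string.h>

static uint64_t be64(const char *d) {
    uint64_t v; memcpy(&v, d, 8);
    return __builtin_bswap64(v);          /* little-endian host assumed */
}

static PyObject *decode_cell(unsigned int oid, const char *d, int len) {
    switch (oid) {
    case 21: {                            /* int2 */
        uint16_t v; memcpy(&v, d, 2);
        return PyLong_FromLong((int16_t)__builtin_bswap16(v));
    }
    case 23: case 26: {                   /* int4, oid (signed for brevity) */
        uint32_t v; memcpy(&v, d, 4);
        return PyLong_FromLong((int32_t)__builtin_bswap32(v));
    }
    case 20:                              /* int8 */
        return PyLong_FromLongLong((int64_t)be64(d));
    case 701: {                           /* float8: swap, then bit-cast */
        uint64_t bits = be64(d); double f; memcpy(&f, &bits, 8);
        return PyFloat_FromDouble(f);
    }
    case 16:                              /* bool: one byte, 0 or 1 */
        return PyBool_FromLong(d[0]);
    case 25: case 1043: case 19: case 1042:  /* text, varchar, name, char */
    default:                              /* fallback: UTF-8 + "replace" */
        return PyUnicode_DecodeUTF8(d, (Py_ssize_t)len, "replace");
    }
}
```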

Also copies column names into owned storage (fixes dangling pointer
after result.deinit()).

Local benchmark: pg_type 619 rows, 1909 -> 2551 q/s (1.34x faster)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Direct OID-based binary decode skips writeJsonValue entirely:
  int2/4/8 -> readInt -> PyLong
  float4/8 -> bitCast -> PyFloat
  bool -> data[0] -> Py_True/Py_False
  text/varchar -> PyUnicode_DecodeUTF8

Also fixes column name dangling pointer (owned copy into storage).

Results (Postgres 18, Docker, concurrency=10, 30s):
  SELECT 1+1:     125,964 vs 88,888 (1.42x)
  pg_type:          4,954 vs  5,812 (0.85x, up from 0.78x)
  generate_series: 20,285 vs  7,867 (2.58x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two optimizations for wide result sets:
1. Column name PyUnicode keys created once, reused for all rows
   (saves 7,416 PyUnicode_FromStringAndSize calls for 619 rows * 12 cols)
2. _PyDict_NewPresized(num_cols) avoids dict rehashing

Local: 2,551 -> 3,432 q/s (34% faster on pg_type)
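A combined sketch of both optimizations (ownership and error handling simplified; the real code is Zig, and _PyDict_NewPresized is a private CPython API, as the commit notes):

```c
/* Sketch: per-result-set key array + presized row dicts. keys[] holds
 * one PyUnicode per column, created once and reused for every row. */
#include <Python.h>

static PyObject *build_rows(PyObject **keys, PyObject **cells,
                            Py_ssize_t nrows, Py_ssize_t ncols) {
    PyObject *rows = PyList_New(nrows);
    if (!rows) return NULL;
    for (Py_ssize_t r = 0; r < nrows; r++) {
        PyObject *row = _PyDict_NewPresized(ncols);  /* never rehashes */
        if (!row) { Py_DECREF(rows); return NULL; }
        for (Py_ssize_t c = 0; c < ncols; c++) {
            /* SetItem adds its own refs; caller still owns keys/cells */
            PyDict_SetItem(row, keys[c], cells[r * ncols + c]);
        }
        PyList_SET_ITEM(rows, r, row);   /* steals the row reference */
    }
    return rows;
}
```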

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final results (Postgres 18, Docker, concurrency=10, 30s, no caching):
  SELECT 1+1:     124,951 vs 92,415 (1.35x)
  pg_type:          7,124 vs  5,838 (1.22x) -- was 0.85x before optimizations
  generate_series: 21,259 vs  8,131 (2.61x)

Includes optimization history table and full decode method docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Queries 3 (large_object) and 4 (arrays) need setup tables created
before benchmarking. Now reads setup/teardown from query JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added _db_exec_raw C function that uses pg.zig simple query protocol
for DDL and multi-statement SQL (CREATE TABLE...; INSERT INTO...;).
Used by pgbench_zig runner for setup/teardown of test tables.

Now all 7 pgbench queries can run (3-4 need setup tables, 5-6 are
COPY/batch which turbopg skips with exit code 3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
justrach merged commit c716d7d into main on Mar 23, 2026
6 checks passed
justrach deleted the pg-bench branch on April 14, 2026 at 01:42