Dockerized pgbench runner comparing asyncpg, psycopg3-async, and TurboAPI+pg.zig on MagicStack's standard queries (Postgres 18).

Run: cd benchmarks/pgbench && docker compose up --build

Results (concurrency=10, 30s):
- SELECT 1+1: asyncpg 95k, psycopg3 35k, turbopg 20k (incl. HTTP overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All 3 queries now run successfully:
- SELECT 1+1: 19,659 q/s
- pg_type (619 rows): 19,599 q/s (cached)
- generate_series (1000): 217 q/s (JSON serialization bottleneck)
1. TURBO_DISABLE_DB_CACHE=1 disables the DB result cache at runtime, enabling honest benchmarking where every request hits Postgres.
2. _db_query_raw now accepts Python list params directly (no json.dumps) and returns a Python list[dict] directly (no json.loads), eliminating the double JSON serialization overhead.
3. Warmup in bench.py reduced from 200 requests to 3 (pool priming only).

Known issue: _db_query_raw holds the GIL during I/O, blocking concurrent threads. Next step: release the GIL around the pg.zig query and reacquire it for dict building.
… results

- _db_query_raw builds a Python list[dict] directly in Zig (no JSON)
- Params passed as a Python list (no json.dumps/json.loads)
- Results buffered into a flat array for a future GIL release
- TURBO_DISABLE_DB_CACHE=1 env var for honest benchmarking
- Proper column name handling via PyUnicode_FromStringAndSize

TODO: Release the GIL around Postgres I/O (PyEval_SaveThread needs a C shim because PyThreadState is an opaque type in Zig's cimport)
Added a C shim (py_gil_shim.c) for PyEval_SaveThread/RestoreThread to work around the opaque PyThreadState in Zig's cimport. _db_query_raw now releases the GIL during:
- Connection acquire from the pool
- pg.zig query execution
- Result row buffering

The GIL is reacquired before building Python dicts.

Results:
- Before: 10 threads = 1,358 q/s (no scaling, GIL blocked)
- After: 10 threads = 10,416 q/s (7.7x scaling)
- vs asyncpg: 1.25x faster concurrent (10,416 vs 8,302)
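The scaling effect above can be reproduced in pure Python. In this sketch, time.sleep stands in for the pg.zig query I/O that now runs with the GIL released (fake_query is a hypothetical stand-in, not the actual driver code): ten overlapping waits finish in roughly the time of one.

```python
# Why releasing the GIL during driver I/O lets threads scale:
# time.sleep releases the GIL, just as socket I/O does once
# _db_query_raw drops it around the pg.zig call.
import time
from concurrent.futures import ThreadPoolExecutor

def fake_query(_):
    time.sleep(0.1)  # GIL released here, like the driver's I/O
    return 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fake_query, range(10)))
elapsed = time.perf_counter() - start

# 10 concurrent 0.1s waits take ~0.1s wall time, not ~1.0s --
# the same mechanism behind the 1,358 -> 10,416 q/s jump.
print(sum(results), round(elapsed, 2))
```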
Rewrote pgbench_zig to use TurboPG's native path directly:
- db.query() calls _db_query_raw (pg.zig binary protocol)
- ThreadPoolExecutor for concurrency (GIL released during I/O)
- No HTTP server, no aiohttp client, no JSON serialization
- Apples-to-apples comparison with asyncpg
Native driver comparison (no HTTP, Postgres 18, Docker):
- SELECT 1+1: 123,944 vs 90,430 (1.37x faster)
- pg_type (619 rows): 45,712 vs 5,856 (7.8x faster)
- generate_series(1000): 20,356 vs 8,339 (2.4x faster)
Python's str(True) produces "True" (capital T), but pg.zig's bool handler only matches lowercase "true". The pg_type benchmark was returning 0 rows because the bool param never matched. The encoder now checks Py_IsTrue/Py_IsFalse before falling back to str() and writes lowercase "true"/"false" directly.
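The mismatch is easy to see from Python. A minimal sketch of the fixed encoding path (encode_bool_param is an illustrative name, not the actual C function):

```python
# Postgres' text-format bool parser wants lowercase "true"/"false",
# but Python's default str() conversion capitalizes booleans.
def encode_bool_param(value):
    # Fixed path: special-case bool before falling back to str()
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

print(str(True))                 # "True" -- what pg.zig failed to match
print(encode_bool_param(True))   # "true" -- matches the bool handler
print(encode_bool_param(False))  # "false"
```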
…arams

- flat_buf: 256KB -> 2MB (pg_type's 619 rows * 12 cols overflowed it)
- MAX_RAW_CELLS: 4096 -> 32768 (619 * 12 = 7,428 > 4,096)
- Handle Python True/False -> lowercase "true"/"false" for pg.zig
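The overflow arithmetic checks out:

```python
# pg_type returns 619 rows of 12 columns; the old 4096-cell
# RawCell array could not hold the full result set.
rows, cols = 619, 12
cells = rows * cols
print(cells)             # 7428
print(cells > 4096)      # True: old MAX_RAW_CELLS overflowed
print(cells <= 32768)    # True: the new limit fits comfortably
```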
The 2MB flat_buf plus the 32k RawCell array on the stack caused crashes with 10 concurrent threads (20MB+ of stack). Both are now heap-allocated, with proper cleanup on error paths. Verified: pg_type (619 rows * 12 cols) and generate_series(1000) both return correct row counts.
- SELECT 1+1: 124,944 vs 93,961 (1.33x, confirmed no caching)
- pg_type: crashes under sustained 30s concurrent load in Docker (works for individual queries; issue tracked at pg.zig#59)
pg_type columns such as typdelim contain raw bytes (e.g. 0xE8) that aren't valid UTF-8, and PyUnicode_FromStringAndSize crashed on them. Decoding now uses PyUnicode_DecodeUTF8 with the "replace" error handler, which substitutes invalid bytes with U+FFFD instead of crashing. Verified: 30s sustained, 10 threads, 57k queries, 0 errors, 1.9k q/s.
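The same behavior is visible at the Python level, where bytes.decode with errors="replace" mirrors PyUnicode_DecodeUTF8's "replace" handler:

```python
# A lone 0xE8 byte is a truncated UTF-8 multi-byte sequence,
# so strict decoding fails -- the old crashing path.
raw = b"\xe8"
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The "replace" handler substitutes U+FFFD instead of raising.
fixed = raw.decode("utf-8", errors="replace")
print(strict_failed)        # True
print(fixed == "\ufffd")    # True
```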
MagicStack pgbench suite (Postgres 18, Docker, concurrency=10, 30s):
- SELECT 1+1: 128,309 vs 88,351 (1.45x faster)
- pg_type (619 rows): 4,543 vs 5,803 (0.78x, writeJsonValue overhead)
- generate_series(1000): 20,665 vs 8,160 (2.53x faster)

Wins 2 of 3 queries. No caching, no tricks. Honest numbers.
Instead of: binary -> writeJsonValue -> string -> parse -> PyObject
Now: binary -> OID switch -> PyLong/PyFloat/PyBool/PyUnicode

Handles OIDs: int2(21), int4(23), int8(20), float4(700), float8(701), bool(16), oid(26), text(25), varchar(1043), name(19), char(1042). Everything else is decoded as a UTF-8 string with the 'replace' error handler.

Also copies column names into owned storage (fixes a dangling pointer after result.deinit()).

Local benchmark: pg_type, 619 rows: 1,909 -> 2,551 q/s (1.34x faster)
Direct OID-based binary decode skips writeJsonValue entirely:
- int2/4/8 -> readInt -> PyLong
- float4/8 -> bitCast -> PyFloat
- bool -> data[0] -> Py_True/Py_False
- text/varchar -> PyUnicode_DecodeUTF8

Also fixes the column name dangling pointer (owned copy into storage).

Results (Postgres 18, Docker, concurrency=10, 30s):
- SELECT 1+1: 125,964 vs 88,888 (1.42x)
- pg_type: 4,954 vs 5,812 (0.85x, up from 0.78x)
- generate_series: 20,285 vs 7,867 (2.58x)
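A Python sketch of the OID switch: Postgres sends binary cells in network byte order, and the decoder dispatches on the catalog type OID straight to a Python object with no intermediate JSON text. The OID constants are standard Postgres catalog values; decode_cell itself is an illustrative stand-in for the Zig switch, not the actual code.

```python
import struct

def decode_cell(oid, data):
    """Map a Postgres type OID + binary wire bytes to a Python value."""
    if oid == 21:          # int2: 2-byte big-endian signed int
        return struct.unpack("!h", data)[0]
    if oid in (23, 26):    # int4, oid: 4-byte big-endian
        return struct.unpack("!i", data)[0]
    if oid == 20:          # int8: 8-byte big-endian
        return struct.unpack("!q", data)[0]
    if oid == 700:         # float4: IEEE 754 single
        return struct.unpack("!f", data)[0]
    if oid == 701:         # float8: IEEE 754 double
        return struct.unpack("!d", data)[0]
    if oid == 16:          # bool: single byte, 0 or 1
        return data[0] != 0
    # text(25), varchar(1043), name(19), char(1042), and everything
    # else: UTF-8 with the 'replace' error handler.
    return data.decode("utf-8", errors="replace")

print(decode_cell(23, struct.pack("!i", 2)))    # 2
print(decode_cell(16, b"\x01"))                 # True
print(decode_cell(25, b"hello"))                # hello
```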
Two optimizations for wide result sets:
1. Column name PyUnicode keys are created once and reused for all rows (saves 7,416 PyUnicode_FromStringAndSize calls for 619 rows * 12 cols)
2. _PyDict_NewPresized(num_cols) avoids dict rehashing

Local: 2,551 -> 3,432 q/s (34% faster on pg_type)
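A Python-level analog of the first optimization: build the column-key objects once and reuse them for every row, rather than recreating num_rows * num_cols key strings. (The C version additionally presizes each dict via _PyDict_NewPresized; dict(zip(...)) is the closest portable equivalent. The sample column names and rows_to_dicts helper are illustrative.)

```python
# Key objects are created once, up front, and shared by every row dict.
columns = ["oid", "typname", "typlen"]          # created once
rows = [(16, "bool", 1), (25, "text", -1)]

def rows_to_dicts(columns, rows):
    # dict(zip(...)) reuses the same key objects for all rows,
    # avoiding one string allocation per cell.
    return [dict(zip(columns, row)) for row in rows]

out = rows_to_dicts(columns, rows)
print(out[0]["typname"])   # bool
print(len(out))            # 2
```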
Final results (Postgres 18, Docker, concurrency=10, 30s, no caching):
- SELECT 1+1: 124,951 vs 92,415 (1.35x)
- pg_type: 7,124 vs 5,838 (1.22x) -- was 0.85x before the optimizations
- generate_series: 21,259 vs 8,131 (2.61x)

Includes an optimization history table and full decode method docs.
Queries 3 (large_object) and 4 (arrays) need setup tables created before benchmarking. The runner now reads setup/teardown SQL from the query JSON.
Added a _db_exec_raw C function that uses pg.zig's simple query protocol for DDL and multi-statement SQL (CREATE TABLE ...; INSERT INTO ...;). Used by the pgbench_zig runner for setup/teardown of test tables. All 7 pgbench queries can now run (3-4 need setup tables; 5-6 are COPY/batch, which turbopg skips with exit code 3).
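A hedged sketch of the runner flow: each query entry may carry setup/teardown SQL that must go through the simple query protocol because it is multi-statement DDL. The JSON shape and the run_benchmark/run_ddl/run_query names are illustrative stand-ins, not the actual pgbench_zig API.

```python
import json

# Hypothetical query-spec shape; real specs live in the suite's JSON files.
query_spec = json.loads("""
{
  "name": "arrays",
  "setup": "CREATE TABLE t (a int[]); INSERT INTO t VALUES ('{1,2}');",
  "query": "SELECT a FROM t",
  "teardown": "DROP TABLE t;"
}
""")

def run_benchmark(spec, run_ddl, run_query):
    # run_ddl stands in for _db_exec_raw (simple query protocol,
    # handles multi-statement SQL); run_query for the timed path.
    if spec.get("setup"):
        run_ddl(spec["setup"])
    try:
        return run_query(spec["query"])
    finally:
        if spec.get("teardown"):
            run_ddl(spec["teardown"])

ddl_log = []
result = run_benchmark(query_spec, ddl_log.append, lambda q: q.upper())
print(result)         # SELECT A FROM T
print(len(ddl_log))   # 2  (setup ran before the query, teardown after)
```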
This PR tightens the benchmark methodology and keeps the published numbers aligned with the earlier clean 3-run median benchmark pass, not the later single rerun.
What is included:
Notes: