Home
vanilla is a thread‑per‑core HTTP/1.1 server for V, built around a Linux epoll reactor with an optional async runtime and a native (no‑libpq) Postgres client. It is tuned to be allocation‑free on its hot paths so it can run under -prod -gc none (no garbage collector) without leaking memory under sustained load.
This wiki documents the architecture, the performance‑engineering campaign behind it, the profiling tools used to get there, and the lessons learned (including a few that became upstream V issues).
| Page | What it covers |
|---|---|
| Architecture | Thread‑per‑core epoll, SO_REUSEPORT, the flat fd‑indexed reactor, the async park/resume runtime, the connection lifecycle |
| Async Postgres and Pipelining | The native pg_async wire client, cross‑request pipelining, max_inflight, the reactor FIFO queue, persistent pooled connections, and backpressure/shedding |
| Memory Management under gc none | Why -gc none is used, why per‑request allocations leak under it, the zero‑allocation patterns, and the staged campaign that drove the DB path to ~0 bytes/request |
| Profiling and Benchmarking | The reproducible callgrind + RSS‑slope harnesses, how to run them, and the methodology (measure excess over the GC floor, drive concurrent load) |
| Gotchas and Lessons Learned | The non‑obvious things — measurement pitfalls, V language quirks, the load‑generator Content‑Length finding, shed semantics |
V's default build links the Boehm GC. vanilla instead ships -prod -gc none because, on a thread‑per‑core server, allocation throughput does not scale across cores under a shared GC. In a microbenchmark on an 8‑core/16‑thread machine, aggregate allocation throughput under the default GC stayed flat at ~18 M allocs/s from 1 to 16 threads (16 threads allocated no faster in total than 1), while -gc none (libc malloc) scaled ~2.5× before plateauing (see Gotchas and Lessons Learned and vlang/v#27488).
The price of -gc none is that every heap allocation is permanent — nothing is ever reclaimed. So the hot paths must be genuinely allocation‑free. Getting there is what most of this wiki is about.
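
One common way to keep a hot path allocation‑free is to do all formatting once at startup and have the request path hand out a view of the finished bytes. The C sketch below illustrates that idea under stated assumptions; it is not vanilla's implementation, and the `precomputed_response` struct, the 256‑byte capacity, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical response cache: filled once at startup, read-only afterwards. */
struct precomputed_response {
    char   bytes[256];
    size_t len;
};

/* Startup only: format a complete HTTP/1.1 plaintext response into the cache.
 * Returns 0 on success, -1 if the response does not fit in the buffer. */
static int precompute_plaintext(struct precomputed_response *r, const char *body) {
    int n = snprintf(r->bytes, sizeof r->bytes,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"
                     "%s",
                     strlen(body), body);
    if (n < 0 || (size_t)n >= sizeof r->bytes) return -1;
    r->len = (size_t)n;
    return 0;
}

/* Hot path: return a pointer to the cached bytes and their length.
 * No formatting and no heap allocation happens here. */
static const char *plaintext_response(const struct precomputed_response *r,
                                      size_t *len) {
    *len = r->len;
    return r->bytes;
}
```

Because the request path only reads `bytes` and `len`, serving a request touches no allocator at all, which is the property that matters when nothing is ever reclaimed.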
Measured locally (16‑core dev box) and confirmed at scale in the MDA2AV/HttpArena benchmark:
- Plaintext / pipelined: ~39.8 M req/s at ~68 MiB RSS (a precomputed‑response fast path; zero allocation).
- async‑db (Postgres read, 20 rows): the per‑request DB‑path leak — measured as RSS growth in excess of the Boehm GC floor — went from 181 → ~0 bytes/request across the campaign; arena RSS dropped from 27–44 GiB → ~1.2 GiB while throughput rose dramatically.
- Connection churn: a per‑worker `ConnState` free‑list took the per‑reconnect cost from 3 allocations → ~0 (load generators reconnect tens of thousands of times per run).
See Memory Management under gc none for the full story and method.
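
The free‑list idea behind the connection‑churn result can be sketched in a few lines of C. This is an illustration of the general technique, not vanilla's V code: the `conn_state` fields, the `conn_pool` struct, and the `heap_allocs` counter are all hypothetical. A closed connection's state is pushed onto a per‑worker list and handed back to the next accepted connection, so steady‑state reconnects never reach the allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-connection state; the real ConnState fields are not
 * described on this page. */
struct conn_state {
    int                fd;
    size_t             bytes_read;
    struct conn_state *next_free; /* link used only while on the free-list */
};

/* One pool per worker thread, so no locking is needed. */
struct conn_pool {
    struct conn_state *free_head;
    size_t             heap_allocs; /* trips to malloc, for illustration */
};

/* Take a state object for a newly accepted fd: reuse a released one if
 * available, otherwise fall back to malloc. Returns NULL if malloc fails. */
static struct conn_state *conn_acquire(struct conn_pool *p, int fd) {
    struct conn_state *c = p->free_head;
    if (c != NULL) {
        p->free_head = c->next_free;
    } else {
        c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        p->heap_allocs++;
    }
    memset(c, 0, sizeof *c); /* reset so no data leaks between connections */
    c->fd = fd;
    return c;
}

/* Return a state object to the pool when its connection closes. */
static void conn_release(struct conn_pool *p, struct conn_state *c) {
    c->next_free = p->free_head;
    p->free_head = c;
}
```

With this shape the pool grows only to the peak number of simultaneously open connections per worker; after that, accept and close cycles recycle the same objects.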
- `http_server/backend_epoll/`: the epoll backend (accept loop, connection state, the async reactor).
- `http_server/http1_1/request_parser/`: the zero‑copy HTTP/1.1 request framer/parser.
- `pg_async/`: the native Postgres wire protocol client (SCRAM‑SHA‑256 auth, extended query protocol, pipelining).
These docs describe the server as of the zero‑allocation campaign (PRs #45–#50). Code references may drift; treat the patterns and reasoning as the durable part.