
feat: v1.4.0 — native PostgreSQL + Redis drivers, H2C upgrade, EventLoopProvider#236

Merged
FumingPower3925 merged 90 commits into main from feat/v1.4.0-drivers-and-h2c
Apr 19, 2026
Conversation


FumingPower3925 commented Apr 16, 2026

Summary

v1.4.0 ships five workstreams as a single coherent release:

  1. EventLoopProvider foundation — engine interface that lets drivers share the HTTP server's epoll/io_uring workers for per-CPU affinity (#113–#116).
  2. HTTP/2 Upgrade (h2c) — RFC 7540 §3.2 transparent upgrade from H1 to H2 cleartext (#117–#122).
  3. Native PostgreSQL driver — wire protocol, database/sql driver, direct Pool, streaming rows, SCRAM-SHA-256, COPY FROM/TO (#123–#131).
  4. Native Redis driver — RESP2/3 parser, typed commands, pipeline, pub/sub, pool, Cluster + Sentinel + shard channels (#132–#137, #233–#235).
  5. Native Memcached driver (late scope addition, #238) — text + binary wire protocols, full typed-command surface.

Plus:

  • pgspec / redisspec / mcspec — protocol-compliance suites against real servers.
  • Config.AsyncHandlers — opt-in async dispatch so third-party drivers (goredis, pgx, gomemcache) don't futex-storm the LockOSThread'd workers.
  • Goroutine-reuse + zero-alloc input buffers in the async path.
  • Peek-before-netpoll reads in direct-mode drivers.
  • Per-cell matrix benchmarks + full H1 RFC 9112 compliance + conformance suites across 18 Postgres/Redis/Memcached/Valkey/DragonflyDB versions in CI.

MS-R1 matrix — headline numbers

Captured 2026-04-19 on MS-R1 (CIX CP8180, 12-core aarch64, Linux 6.6.10-cix, Postgres 15, Redis 7.2, Memcached 1.6.16). loadgen -connections 256 -duration 8s.

HTTP-layer (pure-CPU handlers, no driver)

| Engine | /plaintext | /json | /params/:id | /body | /chain (5-mw) |
|---|---|---|---|---|---|
| celeris-epoll (sync) | 394,132 | 372,964 | 376,562 | 367,231 | 228,063 |
| celeris-iouring (sync) | 428,582 | 399,183 | 413,085 | 402,552 | 268,839 |
| celeris-adaptive (sync) | 395,313 | 375,265 | 384,610 | 363,336 | 237,775 |
| celeris-iouring (async) | 287,838 | 272,012 | 281,500 | 275,554 | 204,609 |
| nethttp | 184,754 | 171,778 | 195,646 | 177,103 | 207,387 |
| gin | 185,420 | 171,717 | 177,883 | 156,269 | 184,924 |
| chi | 194,527 | 165,509 | 177,759 | 166,808 | 180,225 |
| echo | 187,454 | 170,429 | 182,593 | 157,902 | 184,542 |
| fiber | 313,855 | 289,847 | 313,715 | 294,512 | 314,203 |

celeris-iouring sync wins plaintext (+37% over fiber, +131% over the other four competitors). fiber wins the /chain cell (314k vs 269k) because its zero-middleware routing path is tight; celeris's chain runs recovery + logger + requestid + cors + timeout, which is closer to a realistic production stack.

Driver-layer (celeris HTTP + DB round-trip)

Best celeris cell vs best competitor for the same DB:

| DB | Best celeris (iouring, async=1) | Best competitor | Δ |
|---|---|---|---|
| redis (celerisredis) | 94,417 | nethttp+celerisredis: 78,040 | +21.0% |
| redis (goredis) | 75,416 | nethttp+goredis: 68,167 | +10.6% |
| postgres (celerispg) | 68,641 | nethttp+celerispg: 56,930 | +20.6% |
| postgres (pgx) | 55,771 | nethttp+pgx: 42,944 | +29.9% |
| memcached (celerismc) | 101,902 | nethttp+celerismc: 76,359 | +33.5% |
| memcached (gomc) | 78,800 | nethttp+gomc: 73,515 | +7.2% |

Matrix leader: celeris-iouring + celerismc + AsyncHandlers=true = 101,902 rps — +33.5% over the best nethttp combination.

AsyncHandlers — when to use which

| Workload | Sync (default) | Async (Config.AsyncHandlers: true) |
|---|---|---|
| Pure-CPU / plaintext / JSON / params / body | +30–33% | baseline |
| Middleware chain | +23–26% | baseline |
| Celeris native drivers | baseline | +5–20% |
| Third-party drivers (goredis, pgx, gomc) | baseline | +30–200% |

Guidance: if your handler touches a DB or cache via any Go driver (third-party or celeris-native), set AsyncHandlers: true. If your handler is pure-CPU (plaintext, JSON from preallocated data, pure computation), leave the default. Per-route control is tracked in #239 (v1.5.0 spike).

One caveat: celeris-adaptive with AsyncHandlers=true currently regresses three cells (celerisredis -7.2%, celerispg -4.5%, celerismc -11.0%), caused by a mismatch between the engine-side async flag and the driver-side sync path on Adaptive. This is actively being debugged; the fix either lands before the v1.4.0 merge or is punted to v1.4.1.

What's included

  • 56 commits organized by workstream (feature commits + bug fixes from review rounds + perf follow-ups).
  • ~62K lines of new code across driver/, engine/, internal/conn/, test/, and .github/workflows/.
  • Full documentation in doc.go for every new package + runnable Example functions.
  • CI integration with postgres:16 and redis:7.2 service containers on the default Conformance job, plus the new Drivers matrix workflow for multi-version coverage (Postgres 14/15/16/17/18 × Redis 7.4/8.0/8.2/8.4/8.6 × Memcached 1.6.29/36/41 × Valkey 9.0.3 × DragonflyDB v1.27.1).
  • Compliance suites — pgspec, redisspec, mcspec — all green against real servers.
  • HTTP/1.1 pipelining test for async dispatch (test/integration/pipeline_test.go).
  • Goroutine-reuse regression tests (engine/epoll/async_reuse_test.go).

Scope decisions

  • TLS deferred to v1.4.1 and tracked by the v1.5.0 spike #232 (TLS 1.3 support for the PostgreSQL and Redis drivers). sslmode=require / rediss:// are rejected with actionable error messages. Managed cloud DB services (RDS, CloudSQL, ElastiCache) require TLS, so this release targets VPC/loopback deployments until TLS lands.
  • Cluster MULTI/EXEC (cross-slot) scoped out; #235 tracks it as an enhancement for a later release. Single-slot ClusterTx is shipped.
  • Standalone io_uring event loop is kept as dead code until the SINGLE_ISSUER vs sync-fast-path conflict is resolved; the epoll standalone path is faster in all measured workloads.
  • Per-route AsyncHandler control tracked as spike #239 for v1.5.0.
  • H2 async dispatch — AsyncHandlers is HTTP1-only. H2 conns still run inline (which matches the existing H2 stream-manager model). Future exploration.

Test plan

  • go test ./... -race on darwin (62/62 packages)
  • go test ./... -race on Linux aarch64 (MS-R1) — 62/62 packages
  • pgspec against Postgres 15/16 — all pass
  • redisspec against Redis 7.2 — all pass
  • mcspec against Memcached 1.6.16 — all pass
  • Conformance tests against real servers (pg, redis, mc incl. COPY suite — 6/6 pass)
  • Cluster conformance — 5/5 pass against a real 6-node redis:7.2 cluster
  • Sentinel conformance — 4/4 pass against a real redis:7.2 sentinel topology
  • H1 RFC 9112 compliance × epoll × iouring — all pass
  • HTTP/1.1 pipelining test under both AsyncHandlers=false and true
  • Goroutine-reuse + close-wake regression tests on epoll async dispatch
  • Full HTTP-layer + driver-layer MS-R1 matrix (118 cells total) — see numbers above
  • CI validation — 41/41 checks green on HEAD

Foundational interfaces that allow database/cache drivers to share
the HTTP server's event-loop workers. Drivers register file
descriptors on a specific worker via RegisterConn and receive
callbacks on data arrival.

Includes ErrQueueFull, ErrUnknownFD, and ErrSwitchingNotFrozen
sentinel errors. No implementation in this commit — just the contract.

Closes #113
Implements EventLoopProvider on the epoll engine. Adds a per-loop
driverConns map (fd-indexed parallel to the HTTP conn table) gated
by a hasDriverConns atomic flag — the HTTP hot path pays a single
atomic load when no drivers are registered.

- RegisterConn/UnregisterConn/Write with one-write-in-flight
  serialization (mirrors the PR #36 send-queue fix).
- EPOLLIN edge-triggered, EPOLLOUT level-triggered to avoid
  missed wakeups under write contention.
- TOCTOU-safe fd-collision check under driverMu.
- shutdownDrivers fires onClose on engine teardown.

Closes #114
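The single-atomic-load gate can be sketched like this. The field names driverConns and hasDriverConns are from the commit message; everything else (types, locking granularity) is illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type driverConn struct{ fd int }

// loop sketches the per-loop driver-conn table, a parallel map gated by
// an atomic flag so the HTTP hot path stays cheap.
type loop struct {
	hasDriverConns atomic.Bool
	driverMu       sync.Mutex
	driverConns    map[int]*driverConn
}

func (l *loop) registerDriverConn(fd int) *driverConn {
	l.driverMu.Lock()
	defer l.driverMu.Unlock()
	if l.driverConns == nil {
		l.driverConns = make(map[int]*driverConn)
	}
	dc := &driverConn{fd: fd}
	l.driverConns[fd] = dc
	l.hasDriverConns.Store(true)
	return dc
}

// onReadable is the hot path: with no drivers registered, it costs a
// single atomic load before falling through to HTTP handling.
func (l *loop) onReadable(fd int) string {
	if l.hasDriverConns.Load() {
		l.driverMu.Lock()
		dc, ok := l.driverConns[fd]
		l.driverMu.Unlock()
		if ok {
			return fmt.Sprintf("driver conn %d", dc.fd)
		}
	}
	return "http conn"
}

func main() {
	var l loop
	fmt.Println(l.onReadable(3)) // no drivers yet: one atomic load, HTTP path
	l.registerDriverConn(9)
	fmt.Println(l.onReadable(9))
}
```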
Implements EventLoopProvider on the io_uring engine via new CQE
user-data tags (udDriverRecv/Send/Close at 0x10-0x12, non-
overlapping with existing HTTP tags).

- Driver actions (Register/Unregister/Write) post to a worker-
  owned action queue and wake via the shared h2EventFD; only the
  worker goroutine submits SQEs (preserves SQ single-issuer).
- One SEND in-flight per FD (mirrors PR #36 invariant) using a
  dedicated driverConn.sending flag + writeBuf/sendBuf swap.
- Single-shot RECV per driverConn (no provided-buffer ring);
  avoids conflict with HTTP's multishot path.
- Inflight-op counter guards UnregisterConn's ASYNC_CANCEL CQE
  ordering against in-flight RECV/SEND completions — prevents
  use-after-free on dc.buf.
- shutdownDrivers fires onClose on engine teardown.

Closes #115
Adaptive engine implements EventLoopProvider by delegating to the
active sub-engine. WorkerLoop panics if FreezeSwitching is not
held (driver FDs cannot migrate between epoll/io_uring tables).

Exposes ErrSwitchingNotFrozen for drivers that attempt to register
without first freezing the engine switch.
Minimal event loop used by drivers when no celeris Server is
registered. Linux uses a stripped-down epoll worker (same
primitives as engine/epoll but no accept/HTTP parsing);
non-Linux falls back to goroutine-per-conn via net.FileConn.

- WriteAndPoll sync fast path: caller goroutine does direct
  write(2) + 3-phase read (spin → poll(0) → poll(1ms blocking))
  to avoid goroutine-hop latency for localhost DB/cache round
  trips. recvMu serializes with the event loop's onRecv callback.
- WriteAndPollMulti for pipelined protocols (e.g. Redis Pipeline)
  — single write, poll-drain until isDone.
- EPOLLOUT level-triggered re-arms on EAGAIN (slow consumer
  backpressure).
- registry.go exposes Resolve(ServerProvider) with refcounted
  package-level standalone Loop; returns the Server's provider
  if registered, else the standalone fallback.

The io_uring standalone path is present but not selected by
default: its SINGLE_ISSUER constraint conflicts with the sync
fast path's caller-goroutine reads. Kept as dead code for a
future follow-up (#232 area).

Closes #116
Server.EventLoopProvider() returns the active engine's provider
(epoll, io_uring, or adaptive), or nil for engines that don't
implement the interface (std). Drivers use this to route DB
I/O through the HTTP server's worker event loops for per-CPU
affinity.
H1 clients can now upgrade an HTTP/1.1 connection to HTTP/2 over
cleartext via Connection: Upgrade, HTTP2-Settings + Upgrade: h2c.

- protocol/h1: parser detects the three-token Upgrade handshake.
  Rejects ambiguous Upgrade values (e.g. "websocket, h2c") to
  disambiguate from the WebSocket path.
- resource.Config.EnableH2Upgrade *bool with protocol-dependent
  defaults (Auto→true, H2C/HTTP1→false) propagated into H1State.
- internal/conn/upgrade.go: UpgradeInfo + ErrUpgradeH2C sentinel
  + DecodeHTTP2Settings (RawURL + URL fallback).
- ProcessH1 writes 101 Switching Protocols and returns
  ErrUpgradeH2C without invoking the handler (handler runs later
  on H2 stream 1).
- protocol/h2/stream: Manager.ApplySetting exposed; Processor
  gains InjectStreamHeaders that opens stream 1 from the H1
  headers without an HPACK round trip.
- NewH2StateFromUpgrade constructs the H2State post-101: applies
  client SETTINGS from HTTP2-Settings, emits server preface,
  injects stream 1 with H1→H2 pseudo-headers, dispatches handler.

Includes the rewritten header-copy path that forces strings
out of the H1 recv buffer (prevents use-after-free when the
driver layer reuses the buffer).

Closes #117 #118 #119 #120
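The handshake detection reduces to a header check along these lines. The helper name and exact token handling are illustrative, not the parser's real code; the three-part handshake and the rejection of ambiguous Upgrade values are from the commit:

```go
package main

import (
	"fmt"
	"strings"
)

// wantsH2C reports whether a parsed H1 request carries the h2c handshake:
// a Connection header listing both Upgrade and HTTP2-Settings, an Upgrade
// header naming exactly h2c, and a non-empty HTTP2-Settings value.
// Multi-token Upgrade values like "websocket, h2c" are rejected, which
// disambiguates from the WebSocket path.
func wantsH2C(connection, upgrade, http2Settings string) bool {
	if !strings.EqualFold(strings.TrimSpace(upgrade), "h2c") {
		return false // ambiguous or non-h2c Upgrade value
	}
	if http2Settings == "" {
		return false
	}
	var hasUpgrade, hasSettings bool
	for _, tok := range strings.Split(connection, ",") {
		switch strings.ToLower(strings.TrimSpace(tok)) {
		case "upgrade":
			hasUpgrade = true
		case "http2-settings":
			hasSettings = true
		}
	}
	return hasUpgrade && hasSettings
}

func main() {
	fmt.Println(wantsH2C("Upgrade, HTTP2-Settings", "h2c", "AAMAAABkAARAAAAA"))
	fmt.Println(wantsH2C("Upgrade", "websocket, h2c", "AAMAAABkAARAAAAA"))
}
```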
Both epoll and io_uring engines detect ErrUpgradeH2C from
ProcessH1 and switch the connection from H1 to H2 state:
- Release H1State, construct H2State via NewH2StateFromUpgrade.
- Feed UpgradeInfo.Remaining through ProcessH2 synchronously
  (the H2 client preface may arrive in the same TCP segment as
  the H1 Upgrade request).
- Flush writes explicitly after switchToH2 so the 101 response
  + server preface + stream-1 reply reach the client promptly.

test/spec/h2c_upgrade_test.go adds integration coverage across
iouring + epoll engines: happy path, POST with body, subsequent
streams 3/5/7, config variations, invalid settings, missing
Connection token, preface-in-same-segment, preface-split-across-
reads, 1 MB body.

Closes #121 #122
Internal primitives shared by the PostgreSQL and Redis drivers:

- Bridge: lock-guarded FIFO ring buffer of pending requests,
  power-of-two capacity, O(1) enqueue/pop. Both PG and Redis
  wire protocols guarantee in-order responses on one connection,
  so a single ring matches.
- Pool[C]: generic worker-affinity connection pool. Per-worker
  idle lists (lock-free fast path), shared overflow pool, and
  a semaphore-based wait queue (matching database/sql.DB
  SetMaxOpenConns semantics). Acquire blocks with ctx deadline
  instead of the old immediate ErrPoolExhausted.
- Backoff: exponential with jitter (shared by PG reconnect and
  Redis PubSub reconnect).
- Health sweep: ticker-driven eviction of expired / idle-too-
  long connections.
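A minimal sketch of the power-of-two FIFO. The real Bridge is lock-guarded and stores pending requests; this toy stores ints and shows only the mask-instead-of-modulo indexing that makes enqueue/pop O(1):

```go
package main

import "fmt"

// ring is a FIFO with power-of-two capacity: head and tail increase
// monotonically and are masked on access, so wraparound needs no modulo.
type ring struct {
	buf        []int
	head, tail uint32
}

func newRing(capPow2 uint32) *ring {
	if capPow2 == 0 || capPow2&(capPow2-1) != 0 {
		panic("capacity must be a power of two")
	}
	return &ring{buf: make([]int, capPow2)}
}

func (r *ring) enqueue(v int) bool {
	if r.tail-r.head == uint32(len(r.buf)) {
		return false // full
	}
	r.buf[r.tail&uint32(len(r.buf)-1)] = v
	r.tail++
	return true
}

func (r *ring) pop() (int, bool) {
	if r.head == r.tail {
		return 0, false // empty
	}
	v := r.buf[r.head&uint32(len(r.buf)-1)]
	r.head++
	return v, true
}

func main() {
	r := newRing(4)
	r.enqueue(1)
	r.enqueue(2)
	v, _ := r.pop()
	fmt.Println(v)
}
```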
PostgreSQL v3 frontend/backend protocol:

- message.go: zero-alloc Reader/Writer for the 1-byte type +
  4-byte length frame. StartupMessage/CancelRequest/SSLRequest
  variants (no type byte).
- startup.go + scram.go: connection handshake with Trust,
  Cleartext, MD5, and SCRAM-SHA-256 (PBKDF2 via stdlib
  crypto/pbkdf2, RFC 7677 test vectors pass). GSS/SSPI/Kerberos
  explicitly rejected.
- query.go: Simple Query 'Q' flow. PGError parsing (severity,
  SQLSTATE, message, detail, hint, position). Defers tag string
  materialization — RowsAffected is zero-alloc.
- extended.go: Parse/Bind/Describe/Execute/Sync/Close. Append-
  style message builders (AppendParse, AppendBind, ...) write
  into the Writer buffer with no per-message snapshot. Supports
  SkipParse for reusing named prepared statements.
- copy.go: CopyInState/CopyOutState + binary header/trailer
  + text-format row encoder (with escape handling).
- types.go + types_time.go + types_numeric.go + types_array.go:
  OID codec registry. Built-ins cover bool, int2/4/8, float4/8,
  text/varchar, bytea, uuid, jsonb, date, timestamp(tz), numeric,
  and common array types. Infinity sentinels for date/timestamp
  (binary and text). Floor correction for pre-epoch dates.
  Zero-alloc decode into pgRows' per-request slab.

Fuzz tests for Reader + types; seed corpus under testdata/fuzz.

Closes #123 #124 #125 #126 #127 #128
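The 1-byte-type + 4-byte-length frame is small enough to sketch in full. Helper names are illustrative; the convention that the length field counts itself but not the type byte is the PG v3 protocol's:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendMsg frames one message: 1-byte type, then a big-endian int32
// length covering itself plus the payload.
func appendMsg(dst []byte, typ byte, payload []byte) []byte {
	dst = append(dst, typ)
	dst = binary.BigEndian.AppendUint32(dst, uint32(4+len(payload)))
	return append(dst, payload...)
}

// readMsg returns the next complete message and the remaining bytes, or
// ok=false when the buffer holds only a partial frame.
func readMsg(buf []byte) (typ byte, payload, rest []byte, ok bool) {
	if len(buf) < 5 {
		return 0, nil, buf, false
	}
	n := binary.BigEndian.Uint32(buf[1:5]) // includes its own 4 bytes
	if uint32(len(buf)-1) < n {
		return 0, nil, buf, false // wait for more data
	}
	return buf[0], buf[5 : 1+n], buf[1+n:], true
}

func main() {
	// 'Q' is the Simple Query frontend message; body is a NUL-terminated SQL string.
	buf := appendMsg(nil, 'Q', []byte("SELECT 1\x00"))
	typ, payload, _, ok := readMsg(buf)
	fmt.Println(string(typ), string(payload), ok)
}
```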
…rows

Full PostgreSQL driver on top of the v3 protocol layer.

- driver.go + connector.go + dsn.go: sql.Driver registered as
  "celeris-postgres". DSN supports URL + key=value. sslmode=
  require returns ErrSSLNotSupported with an actionable message
  pointing at the v1.5.0 TLS spike (#232).
- conn.go: pgConn implements driver.Conn + extended interfaces
  (ConnBeginTx, ConnPrepareContext, QueryerContext, ExecerContext,
  Pinger, SessionResetter, Validator, async.Conn). Sync→async
  bridge: handler goroutine encodes + writes + blocks on doneCh;
  event loop parses response and signals completion. WriteAndPoll
  sync fast path eliminates context switches for localhost queries.
  Re-prepare-on-miss (SQLSTATE 26000) after DISCARD ALL.
- stmt.go + rows.go + result.go + tx.go: database/sql facades.
- cancel.go: PG CancelRequest via a separate short-lived TCP
  conn with bounded 5s timeout (independent of caller ctx).
- lru.go: per-conn prepared statement cache.
- pool.go: direct Pool API (postgres.Open + WithEngine) bypassing
  database/sql. Rows.Next()+Scan(...any) matches sql.Rows.
  QueryRow + Row.Scan with sql.ErrNoRows. Tx with savepoints.
  CopyFrom/CopyTo with text-format row encoder. Lazy streaming
  rows: buffer up to 64 rows, promote to bounded channel (cap 64)
  for larger result sets — no OOM on million-row queries, no
  channel-alloc cost on single-row queries.
- Public types.go exposes *Conn (alias for *pgConn) so sql.Conn.
  Raw users can reach Savepoint/ReleaseSavepoint/RollbackTo.

convertAssign supports sql.Scanner (sql.NullString/NullInt64/
pgtype.Inet and custom types), typed primitives, and NULL.
Pool.Result implements driver.Result (LastInsertId returns an
error pointing at RETURNING). Rows.Err() tracks iteration errors.

sessionDirty flag elides DISCARD ALL on ResetSession when the
conn only ran simple queries — avoids one round-trip per
pool return on the hot path.

Closes #129 #130
Zero-allocation RESP parser supporting both RESP2 and the RESP3
types (null _, bool #, double ,, bigint (, blob-error !,
verbatim =, set ~, map %, attribute |, push >).

- Value struct with typed fields (Str aliases Reader.buf for
  bulk/simple/verbatim/blob-err). Array/Map use a sync.Pool-
  backed slice pool.
- Reader.Next returns ErrIncomplete on partial frames without
  advancing the cursor — safe to Feed more and retry.
- MaxBulkLen (512 MiB) + MaxArrayLen (128 M) guards prevent DoS
  from malicious servers advertising huge lengths.
- parseUint overflow check; parseInt accepts math.MinInt64.
- Writer.AppendCommand1..5 arity-specific builders avoid the
  variadic slice allocation that dominated pipeline alloc
  profiles (was 32% of Pipeline1K allocs).
- FuzzReader + FuzzRoundTrip seeded with hand-crafted RESP2/3
  frames.

Closes #132
Redis client API on top of the RESP parser. Typed commands for
all common operations — strings (Get/Set/Incr/Decr/Append/...),
hashes (HGet/HSet/HIncrBy/...), lists (LPush/LPop/LRange/LRem/...),
sets (SAdd/SMembers/SInter/SUnion/SDiff/...), sorted sets
(ZAdd/ZRange/ZRank/ZScore/ZIncrBy/...), keys (Expire/TTL/Type/...),
scripting (Eval/EvalSHA/ScriptLoad), scan iterator (SCAN).

- RedisState event-loop state machine + ProcessRedis(data).
  FIFO request/response matching via async.Bridge.
- HELLO 3 negotiation with RESP2 fallback (AUTH + SELECT).
  WithForceRESP2() escape hatch for ElastiCache-classic-shaped
  servers that advertise 6.x but reject HELLO. Releases the
  HELLO request on fallback (fixes pool leak on Redis <6.0).
- WriteAndPoll sync fast path: single-command round trips
  bypass the event loop for localhost latency.
- Client.Do/DoString/DoInt/DoBool/DoSlice escape hatch for
  commands outside the typed surface.
- OnPush callback for RESP3 client-tracking push frames
  received on command connections (otherwise silently dropped).
- resetSession elides DISCARD when !dirty (avoids a round trip
  on the hot path).
- Context cancellation closes the connection — the pending
  response arriving on a poisoned conn is drained via
  drainWithError, preventing desync with the next command
  issued on a fresh conn.
- Expire(ttl=0) calls Persist (was: silent clamp to 1s).
- WithHealthCheckInterval wires async.Pool health sweeps.
- Nil-safe Client.Close.

Closes #133
- Pipeline: single write for all buffered commands, FIFO
  response matching via async.Bridge. Typed deferred cmd
  handles (StringCmd/IntCmd/StatusCmd/FloatCmd/BoolCmd) that
  resolve via (Pipeline, idx) — pipeline-owned so the struct
  survives slice growth. Release() returns the Pipeline to
  sync.Pool; typed cmd handles become invalid (return
  ErrClosed — orphan guard).
- Sync pipeline fast path: direct read/parse on the caller
  goroutine, zero per-response allocation when the result
  set fits in one TCP chunk. Slab-based copy-detach for the
  string payloads. Dropped Pipeline1K from 1320 → 3 allocs.
- Tx (TxPipeline): MULTI … EXEC variant on a pinned conn
  with Watch/Unwatch support, ErrTxAborted on null EXEC.
- Backpressure-aware context cancellation populates per-cmd
  errors before closing the conn (was: zero Values + no
  indication of cancellation).

maxSlabRetain shrinks oversized slabs on Release to bound
memory from one-off large pipelines.

Closes #134
- PubSub pins a dedicated connection from the pubsubPool
  (command connections cannot be reused — push-mode).
- Subscribe/PSubscribe/Unsubscribe/PUnsubscribe with a mu-
  serialized subscription set as source of truth.
- Auto-reconnect on conn drop via onConnDrop hook → runs a
  reconnectLoop goroutine with async.Backoff (50ms → 5s,
  jittered), replays subscription set with a single pipelined
  SUBSCRIBE + PSUBSCRIBE. Messages during outage are lost
  (at-most-once, documented).
- deliver() + closeMsgCh() serialize on ps.mu (no send-on-
  closed-channel panic).
- nil-safe conn reference during reconnect — subscribe
  failures set ps.conn = nil before closing.

Closes #135
Wraps async.Pool[*redisConn] with Redis-specific dial and
health. Separate cmd and pubsub pools (push-mode conns cannot
be reused for commands).

- WithEngine(ServerProvider) resolves to the HTTP server's
  EventLoopProvider (integrated mode) or falls back to the
  standalone mini event loop.
- HealthCheckInterval default 30s (tunable); MaxOpen/
  MaxIdlePerWorker/MaxLifetime/MaxIdleTime follow
  database/sql-style semantics.
- bounded 5-retry acquire on stale-conn hits (previously
  unbounded recursion).
- Pool error messages include MaxOpen context on exhaustion.

Closes #136
ClusterClient:
- CRC16 (XMODEM) slot computation with {tag} hash tag support.
- [16384]*clusterNode O(1) slot → node routing; background
  CLUSTER SLOTS refresh every 60s + on-demand after MOVED.
- MOVED/ASK redirect handling (max 3 retries). ASK sends
  ASKING on a pinned conn (via pinnedConnKey context) so the
  next command lands on the same connection — fixes a subtle
  bug where a pooled ASKING + pooled command could hit
  different conns.
- Multi-key commands (DEL/EXISTS) fan out per-node sub-calls
  in parallel.
- ClusterPipeline groups commands by slot, executes per-node
  sub-pipelines in parallel, retries MOVED/ASK affected
  commands on refresh.
- ClusterTx with same-slot validation (ErrCrossSlot on
  mismatch; hash tags colocate keys). ClusterClient.Watch
  with the same guard.
- ReadOnly mode: reads routed to replicas (round-robin) with
  READONLY handshake; falls back to primary on replica failure.
- RouteByLatency: per-node RTT measured each refresh; picks
  lowest-latency node for reads.
- Shard channels (Redis 7+): SSubscribe/SPublish +
  smessage/ssubscribe/sunsubscribe recognition in the state
  machine + shard-aware reconnect replay.

SentinelClient:
- Master discovery via SENTINEL get-master-addr-by-name
  + ROLE verification.
- Auto-failover: subscription to +switch-master on a
  sentinel conn; atomic primary swap under RWMutex.
- dialMaster retries 3× with backoff on failover; marks
  client unhealthy (ErrSentinelUnhealthy) if all retries fail
  instead of silently reusing the stale master.

Closes #233 #234 #235
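Slot routing is small enough to show whole; this is standard Redis Cluster key hashing (CRC-16/XMODEM masked down to 16384 slots, hashing only a non-empty {tag} when one is present), so keys sharing a tag colocate and stay eligible for ClusterTx:

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 is CRC-16/XMODEM (poly 0x1021, init 0x0000, no reflection), the
// checksum Redis Cluster uses for key → slot mapping.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// slotForKey hashes only the first non-empty {tag} substring when one is
// present; an empty tag ({}) falls back to hashing the whole key.
func slotForKey(key string) uint16 {
	if i := strings.IndexByte(key, '{'); i >= 0 {
		if j := strings.IndexByte(key[i+1:], '}'); j > 0 {
			key = key[i+1 : i+1+j]
		}
	}
	return crc16([]byte(key)) & 16383
}

func main() {
	fmt.Println(slotForKey("{user1000}.following") == slotForKey("{user1000}.followers"))
}
```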
Example* functions for Client.NewClient + Get/Set/Pipeline/
Subscribe + Do escape hatch + TxPipeline. Examples follow the
godoc convention and render under the package "Examples" tab.
Build-tagged //go:build postgres, env-gated by CELERIS_PG_DSN.
Docker-compose spins postgres:16. Covers auth (MD5, SCRAM-SHA-256,
Trust), simple + extended query, type round-trips (all built-in
OIDs + arrays), transactions + isolation levels, savepoints,
COPY IN/OUT, error handling, cancel, pool affinity, concurrency
(1000 goroutines × 100 queries), and large result sets.

Closes #131 (conformance portion)
Build-tagged //go:build redis, env-gated by CELERIS_REDIS_ADDR
(+ optional CELERIS_REDIS_PASSWORD for AUTH). Docker-compose
spins redis:7.2. Covers all data structures (strings, hashes,
lists, sets, sorted sets, keys), pipelines (incl. mid-stream
failure), pub/sub (patterns + unsubscribe + reconnect via
CLIENT KILL TYPE pubsub), transactions + Watch, pool affinity
+ overflow + idle cleanup, RESP2 vs RESP3 HELLO negotiation,
AUTH variants.

Closes #137 (conformance portion)
Modeled after h2spec / Autobahn|Testsuite: external spec
verifier that speaks PG wire protocol directly (raw TCP +
driver/postgres/protocol) without going through database/sql.

51 tests organized by spec section:
- Startup: version negotiation, SSLRequest, CancelRequest,
  malformed startup handling.
- Auth: SCRAM-SHA-256 full handshake + bad-password failure.
- Simple Query: SELECT/INSERT/Error/MultiStatement/NULL/
  LargeResult (100K rows)/TransactionStatus byte.
- Extended Query: Parse/Bind/Describe/Execute/Sync/Close,
  error-during-Bind recovery, portal suspension.
- COPY: text-in, binary-in, out, fail, wrong-format, large-out
  (10K rows).
- Error handling: all PGError fields, NoticeResponse, RFQ
  recovery after error.
- Wire framing: zero-length payload, split reads, back-to-back
  messages.
- Type round-trips: all built-ins + NULL + arrays + 1 MB
  values + infinity sentinels for date and timestamp.
- Lifecycle: Terminate, idle timeout, Cancel.

Invoked via `mage pgSpec` (gated by CELERIS_PG_DSN).
Modeled after pgspec / h2spec: external spec verifier that
speaks RESP directly (raw TCP + driver/redis/protocol).

62 tests + 4 fuzz targets organized by spec section:
- RESP2 types: all 5 base types + null/empty edge cases.
- RESP3 types: all 11 new types (null, bool, double, bigint,
  blob error, verbatim, set, map, attribute, push).
- Command protocol: inline, multi-bulk, pipelines (incl. mid-
  stream failure), max args, large bulks, unknown command.
- AUTH + SELECT variants.
- Pub/Sub: subscribe/message/pattern/unsubscribe/multi-channel/
  PING-in-pubsub/non-sub-command-during-sub/RESP3 push format.
- Transactions: MULTI/EXEC/DISCARD/empty EXEC/queued-error/
  EXECABORT/Watch/Unwatch.
- Wire edge cases: split reads, 10K PING back-to-back, binary
  keys with NUL/CRLF, max bulk size, concurrent conns,
  CLIENT SETNAME round-trip, attribute-prefixed reply,
  integer overflow, push-on-cmd-conn.
- Fuzz: FuzzRESPParse, FuzzRESPRoundTrip, FuzzRESP3Types,
  FuzzBulkBoundary.

Invoked via `mage redisSpec` (gated by CELERIS_REDIS_ADDR).
Darwin-runnable test that proves the headline v1.4.0 architecture:

- Starts a celeris Server on the std engine.
- Spins up in-process fake PG and Redis servers.
- Registers handlers at /db and /cache that open a driver Pool
  with WithEngine(server) and run a query on each request.
- Issues real HTTP requests and verifies responses.

Std engine doesn't implement EventLoopProvider, so drivers fall
back to the standalone loop (documented path). On Linux with
epoll/iouring the WithEngine call picks up the engine's native
provider — same code path, real per-worker affinity.
Separate go.mod submodules (replace celeris ../../..) so competitor
libs don't pollute the main module's dependencies.

test/drivercmp/postgres/ — mirrors benchmarks for celeris vs pgx
vs lib/pq across: SelectOne (sql.DB + direct Pool), Select1000
rows (text + binary), InsertPrepared, Transaction, PoolContention,
ParallelQuery, CopyIn_1M_Rows, and integrated net/http handler
latency (celeris Server + pgxpool + net/http).

test/drivercmp/redis/ — celeris vs go-redis: Get, Set, MGet10,
Pipeline10/100/1000/10000, Parallel GET, PubSub1to1 latency.

Each benchmark reports ns/op + B/op + allocs/op; the goal is
pgx/go-redis-parity on single-command and significant wins on
parallel and pipeline paths. Results captured in the PR body.
mage_driver.go exposes:
- TestIntegration — go test ./test/integration/…
- H2CCompliance — H2C upgrade integration tests
- TestDriver {postgres|redis} — conformance suite (env-gated)
- PGSpec / RedisSpec — protocol compliance suites
- BaselineBench {eventloop|h2c|postgres|redis} — snapshot bench
  to results/<ts>-baseline-<subsys>/ with env.json + bench.txt
- DriverProfile {driver} {bench} — runs CPU/heap/mutex/block
  pprof capture + top.txt
- DriverBench {postgres|redis} — full comparator suite
- PreBench — correctness gate (lint+test+spec+integration+
  h2c+testDriver) before any profile/bench work

.github/workflows/ci.yml gains a driver-conformance job with
postgres:16 and redis:7.2 service containers running the
conformance suites with -tags postgres/redis gates.
FumingPower3925 added this to the v1.4.0 milestone Apr 16, 2026
…uccessful reply

The per-slot sub-pipeline used the direct-extraction fast path where successful
results are stored in pc.str + pc.scalar with pc.direct=true and pc.val left
nil. The harvester only read from pc.val, so every successful cluster
pipeline GET returned a zero protocol.Value{}.

- Preserve the original command kind when forwarding to the sub-pipeline
  instead of forcing every command to kindString.
- Harvest the direct path into a synthesized protocol.Value via new
  directToValue helper. String bytes are copied out of pc.str because it
  aliases p.strSlab which the deferred p.Release() reclaims.
- Handle all kinds (string, status, int, float, bool) so INCR/SET/etc.
  return correct typed values from cluster pipelines too.

Adds TestClusterPipelineReturnsValues which fails without the fix
(results[0].Str is empty) and passes after.
…ne conn

The per-command ASK recovery path in ClusterPipeline.execRound did two
separate n.client.Do() calls — one for ASKING and one for the original
command. Each n.client.Do acquires its own conn from the pool, so the
ASKING and the command could land on different TCP connections. Redis
requires both to arrive on the SAME conn; otherwise the migrating slot's
owner responds with MOVED and the pipeline fails.

Fix mirrors the existing single-command doASK pattern (cluster.go:534):
acquire a conn, run ASKING on it, propagate the conn into the follow-up
Do via pinnedConnKey, then release.

Adds TestClusterPipelineASKPinsConn — it models real Redis's strict
ASKING semantics in the fake server (per-conn ASKING flag; GET returns
MOVED if the flag isn't set on that conn). A race goroutine triggered
immediately after ASKING replies hammers the pool with concurrent PINGs,
reliably stealing the released conn before the buggy code can
re-acquire it. The test fails without the fix (MOVED error) and passes
with it.
…TURNING

simpleExec and doExtendedExec do not allocate req.colsCh — Exec paths
only need the CommandComplete tag and never stream. However, the shared
dispatch handlers (reqSimple, reqExtended) called promoteToStreaming
unconditionally once len(head.rows) crossed streamThreshold (64), and
promoteToStreaming ends with close(req.colsCh). Closing a nil channel
panicked on the event-loop worker goroutine.

Real trigger: Exec("INSERT INTO t SELECT ... RETURNING id") where the
SELECT produces >= 64 rows. PG streams DataRows, the Exec caller never
reads them, and the panic kills the reader loop.

Fix:
- Dispatch sites now guard on head.colsCh != nil before promoting — Exec
  paths (colsCh == nil) skip the promotion entirely. The buffered rows
  are simply dropped when the request finishes (the caller only reads
  the tag via simple.TagBytes / extended.Tag).
- Add a defensive nil-guard inside promoteToStreaming as defense in
  depth, so a future caller can't hit the same trap.

Adds TestPgExecReturningLargeResult: fake PG server replies with 128
DataRows + CommandComplete for a simpleExec. Without the fix the test
panics ("close of nil channel") in promoteToStreaming. With the fix the
test cleanly returns RowsAffected=128.
Four error branches in the driver's dialConn implementations called
syscall.Close(fd) while the *os.File wrapper returned from tcp.File()
was still reachable with its runtime finalizer armed. When GC later
fires the finalizer, it calls syscall.Close(fd) a SECOND time —
potentially on an unrelated fd the kernel has already reassigned to
another open socket. Classic phantom-close bug.

Affected sites (replaced syscall.Close(fd) with file.Close(), which
closes the kernel fd AND disarms the finalizer — the same pattern
already used correctly on the SetNonblock error branch just above
each site):

  - driver/postgres/conn.go:568 — NumWorkers == 0 branch
  - driver/postgres/conn.go:610 — RegisterConn failure branch
  - driver/redis/conn.go:156    — NumWorkers == 0 branch
  - driver/redis/conn.go:186    — RegisterConn failure branch

Adds two regression tests (TestPgDialConnNoPhantomCloseOnError and
TestRedisDialConnNoPhantomCloseOnError) that force the NumWorkers==0
branch with a stub Provider and run repeated GC afterward. While
finalizer timing makes this impossible to assert deterministically, the
tests at least pin the correct error path and document the intent.
Primary safeguard remains the code inspection + comments at each
site — direct testing of a finalizer-driven double close would
require fd-recycle injection.
When the memcached client is opened without WithEngine(srv), skip the
mini-loop entirely. Each conn keeps the live *net.TCPConn and does
Write + Read directly on the caller's goroutine via Go's netpoll
(which parks the G on EPOLLIN transparently).

The mini-loop path is a net loss for standalone request/response
workloads:

  Profile of nethttp + celerismc (mini-loop path) at 74k rps:
    WriteAndPoll        36.4s cum  (33% CPU)
    ├─ flushLocked       16.6s      (15% — the actual write syscall)
    ├─ Phase B polls      6.1s       (5.4% — poll(0) x 16)
    ├─ EpollCtl MOD       2.7s       (2.4% — mask + unmask per op)
    ├─ Phase A read       1.5s       (1.3%)
    └─ recvMu/overhead    ~9s       (~8%)

The last ~11s (~10% of total CPU) is overhead the mini-loop adds on
top of what gomc pays. gomc just calls net.Conn.Write/Read; Go's
netpoll handles EPOLLIN transparently, no per-op syscalls beyond
write+read. Direct mode matches that shape exactly.

Matrix on MS-R1 (MC cells only), matrix 12 → matrix 13:

  nethttp + celerismc   74,689 → 87,556  (+17.2%, gomc: 86,282  — win)
  gin     + celerismc   73,977 → 88,707  (+19.9%, gomc: 87,589  — win)
  chi     + celerismc   71,352 → 87,420  (+22.5%, gomc: 83,664  — win)
  echo    + celerismc   73,960 → 88,447  (+19.6%, gomc: 86,602  — win)

celerismc now beats gomemcache on every foreign HTTP server tested.
p99 latency also improves — ~10ms → ~6ms per row.

Engine-integrated path (WithEngine supplied) unchanged: the mini-loop
is still used so DB conns colocate with the celeris HTTP engine's
LockOSThread'd worker. Two modes coexist behind the same mcConn
struct (useDirect discriminates); existing tests and the engine-
integrated bench cells are unaffected.

Implementation:
  - NewClient without WithEngine → newDirectPool instead of newPool
  - dialDirectMemcachedConn skips eventloop.Resolve, keeps *net.TCPConn
  - execText / execBinary / execBinaryMulti branch on useDirect
  - Close branches on useDirect
Two post-commit CI failures on 0c4b6d8:

1. Lint errcheck: `defer c.tcp.SetDeadline(time.Time{})` in
   execTextDirect / execBinaryDirect / execBinaryMultiDirect didn't
   check the returned error. Wrap in `defer func() { _ = ... }()`.

2. TestWriteAndPollSyncPath flaked again under -race on ubuntu CI.
   The previous fix (synchronous peer write before WriteAndPoll)
   wasn't enough — on loaded runners the write buffer hasn't always
   surfaced on our end of the socketpair by the time Phase A's single
   non-blocking read runs, so Phase A misses and the test times out
   in Phase C. Add a brief poll(50ms) as a deterministic hand-off
   signal (independent of timer resolution / scheduler jitter) before
   calling WriteAndPoll — asserts the buffer is actually readable,
   then runs the exact same WriteAndPoll test.
An earlier attempt to always route through direct net.TCPConn mode
(even under WithEngine) regressed the celeris-engine + celerismc cell
catastrophically: 65k → 34k rps (-48%). Root cause:

  Handler runs on celeris HTTP engine's LockOSThread'd worker G. When
  that handler calls net.TCPConn.Read (direct mode), Go's netpoll
  parks the G on EPOLLIN. Parking a locked G triggers
  stoplockedm + startlockedm — the same futex-storm pathology that
  WriteAndPollBusy was introduced to avoid in the first place.

Revert the blanket switch: mc uses direct mode in standalone only
(cfg.Engine == nil), falls back to mini-loop + WriteAndPollBusy when
WithEngine is supplied. The big +17–21% wins on the foreign-HTTP
cells (matrix 13) are preserved; the celeris-engine cells return to
their matrix-13 numbers (~64–69k).

The shared-event-loop promise of WithEngine(srv) is honored for mc
by colocating conns on the engine's worker via the mini-loop sync
path, which is futex-safe for locked callers.
…ration gap

Validated on MSR1 bare metal, celeris-epoll + celerismc:

  Matrix 16 baseline (inline handler + mini-loop):      64,147 rps
  CELERIS_ASYNC_HANDLERS=1 (async + mini-loop):         53,877 rps  (-16%)
  CELERIS_ASYNC_HANDLERS=1 + CELERIS_MC_FORCE_DIRECT=1: 105,267 rps (+64%)

Context:

The celeris HTTP engine's workers are runtime.LockOSThread'd (for
SINGLE_ISSUER on io_uring, CPU affinity on epoll). Handlers run inline
on those locked worker goroutines. When a handler blocks on DB I/O:

  - Inline + mini-loop: handler does unix.Poll on the locked M. P is
    detached during the syscall but no unlocked Gs exist to use it, so
    the P sits idle. Other FDs on this worker wait until handler
    returns. Measured throughput: 64k rps (NumWorkers × 1/RTT bound).
  - Async + mini-loop: handler runs on a spawned unlocked G. It blocks
    in unix.Poll which still ties up an M (Go can't park G on a bare
    syscall). Go spawns more Ms, context-switch overhead eats the
    parallelism benefit. Regression to 54k.
  - Async + direct (net.Conn.Read): handler on unlocked G reads via
    Go's netpoll. netpoll parks the G efficiently — no M is blocked,
    no new Ms spawned. The worker is free to service other FDs while
    the G waits for EPOLLIN. Throughput jumps to 105k — BEATING every
    other config in the matrix (foreign HTTP + celerismc: 88k; foreign
    HTTP + gomc: 87k).

This commit lands the two env-gated knobs that demonstrated the effect:

  - CELERIS_ASYNC_HANDLERS=1 on the epoll engine: dispatches HTTP1
    handlers to goroutines, serialized per-conn via detachMu. Worker
    returns to epoll_wait immediately after dispatching. Non-async
    path unchanged — zero overhead when the flag is off.
  - CELERIS_MC_FORCE_DIRECT=1 on the memcached driver: uses the direct
    net.Conn path even when WithEngine(srv) is supplied. Safe only on
    async-engine path (direct on a locked M would futex-storm via
    netpoll's G parking).

Both are experimental and NOT production-ready:
  - Error / close paths are best-effort
  - No support for HTTP/2 handlers (only HTTP1 dispatched)
  - H1State mutations race with concurrent dispatches (single-
    connection serial clients are OK; pipelined requests are not)
  - CELERIS_MC_FORCE_DIRECT without CELERIS_ASYNC_HANDLERS regresses
    celeris-engine cells (direct on locked M = futex storm)

A proper implementation in v1.4.x:
  - Config.AsyncHandlers as a first-class Server option
  - Per-conn input buffer for pipelined requests
  - Driver-side signaling so direct mode activates automatically when
    the caller is on an async-dispatched G (pprof label or context key)
  - Extension to io_uring engine (requires SQE hand-off from handler
    goroutine back to the SINGLE_ISSUER worker)
…ch + netpoll I/O

Config.AsyncHandlers is now a first-class Server option (default: false).
When set AND the engine is epoll (or std, which is always async natively),
the engine dispatches HTTP1 handlers to spawned goroutines instead of
running them inline on the LockOSThread'd worker. Drivers opened with
WithEngine(srv) auto-detect this via eventloop.IsAsyncServer() and
switch their I/O path to match the caller's Go-runtime shape:

  Caller shape       | Driver I/O              | Why
  -------------------|-------------------------|--------------------------
  Inline (locked M)  | mini-loop sync/busy     | net.Conn.Read on locked M
                     |                         | futex-storms via netpoll
  Async (unlocked G) | direct net.TCPConn      | Go netpoll parks the G
                     |                         | cleanly, no M blocked
  Standalone (no     | mc: direct, redis/pg:   | mc direct is faster
   engine)           | mini-loop               | standalone; redis's tiny
                     |                         | responses favor mini-loop
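The table reduces to a small decision function. A sketch under assumed names (chooseIOPath and the driver string are illustrative helpers, not the drivers' actual code):

```go
package main

import "fmt"

type ioPath string

const (
	miniLoop ioPath = "mini-loop"
	direct   ioPath = "direct"
)

// chooseIOPath sketches the table above: hasEngine means WithEngine(srv)
// was supplied; asyncServer is what eventloop.IsAsyncServer reports.
func chooseIOPath(driver string, hasEngine, asyncServer bool) ioPath {
	switch {
	case hasEngine && asyncServer:
		return direct // unlocked G: Go netpoll parks it cleanly
	case hasEngine:
		return miniLoop // locked M: net.Conn.Read would futex-storm
	case driver == "memcached":
		return direct // standalone mc measures faster direct
	default:
		return miniLoop // redis/pg tiny responses favor the sync spin
	}
}

func main() {
	fmt.Println(chooseIOPath("redis", true, false))      // mini-loop
	fmt.Println(chooseIOPath("redis", true, true))       // direct
	fmt.Println(chooseIOPath("memcached", false, false)) // direct
}
```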

Implementation:

  * celeris.Config: new AsyncHandlers bool (doc'd with trade-offs).
    Propagates through resource.Config into engine bootstrapping.
  * celeris.Server.AsyncHandlers(): honors the flag only when the engine
    actually implements async dispatch (currently Epoll + Std; iouring and
    adaptive return false so drivers don't hand themselves the direct
    path and futex-storm on a locked worker).
  * engine/epoll: Loop.async bool, set from Config.AsyncHandlers (OR'd
    with CELERIS_ASYNC_HANDLERS env var for diagnostic overrides). In
    drainRead, when async && HTTP1, copy the read bytes and spawn a
    handler goroutine that holds cs.detachMu around ProcessH1 + inline
    flush. Worker returns to epoll_wait immediately. Zero overhead on
    the non-async path.
  * driver/internal/eventloop: new AsyncHandlerProvider interface; new
    IsAsyncServer helper that drivers call to detect the dispatch mode.
  * driver/memcached: client auto-selects direct mode when Engine==nil
    OR IsAsyncServer(Engine) is true.
  * driver/redis: client auto-selects direct mode only when
    IsAsyncServer(Engine) is true; standalone and sync-engine paths use
    mini-loop (redis's tiny GET responses measurably favor mini-loop's
    sync spin over net.Conn.Read + netpoll wake). Cmd pool direct;
    pubsub always uses mini-loop because unsolicited push frames need
    event-driven delivery.
  * driver/postgres: pool tracks asyncEngine; useBusySync is disabled on
    async-dispatched engines so the handler G can yield via
    runtime.Gosched between Phase B polls (cheap on an unlocked G;
    futex-storm on a locked M — which is why busy-path exists).

MSR1 bare-metal validation (celeris-epoll with ASYNC=1, full matrix
re-run inflight at commit time; partial early rows):

  celeris-epoll + celerisredis  ~86k  (matrix 16:  82.6k, +4%)
  celeris-epoll + goredis       ~64k  (matrix 16:  24.4k, +164%)
  celeris-epoll + celerispg     ~52k  (matrix 16:  43.7k, +19%)
  celeris-epoll + pgx           ~46k  (matrix 16:  39.4k, +19%)
  celeris-epoll + celerismc    ~105k  (matrix 16:  64.1k, +64%)
  celeris-epoll + gomc          ~89k  (matrix 16:  46.2k, +95%)

celeris-epoll + celerismc at 105k is the single fastest cell in the
entire 36-config matrix — beating nethttp + celerismc (89k) by 19%
and nethttp + gomc (88k) by 20%. celeris-epoll + goredis at 64k is
a 2.6× jump — async+driver-in-netpoll rescues non-celeris drivers too.

iouring and adaptive paths: behavior unchanged (Server.AsyncHandlers()
reports false on those engines even when config is set), matching
matrix 16 numbers exactly so no regressions.

Not yet covered in this commit:
  * iouring async dispatch — requires SQE hand-off from handler back to
    the SINGLE_ISSUER worker. Tracked as v1.4.x follow-up.
  * PG direct mode — PG startup is a multi-round protocol (SCRAM-SHA-
    256 challenge, etc.) that needs driver-specific plumbing for
    direct mode. PG still uses mini-loop under async but with the
    yielding sync path instead of busy-poll, closing the worst of the
    pre-fix regression (-14% → +19%).
Goroutine-per-conn dispatch: each HTTP1 conn buffers incoming bytes
under asyncInMu and spawns a single dispatch goroutine that drains
the buffer, running ProcessH1 under detachMu. ProcessH1's built-in
offset loop handles pipelined requests in order; responses land on
writeBuf in request order before the flush.

Previously (epoll) spawned a goroutine per read-batch, which let
pipelined bursts race on detachMu. Now the per-conn invariant is
enforced by asyncRun — only one goroutine alive per conn at a time,
and the next batch only spawns after the previous cleared asyncRun
under the same mutex that guards the input buffer.
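The per-conn invariant has roughly this shape (illustrative names mirroring asyncInMu/asyncRun; not the engine's actual fields):

```go
package main

import (
	"fmt"
	"sync"
)

// connState sketches the invariant described above: at most one dispatch
// goroutine is alive per conn, and the flag that admits the next spawn is
// cleared under the same mutex that guards the input buffer.
type connState struct {
	mu       sync.Mutex // plays the role of asyncInMu
	inBuf    []byte     // plays the role of asyncInBuf
	asyncRun bool       // true while a dispatch goroutine is alive
	out      []byte     // responses, appended in request order
	wg       sync.WaitGroup
}

// onRead is the worker side, per read batch: buffer the bytes, and spawn
// the dispatch goroutine only if none is currently alive for this conn.
func (cs *connState) onRead(data []byte, process func([]byte) []byte) {
	cs.mu.Lock()
	cs.inBuf = append(cs.inBuf, data...)
	if cs.asyncRun {
		cs.mu.Unlock()
		return // the live goroutine will drain the new bytes in order
	}
	cs.asyncRun = true
	cs.mu.Unlock()
	cs.wg.Add(1)
	go func() {
		defer cs.wg.Done()
		for {
			cs.mu.Lock()
			if len(cs.inBuf) == 0 {
				cs.asyncRun = false // next batch may spawn again
				cs.mu.Unlock()
				return
			}
			batch := cs.inBuf
			cs.inBuf = nil
			cs.mu.Unlock()
			cs.out = append(cs.out, process(batch)...) // ProcessH1 stand-in
		}
	}()
}

func main() {
	cs := &connState{}
	echo := func(b []byte) []byte { return b }
	cs.onRead([]byte("req1|"), echo)
	cs.onRead([]byte("req2|"), echo)
	cs.wg.Wait()
	fmt.Println(string(cs.out)) // pipelined order preserved
}
```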

io_uring: async dispatch now works under SINGLE_ISSUER by reusing
the existing detachMu + detachQueue + eventfd machinery. Handler
goroutine writes land in writeBuf; after ProcessH1, the FD is
queued on detachQueue, the worker picks it up via the eventfd
wakeup and submits SEND SQEs from its own thread. closeConn now
signals asyncClosed.Store(true) so the dispatch goroutine exits
at its next iteration.

Server.AsyncHandlers() now returns true for IOUring too.

Cross-cut fixes so detachMu != nil no longer implies "truly
detached": writeCap / sendCap / timeout-scan / shutdown / closeConn
now gate on h1State.Detached, which is only set when OnDetach fires
(WS/SSE). Async-mode conns pre-allocate detachMu in
acquireConnState without triggering those branches.

Adds test/integration/pipeline_test.go exercising ordering under
both AsyncHandlers=true and false.
… engine

Symmetric to memcached/redis direct mode. Standalone pools and pools
opened WithEngine on an async engine now dial *net.TCPConn directly
and drive reads from the caller goroutine via Go's netpoll — no
mini-loop involvement, no LockOSThread, no futex storm.

- writeRaw(data) uniform helper: tcp.Write under directMu or
  loop.Write via the mini-loop. All 15 c.loop.Write(c.fd, ...)
  call sites use it.
- driveDirect(ctx, req): tight tcp.Read → onRecv loop until
  req.doneAtom fires or ctx cancels. Includes a non-blocking
  MSG_DONTWAIT peek so loopback-fast responses skip the netpoll
  G-park wakeup.
- waitForQueryRows direct branch: buffers rows (syncMode pinned
  so dispatch never promotes to streaming, which would deadlock
  on the caller goroutine).
- dialDirectConn: TCP dial, SetNoDelay, SyscallConn-captured fd
  for peek, doStartup runs synchronously via the new drive path.
- Close: direct-mode closes via tcp.Close with bounded write
  deadline; loop-mode path unchanged.
- COPY FROM/TO guarded: ErrDirectModeUnsupported surfaces when
  called on a direct-mode conn. CopyInResponse / CopyOutResponse
  require event-loop-driven unsolicited delivery which has no
  reader in the direct model.
- Pool.dial routes to dialDirectConn when !hasEngine OR
  asyncEngine, matching the rule memcached uses today.
One non-blocking syscall.Recvfrom(MSG_DONTWAIT) before each tcp.Read
in execDirect / execManyDirect (redis) and execTextDirect /
execBinaryDirect / execBinaryMultiDirect (memcached). Loopback-fast
responses (10-byte GET, +PONG, small memcached VALUEs) land in the
recv buffer before tcp.Read is called; the peek catches them with
a single syscall and skips the ~1-2µs netpoll G-park wakeup.

One peek per iteration (not a tight spin) — repeated MSG_DONTWAIT
would re-introduce the P-hogging regression that a bounded spin
avoids. Fd cached at dial time via SyscallConn().Control.
@FumingPower3925
Contributor Author

Post-review follow-up: close remaining integration gaps (W1-W4)

Three additional commits land on top of 6f353ee:

  • b4ac40b — W3 + W1: pipelining-safe async dispatch on both engines. Per-conn single-goroutine dispatch model replaces the earlier per-read-batch spawn (fixed an ordering hazard under pipelined HTTP/1.1). io_uring now implements async dispatch under SINGLE_ISSUER by reusing the existing detachMu + detachQueue + eventfd machinery. Server.AsyncHandlers() now returns true for IOUring too. test/integration/pipeline_test.go exercises ordering under both inline and async dispatch.
  • 93ca5de — W2: PostgreSQL direct *net.TCPConn mode. Symmetric to memcached/redis — standalone pools and WithEngine(async) pools bypass the mini-loop and drive Read on the caller goroutine via Go's netpoll. COPY FROM/TO returns ErrDirectModeUnsupported on direct-mode conns (require event-loop-driven unsolicited delivery).
  • 55967ef — W4: MSG_DONTWAIT peek before netpoll in every direct-mode read loop (redis / memcached / postgres). Catches loopback-fast responses without the G-park wakeup; one peek per iteration (no tight spin — P-hog avoidance).

What this closes

  1. io_uring + async-handlers parity with epoll (previously async was silently a no-op).
  2. PG direct mode (was left on mini-loop after the sync-fast-path busy-path fix).
  3. HTTP/1.1 pipelining correctness under async dispatch.
  4. Redis direct-mode netpoll wakeup overhead vs nethttp.

Verification

  • Full test suite — go test -race ./... — 59/59 packages green on darwin. Driver suites: redis / memcached / postgres + their protocol packages.
  • Build verified on GOOS=linux GOARCH=arm64.
  • Full MSR1 matrix + pipeline test need Linux infra to execute; plan's exit criteria (celeris-epoll + celerismc ≥ 100k rps, celeris-iouring + celerismc ≈ epoll, celeris-epoll + celerispg ≥ 80k, celeris-epoll + celerisredis ≥ 93k) await CI.

…hold cs

Data race detected by -race in TestHTTP1PipeliningAsync/async: worker's
closeConn -> releaseConnState was resetting cs fields concurrently with
the async dispatch goroutine's asyncInBuf/asyncRun/asyncClosed writes.

Root cause: my earlier split between 'detached' (detachMu != nil) and
'trulyDetached' (h1State.Detached) let releaseConnState run for
async-mode conns even though their dispatch goroutine still held a cs
reference. Restore the original invariant — any goroutine-holding conn
skips the pool return. GC collects cs once the goroutine exits.

CloseH1 gating (trulyDetached) is kept: async-mode conns still own H1
state because no middleware goroutine is holding it open past Detach.
Only the release path now uses the broader 'detached' flag.
Earlier W2 commit guarded COPY with ErrDirectModeUnsupported because
direct mode has no event-loop goroutine driving onRecv — copyReady /
doneCh would never fire. That regressed 5 conformance tests.

Fix: spawn a short-lived reader goroutine (startDirectReader) for the
duration of each copy operation. The reader pumps tcp.Read → onRecv
with a 50ms read deadline so it periodically checks the stop channel;
the caller goroutine remains the sole writer of CopyData frames
(tcp.Write concurrent with tcp.Read on another goroutine is safe).

Final wait in copyFrom / copyTo uses select on doneCh/ctx.Done in
direct mode instead of c.wait — c.wait's driveDirect would spawn a
second concurrent tcp.Read and race the reader goroutine.

Background reader also fails the request chain via c.failAll on
unexpected EOF, so transport errors surface cleanly through doneCh
rather than hanging the caller.
…atch

pprof on msr1 (aarch64, celeris-epoll+celerisredis ASYNC=1) showed
'go l.runAsyncHandler(cs)' at 450ms / 13.82s CPU = 3.3% — every
request was re-spawning the dispatch goroutine. On keep-alive load
with per-conn request gaps, asyncInBuf would drain, asyncRun went
false, goroutine exited, next read spawned a fresh one.

Fix: add sync.Cond so the dispatch goroutine parks on asyncCond.Wait
when asyncInBuf is empty rather than exiting. Worker signals after
each append. Goroutine lives until closeConn broadcasts via
asyncClosed + Cond.Broadcast.

Also double-buffer asyncInBuf/asyncOutBuf so the goroutine's swap
on pickup doesn't force the worker to re-allocate on the next
append. Drops the dataCopy intermediate (was one heap alloc per
request) — worker now appends cs.buf bytes directly into asyncInBuf,
goroutine swaps out before ProcessH1 (dropping the cs.buf aliasing
risk).
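The reuse model can be sketched as follows (dispatcher and its fields are illustrative stand-ins for the per-conn asyncInBuf/asyncCond machinery):

```go
package main

import (
	"fmt"
	"sync"
)

// dispatcher: one long-lived dispatch goroutine that parks on a
// sync.Cond when the input buffer drains, instead of exiting and being
// respawned on the next read. Double-buffering (in/spare) lets the
// goroutine swap the full buffer out without forcing the worker to
// reallocate on its next append.
type dispatcher struct {
	mu     sync.Mutex
	cond   *sync.Cond
	in     []byte // worker appends here (asyncInBuf role)
	spare  []byte // drained buffer handed back for reuse
	closed bool
	out    []byte
	done   chan struct{}
}

func newDispatcher(process func([]byte) []byte) *dispatcher {
	d := &dispatcher{done: make(chan struct{})}
	d.cond = sync.NewCond(&d.mu)
	go func() {
		defer close(d.done)
		d.mu.Lock()
		for {
			for len(d.in) == 0 && !d.closed {
				d.cond.Wait() // park instead of exiting; worker Signals
			}
			if d.closed && len(d.in) == 0 {
				d.mu.Unlock()
				return
			}
			batch := d.in
			d.in = d.spare[:0] // swap: worker appends without realloc
			d.mu.Unlock()
			d.out = append(d.out, process(batch)...) // ProcessH1 stand-in
			d.mu.Lock()
			d.spare = batch[:0] // hand the drained buffer back
		}
	}()
	return d
}

func (d *dispatcher) append(b []byte) { // worker side, per read batch
	d.mu.Lock()
	d.in = append(d.in, b...)
	d.mu.Unlock()
	d.cond.Signal()
}

func (d *dispatcher) close() { // closeConn side
	d.mu.Lock()
	d.closed = true
	d.mu.Unlock()
	d.cond.Broadcast()
	<-d.done
}

func main() {
	d := newDispatcher(func(b []byte) []byte { return b })
	d.append([]byte("a|"))
	d.append([]byte("b|"))
	d.close()
	fmt.Println(string(d.out))
}
```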

MSR1 matrix impact (aarch64, 12c, 256 conns, ASYNC=1):

  | cell                              | before  | after   | delta   |
  |-----------------------------------|---------|---------|---------|
  | celeris-epoll   + celerisredis    |  85,134 | 100,322 | +17.8%  |
  | celeris-iouring + celerisredis    |  99,962 | 115,981 | +16.0%  |
  | celeris-epoll   + goredis         |  65,082 |  76,358 | +17.3%  |
  | celeris-iouring + goredis         |  80,295 |  96,041 | +19.6%  |
  | celeris-epoll   + celerispg       |  55,361 |  61,740 | +11.5%  |
  | celeris-iouring + celerispg       |  58,745 |  66,411 | +13.1%  |
  | celeris-epoll   + celerismc       | 103,467 | 108,147 |  +4.5%  |
  | celeris-iouring + celerismc       | 112,865 | 118,668 |  +5.1%  |

celeris-epoll + celerisredis: now +9.2% vs nethttp+celerisredis
(was -6.1% before). The epoll-async redis gap is closed.

New matrix leader: celeris-iouring + celerismc = 118,668 rps
(+35% vs nethttp+celerismc = 87,876).

Validated on msr1 (Linux 6.6.10-cix, aarch64): 62/62 packages pass
-race, all spec/conformance suites pass (pgspec/redisspec/mcspec +
conformance/postgres/redis/memcached + H1 RFC 9112).
doExtendedQuery sent Parse(if first)+Bind+Describe+Execute+Sync for
every extended query. For cached prepared statements (autoCache +
stmtCache hit), the row description is already known from the
initial prepare, so the portal Describe is redundant — server still
returns RowDescription each call, costing 7 bytes on the wire and
one protocol-state transition per query.

Pass cached columns through the call chain (QueryContext ->
doExtendedQuery), pre-populate req.columns + req.extended.Columns,
set HasDescribe=false so the state machine transitions
BindComplete -> ExecuteResult directly. The ExtendedQueryState
machine already supported HasDescribe=false; no state-machine
change needed.

FormatCode fixup: prepare-time Describe returns FormatCode=0 (text)
because Postgres doesn't decide the output encoding until Execute
receives the resultFormats vector. We pass [FormatBinary] in the
Execute, so we shallow-copy the cached ColumnDesc slice and
overwrite FormatCode to FormatBinary — keeps decode on the fast
binary path and leaves the stmtCache's slice pristine for re-use.

Measured impact is small (net +~1% RPS, noise-level on MSR1) — the
27% CPU in tcp.Write on the hot PG cell is syscall fixed cost, not
per-byte. Keeping the change for correctness and slightly smaller
wire footprint; saves one server state transition per query which
reduces tail latency on PG cells (p50 3645µs -> 3606µs).

All pgspec/conformance suites pass on Linux (msr1, Postgres 16).
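The FormatCode fixup is small enough to sketch (ColumnDesc here is illustrative; the driver's real struct carries more fields):

```go
package main

import "fmt"

// Illustrative column description: 0 = text, 1 = binary.
type ColumnDesc struct {
	Name       string
	FormatCode int16
}

const FormatBinary int16 = 1

// bindCachedColumns is the fixup's shape: shallow-copy the cached slice
// and flip FormatCode to binary (the Execute carries [FormatBinary]),
// leaving the stmtCache's slice pristine for reuse — prepare-time
// Describe always reports text because the output encoding isn't decided
// until Execute receives the resultFormats vector.
func bindCachedColumns(cached []ColumnDesc) []ColumnDesc {
	cols := make([]ColumnDesc, len(cached))
	copy(cols, cached)
	for i := range cols {
		cols[i].FormatCode = FormatBinary
	}
	return cols
}

func main() {
	cached := []ColumnDesc{{Name: "id"}, {Name: "val"}}
	cols := bindCachedColumns(cached)
	fmt.Println(cols[0].FormatCode, cached[0].FormatCode) // copy binary, cache untouched
}
```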
Two regression tests for the async dispatch path:

  TestAsyncHandlerGoroutineReuse — spins up the engine with
  AsyncHandlers=true, opens one keep-alive conn, sends 100 serial
  requests with 500µs idle gaps between them, and asserts that the
  runtime goroutine count only grew by ≤5 between first-request
  baseline and last request. Before the sync.Cond reuse fix, each
  idle-then-resume batch respawned the dispatch goroutine, which
  would drive that delta well above tolerance.

  TestAsyncHandlerCloseWakesGoroutine — opens one conn, sends one
  request, closes. Asserts that within 2s the goroutine count
  returns to within +2 of the pre-test baseline. This exercises
  closeConn's asyncCond.Broadcast path — without it, the parked
  dispatch goroutine would leak until GC finalized the connState,
  which can be tens of seconds under a busy test suite.

Also cleaned up three leftover references to internal plan jargon
("W4") in public struct comments — replaced with descriptive text.
Updated runAsyncHandler's doc comment to reflect the reuse model
(the stale comment still said "exit when buffer empty" from the
pre-Cond implementation).
@FumingPower3925
Contributor Author

Post-review updates (commits since the last summary at 55967ef)

Seven additional commits landed after the initial W1-W4 summary. Chronological:

  • 62c0d79 — gofmt fix (trailing field alignment in epoll/conn.go + pg/conn.go).
  • a93d15b — fix: race in releaseConnState surfaced by -race in TestHTTP1PipeliningAsync/async. Reverted the release-skip gate from trulyDetached to detached, so any goroutine-holding conn (WS detach OR async dispatch) skips pool return and GC collects cs after the goroutine exits.
  • 808e8bb — fix: COPY FROM/TO in direct-mode PG. Previous W2 commit guarded COPY with ErrDirectModeUnsupported, which regressed 5 conformance tests. Fix spawns a short-lived reader goroutine (startDirectReader) for the duration of each copy operation; reader pumps tcp.Read → onRecv with a 50ms deadline so it observes the stop signal. conformance/postgres + pgspec both green after the fix.
  • 4a38e45 — perf: goroutine-reuse via sync.Cond + zero-alloc double-buffered input. The big one. pprof on msr1 showed go runAsyncHandler(cs) at 3.3% of CPU (450ms / 13.82s). Each keep-alive idle gap respawned the dispatch goroutine. Fix parks the goroutine on asyncCond.Wait between requests; asyncInBuf / asyncOutBuf swap eliminates the per-request dataCopy allocation.
  • 7c4a80b — perf: skip Describe for cached prepared statements. Saves 7 bytes/query and one server state transition; marginal (~1% RPS) but keeps the wire tight. Populates cached columns with FormatCode=FormatBinary since prepare-time Describe returns FormatText.
  • d595c23 — test: two regression tests for the async dispatch path — TestAsyncHandlerGoroutineReuse locks in the sync.Cond reuse behavior (goroutine count stays constant across 100 keep-alive requests), TestAsyncHandlerCloseWakesGoroutine guards the asyncCond.Broadcast on close path. Also cleans three stale "W4" plan-jargon references from public struct comments.

Measured impact on MS-R1 (aarch64, 12c, 256 conns, 8s per cell, post-fix clean run)

Driver cells under AsyncHandlers=true (the headline workload):

| Cell | Pre-4a38e45 | Post-4a38e45 | Δ |
|---|---|---|---|
| celeris-epoll + celerisredis | 85,134 | 100,322 | +17.8% |
| celeris-iouring + celerisredis | 99,962 | 115,895 | +16.0% |
| celeris-epoll + goredis | 65,082 | 76,358 | +17.3% |
| celeris-iouring + goredis | 80,295 | 96,041 | +19.6% |
| celeris-epoll + celerispg | 55,361 | 62,153 | +12.3% |
| celeris-iouring + celerispg | 58,745 | 66,884 | +13.9% |
| celeris-epoll + celerismc | 103,467 | 108,147 | +4.5% |
| celeris-iouring + celerismc | 112,865 | 118,873 | +5.1% |

The epoll + celerisredis cell flipped from 6.1% below nethttp+celerisredis to 9.2% above it after the reuse fix. The iouring + celerismc cell is the overall matrix leader at 118,873 rps — +37.5% over nethttp + celerismc (86,479).

Async vs sync — full matrix context

A broader async-vs-sync comparison across pure-CPU handlers also landed. Honest finding: async and sync are complementary, not substitutes.

| Workload | Sync (AsyncHandlers=false) | Async (=true) |
|---|---|---|
| Pure-CPU (plaintext, JSON, params, body) | +30-33% RPS | baseline |
| Chain middleware | +23-26% RPS | baseline |
| DB-integrated / 3rd-party drivers | baseline | +30-200% RPS |

Removing sync would regress plaintext from 428k → 288k rps on celeris-iouring. Removing async would leave goredis at 25k instead of 75k.

Tracked as #239 — v1.5.0 spike for per-route AsyncHandler(true/false) so users with mixed workloads can opt-in per endpoint.

Known issue (follow-up tracked)

celeris-adaptive with AsyncHandlers=true regresses on celerisredis (-7.2%), celerispg (-4.5%), celerismc (-11.0%) — the engine-side async dispatch is enabled but the driver sees Server.AsyncHandlers() == false for Adaptive (deliberately gated because Adaptive hot-swaps between epoll and iouring), so drivers pick the mini-loop sync path. This mismatch creates a worse combo than either pure mode. Actively being debugged; will ship a fix before merging if it's driver-side, or punt to v1.4.1 if it requires adaptive-engine rework.

CI status

41/41 checks green on d595c23. All spec + conformance suites pass on msr1 (Linux 6.6.10-cix, aarch64): pgspec, redisspec, mcspec, conformance/{postgres,redis,memcached}, H1 RFC 9112 × epoll × iouring.

Adaptive was excluded from Server.AsyncHandlers() out of a worry that
the engine's hot-swap between epoll and iouring could invalidate
direct-mode driver conns mid-flight. That concern was based on a
false premise — direct-mode drivers don't register FDs with the
engine (they dial net.TCPConn and drive reads on the caller
goroutine via Go netpoll). adaptive.performSwitch already refuses
to switch while any driver-registered FDs exist, and direct-mode
drivers contribute zero, so a switch is a no-op for them.

What the old gate actually did: Config.AsyncHandlers=true enabled
the async dispatch path in the engine (both epoll and iouring
workers honor it), but Server.AsyncHandlers() returned false for
Adaptive, so drivers opened WithEngine(srv) saw IsAsyncServer=false
and picked the mini-loop busy-poll sync path. Handlers ran on
unlocked spawned Gs, and 256 concurrent busy-poll Gs starved
CPU — which regressed 3 celeris-native driver cells on the matrix
(celerisredis -7.2%, celerispg -4.5%, celerismc -11.0% vs ASYNC=0).

Flipping the gate makes drivers pick their direct-mode path, same
as they do on pure epoll or iouring. Direct-mode drivers go through
Go netpoll and are engine-agnostic — they keep working regardless
of which sub-engine is active, and neither participate in nor block
a switch.

MS-R1 impact (aarch64, 12c, 256 conns, 8s per cell):

  celeris-adaptive + celerisredis ASYNC=1: 64,206 -> 77,592 (+20.8%)
  celeris-adaptive + celerispg    ASYNC=1: 41,864 -> 63,217 (+51.0%)
  celeris-adaptive + celerismc    ASYNC=1: 54,959 -> 89,912 (+63.6%)

Adaptive now matches celeris-epoll's async numbers within a few
percent across the driver matrix.
The test failed intermittently on ubuntu-latest with the message
"WriteAndPoll returned ok=false; sync fast path not engaged". Root
cause: the worker goroutine's epoll_wait could observe the POLLIN
edge from the pre-staged peer write and consume the 4 bytes via
handleReadable before WriteAndPoll took recvMu and masked EPOLLIN.
handleReadable called the registered onRecv (a no-op in the old
test), leaving WriteAndPoll's phases all returning EAGAIN.

The race is real and legitimate — the worker consuming data on an
EPOLLIN edge before the caller's WriteAndPoll arrives is expected
behavior, not a bug. What the test is actually asserting is that
the data round-trips correctly under the sync fast-path design,
not that Phase A specifically wins every race.

Change the RegisterConn callback from a discard to an append into
the same buffer WriteAndPoll would populate, so both paths (worker
consumed OR WriteAndPoll consumed) deliver into `got`. Poll for
"pong" for up to 100ms after WriteAndPoll returns so the worker-
dispatch case has time to complete.

20/20 passes on Linux aarch64 under -race after the fix.
Addresses the issues from the honest review:

1. #240 — Panic recover in async dispatch goroutine.
   runAsyncHandler in epoll + iouring now wraps its loop body in
   defer recover(). A panicking user handler no longer crashes the
   entire server. Logs stack trace, marks the conn closed, and
   force-closes the fd so the worker's close path tears down state
   from its own goroutine. Symmetric to routerAdapter's sync-path
   safety net.

2. asyncInBuf DoS cap (maxPendingInputBytes = 4 MiB).
   A client pipelining requests faster than the dispatch goroutine
   can drain them would otherwise grow asyncInBuf without bound.
   Symmetric with the existing maxPendingBytes cap on the output
   side; drainRead closes the conn when the append would exceed
   the cap. Applied in both epoll and iouring.

3. #241 — PG direct-mode COPY cancel no longer orphans the request.
   copyFrom / copyTo direct-mode paths now route final-wait
   ctx.Done through awaitDirectWithCancel, which sends CancelRequest
   and waits bounded (30s) for the server's Error+RFQ before
   returning. Without this, a canceled COPY left req in the pending
   queue and the next query would pop it — wire-format desync.

4. Direct-mode result buffer cap (maxDirectResultBytes = 64 MiB).
   Direct mode pins syncMode=true so streaming never promotes.
   A huge SELECT would buffer every row in req.rowSlab. Now fails
   with ErrResultTooBig once accumulated bytes cross the cap;
   caller gets a typed error with actionable remediation
   (paginate with LIMIT/OFFSET or use non-async pool for streaming).

5. PG LISTEN/UNLISTEN/NOTIFY guarded in direct mode.
   Direct-mode conns have no background reader between queries,
   so NotificationResponse messages would be silently dropped.
   simpleQuery / simpleExec / simpleExecNoTag now detect these
   statements and return ErrDirectModeUnsupported with a clear
   workaround hint. Added isListenOrUnlisten() helper for
   prefix detection with whitespace + comment skip.

6. H2 + AsyncHandlers: documentation-level warning at engine start.
   When both flags are set, engines now log that async dispatch is
   HTTP/1.1-only; H2 conns still run inline on the worker. No
   behavior change, but surfaces the limitation instead of a silent
   per-conn-type inconsistency.

7. Backpressure end-to-end test skeleton (skipped by default).
   Loopback TCP auto-tuning on Linux makes deterministically
   exercising the maxPendingBytes path hard without sysctl tuning.
   Test body documents the shape and runs under
   GOTEST_BACKPRESSURE=1. The defensive paths are exercised in
   production via Autobahn 9.1.6 (WS 16 MiB frames through the
   detached 64 MiB cap).

All 62 packages pass go test -race on Linux aarch64 (msr1).
startTestEngine waits for workers to be ready before returning. The
3s deadline was tight; on GitHub Actions' shared Azure VMs, io_uring
ring setup (NewRingCPU, SINGLE_ISSUER init, NUMA bind, SQPOLL thread
creation) can legitimately take 3-5s when the host is loaded.
older flake showed 'engine did not start in time' as a single-run
false-positive that passed on rerun.

15s covers the tail without hiding real failures — if a worker
actually fails to initialize, it'll show up immediately via the
error channel, not by timing out on the startup check.
1. PG Describe-skip: hasDescribe gate was `len(cachedCols) == 0`,
   which treated zero-column prepared statements as "no cache" and
   defeated the optimization. Change to `cachedCols == nil`.

2. iouring drainDetachQueue now checks asyncClosed and calls
   closeConn. The error/panic path in runAsyncHandler already
   enqueues cs + sets asyncClosed, but drainDetachQueue only called
   markDirty, leaving the FD/connState as a zombie until the next
   handleRecv. A matching fix was applied in epoll's drainDetachQueue.

3. epoll async error + panic paths no longer call unix.Close from
   the dispatch goroutine. That raced with the worker's drainRead
   holding l.conns[fd]. Replaced with asyncClosed + detachQueue
   enqueue + eventfd signal; worker's drainDetachQueue picks up
   the teardown on its own goroutine.

4. Graceful shutdown now joins dispatch goroutines. Added
   asyncWG sync.WaitGroup on both Loop (epoll) and Worker
   (iouring); Add on spawn, Done via defer in runAsyncHandler,
   Wait at the tail of the engine shutdown. Prevents dispatch Gs
   from touching connState after the engine claims to have stopped.

5. CVE-2023-44487 Rapid Reset mitigated in H2. Processor now
   tracks RST_STREAM count in a sliding one-second window; a
   sustained burst > rstBurstMax (200) triggers GOAWAY with
   ENHANCE_YOUR_CALM and closes the connection. Honest clients
   reset a handful of streams per second; 200/s is well above
   legitimate patterns and well below the thousands/s needed to
   amplify the attack.

6. H1 MaxHeaderSize reduced 16 MiB -> 64 KiB (nginx-class default),
   and new MaxHeaderCount = 200 rejects thousands-of-tiny-headers
   DoS that would stay under the byte cap. 64 KiB covers verbose
   proxy chains; 16 MiB was a slow-loris amplifier. New sentinel
   ErrTooManyHeaders; existing ErrHeadersTooLarge unchanged.
H2 hardening:
- HPACK decoder now enforces SetMaxStringLength(64 KiB), matching H1
  MaxHeaderSize. Prevents a single HEADERS frame from growing the
  decode target unboundedly.
- Framer initial max read size 16 KiB (RFC 9113 default) instead of
  hard-coded 1 MiB; new Parser.SetMaxReadFrameSize method so the
  processor can apply negotiated SETTINGS_MAX_FRAME_SIZE.
- PRIORITY frames are rejected when the stream ID is more than 2048
  past the last client stream, preventing unbounded priority-tree
  growth via a phantom-stream flood.

Redis driver:
- Cluster refreshTopology now fully resets slots/replicas maps at
  the start of every refresh. Previously per-range reset left stale
  replicas for slots that dropped out, and replica append accumulated
  across overlapping range entries (resharding window).
- MOVED redirect on attempt > 0 now also refreshes topology (was
  skipped), eliminating the MOVED-loop-until-background-tick bug.
- Cluster redirect loop now uses an explicit maxAttempts =
  MaxRedirects+1 bound instead of the attempt-based range form, so
  the documented redirect count is honored (it was off by one).
- Sentinel reconnect no longer appends to sentinelConns unboundedly;
  stale entries are closed and the slice is capped at one entry.

Postgres driver:
- ParseRowDescriptionInto rejects column count > 1600 (PG's own
  MaxHeapAttributeNumber). Previously a server-supplied int16 up to
  32767 forced multi-MB allocations per RowDescription.
- dropPreparedAsync tracked via pgConn.closeWG; Close() joins the
  WaitGroup so the background DEALLOCATE G cannot outlive the conn.
  Early-exit if c.closed is already set.

Redis RESP:
- readBulk's dead rewind removed — Next() already rewinds to
  pre-tag on ErrIncomplete; readBulk's post-tag rewind was redundant
  and its comment was misleading.

Server lifecycle:
- StartWithContext / StartWithListenerAndContext no longer leak the
  shutdown-watcher goroutine on Listen error. Added listenDone chan
  the main flow closes; watcher selects on ctx.Done || listenDone.
- Engine.Shutdown docs for epoll/iouring now correctly state that
  shutdown is ctx-driven (context cancel → Listen returns → worker
  shutdown runs asyncWG.Wait), not something Shutdown() does itself.
Summary of defensive hardening and polish swept in the final
v1.4.0 pre-tag pass.

Postgres driver
- Defer dropPreparedAsync goroutines through closeWG so Close() waits
  for background DEALLOCATEs rather than racing with pgConn teardown.
- Replace `len(cachedCols) == 0` with nil-check in doExtendedQuery so
  cached-but-empty RowDescriptions correctly skip the Describe step.
- Bound all bare <-req.doneCh waits in COPY error paths with a 30s
  awaitDoneBounded closure to prevent hung CopyIn/CopyOut unwinds.
- Case-insensitive, word-boundary SQL keyword detection for
  isListenOrUnlisten + isCacheableQuery (hasKeywordPrefix helper)
  to avoid false positives on column names like `selected`.
- Move ErrDirectModeUnsupported from pool.go to errors.go alongside
  the other exported sentinels; prefix all scan convertTo errors with
  "celeris-postgres: scan: " for consistency.
- dsn.go now warns to stderr on sslmode=prefer/allow (previously
  silent downgrade to plaintext) so operators see the change.
- protocol/scram.go: enforce RFC 7677 minimum iteration count (4096)
  and zero saltedPassword/authMessage/clientFirst/serverFirst/
  serverKey/serverSig/password after handleServerFinal.
- protocol/query.go: reject RowDescription with >1600 columns (PG's
  MaxHeapAttributeNumber) to guard against malformed server input.

Redis driver
- cluster.refreshTopology now fully resets slots/replicas at start;
  MOVED always refreshes topology (not only attempt==0) so stale
  routes don't persist through redirect storms.
- Explicit maxAttempts = MaxRedirects+1 loop replaces range form
  after removing the attempt-gated refresh branch.
- sentinel.subscribeLoop closes stale sentinelConns and caps the
  slice to a single live entry to prevent conn leaks on reconnect.
- commands.asStringSlice/asStringMap return ErrNil on TyNull
  (previously nil, which callers couldn't distinguish from "empty").
- protocol/resp.go: drop dead rewind code in readBulk.

Memcached driver
- protocol/text.parseUint overflow check uses (maxU64-digit)/10
  to detect the last-digit overflow case without false negatives.

Error-prefix consistency
- Normalize all user-facing error prefixes in driver/redis and
  driver/memcached from "celeris/redis:" / "celeris/memcached:"
  slash form to "celeris-redis:" / "celeris-memcached:" hyphen
  form matching driver/postgres. No test strings assert on the
  old prefix. Internal packages (async/pool, eventloop) keep
  slash form since they're not user-facing.

Engine async dispatch
- engine/epoll/loop.go: runAsyncHandler now panic-recovers and
  signals the worker via detachQueue + eventfd instead of calling
  unix.Close(cs.fd) from the handler goroutine (cross-thread FD
  close was racy against io_uring SQE submission).
- Graceful shutdown awaits asyncWG so detached handler goroutines
  finish before Shutdown() returns.

Config / server
- ReadTimeout and WriteTimeout defaults: 300s → 60s (slow-loris
  hardening; matches nginx client_header_timeout / client_body_timeout).
- Validate() flags Listener + explicit Addr conflict, but only when
  Addr has a concrete non-zero port — `:0` stays valid since callers
  intentionally delegate port selection to the pre-bound listener.
- server.go: Version constant bumped to "1.4.0" (was stuck at "1.3.4").

celeristest
- WithCookie godoc now explicitly notes no escaping of semicolons
  or CR/LF in value; tests needing malformed cookie headers should
  use WithHeader directly.

Test fixes
- resource/config_test.go TestWithDefaults now expects 60s
  ReadTimeout/WriteTimeout matching the new defaults.
@FumingPower3925 FumingPower3925 merged commit 5cb0f9b into main Apr 19, 2026
51 of 52 checks passed
@FumingPower3925 FumingPower3925 deleted the feat/v1.4.0-drivers-and-h2c branch April 19, 2026 17:08