Context
The MS-R1 matrix comparing sync vs async dispatch across every handler shape revealed a clean split:
| Workload | Sync (AsyncHandlers=false) | Async (=true) |
|---|---|---|
| Pure-CPU (plaintext, JSON, params, body) | +30-33% RPS | baseline |
| Chain middleware | +23-26% RPS | baseline |
| DB-integrated (celerisredis/pg/mc) | baseline | +30-60% RPS |
| 3rd-party drivers (goredis, pgx, gomc) | baseline | +30-200% RPS |
Numbers (celeris-iouring, 12c ARM64, 256 conns, 8s per cell):
- /plaintext: 428,582 (sync) vs 287,838 (async) — sync +48.9%
- celerismc: 65,265 (sync) vs 101,902 (async) — async +56.1%
- goredis: 24,623 (sync) vs 75,416 (async) — async +206%
Real apps mix workloads: some routes hit a DB, others return canned responses or small computed results. Today the choice is all-or-nothing per server via Config.AsyncHandlers. That forces a user with 80% plaintext routes and 20% DB routes to either give up ~30% on the plaintext routes or ~2x throughput on the DB routes.
Proposal
Per-route opt-in / opt-out of async dispatch. Rough API surface:
- Server default stays at Config.AsyncHandlers (whatever the user configured).
- Per-route: Route.Async(true/false) overrides the default for that specific handler chain.
- Per-group: RouteGroup.Async(true/false) inherits to its children.
- Defaults: all routes inherit the server default unless explicitly overridden.
Usage sketch:
```go
srv := celeris.New(celeris.Config{Engine: celeris.Epoll})
srv.GET("/ping", pingHandler) // sync, inherits server default
srv.GET("/users/:id", userHandler).Async() // async, this route only
api := srv.Group("/api").Async() // async for all /api/*
api.GET("/products", productHandler) // async
api.GET("/cached", cachedHandler).Async(false) // opt out of async
```
Engine implementation questions (spike scope)
- Dispatch decision point. The engine currently decides sync vs async per-connection at drainRead time based on Loop.async. Per-route means the decision moves to per-request, after routing. Options:
  - Always spawn a goroutine for the HTTP1 path, and have the dispatch goroutine check the route's async flag: (a) hand off to a worker goroutine if the route is async, or (b) run the handler inline if sync. Adds one extra goroutine spawn per request.
  - Peek the request path early in drainRead (before ProcessH1), look up the route, and pick the dispatch path there. Breaks encapsulation.
  - Always-async mode: dispatch every request to a goroutine, and have that goroutine check the route flag to decide call-and-return vs chain. The "sync" case becomes goroutine-does-everything, close to net/http's model.
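Option (a) can be sketched as a per-request branch taken right after routing. This is a minimal sketch, not Celeris internals: `route`, `dispatch`, and the channel-based completion signal are all invented names standing in for whatever the engine actually uses.

```go
package main

import "fmt"

// route is a hypothetical routed entry carrying its resolved async flag.
type route struct {
	path    string
	async   bool // resolved at registration: route override, else group, else server default
	handler func() string
}

// dispatch runs sync handlers inline on the calling goroutine (cheapest
// path for pure-CPU routes) and hands async handlers to a worker
// goroutine so a blocking call (DB, RPC) can't stall the event loop.
func dispatch(r route, done chan string) {
	if r.async {
		go func() { done <- r.handler() }() // worker goroutine: may block
		return
	}
	done <- r.handler() // inline: no extra spawn, no wake
}

func main() {
	done := make(chan string, 2)
	dispatch(route{path: "/ping", async: false, handler: func() string { return "pong" }}, done)
	dispatch(route{path: "/users/1", async: true, handler: func() string { return "user-1" }}, done)
	fmt.Println(<-done, <-done) // prints: pong user-1
}
```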
- Middleware inheritance. If a route is async but inherits middleware from a sync group, which mode wins? Proposal: most-specific wins (route over group over server default).
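One way to make "most-specific wins" concrete is a tri-state flag, so an explicit .Async(false) is distinguishable from "never called". A hedged sketch with invented names; the real resolution would happen once at route registration, not per request:

```go
package main

import "fmt"

// asyncFlag is a hypothetical tri-state: unset routes/groups fall through
// to the next level rather than forcing a value.
type asyncFlag int

const (
	asyncInherit asyncFlag = iota // no explicit .Async() call at this level
	asyncOn                       // .Async() / .Async(true)
	asyncOff                      // .Async(false)
)

// resolve walks route -> group -> server default and returns the first
// explicit setting it finds, implementing most-specific-wins.
func resolve(routeFlag, groupFlag asyncFlag, serverDefault bool) bool {
	for _, f := range []asyncFlag{routeFlag, groupFlag} {
		switch f {
		case asyncOn:
			return true
		case asyncOff:
			return false
		}
	}
	return serverDefault
}

func main() {
	// /api group is async, but /api/cached opted out with .Async(false)
	fmt.Println(resolve(asyncOff, asyncOn, false)) // false
	// /api/products inherits the group's async setting
	fmt.Println(resolve(asyncInherit, asyncOn, false)) // true
}
```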
- WebSocket / detached handlers. Detach implies async by construction, so the per-route flag can't affect detached flows.
- H2 conns. Today H2 always runs inline. Per-route async on H2 needs additional design and may be out of scope for v1.5.
Open questions for spike
- Is the 1.5-2µs async-dispatch cost reducible to a point where we can just make everything async? pprof points at sync.Cond + goroutine wake — can any of this be optimized to <500ns per request?
- Does routing-before-dispatch introduce enough overhead that the sync path loses its advantage anyway?
- What's the right default for .Async()? Opt-in or opt-out?
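For the first question, a crude upper-bound probe: time an unbuffered-channel ping-pong as a stand-in for the sync.Cond + goroutine-wake path. This assumes the two wake mechanisms cost roughly the same; it measures the Go runtime on the machine it runs on, not Celeris itself.

```go
package main

import (
	"fmt"
	"time"
)

// wakeCost estimates the round-trip cost of waking a parked goroutine and
// getting control back, using an unbuffered-channel handoff per iteration.
func wakeCost(n int) time.Duration {
	req := make(chan struct{})
	done := make(chan struct{})
	go func() {
		for range req {
			done <- struct{}{} // hand control back to the caller
		}
	}()
	start := time.Now()
	for i := 0; i < n; i++ {
		req <- struct{}{} // wake the worker goroutine
		<-done            // park until it responds
	}
	close(req)
	return time.Since(start) / time.Duration(n)
}

func main() {
	// If this lands well under 500ns, the "just make everything async"
	// answer gets more plausible; at 1-2µs it stays a real tax.
	fmt.Printf("per-request wake cost: ~%v\n", wakeCost(100000))
}
```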
Exit criteria
- Design doc proposing an API and dispatch model.
- Prototype showing a mixed-workload benchmark (pure-CPU + DB routes on same server) outperforming both "all-sync" and "all-async" configs by at least 10% aggregate.
- Decision on whether to ship in v1.5.0 or punt to v1.6.0 based on prototype results.