Releases: jaredboynton/warpsock
v4.2.7
Added
- HTTP/1 WebSocket builders support opt-in `permessage-deflate` negotiation through `.permessage_deflate()`, including RSV1 compression and decompression for negotiated data messages. Node exposes this as `WebSocketBuilder.permessageDeflate()`, and Python exposes it as `WebSocketBuilder.permessage_deflate()`.
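For readers unfamiliar with the RSV1 bit mentioned above, the following is a minimal sketch of how a compressed data message is marked in the first byte of a WebSocket frame header (RFC 6455 Section 5.2, RFC 7692). The helper names `first_header_byte` and `starts_compressed_message` are hypothetical and are not part of the Warpsock API; this only illustrates the bit layout.

```rust
// First byte of a WebSocket frame header:
// FIN (0x80), RSV1 (0x40), RSV2 (0x20), RSV3 (0x10), opcode (low 4 bits).
const FIN: u8 = 0x80;
const RSV1: u8 = 0x40;
const OPCODE_MASK: u8 = 0x0F;

const OP_TEXT: u8 = 0x1;
const OP_BINARY: u8 = 0x2;

/// Hypothetical helper: builds the first header byte for a frame.
/// With permessage-deflate negotiated, RSV1 is set on the first frame
/// of a compressed message.
fn first_header_byte(fin: bool, compressed: bool, opcode: u8) -> u8 {
    let mut b = opcode & OPCODE_MASK;
    if fin {
        b |= FIN;
    }
    if compressed {
        b |= RSV1;
    }
    b
}

/// Hypothetical helper: true when the frame starts a compressed data
/// message. Control frames (ping, pong, close) are never compressed,
/// so only text and binary opcodes qualify.
fn starts_compressed_message(first_byte: u8) -> bool {
    let opcode = first_byte & OPCODE_MASK;
    let is_data = opcode == OP_TEXT || opcode == OP_BINARY;
    is_data && (first_byte & RSV1) != 0
}
```

A receiver that did not negotiate the extension must treat a set RSV1 bit as a protocol error, which is why the feature is opt-in on the builder.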
Fixed
- RFC 8441 and RFC 9220 raw WebSocket CONNECT builders allow `Sec-WebSocket-Extensions` negotiation metadata for caller-managed tunnel frame extensions.
- Python async `RequestBuilder.body_stream().send()` keeps the Rust response-body driver alive after headers return by pumping streaming body frames into a Python-owned channel, preventing Linux wheel smoke failures when iterating `response.body`.
v4.2.5
Complete Warpsock rename release.
- Rust crate: warpsock 4.2.5.
- npm packages: warpsock and warpsock-* 4.2.5.
- PyPI package: warpsock 4.2.5.
- Deprecated bridge packages: specters 4.2.5 for crates.io/PyPI, and old npm specters names deprecated in favor of Warpsock.
- Python release wheel smoke now runs from a clean temp directory; Node/Python release jobs disable Cargo HTTP multiplexing to avoid transient crates.io HTTP/2 failures.
v4.2.3
Changed
- README "Local native HTTP/3 vs Rust H3 clients" section and the benchmark README now publish the canonical GET-only ledger repeat gate result: two consecutive fail-closed gates at `ba356d7` in which Specter's worst rep beats every comparator's per-metric best rep on p50/p95 TTFB, ledger-paced throughput, and p50/p95 ledger-paced tail (gate 1 worst rep 37.6 us p50 TTFB with 1.0 us p50 / 7.2 us p95 ledger tail; gate 2 confirms at 32.5 us and 2.9 / 7.4 us). This retires the 2026-06-05 same-process matrix that had tokio-quiche leading loopback GET TTFB. Causes: the deferred boundary-ACK send, the GET-burst drain ordering, and the single-copy 1-RTT datagram decode (entries below), plus bench-harness client-process pinning to cores 4-11 for all five clients, which removed scheduler placement luck from both sides of the worst-vs-best comparison. Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-direct-get-clientpin-clean-gate/` and `-r2/`.
- Native HTTP/3 1-RTT receive path now caches one AES-128-GCM AEAD context per key epoch (`EVP_AEAD_CTX`) and reuses it for every datagram open, instead of constructing a fresh `boring::symm::Crypter` per packet. The per-packet path redid the AES key schedule and PMULL GHASH H-table precompute for a key constant across the epoch; a microbench put that setup at ~half of the ~0.83us/datagram open on Graviton4. A same-session A/B at n=100 (verified by `EVP_AEAD_CTX` symbol count 0->11) cut the GET p95 ledger-paced tail from ~16.5us to ~13.7us median across every quantile, moving Specter from losing to beating tokio_quiche (median 13.6us vs 16.2us) on that workload at the gate-relevant n. Receive-side decrypt only: no wire byte, frame cadence, or fingerprint change, and the seal->open round-trip stays byte-identical (full suite 999 passed). Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-aead-context-cache-n100/`.
- Native HTTP/3 1-RTT send path now reuses the same per-epoch AEAD context for sealing that the receive path uses for opening, instead of constructing a fresh `boring::symm::Crypter` per sent packet. The shared `EVP_AEAD_CTX` (built once per write-key epoch; `seal_packet_payload_into` writes ciphertext||tag in one pass into a `split_at_mut` output slice) drops the per-packet AES key schedule + GHASH H-table rebuild from every sealed packet, including the ACKs a client seals during a GET body. A same-session A/B at n=100 cut the GET p95 ledger-paced tail a further ~1.8us (open-only ~14.4us median -> open+seal ~12.6us); against tokio_quiche in the same session, Specter's worst p95 (12.4us) now beats tokio's best (15.1us) outright, where the receive-only cache had overlapped on worst-vs-best. Encrypt output is byte-identical AES-128-GCM: no wire byte, frame cadence, or fingerprint change. Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-seal-context-cache-getack/`.
- Native HTTP/3 benchmark fixture stream responses now use absolute 1ms chunk cadence instead of rescheduling each chunk from the post-send wall clock. This removes fixture scheduler drift from the GET throughput cell and is recorded as a benchmark-truth improvement, not a superiority claim; the strict repeat gate still must pass before README performance claims are upgraded.
- Native HTTP/3 GET repeat-gate runs now build the benchmark binary once and reuse it for every repeat child run, tightening same-binary fairness while reducing awsdev iteration time.
- Native HTTP/3 selected-row truth gates now require fixture-ledger capture for publishable and GET-only comparisons, so missing ledger provenance cannot fall back to nominal paced-tail metrics and produce a false pass.
- Native HTTP/3 DPLPMTUD path-MTU probes are now deferred off the active RFC 9220 tunnel critical path. A full-size probe is a build + AEAD seal + `send_to`; the prior code emitted it inline on the tunnel recv->send turn, so the two probes a connection sends while binary-searching the path MTU (around echo 13 and 22 at the fixture's ~16KB receive window) landed as ~100us p99 spikes on the proxied echo round-trip. A flow-control hypothesis was falsified by experiment (suppressing MAX_DATA emission left the spikes intact); gating the probe send removed both. `send_client_pmtu_probe_if_available` now waits until the tunnel has been quiescent for `PMTU_TUNNEL_IDLE_GAP` (2ms) before probing (RFC 8899 Section 5.2: probes are low priority); GET/streaming connections with no open tunnel probe immediately as before. On awsdev (Graviton4, quiet, 8 reps, n=100) this drops the per-run spike count 2->0, collapses the tunnel echo p99 from ~103us to ~43us, and moves the echo p95 tail from a loss to a non-overlapping win vs tokio_quiche (40.3us vs 51.2us median; Specter worst 43.3 < tokio best 48.3). Echo p50 and throughput stay at parity at the 1KB single-frame payload. Combined with the per-epoch AEAD context cache, Specter now holds the lower p95 tail on the echo, client-DATA+FIN close, and slow-consumer mixed tunnel workloads, reversing the 4.2.1 withdrawal rationale. No wire byte or fingerprint change: probe packets are unchanged and still sent, only their scheduling moves off the interactive path. Full suite 999 passed. Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-pmtu-probe-tunnel-defer/`.
- Native HTTP/3 direct-GET epoch loop now drains the entire ready burst before any wire maintenance and keeps one pinned timer across loop passes. Three coordinated scheduling changes: boundary ACKs are sealed inline at their exact ack-eliciting-threshold crossings during the burst drain (the caller-side threshold clamp that chopped multi-quantum bursts into drain/flush round trips is gone; sealed packets join the existing deferred-send queue, so wire bytes and ACK cadence are unchanged); the parked-wake select arm drains the rest of the burst immediately after the waking datagram instead of running the full maintenance pass between the first datagram and the FIN-bearing remainder; and the two per-iteration tokio sleep futures are replaced by one pinned sleep re-armed only when the min of the delayed-ACK and loss-detection deadlines changes, removing timer-wheel register/deregister churn from every park/wake cycle. Worst-rep tails at the shipping io-epoch profile (n=100 x 4 reps): p50 ledger-paced tail 4.28us -> 2.58us with three of four reps at 0, p95 11.61us -> 7.85us; TTFB and throughput unchanged. Receive-side scheduling only: no wire byte or fingerprint change. Full suite 1004 passed. Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-direct-get-burst-ordering-scout/`.
- Native HTTP/3 1-RTT receive path now decodes each datagram with one payload copy instead of three. The header-protection unmask copies only the up-to-21-byte protected prefix into the reusable scratch buffer instead of refilling the full ~1.3KB ciphertext (the AEAD reads the payload region from the original datagram buffer in place); QUIC frame decode consumes the already-owned plaintext `Bytes` directly via a new `decode_frames_bytes`, so STREAM data becomes refcounted slices of the packet allocation instead of a second full copy plus malloc per datagram; and the post-decode padding filter retains in place instead of collecting a second frame `Vec`. Scout at the shipping io-epoch profile (n=100 x 4 reps): Specter per-rep p50 ledger-paced tail 0/0/803/0 ns (was 0/0/0/2582 before this change, with other same-day gates rolling 2143-3441 ns worst reps) and p95 3804/3252/6746/2638 ns (was worst 7124-9368); worst-rep p50 now sits below the best reqwest_h3 rep ever observed on this host. TTFB p50 31.1-33.4us across reps. Decrypt output and wire bytes are unchanged: receive-side memory layout only, no fingerprint change. Full suite 1004 passed. Evidence: `docs/benchmarks/native-h3-vs-rust-clients/2026-06-09-direct-get-copychain-scout/`.
- Pruned superseded benchmark artifacts and refreshed benchmark documentation to current state: `docs/benchmarks/` drops from 98MB to 17MB, keeping the 2026-06-09 evidence chain, the 2026-06-03 combined capture and streaming/websocket reps, and the 2026-05-25 transport-baseline regression fixture. Artifact paths cited by earlier changelog entries may refer to pruned files; current claims and their evidence live in `README.md`, `docs/benchmarks/native-h3-vs-rust-clients/README.md`, and `docs/specter-native-h3-remaining-seams.md`.
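To illustrate why the benchmark fixture entry above moved from post-send rescheduling to an absolute chunk cadence, here is a minimal sketch of the two scheduling styles. The function names and the fixed `send_cost` parameter are assumptions for illustration only; the real fixture code is not shown in these notes.

```rust
use std::time::Duration;

/// Absolute cadence: chunk `i` is due at `i * interval` after the stream
/// starts, no matter how long earlier sends took.
fn absolute_deadlines(interval: Duration, chunks: u32) -> Vec<Duration> {
    (0..chunks).map(|i| interval * i).collect()
}

/// Relative cadence: each chunk is scheduled `interval` after the previous
/// send finished, so per-send overhead accumulates as drift.
/// `send_cost` is a hypothetical constant standing in for send + scheduler
/// latency.
fn relative_deadlines(interval: Duration, send_cost: Duration, chunks: u32) -> Vec<Duration> {
    let mut out = Vec::with_capacity(chunks as usize);
    let mut now = Duration::ZERO;
    for _ in 0..chunks {
        out.push(now);
        // The next chunk waits for this send to finish, then a full interval.
        now += send_cost + interval;
    }
    out
}
```

With a 1ms interval and a 50us send cost, the fifth chunk is due at 4ms under the absolute scheme but at 4.2ms under the relative one. That accumulated drift is what a throughput measurement would otherwise charge to the client.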
Fixed
- Native HTTP/3 direct-GET epoch loop now defers only the send syscall of boundary ACKs past the current drain phase. The ACK packet is still built and sealed at its normal ack-eliciting-threshold boundary (packet number consumed, tracker marked, bytes and cadence identical on the wire); its dispatch moves from between the threshold drain-stop and the FIN-bearing remainder of the burst to the next flush, the pre-park drain, or the sample exit, whichever comes first. This removes an AEAD seal + sendmsg from the measured window between a response body's last datagram arriving and completion being observed, which the fixture-ledger gate charges to the client tail. A/B at n=100 x4 reps cut the GET worst-rep p95 ledger-paced tail from 14.5us to 10.6us with TTFB and throughput unchanged. Applies to the shipping io-epoch path; real clients see response bodies complete sooner by the same margin.
- Native HTTP/3 stream reassembly now buffers out-of-order STREAM segments (RFC 9000 Section 2.2) instead of discarding everything ahead of the contiguous edge. The old in-order-only guard meant one reordered datagram froze the stream: every later segment including the FIN was dropped while its packet number was still ACKed, so the peer never retransmitted and the native H3 GET hung until idle timeout under as little as 200us of path latency (netem repro: 0/8 completions before, 8/8 after), on both the shipping driver path and the bench epoch path. Masked at RTT~0 because the loopback fixture delivers in order. The fix tracks per-stream pending segments and final size, comp...
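The out-of-order buffering described in the stream reassembly entry can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the `Reassembler` type and its fields are hypothetical, and conflicting overlaps and flow-control limits are ignored.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-stream reassembly buffer (names are illustrative).
#[derive(Default)]
struct Reassembler {
    next_offset: u64,                // contiguous edge delivered so far
    pending: BTreeMap<u64, Vec<u8>>, // segments ahead of the edge, by offset
    final_size: Option<u64>,         // known once a FIN-bearing segment arrives
}

impl Reassembler {
    /// Accepts one STREAM segment and returns any bytes that became contiguous.
    fn push(&mut self, offset: u64, data: &[u8], fin: bool) -> Vec<u8> {
        let end = offset + data.len() as u64;
        if fin {
            self.final_size = Some(end);
        }
        if end > self.next_offset {
            // Trim any prefix that was already delivered, then buffer the rest.
            let skip = self.next_offset.saturating_sub(offset) as usize;
            let start = offset.max(self.next_offset);
            let slot = self.pending.entry(start).or_default();
            if data.len() - skip > slot.len() {
                *slot = data[skip..].to_vec();
            }
        }
        let mut out = Vec::new();
        // Drain every buffered segment that touches the contiguous edge.
        loop {
            let off = match self.pending.keys().next() {
                Some(&o) if o <= self.next_offset => o,
                _ => break,
            };
            let seg = self.pending.remove(&off).unwrap();
            let skip = (self.next_offset - off) as usize;
            if skip < seg.len() {
                out.extend_from_slice(&seg[skip..]);
                self.next_offset = off + seg.len() as u64;
            }
        }
        out
    }

    /// True once every byte up to the final size has been delivered.
    fn is_complete(&self) -> bool {
        self.final_size == Some(self.next_offset)
    }
}
```

The point of the sketch is that a segment arriving ahead of the contiguous edge, including one carrying the FIN, is kept until the gap fills. A stream therefore still completes when datagrams are reordered.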
v4.2.2
Changed
- Republished the crate, npm, and PyPI packages so the bundled README cites the reachable benchmark-provenance commit `25395a8`. The 4.2.1 packages shipped a README citing `26d5a78`, a pre-cherry-pick commit unreachable from `main`; this release corrects that citation in the published artifacts. Documentation-only release with no library, API, or benchmarked behavior change from 4.2.1.
v4.2.1
Changed
- HTTP/2 warm streaming path now borrows the cookie helpers, dropping a per-request allocation on the cookie-bearing path (commit `26d5a78`). A/B measurement on a quiet host showed this neutral on TTFB and throughput; it is allocation hygiene with no measured latency or throughput effect.
- Renamed the streaming benchmark metric from `ttft` to `ttfb` (time-to-first-byte) across the benchmark harness, threshold test, JSON artifact schema keys, and binding keywords. The transport benchmark measures time-to-first-byte; the LLM-facing binding descriptions keep "TTFT" (time-to-first-token) because lower transport time-to-first-byte is what produces the faster first token when proxying a model's response stream.
- Re-baselined all published streaming, native-HTTP/3, and WebSocket benchmark numbers (README and `docs/benchmarks/`) to a single quiet AWS Graviton4 host so every competitor comparison is measured on the same machine, replacing the prior mix of Mac-sourced runs. Library request/response behavior is unchanged from 4.2.0; the figures move with the environment. Streaming clears every gate by wide margins with paired Wilcoxon p underflowing to zero at n=100 (H2 request-body TTFB and throughput rose versus the prior Mac figures; H2 response-body TTFB and H1 response-body throughput fell). The native HTTP/3 superiority gate passes against quiche, tokio-quiche, h3-quinn, and reqwest_h3. WebSocket loopback message-rate is at parity with fastwebsockets and tokio-tungstenite inside run-to-run variance.
Removed
- Withdrew the RFC 9220 WebSocket-over-H3 tunnel superiority claim from the README. On the Graviton4 host the full-suite gate does not pass: Specter leads p50 TTFB and throughput on all three tunnel workloads and wins the slow-consumer mixed workload outright, while tokio-quiche holds a lower p95 tail on the echo and client-DATA+FIN workloads. The measured result is recorded in `docs/benchmarks/native-h3-vs-rust-clients/README.md`.
v4.2.0
Added
- gRPC plumbing primitive behind the `grpc` Cargo feature: HTTP/2 response trailers surfaced through `Response::trailers()` (three-state contract: present, clean-end-absent, stream-reset error), a length-prefixed message framing codec (`GrpcFramer`) with zero-copy on the contained-message path and per-message gzip support, and a fingerprint-safe `grpc_request` constructor that emits `POST /pkg.Service/Method` with `content-type: application/grpc+proto` and `te: trailers` without altering the wire fingerprint.
- gRPC surface exposed in the Node and Python bindings (`grpc_request` builder and async `trailers()` accessor), with both binding crates enabling the `grpc` feature.
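As background for the length-prefixed framing mentioned above, the standard gRPC wire format puts a 5-byte prefix before each message: a 1-byte compressed flag followed by a 4-byte big-endian length. The sketch below shows that layout only. The `encode_frame` and `decode_frame` functions are hypothetical and do not reflect the actual `GrpcFramer` API, and gzip handling is omitted.

```rust
/// Encodes one gRPC length-prefixed message:
/// [compressed flag: 1 byte][length: 4 bytes, big-endian][message bytes].
fn encode_frame(message: &[u8], compressed: bool) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + message.len());
    out.push(compressed as u8);
    out.extend_from_slice(&(message.len() as u32).to_be_bytes());
    out.extend_from_slice(message);
    out
}

/// Decodes one frame from the front of `buf`.
/// Returns (compressed, message, rest) when a whole frame is present, or
/// None when more bytes are needed. The message is a borrowed slice of
/// `buf`, so a frame fully contained in the buffer is not copied.
fn decode_frame(buf: &[u8]) -> Option<(bool, &[u8], &[u8])> {
    if buf.len() < 5 {
        return None;
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    let end = 5usize.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((buf[0] == 1, &buf[5..end], &buf[end..]))
}
```

Returning a borrowed slice for a fully buffered message is one way to read the "zero-copy on the contained-message path" wording. A message that spans several reads would still need to be assembled somewhere.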
Changed
- HTTP/2 trailer delivery uses a per-stream side channel allocated only when a request opts into trailers (`te: trailers`), keeping the existing response-streaming hot path byte-identical and allocation-neutral when gRPC is unused.
Fixed
- Moved Python `SyncClient` onto native PyO3 classes with a client-owned Tokio runtime and GIL release around blocking sends while keeping the pure-Python compatibility wrapper for older local extensions.
v4.1.10
Fixed
- Moved Python `SyncClient` onto native PyO3 classes with a client-owned Tokio runtime and GIL release around blocking sends while keeping the pure-Python compatibility wrapper for older local extensions.
v4.1.9
Added
- Added Python `SyncClient` / `SyncRequestBuilder` / `SyncResponse` wrappers for no-event-loop HTTP calls, plus an explicit `AsyncClient` alias for the existing async client.
Fixed
- Corrected Python docs and type stubs so sync HTTP is the default PyPI quickstart and async response helper annotations match runtime behavior.
v4.1.8
Fixed
- Added HTTP/2 padded-DATA zstd exact-body regression coverage and shipped a synchronized crates.io, npm, PyPI, and GitHub release after 4.1.7 only reached crates.io.
- Restored current-toolchain compilation in the TLS handshake poll path.
v4.1.6
Fixed
- Restored the non-streaming `send()` contract by buffering transport streaming bodies before returning, so `Response::text()`, `bytes()`, and `json()` work for normal H1/H2/H3 requests.
- Made default request futures spawnable on multithreaded Tokio runtimes by avoiding non-`Sync` request/response borrows across awaits.