Transparent Avro binary transport layer for JavaScript. Drop-in fetch and WebSocket replacement that serializes JSON as compact Avro binary on the wire — ~50% smaller payloads with zero DX friction.
- Transparent Fetch Wrapper — pass plain objects, get plain objects back. Binary encoding happens under the hood.
- Automatic Schema Inference — no upfront schema definitions needed. Inferred, hashed, and cached automatically.
- 406 Schema Negotiation — if the server loses its schema cache, the client retries with the full schema. Invisible to the caller.
- WebSocket Support — binary framing with message-type multiplexing over a single socket; optional auto-reconnect with exponential backoff.
- Streaming Decoder — decode 100K+ record responses via `for await` without loading everything into RAM.
- Offline Queue — PWA-ready IndexedDB queue that flushes binary payloads when connectivity returns.
- Debug Mode — human-readable console output with exact byte-savings metrics.
- Metrics Hook — `onMetrics` callback for telemetry pipelines, fires on every encode/decode regardless of debug flag.
- CLI Tool (`avro-gen`) — pre-compile TypeScript interfaces into a schema manifest at build time.
npm install avrostream-js
import { AvroClient } from 'avrostream-js';
const client = new AvroClient({
endpoint: 'https://api.example.com',
debug: true,
autoInfer: true,
});
// Binary request — looks identical to using fetch with JSON
const user = await client.fetch('/users', {
method: 'POST',
body: { name: 'Alice', role: 'Admin' },
});
console.log(user); // { name: 'Alice', role: 'Admin' }
const stream = await client.streamFetch('/large-dataset');
for await (const record of stream) {
process(record); // Memory stays flat — records decoded one by one
}
const socket = client.connectSocket('wss://api.example.com');
socket.send('UpdateLocation', { lat: 34.05, lon: -118.24 });
socket.on('NewMessage', (msg) => {
console.log(msg.text);
});
Auto-reconnect with exponential backoff:
const socket = client.connectSocket('wss://api.example.com', {
reconnect: true,
reconnectOptions: {
maxAttempts: 10, // -1 for infinite
initialDelayMs: 500,
maxDelayMs: 30_000,
jitter: true,
},
});
A runnable demo is included under demo/ to exercise the full API surface — HTTP, WebSocket, schema negotiation, metrics, and reconnection — as a developer would using only this README.
Start the server:
node demo/server.mjs
Run the Node.js client:
node demo/client.mjs
Or open the web UI at http://localhost:3000 for an interactive browser demo.
| Section | What it exercises |
|---|---|
| 1. Pre-compiled schemas | AvroClient with schemas map, debug: true, POST/GET with automatic Avro encoding |
| 2. onMetrics telemetry | onMetrics callback collecting byte-savings per request |
| 3. 406 schema negotiation | autoInfer: true, server responds 406, client retries with full schema inline |
| 4. WebSocket chat | connectSocket(), socket.send(), socket.on('message') with binary framing |
| 5. Reconnect config | reconnect: true with reconnectOptions (backoff, jitter, max attempts) |
| 6. Registry introspection | client.registry.size |
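As background for the reconnect configuration (section 5), the sketch below shows one way a delay schedule could be derived from `maxAttempts`, `initialDelayMs`, `maxDelayMs`, and `jitter`. The helper names `backoffDelay` and `shouldRetry` are hypothetical, and the doubling factor and jitter strategy are assumptions rather than confirmed library behavior.

```javascript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
// Doubling per attempt and "full jitter" are assumptions for illustration.
function backoffDelay(attempt, opts, random = Math.random) {
  const { initialDelayMs = 500, maxDelayMs = 30_000, jitter = true } = opts;
  // Exponential growth, capped at maxDelayMs.
  const capped = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  // With jitter, pick a random point in [0, capped) so clients do not reconnect in lockstep.
  return jitter ? Math.floor(random() * capped) : capped;
}

// -1 means "retry forever", matching the comment in the reconnect snippet.
function shouldRetry(attempt, maxAttempts) {
  return maxAttempts === -1 || attempt < maxAttempts;
}

const opts = { initialDelayMs: 500, maxDelayMs: 30_000, jitter: false };
const schedule = [];
for (let attempt = 0; shouldRetry(attempt, 10); attempt++) {
  schedule.push(backoffDelay(attempt, opts));
}
console.log(schedule); // delays in milliseconds, one per attempt
```

Jitter is disabled in the demo only to make the schedule deterministic; passing a custom `random` function serves the same purpose in tests.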
[AvroStream] >>> REQUEST /api/users (User)
Payload: { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' }
Size: 32 bytes (Avro) vs 67 bytes (JSON) — saved 35 bytes (52.2%)
[AvroStream] >>> REQUEST /api/orders (Order)
Size: 27 bytes (Avro) vs 92 bytes (JSON) — saved 65 bytes (70.7%)
[AvroStream] >>> REQUEST ws://ChatMessage (ChatMessage)
Size: 41 bytes (Avro) vs 80 bytes (JSON) — saved 39 bytes (48.8%)
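To see where the savings in the first log entry come from, the sketch below hand-encodes the same payload using the Avro binary rules for longs (zigzag varint) and strings (length prefix plus UTF-8). It assumes the User schema is a record of `id` (long) followed by three string fields, which is consistent with the 32 bytes logged, but that schema is not shown in this README. The helper names are hypothetical and this is not the library's encoder.

```javascript
// Zigzag + variable-length encoding, as the Avro spec defines for int/long.
// This sketch only handles values that fit in 32 bits.
function encodeVarint(n) {
  let v = ((n << 1) ^ (n >> 31)) >>> 0; // zigzag maps small negatives to small positives
  const out = [];
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80); // low 7 bits with the continuation bit set
    v >>>= 7;
  }
  out.push(v);
  return out;
}

// Avro strings are a varint byte length followed by the UTF-8 bytes.
function encodeString(s) {
  const bytes = new TextEncoder().encode(s);
  return [...encodeVarint(bytes.length), ...bytes];
}

// Assumed field order and types: id (long), name, email, role (strings).
// Field names never appear on the wire, which is where most of the savings come from.
function encodeUser(u) {
  return Uint8Array.from([
    ...encodeVarint(u.id),
    ...encodeString(u.name),
    ...encodeString(u.email),
    ...encodeString(u.role),
  ]);
}

const user = { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' };
const avroBytes = encodeUser(user).length;
const jsonBytes = new TextEncoder().encode(JSON.stringify(user)).length;
const saved = jsonBytes - avroBytes;
const percent = ((saved / jsonBytes) * 100).toFixed(1);
console.log(`${avroBytes} bytes (Avro) vs ${jsonBytes} bytes (JSON), saved ${saved} bytes (${percent}%)`);
// prints "32 bytes (Avro) vs 67 bytes (JSON), saved 35 bytes (52.2%)"
```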
The demo server (demo/server.mjs) shows how to integrate AvroStream on the server side with Express:
import { SchemaRegistry, encode, decode, frameForWire, parseWireFrame } from 'avrostream-js';
const registry = new SchemaRegistry();
registry.register(UserSchema, '/api/users');
// Middleware: decode Avro requests, encode Avro responses
app.post('/api/users', avroMiddleware('/api/users'), (req, res) => {
const body = req.avroBody || req.body; // Avro-decoded or JSON fallback
// ... handle request ...
res.avro(responseData); // Sends Avro if client accepts it, JSON otherwise
});
The server handles both standard frames (0x01) and schema-inline frames (0x02 from 406 retries), and responds with X-Avro-Missing-Schema: true to trigger client-side schema negotiation.
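The negotiation flow can be pictured with a small in-memory simulation. Everything here is a hypothetical stand-in: `makeServer` and `sendWithNegotiation` are not library functions, requests are plain objects rather than binary frames, and pairing the 406 status with the `X-Avro-Missing-Schema` header in a single response is an assumption drawn from the description above.

```javascript
// Hypothetical in-memory server: it only knows schemas it has been sent inline.
function makeServer() {
  const knownSchemas = new Map();
  return async function handle(request) {
    // A version 0x02 request carries the full schema, so the server can cache it.
    if (request.version === 0x02) knownSchemas.set(request.fingerprint, request.schemaJson);
    if (!knownSchemas.has(request.fingerprint)) {
      return { status: 406, headers: { 'x-avro-missing-schema': 'true' } };
    }
    return { status: 200, headers: {}, body: request.body };
  };
}

// Try the compact form first; fall back to schema-inline only when asked to.
async function sendWithNegotiation(send, { fingerprint, schemaJson, body }) {
  const first = await send({ version: 0x01, fingerprint, body });
  const schemaMissing =
    first.status === 406 && first.headers['x-avro-missing-schema'] === 'true';
  if (!schemaMissing) return { response: first, attempts: 1 };
  const second = await send({ version: 0x02, fingerprint, schemaJson, body });
  return { response: second, attempts: 2 };
}

async function demo() {
  const send = makeServer();
  const message = { fingerprint: 'abc123', schemaJson: '{"type":"string"}', body: 'hello' };
  const cold = await sendWithNegotiation(send, message); // server cache is empty
  const warm = await sendWithNegotiation(send, message); // schema is now cached
  return [cold.attempts, warm.attempts];
}

demo().then((attempts) => console.log(attempts)); // [ 2, 1 ]
```

The caller only ever sees the final response, which is what keeps the retry invisible.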
npx avro-gen --input src/types --output avro-manifest.json
import manifest from './avro-manifest.json';
const client = new AvroClient({
endpoint: 'https://api.example.com',
schemas: manifest,
});

| Option | Type | Default | Description |
|---|---|---|---|
| endpoint | string | — | Base URL for HTTP requests |
| debug | boolean | false | Log decoded payloads and byte-savings to console |
| autoInfer | boolean | true | Generate schemas from objects when not registered |
| offline | boolean | false | Queue requests in IndexedDB when offline |
| schemas | object | — | Pre-compiled schema manifest |
| fetch | function | — | Custom fetch implementation (for tests/polyfills) |
| inference | object | { maxDepth: 32, maxNodes: 50000 } | Runtime inference guardrails for large payloads |
| networkListener | NetworkListener | env default | Inject custom online/offline detection strategy |
| onMetrics | (m: DebugMetrics) => void | — | Telemetry callback — fires on every encode/decode regardless of debug flag |
| registryMaxSize | number | 0 (unlimited) | Max schemas in registry before LRU eviction kicks in |
- For large/deep payloads, prefer precompiled schemas via `avro-gen` to bypass synchronous runtime inference.
- In long-lived server processes with dynamic schemas, set `registryMaxSize` to bound memory growth (e.g. `1000`).
- Browser and Node.js connectivity checks are abstracted behind `NetworkListener`; inject your own strategy for custom runtimes.
- Schema fingerprints use the Avro Parsing Canonical Form, ensuring consistent fingerprints regardless of JS object key order and compatibility with other Avro implementations.
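The key-order independence mentioned in the last note can be illustrated with a simplified canonicalizer and the 64-bit Rabin fingerprint (CRC-64-AVRO) from the Avro specification. This is a sketch under assumptions: it skips namespace resolution and primitive-object collapsing, and the README does not confirm that the library's "CRC-64" is this exact variant. `canonical` and `fingerprint64` are illustrative names.

```javascript
// Attribute order required by the Avro Parsing Canonical Form.
const ORDER = ['name', 'type', 'fields', 'symbols', 'items', 'values', 'size'];

// Simplified canonicalizer: keeps canonical attributes only, in canonical order.
// It does not resolve namespaces into full names, which the full spec requires.
function canonical(schema) {
  if (typeof schema === 'string') return JSON.stringify(schema);
  if (Array.isArray(schema)) return `[${schema.map(canonical).join(',')}]`;
  const parts = [];
  for (const key of ORDER) {
    if (!(key in schema)) continue;
    const v = schema[key];
    let text;
    if (key === 'fields') text = `[${v.map(canonical).join(',')}]`;
    else if (key === 'symbols' || key === 'name') text = JSON.stringify(v);
    else if (key === 'size') text = String(v);
    else text = canonical(v); // type, items, values may be nested schemas
    parts.push(`"${key}":${text}`);
  }
  return `{${parts.join(',')}}`;
}

// CRC-64-AVRO, following the reference algorithm in the Avro spec.
const EMPTY = 0xc15d213aa4d7a795n;
const TABLE = Array.from({ length: 256 }, (_, i) => {
  let fp = BigInt(i);
  for (let j = 0; j < 8; j++) fp = (fp >> 1n) ^ (EMPTY & -(fp & 1n));
  return fp;
});

function fingerprint64(text) {
  let fp = EMPTY;
  for (const byte of new TextEncoder().encode(text)) {
    fp = (fp >> 8n) ^ TABLE[Number((fp ^ BigInt(byte)) & 0xffn)];
  }
  return fp;
}

// Same schema, different key order, plus a non-canonical "doc" attribute.
const a = { type: 'record', name: 'User', fields: [{ name: 'id', type: 'long' }] };
const b = { fields: [{ type: 'long', name: 'id', doc: 'primary key' }], name: 'User', type: 'record' };

console.log(canonical(a));
console.log(fingerprint64(canonical(a)) === fingerprint64(canonical(b))); // true
```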
Measures inferSchema(), fingerprint(), avsc.Type.forSchema(), and registry round-trip latency across flat and nested object shapes:
npm run bench:schema
Release-grade profile (more rounds):
npm run bench:schema:release
Controls:
- `ROUNDS` measured rounds (default `10`)
- `WARMUP_ROUNDS` warmup rounds (default `5`)
- `OUTPUT_DIR` artifacts path (default `benchmark-results/schema/latest`)
Artifacts:
- benchmark-results/schema/latest/latest.log
- benchmark-results/schema/latest/latest.json
- benchmark-results/schema/latest/latest.csv
- benchmark-results/schema/latest/latest.md
Measures the overhead of internal safety/correctness features so the tradeoffs are quantified:
npm run bench:internals
Release-grade profile:
npm run bench:internals:release
What it measures:
- `encode()` circular-ref check: overhead of `assertNoCircularRefs` vs `skipCircularCheck=true`
- LRU registry bookkeeping: `getByFingerprint` with `maxSize` set vs unlimited
- Canonical fingerprint: Avro Parsing Canonical Form vs raw `JSON.stringify`
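For a sense of what a circular-reference check costs, here is a minimal sketch of one common approach: a depth-first walk that tracks the current ancestor chain. `assertNoCycles` is a hypothetical stand-in and is not claimed to match the library's `assertNoCircularRefs`.

```javascript
// Hypothetical stand-in for a circular-reference guard.
// Tracks only the current path, so an object referenced twice (but not
// from inside itself) is still accepted.
function assertNoCycles(value, ancestors = new Set()) {
  if (value === null || typeof value !== 'object') return;
  if (ancestors.has(value)) throw new TypeError('Circular reference detected');
  ancestors.add(value);
  for (const child of Object.values(value)) assertNoCycles(child, ancestors);
  ancestors.delete(value); // leaving this branch, so siblings may reuse the object
}

const shared = { n: 1 };
assertNoCycles({ a: shared, b: [shared] }); // passes: shared, not circular

const loop = { name: 'x' };
loop.self = loop;
let rejected = false;
try {
  assertNoCycles(loop);
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected); // true
```

The walk visits every node once per encode, which is why skipping it on trusted input can be worthwhile.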
Artifacts:
- benchmark-results/internals/latest/latest.log
- benchmark-results/internals/latest/latest.json
- benchmark-results/internals/latest/latest.csv
- benchmark-results/internals/latest/latest.md
Latest baseline:
| Feature | Overhead | Context |
|---|---|---|
| Circular-ref check | +18.4% encode | Use encode(entry, obj, true) to skip on trusted input |
| LRU bookkeeping | +35.7% lookup | Still 1.98M ops/s — negligible in practice |
| Canonical form | +49.6% fingerprint | One-time cost at register(), not on hot encode/decode path |
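The LRU bookkeeping cost in the table comes from reordering entries on every lookup. A minimal sketch of that idea, using the insertion order of a `Map`, is shown below. `LruRegistry` is a hypothetical class written for illustration; only the names `getByFingerprint` and the `0` = unlimited convention are taken from this README.

```javascript
// Hypothetical LRU registry keyed by fingerprint.
// A Map iterates in insertion order, so the first key is the least recently used.
class LruRegistry {
  constructor(maxSize = 0) {
    this.maxSize = maxSize; // 0 means unlimited, as in registryMaxSize
    this.entries = new Map();
  }

  register(fingerprint, schema) {
    this.entries.delete(fingerprint); // re-inserting moves the key to the newest position
    this.entries.set(fingerprint, schema);
    if (this.maxSize > 0 && this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  getByFingerprint(fingerprint) {
    if (!this.entries.has(fingerprint)) return undefined;
    const schema = this.entries.get(fingerprint);
    if (this.maxSize > 0) {
      // The bookkeeping step: mark this entry as most recently used.
      this.entries.delete(fingerprint);
      this.entries.set(fingerprint, schema);
    }
    return schema;
  }

  get size() {
    return this.entries.size;
  }
}

const registry = new LruRegistry(2);
registry.register('fp-a', { name: 'A' });
registry.register('fp-b', { name: 'B' });
registry.getByFingerprint('fp-a');        // touch A so that B becomes the oldest
registry.register('fp-c', { name: 'C' }); // evicts B
console.log([...registry.entries.keys()]); // [ 'fp-a', 'fp-c' ]
```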
Use the stream decoder micro-benchmark to compare parsing strategies:
npm run bench:stream
Optional tuning variables:
- `RECORD_COUNT` (default `100000`)
- `PAYLOAD_BYTES` (default `128`)
- `CHUNK_MIN` / `CHUNK_MAX` (defaults `256` / `4096`)
- `ITERATIONS` (default `7`)
Run a deterministic, multi-scenario benchmark comparing Avro encode/decode against JSON stringify/parse:
npm run bench:avro-vs-json
For tighter memory stability (manual GC between scenarios):
npm run bench:avro-vs-json:gc
For release-grade validation (larger scenarios + more rounds):
npm run bench:avro-vs-json:release
Benchmark controls:
- `SCENARIOS` CSV record counts (default `5000,20000,50000`)
- `WARMUP_ROUNDS` (default `3`)
- `ROUNDS` measured rounds (default `8`)
Methodology notes:
- Uses a fixed schema and deterministic record generator (no random per-run shape drift).
- Reports median, p95, p99, stddev, throughput, and payload size reduction.
- Validates Avro and JSON roundtrip correctness on sampled records before reporting.
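For readers unfamiliar with the reported statistics, the sketch below computes median, p95, p99, and standard deviation from a list of timings. The nearest-rank percentile method and the population standard deviation are assumptions; the benchmark scripts may use different definitions.

```javascript
// Nearest-rank percentile over an already sorted array (an assumed method).
function percentile(sorted, p) {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samplesMs) {
  const sorted = [...samplesMs].sort((x, y) => x - y);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  // Population variance: divide by n rather than n - 1.
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  const median = n % 2 === 1
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  return {
    median,
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    stddev: Math.sqrt(variance),
  };
}

// Illustrative timings only, not benchmark results.
const stats = summarize([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(stats); // { median: 4.5, p95: 9, p99: 9, stddev: 2 }
```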
Latest full benchmark artifacts (generated by npm run bench:avro-vs-json:gc):
- Full console report: benchmark-results/avro-vs-json/latest/latest.log
- Machine-readable JSON: benchmark-results/avro-vs-json/latest/latest.json
- Spreadsheet-friendly CSV: benchmark-results/avro-vs-json/latest/latest.csv
- Markdown summary report: benchmark-results/avro-vs-json/latest/latest.md
Release-profile artifacts (generated by npm run bench:avro-vs-json:release):
- Full console report: benchmark-results/avro-vs-json/release/latest.log
- Machine-readable JSON: benchmark-results/avro-vs-json/release/latest.json
- Spreadsheet-friendly CSV: benchmark-results/avro-vs-json/release/latest.csv
- Markdown summary report: benchmark-results/avro-vs-json/release/latest.md
Current baseline summary:
| Records | Encode (Avro faster) | Decode (Avro faster) | Size Reduction |
|---|---|---|---|
| 5,000 | 22.96% | 19.17% | 59.54% |
| 20,000 | 24.82% | 28.48% | 59.54% |
| 50,000 | 58.47% | 17.77% | 59.54% |
These values are from the latest recorded run in this repository and are hardware/runtime dependent.
Pure codec: Avro encode is consistently faster than JSON.stringify (up to 2.3x at 100k records) and produces ~60% smaller payloads. Decode trades blows with JSON.parse depending on batch size and GC pressure.
E2E (HTTP/WS): Under release-grade load, Avro wins on both throughput and latency even on localhost — +54% WS throughput, +36% HTTP throughput, +15-27% faster median latency. Combined with ~50% smaller payloads, Avro dominates in high-concurrency scenarios.
When Avro wins:
- Bandwidth-constrained paths (mobile, edge, inter-region, metered connections).
- High-volume pipelines where ~50% smaller payloads compound into significant savings.
- Structured, repetitive records with high key-name overhead in JSON.
- Bulk/streaming workloads where codec speed dominates over per-request overhead.
When JSON is the better choice:
- Localhost or same-datacenter with sub-millisecond RTT where CPU overhead exceeds transfer savings.
- Systems requiring human-readable payloads in logs without tooling.
- Prototyping or low-volume APIs where schema management isn't worth it.
How to read benchmark outputs:
- `Throughput delta` > 0 means Avro handles more requests/messages per second.
- `Median latency delta` > 0 means Avro is faster; < 0 means Avro is slower (expected on localhost).
- `Payload bytes delta` shows bandwidth savings — Avro's primary value proposition.
- Use release profiles for publish decisions, not quick smoke runs.
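The sign conventions above can be made concrete with a small calculation. The formulas are an assumption inferred from those conventions (and from the negative payload deltas in the result tables), and the input numbers are made up for illustration.

```javascript
// Round a ratio to a percentage with two decimals.
const pct = (ratio) => Number((ratio * 100).toFixed(2));

// Assumed formulas, chosen so the signs match the reading guide:
// throughput: positive when Avro is higher; latency: positive when Avro is lower;
// payload bytes: negative when Avro is smaller.
function deltas(json, avro) {
  return {
    throughputDelta: pct((avro.reqPerSec - json.reqPerSec) / json.reqPerSec),
    medianLatencyDelta: pct((json.medianMs - avro.medianMs) / json.medianMs),
    payloadBytesDelta: pct((avro.bytes - json.bytes) / json.bytes),
  };
}

// Hypothetical measurements, not taken from any benchmark run.
const result = deltas(
  { reqPerSec: 1000, medianMs: 4, bytes: 200 },
  { reqPerSec: 1500, medianMs: 3, bytes: 100 },
);
console.log(result); // { throughputDelta: 50, medianLatencyDelta: 25, payloadBytesDelta: -50 }
```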
Consolidated dashboard:
Generate/update dashboard:
npm run bench:dashboard
This benchmark runs a real local HTTP server and real client requests (web-style interaction) to compare JSON vs Avro end-to-end behavior, including serialization, transport framing, parsing, and latency.
npm run bench:e2e:web
Release-grade profile:
npm run bench:e2e:web:release
Controls:
- `REQUESTS` requests per mode (default `5000`)
- `WARMUP` warmup requests per mode (default `300`)
- `CONCURRENCY` concurrent in-flight requests (default `32`)
- `HOST` / `PORT` (defaults `127.0.0.1` / `43110`)
- `OUTPUT_DIR` artifacts path (default `benchmark-results/e2e-web/latest`)
Artifacts:
- benchmark-results/e2e-web/latest/latest.log
- benchmark-results/e2e-web/latest/latest.json
- benchmark-results/e2e-web/latest/latest.csv
- benchmark-results/e2e-web/latest/latest.md
This benchmark uses a real local WebSocket server and compares JSON string messages vs Avro-framed messages using AvroSocket.
npm run bench:e2e:ws
Release-grade profile:
npm run bench:e2e:ws:release
Controls:
- `REQUESTS` messages per mode (default `6000`)
- `WARMUP` warmup messages per mode (default `600`)
- `CONCURRENCY` in-flight messages (default `64`)
- `HOST` / `PORT` (defaults `127.0.0.1` / `43120`)
- `OUTPUT_DIR` artifacts path (default `benchmark-results/e2e-ws/latest`)
Artifacts:
- benchmark-results/e2e-ws/latest/latest.log
- benchmark-results/e2e-ws/latest/latest.json
- benchmark-results/e2e-ws/latest/latest.csv
- benchmark-results/e2e-ws/latest/latest.md
Release artifacts:
- benchmark-results/e2e-ws/release/latest.log
- benchmark-results/e2e-ws/release/latest.json
- benchmark-results/e2e-ws/release/latest.csv
- benchmark-results/e2e-ws/release/latest.md
Release baseline (REQUESTS=20000, WARMUP=2000, CONCURRENCY=128):
| Metric | Result |
|---|---|
| Throughput delta (Avro vs JSON) | +54.41% |
| Median latency delta (Avro vs JSON) | +15.68% |
| Request payload bytes delta | -52.21% |
| Response payload bytes delta | -39.74% |
This profile measures Node service-to-service HTTP interaction (JSON vs Avro) using the same real request path but with higher default throughput settings.
npm run bench:s2s
Release-grade profile:
npm run bench:s2s:release
Artifacts:
- benchmark-results/s2s/latest/latest.log
- benchmark-results/s2s/latest/latest.json
- benchmark-results/s2s/latest/latest.csv
- benchmark-results/s2s/latest/latest.md
Release artifacts:
- benchmark-results/s2s/release/latest.log
- benchmark-results/s2s/release/latest.json
- benchmark-results/s2s/release/latest.csv
- benchmark-results/s2s/release/latest.md
Release baseline (REQUESTS=12000, WARMUP=1200, CONCURRENCY=96):
| Metric | Result |
|---|---|
| Throughput delta (Avro vs JSON) | +15.23% |
| Median latency delta (Avro vs JSON) | +6.79% |
| Request payload bytes delta | -49.03% |
| Response payload bytes delta | -56.59% |
All errors extend AvroStreamError:
import {
AvroCircularReferenceError,
SchemaValidationError,
SchemaNotFoundError,
SchemaNegotiationError,
CodecError,
InferenceError,
} from 'avrostream-js';
Every HTTP payload is framed as:
[1 byte: version (0x01)][8 bytes: CRC-64 schema fingerprint][N bytes: Avro binary data]
On a 406 schema-negotiation retry, the client sends the full schema inline:
[1 byte: version (0x02)][4 bytes: schema JSON length][schema JSON][8 bytes: fingerprint][data]
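The two HTTP layouts above can be sketched with plain typed arrays. The helpers `buildFrame`, `buildInlineFrame`, and `readFrame` are hypothetical and are not the library's `frameForWire` / `parseWireFrame`. The byte order of the 4-byte schema length is not stated here, so big-endian is assumed.

```javascript
// Standard frame: [0x01][8-byte fingerprint][Avro data].
function buildFrame(fingerprint, data) {
  const frame = new Uint8Array(1 + 8 + data.length);
  frame[0] = 0x01;
  frame.set(fingerprint, 1);
  frame.set(data, 9);
  return frame;
}

// Schema-inline frame: [0x02][4-byte schema length][schema JSON][fingerprint][data].
function buildInlineFrame(schemaJson, fingerprint, data) {
  const schemaBytes = new TextEncoder().encode(schemaJson);
  const frame = new Uint8Array(1 + 4 + schemaBytes.length + 8 + data.length);
  frame[0] = 0x02;
  new DataView(frame.buffer).setUint32(1, schemaBytes.length, false); // big-endian (assumed)
  frame.set(schemaBytes, 5);
  frame.set(fingerprint, 5 + schemaBytes.length);
  frame.set(data, 13 + schemaBytes.length);
  return frame;
}

function readFrame(frame) {
  const version = frame[0];
  if (version === 0x01) {
    return { version, fingerprint: frame.subarray(1, 9), data: frame.subarray(9) };
  }
  if (version === 0x02) {
    const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
    const len = view.getUint32(1, false);
    return {
      version,
      schemaJson: new TextDecoder().decode(frame.subarray(5, 5 + len)),
      fingerprint: frame.subarray(5 + len, 13 + len),
      data: frame.subarray(13 + len),
    };
  }
  throw new Error(`Unknown wire version: ${version}`);
}

const fp = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8]); // placeholder fingerprint bytes
const data = Uint8Array.from([0x0a, 0x44]);           // placeholder Avro payload
const standard = buildFrame(fp, data);
const inline = buildInlineFrame('{"type":"string"}', fp, data);
console.log(standard.length, inline.length); // 11 32
console.log(readFrame(inline).schemaJson);   // {"type":"string"}
```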
WebSocket frames add a message-type prefix:
[1 byte: version (0x01)][1 byte: type-length][N bytes: UTF-8 type string][8 bytes: fingerprint][data]
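A sketch of the WebSocket layout follows, showing how the type prefix lets one socket carry several message types. `buildSocketFrame` and `readSocketFrame` are hypothetical names, and the payload bytes are placeholders rather than real Avro output.

```javascript
// [0x01][1-byte type length][UTF-8 type][8-byte fingerprint][Avro data]
function buildSocketFrame(type, fingerprint, data) {
  const typeBytes = new TextEncoder().encode(type);
  if (typeBytes.length > 255) throw new RangeError('type must fit in a one-byte length');
  const frame = new Uint8Array(2 + typeBytes.length + 8 + data.length);
  frame[0] = 0x01;
  frame[1] = typeBytes.length;
  frame.set(typeBytes, 2);
  frame.set(fingerprint, 2 + typeBytes.length);
  frame.set(data, 10 + typeBytes.length);
  return frame;
}

function readSocketFrame(frame) {
  if (frame[0] !== 0x01) throw new Error(`Unknown wire version: ${frame[0]}`);
  const typeEnd = 2 + frame[1];
  return {
    type: new TextDecoder().decode(frame.subarray(2, typeEnd)),
    fingerprint: frame.subarray(typeEnd, typeEnd + 8),
    data: frame.subarray(typeEnd + 8),
  };
}

// Multiplexing: route each decoded frame to the handler registered for its type.
const received = [];
const handlers = new Map([
  ['UpdateLocation', (msg) => received.push(`location:${msg.data.length}`)],
  ['NewMessage', (msg) => received.push(`message:${msg.data.length}`)],
]);

const fingerprintBytes = new Uint8Array(8); // placeholder fingerprint
for (const frame of [
  buildSocketFrame('UpdateLocation', fingerprintBytes, Uint8Array.from([1, 2, 3])),
  buildSocketFrame('NewMessage', fingerprintBytes, Uint8Array.from([4])),
]) {
  const msg = readSocketFrame(frame);
  handlers.get(msg.type)?.(msg);
}
console.log(received); // [ 'location:3', 'message:1' ]
```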
Streaming responses use a header + chunked record format:
Header: [1 byte: version (0x01)][8 bytes: fingerprint]
Records: [4 bytes: record length (big-endian)][N bytes: Avro data] ... repeating
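Because network chunks rarely align with record boundaries, a streaming reader has to carry leftover bytes from one chunk to the next. The sketch below shows one way to do that for the header and record layout above. `splitRecords` and `buildStream` are hypothetical helpers that yield raw record bytes only; Avro decoding of each record is left out.

```javascript
// Yields one Uint8Array per record from an (async) iterable of byte chunks.
async function* splitRecords(chunks) {
  let buffer = new Uint8Array(0);
  let headerRead = false;
  for await (const chunk of chunks) {
    // Append the new chunk to whatever was left over from the previous one.
    const merged = new Uint8Array(buffer.length + chunk.length);
    merged.set(buffer);
    merged.set(chunk, buffer.length);
    buffer = merged;

    let offset = 0;
    if (!headerRead) {
      if (buffer.length < 9) continue; // wait for version byte + fingerprint
      if (buffer[0] !== 0x01) throw new Error('Unknown stream version');
      headerRead = true;
      offset = 9;
    }
    while (buffer.length - offset >= 4) {
      const len = new DataView(buffer.buffer, buffer.byteOffset + offset, 4).getUint32(0, false);
      if (buffer.length - offset - 4 < len) break; // record not complete yet
      yield buffer.subarray(offset + 4, offset + 4 + len);
      offset += 4 + len;
    }
    buffer = buffer.subarray(offset); // keep only the unread tail
  }
  if (buffer.length > 0) throw new Error('Truncated stream');
}

// Test helper: header with a zeroed fingerprint, then length-prefixed payloads.
function buildStream(payloads) {
  const bytes = [0x01, ...new Array(8).fill(0)];
  for (const text of payloads) {
    const data = new TextEncoder().encode(text);
    bytes.push(0, 0, 0, data.length, ...data); // big-endian length, short payloads only
  }
  return Uint8Array.from(bytes);
}

async function demo() {
  const stream = buildStream(['a', 'bb', 'ccc']);
  const chunks = [];
  for (let i = 0; i < stream.length; i += 5) chunks.push(stream.subarray(i, i + 5));
  const out = [];
  for await (const record of splitRecords(chunks)) out.push(new TextDecoder().decode(record));
  return out;
}

demo().then((records) => console.log(records)); // [ 'a', 'bb', 'ccc' ]
```

Only the unread tail is retained between chunks, so memory use depends on the largest record rather than on the total response size.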
The leading version byte reserves space for future wire-format evolution without a breaking change. parseWireFrame rejects unknown versions and enforces that schema-inline frames (0x02) are only handled by the transport layer.
MIT