AvroStream JS

Transparent Avro binary transport layer for JavaScript. A drop-in replacement for fetch and WebSocket that serializes JSON as compact Avro binary on the wire, giving roughly 50% smaller payloads without changing how callers write requests.

Features

Transparent Fetch Wrapper — pass plain objects, get plain objects back. Binary encoding happens under the hood.
Automatic Schema Inference — no upfront schema definitions needed. Inferred, hashed, and cached automatically.
406 Schema Negotiation — if the server loses its schema cache, the client retries with the full schema. Invisible to the caller.
WebSocket Support — binary framing with message-type multiplexing over a single socket; optional auto-reconnect with exponential backoff.
Streaming Decoder — decode 100K+ record responses via for await without loading everything into RAM.
Offline Queue — PWA-ready IndexedDB queue that flushes binary payloads when connectivity returns.
Debug Mode — human-readable console output with exact byte-savings metrics.
Metrics Hook — onMetrics callback for telemetry pipelines, fires on every encode/decode regardless of debug flag.
CLI Tool (avro-gen) — pre-compile TypeScript interfaces into a schema manifest at build time.

Install

npm install avrostream-js

Quick Start

import { AvroClient } from 'avrostream-js';

const client = new AvroClient({
  endpoint: 'https://api.example.com',
  debug: true,
  autoInfer: true,
});

// Binary request — looks identical to using fetch with JSON
const user = await client.fetch('/users', {
  method: 'POST',
  body: { name: 'Alice', role: 'Admin' },
});

console.log(user); // { name: 'Alice', role: 'Admin' }
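
With autoInfer enabled, the client derives an Avro schema from the request body before encoding it. The library's actual inference rules are not spelled out in this README, so the following is only a rough sketch of the idea. The function name `inferSchemaSketch`, the generated record names, and the type mapping (integers as `long`, other numbers as `double`, first element decides array item type) are all assumptions, not the behavior of the library's `inferSchema()`.

```javascript
// Hypothetical sketch: map a plain JS value to an Avro schema.
// Not the library's inferSchema(); names and type choices are assumptions.
function inferSchemaSketch(value, name = 'Root', depth = 0, maxDepth = 32) {
  if (depth > maxDepth) throw new Error(`maxDepth ${maxDepth} exceeded`);
  if (value === null) return 'null';
  switch (typeof value) {
    case 'boolean': return 'boolean';
    case 'string': return 'string';
    case 'number': return Number.isInteger(value) ? 'long' : 'double';
    case 'object': break;
    default: throw new Error(`unsupported type: ${typeof value}`);
  }
  if (Array.isArray(value)) {
    // Assumes homogeneous arrays: the first element decides the item type.
    const items = value.length > 0
      ? inferSchemaSketch(value[0], `${name}Item`, depth + 1, maxDepth)
      : 'null';
    return { type: 'array', items };
  }
  return {
    type: 'record',
    name,
    fields: Object.keys(value).map((key) => ({
      name: key,
      type: inferSchemaSketch(value[key], `${name}_${key}`, depth + 1, maxDepth),
    })),
  };
}

const schema = inferSchemaSketch({ name: 'Alice', role: 'Admin' }, 'User');
console.log(JSON.stringify(schema));
// {"type":"record","name":"User","fields":[{"name":"name","type":"string"},{"name":"role","type":"string"}]}
```

The `maxDepth` parameter mirrors the idea behind the `inference` guardrails in the Configuration table: recursion stops with an error instead of walking an arbitrarily deep object.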

Streaming Large Datasets

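const stream = await client.streamFetch('/large-dataset');

for await (const record of stream) {
  // handleRecord is your own callback; `process` is a Node.js global object and cannot be called.
  handleRecord(record); // Memory stays flat: records are decoded one at a time
}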

WebSocket

const socket = client.connectSocket('wss://api.example.com');

socket.send('UpdateLocation', { lat: 34.05, lon: -118.24 });

socket.on('NewMessage', (msg) => {
  console.log(msg.text);
});

Auto-reconnect with exponential backoff:

const socket = client.connectSocket('wss://api.example.com', {
  reconnect: true,
  reconnectOptions: {
    maxAttempts: 10,      // -1 for infinite
    initialDelayMs: 500,
    maxDelayMs: 30_000,
    jitter: true,
  },
});
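
The README does not state the exact backoff curve behind these options. As a rough illustration, the sketch below assumes the delay doubles on every attempt, is capped at maxDelayMs, and uses "full jitter" (a uniform random delay below the cap) when jitter is on. The helper name `backoffDelayMs` is made up for this example and the library's real schedule may differ.

```javascript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
// Assumes doubling per attempt, capped at maxDelayMs; the library's actual
// backoff curve and jitter strategy may differ.
function backoffDelayMs(attempt, opts, random = Math.random) {
  const { initialDelayMs = 500, maxDelayMs = 30_000, jitter = true } = opts;
  const capped = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  // "Full jitter": pick a uniform delay in [0, capped) so that many clients
  // reconnecting at once do not retry in lockstep.
  return jitter ? Math.floor(random() * capped) : capped;
}

const opts = { initialDelayMs: 500, maxDelayMs: 30_000, jitter: false };
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(attempt, backoffDelayMs(attempt, opts));
}
// Without jitter the delays are 500, 1000, 2000, 4000, 8000, 16000, 30000, 30000
```

Passing `random` as a parameter keeps the helper deterministic in tests.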

Demo

A runnable demo under demo/ exercises the full API surface (HTTP, WebSocket, schema negotiation, metrics, and reconnection) the way a developer would when working only from this README.

Start the server:

node demo/server.mjs

Run the Node.js client:

node demo/client.mjs

Or open the web UI at http://localhost:3000 for an interactive browser demo.

What the demo covers

Section	What it exercises
1. Pre-compiled schemas	`AvroClient` with `schemas` map, `debug: true`, POST/GET with automatic Avro encoding
2. onMetrics telemetry	`onMetrics` callback collecting byte-savings per request
3. 406 schema negotiation	`autoInfer: true`, server responds 406, client retries with full schema inline
4. WebSocket chat	`connectSocket()`, `socket.send()`, `socket.on('message')` with binary framing
5. Reconnect config	`reconnect: true` with `reconnectOptions` (backoff, jitter, max attempts)
6. Registry introspection	`client.registry.size`

Sample output

[AvroStream] >>> REQUEST /api/users (User)
  Payload: { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' }
  Size: 32 bytes (Avro) vs 67 bytes (JSON) — saved 35 bytes (52.2%)

[AvroStream] >>> REQUEST /api/orders (Order)
  Size: 27 bytes (Avro) vs 92 bytes (JSON) — saved 65 bytes (70.7%)

[AvroStream] >>> REQUEST ws://ChatMessage (ChatMessage)
  Size: 41 bytes (Avro) vs 80 bytes (JSON) — saved 39 bytes (48.8%)
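
The savings figures in the log follow directly from the two sizes. The sketch below recomputes them; the helper name `byteSavings` is made up for this example, the Avro sizes are copied from the log rather than computed, and it assumes the JSON size is the UTF-8 byte length of `JSON.stringify(payload)`.

```javascript
// Recompute the "saved N bytes (P%)" figure from the two sizes.
function byteSavings(avroBytes, jsonBytes) {
  const saved = jsonBytes - avroBytes;
  const percent = (saved / jsonBytes) * 100;
  return { saved, percent: percent.toFixed(1) };
}

// Assumption: the JSON size is the UTF-8 byte length of the stringified payload.
const payload = { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' };
const jsonBytes = new TextEncoder().encode(JSON.stringify(payload)).length;

console.log(jsonBytes);                  // 67
console.log(byteSavings(32, jsonBytes)); // { saved: 35, percent: '52.2' }
console.log(byteSavings(27, 92));        // { saved: 65, percent: '70.7' }
```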

Server-side integration

The demo server (demo/server.mjs) shows how to integrate AvroStream on the server side with Express:

import express from 'express';
import { SchemaRegistry, encode, decode, frameForWire, parseWireFrame } from 'avrostream-js';

// UserSchema, avroMiddleware, and res.avro are assumed to be defined in demo/server.mjs (not shown here).
const app = express();
const registry = new SchemaRegistry();
registry.register(UserSchema, '/api/users');

// Middleware: decode Avro requests, encode Avro responses
app.post('/api/users', avroMiddleware('/api/users'), (req, res) => {
  const body = req.avroBody || req.body; // Avro-decoded or JSON fallback
  // ... handle request ...
  res.avro(responseData); // Sends Avro if client accepts it, JSON otherwise
});

The server handles both standard frames (0x01) and schema-inline frames (0x02 from 406 retries), and responds with X-Avro-Missing-Schema: true to trigger client-side schema negotiation.
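
From the client's point of view, the negotiation described above might look roughly like the sketch below. It is not the library's transport code: `sendWithNegotiation`, `buildStandardFrame`, and `buildInlineFrame` are hypothetical names, and treating either a 406 status or the X-Avro-Missing-Schema header as the retry signal is an assumption.

```javascript
// Hypothetical client-side flow; all function names here are illustrative,
// not exports of avrostream-js.
async function sendWithNegotiation(url, frames, fetchImpl) {
  const post = (body) => fetchImpl(url, { method: 'POST', body });

  // First attempt: fingerprint-only frame (version 0x01).
  let res = await post(frames.buildStandardFrame());

  // Assumption: the server signals an unknown fingerprint with status 406
  // and/or the X-Avro-Missing-Schema header.
  const missing = res.status === 406
    || res.headers.get('X-Avro-Missing-Schema') === 'true';

  // Retry exactly once with the full schema inline (version 0x02).
  if (missing) res = await post(frames.buildInlineFrame());

  if (!res.ok) throw new Error(`request failed with status ${res.status}`);
  return res;
}
```

Retrying only once keeps a misconfigured server from causing a retry loop; the caller still sees a single awaited call.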

Pre-compiled Schemas (CLI)

npx avro-gen --input src/types --output avro-manifest.json

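import { AvroClient } from 'avrostream-js';
// With a bundler this import works as written. In native Node.js ESM, JSON imports
// need an import attribute: import manifest from './avro-manifest.json' with { type: 'json' };
import manifest from './avro-manifest.json';

const client = new AvroClient({
  endpoint: 'https://api.example.com',
  schemas: manifest,
});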

Configuration

Option	Type	Default	Description
`endpoint`	`string`	—	Base URL for HTTP requests
`debug`	`boolean`	`false`	Log decoded payloads and byte-savings to console
`autoInfer`	`boolean`	`true`	Generate schemas from objects when not registered
`offline`	`boolean`	`false`	Queue requests in IndexedDB when offline
`schemas`	`object`	—	Pre-compiled schema manifest
`fetch`	`function`	—	Custom fetch implementation (for tests/polyfills)
`inference`	`object`	`{ maxDepth: 32, maxNodes: 50000 }`	Runtime inference guardrails for large payloads
`networkListener`	`NetworkListener`	env default	Inject custom online/offline detection strategy
`onMetrics`	`(m: DebugMetrics) => void`	—	Telemetry callback — fires on every encode/decode regardless of `debug` flag
`registryMaxSize`	`number`	`0` (unlimited)	Max schemas in registry before LRU eviction kicks in
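
To make the registryMaxSize option more concrete, here is a minimal sketch of LRU eviction built on a Map. It is an illustration only: the class name `LruRegistrySketch` and its methods are hypothetical and say nothing about how the library's SchemaRegistry is implemented internally.

```javascript
// Hypothetical sketch of a size-bounded registry keyed by fingerprint.
// A Map iterates in insertion order, so re-inserting on every hit keeps
// the least recently used entry at the front.
class LruRegistrySketch {
  constructor(maxSize = 0) {
    this.maxSize = maxSize; // 0 means unlimited, mirroring registryMaxSize
    this.entries = new Map();
  }

  get size() { return this.entries.size; }

  set(fingerprint, schema) {
    this.entries.delete(fingerprint);
    this.entries.set(fingerprint, schema);
    if (this.maxSize > 0 && this.entries.size > this.maxSize) {
      // Evict the oldest (least recently used) key.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  get(fingerprint) {
    if (!this.entries.has(fingerprint)) return undefined;
    const schema = this.entries.get(fingerprint);
    // Refresh recency: move the entry to the back of the Map.
    this.entries.delete(fingerprint);
    this.entries.set(fingerprint, schema);
    return schema;
  }
}

const registry = new LruRegistrySketch(2);
registry.set('a', { type: 'string' });
registry.set('b', { type: 'long' });
registry.get('a');                         // 'a' is now the most recently used
registry.set('c', { type: 'boolean' });    // evicts 'b'
console.log([...registry.entries.keys()]); // [ 'a', 'c' ]
```

The extra delete and set on every lookup is the kind of bookkeeping cost that the Internals Micro-Benchmark quantifies.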

Production Guidance

For large/deep payloads, prefer precompiled schemas via avro-gen to bypass synchronous runtime inference.
In long-lived server processes with dynamic schemas, set registryMaxSize to bound memory growth (e.g. 1000).
Browser and Node.js connectivity checks are abstracted behind NetworkListener; inject your own strategy for custom runtimes.
Schema fingerprints use the Avro Parsing Canonical Form, ensuring consistent fingerprints regardless of JS object key order and compatibility with other Avro implementations.
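
Why canonicalization makes fingerprints independent of key order can be shown with a small sketch. It assumes the 8-byte fingerprint is the CRC-64-AVRO function from the Avro specification, and it uses a deliberately simplified canonical form (no namespace resolution or string normalization). The names `canonicalForm` and `fingerprint64` are local to this example, not library exports.

```javascript
// Simplified Parsing Canonical Form: fixed key order, stripped attributes,
// no whitespace. Namespaces and string escaping rules are not handled.
const ORDER = ['name', 'type', 'fields', 'symbols', 'items', 'values', 'size'];
const PRIMITIVES = new Set(['null', 'boolean', 'int', 'long', 'float', 'double', 'bytes', 'string']);

function canonicalForm(schema) {
  if (typeof schema === 'string') return JSON.stringify(schema);
  if (Array.isArray(schema)) return `[${schema.map(canonicalForm).join(',')}]`; // union
  if (PRIMITIVES.has(schema.type)) return JSON.stringify(schema.type); // {"type":"long"} -> "long"
  const parts = [];
  for (const key of ORDER) {
    if (!(key in schema)) continue;
    const value = schema[key];
    let text;
    if (key === 'fields') {
      const fields = value.map((f) => `{"name":${JSON.stringify(f.name)},"type":${canonicalForm(f.type)}}`);
      text = `[${fields.join(',')}]`;
    } else if (key === 'type' || key === 'items' || key === 'values') {
      text = canonicalForm(value);
    } else {
      text = JSON.stringify(value);
    }
    parts.push(`"${key}":${text}`);
  }
  return `{${parts.join(',')}}`;
}

// CRC-64-AVRO, transcribed from the Avro specification using BigInt.
const EMPTY = 0xc15d213aa4d7a795n;
const FP_TABLE = Array.from({ length: 256 }, (_, i) => {
  let fp = BigInt(i);
  for (let j = 0; j < 8; j++) fp = (fp >> 1n) ^ ((fp & 1n) ? EMPTY : 0n);
  return fp;
});

function fingerprint64(bytes) {
  let fp = EMPTY;
  for (const b of bytes) fp = (fp >> 8n) ^ FP_TABLE[Number((fp ^ BigInt(b)) & 0xffn)];
  return fp;
}

const hex = (schema) =>
  fingerprint64(new TextEncoder().encode(canonicalForm(schema))).toString(16).padStart(16, '0');

// Same schema, different key order and an extra doc attribute.
const a = { type: 'record', name: 'User', doc: 'ignored', fields: [{ name: 'id', type: 'long' }] };
const b = { fields: [{ type: { type: 'long' }, name: 'id' }], name: 'User', type: 'record' };
console.log(canonicalForm(a)); // {"name":"User","type":"record","fields":[{"name":"id","type":"long"}]}
console.log(hex(a) === hex(b)); // true
```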

Schema Pipeline Benchmark

Measures inferSchema(), fingerprint(), avsc.Type.forSchema(), and registry round-trip latency across flat and nested object shapes:

npm run bench:schema

Release-grade profile (more rounds):

npm run bench:schema:release

Controls:

ROUNDS measured rounds (default 10)
WARMUP_ROUNDS warmup rounds (default 5)
OUTPUT_DIR artifacts path (default benchmark-results/schema/latest)

Artifacts are written to the OUTPUT_DIR path (default benchmark-results/schema/latest).

Internals Micro-Benchmark

Measures the overhead of internal safety/correctness features so the tradeoffs are quantified:

npm run bench:internals

Release-grade profile:

npm run bench:internals:release

What it measures:

encode() circular-ref check: overhead of assertNoCircularRefs vs skipCircularCheck=true
LRU registry bookkeeping: getByFingerprint with maxSize set vs unlimited
Canonical fingerprint: Avro Parsing Canonical Form vs raw JSON.stringify

Artifacts:

Latest baseline:

Feature	Overhead	Context
Circular-ref check	+18.4% encode	Use `encode(entry, obj, true)` to skip on trusted input
LRU bookkeeping	+35.7% lookup	Still 1.98M ops/s — negligible in practice
Canonical form	+49.6% fingerprint	One-time cost at `register()`, not on hot encode/decode path

Stream Decoder Benchmark

Use the stream decoder micro-benchmark to compare parsing strategies:

npm run bench:stream

Optional tuning variables:

RECORD_COUNT (default 100000)
PAYLOAD_BYTES (default 128)
CHUNK_MIN / CHUNK_MAX (defaults 256 / 4096)
ITERATIONS (default 7)

Avro vs JSON Benchmark

Run a deterministic, multi-scenario benchmark comparing Avro encode/decode against JSON stringify/parse:

npm run bench:avro-vs-json

For tighter memory stability (manual GC between scenarios):

npm run bench:avro-vs-json:gc

For release-grade validation (larger scenarios + more rounds):

npm run bench:avro-vs-json:release

Benchmark controls:

SCENARIOS CSV record counts (default 5000,20000,50000)
WARMUP_ROUNDS (default 3)
ROUNDS measured rounds (default 8)

Methodology notes:

Uses a fixed schema and deterministic record generator (no random per-run shape drift).
Reports median, p95, p99, stddev, throughput, and payload size reduction.
Validates Avro and JSON roundtrip correctness on sampled records before reporting.
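
For readers unfamiliar with the reported statistics, the sketch below shows one way to compute them from a list of per-round timings. The helper name `summarize` is hypothetical, the percentiles use the nearest-rank method, and the standard deviation is the population form; the benchmark scripts may use different definitions.

```javascript
// Hypothetical helper: summarize per-round timings (milliseconds).
// Nearest-rank percentiles and population stddev are assumptions.
function summarize(samples) {
  const sorted = [...samples].sort((x, y) => x - y);
  const n = sorted.length;
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p / 100) * n) - 1)];
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  return { median, p95: percentile(95), p99: percentile(99), stddev: Math.sqrt(variance) };
}

// One slow outlier (50 ms) barely moves the median but sets p95 and p99.
const stats = summarize([12, 10, 11, 13, 50, 10, 12, 11]);
console.log(stats.median, stats.p95, stats.p99); // 11.5 50 50
```

This is why the report includes both the median and the tail percentiles: a single GC pause can dominate p99 while leaving the median untouched.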

Latest full benchmark artifacts (generated by npm run bench:avro-vs-json:gc):

Full console report: benchmark-results/avro-vs-json/latest/latest.log
Machine-readable JSON: benchmark-results/avro-vs-json/latest/latest.json
Spreadsheet-friendly CSV: benchmark-results/avro-vs-json/latest/latest.csv
Markdown summary report: benchmark-results/avro-vs-json/latest/latest.md

Release-profile artifacts (generated by npm run bench:avro-vs-json:release):

Full console report: benchmark-results/avro-vs-json/release/latest.log
Machine-readable JSON: benchmark-results/avro-vs-json/release/latest.json
Spreadsheet-friendly CSV: benchmark-results/avro-vs-json/release/latest.csv
Markdown summary report: benchmark-results/avro-vs-json/release/latest.md

Current baseline summary:

Records	Encode (Avro faster)	Decode (Avro faster)	Size Reduction
5,000	22.96%	19.17%	59.54%
20,000	24.82%	28.48%	59.54%
50,000	58.47%	17.77%	59.54%

These values are from the latest recorded run in this repository and are hardware/runtime dependent.

Benchmark Interpretation

Pure codec: Avro encode is consistently faster than JSON.stringify (up to 2.3x at 100k records) and produces ~60% smaller payloads. Decode speed relative to JSON.parse varies with batch size and GC pressure.

E2E (HTTP/WS): Under release-grade load, Avro wins on both throughput and latency even on localhost: +54% WS throughput, +36% HTTP throughput, +15-27% faster median latency. Combined with ~50% smaller payloads, this makes Avro the stronger choice in high-concurrency scenarios.

When Avro wins:

Bandwidth-constrained paths (mobile, edge, inter-region, metered connections).
High-volume pipelines where ~50% smaller payloads compound into significant savings.
Structured, repetitive records with high key-name overhead in JSON.
Bulk/streaming workloads where codec speed dominates over per-request overhead.

When JSON is the better choice:

Localhost or same-datacenter with sub-millisecond RTT where CPU overhead exceeds transfer savings.
Systems requiring human-readable payloads in logs without tooling.
Prototyping or low-volume APIs where schema management isn't worth it.

How to read benchmark outputs:

Throughput delta > 0 means Avro handles more requests/messages per second.
Median latency delta > 0 means Avro is faster; < 0 means Avro is slower (expected on localhost).
Payload bytes delta shows bandwidth savings — Avro's primary value proposition.
Use release profiles for publish decisions, not quick smoke runs.

Consolidated dashboard:

benchmark-results/benchmark-dashboard.md

Generate/update dashboard:

npm run bench:dashboard

Real Client/Server E2E Benchmark

This benchmark runs a real local HTTP server and real client requests (web-style interaction) to compare JSON vs Avro end-to-end behavior, including serialization, transport framing, parsing, and latency.

npm run bench:e2e:web

Release-grade profile:

npm run bench:e2e:web:release

Controls:

REQUESTS requests per mode (default 5000)
WARMUP warmup requests per mode (default 300)
CONCURRENCY concurrent in-flight requests (default 32)
HOST / PORT (defaults 127.0.0.1 / 43110)
OUTPUT_DIR artifacts path (default benchmark-results/e2e-web/latest)

Artifacts are written to the OUTPUT_DIR path (default benchmark-results/e2e-web/latest).

WebSocket E2E Benchmark

This benchmark uses a real local WebSocket server and compares JSON string messages vs Avro-framed messages using AvroSocket.

npm run bench:e2e:ws

Release-grade profile:

npm run bench:e2e:ws:release

Controls:

REQUESTS messages per mode (default 6000)
WARMUP warmup messages per mode (default 600)
CONCURRENCY in-flight messages (default 64)
HOST / PORT (defaults 127.0.0.1 / 43120)
OUTPUT_DIR artifacts path (default benchmark-results/e2e-ws/latest)

Artifacts are written to the OUTPUT_DIR path (default benchmark-results/e2e-ws/latest).

Release artifacts:

Release baseline (REQUESTS=20000, WARMUP=2000, CONCURRENCY=128):

Metric	Result
Throughput delta (Avro vs JSON)	+54.41%
Median latency delta (Avro vs JSON)	+15.68%
Request payload bytes delta	-52.21%
Response payload bytes delta	-39.74%

Server-to-Server Benchmark

This profile measures Node service-to-service HTTP interaction (JSON vs Avro) using the same real request path but with higher default throughput settings.

npm run bench:s2s

Release-grade profile:

npm run bench:s2s:release

Artifacts:

Release artifacts:

Release baseline (REQUESTS=12000, WARMUP=1200, CONCURRENCY=96):

Metric	Result
Throughput delta (Avro vs JSON)	+15.23%
Median latency delta (Avro vs JSON)	+6.79%
Request payload bytes delta	-49.03%
Response payload bytes delta	-56.59%

Error Handling

All errors extend AvroStreamError:

import {
  AvroCircularReferenceError,
  SchemaValidationError,
  SchemaNotFoundError,
  SchemaNegotiationError,
  CodecError,
  InferenceError,
} from 'avrostream-js';
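
Because every error shares one base class, callers can branch from the most specific type to the most general with instanceof. The sketch below uses local stand-in classes so that it runs on its own; in real code they would be imported from 'avrostream-js' (assuming AvroStreamError is exported as well). The `classify` helper and its return values are hypothetical.

```javascript
// Stand-in classes so the sketch is self-contained; they only mirror the
// documented hierarchy and are not the library's implementations.
class AvroStreamError extends Error {}
class SchemaNotFoundError extends AvroStreamError {}
class CodecError extends AvroStreamError {}

// Hypothetical helper: map an error to a coarse handling decision.
// Check specific subclasses first, then fall back to the base class.
function classify(err) {
  if (err instanceof SchemaNotFoundError) return 'register-schema';
  if (err instanceof CodecError) return 'reject-payload';
  if (err instanceof AvroStreamError) return 'library-error';
  return 'unknown'; // for example, a network failure thrown by fetch
}

console.log(classify(new CodecError('bad bytes')));     // reject-payload
console.log(classify(new AvroStreamError('other')));    // library-error
console.log(classify(new TypeError('fetch failed')));   // unknown
```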

Wire Format (v0.1)

Every HTTP payload is framed as:

[1 byte: version (0x01)][8 bytes: CRC-64 schema fingerprint][N bytes: Avro binary data]
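
As a byte-level illustration of this layout, here is a minimal sketch that builds and parses such a frame. `frameStandard` and `parseStandard` are hypothetical names chosen to avoid confusion with the library's own `frameForWire` and `parseWireFrame`, whose signatures are not documented here.

```javascript
// Hypothetical helpers for [version][8-byte fingerprint][Avro data].
const VERSION_STANDARD = 0x01;

function frameStandard(fingerprint, avroData) {
  if (fingerprint.length !== 8) throw new Error('fingerprint must be 8 bytes');
  const frame = new Uint8Array(1 + 8 + avroData.length);
  frame[0] = VERSION_STANDARD;
  frame.set(fingerprint, 1);
  frame.set(avroData, 9);
  return frame;
}

function parseStandard(frame) {
  if (frame.length < 9) throw new Error('frame too short');
  if (frame[0] !== VERSION_STANDARD) {
    throw new Error(`unsupported version 0x${frame[0].toString(16).padStart(2, '0')}`);
  }
  // subarray() returns views, so no payload bytes are copied.
  return { fingerprint: frame.subarray(1, 9), data: frame.subarray(9) };
}

const fp = Uint8Array.of(1, 2, 3, 4, 5, 6, 7, 8);   // placeholder fingerprint
const payload = Uint8Array.of(0x02, 0x41);           // Avro encoding of the string "A"
const frame = frameStandard(fp, payload);
console.log(frame.length);                           // 11
console.log(Array.from(parseStandard(frame).data));  // [ 2, 65 ]
```

The fixed overhead is 9 bytes per payload regardless of schema size, which is what keeps small requests compact.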

On a 406 schema-negotiation retry, the client sends the full schema inline:

[1 byte: version (0x02)][4 bytes: schema JSON length][schema JSON][8 bytes: fingerprint][data]
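
The schema-inline layout can be sketched the same way. The byte order of the 4-byte schema length is not stated above; this example assumes unsigned big-endian, matching the record length in the streaming format. `frameInline` and `parseInline` are hypothetical names.

```javascript
// Hypothetical sketch of the 0x02 frame:
// [version][4-byte schema length][schema JSON][8-byte fingerprint][data].
// Assumption: the schema length is an unsigned big-endian integer.
function frameInline(schema, fingerprint, avroData) {
  const schemaBytes = new TextEncoder().encode(JSON.stringify(schema));
  const frame = new Uint8Array(1 + 4 + schemaBytes.length + 8 + avroData.length);
  const view = new DataView(frame.buffer);
  let offset = 0;
  frame[offset++] = 0x02;
  view.setUint32(offset, schemaBytes.length, false); // false = big-endian
  offset += 4;
  frame.set(schemaBytes, offset);
  offset += schemaBytes.length;
  frame.set(fingerprint, offset);
  offset += 8;
  frame.set(avroData, offset);
  return frame;
}

function parseInline(frame) {
  if (frame[0] !== 0x02) throw new Error('not a schema-inline frame');
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const schemaLength = view.getUint32(1, false);
  const schemaEnd = 5 + schemaLength;
  if (frame.length < schemaEnd + 8) throw new Error('frame truncated');
  return {
    schema: JSON.parse(new TextDecoder().decode(frame.subarray(5, schemaEnd))),
    fingerprint: frame.subarray(schemaEnd, schemaEnd + 8),
    data: frame.subarray(schemaEnd + 8),
  };
}

const inlineFrame = frameInline({ type: 'string' }, new Uint8Array(8), Uint8Array.of(0x02, 0x41));
console.log(parseInline(inlineFrame).schema); // { type: 'string' }
```

Since the schema JSON travels with the payload, this frame is larger than the standard one and is only worth sending when the server has reported the fingerprint as unknown.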

WebSocket frames add a message-type prefix:

[1 byte: version (0x01)][1 byte: type-length][N bytes: UTF-8 type string][8 bytes: fingerprint][data]
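
The type prefix is what allows several message types to share one socket. The following sketch frames a message, parses it, and routes it to a handler by type. `frameMessage`, `parseMessage`, and the `handlers` map are hypothetical and only illustrate the layout; they are not how AvroSocket is implemented.

```javascript
// Hypothetical sketch of the WebSocket frame:
// [version][1-byte type length][UTF-8 type][8-byte fingerprint][data].
function frameMessage(type, fingerprint, avroData) {
  const typeBytes = new TextEncoder().encode(type);
  if (typeBytes.length > 255) throw new Error('type name must fit in one length byte');
  const frame = new Uint8Array(2 + typeBytes.length + 8 + avroData.length);
  frame[0] = 0x01;
  frame[1] = typeBytes.length;
  frame.set(typeBytes, 2);
  frame.set(fingerprint, 2 + typeBytes.length);
  frame.set(avroData, 2 + typeBytes.length + 8);
  return frame;
}

function parseMessage(frame) {
  if (frame[0] !== 0x01) throw new Error('unsupported version');
  const typeEnd = 2 + frame[1];
  if (frame.length < typeEnd + 8) throw new Error('frame truncated');
  return {
    type: new TextDecoder().decode(frame.subarray(2, typeEnd)),
    fingerprint: frame.subarray(typeEnd, typeEnd + 8),
    data: frame.subarray(typeEnd + 8),
  };
}

// Multiplexing: route each frame to the handler registered for its type.
const handlers = new Map();
handlers.set('NewMessage', (data) => console.log('NewMessage payload bytes:', data.length));

const wsFrame = frameMessage('NewMessage', new Uint8Array(8), Uint8Array.of(1, 2, 3));
const message = parseMessage(wsFrame);
handlers.get(message.type)?.(message.data); // NewMessage payload bytes: 3
```

A single length byte limits type names to 255 UTF-8 bytes, which the sketch enforces before writing the frame.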

Streaming responses use a header + chunked record format:

Header:  [1 byte: version (0x01)][8 bytes: fingerprint]
Records: [4 bytes: record length (big-endian)][N bytes: Avro data] ... repeating
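
The length prefix is what lets a decoder emit records as soon as their bytes arrive, even when network chunks split a record in the middle. Below is a minimal sketch of that buffering logic. It yields raw record bytes and leaves Avro decoding out; `splitRecords` is a hypothetical name, not the library's stream decoder.

```javascript
// Hypothetical sketch: split a chunked byte stream into raw Avro records.
// `chunks` is any async iterable of Uint8Array.
async function* splitRecords(chunks) {
  let buffer = new Uint8Array(0);
  let headerRead = false;

  for await (const chunk of chunks) {
    // Append the new chunk to whatever was left over from the previous one.
    const merged = new Uint8Array(buffer.length + chunk.length);
    merged.set(buffer, 0);
    merged.set(chunk, buffer.length);
    buffer = merged;

    if (!headerRead) {
      if (buffer.length < 9) continue; // header: version + 8-byte fingerprint
      if (buffer[0] !== 0x01) throw new Error('unsupported stream version');
      buffer = buffer.subarray(9);
      headerRead = true;
    }

    // Emit every complete record currently in the buffer.
    while (buffer.length >= 4) {
      const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
      const length = view.getUint32(0, false); // big-endian record length
      if (buffer.length < 4 + length) break;   // wait for more bytes
      yield buffer.subarray(4, 4 + length);
      buffer = buffer.subarray(4 + length);
    }
  }
  if (buffer.length > 0) throw new Error('stream ended with incomplete data');
}

// Usage: any async iterable of chunks works, including one built by hand.
async function* demoChunks() {
  yield Uint8Array.of(0x01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0); // header + half a length prefix
  yield Uint8Array.of(0, 2, 0x02, 0x41);                    // rest of the prefix + a 2-byte record
}

(async () => {
  for await (const record of splitRecords(demoChunks())) {
    console.log(Array.from(record)); // [ 2, 65 ]
  }
})();
```

Only the unconsumed tail of the stream is held in memory at any time, so memory use tracks the largest single record rather than the full response.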

The leading version byte reserves space for future wire-format evolution without a breaking change. parseWireFrame rejects unknown versions and enforces that schema-inline frames (0x02) are only handled by the transport layer.

License

MIT
