khaledxdls/AvroStream-JS

AvroStream JS

Transparent Avro binary transport layer for JavaScript. Drop-in fetch and WebSocket replacement that serializes JSON as compact Avro binary on the wire — ~50% smaller payloads with zero DX friction.

Features

  • Transparent Fetch Wrapper — pass plain objects, get plain objects back. Binary encoding happens under the hood.
  • Automatic Schema Inference — no upfront schema definitions needed. Inferred, hashed, and cached automatically.
  • 406 Schema Negotiation — if the server loses its schema cache, the client retries with the full schema. Invisible to the caller.
  • WebSocket Support — binary framing with message-type multiplexing over a single socket; optional auto-reconnect with exponential backoff.
  • Streaming Decoder — decode 100K+ record responses via for await without loading everything into RAM.
  • Offline Queue — PWA-ready IndexedDB queue that flushes binary payloads when connectivity returns.
  • Debug Mode — human-readable console output with exact byte-savings metrics.
  • Metrics Hook: onMetrics callback for telemetry pipelines; fires on every encode/decode regardless of the debug flag.
  • CLI Tool (avro-gen) — pre-compile TypeScript interfaces into a schema manifest at build time.
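
The automatic schema inference listed above can be pictured with a small sketch. This is not the library's inferSchema() implementation; the function name, the number-to-long/double mapping, and the "first element decides the array type" rule are all assumptions made for illustration.

```javascript
// Hypothetical sketch: derive an Avro-style schema from a plain JS value.
function inferSchemaSketch(value, name = 'Root') {
  if (value === null) return 'null';
  switch (typeof value) {
    case 'string':
      return 'string';
    case 'boolean':
      return 'boolean';
    case 'number':
      // Assumption: integers become "long", all other numbers "double".
      return Number.isInteger(value) ? 'long' : 'double';
    case 'object': {
      if (Array.isArray(value)) {
        // Assumption: arrays are homogeneous, so the first element decides.
        const items = value.length > 0 ? inferSchemaSketch(value[0], `${name}Item`) : 'null';
        return { type: 'array', items };
      }
      // Plain objects become records; field order follows key order.
      return {
        type: 'record',
        name,
        fields: Object.keys(value).map((key) => ({
          name: key,
          type: inferSchemaSketch(value[key], `${name}_${key}`),
        })),
      };
    }
    default:
      throw new TypeError(`Cannot infer an Avro type for ${typeof value}`);
  }
}

const schema = inferSchemaSketch({ name: 'Alice', age: 30, tags: ['admin'] }, 'User');
console.log(JSON.stringify(schema));
```

A real implementation would also need guardrails such as the maxDepth and maxNodes limits described under Configuration.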

Install

npm install avrostream-js

Quick Start

import { AvroClient } from 'avrostream-js';

const client = new AvroClient({
  endpoint: 'https://api.example.com',
  debug: true,
  autoInfer: true,
});

// Binary request — looks identical to using fetch with JSON
const user = await client.fetch('/users', {
  method: 'POST',
  body: { name: 'Alice', role: 'Admin' },
});

console.log(user); // { name: 'Alice', role: 'Admin' }

Streaming Large Datasets

const stream = await client.streamFetch('/large-dataset');

for await (const record of stream) {
  process(record); // Memory stays flat — records decoded one by one
}

WebSocket

const socket = client.connectSocket('wss://api.example.com');

socket.send('UpdateLocation', { lat: 34.05, lon: -118.24 });

socket.on('NewMessage', (msg) => {
  console.log(msg.text);
});

Auto-reconnect with exponential backoff:

const socket = client.connectSocket('wss://api.example.com', {
  reconnect: true,
  reconnectOptions: {
    maxAttempts: 10,      // -1 for infinite
    initialDelayMs: 500,
    maxDelayMs: 30_000,
    jitter: true,
  },
});
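
To show how these options might interact, here is a minimal sketch of one common backoff formulation (doubling delay, capped, with "full jitter"). The helper names and the exact formula are assumptions; the library's actual schedule may differ.

```javascript
// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
function reconnectDelayMs(
  attempt,
  { initialDelayMs = 500, maxDelayMs = 30_000, jitter = true } = {},
  random = Math.random, // injectable so the schedule can be tested
) {
  // Double the delay on every attempt, but never exceed maxDelayMs.
  const capped = Math.min(maxDelayMs, initialDelayMs * 2 ** attempt);
  // Full jitter: pick a uniform delay in [0, capped) to spread out reconnects.
  return jitter ? Math.floor(random() * capped) : capped;
}

// Hypothetical helper: maxAttempts of -1 means "retry forever".
function shouldRetry(attempt, maxAttempts) {
  return maxAttempts === -1 || attempt < maxAttempts;
}

for (let attempt = 0; shouldRetry(attempt, 5); attempt++) {
  console.log(attempt, reconnectDelayMs(attempt, { jitter: false }));
}
```

Jitter matters most when many clients lose the same server at once: without it they all retry on the same schedule.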

Demo

A runnable demo is included under demo/ to exercise the full API surface — HTTP, WebSocket, schema negotiation, metrics, and reconnection — as a developer would using only this README.

Start the server:

node demo/server.mjs

Run the Node.js client:

node demo/client.mjs

Or open the web UI at http://localhost:3000 for an interactive browser demo.

What the demo covers

| Section | What it exercises |
| --- | --- |
| 1. Pre-compiled schemas | AvroClient with schemas map, debug: true, POST/GET with automatic Avro encoding |
| 2. onMetrics telemetry | onMetrics callback collecting byte-savings per request |
| 3. 406 schema negotiation | autoInfer: true, server responds 406, client retries with full schema inline |
| 4. WebSocket chat | connectSocket(), socket.send(), socket.on('message') with binary framing |
| 5. Reconnect config | reconnect: true with reconnectOptions (backoff, jitter, max attempts) |
| 6. Registry introspection | client.registry.size |

Sample output

[AvroStream] >>> REQUEST /api/users (User)
  Payload: { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' }
  Size: 32 bytes (Avro) vs 67 bytes (JSON) — saved 35 bytes (52.2%)

[AvroStream] >>> REQUEST /api/orders (Order)
  Size: 27 bytes (Avro) vs 92 bytes (JSON) — saved 65 bytes (70.7%)

[AvroStream] >>> REQUEST ws://ChatMessage (ChatMessage)
  Size: 41 bytes (Avro) vs 80 bytes (JSON) — saved 39 bytes (48.8%)
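
The byte counts in the first request can be reproduced by hand. The sketch below assumes the User schema is a record of id (long) followed by name, email, and role (all string), in that order; the schema is an assumption, and the wire-frame header is not counted. Avro writes longs as zigzag varints and strings as a varint length plus UTF-8 bytes, with no field names on the wire.

```javascript
const utf8 = new TextEncoder();

// Avro long: zigzag-encode, then emit 7 bits per byte, low bits first.
function encodeLong(n) {
  let zz = (BigInt(n) << 1n) ^ (BigInt(n) >> 63n);
  const bytes = [];
  while (zz > 0x7fn) {
    bytes.push(Number(zz & 0x7fn) | 0x80); // continuation bit set
    zz >>= 7n;
  }
  bytes.push(Number(zz));
  return bytes;
}

// Avro string: byte length as a long, followed by the UTF-8 bytes.
function encodeString(s) {
  const body = utf8.encode(s);
  return [...encodeLong(body.length), ...body];
}

// Assumed field order and types: id (long), name, email, role (string).
function encodeUser(u) {
  return Uint8Array.from([
    ...encodeLong(u.id),
    ...encodeString(u.name),
    ...encodeString(u.email),
    ...encodeString(u.role),
  ]);
}

const user = { id: 0, name: 'Diana', email: 'diana@example.com', role: 'editor' };
const avroBytes = encodeUser(user).length;
const jsonBytes = utf8.encode(JSON.stringify(user)).length;
const saved = jsonBytes - avroBytes;
console.log(
  `${avroBytes} bytes (Avro) vs ${jsonBytes} bytes (JSON), ` +
    `saved ${saved} bytes (${((saved / jsonBytes) * 100).toFixed(1)}%)`,
);
// prints: 32 bytes (Avro) vs 67 bytes (JSON), saved 35 bytes (52.2%)
```

Most of the saving comes from dropping the key names, quotes, and separators that JSON repeats in every record.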

Server-side integration

The demo server (demo/server.mjs) shows how to integrate AvroStream on the server side with Express:

import { SchemaRegistry, encode, decode, frameForWire, parseWireFrame } from 'avrostream-js';

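// `UserSchema`, the Express `app`, and `avroMiddleware` are not shown in this excerpt; see demo/server.mjs.
const registry = new SchemaRegistry();
registry.register(UserSchema, '/api/users');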

// Middleware: decode Avro requests, encode Avro responses
app.post('/api/users', avroMiddleware('/api/users'), (req, res) => {
  const body = req.avroBody || req.body; // Avro-decoded or JSON fallback
  // ... handle request ...
  res.avro(responseData); // Sends Avro if client accepts it, JSON otherwise
});

The server handles both standard frames (0x01) and schema-inline frames (0x02 from 406 retries), and responds with X-Avro-Missing-Schema: true to trigger client-side schema negotiation.
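
The client side of this negotiation can be sketched as a single retry. The function below is hypothetical and not part of the library's API: `fetchImpl`, `standardFrame`, and `buildInlineFrame` are supplied by the caller, and the Content-Type value is an assumption. Only the 406 status and the X-Avro-Missing-Schema header come from the description above.

```javascript
// Hypothetical sketch of the 406 retry flow, with fetch injected by the caller.
async function postWithNegotiation(fetchImpl, url, standardFrame, buildInlineFrame) {
  const send = (body) =>
    fetchImpl(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/octet-stream' }, // assumed value
      body,
    });

  // First attempt: fingerprint-only frame (version 0x01).
  const first = await send(standardFrame);
  const schemaMissing =
    first.status === 406 && first.headers.get('X-Avro-Missing-Schema') === 'true';
  if (!schemaMissing) return first;

  // Retry exactly once with the schema inlined (version 0x02).
  return send(buildInlineFrame());
}
```

Building the inline frame lazily keeps the common path cheap: the schema JSON is only serialized when the server actually asks for it.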

Pre-compiled Schemas (CLI)

npx avro-gen --input src/types --output avro-manifest.json

import manifest from './avro-manifest.json';

const client = new AvroClient({
  endpoint: 'https://api.example.com',
  schemas: manifest,
});

Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | | Base URL for HTTP requests |
| debug | boolean | false | Log decoded payloads and byte-savings to console |
| autoInfer | boolean | true | Generate schemas from objects when not registered |
| offline | boolean | false | Queue requests in IndexedDB when offline |
| schemas | object | | Pre-compiled schema manifest |
| fetch | function | | Custom fetch implementation (for tests/polyfills) |
| inference | object | { maxDepth: 32, maxNodes: 50000 } | Runtime inference guardrails for large payloads |
| networkListener | NetworkListener | env default | Inject custom online/offline detection strategy |
| onMetrics | (m: DebugMetrics) => void | | Telemetry callback; fires on every encode/decode regardless of debug flag |
| registryMaxSize | number | 0 (unlimited) | Max schemas in registry before LRU eviction kicks in |
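
As an illustration of what registryMaxSize implies, here is a minimal LRU cache built on Map insertion order. The class name and methods are hypothetical; this is a sketch of the eviction policy, not the library's SchemaRegistry.

```javascript
// Hypothetical sketch: LRU eviction keyed by schema fingerprint.
class LruSchemaCache {
  constructor(maxSize = 0) {
    this.maxSize = maxSize; // 0 means unlimited, mirroring the option's default
    this.entries = new Map(); // Map iterates in insertion order: oldest first
  }

  get(fingerprint) {
    if (!this.entries.has(fingerprint)) return undefined;
    const entry = this.entries.get(fingerprint);
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(fingerprint);
    this.entries.set(fingerprint, entry);
    return entry;
  }

  set(fingerprint, entry) {
    this.entries.delete(fingerprint);
    this.entries.set(fingerprint, entry);
    if (this.maxSize > 0 && this.entries.size > this.maxSize) {
      // Evict the least recently used key, which is first in iteration order.
      this.entries.delete(this.entries.keys().next().value);
    }
  }

  get size() {
    return this.entries.size;
  }
}

const cache = new LruSchemaCache(2);
cache.set('fp-a', { name: 'User' });
cache.set('fp-b', { name: 'Order' });
cache.get('fp-a'); // touch: fp-b is now the oldest
cache.set('fp-c', { name: 'ChatMessage' }); // evicts fp-b
console.log([...cache.entries.keys()]); // [ 'fp-a', 'fp-c' ]
```

The delete-and-reinsert on every read is the bookkeeping cost that a bounded registry pays on lookups.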

Production Guidance

  • For large/deep payloads, prefer precompiled schemas via avro-gen to bypass synchronous runtime inference.
  • In long-lived server processes with dynamic schemas, set registryMaxSize to bound memory growth (e.g. 1000).
  • Browser and Node.js connectivity checks are abstracted behind NetworkListener; inject your own strategy for custom runtimes.
  • Schema fingerprints use the Avro Parsing Canonical Form, ensuring consistent fingerprints regardless of JS object key order and compatibility with other Avro implementations.
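
To see why a canonical form makes fingerprints independent of key order, consider this simplified sketch. It applies only three of the rules from the Avro specification's Parsing Canonical Form (attribute stripping, attribute ordering, and collapsing {"type": "int"} to "int"); fullname resolution and string/number normalization are deliberately omitted, and the function names are hypothetical.

```javascript
// Attributes kept by the canonical form, in the order they must appear.
const CANONICAL_ORDER = ['name', 'type', 'fields', 'symbols', 'items', 'values', 'size'];

// Simplified sketch: normalizes schemas and field descriptors alike.
function canonicalize(schema) {
  if (typeof schema === 'string') return schema; // primitive or named reference
  if (Array.isArray(schema)) return schema.map(canonicalize); // union

  const kept = Object.keys(schema).filter((key) => CANONICAL_ORDER.includes(key));
  // {"type": "long"} (plus stripped attributes such as doc) collapses to "long".
  if (kept.length === 1 && kept[0] === 'type' && typeof schema.type === 'string') {
    return schema.type;
  }

  const out = {};
  for (const key of CANONICAL_ORDER) {
    if (!(key in schema)) continue;
    const value = schema[key];
    if (key === 'fields') out.fields = value.map(canonicalize);
    else if (key === 'type' || key === 'items' || key === 'values') out[key] = canonicalize(value);
    else out[key] = value;
  }
  return out;
}

const canonicalJson = (schema) => JSON.stringify(canonicalize(schema));

const a = { type: 'record', name: 'User', fields: [{ name: 'id', type: { type: 'long' } }] };
const b = { fields: [{ type: 'long', name: 'id', doc: 'primary key' }], name: 'User', type: 'record' };
console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(canonicalJson(a) === canonicalJson(b)); // true
```

Hashing the canonical text instead of the raw JSON is what lets two differently ordered but equivalent schemas share one fingerprint.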

Schema Pipeline Benchmark

Measures inferSchema(), fingerprint(), avsc.Type.forSchema(), and registry round-trip latency across flat and nested object shapes:

npm run bench:schema

Release-grade profile (more rounds):

npm run bench:schema:release

Controls:

  • ROUNDS: measured rounds (default 10)
  • WARMUP_ROUNDS: warmup rounds (default 5)
  • OUTPUT_DIR: artifacts path (default benchmark-results/schema/latest)

Artifacts:

Internals Micro-Benchmark

Measures the overhead of internal safety/correctness features so the tradeoffs are quantified:

npm run bench:internals

Release-grade profile:

npm run bench:internals:release

What it measures:

  • encode() circular-ref check: overhead of assertNoCircularRefs vs skipCircularCheck=true
  • LRU registry bookkeeping: getByFingerprint with maxSize set vs unlimited
  • Canonical fingerprint: Avro Parsing Canonical Form vs raw JSON.stringify
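
For context on the first item, a circular-reference check typically walks the object graph while tracking the current ancestor chain. The sketch below is an assumption about the general technique, not the library's assertNoCircularRefs; the function name is hypothetical.

```javascript
// Hypothetical sketch: throw if `value` contains a reference to one of its own ancestors.
function assertNoCircularRefsSketch(value, ancestors = new Set()) {
  if (value === null || typeof value !== 'object') return; // primitives cannot cycle
  if (ancestors.has(value)) {
    throw new TypeError('Circular reference detected');
  }
  ancestors.add(value);
  for (const child of Object.values(value)) {
    assertNoCircularRefsSketch(child, ancestors);
  }
  // Leaving this node: the same object may still appear elsewhere as a sibling.
  ancestors.delete(value);
}

const shared = { city: 'Los Angeles' };
assertNoCircularRefsSketch({ home: shared, work: shared }); // fine: shared, not circular

const node = { name: 'loop' };
node.self = node;
try {
  assertNoCircularRefsSketch(node);
} catch (err) {
  console.log(err.message); // Circular reference detected
}
```

The walk visits every nested object once per encode, which is why skipping it on trusted input saves measurable time.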

Artifacts:

Latest baseline:

| Feature | Overhead | Context |
| --- | --- | --- |
| Circular-ref check | +18.4% encode | Use encode(entry, obj, true) to skip on trusted input |
| LRU bookkeeping | +35.7% lookup | Still 1.98M ops/s; negligible in practice |
| Canonical form | +49.6% fingerprint | One-time cost at register(), not on hot encode/decode path |

Stream Decoder Benchmark

Use the stream decoder micro-benchmark to compare parsing strategies:

npm run bench:stream

Optional tuning variables:

  • RECORD_COUNT (default 100000)
  • PAYLOAD_BYTES (default 128)
  • CHUNK_MIN / CHUNK_MAX (defaults 256 / 4096)
  • ITERATIONS (default 7)

Avro vs JSON Benchmark

Run a deterministic, multi-scenario benchmark comparing Avro encode/decode against JSON stringify/parse:

npm run bench:avro-vs-json

For tighter memory stability (manual GC between scenarios):

npm run bench:avro-vs-json:gc

For release-grade validation (larger scenarios + more rounds):

npm run bench:avro-vs-json:release

Benchmark controls:

  • SCENARIOS: comma-separated record counts (default 5000,20000,50000)
  • WARMUP_ROUNDS: warmup rounds (default 3)
  • ROUNDS: measured rounds (default 8)

Methodology notes:

  • Uses a fixed schema and deterministic record generator (no random per-run shape drift).
  • Reports median, p95, p99, stddev, throughput, and payload size reduction.
  • Validates Avro and JSON roundtrip correctness on sampled records before reporting.
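
For readers unfamiliar with the reported statistics, the sketch below shows one way to compute them from a list of per-round timings. The nearest-rank percentile and population standard deviation are assumptions; the benchmark scripts may use different definitions.

```javascript
// Hypothetical sketch: summarize per-round timings (in milliseconds).
function summarize(samplesMs) {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const n = sorted.length;

  // Nearest-rank percentile: smallest sample with at least p% of values at or below it.
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p * n) / 100) - 1)];

  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  // Population variance (divides by n, not n - 1).
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;

  return { median, p95: percentile(95), p99: percentile(99), stddev: Math.sqrt(variance) };
}

console.log(summarize([2, 4, 4, 4, 5, 5, 7, 9]));
// { median: 4.5, p95: 9, p99: 9, stddev: 2 }
```

With only a handful of rounds, p95 and p99 collapse onto the maximum, which is one reason the release profiles run more rounds.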

Latest full benchmark artifacts (generated by npm run bench:avro-vs-json:gc):

Release-profile artifacts (generated by npm run bench:avro-vs-json:release):

Current baseline summary:

| Records | Encode (Avro faster) | Decode (Avro faster) | Size Reduction |
| --- | --- | --- | --- |
| 5,000 | 22.96% | 19.17% | 59.54% |
| 20,000 | 24.82% | 28.48% | 59.54% |
| 50,000 | 58.47% | 17.77% | 59.54% |

These values are from the latest recorded run in this repository and are hardware/runtime dependent.

Benchmark Interpretation

Pure codec: Avro encode is consistently faster than JSON.stringify (up to 2.3x at 100k records) and produces ~60% smaller payloads. Decode trades blows with JSON.parse depending on batch size and GC pressure.

E2E (HTTP/WS): Under release-grade load, Avro wins on both throughput and latency even on localhost — +54% WS throughput, +36% HTTP throughput, +15-27% faster median latency. Combined with ~50% smaller payloads, Avro dominates in high-concurrency scenarios.

When Avro wins:

  • Bandwidth-constrained paths (mobile, edge, inter-region, metered connections).
  • High-volume pipelines where ~50% smaller payloads compound into significant savings.
  • Structured, repetitive records with high key-name overhead in JSON.
  • Bulk/streaming workloads where codec speed dominates over per-request overhead.

When JSON is the better choice:

  • Localhost or same-datacenter with sub-millisecond RTT where CPU overhead exceeds transfer savings.
  • Systems requiring human-readable payloads in logs without tooling.
  • Prototyping or low-volume APIs where schema management isn't worth it.

How to read benchmark outputs:

  • Throughput delta > 0 means Avro handles more requests/messages per second.
  • Median latency delta > 0 means Avro is faster; < 0 means Avro is slower (expected on localhost).
  • Payload bytes delta shows bandwidth savings — Avro's primary value proposition.
  • Use release profiles for publish decisions, not quick smoke runs.

Consolidated dashboard:

Generate/update dashboard:

npm run bench:dashboard

Real Client/Server E2E Benchmark

This benchmark runs a real local HTTP server and real client requests (web-style interaction) to compare JSON vs Avro end-to-end behavior, including serialization, transport framing, parsing, and latency.

npm run bench:e2e:web

Release-grade profile:

npm run bench:e2e:web:release

Controls:

  • REQUESTS: requests per mode (default 5000)
  • WARMUP: warmup requests per mode (default 300)
  • CONCURRENCY: concurrent in-flight requests (default 32)
  • HOST / PORT (defaults 127.0.0.1 / 43110)
  • OUTPUT_DIR: artifacts path (default benchmark-results/e2e-web/latest)

Artifacts:

WebSocket E2E Benchmark

This benchmark uses a real local WebSocket server and compares JSON string messages vs Avro-framed messages using AvroSocket.

npm run bench:e2e:ws

Release-grade profile:

npm run bench:e2e:ws:release

Controls:

  • REQUESTS: messages per mode (default 6000)
  • WARMUP: warmup messages per mode (default 600)
  • CONCURRENCY: in-flight messages (default 64)
  • HOST / PORT (defaults 127.0.0.1 / 43120)
  • OUTPUT_DIR: artifacts path (default benchmark-results/e2e-ws/latest)

Artifacts:

Release artifacts:

Release baseline (REQUESTS=20000, WARMUP=2000, CONCURRENCY=128):

| Metric | Result |
| --- | --- |
| Throughput delta (Avro vs JSON) | +54.41% |
| Median latency delta (Avro vs JSON) | +15.68% |
| Request payload bytes delta | -52.21% |
| Response payload bytes delta | -39.74% |

Server-to-Server Benchmark

This profile measures Node service-to-service HTTP interaction (JSON vs Avro) using the same real request path but with higher default throughput settings.

npm run bench:s2s

Release-grade profile:

npm run bench:s2s:release

Artifacts:

Release artifacts:

Release baseline (REQUESTS=12000, WARMUP=1200, CONCURRENCY=96):

| Metric | Result |
| --- | --- |
| Throughput delta (Avro vs JSON) | +15.23% |
| Median latency delta (Avro vs JSON) | +6.79% |
| Request payload bytes delta | -49.03% |
| Response payload bytes delta | -56.59% |

Error Handling

All errors extend AvroStreamError:

import {
  AvroCircularReferenceError,
  SchemaValidationError,
  SchemaNotFoundError,
  SchemaNegotiationError,
  CodecError,
  InferenceError,
} from 'avrostream-js';

Wire Format (v0.1)

Every HTTP payload is framed as:

[1 byte: version (0x01)][8 bytes: CRC-64 schema fingerprint][N bytes: Avro binary data]
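
A minimal sketch of building and parsing this layout with typed arrays follows. The function names are hypothetical and are not the library's frameForWire / parseWireFrame; the fingerprint is treated as an opaque 8-byte value.

```javascript
const VERSION_STANDARD = 0x01;
const FINGERPRINT_BYTES = 8;
const HEADER_BYTES = 1 + FINGERPRINT_BYTES;

// Hypothetical sketch: [version][8-byte fingerprint][Avro data]
function buildFrame(fingerprint, avroData) {
  if (fingerprint.length !== FINGERPRINT_BYTES) {
    throw new RangeError('fingerprint must be exactly 8 bytes');
  }
  const frame = new Uint8Array(HEADER_BYTES + avroData.length);
  frame[0] = VERSION_STANDARD;
  frame.set(fingerprint, 1);
  frame.set(avroData, HEADER_BYTES);
  return frame;
}

function parseFrame(frame) {
  if (frame.length < HEADER_BYTES) throw new RangeError('frame too short');
  if (frame[0] !== VERSION_STANDARD) {
    throw new RangeError(`unsupported version 0x${frame[0].toString(16).padStart(2, '0')}`);
  }
  return {
    fingerprint: frame.subarray(1, HEADER_BYTES), // views, not copies
    data: frame.subarray(HEADER_BYTES),
  };
}

const frame = buildFrame(new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8]), new Uint8Array([0x0a, 0x44]));
console.log(frame.length); // 11
console.log(parseFrame(frame).data); // Uint8Array(2) [ 10, 68 ]
```

The fixed 9-byte header means the receiver can look up the schema before touching the payload.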

On a 406 schema-negotiation retry, the client sends the full schema inline:

[1 byte: version (0x02)][4 bytes: schema JSON length][schema JSON][8 bytes: fingerprint][data]

WebSocket frames add a message-type prefix:

[1 byte: version (0x01)][1 byte: type-length][N bytes: UTF-8 type string][8 bytes: fingerprint][data]
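
The message-type prefix is what allows several message kinds to share one socket. A hedged sketch of this layout, with hypothetical function names and an opaque 8-byte fingerprint:

```javascript
const wsEncoder = new TextEncoder();
const wsDecoder = new TextDecoder();

// Hypothetical sketch: [version][type-length][type UTF-8][8-byte fingerprint][data]
function buildSocketFrame(type, fingerprint, data) {
  const typeBytes = wsEncoder.encode(type);
  if (typeBytes.length > 255) throw new RangeError('type must fit in a 1-byte length');
  if (fingerprint.length !== 8) throw new RangeError('fingerprint must be exactly 8 bytes');

  const frame = new Uint8Array(2 + typeBytes.length + 8 + data.length);
  frame[0] = 0x01;
  frame[1] = typeBytes.length;
  frame.set(typeBytes, 2);
  frame.set(fingerprint, 2 + typeBytes.length);
  frame.set(data, 2 + typeBytes.length + 8);
  return frame;
}

function parseSocketFrame(frame) {
  if (frame.length < 2 || frame[0] !== 0x01) {
    throw new RangeError('unsupported or truncated frame');
  }
  const typeEnd = 2 + frame[1];
  if (frame.length < typeEnd + 8) throw new RangeError('truncated frame');
  return {
    type: wsDecoder.decode(frame.subarray(2, typeEnd)),
    fingerprint: frame.subarray(typeEnd, typeEnd + 8),
    data: frame.subarray(typeEnd + 8),
  };
}

const wsFrame = buildSocketFrame('UpdateLocation', new Uint8Array(8), new Uint8Array([1, 2]));
console.log(parseSocketFrame(wsFrame).type); // UpdateLocation
```

A receiver can dispatch on `type` to the matching handler before decoding the Avro body.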

Streaming responses use a header + chunked record format:

Header:  [1 byte: version (0x01)][8 bytes: fingerprint]
Records: [4 bytes: record length (big-endian)][N bytes: Avro data] ... repeating
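
Because network chunks rarely align with record boundaries, a streaming reader has to buffer partial records. The async generator below is a sketch of that idea under the layout above; the function name is hypothetical, it yields raw record bytes rather than decoded objects, and it is not the library's decoder.

```javascript
// Hypothetical sketch: split a chunked byte stream into length-prefixed records.
async function* splitStreamRecords(chunks) {
  let buffer = new Uint8Array(0);
  let headerRead = false;

  for await (const chunk of chunks) {
    // Append the new chunk to whatever was left over from the previous one.
    const merged = new Uint8Array(buffer.length + chunk.length);
    merged.set(buffer);
    merged.set(chunk, buffer.length);
    buffer = merged;

    if (!headerRead) {
      if (buffer.length < 9) continue; // wait for version + fingerprint
      if (buffer[0] !== 0x01) throw new RangeError('unsupported stream version');
      buffer = buffer.subarray(9);
      headerRead = true;
    }

    // Emit every complete record currently in the buffer.
    while (buffer.length >= 4) {
      const view = new DataView(buffer.buffer, buffer.byteOffset, 4);
      const length = view.getUint32(0, false); // big-endian
      if (buffer.length < 4 + length) break; // record not complete yet
      yield buffer.slice(4, 4 + length);
      buffer = buffer.subarray(4 + length);
    }
  }

  if (!headerRead || buffer.length > 0) throw new RangeError('truncated stream');
}
```

Only the bytes of the record in flight are retained between chunks, so memory use depends on the largest record rather than on the response size.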

The leading version byte reserves space for future wire-format evolution without a breaking change. parseWireFrame rejects unknown versions and enforces that schema-inline frames (0x02) are only handled by the transport layer.

License

MIT

About

AvroStream JS is a lightweight, zero-dependency JavaScript library designed to act as a transparent transport layer. It automatically converts plain JavaScript objects into highly efficient Apache Avro binary payloads before sending them over the network (via HTTPS, WebSockets, or Streams) and decodes them on the receiving end.
