Turn audio into a self-describing spectrogram image — and reconstruct the audio back from that image. Wavegram is a two-tab single-page app that runs entirely in your browser: no server, no uploads, no network calls. The output is an ordinary PNG that carries everything needed to rebuild the sound inside its own pixels.
It's built for the voice-message use case (Opus/OGG voice notes, 16 kHz mono), but works on any audio file up to 60 seconds.
┌──────────────────────────────────────────────┐
│ Rows 0–15 Metadata header │ 16px, 8×8 B&W squares
│ magic · version · params · CRC │
├──────────────────────────────────────────────┤
│ Rows 16+ Spectrogram │
│ X = time → │
│ Y = frequency (low at bottom) ↑ │
│ brightness/colour = magnitude │
└──────────────────────────────────────────────┘
Audio → Image (Forward tab): decode (Web Audio) → mono mixdown → resample to 16 kHz → peak-normalize → cap at 60 s → STFT with a Hann window (FFT 1024 / hop 512) → log-magnitude spectrogram → write the 16-px metadata header → export PNG.
Image → Audio (Backward tab): read the PNG → reject lossy formats → validate magic bytes + CRC-16 → display the decoded metadata → recover linear magnitudes → Griffin-Lim phase recovery in a Web Worker → overlap-add → trim to the original sample count → download WAV.
Phase is discarded in the forward direction and estimated iteratively (Griffin-Lim) on the way back, so the reconstruction is faithful but not bit-exact. That is inherent to a magnitude-only spectrogram, not a bug.
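The pre-STFT steps of the forward pipeline can be sketched as below. This is an illustrative sketch, not the project's code: the helper names (`mixToMono`, `peakNormalize`, `capDuration`, `hannWindow`) are hypothetical, resampling is omitted, and the periodic form of the Hann window is an assumption.

```typescript
// Illustrative sketch; helper names are hypothetical and resampling is omitted.
const SAMPLE_RATE = 16000; // target rate named in the pipeline
const MAX_SECONDS = 60;    // duration cap named in the pipeline

// Average all channels into one mono channel.
function mixToMono(channels: Float32Array[]): Float32Array {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (let c = 0; c < channels.length; c++) {
    for (let i = 0; i < length; i++) mono[i] += channels[c][i] / channels.length;
  }
  return mono;
}

// Scale so the largest absolute sample becomes 1 (silence is returned unchanged).
function peakNormalize(samples: Float32Array): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) peak = Math.max(peak, Math.abs(samples[i]));
  if (peak === 0) return samples.slice();
  return samples.map((s) => s / peak);
}

// Keep at most MAX_SECONDS of audio.
function capDuration(samples: Float32Array, sampleRate: number = SAMPLE_RATE): Float32Array {
  return samples.subarray(0, Math.min(samples.length, MAX_SECONDS * sampleRate));
}

// Periodic Hann window (assumed variant): w[n] = 0.5 * (1 - cos(2*pi*n / N)).
function hannWindow(size: number): Float32Array {
  const w = new Float32Array(size);
  for (let n = 0; n < size; n++) w[n] = 0.5 * (1 - Math.cos((2 * Math.PI * n) / size));
  return w;
}
```

With the defaults above, each STFT frame would be 1024 samples multiplied element-wise by `hannWindow(1024)`, with consecutive frames starting 512 samples apart.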
Every reconstruction parameter lives in the image, so any Wavegram PNG decodes with no external state. The header is a 140-bit structure rendered as 8×8 black/white squares (two rows, 70 squares each) and protected by a CRC-16:
| Field | Bits | Notes |
|---|---|---|
| Magic | 32 | 0x41554456 ("AUDV") |
| Version | 4 | schema version |
| Precision flag | 1 | reserved bit 0 — see below |
| Reserved | 1 | spare |
| Sample rate | 24 | e.g. 16000 |
| FFT size | 11 | e.g. 1024 |
| Hop size | 11 | e.g. 512 |
| Sample count | 32 | for exact trim on reconstruction |
| Channels | 8 | 0x01 = mono |
| CRC-16 | 16 | CCITT, over all preceding bits |
The FFT/hop fields are 11 bits each (the original 10-bit spec couldn't hold 1024). The two extra bits come from the reserved field, keeping the header at exactly 140 bits.
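As a rough illustration of how the 140 bits could be assembled, the sketch below serializes the fields in table order, most significant bit first, and appends a CRC over the preceding 124 bits. The field order on the wire, the bit order, the precision-flag polarity, and the CRC variant (polynomial 0x1021 with initial value 0xFFFF) are all assumptions; the names `HeaderFields`, `pushBits`, `crc16Ccitt`, and `encodeHeader` are hypothetical.

```typescript
// Sketch only: field order, bit order, flag polarity, and CRC variant are assumptions.
interface HeaderFields {
  version: number;
  precision16: boolean; // assumed polarity: true = 16-bit magnitudes
  sampleRate: number;
  fftSize: number;
  hopSize: number;
  sampleCount: number;
  channels: number;
}

const MAGIC = 0x41554456; // "AUDV"

// Append `width` bits of `value`, most significant bit first.
// Uses arithmetic instead of shifts so 32-bit values stay unsigned.
function pushBits(bits: number[], value: number, width: number): void {
  for (let i = width - 1; i >= 0; i--) {
    bits.push(Math.floor(value / 2 ** i) % 2);
  }
}

// Bitwise CRC-16 with polynomial 0x1021 and initial value 0xFFFF (assumed variant).
function crc16Ccitt(bits: number[]): number {
  let crc = 0xffff;
  for (let i = 0; i < bits.length; i++) {
    const top = (crc >> 15) & 1;
    crc = (crc << 1) & 0xffff;
    if ((top ^ bits[i]) === 1) crc ^= 0x1021;
  }
  return crc;
}

// Build the 140-bit header: 124 payload bits followed by the 16-bit CRC.
function encodeHeader(f: HeaderFields): number[] {
  const bits: number[] = [];
  pushBits(bits, MAGIC, 32);
  pushBits(bits, f.version, 4);
  pushBits(bits, f.precision16 ? 1 : 0, 1);
  pushBits(bits, 0, 1); // reserved
  pushBits(bits, f.sampleRate, 24);
  pushBits(bits, f.fftSize, 11);
  pushBits(bits, f.hopSize, 11);
  pushBits(bits, f.sampleCount, 32);
  pushBits(bits, f.channels, 8);
  pushBits(bits, crc16Ccitt(bits), 16);
  return bits;
}
```

The width arithmetic explains the 11-bit fields: a 10-bit field tops out at 1023, one short of an FFT size of 1024. Under this layout, bits 0 to 69 would fill the first row of squares and bits 70 to 139 the second.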
Magnitude precision is header-flagged:
- 16-bit (default) — each magnitude packed across the R (high byte) and G (low byte) channels. Much cleaner reconstruction; this is the recommended mode.
- 8-bit grayscale — the spec-literal single-channel encoding, smaller but with an audible quantization floor.
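The difference between the two modes can be sketched as follows, assuming the log-magnitude has already been scaled to the range [0, 1] (the actual scaling is not specified here). The function names are hypothetical.

```typescript
// Sketch assuming magnitudes are already normalized to [0, 1]; names are hypothetical.

// 16-bit mode: quantize to 0..65535, high byte in R, low byte in G.
function packMagnitude16(normalized: number): [number, number] {
  const clamped = Math.min(1, Math.max(0, normalized));
  const q = Math.round(clamped * 65535);
  return [q >> 8, q & 0xff];
}

// Inverse of packMagnitude16: rebuild the 16-bit value from R and G.
function unpackMagnitude16(r: number, g: number): number {
  return ((r << 8) | g) / 65535;
}

// 8-bit mode: a single gray level, 0..255.
function packMagnitude8(normalized: number): number {
  const clamped = Math.min(1, Math.max(0, normalized));
  return Math.round(clamped * 255);
}
```

The quantization step is 1/65535 in 16-bit mode versus 1/255 in 8-bit mode, which is where the coarser noise floor of the grayscale encoding comes from.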
PNG only. JPEG/WebP and other lossy formats are rejected by signature at the decoder, because their compression corrupts the magnitude values.
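A signature check of this kind can be as small as comparing the first eight bytes of the file against the fixed PNG signature. The sketch below shows one way to do it; `isPng` is a hypothetical name, not necessarily the project's function.

```typescript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns true only if the buffer begins with the PNG signature.
function isPng(bytes: Uint8Array): boolean {
  return bytes.length >= PNG_SIGNATURE.length &&
    PNG_SIGNATURE.every((b, i) => bytes[i] === b);
}
```

A JPEG file starts with the bytes FF D8 FF, so it fails this test regardless of its file extension.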
bun install
bun run dev # http://localhost:5173

- Audio → Image: pick an audio file, optionally adjust FFT size / precision, click Generate Wavegram, download the PNG.
- Image → Audio: load that PNG, confirm the Magic ✓ / CRC ✓ badges, choose a Griffin-Lim quality preset (Fast 32 / Default 50 / Quality 100), reconstruct, download the WAV.
| Command | What it does |
|---|---|
| bun run dev | Vite dev server with HMR |
| bun run build | Type-check (tsc -b) + production build to dist/ |
| bun run preview | Serve the built bundle |
| bun test | Unit + numerical round-trip tests (Vitest) |
| bun run typecheck | Type-check without emitting |
| bun run e2e | Headless-browser smoke test of the full UI round trip (Playwright) |

bun scripts/roundtrip-file.mjs <input-audio> [outDir] [iterations]

Decodes via ffmpeg, runs the complete pipeline through a real PNG, and writes <name>.original.wav, <name>.wavegram.png, and <name>.reconstructed.wav for an A/B comparison, printing the spectral-convergence error.

    src/
      core/                      framework-agnostic DSP + codec (pure, unit-tested)
        audio/                   decode · stft · istft · griffinlim · wav
        image/                   crc16 · header · squares · spectrogram · png · codec
        log.ts                   log-magnitude scaling
        params.ts                defaults + shared types
        forward.ts               audio → image pipeline
        backward.ts              image → magnitude pipeline
      workers/
        reconstruct.worker.ts    Griffin-Lim off the main thread
      ui/                        React components (forward/ + backward/ tabs, hooks)
      components/ui/             shadcn/ui primitives
    e2e/smoke.mjs                Playwright end-to-end test
    scripts/roundtrip-file.mjs   CLI round-trip helper

The DSP and codec logic is deliberately kept free of any DOM or React dependency, so it runs identically in Vitest (Node), the browser, and the Web Worker.
Vite · React · TypeScript · Tailwind · shadcn/ui · fft.js · audiobuffer-to-wav · native Web Audio / Canvas APIs. Package manager: bun.
Mono audio, ≤ 60 seconds, PNG input only. Camera capture is intentionally deferred — a photo of the analog spectrogram region can't preserve magnitude values faithfully (perspective and lighting distortion), so it's out of scope for faithful reconstruction.