Wavegram

Turn audio into a self-describing spectrogram image — and reconstruct the audio back from that image. Wavegram is a two-tab single-page app that runs entirely in your browser: no server, no uploads, no network calls. The output is an ordinary PNG that carries everything needed to rebuild the sound inside its own pixels.

It's built for the voice-message use case (Opus/OGG voice notes, 16 kHz mono), but works on any audio file up to 60 seconds.

┌──────────────────────────────────────────────┐
│  Rows 0–15    Metadata header                 │  16px, 8×8 B&W squares
│               magic · version · params · CRC  │
├──────────────────────────────────────────────┤
│  Rows 16+     Spectrogram                      │
│               X = time  →                      │
│               Y = frequency (low at bottom) ↑  │
│               brightness/colour = magnitude    │
└──────────────────────────────────────────────┘

How it works

Audio → Image (Forward tab): decode (Web Audio) → mono mixdown → resample to 16 kHz → peak-normalize → cap at 60 s → STFT with a Hann window (FFT 1024 / hop 512) → log-magnitude spectrogram → write the 16-px metadata header → export PNG.
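The pre-STFT steps of the forward chain can be sketched in a few lines. This is a minimal illustration, not the project's code: every function name is hypothetical, resampling is left out, and the periodic Hann variant and the "no edge padding" frame count are assumptions.

```typescript
// Hypothetical helper names; none of these are taken from the Wavegram source.
const MAX_SECONDS = 60;

/** Average all channels into one mono track. */
function mixToMono(channels: Float32Array[]): Float32Array {
  const length = channels.length > 0 ? channels[0].length : 0;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

/** Scale so the largest absolute sample is 1; silence is returned as a copy. */
function peakNormalize(samples: Float32Array): Float32Array {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) peak = Math.max(peak, Math.abs(samples[i]));
  if (peak === 0) return samples.slice();
  return samples.map((s) => s / peak);
}

/** Keep at most MAX_SECONDS of audio at the given sample rate. */
function capDuration(samples: Float32Array, sampleRate: number): Float32Array {
  return samples.subarray(0, Math.min(samples.length, sampleRate * MAX_SECONDS));
}

/** Periodic Hann window (assumed variant); at hop = size / 2 shifted copies sum to 1. */
function hannWindow(size: number): Float32Array {
  const w = new Float32Array(size);
  for (let i = 0; i < size; i++) w[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / size);
  return w;
}

/** Number of full frames, assuming no padding at the edges. */
function frameCount(sampleCount: number, fftSize: number, hop: number): number {
  return sampleCount < fftSize ? 0 : Math.floor((sampleCount - fftSize) / hop) + 1;
}
```

Under these assumptions a full 60 s clip at 16 kHz (960,000 samples) with FFT 1024 / hop 512 yields `frameCount(960000, 1024, 512)` frames, which sets the width of the spectrogram region.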

Image → Audio (Backward tab): read the PNG → reject lossy formats → validate magic bytes + CRC-16 → display the decoded metadata → recover linear magnitudes → Griffin-Lim phase recovery in a Web Worker → overlap-add → trim to the original sample count → download WAV.
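The last two steps of the backward chain (trim to the stored sample count, then write a WAV) might look like the sketch below. The project lists `audiobuffer-to-wav` in its stack, so this hand-rolled 16-bit PCM encoder is only an illustration of what such a step produces; the function name and signature are hypothetical.

```typescript
/** Encode mono float samples as 16-bit PCM WAV bytes, trimmed to sampleCount. */
function encodeWavMono16(samples: Float32Array, sampleRate: number, sampleCount: number): Uint8Array {
  const n = Math.min(sampleCount, samples.length); // trim to the header's sample count
  const dataBytes = n * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format tag: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * block align
  view.setUint16(32, 2, true); // block align: 2 bytes per frame
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < n; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, Math.round(clamped * 32767), true);
  }
  return new Uint8Array(buffer);
}
```

Trimming matters because the overlap-add output is a whole number of hops long, which rarely matches the original length exactly.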

Phase is discarded in the forward direction and estimated iteratively (Griffin-Lim) on the way back, so the reconstruction is faithful but not bit-exact. That is inherent to a magnitude-only spectrogram, not a bug.
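To make the phase-recovery idea concrete, here is a compact Griffin-Lim sketch. It is deliberately naive and is not the project's worker code: it uses an O(N²) DFT instead of `fft.js`, keeps all N bins per frame rather than N/2 + 1, starts from zero phase, and every name in it is hypothetical. Each iteration re-analyses the current signal, keeps the estimated phase, swaps in the known magnitudes, and resynthesizes.

```typescript
// Sketch only: naive DFT, full-length spectra, hypothetical names.
type Spectrum = { re: Float64Array; im: Float64Array };

function hann(n: number): Float64Array {
  const w = new Float64Array(n);
  for (let i = 0; i < n; i++) w[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / n);
  return w;
}

/** Forward DFT of a real frame. */
function dft(frame: Float64Array): Spectrum {
  const n = frame.length;
  const re = new Float64Array(n);
  const im = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re[k] += frame[t] * Math.cos(angle);
      im[k] += frame[t] * Math.sin(angle);
    }
  }
  return { re, im };
}

/** Real part of the inverse DFT. */
function idftReal(spec: Spectrum): Float64Array {
  const n = spec.re.length;
  const out = new Float64Array(n);
  for (let t = 0; t < n; t++) {
    for (let k = 0; k < n; k++) {
      const angle = (2 * Math.PI * k * t) / n;
      out[t] += (spec.re[k] * Math.cos(angle) - spec.im[k] * Math.sin(angle)) / n;
    }
  }
  return out;
}

/** Hann-windowed STFT over full frames only (no edge padding). */
function stft(x: Float64Array, size: number, hop: number): Spectrum[] {
  const w = hann(size);
  const frames: Spectrum[] = [];
  for (let start = 0; start + size <= x.length; start += hop) {
    const frame = new Float64Array(size);
    for (let i = 0; i < size; i++) frame[i] = x[start + i] * w[i];
    frames.push(dft(frame));
  }
  return frames;
}

/** Least-squares inverse STFT: windowed overlap-add divided by the summed squared window. */
function istft(frames: Spectrum[], size: number, hop: number, length: number): Float64Array {
  const w = hann(size);
  const out = new Float64Array(length);
  const norm = new Float64Array(length);
  frames.forEach((spec, t) => {
    const frame = idftReal(spec);
    for (let i = 0; i < size; i++) {
      out[t * hop + i] += frame[i] * w[i];
      norm[t * hop + i] += w[i] * w[i];
    }
  });
  for (let i = 0; i < length; i++) out[i] = norm[i] > 1e-12 ? out[i] / norm[i] : 0;
  return out;
}

function griffinLim(mag: Float64Array[], size: number, hop: number, length: number, iterations: number): Float64Array {
  // Zero initial phase keeps the sketch deterministic; random phase is also common.
  let frames: Spectrum[] = mag.map((m) => ({ re: m.slice(), im: new Float64Array(size) }));
  let x = istft(frames, size, hop, length);
  for (let it = 0; it < iterations; it++) {
    frames = stft(x, size, hop).map((est, t) => {
      const re = new Float64Array(size);
      const im = new Float64Array(size);
      for (let k = 0; k < size; k++) {
        // Keep the estimated phase, impose the target magnitude.
        const a = Math.hypot(est.re[k], est.im[k]);
        re[k] = a > 0 ? (mag[t][k] * est.re[k]) / a : mag[t][k];
        im[k] = a > 0 ? (mag[t][k] * est.im[k]) / a : 0;
      }
      return { re, im };
    });
    x = istft(frames, size, hop, length);
  }
  return x;
}
```

The magnitude mismatch between the target and the re-analysed signal does not increase from one iteration to the next, which is why more iterations trade time for quality.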

Self-describing format

Every reconstruction parameter lives in the image, so any Wavegram PNG decodes with no external state. The header is a 140-bit structure rendered as 8×8 black/white squares (two rows, 70 squares each) and protected by a CRC-16:

| Field | Bits | Notes |
| --- | --- | --- |
| Magic | 32 | `0x41554456` ("AUDV") |
| Version | 4 | schema version |
| Precision flag | 1 | reserved bit 0; see below |
| Reserved | 1 | spare |
| Sample rate | 24 | e.g. 16000 |
| FFT size | 11 | e.g. 1024 |
| Hop size | 11 | e.g. 512 |
| Sample count | 32 | for exact trim on reconstruction |
| Channels | 8 | `0x01` = mono |
| CRC-16 | 16 | CCITT, over all preceding bits |

The FFT/hop fields are 11 bits each (the original 10-bit spec couldn't hold 1024). The two extra bits come from the reserved field, keeping the header at exactly 140 bits.
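As a sanity check on the layout, the field widths above add up to 32 + 4 + 1 + 1 + 24 + 11 + 11 + 32 + 8 + 16 = 140 bits. The sketch below packs and validates such a header. The field order and widths follow the table, but several details are assumptions rather than facts about the project: MSB-first bit order, the CRC-16/CCITT-FALSE variant (polynomial `0x1021`, initial value `0xFFFF`), and all type and function names.

```typescript
// Hypothetical layout helper; bit order and CRC variant are assumptions.
interface WavegramHeader {
  version: number;
  precisionFlag: 0 | 1;
  sampleRate: number;
  fftSize: number;
  hopSize: number;
  sampleCount: number;
  channels: number;
}

const MAGIC = 0x41554456; // "AUDV"

/** Append `width` bits of `value`, most significant bit first. */
function pushBits(bits: number[], value: number, width: number): void {
  for (let i = width - 1; i >= 0; i--) bits.push(Math.floor(value / 2 ** i) % 2);
}

/** Read `width` bits starting at `offset`, most significant bit first. */
function readBits(bits: number[], offset: number, width: number): number {
  let value = 0;
  for (let i = 0; i < width; i++) value = value * 2 + bits[offset + i];
  return value;
}

/** Bit-serial CRC-16 (poly 0x1021, init 0xFFFF); works on non-byte-aligned input. */
function crc16(bits: number[]): number {
  let crc = 0xffff;
  for (const bit of bits) {
    const feedback = ((crc >> 15) & 1) ^ bit;
    crc = (crc << 1) & 0xffff;
    if (feedback) crc ^= 0x1021;
  }
  return crc;
}

function packHeader(h: WavegramHeader): number[] {
  const bits: number[] = [];
  pushBits(bits, MAGIC, 32);
  pushBits(bits, h.version, 4);
  pushBits(bits, h.precisionFlag, 1);
  pushBits(bits, 0, 1); // reserved
  pushBits(bits, h.sampleRate, 24);
  pushBits(bits, h.fftSize, 11);
  pushBits(bits, h.hopSize, 11);
  pushBits(bits, h.sampleCount, 32);
  pushBits(bits, h.channels, 8);
  pushBits(bits, crc16(bits), 16); // CRC over the 124 preceding bits
  return bits;
}

function isValidHeader(bits: number[]): boolean {
  return (
    bits.length === 140 &&
    readBits(bits, 0, 32) === MAGIC &&
    crc16(bits.slice(0, 124)) === readBits(bits, 124, 16)
  );
}
```

Because the 124 protected bits are not a whole number of bytes, the CRC here is computed bit by bit instead of with a byte-wise table.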

Magnitude precision is header-flagged:

16-bit (default) — each magnitude packed across the R (high byte) and G (low byte) channels. Much cleaner reconstruction; this is the recommended mode.
8-bit grayscale — the spec-literal single-channel encoding, smaller but with an audible quantization floor.
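The difference between the two modes comes down to quantization step size. The sketch below assumes the log-magnitude has already been normalized to the range 0 to 1 (how the project does that scaling is not described here) and uses hypothetical function names; only the R = high byte, G = low byte split is taken from the description above.

```typescript
/** 16-bit mode: quantize to 0..65535, high byte in R, low byte in G. */
function packMagnitude16(normalized: number): { r: number; g: number } {
  const clamped = Math.min(1, Math.max(0, normalized));
  const q = Math.round(clamped * 0xffff);
  return { r: q >> 8, g: q & 0xff };
}

function unpackMagnitude16(r: number, g: number): number {
  return ((r << 8) | g) / 0xffff;
}

/** 8-bit mode: a single gray level in 0..255. */
function packMagnitude8(normalized: number): number {
  const clamped = Math.min(1, Math.max(0, normalized));
  return Math.round(clamped * 0xff);
}

function unpackMagnitude8(gray: number): number {
  return gray / 0xff;
}
```

The worst-case round-trip error is half a step: about 0.5 / 65535 in 16-bit mode versus 0.5 / 255 in 8-bit mode, roughly 257 times coarser, which is consistent with the audible quantization floor noted for the 8-bit encoding.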

PNG only. JPEG/WebP and other lossy formats are rejected by signature at the decoder, because their compression corrupts the magnitude values.
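Rejecting by signature means looking at the first bytes of the file rather than trusting its extension. A minimal sketch, with hypothetical names, using the standard PNG, JPEG, and RIFF/WebP signatures:

```typescript
type ImageKind = "png" | "jpeg" | "webp" | "unknown";

// The fixed 8-byte PNG file signature.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function hasBytesAt(bytes: Uint8Array, expected: number[], offset = 0): boolean {
  return bytes.length >= offset + expected.length && expected.every((b, i) => bytes[offset + i] === b);
}

function sniffImageKind(bytes: Uint8Array): ImageKind {
  if (hasBytesAt(bytes, PNG_SIGNATURE)) return "png";
  if (hasBytesAt(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  if (hasBytesAt(bytes, [0x52, 0x49, 0x46, 0x46]) && hasBytesAt(bytes, [0x57, 0x45, 0x42, 0x50], 8)) return "webp";
  return "unknown";
}

/** Throw unless the bytes start with the PNG signature. */
function assertPng(bytes: Uint8Array): void {
  const kind = sniffImageKind(bytes);
  if (kind !== "png") throw new Error(`Expected a PNG file, got ${kind}`);
}
```

In a browser the bytes would typically come from `new Uint8Array(await file.arrayBuffer())` on the selected `File`.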

Quick start

bun install
bun run dev          # http://localhost:5173

Audio → Image: pick an audio file, optionally adjust FFT size / precision, click Generate Wavegram, download the PNG.
Image → Audio: load that PNG, confirm the Magic ✓ / CRC ✓ badges, choose a Griffin-Lim quality preset (Fast 32 / Default 50 / Quality 100), reconstruct, download the WAV.

Scripts

| Command | What it does |
| --- | --- |
| `bun run dev` | Vite dev server with HMR |
| `bun run build` | Type-check (`tsc -b`) + production build to `dist/` |
| `bun run preview` | Serve the built bundle |
| `bun test` | Unit + numerical round-trip tests (Vitest) |
| `bun run typecheck` | Type-check without emitting |
| `bun run e2e` | Headless-browser smoke test of the full UI round trip (Playwright) |

Round-trip a real file (listening test)

bun scripts/roundtrip-file.mjs <input-audio> [outDir] [iterations]

Decodes via ffmpeg, runs the complete pipeline through a real PNG, and writes <name>.original.wav, <name>.wavegram.png, and <name>.reconstructed.wav for an A/B comparison, printing the spectral-convergence error.
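Spectral convergence is commonly defined as the Frobenius norm of the difference between two magnitude spectrograms, divided by the norm of the target. Whether the script uses exactly this definition is an assumption; the sketch below, with a hypothetical function name, shows that common form.

```typescript
/**
 * Spectral convergence between two magnitude spectrograms of equal shape:
 * sqrt(sum((estimate - target)^2) / sum(target^2)). 0 means identical magnitudes.
 */
function spectralConvergence(target: Float64Array[], estimate: Float64Array[]): number {
  let diff = 0;
  let ref = 0;
  for (let t = 0; t < target.length; t++) {
    for (let k = 0; k < target[t].length; k++) {
      const d = estimate[t][k] - target[t][k];
      diff += d * d;
      ref += target[t][k] * target[t][k];
    }
  }
  // An all-zero target has no meaningful ratio; report 0 in that case.
  return ref > 0 ? Math.sqrt(diff / ref) : 0;
}
```

Lower is better: 0 means the reconstructed magnitudes match the target exactly, and 1 means the error carries as much energy as the target itself.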

Project layout

src/
  core/                 framework-agnostic DSP + codec (pure, unit-tested)
    audio/              decode · stft · istft · griffinlim · wav
    image/              crc16 · header · squares · spectrogram · png · codec
    log.ts              log-magnitude scaling
    params.ts           defaults + shared types
    forward.ts          audio → image pipeline
    backward.ts         image → magnitude pipeline
  workers/
    reconstruct.worker.ts   Griffin-Lim off the main thread
  ui/                   React components (forward/ + backward/ tabs, hooks)
  components/ui/         shadcn/ui primitives
e2e/smoke.mjs           Playwright end-to-end test
scripts/roundtrip-file.mjs   CLI round-trip helper

The DSP and codec logic is deliberately kept free of any DOM or React dependency, so it runs identically in Vitest (Node), the browser, and the Web Worker.

Tech stack

Vite · React · TypeScript · Tailwind · shadcn/ui · fft.js · audiobuffer-to-wav · native Web Audio / Canvas APIs. Package manager: bun.

Scope (v1)

Mono audio, ≤ 60 seconds, PNG input only. Camera capture is intentionally deferred — a photo of the analog spectrogram region can't preserve magnitude values faithfully (perspective and lighting distortion), so it's out of scope for faithful reconstruction.
