jschoemaker/colorgpt

ColorGPT

A language model's vocabulary, rendered as color and sound.

Qwen 2.5 0.5B's full 151,936-token vocabulary, sorted by hue

ColorGPT takes the input-token embedding matrix of a small language model — Qwen 2.5 0.5B, ~152k tokens × 896 dimensions — projects it to three dimensions with UMAP, and uses those coordinates as both a perceptual color space (OKLab → sRGB) and a sonic one (Web Audio). Each of the model's tokens gets a fixed color and a fixed timbre. Generation, reading, and dialogue then become chromatic and acoustic events, played out in real time.

The piece is an attempt to render the model's mental geography of language — not what words mean to us, but how a network with no eyes, no ears, and no body has learned to organize symbols among themselves.


What this is — and what it is not

This is not visual synesthesia. The token blue is not blue. scarlet is not red.

What you see is the geometry of the model's input-embedding space, projected into three dimensions and assigned to color and sound. Tokens that are close in that space are close in color and timbre. So blue / azure / navy will cluster — not because they describe blue things, but because the model has learned they appear in similar contexts. The same is true for Monday / Tuesday / Wednesday, for inflections of a single verb, and for the BPE fragments ing, ed, ly.

A second commitment: ColorGPT renders subword tokens, not words. The model does not see "scarlet"; it sees scar followed by let, with two unrelated colors. Word-averaging would smooth this away and lie about how the model perceives text. The fragmentation is the point.


How it works

Qwen 2.5 0.5B input embeddings (vocab × 896)
    ↓ UMAP (cosine metric, 3D)
3D coordinates per token
    ↓ axis 0  → OKLab L                ↓ all three axes
    ↓ axis 1  → OKLab a   →  sRGB      → normalized [0,255]³
    ↓ axis 2  → OKLab b                  driving Web Audio synth
                                         (pitch, timbre, pan)

The same UMAP coordinates drive both modalities. A token's color and its sound are coupled — they are two readings of the same vector. Dialogue mode hard-pans speaker A to the left channel and speaker B to the right; otherwise the audio is a function of the token alone, not who said it.

OKLab is used because it is designed to be perceptually uniform: equal Euclidean distances in OKLab correspond approximately to equal perceived color differences. Small movements in the projected embedding space therefore produce small movements in apparent color, and large movements produce large ones; the color differences you see come from the projection, not from sRGB's well-known non-uniformities.
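
The color half of the pipeline can be sketched in a few lines. This is a minimal illustration, not the code in build_lut.py: it assumes each UMAP axis has already been min-max normalized to [0, 1], and the OKLab ranges (L_RANGE, A_RANGE, B_RANGE) are hypothetical values picked for the example. The conversion matrices and the sRGB transfer function are the published ones from Ottosson's OKLab reference.

```python
import math

# Hypothetical OKLab ranges for the three normalized UMAP axes.
# These are assumptions for illustration, not the values build_lut.py uses.
L_RANGE = (0.35, 0.90)
A_RANGE = (-0.15, 0.15)
B_RANGE = (-0.15, 0.15)


def lerp(lo, hi, t):
    """Linear interpolation between lo and hi for t in [0, 1]."""
    return lo + (hi - lo) * t


def linear_to_srgb(x):
    """Apply the sRGB transfer function to one linear-light channel."""
    x = min(max(x, 0.0), 1.0)  # clip out-of-gamut values
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1 / 2.4) - 0.055


def oklab_to_srgb8(L, a, b):
    """Convert one OKLab color to an 8-bit sRGB triple."""
    # OKLab -> nonlinear LMS
    l_ = L + 0.3963377774 * a + 0.2158037573 * b
    m_ = L - 0.1055613458 * a - 0.0638541728 * b
    s_ = L - 0.0894841775 * a - 1.2914855480 * b
    # Undo the cube-root nonlinearity
    l, m, s = l_ ** 3, m_ ** 3, s_ ** 3
    # LMS -> linear sRGB
    r = 4.0767416621 * l - 3.3077115913 * m + 0.2309699292 * s
    g = -1.2684380046 * l + 2.6097574011 * m - 0.3413193965 * s
    bl = -0.0041960863 * l - 0.7034186147 * m + 1.7076147010 * s
    return tuple(round(255 * linear_to_srgb(c)) for c in (r, g, bl))


def umap_to_rgb(u):
    """Map one token's normalized UMAP coordinates (u0, u1, u2) to sRGB."""
    L = lerp(*L_RANGE, u[0])  # axis 0 -> lightness
    a = lerp(*A_RANGE, u[1])  # axis 1 -> green-red
    b = lerp(*B_RANGE, u[2])  # axis 2 -> blue-yellow
    return oklab_to_srgb8(L, a, b)
```

A token sitting at the center of the a and b axes, for example umap_to_rgb((0.5, 0.5, 0.5)), comes out as a neutral gray; moving along axis 1 or 2 pushes it toward a hue.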


Three live modes

All three are streamed over Server-Sent Events. The browser receives {id, text, rgb, u, source, hold_ms} per token and paints / sounds it.
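
As a rough sketch of what one of those events might look like on the wire, the snippet below serializes a token event as a Server-Sent Events message and parses it back. The field names come from the list above; the value types, the meaning of u (taken here to be the UMAP coordinates) and of source (taken here to be the mode or speaker label), and the helper names format_sse and parse_sse are assumptions for illustration, not the code in server.py.

```python
import json


def format_sse(event):
    """Serialize one token event as a Server-Sent Events message.

    An SSE message is a "data:" line terminated by a blank line.
    json.dumps escapes newlines inside strings, so a "\n" token
    still fits on a single data line.
    """
    return f"data: {json.dumps(event, ensure_ascii=False)}\n\n"


def parse_sse(message):
    """Inverse of format_sse, mirroring what the browser side would do."""
    payload = message.strip()[len("data: "):]
    return json.loads(payload)


# Illustrative values only; the id, color, and coordinates are made up.
example_event = {
    "id": 1234,
    "text": " river\n",
    "rgb": [12, 34, 56],
    "u": [0.25, 0.5, 0.75],
    "source": "speaker",
    "hold_ms": 950,
}
```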

speaker

A single instance of Qwen autoregresses from a prompt. A literary-prose prefix is prepended to nudge the model away from QA-style continuations. Non-Latin tokens are suppressed via bad_words_ids — we want the script the LUT was tuned for.
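
One way the suppression list could be built is sketched below. It assumes a mapping from token id to decoded text is available (decoded_vocab is a hypothetical name) and treats a token as non-Latin if any alphabetic character in it falls outside the Latin script according to its Unicode name. The actual filter in the repository may use different rules.

```python
import unicodedata


def is_latin_text(text):
    """True if every alphabetic character in text is a Latin-script letter.

    Digits, punctuation, and whitespace are not alphabetic, so they pass.
    """
    for ch in text:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return False
    return True


def build_bad_words_ids(decoded_vocab):
    """Return [[token_id], ...] for every token containing non-Latin letters.

    decoded_vocab: dict mapping token id -> decoded token text (assumed input).
    The list-of-lists shape is what bad_words_ids expects in
    Hugging Face generate().
    """
    return [
        [token_id]
        for token_id, text in sorted(decoded_vocab.items())
        if not is_latin_text(text)
    ]
```

The result would be passed as the bad_words_ids argument at generation time. Byte-level BPE tokens that decode to partial characters contain no alphabetic characters and pass this check, which may or may not be the desired behavior.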

reader

A corpus file is tokenized and emitted in order. No generation, no sampling — pure transcription, the corpus passed through the model's vocabulary as colored cells. Provided corpora are public-domain English: the King James Bible (Genesis 1, 1 Corinthians 13, Ecclesiastes 3, John 1) and Conrad's Heart of Darkness.

Reader view of John 1: every token a labeled colored chip, verse newlines preserved

dialogue

Two Qwen instances pass the rolling transcript back and forth. Even turns are speaker A, odd turns are speaker B. Same weights, separate KV state per turn. In the audio, A is panned hard left, B is panned hard right — the room becomes stereo conversation.
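
The turn-taking can be sketched without a model. In the snippet below, generate is a stand-in callable (an assumption, not a function from streams.py) that receives the whole rolling transcript and returns the next reply; because the transcript is passed in full on every turn, nothing is carried over between turns. The pan values of -1.0 and 1.0 correspond to hard left and hard right in a Web Audio StereoPannerNode.

```python
def speaker_for_turn(turn):
    """Even turns belong to speaker A (hard left), odd turns to B (hard right)."""
    if turn % 2 == 0:
        return "A", -1.0
    return "B", 1.0


def run_dialogue(generate, opening, turns):
    """Alternate two speakers over a shared rolling transcript.

    generate: callable taking the transcript text and returning a reply
              (a placeholder for a model call).
    """
    transcript = [opening]
    events = []
    for turn in range(turns):
        speaker, pan = speaker_for_turn(turn)
        reply = generate("\n".join(transcript))
        transcript.append(reply)
        events.append({"turn": turn, "source": speaker, "pan": pan, "text": reply})
    return events
```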


Pacing — punctuation as heartbeat

Pacing lives in pacing.py, downstream of the streams. Every token gets a base hold of 1 / base_tps seconds (500 ms at the default rate of 2 tokens per second). On top of that, tokens ending in punctuation receive a tiered additional pause:

| punctuation | pause (ms) | role |
| --- | --- | --- |
| . ! ? | 700 | sentence-end |
| … | 900 | trailing-off |
| ; | 400 | clause |
| : | 350 | clause |
| , — – | 220 | phrase |
| \n | 450 | line break |
| closers ) ] " ' ” ’ | 140–180 | release |

A . is always rendered as rgb(0, 143, 87) and always carries the 700 ms sentence-end pause. The piece has a heartbeat: punctuation marks become visual and sonic punctuation, identical across modes.
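
The timing rule can be written down as a small lookup. This is a sketch based on the values in the table above, not the contents of pacing.py. Three things are assumptions: the trailing-off pause is matched against "..." and "…", the closers use a single placeholder value of 160 ms from the stated 140 to 180 range, and the pause is added to the base hold as the text describes.

```python
BASE_TPS = 2.0  # 1 / BASE_TPS seconds = the 500 ms default base hold

# (suffixes, extra pause in ms). Order matters: the ellipsis entry is
# checked before the sentence-end entry so "..." is not read as ".".
PAUSES = [
    (("...", "…"), 900),                      # trailing-off (glyphs assumed)
    ((".", "!", "?"), 700),                   # sentence-end
    ((";",), 400),                            # clause
    ((":",), 350),                            # clause
    ((",", "—", "–"), 220),                   # phrase
    (("\n",), 450),                           # line break
    ((")", "]", '"', "'", "”", "’"), 160),    # release; placeholder in 140 to 180
]


def hold_ms(text, base_tps=BASE_TPS):
    """Return how long a token is held: base hold plus any punctuation pause."""
    base = 1000.0 / base_tps
    for suffixes, pause in PAUSES:
        if text.endswith(suffixes):
            return base + pause
    return base
```

Under these assumptions a plain token is held for 500 ms, a token ending in a comma for 720 ms, and a token ending in a period for 1200 ms.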


Static printable artifacts

histogram.py renders four image types from any text. All are produced as PNGs intended for print or wall display.

| render | what it shows |
| --- | --- |
| transcript | Every content token of a corpus packed into a perfect ⌈√N⌉ × ⌈√N⌉ grid, with a hue-sorted palette strip below. Verse structure is deliberately dropped — the corpus collapses into a square of color. |
| palette | Frequency-weighted bars sorted by hue. The corpus's chromatic fingerprint as a Pantone fan. |
| reading-map | A per-corpus grid of unique tokens, each cell labeled with its decoded text, sized for printing as a handout (25 mm cells) or wall card (15 mm cells). The Rosetta stone for a corpus. |
| vocab atlas | The whole 151,936-token vocabulary as a √V × √V chromatic atlas. Sortable by token ID (training-derived structure: common merges first, then rarer / CJK-byte-fallback tokens) or by hue (chromatic distribution). |
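
Two small pieces of arithmetic sit behind these renders: the side of the square grid and the hue ordering. The sketch below shows one way to compute each. It is not taken from histogram.py, and the choice of HSV hue as the sort key is an assumption, since the README does not say which hue definition is used.

```python
import colorsys
import math


def grid_side(n):
    """Smallest side s such that an s x s grid holds n cells, i.e. ceil(sqrt(n)).

    Uses integer arithmetic to avoid floating-point rounding on large n.
    """
    return 0 if n <= 0 else math.isqrt(n - 1) + 1


def sort_by_hue(colors):
    """Sort 8-bit (r, g, b) triples by HSV hue (hue definition assumed)."""
    def hue(c):
        return colorsys.rgb_to_hsv(c[0] / 255, c[1] / 255, c[2] / 255)[0]
    return sorted(colors, key=hue)
```

For the full vocabulary, grid_side(151_936) gives 390, so a square atlas would have 164 unused cells in its last rows.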

transcript · John 1 (KJV)

Sequential transcript of John 1 with hue-sorted palette strip below

Every content token of the passage as a colored cell, packed into a square. The hue-sorted palette underneath is the corpus's chromatic fingerprint.

reading-map · Heart of Darkness

Labeled reading-map of Heart of Darkness: every unique token rendered as a colored cell with its decoded text

The whole vocabulary of Conrad's novella as a labeled grid — the Rosetta stone for a corpus, sized for a wall card.

vocab atlas · corpus filter

The atlas restricted to the tokens present in a single corpus — a chromatic fingerprint at the vocabulary level. John 1 (146 unique tokens) on the left, Genesis 1 (187) on the right.

| John 1 | Genesis 1 |
| --- | --- |
| Vocab atlas filtered to john1 tokens | Vocab atlas filtered to genesis1 tokens |

Generate your own with the snippet under Run below.


Demo

demo.mp4

Run

Requires Python 3.11+. First-time setup, in PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
python build_lut.py     # ~6 minutes, one-time precompute
python server.py        # http://127.0.0.1:5000

build_lut.py produces three files: lut.bin (the token → RGB lookup, ~445 KB), lut_meta.json (vocab size, model id, parameters), and umap_3d.npy (cached 3D coords, so OKLab tuning can be re-rendered without re-running UMAP). All three are regenerable and gitignored.
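
The stated size is consistent with three bytes per token: 151,936 × 3 = 455,808 bytes, or about 445 KiB. Assuming that layout (a flat, row-major array of 8-bit R, G, B values indexed by token id, with no header), a lookup could be as simple as the sketch below. The layout and the function names are guesses from the file size, not documented behavior of build_lut.py.

```python
VOCAB_SIZE = 151_936
BYTES_PER_TOKEN = 3  # assumed: one byte each for R, G, B


def load_lut(path="lut.bin"):
    """Read the whole lookup table into memory as raw bytes."""
    with open(path, "rb") as f:
        return f.read()


def lut_rgb(lut, token_id):
    """Return the (r, g, b) triple stored for token_id in a flat byte table."""
    offset = BYTES_PER_TOKEN * token_id
    if token_id < 0 or offset + BYTES_PER_TOKEN > len(lut):
        raise IndexError(f"token id {token_id} is outside the table")
    return tuple(lut[offset:offset + BYTES_PER_TOKEN])
```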

Press F for fullscreen, or open ?fullscreen=1 directly. Audio starts on first user interaction (browser policy).

For static renders without the server:

python visualize.py "the quick brown fox"        # one-off colored strip → out.html
python -c "from histogram import render_transcript; render_transcript(open('corpus/john1.txt').read()).save('john1.png')"

File map

| file | role |
| --- | --- |
| build_lut.py | One-time LUT precompute (embeddings → UMAP → OKLab → sRGB). |
| engine.py | Lazy-loaded shared state: model, tokenizer, color LUT, audio LUT. |
| streams.py | The three modes. Each yields token events; bounded Queue(maxsize=1) for backpressure. |
| pacing.py | Base TPS + tiered punctuation pauses. Single source of truth for timing. |
| server.py | Flask + SSE. Serves the UI, three streams, corpus uploads, static renders, filter bitmaps. |
| histogram.py | Static PNG renders (transcript, palette, reading-map, vocab atlas). |
| visualize.py | Standalone colored-strip HTML for arbitrary text. |
| templates/index.html | UI + Web Audio synth + client-side vocab canvas. |
| corpus/ | Public-domain text (KJV passages, Heart of Darkness). |
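
The backpressure noted for streams.py can be illustrated with the standard library alone. In this sketch a producer thread puts tokens into a queue.Queue(maxsize=1); because the queue holds a single item, put blocks until the consumer has taken the previous token, so the producer can never run more than one token ahead of playback. The function names and the sentinel are illustrative and are not the ones in the repository.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def produce(tokens, q):
    """Push tokens into the queue; put() blocks while the single slot is full."""
    for tok in tokens:
        q.put(tok)
    q.put(_DONE)


def stream(tokens):
    """Yield tokens from a producer thread through a one-slot queue."""
    q = queue.Queue(maxsize=1)
    threading.Thread(target=produce, args=(tokens, q), daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item
```

A consumer that sleeps between items, for example to honor each token's hold time, automatically slows the producer to the same pace.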

Physical installation

A physical instantiation of ColorGPT is in development.


Acknowledgements

ColorGPT uses Qwen 2.5 0.5B (Apache 2.0) for both the embedding source and the live generation. UMAP is umap-learn. OKLab is Björn Ottosson's perceptual color space (2020). Provided corpora are public domain.


Citation

See CITATION.cff — GitHub renders a "Cite this repository" widget from it. If you write about the work, the dual license below applies.


License

Dual-licensed:

  • Code (everything except README.md, CITATION.cff, and any future docs/) — Apache License 2.0. See also NOTICE.
  • Writeup (README.md and any future docs/*.md) — CC BY 4.0.

The split exists because the code is meant to be reused and the prose is meant to be cited. Apache 2.0 carries an explicit patent grant and a retaliation clause — defensive cover for a project working in a domain where patent activity is increasing. CC BY 4.0 preserves the conceptual claim: anyone may quote, adapt, or translate the framing of ColorGPT, but must credit the original.
