tokenfiche

A picture is worth a thousand words. We made each one worth about 33,333 tokens, and GPT-5.5 read them back.

What is a token, really? A model's text window stops at a hard number, 272,000 tokens for GPT-5.5 through the Codex endpoint. So we asked a simple question: if you render the text as images instead of sending it as text, does that number still apply? The vision path reads pixels, and pixels are cheap. How much text can you actually get a model to read by photographing it?

The answer, for this one proven run: 1,000,000 tokens of source text, packed into 30 page images, every fact recovered exactly. That is 3.14 tokens of source text carried for every input token the API billed.

TL;DR

We took 1,000,000 tokenizer-counted tokens of text (the source is standard Gutenberg books), hid three exact "needle" markers inside it, rendered the whole thing as 30 dense grayscale page images using the hyperlegible Atkinson Mono font, and sent the images to GPT-5.5. The model found all three needles, character for character.

Metric	Value
Source text rendered	1,000,000 tokens
Page images	30 (3000x3000 px)
Source tokens per image	~33,333
Billed input tokens	318,283
Source-to-billed ratio	3.14x
Needles recovered	3 / 3 exact
One image more (31)	rejected: `context_length_exceeded`

Evidence: experiments/atkinson_10_1M-api/atkinson_10_1M.summary.json.

This is the vision path used as a lossy, OCR-like transport layer. Images carry far more source text than the text window would normally hold, as long as the text is rendered densely, legibly, and verified with retrieval probes.

The two gates

The interesting result is that "the endpoint accepted it" and "the model read it correctly" are two separate things, and they fail for different reasons.

Gate 1, the context gate, counts image patches. The endpoint slices each 3000x3000 image into 32x32 patches: 94 x 94 = 8,836 patches per page. The context limit is enforced on that patch count, near 272,000, and the billed input_tokens figure (318,283) is a separate billing number that is allowed to exceed it. The boundary is clean:

Pages	Patches	Result
30	265,080	accepted
31	273,916	`context_length_exceeded`

272,000 sits exactly between them. Thirty pages is the wall for this page size. Evidence: experiments/codex-gpt55-image-maximize-2026-06-20/api/t836811-p31-c10-fs10-m0-g0-ext562.summary.json.

Gate 2, the retrieval gate, is OCR. Passing the context gate only means the bytes fit. The model still has to read pixels. While standard Courier New layouts struggled with minor character spelling drops at ~900k tokens (dropping a single 'E' in QUEEQUEG), switching the design to the hyperlegible Atkinson Mono at size 10 and 8px line height completely solved these OCR limits. The model achieved perfect, character-for-character retrieval of all needles even at 1,000,000 source tokens.

How it works

Four scripts, run in order:

render_token_images.py   text -> dense page PNGs + manifest
build_codex_request.py   PNGs -> base64 Codex request with a strict-JSON output schema
send_codex_request.py    request -> SSE stream -> classified summary
verify_repo.py           sanity-check the packaged evidence

The renderer trims text to an exact token budget with tiktoken, inserts the three needle markers at 20% / 50% / 90% offsets, reflows paragraphs, and packs them into multi-column pages. It computes layout capacity and fails preflight if the text would overflow, so you never pay for an image that dropped characters off the page.

The winning layout is intentionally plain:

3000x3000 grayscale PNG pages
8 columns, 1 px margins, 3 px gutters
Atkinson Mono at 10 px, 62 characters per line, 8 px line height
30 pages, 81,686 wrapped lines, 97.71% average rendered ink width

The reflow step is the core trick. Early attempts preserved Project Gutenberg's hard wraps, which left text in a narrow strip down the left side and wasted most of the page. Reflowing each paragraph to fill the full column width is what turned that wasted whitespace into dense, readable pages. Switching from Courier New to the hyperlegible Atkinson Mono then let the line height drop to 8 px without the model dropping characters, which is what carried the run past the old Courier New ceiling of 810,549 tokens to a clean 1,000,000.

Reproduce it

You need uv. Nothing else installs globally.

Verify the packaged evidence:

uv run --with pillow --with tiktoken python scripts/verify_repo.py

Re-render the proven best source:

uv run --with pillow --with tiktoken python scripts/render_token_images.py \
  --source-text experiments/atkinson_10_1M/source-1000000-tokens.txt \
  --target-tokens 1000000 \
  --pages 30 --columns 8 --font-size 10 \
  --chars-per-line 62 --line-height 8 \
  --margin 1 --gutter 3 \
  --font experiments/color-layer-overlap-2026-06-20/fonts/AtkinsonMono.ttf \
  --out runs/repro-1000000

Build the request, then send it with your Codex auth:

python scripts/build_codex_request.py \
  --render-dir runs/repro-1000000 \
  --output runs/repro-1000000.request.json \
  --redacted-output runs/repro-1000000.request.redacted.json

python scripts/send_codex_request.py \
  --request runs/repro-1000000.request.json \
  --manifest runs/repro-1000000/manifest.json \
  --out-dir runs/repro-1000000-api

send_codex_request.py reads ~/.codex/auth.json by default, or accepts CODEX_ACCESS_TOKEN and CHATGPT_ACCOUNT_ID.

When this is worth it

Good fits:

Long-document triage where finding anchored evidence matters more than perfect transcription.
Retrieval evals with inserted needles across huge payloads.
Agent-memory or compaction experiments comparing text transport against image transport.
Stress-testing multimodal context windows and billing behavior.

Bad fits:

Anything where one wrong character is dangerous.
Legal, medical, or financial work without a second verifier.
Code execution or patch generation from image text.
Private data, unless you are comfortable with the endpoint and its retention behavior.

Repo map

scripts/render_token_images.py   render source text into page PNGs
scripts/build_codex_request.py   build a Responses-style Codex request body
scripts/send_codex_request.py    send the request and summarize the SSE result
scripts/verify_repo.py           check the packaged evidence and scripts
docs/experiment-log.md           best result first, then superseded and failed runs
docs/request-shape.md            observed Codex request shape and limits
docs/blob-context-probe.md       earlier blob / base64 / image probes
experiments/                     preserved summaries and the best run's images
examples/gutenberg-cache/        public-domain source texts

Status

This is an experiment package, not an official SDK. The Codex endpoint, headers, model behavior, and limits can change. Treat the numbers here as grounded evidence for the recorded run, not a contract. Always test with needles, always inspect the marker crops, and only trust a layout that passes the retrieval task you actually care about.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
docs		docs
examples/gutenberg-cache		examples/gutenberg-cache
experiments		experiments
scripts		scripts
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tokenfiche

TL;DR

The two gates

How it works

Reproduce it

When this is worth it

Repo map

Status

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

tokenfiche

TL;DR

The two gates

How it works

Reproduce it

When this is worth it

Repo map

Status

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages