A deterministic, registry-free, 13-character identifier for books — computed from any title page, without a central authority.
USBN ::= 'U' <12 Crockford Base32 characters> (edition-level)
WSBN ::= 'W' <12 Crockford Base32 characters> (work-level, no year)
Every book published after 1970 has an ISBN. No book published before 1970 does. That contrast is the entire motivation for USBN.
The roughly sixty million editions published before the ISBN era have no machine-readable identifier. Alternatives like OCLC, LCCN, and Open Library identifiers help, but they all require a central registry — a book must first be cataloged by a member institution before it has an ID. A used-book dealer holding a 1923 volume that no institution has previously cataloged has no standard identifier to point to at all.
USBN solves this by computing the identifier from the book itself. Hash the title, author, and publication year as printed on the title page through a fixed algorithm, encode the result in a compact human-transcribable string, and you have a deterministic, registry-free identifier that any two people can produce independently from the same physical book.
This repository contains the v1.0 specification, the reference implementations in Python and JavaScript, a complete collision analysis, the canonical test vectors, and the Code4Lib Journal paper describing the design.
The live reference site is at openusbn.org — it hosts an in-browser converter, a long-form explainer, the rendered spec, and a downloadable copy of the paper.
# No dependencies. Pure Python stdlib.
from code.usbn import generate_usbn, generate_wsbn
generate_usbn("The Outline of History", "H. G. Wells", 1949)
# → 'UAZJA136WFYXF'
generate_wsbn("The Outline of History", "H. G. Wells")
# → 'WC17225YANQAM'
# Two printings of the same work share a WSBN but have distinct USBNs:
generate_usbn("The Outline of History", "H. G. Wells", 1961)
# → 'UQHJ8P28DXHRC' ← different edition
generate_wsbn("The Outline of History", "H. G. Wells")
# → 'WC17225YANQAM' ← same work
Or in the browser, at openusbn.org.
| | ISBN-13 | OCLC (OCN) | LCCN | USBN |
|---|---|---|---|---|
| Length | 13 digits | variable | variable | 13 chars |
| Registry required? | yes | yes | yes | no |
| Deterministic? | no | no | no | yes |
| Pre-1970 coverage? | no | partial | partial | yes * |
| Computable offline? | no | no | no | yes |
| Assignment fee? | yes | — | — | no |
| Check character? | yes | no | — | no |
| Work-level companion? | no | no | — | yes (WSBN) |
* USBN coverage depends on the availability of accurate title-page metadata; anything with a legible title, author, and year can receive a USBN.
title  ─┐
author ─┼─▶ normalize ─▶ BLAKE2s-8 ─▶ top 60 bits ─▶ Crockford Base32 ─▶ U + 12 chars
year   ─┘   (NFKD,       (64-bit      (>>4)          (encoded as 12      (13-char
             strip M,     digest)                     chars, no pad)      USBN)
             uppercase,
             collapse ws)
Every step is deterministic. The same title-page metadata produces the
same USBN on every machine, in every locale, every time. The full
normative algorithm is in spec/usbn_v1.0.md;
the paper (usbn.pdf) walks through why each step was
chosen and how the v1.0 design corrected four defects in the original
draft.
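The pipeline above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the reference implementation: the field separator (`"|"`), the big-endian byte order, and the helper names `normalize`, `encode_60_bits`, and `sketch_usbn` are assumptions, so its output is not expected to match the canonical test vectors. The normative serialization is defined in spec/usbn_v1.0.md.

```python
import hashlib
import re
import unicodedata

# Crockford Base32: 32 symbols, with I, L, O, and U left out.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def normalize(text: str) -> str:
    """NFKD-decompose, drop combining marks, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    return re.sub(r"\s+", " ", stripped.upper()).strip()


def encode_60_bits(value: int) -> str:
    """Encode a 60-bit integer as exactly 12 Crockford Base32 characters."""
    # Twelve 5-bit groups, most significant group first, no padding.
    return "".join(
        CROCKFORD[(value >> shift) & 0x1F] for shift in range(55, -1, -5)
    )


def sketch_usbn(title: str, author: str, year: int) -> str:
    """Hypothetical edition-level identifier following the diagram's steps."""
    # ASSUMPTION: fields are joined with "|"; the spec defines the real format.
    payload = "|".join([normalize(title), normalize(author), str(year)])
    digest = hashlib.blake2s(payload.encode("utf-8"), digest_size=8).digest()
    # ASSUMPTION: big-endian interpretation; keep the top 60 of the 64 bits.
    top60 = int.from_bytes(digest, "big") >> 4
    return "U" + encode_60_bits(top60)
```

Because every step is a pure function of its input, differences in case, accents, or spacing on the input side disappear before hashing, which is what makes independent catalogers converge on the same string.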
The Python reference has no dependencies — only the standard library. Drop it into any project:
git clone https://github.com/novalis78/USBN.git
cd USBN
python3 code/usbn.py
# USBN WSBN metadata
# --------------------------------------------------------------------------------
# UAZJA136WFYXF WC17225YANQAM The Outline of History / H. G. Wells / 1949
# ...
Or, more commonly, import it into your own code:
import sys; sys.path.insert(0, 'code')
from usbn import generate_usbn, generate_wsbn, parse_identifier
generate_usbn("Gödel, Escher, Bach", "Douglas R. Hofstadter", 1979)
# → 'UY10W4C9Z66C2'
parse_identifier(" uazja136wfyxf ") # case-insensitive, whitespace-tolerant
# → 'UAZJA136WFYXF'
One small dependency (blakejs, ≈4 kB):
npm install blakejs
const { generateUSBN, generateWSBN } = require('./code/usbn.js');
generateUSBN("The Outline of History", "H. G. Wells", 1949);
// → 'UAZJA136WFYXF'
The same module works in browsers when bundled (Astro, Vite, webpack, Rollup, esbuild — all standard). A live example runs at openusbn.org.
python3 -c "from code.usbn import generate_usbn; \
print(generate_usbn('The Outline of History', 'H. G. Wells', 1949))"
# UAZJA136WFYXF
Any conformant implementation MUST produce these identifiers for
these inputs. A machine-readable version is at
data/test_vectors.json.
| Title | Author | Year | USBN | WSBN |
|---|---|---|---|---|
| The Outline of History | H. G. Wells | 1949 | UAZJA136WFYXF | WC17225YANQAM |
| The Outline of History | H. G. Wells | 1961 | UQHJ8P28DXHRC | WC17225YANQAM ← |
| George Washington: A Biography | Douglas Southall Freeman | 1949 | UVKK6DS3YWESM | WGVKGH0WKR66C |
| College Calculus with Analytic Geometry | Murray H. Protter | 1964 | UGM4Y9KZVGYH7 | WDYNK8KP7FHSG |
| Über die Relativitätstheorie | Albert Einstein | 1916 | URAYHF9EDXKGQ | W718QV0NXA405 |
| The Elements of Style | W. Strunk Jr. & E. B. White | 1959 | U4TMJP8GE1DSF | W4NPQT7637D53 |
| The Joy of Cooking | (anonymous) | 1931 | UHPXVG93MX6HY | WPNJPWZYV2NDF |
The ← marks the two Wells printings: same work, different editions,
same WSBN, distinct USBNs.
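A port to another language can be checked against these vectors with a small harness. The sketch below is one possible shape for such a check; `Vector` and `check_conformance` are hypothetical names, the two generator functions are passed in by the caller, and only the first five rows of the table are included (the anonymous-author row is left out because the exact author string to pass is not shown here).

```python
from typing import Callable, List, NamedTuple, Sequence


class Vector(NamedTuple):
    title: str
    author: str
    year: int
    usbn: str
    wsbn: str


# Rows copied from the test-vector table above (subset).
VECTORS = [
    Vector("The Outline of History", "H. G. Wells", 1949,
           "UAZJA136WFYXF", "WC17225YANQAM"),
    Vector("The Outline of History", "H. G. Wells", 1961,
           "UQHJ8P28DXHRC", "WC17225YANQAM"),
    Vector("George Washington: A Biography", "Douglas Southall Freeman", 1949,
           "UVKK6DS3YWESM", "WGVKGH0WKR66C"),
    Vector("College Calculus with Analytic Geometry", "Murray H. Protter", 1964,
           "UGM4Y9KZVGYH7", "WDYNK8KP7FHSG"),
    Vector("Über die Relativitätstheorie", "Albert Einstein", 1916,
           "URAYHF9EDXKGQ", "W718QV0NXA405"),
]


def check_conformance(
    gen_usbn: Callable[[str, str, int], str],
    gen_wsbn: Callable[[str, str], str],
    vectors: Sequence[Vector] = VECTORS,
) -> List[str]:
    """Return human-readable mismatches; an empty list means every row matched."""
    failures = []
    for v in vectors:
        got_usbn = gen_usbn(v.title, v.author, v.year)
        got_wsbn = gen_wsbn(v.title, v.author)
        if got_usbn != v.usbn:
            failures.append(f"USBN {v.title!r} ({v.year}): {got_usbn} != {v.usbn}")
        if got_wsbn != v.wsbn:
            failures.append(f"WSBN {v.title!r}: {got_wsbn} != {v.wsbn}")
    return failures
```

With the Python reference on the path, `check_conformance(generate_usbn, generate_wsbn)` would be the natural call; for other languages the same loop can be driven from data/test_vectors.json.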
USBN v1.0 carries 60 bits of hash entropy, giving a birthday-bound 50% collision probability at approximately 1.26 billion entries — roughly eight times the estimated global book corpus.
| Corpus | Size | P(≥1 collision) |
|---|---|---|
| Typical library | 1 M | 4.3 × 10⁻⁵ % |
| Large union catalog | 10 M | 4.3 × 10⁻³ % |
| Pre-ISBN corpus (target) | 60 M | 0.16 % |
| Estimated global book corpus | 150 M | 0.97 % |
The full analytic table, an empirical probe across a 100,000-book
synthetic corpus, and a plotting script are in
code/collision_analysis.py. The paper
(§6) compares these numbers against four intermediate designs we
considered before settling on 60-bit USBN-13.
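The figures in the table follow from the standard birthday approximation, P ≈ 1 − exp(−n(n−1) / 2N) with N = 2⁶⁰. The sketch below recomputes them; it is an independent illustration, not an excerpt from code/collision_analysis.py, and the function names are assumptions.

```python
import math

HASH_BITS = 60
SPACE = 2 ** HASH_BITS  # number of distinct 60-bit values


def collision_probability(n: int, space: int = SPACE) -> float:
    """Birthday approximation for P(at least one collision) among n entries."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


def entries_for_probability(p: float, space: int = SPACE) -> float:
    """Approximate corpus size at which the collision probability reaches p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


CORPORA = [
    ("Typical library", 1_000_000),
    ("Large union catalog", 10_000_000),
    ("Pre-ISBN corpus (target)", 60_000_000),
    ("Estimated global book corpus", 150_000_000),
]

for label, size in CORPORA:
    percent = collision_probability(size) * 100
    print(f"{label:<30} {size:>12,}  {percent:.2g} %")

print(f"50% threshold: about {entries_for_probability(0.5):,.0f} entries")
```

The 50% threshold evaluates to roughly 1.26 billion entries, in line with the figure quoted above.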
usbn.tex / usbn.pdf            # Code4Lib Journal paper (source + compiled)
references.bib                 # BibTeX bibliography
code/
  usbn.py                      # Reference implementation (Python, stdlib only)
  usbn.js                      # Reference implementation (Node.js, blakejs)
  generate_test_vectors.py     # Regenerate data/test_vectors.json
  collision_analysis.py        # Birthday bound + empirical probe
  plot_collisions.py           # Figure generator (matplotlib)
spec/
  usbn_v1.0.md                 # Formal v1.0 specification (authoritative)
  usbn_spec_draft_v0.md        # Original April 2025 draft (preserved)
  usbn_article_draft_v0.md     # Original article draft (preserved)
data/
  test_vectors.json            # Canonical test vectors (version-stamped)
results/
  collision_analysis.json      # Analytic + empirical outputs
figures/
  collision_curves.pdf         # Figure 1 from the paper
  collision_curves.png
# Python reference + demo
python3 code/usbn.py
# Regenerate canonical test vectors
python3 code/generate_test_vectors.py
# Collision analysis: analytic table + empirical probe (100k books)
python3 code/collision_analysis.py
# Figure 1 (collision curves)
python3 code/plot_collisions.py
# JavaScript reference (requires blakejs)
cd code && npm install blakejs && node usbn.js
# Compile the paper
pdflatex usbn && bibtex usbn && pdflatex usbn && pdflatex usbn
Everything in data/, results/, and figures/ is deterministic:
re-running the scripts from a clean checkout will reproduce the same
bytes.
If you use USBN in your work, you can cite it as:
@misc{lopin2026usbn,
author = {Lopin, Lennart},
title = {{USBN}: A Deterministic, Registry-Free Identifier
for Pre-{ISBN} Books},
year = {2026},
month = apr,
howpublished = {\url{https://openusbn.org}},
note = {Reference implementation:
\url{https://github.com/novalis78/USBN}},
}
Or, once the Code4Lib Journal issue is published, the canonical
citation will be to that article; the pre-publication PDF is in this
repository at usbn.pdf.
- Determinism. Two catalogers, same title page, same USBN. No registry, no network, no coordination.
- Brevity. Thirteen characters, matching ISBN-13. Drop-in compatibility with MARC fields, catalog UIs, and URL slots.
- Human transcribability. Case-insensitive (single-case Crockford Base32 alphabet). No character pair is ambiguous in any common font. You can copy a USBN from a napkin.
- Collision resistance. Sixty bits of entropy. Any realistic catalog is effectively collision-free; even the estimated 150-million-book global corpus sees only about 0.01 collisions in expectation.
- Work/edition distinction. USBN identifies a particular printing; the companion WSBN identifies the underlying work, enabling FRBR-style grouping without a second lookup.
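To illustrate the transcribability goal, the sketch below canonicalizes a hand-copied identifier. It is a hypothetical helper, not the reference `parse_identifier`: case folding and whitespace tolerance are described in this README, while hyphen removal and the O→0, I/L→1 substitutions are borrowed from Crockford's general decoding rules and are assumptions as far as USBN itself is concerned.

```python
import re

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# ASSUMPTION: Crockford's look-alike folding applies to the 12-character body.
ALIASES = str.maketrans({"O": "0", "I": "1", "L": "1"})


def clean_transcription(raw: str) -> str:
    """Return the canonical 13-character form of a copied USBN/WSBN, or raise."""
    # Drop whitespace and hyphens anywhere in the string, then uppercase.
    text = re.sub(r"[\s-]+", "", raw).upper()
    if len(text) != 13:
        raise ValueError(f"expected 13 characters, got {len(text)}")
    prefix, body = text[0], text[1:].translate(ALIASES)
    if prefix not in "UW":
        raise ValueError(f"unknown prefix {prefix!r}")
    invalid = sorted({ch for ch in body if ch not in CROCKFORD})
    if invalid:
        raise ValueError(f"characters outside the alphabet: {invalid}")
    return prefix + body


print(clean_transcription(" uazja136wfyxf "))    # → UAZJA136WFYXF
print(clean_transcription("UAZJA-I36W-FYXF"))    # → UAZJA136WFYXF
```

Since v1.0 has no check character, a helper like this can catch malformed strings but not a plausible-looking miscopy; verifying against the title-page metadata remains the only full check.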
This v1.0 release is a substantial revision of an unpublished April 2025 draft, which contained four subtle defects. They are not hidden; §3 of the paper documents them in detail.
| # | Defect | v1.0 resolution |
|---|---|---|
| 1 | Example USBNs in the draft contained the 0 digit, a character the alphabet was supposed to exclude | Crockford Base32 alphabet (which legitimately includes 0 as its zero value) |
| 2 | Alphabet counted as 32 chars but actually had 33, leaving base-32 arithmetic ambiguous | Exactly 32 characters, with a published value table |
| 3 | 48-bit digest → ~99.8% collision probability at pre-ISBN scale | 60-bit digest → 0.16% at pre-ISBN scale |
| 4 | No work-level identifier | WSBN companion: same algorithm, W prefix, no year |
All four fixes fit within the same thirteen-character length. The v1.0 budget is the same as the draft's; the difference is that it is used well.
- openusbn.org — the official reference site: in-browser converter, long-form explainer, rendered spec, paper download.
- The paper — formal presentation, design rationale, collision analysis, and complete test vectors. Submitted to the Code4Lib Journal.
- Crockford Base32 — Douglas Crockford's human-transcribable base-32 alphabet, which USBN uses for its case-insensitive encoding.
- BLAKE2s (RFC 7693) — the cryptographic hash function used by USBN. Fast, modern, with configurable output length.
- Bitcoin (Nakamoto, 2008) — the design precedent for deterministic, registry-free identifiers encoded in a human-friendly alphabet.
Small fixes, alternative-language ports, and implementation-conformance reports are all welcome via pull request or issue.
One non-negotiable invariant enforced by CI and by the pre-commit hook:
Every USBN or WSBN displayed anywhere in this repository — the paper, this README, the live site, the test-vector file, the reference implementations — MUST be reproducible from exactly the metadata shown next to it.
This was the defect class that motivated the v1.0 rewrite, and the whole point of a deterministic identifier is that a reader can verify any displayed identifier by running the reference implementation on the displayed metadata. A committed change that violates this invariant undermines the entire project.
The invariant is mechanically enforced by
code/audit_displayed_vectors.py,
which enumerates every displayed (metadata, USBN, WSBN) triple and
verifies each against the reference implementation. When you add a
new displayed example anywhere, add a corresponding row to that
script in the same commit.
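As a complement to the audit, a contributor can list every identifier-shaped token in a file before committing. The sketch below is a hypothetical helper, not part of code/audit_displayed_vectors.py: it only finds candidates by shape (a `U` or `W` prefix followed by 12 Crockford Base32 symbols) and does not verify them against metadata.

```python
import re
from typing import Iterator, Tuple

# Prefix U or W, then 12 symbols from 0-9 and A-Z minus I, L, O, U.
IDENTIFIER = re.compile(r"\b[UW][0-9A-HJKMNP-TV-Z]{12}\b")


def find_identifiers(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, identifier) for each USBN/WSBN-shaped token."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in IDENTIFIER.finditer(line):
            yield lineno, match.group()


sample = """generate_usbn("The Outline of History", "H. G. Wells", 1949)
# → 'UAZJA136WFYXF'
| The Outline of History | H. G. Wells | 1961 | UQHJ8P28DXHRC | WC17225YANQAM |
"""

for lineno, identifier in find_identifiers(sample):
    print(lineno, identifier)
```

Any token this reports that has no matching row in the audit script is a sign that a displayed example was added without its verification entry. The pattern can produce false positives on ordinary 13-letter uppercase words, so the output is a prompt for review rather than a verdict.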
To run the audit locally:
python3 code/audit_displayed_vectors.py
# AUDIT PASSED — all 27 displayed identifiers verify against reference.
To enable automatic enforcement on every git commit (recommended):
pip install pre-commit
pre-commit install
The .pre-commit-config.yaml in the repo root also runs the standard
whitespace and JSON/YAML syntax checks. Every push to main and every
pull request also triggers the audit in GitHub Actions (see
.github/workflows/audit.yml).
The reference implementation and test vectors are released under the
MIT License — see LICENSE.
The paper text, figures, and bibliographic prose are © 2026 Lennart Lopin / Euler's Identity LLC, all rights reserved except as needed for academic fair use and Code4Lib Journal publication.
Maintained by Euler's Identity LLC. Questions, corrections, implementation reports → open an issue.