This is a robust English wordlist for encoding data. It was inspired by previous natural language encoding systems, such as What3Words, BIP-0039, and Diceware.
This wordlist was created for the Mnemonikey project.
Each word in this list represents one of 4096 equally possible options. As per Claude Shannon's theory of information, a word from this list encodes 12 bits of information, since
But this wordlist is not composed of any old random words. The list has been carefully constructed to fulfill the following properties:
Property | Advantage |
---|---|
Avoids more than one homophone - words which sound the same but have different spelling. (e.g. beat and beet ) |
Avoids ambiguity and confusion when orally communicating words from the list. |
Avoids more than one conceptually competitive word (e.g. shook and shake both refer to the same action in a different tense) |
This prevents conceptual confusion, to allow words to be more easily memorized without mnemonic ambiguity. |
Contains no words shorter than 3 characters or longer than 8 characters | Long words add complexity. Small words reduce uniqueness. |
Every word is uniquely identifiable by at most the first four characters. | Allows fast automated interpretation when typing words as input. |
The Damerau-Levenshtein distance between every word in the wordlist is at least 2. | Reduces the risk of typos which could convert one valid word into another valid one by accident. |
This project was inspired by the following natural-language encoding projects:
The tooling for this package requires a Golang compiler.
For the moment, wordlist4096
is not finalized. I'm happy to accept PRs to improve the wordlist. Please ensure your changes fulfill these requirements:
- Any changes must pass tests. To run tests, use
go test
(ormake validate
). - Any new words must not contravene the heuristic properties discussed in the Specification section.
- Any word deletions or changes must be justifiable.
Be aware that discussions about what words are 'memorable' or 'confusing' may be highly subjective. In review, I may veto any arbitrary decision between words, simply to save time.
- To validate the computable properties of the wordlist, run
make validate
. - To suggest new possible words to add to the wordlist, run
make suggest
. (This pulls from/usr/share/dict/words
, only available on unix systems) - To sort the wordlist and deduplicate words, run
make tidy
.