A curated Latin word list of 134,154 unique forms, analogous to the Unix /usr/share/dict/words file.
- One word per line
- UTF-8 encoded
- Alphabetically sorted
- Normalized: v→u, j→i
- Corpus-validated against 375M+ tokens of Latin text
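
Because entries are normalized (v→u, j→i), a query such as "vir" only matches if it is normalized the same way before lookup. The following is a minimal sketch of that step, not part of the project itself: the helper names `normalize_latin` and `is_latin` are hypothetical, and the default path `verba.txt` is an assumption. It deliberately does not change letter case, since the list's casing conventions are not stated here.

```shell
# Hypothetical helper: apply the list's normalization (v→u, j→i) to a query.
# Uppercase V and J are mapped as well; no other case changes are made.
normalize_latin() {
  printf '%s\n' "$1" | tr 'vjVJ' 'uiUI'
}

# Hypothetical helper: exact whole-line lookup of the normalized query.
# $1 is the word; $2 is the word list path (assumed default: verba.txt).
# -F treats the query as a fixed string, -x requires a full-line match.
is_latin() {
  grep -Fxq -- "$(normalize_latin "$1")" "${2:-verba.txt}"
}
```

With these defined, `is_latin "vir" && echo "found"` looks up the normalized form `uir` rather than the raw query. Using `-F -x` instead of an anchored regular expression avoids surprises if a query contains characters that are special in regular expressions.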
# Check if a word is Latin
grep -q "^amicus$" verba.txt && echo "found"
# Count words
wc -l verba.txt
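# Use as a spell-check dictionary (requires an installed Latin aspell dictionary;
# aspell personal word lists normally begin with a "personal_ws-1.1 la 0" header line,
# so one may need to be added to a copy of the file first)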
aspell --lang=la --personal=./verba.txt check mytext.txt

- 134,154 unique word forms
- Validated against 975,803 forms from LatinCy word lists (Wiktionary + UD treebanks)
- Validated against a 375M-token Latin corpus (CC100, Wikipedia, Wikisource, Perseus, Tesserae, Latin Library, CAMENA, Patrologia Latina, UD treebanks)
If you find an incorrect, missing, or spurious entry in the word list, please open an issue.
CC0 1.0 Universal — see LICENSE.
If you use this word list in research, please cite:
@dataset{burns_verba_2026,
author = {Burns, Patrick J.},
title = {Verba: A Curated Latin Word List for {NLP} Applications},
year = {2026},
url = {https://github.com/latincy/verba},
version = {0.1.1},
note = {134,154 unique Latin word forms derived from LatinCy word lists and validated against a 375M-token corpus}
}