🙈 assistant for hunting down tpyos



  • Saw a couple of typos while reading a fantasy book and wondered why they weren't caught
  • Felt like a good mini-project to improve my Python and programming skills


  1. Compare a list of dictionary words with the words extracted from an e-book, using Python
    • as of now, it works on docx/epub formats
  2. The generated output has to be checked manually, because it also contains
    • in-world terms like names, places, etc
    • words not found in the reference dictionary
    • hyphenated words
  3. These words can then be added to the reference word lists so that further runs reveal only typos
  4. Repeat steps 1-3 when input documents change
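
A minimal sketch of step 1, assuming the text has already been extracted from the e-book into a plain string. The names `WORD_RE` and `find_unknown_words` are hypothetical and only illustrate the idea; the project's own script may tokenize differently (its logs suggest it keeps tokens such as `T15:37:31Z` intact).

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, optionally joined by
# apostrophes or hyphens (so "full-fledged" stays one token).
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def find_unknown_words(text, ref_words):
    """Count the words in `text` that are absent from `ref_words`.

    The comparison is case-insensitive, which is an assumption of this
    sketch and not necessarily what the project does.
    """
    known = {word.lower() for word in ref_words}
    counts = Counter()
    for word in WORD_RE.findall(text):
        if word.lower() not in known:
            counts[word] += 1
    return counts


if __name__ == "__main__":
    reference = {"this", "is", "a", "sentence", "with", "and"}
    sample = "This is a samlpe sentence with a tpyo and a tpyo."
    for word, count in sorted(find_unknown_words(sample, reference).items()):
        print(f"{word}: {count}")
```

Every word reported this way still needs the manual check from step 2, since the sketch cannot tell an in-world name from a typo.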


  • Use the program at your own risk
    • files/directories are read/created programmatically, so a bug could corrupt your system
    • I only have Linux, so I don't know how it'll behave on other operating systems
  • At best, the project could be said to be at alpha stage


For Linux and Unix-like systems

First, clone the repo or download the zip

$ git clone

$ cd tpyo_revealo/
$ mkdir ref_words input_doc
$ # multiple documents and reference lists can be put in these directories
$ cp samples/sample.docx input_doc/
$ cp /usr/share/dict/words ref_words/words.txt

$ # this will create a log directory using current time as directory name
$ python3

$ cat 2017-12-20_15_38_07.341621/hyphenated_words.log
en-IN: 1
full-fledged: 1
$ cat 2017-12-20_15_38_07.341621/tpyo_words.log
LibreOffice/$Linux_X: 1
LibreOffice_project/20m0$Build: 1
rny: 1
samlpe: 1
T15:37:31Z: 1
tpyo: 1
wordswithoutspace: 1

$ # create ignore lists and run again
$ cat > ref_words/ignore.txt
$ echo 'full-fledged' > ref_words/hyphenated_words.txt

$ python3
$ cat 2017-12-20_15_40_45.505735/hyphenated_words.log
$ cat 2017-12-20_15_40_45.505735/tpyo_words.log
rny: 1
samlpe: 1
tpyo: 1
wordswithoutspace: 1
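
The session above can be approximated in Python as follows. This is only a sketch under stated assumptions: `split_report` and `format_log` are hypothetical helpers, and the rule that a word listed in any ignore file is dropped from both logs is a guess, not something the session confirms.

```python
def split_report(unknown, ignored=()):
    """Split unknown-word counts into (hyphenated, others).

    `unknown` maps word -> count; `ignored` holds words collected from
    the ignore lists. Ignored words are dropped from both results
    (an assumption of this sketch).
    """
    ignored = set(ignored)
    hyphenated, others = {}, {}
    for word, count in unknown.items():
        if word in ignored:
            continue
        target = hyphenated if "-" in word else others
        target[word] = count
    return hyphenated, others


def format_log(counts):
    """Render counts as 'word: count' lines, sorted case-insensitively."""
    ordered = sorted(counts.items(), key=lambda item: item[0].lower())
    return "\n".join(f"{word}: {count}" for word, count in ordered)


if __name__ == "__main__":
    unknown = {"en-IN": 1, "full-fledged": 1, "rny": 1, "samlpe": 1}
    hyphenated, others = split_report(unknown, ignored={"full-fledged"})
    print(format_log(hyphenated))
    print(format_log(others))
```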

Where to get word lists

  • this Stack Overflow Q&A might help
  • aspell looked good (mentioned in the above link)
    • American/British/Canadian/Australian spellings
    • SCOWL size 95, Variants 3, Diacritic stripped gives 660+K words
      • The script finished in less than 3 seconds for the Oathbringer book (450+K words) against 660+K reference words, so performance is not an issue
    • Can be downloaded for both Windows/Unix
    • See scowl-readme for more details including usage and license
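
One plausible way to combine several word lists is to read every file in the reference directory into a single set, since set membership tests take roughly constant time even with hundreds of thousands of entries. The sketch below assumes one word per line in `*.txt` files; `load_reference_words` is a hypothetical name, not the project's API.

```python
from pathlib import Path


def load_reference_words(directory):
    """Return the set of words found in all *.txt files in `directory`.

    Assumes one word per line; blank lines are skipped and undecodable
    bytes are replaced so that a stray encoding issue does not abort
    the run.
    """
    words = set()
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                word = line.strip()
                if word:
                    words.add(word)
    return words


if __name__ == "__main__":
    reference = load_reference_words("ref_words")
    print(f"{len(reference)} reference words loaded")
```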


  • Better parsing for xhtml files. As of now plain xml extraction is used, so markup inside a word, like T<span class="XXX">HOSE, messes things up
  • Code organization - need to break the code up into different functions, etc
  • Features - detect repeated words, adverbs repeated in a short space, etc
  • Look into NLTK
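
For the xhtml parsing item, one possible approach is to join text across inline tags and insert a space only at block-level tags. The sketch below uses the standard library's `html.parser`; the `BLOCK_TAGS` set, `TextExtractor` and `xhtml_to_text` are assumptions made for illustration and are not part of the project.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as word boundaries; inline tags such
# as <span>, <em> or <b> are deliberately left out.
BLOCK_TAGS = {
    "p", "div", "br", "li", "ul", "ol", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "title",
}


class TextExtractor(HTMLParser):
    """Collect text, adding a space at block tags but not at inline ones."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def xhtml_to_text(markup):
    """Return the visible text of `markup` with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    print(xhtml_to_text('<p>T<span class="XXX">HOSE</span> words</p>'))
```

With this approach a drop cap wrapped in a span is rejoined with the rest of its word, while separate paragraphs still yield separate words.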


  • Open an issue for suggestions, feature requests, bugs, etc


MIT, see LICENSE file
