Library for dissertation



Collection of functions used in my dissertation, A Gospel of Health and Salvation.

Available sections of the module are:

  • clean -- code for cleaning messy OCR
  • models -- code for creating the topic modeling pipeline
  • phrases -- collection of most common noun phrases in corpus
  • preprocess -- prepare text for modeling with Mallet
  • reports -- code for isolating particular elements from the data about the corpus
  • utilities -- helper functions for executing the above tasks


To generate error rate statistics:

from text2topics import reports

reports.process_directory(directory, spelling_dictionary)
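As a rough illustration of what an error rate statistic can look like, here is a minimal sketch that assumes the rate is the share of tokens not found in the spelling dictionary. The function name error_rate and the tokenization rule are hypothetical; this is not the code behind reports.process_directory, whose exact behavior is not documented here.

```python
import re


def error_rate(text, spelling_dictionary):
    """Hypothetical helper, not part of text2topics.

    Returns the fraction of alphabetic tokens in `text` that are
    missing from `spelling_dictionary` (assumed to hold lowercase words).
    """
    # Assumption: a token is any run of ASCII letters, compared in lowercase.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = set(spelling_dictionary)
    unknown = [token for token in tokens if token not in known]
    return len(unknown) / len(tokens)
```

Called on a single page of OCR text with a set of verified words, this returns a value between 0 and 1; a directory-level report would repeat the calculation for each file.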

To create a spelling dictionary from text files:

from text2topics import utilities

utilities.create_spelling_dictionary(directory, wordlists)

Here, wordlists is a list of the file(s) containing the verified words, and directory is the directory where those wordlist files reside. The function converts all words to lowercase and returns a list of the unique entries.
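The behavior described above (read the wordlists, lowercase every word, keep unique entries) can be sketched roughly as follows. This is a stand-in written for illustration, not the text2topics implementation: the name build_spelling_dictionary, the one-word-per-line file format, the UTF-8 encoding, and the sorted output are all assumptions.

```python
import os


def build_spelling_dictionary(directory, wordlists):
    """Hypothetical stand-in for utilities.create_spelling_dictionary."""
    words = set()
    for filename in wordlists:
        path = os.path.join(directory, filename)
        # Assumption: each wordlist is UTF-8 text with one word per line.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                word = line.strip().lower()
                if word:  # skip blank lines
                    words.add(word)
    # Sorting is only for reproducible output; the set already removes duplicates.
    return sorted(words)
```

Collecting the words in a set means that an entry appearing in several wordlists, or in different capitalizations, is kept only once.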


To install, navigate to the root directory of the module (text2topics/) and run:

pip install .

To update an existing installation, run the following from the same directory:

pip install --upgrade .