Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


Brewer's Dictionary of Phrase and Fable - Text Cleaning and Analysis


To convert a really messy public domain text file of Brewer's Dictionary into a clean csv file with the columns: Entry, Definition With the text more cleanly parsed in this structure, the text is ready for further analysis.

About the Dictionary

Brewer's Dictionary of Phrase and Fable, as its title states, is a reference for phrases and fables. It was first published in 1890 by John Brewer and contains countless stimulating tidbits. More information about the dictionary can be found on its wiki page:

A public domain version of this dictionary is available on


We were driving on a cool November night and a 30-year-old recording of Casey Kasem's American Top 40 was playing on the 80s station. To preface the next song, Stevie Wonder's Skeletons (, he read an excerpt from Brewer's Dictionary about skeletons

The family skeleton, or the skeleton in the cupboard.Some domestic secret that the whole family conspires to keep to itself; every family is said to have at least one.The story is that someone without a single care or trouble in the world had to be found.After long and unsuccessful search a lady was discovered whom all thought would "fill the bill"; but to the great surprise of the inquirers, after she had satisfied them on aH points and the quest seemed to be achieved, she took them upstairs and there opened a closet which con-tained a human skeleton."I try,*' said she, "to keep niy trouble to myself, but every night rny husband compels me to kiss that skeleton."She then explained that the skeleton was once her husband's rival, killed in a duel.

After hearing that, I felt like I needed more useless but interesting trivia in my life so I went searching for the Brewer's Dictionary of Phrase and Fable online. There's a lot more stuff like that in the dictionary. Hopefully this project should make it a little easier to track down entries, use the text for your own projects, and perform your own analysis (NLP??).


Python and NLTK's tokenizers are leveraged to clean up this text. Some general heuristics are used to define definitions in the text. Some of the definition text appears as entries because of idiosyncracies with the document. If you have some clever ways of further cleaning the text please contribute!

Main Files

  1. brewers_trim.txt - Raw text with beginning text manually removed from document.
  2. - Script to parse out entries from the text
  3. clean_brewers.csv - Output comma separated file with format entry, definition


Brewer's Dictionary of Phrase and Fable - Text Cleaning and Analysis







No releases published


No packages published