README.md
bag_of_words.py
load_federalist_corpus.py
n_gram_language_models.py
pg18.txt


Natural-Language-Processing

Two techniques for analyzing free text documents: a bag of words approach based upon creating a TFIDF matrix, and an n-gram language model.
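
As a rough illustration of the two techniques, here is a minimal sketch using only the Python standard library. It is not the code from `bag_of_words.py` or `n_gram_language_models.py`. The toy corpus, the function names `tfidf_matrix` and `bigram_model`, and the specific weighting (term frequency times `log(N / df)`, with no smoothing) are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpus, standing in for the real documents.
docs = [
    "the people of the state",
    "the powers of the union",
    "a union of the people",
]
tokenized = [d.split() for d in docs]


def tfidf_matrix(token_lists):
    """Return (vocab, rows); rows[i][j] is the TF-IDF weight of vocab[j] in doc i."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    # Document frequency: how many documents contain each term.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([
            (counts[t] / len(tokens)) * math.log(n_docs / df[t])
            for t in vocab
        ])
    return vocab, rows


def bigram_model(token_lists):
    """Return prob(word, prev): the unsmoothed estimate of P(word | prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in token_lists:
        # Pad with start/end markers so sentence boundaries are modeled too.
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # unseen context; a real model would smooth instead
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


vocab, matrix = tfidf_matrix(tokenized)
prob = bigram_model(tokenized)
```

In this sketch a word that appears in every document (such as "the") gets a TF-IDF weight of zero, which is why the bag of words approach downweights common words. The bigram model simply counts how often each word follows another. Both would need smoothing and better tokenization before being used on real text.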

The dataset

The Federalist Papers are a series of essays written in 1787 and 1788 by Alexander Hamilton, James Madison, and John Jay to promote the ratification of the U.S. Constitution. They were published anonymously at the time. If you're curious, you can read more about them here: https://en.wikipedia.org/wiki/The_Federalist_Papers . They make a particularly interesting data set: although the authorship of most of the essays has been known since around the deaths of Hamilton and Madison, the authorship of certain essays was still in question into the 20th century.
