# Natural Language Processing
## Introduction

In this problem we will develop two techniques for analyzing free text documents: a bag of words approach based upon creating a TFIDF matrix, and an n-gram language model.

We will be applying our models to the text from the Federalist Papers.  The Federalist papers were a series of essay written in 1787 and 1788 by Alexander Hamilton, James Madison, and John Jay (they were published anonymously at the time), that promoted the ratification of the U.S. Constitution.  If you're curious, you can read more about them here: https://en.wikipedia.org/wiki/The_Federalist_Papers . They are a particularly interesting data set, because although the authorship of most of the essays has been long known since around the deaths of Hamilton and Madison, there was still some question about the authorship of certain articles into the 20th century.  We have used document vectors and language models to do this analysis.

## The dataset

We will use a copy of the Federalist Papers downloaded from Project Guttenberg, available here: http://www.gutenberg.org/ebooks/18 .  Specifically, the "pg18.txt" file is the raw text downloaded from Project Guttenberg.

In [1]:
import re

def load_federalist_corpus(filename):
    """ Load the federalist papers as a tokenized list of strings, one for each eassay"""
    with open(filename, "rt") as f:
        data = f.read()
    papers = data.split("FEDERALIST")
    
    # all start with "To the people of the State of New York:" (sometimes . instead of :)
    # all end with PUBLIUS (or no end at all)
    locations = [(i,[-1] + [m.end()+1 for m in re.finditer(r"of the State of New York", p)],
                 [-1] + [m.start() for m in re.finditer(r"PUBLIUS", p)]) for i,p in enumerate(papers)]
    papers_content = [papers[i][max(loc[1]):max(loc[2])] for i,loc in enumerate(locations)]

    # discard entries that are not actually a paper
    papers_content = [p for p in papers_content if len(p) > 0]

    # replace all whitespace with a single space
    papers_content = [re.sub(r"\s+", " ", p).lower() for p in papers_content]

    # add spaces before all punctuation, so they are separate tokens
    punctuation = set(re.findall(r"[^\w\s]+", " ".join(papers_content))) - {"-","'"}
    for c in punctuation:
        papers_content = [p.replace(c, " "+c+" ") for p in papers_content]
    papers_content = [re.sub(r"\s+", " ", p).lower().strip() for p in papers_content]
    
    authors = [tuple(re.findall("MADISON|JAY|HAMILTON", a)) for a in papers]
    authors = [a for a in authors if len(a) > 0]
    
    numbers = [re.search(r"No\. \d+", p).group(0) for p in papers if re.search(r"No\. \d+", p)]
    
    return papers_content, authors, numbers
    
    

In [2]:
# AUTOLAB_IGNORE_START
papers, authors, numbers = load_federalist_corpus("pg18.txt")
# AUTOLAB_IGNORE_STOP

The `papers` object is a list of strings, each one containing the full content of one of the Federalist Papers.  All tokens (words) in the text are separated by a single space (this includes some puncutation tokens, which have been modified to include spaces both before and after the punctuation. The `authors` object is a list of lists, which each list contains the author (or potential authors) of a given paper.  Finally the `numbers` list just contains the number of each Federalist paper.  You won't need to use this last one, but you may be curious to compare the results of your textual analysis to the opinion of historians.

## Bag of words, and TFIDF

In this portion of the question, we use a bag of words model to describe the corpus, and write routines to build a TFIDF matrix and a cosine similarity function.  Specifically, we first implement the TFIDF function below.  This should return a _sparse_ TFIDF matrix.

In [106]:
import collections # optional, but we found the collections.Counter object useful
import scipy.sparse as sp
import numpy as np

#Important: make sure you do _not_ include the empty token `""` as one of your terms.
def tfidf(docs):
    """
    Create TFIDF matrix.  This function creates a TFIDF matrix from the
    docs input.

    Args:
        docs: list of strings, where each string represents a space-separated
              document
    
    Returns: tuple: (tfidf, all_words)
        tfidf: sparse matrix (in any scipy sparse format) of size (# docs) x
               (# total unique words), where i,j entry is TFIDF score for 
               document i and term j
        all_words: list of strings, where the ith element indicates the word
                   that corresponds to the ith column in the TFIDF matrix
    """
    all_words = {word for doc in docs for word in doc.split(" ")}
    wordIndexDict = {word:i for i,word in enumerate(list(all_words))}
    
    docWordCounts = {}
    for i,doc in enumerate(docs):
        words = doc.split(" ")
        counts = collections.Counter(words)
        docWordCounts[i] = dict(counts)
    df = {}

    for doc, wordCountDict in docWordCounts.items():
        for word in wordCountDict.keys():
            if word in df:
                df[word] += 1
            else:
                df[word] = 1
    print (df)
                    
    col  = np.array([wordIndexDict[word] for doc, wordCountDict in docWordCounts.items() for word in wordCountDict.keys()])
    row  = np.array([doc for doc, wordCountDict in docWordCounts.items() for word in wordCountDict.keys()])

    data = np.array([count*np.log(len(docs)*1.0/df[word]) for doc, wordCountDict in docWordCounts.items() for word,count in wordCountDict.items()])

    A = sp.coo_matrix((data, (row,col)))
    A = A.tocsr()
    A.eliminate_zeros()
    
    
    
    all_words = list(all_words)
    return A , all_words
    

In [11]:
# AUTOLAB_IGNORE_START
A, all_words= tfidf(papers)
# AUTOLAB_IGNORE_STOP



Our version results the following result (just showing the type, size, and # of non-zero elements):

    <86x8686 sparse matrix of type '<type 'numpy.float64'>'
        with 57607 stored elements in Compressed Sparse Row format>
     
For testing, we also run the algorithm on the following simpler "dataset" :

In [12]:
### AUTOLAB_IGNORE_START
data = [
    "the goal of this lecture is to explain the basics of free text processing",
    "the bag of words model is one such approach",
    "text processing via bag of words"
]

X_tfidf, words = tfidf(data)
print (X_tfidf.todense())
print (words)

### AUTOLAB_IGNORE_STOP

{'the': 2, 'goal': 1, 'of': 3, 'this': 1, 'lecture': 1, 'is': 2, 'to': 1, 'explain': 1, 'basics': 1, 'free': 1, 'text': 2, 'processing': 2, 'bag': 2, 'words': 2, 'model': 1, 'one': 1, 'such': 1, 'approach': 1, 'via': 1}
[[1.09861229 1.09861229 0.81093022 0.         0.40546511 0.
  1.09861229 0.40546511 1.09861229 1.09861229 0.40546511 0.
  1.09861229 0.         0.         0.         0.         0.
  1.09861229]
 [0.         0.         0.40546511 1.09861229 0.40546511 0.40546511
  0.         0.         0.         0.         0.         0.40546511
  0.         0.         1.09861229 1.09861229 0.         1.09861229
  0.        ]
 [0.         0.         0.         0.         0.         0.40546511
  0.         0.40546511 0.         0.         0.40546511 0.40546511
  0.         0.         0.         0.         1.09861229 0.
  0.        ]]
['goal', 'this', 'the', 'approach', 'is', 'words', 'to', 'text', 'basics', 'explain', 'processing', 'bag', 'free', 'of', 'such', 'model', 'via', 'one', 'lect

For our implementation, this returns the following output:

    [[ 0.          0.          1.09861229  1.09861229  0.          1.09861229
       0.          0.40546511  0.40546511  1.09861229  0.          1.09861229
       0.          0.          0.40546511  1.09861229  0.81093022  0.
       1.09861229]
     [ 1.09861229  1.09861229  0.          0.          0.40546511  0.          0.
       0.40546511  0.          0.          1.09861229  0.          0.
       0.40546511  0.          0.          0.40546511  1.09861229  0.        ]
     [ 0.          0.          0.          0.          0.40546511  0.          0.
       0.          0.40546511  0.          0.          0.          1.09861229
       0.40546511  0.40546511  0.          0.          0.          0.        ]]
    ['model', 'such', 'basics', 'goal', 'bag', 'this', 'of', 'is', 'processing', 'free', 'one', 'to', 'via', 'words', 'text', 'lecture', 'the', 'approach', 'explain']

Next, we implement the following simply function that takes the X_tfidf matrix (though it could also take simple term frequency matrices, etc), and compute a matrix of all pair-wise cosine similarities.

Given 2 text documents `x` and `y` represented by TFIDF vectors their Cosine Similarity is calculated by:

$$
\large{Cosine\_Similarity = \frac{x^Ty}{||x||_2||y||_2}}
$$


In [13]:
def cosine_similarity(X):
    """
    Return a matrix of cosine similarities.
    
    Args:
        X: sparse matrix of TFIDF scores or term frequencies
    
    Returns:
        M: dense numpy array of all pairwise cosine similarities.  That is, the 
           entry M[i,j], should correspond to the cosine similarity between the 
           ith and jth rows of X.
    """
    numdocs = X.get_shape()[0]
    M = X.copy().tocoo()
    M.data = M.data**2.0
    sumsq = np.array(M.sum(axis=1))
    rtsq = sumsq**.5
    invRtsq = 1.0/rtsq
    invRtsq_xpd = np.array([invRtsq[row][0] for row in M.row])
    M.data= M.data**.5*invRtsq_xpd
    return M.dot(M.transpose()).todense()

In [74]:
# AUTOLAB_IGNORE_START
M = cosine_similarity(X_tfidf)
#print(M)
M = cosine_similarity(A)
print(M)
# AUTOLAB_IGNORE_STOP

[[1.         0.09233858 0.06292452 ... 0.05999312 0.07779552 0.19045773]
 [0.09233858 1.         0.0956748  ... 0.05403348 0.10726189 0.11176058]
 [0.06292452 0.0956748  1.         ... 0.09040971 0.07130256 0.08910432]
 ...
 [0.05999312 0.05403348 0.09040971 ... 1.         0.12917947 0.09258297]
 [0.07779552 0.10726189 0.07130256 ... 0.12917947 1.         0.15379717]
 [0.19045773 0.11176058 0.08910432 ... 0.09258297 0.15379717 1.        ]]


If we apply this function to the simpler text data:

    M = cosine_similarity(X_tfidf)
    
we get the result presented in the slides:

    [[ 1.          0.06796739  0.07771876]
     [ 0.06796739  1.          0.10281225]
     [ 0.07771876  0.10281225  1.        ]]

# Disputed essays

The authorship of seventy-three of The Federalist essays is fairly certain. Twelve of these essays are disputed over by some scholars, though the modern consensus is that Madison wrote essays Nos. 49–58, with Nos. 18–20 being products of a collaboration between him and Hamilton; No. 64 was by John Jay. The first open designation of which essay belonged to whom was provided by Hamilton who, in the days before his ultimately fatal gun duel with Aaron Burr, provided his lawyer with a list detailing the author of each number. This list credited Hamilton with a full sixty-three of the essays (three of those being jointly written with Madison), almost three-quarters of the whole, and was used as the basis for an 1810 printing that was the first to make specific attribution for the essays.

Madison did not immediately dispute Hamilton's list, but provided his own list for the 1818 Gideon edition of The Federalist. Madison claimed twenty-nine numbers for himself, and he suggested that the difference between the two lists was "owing doubtless to the hurry in which (Hamilton's) memorandum was made out." A known error in Hamilton's list — `Hamilton incorrectly ascribed No. 54 to John Jay, when in fact, Jay wrote No. 64` — provided some evidence for Madison's suggestion.

We use this model to analyze potential authorship of the unknown Federalist Papers.  Specifically, we compute the average cosine similarity between all the _known_ Hamilton papers and all the _unknown_ papers (and similarly between known Madison and unknown, and Jay and unknown).  We also analyze the similarity between Jay's papers and paper No. 54 and No. 64 to check which one of the two is more likely to be wriiten by John Jay.

In [87]:
unknown = [x for x in range(48,57)]
unknown += [18,19,20]
known = [x for x in range(86) if x not in unknown]
ham = [i for i in range(86) if authors[i]==('HAMILTON',)]
mad = [i for i in range(86) if authors[i]==('MADISON',)]
jay = [i for i in range(86) if authors[i]==('JAY',)]
known_vs_unknown_cosine_similarity = 0 
hamilton_cosine_similarity = 0 
madison_cosine_similarity = 0
jay_cosine_similarity = 0
n=0;
# Cosine similarity between known and unknown documents
for i in known:
    for j in unknown:
        n+=1
        known_vs_unknown_cosine_similarity+=M[i,j]
known_vs_unknown_mean_cosine_similarity=known_vs_unknown_cosine_similarity/n
n=0
# Cosine similarity between Hamilton's and unknown documents
for i in ham:
    for j in unknown:
        n+=1
        hamilton_cosine_similarity+=M[i,j]
hamilton_mean_cosine_similarity=hamilton_cosine_similarity/n
n=0
# Cosine similarity between Madison's and unknown documents
for i in mad:
    for j in unknown:
        n+=1
        madison_cosine_similarity+=M[i,j]
madison_mean_cosine_similarity=madison_cosine_similarity/n
n=0
# Cosine similarity between Jay's and unknown documents
for i in jay:
    for j in unknown:
        n+=1
        jay_cosine_similarity+=M[i,j]
jay_mean_cosine_similarity=jay_cosine_similarity/n
n=0
jay_54_cs=sum([M[53,j] for j in jay])/len(jay)
jay_64_cs=sum([M[63,j] for j in jay])/len(jay)
print("Average cosine similarity between all the known Hamilton papers and all the unknown papers : "+str(hamilton_mean_cosine_similarity))
print("Average cosine similarity between all the known Madison papers and all the unknown papers : "+str(madison_mean_cosine_similarity))
print("Average cosine similarity between all the known Jay papers and all the unknown papers : "+str(jay_mean_cosine_similarity))
print()
print("Average cosine similarity between all the known Jay papers and paper 54 : " + str(jay_54_cs))
print("Average cosine similarity between all the known Jay papers and paper 64 : " + str(jay_64_cs))

Average cosine similarity between all the known Hamilton papers and all the unknown papers : 0.07086083204746346
Average cosine similarity between all the known Madison papers and all the unknown papers : 0.08759226683038128
Average cosine similarity between all the known Jay papers and all the unknown papers : 0.06165186646276634

Average cosine similarity between all the known Jay papers and paper 54 : 0.05293146803817872
Average cosine similarity between all the known Jay papers and paper 64 : 0.29103211560620873


We clearly observe that there is a very high chance that Jay has written paper 64 compared to paper 54 .

## N-gram language models 

In this question, we will implement an n-gram model to be able to model the language used in the Federalist Papers in a more structured manner than the simple bag of words approach.  You will fill in the following class:

### Initializing the language model

First, we implement the `__init__()` function in the `LanguageModel` class.  We did this by building a two-level dictionary (in fact, we used the `collections.defaultdict` class, but this only make a few things a little bit shorter) `self.counts`, where the first key refers to the previous $n-1$ tokens, and the second key refers to the $n$th token, and the value simply stores the count of the number of times this combination was seen.  For ease of use in later function, we also created a `self.count_sums`, which contains the number of total times each $n-1$ combination was seen.

For example, letting `l_hamilton` be a LanguageModel object built just from all the known Hamilton papers and with `n = 3`, the following varibles are populated in the object:

    l_hamilton.counts["privilege of"] = {'being': 1, 'originating': 1, 'paying': 1, 'the': 1}
    l_hamilton.count_sums["privilege of"] = 4
    
We also build a `self.dictionary` variable, which is just a `set` object containing all the unique words in across the entire set of input document.

### Part B: Computing perplexity

Next, we implement the `perplexity()` function, which takes a text sample and computes the perplexity of this sample under the model.

Perplexity of a document is given by the formula below:

$$
\large{Perplexity = \Bigg(\frac{1}{P(word_1,word_2,word_3....word_n)}\Bigg)^{1/N}}
$$

we can also write this as

$$
\large{Perplexity = 2^{\frac{\log_2{P(word_1,word_2,word_3....word_n)}}{N}}}
$$

Here `P()` calculates product of probablities of a word `i` appearing after a sequence of `N-1` words for a N-gram Language Model.

$$
\large{P(word_1,word_2,word_3....word_n) = \prod_{i=n}^{N} P(word_i|word_{i-n+1},word_(i-n+2)....word_{i-1})}
$$

we can also write `P()` for a specific word as:

$$
\large{P(word_{i}|word_{i-n+1},word_(i-n+2)....word_{i-1}) = \frac{\#(word_{i-n+1},word_{i-n+2}...word_{i} + \alpha)}{\# (word_{i-n+1},word_{i-n+2}...word_{i-1} + \alpha D)}}
$$

where $\alpha$ is known as the Laplace Smoothing Constant and $D$ is total size of dictionary.

As a simple example, if we build our `l_hamilton` model above (again, with `n=3`) and using default settings so that `alpha = 1e-3`, and run in on `papers[0]` (which was written by Hamilton), we get:

    l_hamilton.perplexity(papers[0]) = 12.5877

### Part C: Generating text

Finally, we implement the `sample()` function to generate random samples of text.  Initially we pick some random starting $n-1$ tuples, then sample according to the model.  Here we have _not_ used any Laplace smoothing, but just sample from the raw underlying counts.

One potential failure case, since we're just using raw counts, is if we generate an n-gram that _only_ occurs at the very end of a document (and so has no following n-gram observed in the data).  In this situation, we generate a new random set of $n-1$ tuples, and continue generating.

Here's what a sample of 200 words from our Hamilton model looks like (of course all random samples will be different). 

    'erroneous foundation . the authorities essential to the mercy of the body politic against these two legislatures coexisted for ages in two ways : either by actual possession of which , if it cease to regard the servile pliancy of the states which have a salutary and powerful means , by those who appoint them . they are rather a source of delinquency , it would enable the national situation , and consuls , judges of their own immediate aggrandizement ? would she not have been felt by those very maxims and councils which would be mutually questions of property ? and will he not only against the united netherlands with that which has been presumed . the censure attendant on bad measures among a multitude that might have been of a regular and effective system of civil polity had previously established in this state , but upon our neutrality . by thus circumscribing the plan of opposition , and the industrious habits of the trust committed to hands which could not be likely to do anything else . when will the time at which we might soar to a deed for conveying property of the people , were dreaded and detested'



In [103]:

class LanguageModel:
    def __init__(self, docs, n):
        """
        Initialize an n-gram language model.
        
        Args:
            docs: list of strings, where each string represents a space-separated
                  document
            n: integer, degree of n-gram model
        """
        count_sums = {}
        counts = {}
        for doc in docs:
            tokens = doc.split()
            prevN = []
            for i,token in enumerate(tokens[:-1]):
                prevN.append(token)
                if len(prevN)>=n-1:
                    prevNstr = " ".join(prevN)
                    if prevNstr in counts:
                        if tokens[i+1] in counts[prevNstr]:
                            counts[prevNstr][tokens[i+1]] += 1
                            count_sums[prevNstr] += 1
                        else:
                            counts[prevNstr][tokens[i+1]] = 1
                            count_sums[prevNstr] += 1
                    else:
                        counts[prevNstr] = {tokens[i+1]:1}
                        count_sums[prevNstr] = 1
                    prevN = prevN[1:]
                    
        self.count_sums = count_sums
        self.counts = counts
        self.dictionary = {word for doc in docs for word in doc.split()}
        self.n = n
                    
    
    def perplexity(self, text, alpha=1e-3):
        """
        Evaluate perplexity of model on some text.
        
        Args:
            text: string containing space-separated words, on which to compute
            alpha: constant to use in Laplace smoothing
            
        Note: for the purposes of smoothing, the dictionary size (i.e, the D term)
        should be equal to the total number of unique words used to build the model
        _and_ in the input text to this function.
            
        Returns: perplexity
            perplexity: floating point value, perplexity of the text as evaluted
                        under the model.
        """
            
        tokens = text.split()
        D = len(set(tokens).union(self.dictionary))
        N = len(tokens)
        
        wordLogProbs = []
        for i,word in enumerate(tokens):
            num = alpha
            denom = alpha*D
            if i >= self.n - 1:
                prevN = tokens[(i-(self.n-1)):i]
                prevNstr = " ".join(prevN)
                if prevNstr in self.counts:
                    if word in self.counts[prevNstr]:
                        num += self.counts[prevNstr][word]
                    denom += self.count_sums[prevNstr]
                wordLogProbs.append(np.log(num)-np.log(denom))
            
        perp = np.exp(-1.0/(N-self.n+1) *(sum(wordLogProbs)))
        return perp


        
    def sample(self, k):
        """
        Generate a random sample of k words.
        
        Args:
            k: integer, indicating the number of words to sample
            
        Returns: text
            text: string of words generated from the model.
        """
        def getKeyListProp(d):
            key_lst_lst = [[key]*count for key,count in d.items()]
            key_lst = [key for key_lst in key_lst_lst for key in key_lst]
            return key_lst        
        # pick a random dictionary key, taking the proportion of the values into account
        def getRandKey(key_lst):
            rand_index = np.random.randint(len(key_lst))
            return key_lst[rand_index]
        
        count_sumsKeyList = getKeyListProp(self.count_sums)
        
        words = []
        prevN = getRandKey(count_sumsKeyList)
        words.extend(prevN.split())
        while len(words)<k:
            
            if len(words)>=self.n-1:
                prevN = " ".join(words[-(self.n-1):])
                if prevN in self.counts:
                    prevNdict = self.counts[prevN]
                else:
                    prevN = getRandKey(count_sumsKeyList)
                    prevNdict = self.counts[prevN]
            else:
                prevN = getRandKey(count_sumsKeyList)
                prevNdict = self.counts[prevN]
                
            lastN = getRandKey(getKeyListProp(prevNdict))
            
            words.append(lastN)
        text = " ".join(words)
        return text
            


We test our Language Model by analyzing Hamilton's papers and checking our perplexity function to find perplexity of other papers to those written by Hamilton. 

In [104]:
# AUTOLAB_IGNORE_START
# papers, authors
# Hamilton's papers
hamdocs = [paper for paper,author in zip(papers,authors) if author==('HAMILTON',)]
l_hamilton = LanguageModel(hamdocs,3)

# Perplexity of Hamilton papers and all other papers
for i in range(85):
     print(authors[i],l_hamilton.perplexity(papers[i]))

# AUTOLAB_IGNORE_STOP

('HAMILTON',) 12.587724360648497
('JAY',) 2671.547031355331
('JAY',) 1778.371482965479
('JAY',) 2321.2457428855487
('JAY',) 2906.3445261142747
('HAMILTON',) 12.557137515404879
('HAMILTON',) 13.166360645077445
('HAMILTON',) 12.894749430196795
('HAMILTON',) 12.86780004391564
('MADISON',) 1658.2453402535689
('HAMILTON',) 12.65279940253859
('HAMILTON',) 12.985126546959675
('HAMILTON',) 12.645013808439622
('MADISON',) 2232.2621304060303
('HAMILTON',) 12.731666112033137
('HAMILTON',) 13.050845506927281
('HAMILTON',) 12.880771983633787
('HAMILTON', 'MADISON') 3289.260096257335
('HAMILTON', 'MADISON') 3070.6332136143033
('HAMILTON', 'MADISON') 3232.9613460083865
('HAMILTON',) 13.167408133726953
('HAMILTON',) 13.122074912536595
('HAMILTON',) 12.756785341829604
('HAMILTON',) 12.28403702124601
('HAMILTON',) 12.885564017572527
('HAMILTON',) 12.51888817677427
('HAMILTON',) 12.626842578530017
('HAMILTON',) 13.053502049141583
('HAMILTON',) 12.676866868311398
('HAMILTON',) 12.543319450451436
('HAMILTO

We clearly observe that the perplexity of papers written by Hamilton have lesser perplexity than other which indicates that our Language Model works properly.


Using this model, evaluate the mean of the perplexity of the unknown Federalist papers for the language models from each of the three authors (again, using `n=3` and the default of `alpha=1e-3`).  Populate the following variables with the mean perplexities.

In [105]:
# AUTOLAB_IGNORE_START
samp = l_hamilton.sample(200)
print(samp)
# AUTOLAB_IGNORE_STOP

, what would be attended to . but their good sense , corresponding with the most certain road to the fourth point rests on weak and unsubstantial foundations . it is easy to make him . how would it not natural that the most eligible of any magnitude that army can only be avoided by a government like the paradoxes of rhetoricians ; as that construction may expose the union from maine to georgia . the only use of the whole , have seen that the difficulty of governing thirteen states at any time obstruct the execution of that of establishing an auxiliary method of steering clear of an act . nor is the safest course to appeals in the hands of government must sink into a concise review of these circumstances would be stifled and lost , it is confined to the force of these states with each other , at first only unequal and disproportionate degrees of it had not communicated itself to the views of the matter suggests several important consequences . it is less proper than to have so important

In [90]:
# AUTOLAB_IGNORE_START
%timeit perp = l_hamilton.perplexity(samp)
# AUTOLAB_IGNORE_STOP

1.24 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# Conlusion

We analyzed the Federalist Papers using two techniques: bag of words approach based upon creating a TFIDF Matrix and an n-gram Language Model.