In [1]:
import nltk
from nltk.corpus import brown
import pprint

##A Simple Baseline Tagger##
Keep in mind that the Brown corpus is already tagged.  The simplest possible tagger assigns the **most likely** tag to every token, regardless of the word itself. This establishes a baseline tagger.  So let's use the data we have to figure out what the most likely tag for English is, at least according to the news portion of the corpus.



In [3]:
def most_likely_brown_tag():
    tags = [tag for (word, tag) in brown.tagged_words(categories='news')]
    return (nltk.FreqDist(tags).max())

We can use FreqDist and max() to find out which tag is the **most likely tag** for English according to the Brown corpus: FreqDist counts how many times each tag has been assigned to a word in this corpus, and max() returns the tag with the highest count.
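
To make the counting step concrete, here is a minimal sketch of the same idea using only the standard library. In NLTK 3, FreqDist is a subclass of collections.Counter, so the logic carries over. The name `toy_tagged_words` and its contents are hand-made for illustration and are not taken from the Brown corpus.

```python
from collections import Counter

# Hand-made (word, tag) pairs standing in for brown.tagged_words();
# this is illustrative data, not real corpus output.
toy_tagged_words = [
    ("the", "AT"), ("university", "NN"), ("had", "HVD"),
    ("the", "AT"), ("week", "NN"), ("tagger", "NN"),
]

def most_likely_tag(tagged_words):
    """Return the tag with the highest count, like FreqDist(tags).max()."""
    tag_counts = Counter(tag for _, tag in tagged_words)
    return tag_counts.most_common(1)[0][0]

print(most_likely_tag(toy_tagged_words))
```

On this toy sample, NN occurs three times, AT twice, and HVD once, so the function returns NN.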

In [None]:
most_likely_brown_tag()

Now that we know empirically which is the most likely tag for English, we can make a baseline tagger that automatically assigns the most likely tag when we don't know what else to do.

In [6]:
def make_default_tagger(init_tag = 'NN'):
    return(nltk.DefaultTagger(init_tag))
    
def run_tagger_on_sentence(tagger, sent):
    tokens = nltk.word_tokenize(sent)
    print(tagger.tag(tokens))
    
sent = r'''what will this silly tagger do?'''
run_tagger_on_sentence(make_default_tagger(),sent)

[('what', 'NN'), ('will', 'NN'), ('this', 'NN'), ('silly', 'NN'), ('tagger', 'NN'), ('do', 'NN'), ('?', 'NN')]


##Train a Unigram Tagger From Pre-Tagged Text##
Now train a unigram tagger on the news portion of the Brown corpus.
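
Before using NLTK's implementation, it may help to see roughly what a unigram tagger does. The sketch below is a simplified stand-in, not NLTK's actual code: it remembers the most frequent tag for each training word and returns None for words it has never seen. The helper names and the toy training sentences are made up for illustration.

```python
from collections import Counter, defaultdict

def train_unigram_model(tagged_sents):
    """Map each word to the tag it received most often in training."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def unigram_tag(model, tokens):
    """Tag each token in isolation; unseen words get None."""
    return [(tok, model.get(tok)) for tok in tokens]

# Hand-made training sentences (hypothetical, not Brown data).
toy_train = [
    [("this", "DT"), ("will", "MD"), ("do", "DO")],
    [("the", "AT"), ("will", "NN")],
    [("he", "PPS"), ("will", "MD")],
]
model = train_unigram_model(toy_train)
print(unigram_tag(model, ["this", "tagger", "will", "do"]))
```

Here "will" is tagged MD because MD outnumbers NN in the toy training data, and "tagger" gets None because it never appears in training.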

In [5]:
def train_unigram_tagger(training_sents):
    return(nltk.UnigramTagger(training_sents))

# train on brown news
def train_unigram_tagger_on_brown():
    brown_train_sents = brown.tagged_sents(categories='news')
    unigram_tagger = train_unigram_tagger(brown_train_sents)
    return(unigram_tagger)

run_tagger_on_sentence(train_unigram_tagger_on_brown(), sent)


###Separate Training From Testing Data###
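But really we need to separate training and testing data.  We can use Python's handy slicing operator, which works on the list of tagged sentences just as it does on strings, to do this *really* easily.  Here we use the first 90% of the sentences as training data and the remaining 10% as testing data.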

In [8]:
def create_data_sets(tagged_sents):
    size = int(len(tagged_sents) * 0.9)
    train_sents = tagged_sents[:size]
    test_sents = tagged_sents[size:]
    return train_sents, test_sents

def train_and_test_unigram_tagger_on_brown():
    sample_sents = brown.tagged_sents(categories='news')
    brown_train_sents, brown_test_sents = create_data_sets(sample_sents)
    unigram_tagger = train_unigram_tagger(brown_train_sents)
    return(unigram_tagger, brown_train_sents, brown_test_sents)


ut, brown_train_sents, brown_test_sents = train_and_test_unigram_tagger_on_brown()
run_tagger_on_sentence(ut, sent)

[('what', 'WDT'), ('will', 'MD'), ('this', 'DT'), ('silly', 'JJ'), ('tagger', None), ('do', 'DO'), ('?', '.')]


###Evaluation Metric###
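NLTK's taggers have a handy evaluation function built right in!  It runs your tagger over the words of the test sentences and automatically compares its output with the gold-standard tags assigned in the Brown corpus.  The score shown below is the average across the entire test collection.  (In recent NLTK releases, evaluate() is deprecated in favor of accuracy(), which takes the same argument.)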

In [9]:
print("%0.3f" % ut.evaluate(brown_test_sents))


0.812


###Question###
What is this evaluation metric measuring?
* Answer: 



Which tags did it get wrong?  If you want to compare the gold standard tags with what the tagger produced, here is some code to do it (written by Jason Ost, MIMS from 2014).  The first column is the word, the second is the tag from the gold standard, and the third is what the algorithm assigned.  (The last element is a little tricky: the tagger's tag() function expects a list of words as input, so you have to enclose "w" in square brackets. It returns a list of tagged words (as two-element tuples), so you have to grab the second element of the first tuple, which is the predicted tag.  This works because the unigram tagger looks at each word in isolation.)

In [10]:
def evaluate_unigram_tagger_output(tagger, test_sent):
    # Tag each word on its own and pair it with its gold-standard tag:
    # (word, gold tag, predicted tag)
    return [(w, t, tagger.tag([w])[0][1]) for w, t in test_sent]

evaluate_unigram_tagger_output(ut, brown_test_sents[3])
 

[('For', 'IN', 'IN'),
 ('18', 'CD', 'CD'),
 ('months', 'NNS', 'NNS'),
 (',', ',', ','),
 ('Hamilton', 'NP', None),
 ('Holmes', 'NP', None),
 (',', ',', ','),
 ('19', 'CD', 'CD'),
 (',', ',', ','),
 ('and', 'CC', 'CC'),
 ('Charlayne', 'NP', None),
 ('Hunter', 'NP', 'NP-TL'),
 (',', ',', ','),
 ('18', 'CD', 'CD'),
 (',', ',', ','),
 ('had', 'HVD', 'HVD'),
 ('tried', 'VBN', 'VBN'),
 ('to', 'TO', 'TO'),
 ('get', 'VB', 'VB'),
 ('into', 'IN', 'IN'),
 ('the', 'AT', 'AT'),
 ('university', 'NN', 'NN'),
 ('.', '.', '.')]
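
For longer sentences it can be easier to look only at the mismatches. The sketch below is one possible way to do that: it takes (word, gold tag, predicted tag) triples in the format shown above and counts each (gold, predicted) pair where the two differ. The function name `tally_errors` is made up here, and `rows` is just a handful of triples copied from the output above.

```python
from collections import Counter

# A few (word, gold_tag, predicted_tag) rows copied from the output above.
rows = [
    ("Hamilton", "NP", None),
    ("Holmes", "NP", None),
    ("Charlayne", "NP", None),
    ("Hunter", "NP", "NP-TL"),
    ("had", "HVD", "HVD"),
    ("university", "NN", "NN"),
]

def tally_errors(rows):
    """Count (gold, predicted) pairs for rows where the two tags differ."""
    return Counter((gold, pred) for _, gold, pred in rows if gold != pred)

errors = tally_errors(rows)
for (gold, pred), n in errors.most_common():
    print(gold, "->", pred, ":", n)
```

On these rows, the dominant error is a proper noun (NP) that the tagger has never seen and therefore tags as None.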

Try another sentence.

In [11]:
evaluate_unigram_tagger_output(ut, brown_test_sents[20])

[('``', '``', '``'),
 ('Surprised', 'VBN', None),
 ('and', 'CC', 'CC'),
 ('pleased', 'VBN', 'VBN'),
 ("''", "''", "''"),
 (',', ',', ','),
 ('Students', 'NNS', 'NNS'),
 ('Holmes', 'NP', None),
 ('and', 'CC', 'CC'),
 ('Hunter', 'NP', 'NP-TL'),
 ('may', 'MD', 'MD'),
 ('enter', 'VB', 'VB'),
 ('the', 'AT', 'AT'),
 ('University', 'NN-TL', 'NN-TL'),
 ('of', 'IN-TL', 'IN'),
 ('Georgia', 'NP-TL', 'NP'),
 ('this', 'DT', 'DT'),
 ('week', 'NN', 'NN'),
 ('.', '.', '.')]

##Train an N-Gram Tagger With Backoff##

Below is code for a bigram tagger with backoff.  For each token it encounters, it:
1. Tries tagging the token with the bigram tagger.
2. If the bigram tagger is unable to find a tag for the token, tries the unigram tagger.
3. If the unigram tagger is also unable to find a tag, uses the default tagger.
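
The three steps above can be sketched in plain Python. This is a simplified illustration, not NLTK's implementation: it assumes the bigram context is the previous token's tag plus the current word, and it uses small hand-made lookup tables (`bigram_table`, `unigram_table`, both hypothetical) instead of trained models.

```python
def make_backoff_tagger(bigram_table, unigram_table, default_tag="NN"):
    """Return a function that tags tokens left to right with backoff.

    bigram_table maps (previous_tag, word) -> tag; unigram_table maps
    word -> tag. Both are hypothetical hand-made tables, not trained models.
    """
    def tag(tokens):
        tagged = []
        prev_tag = None  # there is no previous tag at the start of a sentence
        for word in tokens:
            choice = bigram_table.get((prev_tag, word))  # 1. bigram context
            if choice is None:
                choice = unigram_table.get(word)         # 2. the word alone
            if choice is None:
                choice = default_tag                     # 3. default tag
            tagged.append((word, choice))
            prev_tag = choice
        return tagged
    return tag

bigram_table = {("TO", "run"): "VB"}
unigram_table = {"to": "TO", "run": "NN", "the": "AT"}
tagger = make_backoff_tagger(bigram_table, unigram_table)
print(tagger(["the", "run", "to", "run", "fast"]))
```

In this toy run, the first "run" falls back to the unigram table (NN), the second "run" is resolved by the bigram table because it follows a TO tag (VB), and "fast" is unknown to both tables, so it receives the default tag.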

In [12]:
def build_backoff_tagger(train_sents):
    # Each tagger falls back to the one before it when it cannot tag a token.
    t0 = nltk.DefaultTagger('NN')
    t1 = nltk.UnigramTagger(train_sents, backoff=t0)
    t2 = nltk.BigramTagger(train_sents, backoff=t1)
    return t2

ngram_tagger = build_backoff_tagger(brown_train_sents)
bigram_tagger = ngram_tagger
print("%0.3f" % ngram_tagger.evaluate(brown_test_sents))

0.845


## EXERCISE: Train and Evaluate a Trigram Tagger ##

Modify build_backoff_tagger() to build a backoff trigram tagger.  Evaluate the results.  How does it do compared to the previous backoff tagger?

In [15]:
def build_backoff_tagger(train_sents):
    # Same chain as before, with a trigram tagger added on top.
    t0 = nltk.DefaultTagger('NN')
    t1 = nltk.UnigramTagger(train_sents, backoff=t0)
    t2 = nltk.BigramTagger(train_sents, backoff=t1)
    t3 = nltk.TrigramTagger(train_sents, backoff=t2)
    return t3

ngram_tagger = build_backoff_tagger(brown_train_sents)
trigram_tagger = ngram_tagger
print("%0.3f" % ngram_tagger.evaluate(brown_test_sents))

0.843


## EXERCISE: Train a Simplified Tagger ##
Train and evaluate a bigram backoff tagger like the one above but using the universal Brown tagset (or make a tagset of your own by discarding all but the first character of each tag name). This tagger has fewer distinctions to make but more ambiguity.  Evaluate its performance.  How does it compare to the previous tagger?
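
As a starting point for the "tagset of your own" option, here is a small sketch of the tag-truncation step only; training and evaluation are left to you. The function name `simplify_tagged_sents` is made up for this example, and the sample words and tags are borrowed from the tagger output earlier in this notebook.

```python
def simplify_tagged_sents(tagged_sents):
    """Keep only the first character of each tag, e.g. 'NP-TL' -> 'N'."""
    return [[(word, tag[:1]) for word, tag in sent] for sent in tagged_sents]

# One hand-assembled sentence using tags seen earlier in this notebook.
toy_sents = [[("University", "NN-TL"), ("of", "IN-TL"),
              ("Georgia", "NP-TL"), ("pleased", "VBN")]]
print(simplify_tagged_sents(toy_sents))
```

Note that NN-TL and NP-TL both collapse to N, so the simplified tagger no longer has to tell common nouns from proper nouns. Remember to simplify both the training and the testing sentences so the tagger is scored against the same tagset it was trained on.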

## Evaluating a Tagger by Looking at Tags that Follow Tags ##
(For this exercise, use your regular tagger, not the simplified one.)  The word **to** is frequently mis-tagged, since it can be tagged either as an infinitive marker (TO) or as a preposition (IN); it can be helpful to inspect the context it occurs in.  This code shows how to view the frequencies of the tags that *follow* the word.

In [None]:
def examine_tag_contexts(tagger, target_word, target_tag):
    # Tag the (untagged) editorial sentences with the tagger under inspection.
    tagged_sents = [tagger.tag(sent) for sent in brown.sents(categories='editorial')]
    # Collect the tag of every token that directly follows target_word/target_tag.
    tags = [b[1] for tagged_sent in tagged_sents
            for (a, b) in nltk.bigrams(tagged_sent)
            if a[0] == target_word and a[1] == target_tag]
    fd = nltk.FreqDist(tags)
    print("Tags that follow the target word and tag " + target_word + " and " + target_tag)
    fd.tabulate(15)
examine_tag_contexts(ngram_tagger, 'to', 'TO')
examine_tag_contexts(ngram_tagger, 'to', 'IN')                                               
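
If the nested comprehension above is hard to follow, the same pairing logic can be sketched with the standard library alone. This is an illustrative rewrite, not a replacement for the function above: `following_tags` and the hand-made `toy_tagged` sentences are assumptions made for this example.

```python
from collections import Counter

def following_tags(tagged_sents, target_word, target_tag):
    """Count the tags of tokens that immediately follow (target_word, target_tag)."""
    counts = Counter()
    for sent in tagged_sents:
        # zip(sent, sent[1:]) yields the same adjacent pairs as nltk.bigrams(sent)
        for (word, tag), (_, next_tag) in zip(sent, sent[1:]):
            if word == target_word and tag == target_tag:
                counts[next_tag] += 1
    return counts

# Hand-made tagged sentences (hypothetical, not tagger output).
toy_tagged = [
    [("tried", "VBN"), ("to", "TO"), ("get", "VB"), ("into", "IN"),
     ("the", "AT"), ("university", "NN")],
    [("went", "VBD"), ("to", "IN"), ("the", "AT"), ("university", "NN")],
    [("want", "VB"), ("to", "TO"), ("enter", "VB")],
]
print(following_tags(toy_tagged, "to", "TO"))
print(following_tags(toy_tagged, "to", "IN"))
```

In the toy data, **to** tagged TO is always followed by a verb (VB), while **to** tagged IN is followed by an article (AT). Pairs are formed within each sentence, so the last token of one sentence is never paired with the first token of the next.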