This notebook compares how several part-of-speech (POS) taggers handle ambiguous sentences. Consider the following sentences. Two of the five are commented out, so only the first three are tagged below.

In [42]:
sent1 = "We saw her duck."

sent2 = "I saw her duck swim."

sent3 = "I saw her duck swim quickly."

#sent4 = "Change noun to verb."

#sent5 = "Change the noun to a verb."

sentences = [sent1, sent2, sent3]

# NLTK POS Tagger

In [43]:
import nltk

In [44]:
for sentence in sentences:
    user_sentence = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(user_sentence))
    print()

[('We', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'), ('.', '.')]

[('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'), ('swim', 'NN'), ('.', '.')]

[('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'), ('swim', 'JJ'), ('quickly', 'RB'), ('.', '.')]



# TextBlob POS Tagger

In [45]:
from textblob import TextBlob

In [46]:
for sentence in sentences:
    blob = TextBlob(sentence)
    print(blob.tags)
    print()

[('We', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN')]

[('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'), ('swim', 'NN')]

[('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'), ('swim', 'JJ'), ('quickly', 'RB')]



# spaCy POS Tagger

In [47]:
import spacy 

nlp = spacy.load("en_core_web_sm") 

In [48]:
for sentence in sentences:
    doc = nlp(sentence)
    # Print (token, coarse POS tag) pairs in the same list format as the other taggers
    print([(token.text, token.pos_) for token in doc])
    print()

[('We', 'PRON'), ('saw', 'VERB'), ('her', 'DET'), ('duck', 'NOUN'), ('.', 'PUNCT')]

[('I', 'PRON'), ('saw', 'VERB'), ('her', 'DET'), ('duck', 'NOUN'), ('swim', 'NOUN'), ('.', 'PUNCT')]

[('I', 'PRON'), ('saw', 'VERB'), ('her', 'DET'), ('duck', 'NOUN'), ('swim', 'NOUN'), ('quickly', 'ADV'), ('.', 'PUNCT')]



# Results & Discussion

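1) In example #1, "duck" can be labeled as a noun (the bird) or as a verb (the action of ducking), but all three taggers return only the noun reading. A model could be trained to consider the verb reading as well.

2) In example #2, "duck" should intuitively be labeled as a noun and "swim" as a verb. However, all three taggers label "swim" as a noun (NN in NLTK and TextBlob, NOUN in spaCy).

3) In example #3, the adverb "quickly" should modify a verb, namely "swim". However, none of the taggers labels "swim" as a verb: NLTK and TextBlob tag it as an adjective (JJ), and spaCy tags it as a noun, which leaves the adverb attached to a noun.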
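
NLTK and TextBlob report Penn Treebank tags, while spaCy's `pos_` attribute reports coarse universal tags, so the outputs are easier to compare once they share one tag set. The sketch below hard-codes the outputs recorded above for example #3 and lists the words on which the taggers disagree. The names `PTB_TO_COARSE`, `coarse`, and `disagreements` are illustrative and not part of any library, and the mapping is hand-written and covers only the tags that appear in this notebook.

```python
# Outputs recorded earlier in this notebook for "I saw her duck swim quickly."
nltk_tags = [('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'),
             ('swim', 'JJ'), ('quickly', 'RB'), ('.', '.')]
textblob_tags = [('I', 'PRP'), ('saw', 'VBD'), ('her', 'PRP'), ('duck', 'NN'),
                 ('swim', 'JJ'), ('quickly', 'RB')]
spacy_tags = [('I', 'PRON'), ('saw', 'VERB'), ('her', 'DET'), ('duck', 'NOUN'),
              ('swim', 'NOUN'), ('quickly', 'ADV'), ('.', 'PUNCT')]

# Hand-written mapping (an assumption for illustration), limited to tags seen above.
PTB_TO_COARSE = {'PRP': 'PRON', 'VBD': 'VERB', 'NN': 'NOUN',
                 'JJ': 'ADJ', 'RB': 'ADV', '.': 'PUNCT'}


def coarse(tagged):
    """Map each word to a coarse tag; tags missing from the mapping pass through."""
    # Keyed by word, so this assumes no word repeats within the sentence.
    return {word: PTB_TO_COARSE.get(tag, tag) for word, tag in tagged}


def disagreements(outputs):
    """Return {word: {tagger: tag}} for words all taggers saw but tagged differently."""
    coarse_outputs = {name: coarse(tagged) for name, tagged in outputs.items()}
    # TextBlob drops punctuation, so compare only words present in every output.
    shared = set.intersection(*(set(tags) for tags in coarse_outputs.values()))
    result = {}
    for word in sorted(shared):
        tags = {name: by_word[word] for name, by_word in coarse_outputs.items()}
        if len(set(tags.values())) > 1:
            result[word] = tags
    return result


outputs = {'NLTK': nltk_tags, 'TextBlob': textblob_tags, 'spaCy': spacy_tags}
for word, tags in disagreements(outputs).items():
    print(word, tags)
```

On the recorded outputs this reports two words, "her" and "swim". None of the three taggers assigns "swim" a verb tag.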

In [None]:
# The taggers have trouble with infinitive markers like 'to'. Think about how 'to' is typically used in a sentence: to walk, to run, etc.
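
One way to probe this once the two commented-out sentences are enabled is to inspect what follows each "to" in a tagged sentence. The sketch below assumes Penn Treebank-style output, in which "to" carries the tag `TO` in both its prepositional and infinitive uses, so the tag of the next word is the main clue. `to_contexts` is a hypothetical helper, and the tagged input is hand-written for illustration. It is not real tagger output.

```python
def to_contexts(tagged):
    """Return (next_word, next_tag, looks_infinitive) for each 'to' in a tagged sentence."""
    contexts = []
    # Pair every token with the token that follows it.
    for (word, _tag), (next_word, next_tag) in zip(tagged, tagged[1:]):
        if word.lower() == 'to':
            # A base-form verb (VB) right after 'to' suggests an infinitive marker.
            contexts.append((next_word, next_tag, next_tag == 'VB'))
    return contexts


# Hand-written input for illustration only (NOT real tagger output).
example = [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('walk', 'VB'),
           ('to', 'TO'), ('school', 'NN'), ('.', '.')]
print(to_contexts(example))
```

The check is deliberately simple: it looks only at the immediately following token, so an infinitive split by an adverb would not be flagged as one.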