In this notebook, you'll explore part-of-speech (POS) tagging using the Penn Treebank tagset, and look at how well spaCy's POS tagger performs.

In [15]:
import spacy, glob, os

In [16]:
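# load the small English model, then remove the components not needed for tagging
# (the original disable=['ner,parser'] was a single string, so it matched no component)
nlp = spacy.load('en_core_web_sm')
nlp.remove_pipe('ner')
nlp.remove_pipe('parser')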

('parser', <spacy.pipeline.dep_parser.DependencyParser at 0x16e618190>)

In [17]:
def get_spacy_tags(text):
    doc=nlp(text)
    for word in doc:
        print(word.text, word.tag_)

get_spacy_tags("Open the pod bay doors Hal")

Open VB
the DT
pod NNP
bay NNP
doors NNS
Hal NNP


In [20]:
def read_docs(inputDir, maxDocs=100):
    """ Read in movie documents (all ending in .txt) from an input folder
    and process with spacy """
    
    docs=[]
    for idx, filename in enumerate(glob.glob(os.path.join(inputDir, '*.txt'))):
        # stop before reading more than maxDocs documents
        if idx >= maxDocs:
            break
        with open(filename) as file:
            docs.append((filename, nlp(file.read())))
    return docs

In [21]:
# directory with 2000 movie summaries from Wikipedia
inputDir="../data/movie_summaries/"
docs=read_docs(inputDir, maxDocs=100)

Here are the 45 tags used by the Penn Treebank:

|tag|meaning|
|---|---|
|CC|Coordinating conjunction|
|CD|Cardinal number|
|DT|Determiner|
|EX|Existential there|
|FW|Foreign word|
|IN|Preposition or subordinating conjunction|
|JJ|Adjective|
|JJR|Adjective, comparative|
|JJS|Adjective, superlative|
|LS|List item marker|
|MD|Modal|
|NN|Noun, singular or mass|
|NNS|Noun, plural|
|NNP|Proper noun, singular|
|NNPS|Proper noun, plural|
|PDT|Predeterminer|
|POS|Possessive ending|
|PRP|Personal pronoun|
|PRP\$|Possessive pronoun|
|RB|Adverb|
|RBR|Adverb, comparative|
|RBS|Adverb, superlative|
|RP|Particle|
|SYM|Symbol|
|TO|to|
|UH|Interjection|
|VB|Verb, base form|
|VBD|Verb, past tense|
|VBG|Verb, gerund or present participle|
|VBN|Verb, past participle|
|VBP|Verb, non-3rd person singular present|
|VBZ|Verb, 3rd person singular present|
|WDT|Wh-determiner|
|WP|Wh-pronoun|
|WP\$|Possessive wh-pronoun|
|WRB|Wh-adverb|
|.|period|
|,|comma|
|:|colon|
|(|left separator|
|)|right separator|
|$|dollar sign|
|#|pound sign|
|\`\`|open double quotes|
|''|close double quotes|
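
Many of these tags are refinements of a broader category, and the refinement shows up in the tag's prefix: NN, NNS, NNP and NNPS are all nouns, and VB through VBZ are all verbs. A minimal sketch of that grouping is below; `PREFIXES` and `coarse_category` are hypothetical names introduced here for illustration and are not part of spaCy or the Penn Treebank.

```python
# Hypothetical helper: map a Penn Treebank tag to a broad category by its prefix.
PREFIXES = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
    ("PRP", "pronoun"),
    ("W", "wh-word"),
]

def coarse_category(tag):
    """Return the broad category for a tag, or 'other' if no prefix matches."""
    for prefix, category in PREFIXES:
        if tag.startswith(prefix):
            return category
    return "other"

for tag in ["NN", "NNPS", "VBZ", "JJR", "PRP$", "RP"]:
    print(tag, coarse_category(tag))
```

Note that RP (particle) falls through to "other": it does not start with RB, so it is not grouped with the adverbs.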

Explore these tags below by searching the (automatically tagged) movie summary corpus for tokens that received each tag, shown with a few words of context on either side.

In [22]:
def find_examples(docs, tag, num_examples=10, window=5):
    """ Print up to num_examples tokens tagged `tag`, each with
    `window` tokens of context on either side """
    count=0
    for _, doc in docs:
        # skip the first and last `window` tokens so every match has full context
        for idx, token in enumerate(doc[window:-window]):
            if token.tag_ == tag:
                # token is doc[idx+window], so its left context starts at doc[idx]
                left=' '.join(context.text for context in doc[idx:idx+window])
                right=' '.join(context.text for context in doc[idx+window+1:idx+window+window+1])
                # highlight the matched token in red using ANSI escape codes
                print(left, "\033[91m%s\033[0m" % token.text, right)
                # for windows users - you may want to use the following print statement
                # to highlight the middle token in each sentence using #s
                # print(left, "#%s#" % token.text, right)
                count+=1
                if count >= num_examples:
                    return
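
The index arithmetic in `find_examples` is easy to misread, so here is a sketch of the same windowing logic on a plain list of (word, tag) pairs, with no spaCy dependency. The function name `kwic` is hypothetical; the tags in `sentence` are copied from the `get_spacy_tags` output earlier in this notebook.

```python
def kwic(tagged, tag, window=2):
    """Return (left, match, right) triples for every token carrying `tag`.

    `tagged` is a list of (word, tag) pairs. Tokens closer than `window`
    positions to either end are skipped, mirroring find_examples.
    """
    hits = []
    for i in range(window, len(tagged) - window):
        word, word_tag = tagged[i]
        if word_tag == tag:
            left = [w for w, _ in tagged[i - window:i]]
            right = [w for w, _ in tagged[i + 1:i + window + 1]]
            hits.append((left, word, right))
    return hits

# Tags copied from the get_spacy_tags output above.
sentence = [("Open", "VB"), ("the", "DT"), ("pod", "NNP"),
            ("bay", "NNP"), ("doors", "NNS"), ("Hal", "NNP")]

for left, word, right in kwic(sentence, "NNP", window=2):
    print(' '.join(left), "#%s#" % word, ' '.join(right))
```

"Hal" is also tagged NNP but is not reported, because it sits within `window` tokens of the end of the sentence. The same thing happens to tokens near the start or end of each document in `find_examples`.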

In [23]:
find_examples(docs, "JJ", num_examples=10, window=5)

, ten - year - [91mold[0m Tre Styles   lives with
Styles   lives with his [91msingle[0m mother Reva Devereaux   in
that although Tre is rather [91mintelligent[0m , he is immature and
rather intelligent , he is [91mimmature[0m and lacks respect for classmates
classmates and adults alike . [91mFrightened[0m about the future of her
his 27 - year - [91mold[0m father , Furious Styles ,
eventually decide the crime is [91munimportant[0m because nothing was taken and
and the burglar escaped completely [91munharmed[0m . The police , particularly
The police , particularly the [91mAfrican[0m American officer , treat Furious
police , particularly the African [91mAmerican[0m officer , treat Furious with


Q1: What's the difference between the following?

* PRP and PRP$
* NN and NNP
* JJ and JJR
* VBZ and VB

In [25]:
find_examples(docs, "PRP")

# PRP is a personal pronoun

Tre is rather intelligent , [91mhe[0m is immature and lacks respect
her child , Reva sends [91mhim[0m to live in the Crenshaw
Furious Styles , from whom [91mshe[0m hopes Tre will learn life
of Tre 's arrival , [91mhe[0m hears his father firing at
and street - smart . [91mHe[0m soon gets into a fight
The ball is returned to [91mhim[0m later by a Crips gang
a fishing trip , where [91mthey[0m talk , and he asks
where they talk , and [91mhe[0m asks him about sexual nature
talk , and he asks [91mhim[0m about sexual nature and discusses
the responsibility of fatherhood to [91mhim[0m . The pair return to


In [26]:
find_examples(docs, "PRP$")

# PRP$ is a possessive pronoun

Tre Styles   lives with [91mhis[0m single mother Reva Devereaux  
a fight at school , [91mhis[0m teacher calls Reva . The
Frightened about the future of [91mher[0m child , Reva sends him
South Central Los Angeles with [91mhis[0m 27 - year - old
's arrival , he hears [91mhis[0m father firing at a burglar
" Doughboy " Baker , [91mhis[0m maternal half - brother Ricky
Doughboy and Ricky live with [91mtheir[0m mother across the street from
, lives at home with [91mhis[0m mother Brenda , girlfriend Shanice
, girlfriend Shanice , and [91mhis[0m newborn son . After the
walks home with leftovers for [91mhis[0m father . As he walks


In [28]:
find_examples(docs, "NN")
# NN is a common noun, singular or mass

In 1984 , ten - [91myear[0m - old Tre Styles  
  lives with his single [91mmother[0m Reva Devereaux   in Inglewood
Tre gets involved in a [91mfight[0m at school , his teacher
involved in a fight at [91mschool[0m , his teacher calls Reva
fight at school , his [91mteacher[0m calls Reva . The teacher
teacher calls Reva . The [91mteacher[0m informs Reva that although Tre
he is immature and lacks [91mrespect[0m for classmates and adults alike
alike . Frightened about the [91mfuture[0m of her child , Reva
about the future of her [91mchild[0m , Reva sends him to
to live in the Crenshaw [91mneighborhood[0m of South Central Los Angeles

In [29]:
find_examples(docs, "NNP")
# NNP is a singular proper noun (the name of a specific person or place)

ten - year - old [91mTre[0m Styles   lives with his
lives with his single mother [91mReva[0m Devereaux   in Inglewood ,
with his single mother Reva [91mDevereaux[0m   in Inglewood , California
mother Reva Devereaux   in [91mInglewood[0m , California . After Tre
Devereaux   in Inglewood , [91mCalifornia[0m . After Tre gets involved
Inglewood , California . After [91mTre[0m gets involved in a fight
school , his teacher calls [91mReva[0m . The teacher informs Reva
Reva . The teacher informs [91mReva[0m that although Tre is rather
teacher informs Reva that although [91mTre[0m is rather intelligent , he
future of her child , [91mReva[0m sends him to live in


In [36]:
find_examples(docs, "JJ")
# JJ is an adjective in its base form (comparatives and superlatives get JJR and JJS)

, ten - year - [91mold[0m Tre Styles   lives with
Styles   lives with his [91msingle[0m mother Reva Devereaux   in
that although Tre is rather [91mintelligent[0m , he is immature and
rather intelligent , he is [91mimmature[0m and lacks respect for classmates
classmates and adults alike . [91mFrightened[0m about the future of her
his 27 - year - [91mold[0m father , Furious Styles ,
eventually decide the crime is [91munimportant[0m because nothing was taken and
and the burglar escaped completely [91munharmed[0m . The police , particularly
The police , particularly the [91mAfrican[0m American officer , treat Furious
police , particularly the African [91mAmerican[0m officer , treat Furious with


In [31]:
find_examples(docs, "JJR")
# JJR is a comparative adjective (ranks one thing against another, e.g. "older", "higher")

burglar . LAPD officers arrive [91mmore[0m than an hour later ,
he has stomach cancer and [91mless[0m than a year to live
Anything else is always something [91mbetter[0m . " While Brian has
named Bonnie , a wealthy [91molder[0m woman . Jordan catches Brian
, I suppose in some [91mgreater[0m way . " Alma constantly
to clean up after the [91mearlier[0m mess . Bogomil fabricates a
they are summoned to a [91mhigher[0m priority call . Vincent then
with the Weasleys and becomes [91mcloser[0m to Ginny . They almost
book may be filled with [91mmore[0m Dark Magic , Ginny and
he will be mortal once [91mmore[0m . Rather than return for


In [34]:
find_examples(docs, "VBZ")
# VBZ is a 3rd person singular present tense verb (usually ends in -s)

- old Tre Styles   [91mlives[0m with his single mother Reva
, California . After Tre [91mgets[0m involved in a fight at
at school , his teacher [91mcalls[0m Reva . The teacher informs
calls Reva . The teacher [91minforms[0m Reva that although Tre is
informs Reva that although Tre [91mis[0m rather intelligent , he is
is rather intelligent , he [91mis[0m immature and lacks respect for
, he is immature and [91mlacks[0m respect for classmates and adults
of her child , Reva [91msends[0m him to live in the
Styles , from whom she [91mhopes[0m Tre will learn life lessons
Tre 's arrival , he [91mhears[0m his father firing at a


In [43]:
find_examples(docs, "VB")
# VB is the base form of a verb (used after "to" and modals like "will", and in imperatives)

, Reva sends him to [91mlive[0m in the Crenshaw neighborhood of
whom she hopes Tre will [91mlearn[0m life lessons . On the
hour later , and eventually [91mdecide[0m the crime is unimportant because
the African American officer , [91mtreat[0m Furious with disrespect and contempt
gets into a fight to [91mretrieve[0m Ricky 's stolen football ,
Furious , who appears to [91mbe[0m the only father present in
street . He hurries to [91mpick[0m her up and brings her
angrily reminding her to " [91mkeep[0m the babies off the streets
his first failed attempt to [91mhave[0m sex . Tre 's father
from the USC comes to [91minterview[0m Ricky about college , with


Q2: Use the `find_examples` function to help understand the usage of each part-of-speech tag; after doing so, manually tag the following four sentences (if you're doing this in class, you can work with a partner!)

1. "Open[VB] the[DT] pod[NN] bay[NN] doors[NN], Hal[NNP]" 

2. "Frankly[RB], my[PRP$] dear[NN], I[PRP] don't[VB] give[VB] a[DT] damn[NN]"  

3. "May[?] the[DT] Force[NNP] be[VB] with[?] you[PRP]" 

4. "One[CD] morning[NN] I[PRP] shot[VB] an elephant[NN] in my[PRP$] pajamas[NN]. How he[PRP] got[VB] in my[PRP$] pajamas[NN], I[PRP] don't know"  

Q3: After tagging the sentences above by hand, run them through the spaCy tagger; what's spaCy's accuracy on these sentences?

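The spaCy tags are often more specific than my manual tags: it uses VBD for "shot" and "got" and NNS for "doors" and "pajamas" where I wrote VB and NN, it splits "don't" into "do" (VBP) and "n't" (RB), and it fills in the tags I was unsure of ("May" as MD, "with" as IN). The one questionable choice is tagging "pod" and "bay" as NNP rather than NN. Otherwise the tagger looks accurate on these sentences.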

In [44]:
get_spacy_tags("Open the pod bay doors, Hal")

Open VB
the DT
pod NNP
bay NNP
doors NNS
, ,
Hal NNP


In [45]:
get_spacy_tags("Frankly, my dear, I don't give a damn")

Frankly RB
, ,
my PRP$
dear NN
, ,
I PRP
do VBP
n't RB
give VB
a DT
damn NN


In [46]:
get_spacy_tags("May the Force be with you")

May MD
the DT
Force NNP
be VB
with IN
you PRP


In [47]:
get_spacy_tags("One morning I shot an elephant in my pajamas. How he got in my pajamas, I don't know.")

One CD
morning NN
I PRP
shot VBD
an DT
elephant NN
in IN
my PRP$
pajamas NNS
. .
How WRB
he PRP
got VBD
in IN
my PRP$
pajamas NNS
, ,
I PRP
do VBP
n't RB
know VB
. .
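
To put a number on Q3, one option is to count how often the hand tags and the spaCy tags agree, token by token. The sketch below does this for sentence 1 only, using the hand tags from Q2 and the spaCy output above. `tag_agreement` is a hypothetical helper, and because it treats the hand tags as the reference, the result measures agreement with my tagging and not true accuracy.

```python
def tag_agreement(manual, predicted):
    """Fraction of positions where two aligned tag sequences agree."""
    if len(manual) != len(predicted):
        raise ValueError("tag sequences must align token by token")
    if not manual:
        return 0.0
    matches = sum(m == p for m, p in zip(manual, predicted))
    return matches / len(manual)

# Sentence 1, words only (the comma is left out because it was not hand-tagged).
words     = ["Open", "the", "pod", "bay", "doors", "Hal"]
manual    = ["VB",   "DT",  "NN",  "NN",  "NN",    "NNP"]  # hand tags from Q2
predicted = ["VB",   "DT",  "NNP", "NNP", "NNS",   "NNP"]  # spaCy output above

print(tag_agreement(manual, predicted))  # 0.5

# list the tokens where the two taggings differ
for word, m, p in zip(words, manual, predicted):
    if m != p:
        print(word, "manual:", m, "spaCy:", p)
```

Of the three disagreements, "doors" is a case where spaCy's NNS is the better tag, while "pod" and "bay" are cases where the hand tag NN is arguably the better one.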
