# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [2]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())

**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag (`tag_`), and the description of the fine-grained tag.**

In [15]:
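# Zero-based index 3 is the fourth sentence spaCy detects here; it is the sentence printed below.
# Sentence boundaries depend on the model version, so the index may need adjusting.
sent3 = list(doc.sents)[3]
for token in sent3:
    print(f'{token.text:{12}} {token.pos_:{8}} {token.tag_:{8}} {spacy.explain(token.tag_)}')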

They         PRON     PRP      pronoun, personal
lived        VERB     VBD      verb, past tense
with         ADP      IN       conjunction, subordinating or preposition
their        DET      PRP$     pronoun, possessive
Mother       PROPN    NNP      noun, proper singular
in           ADP      IN       conjunction, subordinating or preposition
a            DET      DT       determiner
sand         NOUN     NN       noun, singular or mass
-            PUNCT    HYPH     punctuation mark, hyphen
bank         NOUN     NN       noun, singular or mass
,            PUNCT    ,        punctuation mark, comma
underneath   ADP      IN       conjunction, subordinating or preposition
the          DET      DT       determiner
root         NOUN     NN       noun, singular or mass
of           ADP      IN       conjunction, subordinating or preposition
a            DET      DT       determiner

            SPACE    _SP      None
very         ADV      RB       adverb
big          ADJ      JJ       adjective

Sample output from the assessment template (tags vary with the spaCy model version; `their` is tagged `ADJ` here and `DET` above):

They         PRON   PRP    pronoun, personal
lived        VERB   VBD    verb, past tense
with         ADP    IN     conjunction, subordinating or preposition
their        ADJ    PRP$   pronoun, possessive
Mother       PROPN  NNP    noun, proper singular
in           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner
sand         NOUN   NN     noun, singular or mass
-            PUNCT  HYPH   punctuation mark, hyphen
bank         NOUN   NN     noun, singular or mass
,            PUNCT  ,      punctuation mark, comma
underneath   ADP    IN     conjunction, subordinating or preposition
the          DET    DT     determiner
root         NOUN   NN     noun, singular or mass
of           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner

            SPACE         None
very         ADV    RB     adverb
big          ADJ    JJ     adjective
fir          NOUN   NN     noun, singular or mass
-            PUNCT 
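Both listings above break across two lines at the `SPACE` row, because that token's text is a newline character. One way to keep every row on a single line is to format the text with `!r`, which prints the escaped form instead. This is a minimal sketch: the `rows` list is stand-in data copied by hand from the output above, and `format_row` is a hypothetical helper, not part of spaCy.

```python
# Stand-in rows copied from the output above: (text, POS, fine-grained tag).
# With a real Doc these would come from token.text, token.pos_ and token.tag_.
rows = [
    ("a", "DET", "DT"),
    ("\n", "SPACE", "_SP"),
    ("very", "ADV", "RB"),
]

def format_row(text, pos, tag):
    # !r applies repr() before padding, so a newline is shown as '\n'
    # instead of breaking the row in two.
    return f"{text!r:12} {pos:8} {tag:8}"

lines = [format_row(*row) for row in rows]
for line in lines:
    print(line)
```

The trade-off is that every token text is now wrapped in quotes; skipping whitespace tokens altogether would be another option.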

**3. Provide a frequency list of POS tags from the entire document**

In [29]:
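# Count tokens per coarse-grained POS tag; the keys are integer attribute IDs.
count_dict = doc.count_by(spacy.attrs.POS)
for key, value in count_dict.items():
    # doc.vocab[key].text turns the integer ID back into its label, e.g. 92 -> NOUN.
    print(f'{key:{3}}. {doc.vocab[key].text:{10}} : {value}')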

 90. DET        : 118
 96. PROPN      : 73
 85. ADP        : 123
 97. PUNCT      : 174
 93. NUM        : 8
103. SPACE      : 99
 86. ADV        : 67
 98. SCONJ      : 20
 92. NOUN       : 171
 95. PRON       : 81
 87. AUX        : 48
 84. ADJ        : 50
 89. CCONJ      : 61
100. VERB       : 136
 94. PART       : 29
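The list above appears in the order the counts were returned, which makes it hard to see which tags dominate. A minimal sketch of ranking the tags by frequency follows. The `pos_counts` dictionary is copied by hand from the output above so the example runs without spaCy; with a real `Doc` it would be built from `count_dict` and `doc.vocab[key].text`.

```python
# POS counts copied by hand from the output above (label -> count).
pos_counts = {
    "DET": 118, "PROPN": 73, "ADP": 123, "PUNCT": 174, "NUM": 8,
    "SPACE": 99, "ADV": 67, "SCONJ": 20, "NOUN": 171, "PRON": 81,
    "AUX": 48, "ADJ": 50, "CCONJ": 61, "VERB": 136, "PART": 29,
}

# Sort by count, most frequent first.
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)

for label, count in ranked:
    print(f"{label:6} : {count}")
```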


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [33]:
all_tokens = sum(count_dict.values())
noun_total = count_dict[92]
print(f'{noun_total}/{all_tokens} = {100*noun_total/all_tokens}%')

171/1258 = 13.593004769475357%


Sample output from the assessment template:

176/1258 = 13.99%
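The total of 1258 includes whitespace and punctuation tokens, so the noun share depends on what counts as a token. The sketch below computes the percentage both ways and rounds it to two decimals. The counts are taken from the frequency list in problem 3; treating "words" as everything except `SPACE` and `PUNCT` is an assumption made for illustration, and `percentage` is a hypothetical helper.

```python
# Counts taken from the frequency list in problem 3.
noun_total = 171
all_tokens = 1258
space_total = 99
punct_total = 174

def percentage(part, whole):
    # Share of `part` in `whole`, rounded to two decimal places.
    return round(100 * part / whole, 2)

share_of_all = percentage(noun_total, all_tokens)

# Assumption for illustration: "words" excludes SPACE and PUNCT tokens.
word_tokens = all_tokens - space_total - punct_total
share_of_words = percentage(noun_total, word_tokens)

print(f"{noun_total}/{all_tokens} = {share_of_all}%")
print(f"{noun_total}/{word_tokens} = {share_of_words}%")
```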


**5. Display the Dependency Parse for the third sentence**

In [36]:
displacy.render(sent3, style='dep', jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [46]:
# doc.ents is a tuple, so slicing gives the first two entities directly.
for ent in doc.ents[:2]:
    print(ent, ent.label_, spacy.explain(ent.label_))

Peter Rabbit PERSON People, including fictional
Beatrix Potter PERSON People, including fictional


Sample output from the assessment template:

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [47]:
len(list(doc.sents))

68

Sample output from the assessment template:

56

**8. CHALLENGE: How many sentences contain named entities?**

In [53]:
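# Parse each sentence on its own; problem 9 refers to this list as `list_of_sents`.
list_of_sents = [nlp(sent.text) for sent in doc.sents]
# Count the sentences in which the model found at least one named entity.
count = sum(1 for sent_doc in list_of_sents if sent_doc.ents)
count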

40

Sample output from the assessment template:

49

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [56]:
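# Take the first sentence of the document, the same text that `list_of_sents[0]` refers to.
sent = next(iter(doc.sents))
displacy.render(sent, style='ent', jupyter=True)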