___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [2]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [3]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())


**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [12]:
# Enter your code here:
sents = list(doc.sents)
for token in sents[2]:
    print(f"{token.text:{12}} {token.pos_:{6}} {token.tag_:{10}} {spacy.explain(token.tag_)}")


They         PRON   PRP        pronoun, personal
lived        VERB   VBD        verb, past tense
with         ADP    IN         conjunction, subordinating or preposition
their        ADJ    PRP$       pronoun, possessive
Mother       PROPN  NNP        noun, proper singular
in           ADP    IN         conjunction, subordinating or preposition
a            DET    DT         determiner
sand         NOUN   NN         noun, singular or mass
-            PUNCT  HYPH       punctuation mark, hyphen
bank         NOUN   NN         noun, singular or mass
,            PUNCT  ,          punctuation mark, comma
underneath   ADP    IN         conjunction, subordinating or preposition
the          DET    DT         determiner
root         NOUN   NN         noun, singular or mass
of           ADP    IN         conjunction, subordinating or preposition
a            DET    DT         determiner

            SPACE             None
very         ADV    RB         adverb
big          ADJ    JJ         adjective
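
The column layout in the cell above comes from f-string format specs: `{token.text:{12}}` pads the text to a minimum width of 12, and the nested braces simply evaluate to the literal `12`. A minimal sketch of that formatting, using a few rows copied from the output above rather than a live `Doc` (the `rows` list and the `format_row` helper are illustrative names, not part of spaCy):

```python
# Rows copied from the output above: (text, coarse POS, fine-grained tag).
rows = [
    ("They", "PRON", "PRP"),
    ("lived", "VERB", "VBD"),
    ("underneath", "ADP", "IN"),
]

def format_row(text, pos, tag):
    """Pad each field to a fixed minimum width so the columns line up."""
    # {text:{12}} and {text:12} are equivalent; the inner braces are only
    # needed when the width comes from a variable.
    return f"{text:{12}} {pos:{6}} {tag:{10}}".rstrip()

for row in rows:
    print(format_row(*row))
```

Strings are left-aligned by default, so no explicit `<` alignment flag is needed.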

**3. Provide a frequency list of POS tags from the entire document**

In [15]:
POS_count = doc.count_by(spacy.attrs.POS)

for k,v in sorted(POS_count.items()):
    print(f"{k}. {doc.vocab[k].text:{5}} : {v}")


83. ADJ   : 83
84. ADP   : 127
85. ADV   : 75
88. CCONJ : 61
89. DET   : 90
91. NOUN  : 176
92. NUM   : 8
93. PART  : 36
94. PRON  : 72
95. PROPN : 75
96. PUNCT : 174
99. VERB  : 182
102. SPACE : 99
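
The list above is ordered by attribute ID. If a ranking by frequency is more useful, the same counts can be sorted by value. The sketch below hard-codes the counts printed above so it runs without a model; with a live `Doc`, the dictionary could presumably be built as `{doc.vocab[k].text: v for k, v in POS_count.items()}`. The names `pos_counts` and `by_frequency` are illustrative.

```python
# Counts copied from the frequency list above, keyed by tag name.
pos_counts = {
    "ADJ": 83, "ADP": 127, "ADV": 75, "CCONJ": 61, "DET": 90,
    "NOUN": 176, "NUM": 8, "PART": 36, "PRON": 72, "PROPN": 75,
    "PUNCT": 174, "VERB": 182, "SPACE": 99,
}

# Sort by descending count; tags with equal counts keep their original order.
by_frequency = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)

total = sum(pos_counts.values())
for name, count in by_frequency:
    # Show each tag with its count and its share of all tokens.
    print(f"{name:{5}} : {count:>4}  ({count / total:.1%})")
```

The counts sum to 1258, which matches `len(doc)` in the next problem.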


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
> HINT: The attribute ID for 'NOUN' is 91

In [21]:
percent = 100 * POS_count[91] / len(doc)
print(f"{POS_count[91]}/{len(doc)} = {percent:.4}%")


176/1258 = 13.99%
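
Attribute IDs such as 91 can differ between spaCy versions, so one alternative is to compare the string tag `token.pos_` instead of looking up a numeric key. The helper below is a sketch under that assumption; `pos_percentage` is a hypothetical name, and the stand-in tokens exist only so the example runs without loading a model.

```python
from types import SimpleNamespace

def pos_percentage(tokens, pos="NOUN"):
    """Return the percentage of tokens whose coarse POS tag equals `pos`."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    matches = sum(1 for token in tokens if token.pos_ == pos)
    return 100 * matches / len(tokens)

# Stand-in tokens so the sketch runs without loading a model;
# with spaCy loaded, call pos_percentage(doc) instead.
sample = [SimpleNamespace(pos_=p) for p in ("DET", "NOUN", "VERB", "NOUN")]
print(f"{pos_percentage(sample):.2f}%")
```

Because the function only reads `token.pos_`, it should accept a `Doc`, a sentence `Span`, or any other iterable of tokens.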


**5. Display the Dependency Parse for the third sentence**

In [27]:
displacy.render(sents[2], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [29]:
# doc.ents holds entity spans, not single tokens
for ent in doc.ents[:2]:
    print(f"{ent.text} - {ent.label_} - {spacy.explain(ent.label_)}")


The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [30]:
len(sents)

56

**8. CHALLENGE: How many sentences contain named entities?**

In [33]:
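# Re-parse each sentence as its own Doc so that it has its own .ents
sent_list = [nlp(sent.text) for sent in sents]
# Use a distinct loop variable to avoid shadowing the main `doc`
len([sent_doc for sent_doc in sent_list if sent_doc.ents])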



49
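
Re-parsing every sentence is not strictly necessary, because sentence spans expose their own `.ents`. The sketch below counts sentences directly from an iterable of spans; `count_sents_with_ents` is a hypothetical helper name, and the stand-in objects exist only so the example runs without a model. Since the model sees the whole document rather than isolated sentences, this count may differ from the 49 reported above.

```python
from types import SimpleNamespace

def count_sents_with_ents(sents):
    """Count sentence spans that contain at least one named entity."""
    return sum(1 for sent in sents if sent.ents)

# Stand-in sentences so the sketch runs without a model;
# with spaCy loaded, call count_sents_with_ents(doc.sents) instead.
sample = [
    SimpleNamespace(ents=("Peter",)),
    SimpleNamespace(ents=()),
    SimpleNamespace(ents=("Mr. McGregor", "Flopsy")),
]
print(count_sents_with_ents(sample))
```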

**9. CHALLENGE: Display the named entity visualization for `sent_list[0]` from the previous problem**

In [34]:
displacy.render(sent_list[0], style='ent', jupyter=True)

### Great Job!