___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [21]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())

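In [39]:
# Indexing is zero-based: in this run, index 3 is the span that matches the
# expected "They lived with their Mother..." sentence shown under question 2.
third_sent = list(doc.sents)[3]
for token in third_sent:
    print(f"{token.text:10} {token.pos_:10} {token.tag_:5} {spacy.explain(token.tag_)}")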

They       PRON       PRP   pronoun, personal
lived      VERB       VBD   verb, past tense
with       ADP        IN    conjunction, subordinating or preposition
their      PRON       PRP$  pronoun, possessive
Mother     PROPN      NNP   noun, proper singular
in         ADP        IN    conjunction, subordinating or preposition
a          DET        DT    determiner
sand       NOUN       NN    noun, singular or mass
-          PUNCT      HYPH  punctuation mark, hyphen
bank       NOUN       NN    noun, singular or mass
,          PUNCT      ,     punctuation mark, comma
underneath ADP        IN    conjunction, subordinating or preposition
the        DET        DT    determiner
root       NOUN       NN    noun, singular or mass
of         ADP        IN    conjunction, subordinating or preposition
a          DET        DT    determiner

          SPACE      _SP   None
very       ADV        RB    adverb
big        ADJ        JJ    adjective
fir        NOUN       NN    noun, singular or mass

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [3]:
# Enter your code here:




They         PRON   PRP    pronoun, personal
lived        VERB   VBD    verb, past tense
with         ADP    IN     conjunction, subordinating or preposition
their        ADJ    PRP$   pronoun, possessive
Mother       PROPN  NNP    noun, proper singular
in           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner
sand         NOUN   NN     noun, singular or mass
-            PUNCT  HYPH   punctuation mark, hyphen
bank         NOUN   NN     noun, singular or mass
,            PUNCT  ,      punctuation mark, comma
underneath   ADP    IN     conjunction, subordinating or preposition
the          DET    DT     determiner
root         NOUN   NN     noun, singular or mass
of           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner

            SPACE         None
very         ADV    RB     adverb
big          ADJ    JJ     adjective
fir          NOUN   NN     noun, singular or mass
-            PUNCT 

**3. Provide a frequency list of POS tags from the entire document**

In [40]:
doc.count_by(spacy.attrs.POS)

{90: 95,
 96: 75,
 85: 124,
 97: 174,
 93: 8,
 103: 95,
 86: 65,
 98: 16,
 92: 171,
 95: 105,
 87: 43,
 84: 53,
 89: 61,
 100: 139,
 94: 31}

In [41]:

POS_counts = doc.count_by(spacy.attrs.POS)

for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')



84. ADJ  : 53
85. ADP  : 124
86. ADV  : 65
87. AUX  : 43
89. CCONJ: 61
90. DET  : 95
92. NOUN : 171
93. NUM  : 8
94. PART : 31
95. PRON : 105
96. PROPN: 75
97. PUNCT: 174
98. SCONJ: 16
100. VERB : 139
103. SPACE: 95
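
The list above is ordered by attribute ID, which makes it hard to see which tags dominate. As a minimal sketch, the same counts can be ranked by frequency with the standard library alone. The `pos_counts` dictionary is transcribed by hand from the output above rather than computed from the Doc, and the variable names are illustrative.

```python
from collections import Counter

# Label -> count pairs transcribed from the frequency list above
# (hand-copied for illustration, not recomputed with spaCy).
pos_counts = Counter({
    "ADJ": 53, "ADP": 124, "ADV": 65, "AUX": 43, "CCONJ": 61,
    "DET": 95, "NOUN": 171, "NUM": 8, "PART": 31, "PRON": 105,
    "PROPN": 75, "PUNCT": 174, "SCONJ": 16, "VERB": 139, "SPACE": 95,
})

# most_common() returns (label, count) pairs ordered by count, highest first.
ranked = pos_counts.most_common()
for label, count in ranked[:5]:
    print(f"{label:5}: {count}")
```

In a live notebook the same ranking could be built from `POS_counts` by mapping each ID to its label with `doc.vocab[k].text`, as the cell above already does.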


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
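HINT: the attribute ID for 'NOUN' depends on the installed spaCy version. The original hint gives 91, but in the frequency list above NOUN has ID 92, so look the ID up by name rather than hard-coding it.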

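In [44]:
len_doc = len(doc)

# Share of every POS tag, for context
for k,v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v/len_doc:.2%}')

# Look up the NOUN ID by name instead of hard-coding it (92 in this run)
POS_counts[doc.vocab.strings['NOUN']] / len_doc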


84. ADJ  : 4.22%
85. ADP  : 9.88%
86. ADV  : 5.18%
87. AUX  : 3.43%
89. CCONJ: 4.86%
90. DET  : 7.57%
92. NOUN : 13.63%
93. NUM  : 0.64%
94. PART : 2.47%
95. PRON : 8.37%
96. PROPN: 5.98%
97. PUNCT: 13.86%
98. SCONJ: 1.27%
100. VERB : 11.08%
103. SPACE: 7.57%


0.1362549800796813
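
As a quick sanity check on the arithmetic, the noun share can be reproduced from the raw counts without spaCy. This is only a sketch: the dictionary is copied by hand from the `doc.count_by(spacy.attrs.POS)` output earlier in the notebook, and `NOUN_ID = 92` is the ID shown in this run, which may differ under other spaCy versions.

```python
# ID -> count pairs copied from the doc.count_by(spacy.attrs.POS) output above.
pos_counts_by_id = {
    90: 95, 96: 75, 85: 124, 97: 174, 93: 8, 103: 95, 86: 65, 98: 16,
    92: 171, 95: 105, 87: 43, 84: 53, 89: 61, 100: 139, 94: 31,
}

# Assumption: 92 is the NOUN ID in this notebook's output; other versions may differ.
NOUN_ID = 92

# Every token carries exactly one POS tag, so the counts sum to the token total.
total_tokens = sum(pos_counts_by_id.values())
noun_share = pos_counts_by_id[NOUN_ID] / total_tokens

print(total_tokens)          # 1255
print(f"{noun_share:.2%}")   # 13.63%
```

The total of 1255 is what `len(doc)` returns in the cell above, which is why dividing by either value gives the same percentage.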

**5. Display the Dependency Parse for the third sentence**

In [50]:
displacy.render(third_sent, style="dep", jupyter=True, options={"distance":110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [56]:
for ent in doc.ents[:2]:
    print(f"{ent.text:10} - {ent.label_} - {spacy.explain(ent.label_)}")

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [57]:
len(list(doc.sents))

74

In [77]:
# Count the sentences that contain at least one named entity.
# A non-empty `sent.ents` is truthy, so no explicit comparison with [] is needed,
# and the sentences do not have to be re-parsed just to be counted.
print(len([sent for sent in doc.sents if sent.ents]))

23

**8. CHALLENGE: How many sentences contain named entities?**

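In [76]:
# Re-parse each sentence as its own Doc, then keep those with at least one entity.
list_of_sents = [nlp(sent.text) for sent in doc.sents]
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]

# Only the last expression in a cell is echoed; wrap this in print() to see the count.
len(list_of_ners)

list_of_ners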

[The Tale of Peter Rabbit, by Beatrix Potter (1902).,
 
 
 Once upon a time there were four little Rabbits, and their names
 were--Flopsy, Mopsy, Cotton-tail, and Peter.,
 
 
 'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
 the fields or down the lane, but don't go into Mr. McGregor's garden:
 your Father had an accident there; he was put in a pie by Mrs.
 McGregor.',
 
 
 Then old Mrs. Rabbit took a basket and her umbrella, and went through
 the wood to the baker's.,
 She bought a loaf of brown bread and five
 currant buns.,
 
 
 Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
 down the lane to gather blackberries:
 
 But Peter, who was very naughty, ran straight away to Mr. McGregor's
 garden, and squeezed under the gate!,
 First he ate some lettuces and some French beans; and then he ate
 some radishes;,
 Mr.
 McGregor!
 
 Mr. McGregor was on his hands and knees planting out young cabbages,,
 but he jumped up and ran after Peter, waving a rake a

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [68]:
displacy.render(list_of_sents[0], style="ent", jupyter=True)

### Great Job!