___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [21]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [22]:
# Read the story, collapse line breaks into spaces, then build a Doc from the text
with open('../TextFiles/peterrabbit.txt', 'r') as f:
    text = f.read().replace("\n\n", " ").replace("\n", " ")
    print("File successfully opened and read.")
doc = nlp(text)
text

File successfully opened and read.


"The Tale of Peter Rabbit, by Beatrix Potter (1902). Once upon a time there were four little Rabbits, and their names were--           Flopsy,        Mopsy,    Cotton-tail, and Peter. They lived with their Mother in a sand-bank, underneath the root of a very big fir-tree. 'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into the fields or down the lane, but don't go into Mr. McGregor's garden: your Father had an accident there; he was put in a pie by Mrs. McGregor.' 'Now run along, and don't get into mischief. I am going out.' Then old Mrs. Rabbit took a basket and her umbrella, and went through the wood to the baker's. She bought a loaf of brown bread and five currant buns. Flopsy, Mopsy, and Cottontail, who were good little bunnies, went down the lane to gather blackberries: But Peter, who was very naughty, ran straight away to Mr. McGregor's garden, and squeezed under the gate! First he ate some lettuces and some French beans; and then he ate some radishes; And then, fe

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [25]:
# The third sentence has index 2 in the Doc's sentence sequence
sentence_third = list(doc.sents)[2]
sentence_third

They lived with their Mother in a sand-bank, underneath the root of a very big fir-tree.

In [28]:
# Print each token's text, coarse POS tag, dependency label, and the label's description:
for token in sentence_third:
    print(f'{token.text:{10}} {token.pos_:{7}} {token.dep_:{7}} {spacy.explain(token.dep_)}')

They       PRON    nsubj   nominal subject
lived      VERB    ROOT    root
with       ADP     prep    prepositional modifier
their      PRON    poss    possession modifier
Mother     NOUN    pobj    object of preposition
in         ADP     prep    prepositional modifier
a          DET     det     determiner
sand       NOUN    compound compound
-          PUNCT   punct   punctuation
bank       NOUN    pobj    object of preposition
,          PUNCT   punct   punctuation
underneath ADP     prep    prepositional modifier
the        DET     det     determiner
root       NOUN    pobj    object of preposition
of         ADP     prep    prepositional modifier
a          DET     det     determiner
very       ADV     advmod  adverbial modifier
big        ADJ     amod    adjectival modifier
fir        NOUN    compound compound
-          PUNCT   punct   punctuation
tree       NOUN    pobj    object of preposition
.          PUNCT   punct   punctuation


**3. Provide a frequency list of POS tags from the entire document**

In [35]:
# Count the frequencies of different coarse-grained POS tags:
POS_counts = doc.count_by(spacy.attrs.POS)

# Print each attribute ID, its tag name, and its count, sorted by ID
for k, v in sorted(POS_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{5}}: {v}')

84. ADJ  : 52
85. ADP  : 125
86. ADV  : 63
87. AUX  : 49
89. CCONJ: 61
90. DET  : 90
92. NOUN : 171
93. NUM  : 8
94. PART : 28
95. PRON : 110
96. PROPN: 75
97. PUNCT: 174
98. SCONJ: 19
100. VERB : 134
103. SPACE: 3
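
`count_by` keys its result by integer attribute ID, which is why the loop above looks each ID up in `doc.vocab`. One alternative is to count the string tags directly with `collections.Counter`. The sketch below uses (text, POS) pairs copied from the third-sentence output so that it runs without spaCy; in the notebook, `Counter(token.pos_ for token in doc)` would cover the whole document. The names `tagged` and `pos_counts` are illustrative.

```python
from collections import Counter

# (text, coarse POS) pairs copied from the third-sentence output above
tagged = [
    ("They", "PRON"), ("lived", "VERB"), ("with", "ADP"), ("their", "PRON"),
    ("Mother", "NOUN"), ("in", "ADP"), ("a", "DET"), ("sand", "NOUN"),
    ("-", "PUNCT"), ("bank", "NOUN"), (",", "PUNCT"), ("underneath", "ADP"),
    ("the", "DET"), ("root", "NOUN"), ("of", "ADP"), ("a", "DET"),
    ("very", "ADV"), ("big", "ADJ"), ("fir", "NOUN"), ("-", "PUNCT"),
    ("tree", "NOUN"), (".", "PUNCT"),
]

pos_counts = Counter(pos for _, pos in tagged)

# most_common() orders the tags by frequency, highest first
for pos, count in pos_counts.most_common():
    print(f'{pos:{5}}: {count}')
```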


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 in the frequency list above (the ID can differ between spaCy versions)

In [52]:
# Collect every token whose coarse-grained POS tag is NOUN
nouns = [token for token in doc if token.pos_ == "NOUN"]

length_Noun = len(nouns)
length_doc = len(doc)

print("length of Noun: " + str(length_Noun))
print("length of Doc: " + str(length_doc))
# Share of noun tokens as a percentage of all tokens (punctuation and spaces included)
print((length_Noun / length_doc)*100)

length of Noun: 171
length of Doc: 1162
14.716006884681585
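
The hint suggests reading the noun count out of the frequency table rather than looping over the tokens. As a cross-check, the sketch below copies the counts printed in question 3 into a plain dict keyed by tag name (an illustrative stand-in; in the notebook the equivalent lookup would be `POS_counts[92]`, with 92 being the ID shown for NOUN in that output).

```python
# Counts copied from the POS frequency list printed in question 3
pos_counts = {
    'ADJ': 52, 'ADP': 125, 'ADV': 63, 'AUX': 49, 'CCONJ': 61,
    'DET': 90, 'NOUN': 171, 'NUM': 8, 'PART': 28, 'PRON': 110,
    'PROPN': 75, 'PUNCT': 174, 'SCONJ': 19, 'VERB': 134, 'SPACE': 3,
}

# Every token carries exactly one coarse POS tag, so the counts sum to len(doc)
total_tokens = sum(pos_counts.values())
noun_share = 100 * pos_counts['NOUN'] / total_tokens

print(f"{pos_counts['NOUN']}/{total_tokens} = {noun_share:.2f}%")  # prints 171/1162 = 14.72%
```

The total of 1162 matches `len(doc)` reported above, so both routes give the same percentage.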


**5. Display the Dependency Parse for the third sentence**

In [55]:
for token in sentence_third:
    print(f'{token.text} {token.pos_} {token.dep_} {spacy.explain(token.dep_)}')

They PRON nsubj nominal subject
lived VERB ROOT root
with ADP prep prepositional modifier
their PRON poss possession modifier
Mother NOUN pobj object of preposition
in ADP prep prepositional modifier
a DET det determiner
sand NOUN compound compound
- PUNCT punct punctuation
bank NOUN pobj object of preposition
, PUNCT punct punctuation
underneath ADP prep prepositional modifier
the DET det determiner
root NOUN pobj object of preposition
of ADP prep prepositional modifier
a DET det determiner
very ADV advmod adverbial modifier
big ADJ amod adjectival modifier
fir NOUN compound compound
- PUNCT punct punctuation
tree NOUN pobj object of preposition
. PUNCT punct punctuation


*[Figure: displaCy dependency parse of the third sentence]*

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [65]:
entities = [ent for ent in doc.ents][:2]
entities

[The Tale of Peter Rabbit, Beatrix Potter]

**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [67]:
sentences = [sent for sent in doc.sents]
len(sentences)

57

**8. CHALLENGE: How many sentences contain named entities?**

In [90]:
entities = []
for sent in doc.sents:
    if sent.ents:
        entities.append(sent)
print(len(entities))

37


**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [93]:
# `entities` from the previous cell holds only the sentences that contain named
# entities (the list the prompt calls `list_of_sents`)
displacy.render(entities[0], style='ent', jupyter=True)

### Great Job!