# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [5]:

# Forward slashes work on Windows, macOS and Linux; the Gutenberg file is UTF-8
with open("TextFiles/peterrabbit.txt", encoding="utf-8") as f:
    doc = nlp(f.read())
print(doc)

The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And
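A path written as a string with backslashes only works on Windows. Here is a minimal sketch of a portable alternative using `pathlib`. It assumes the same `TextFiles` folder layout as the cell above, and `load_story` is a hypothetical helper name. The returned text would be passed to `nlp()` exactly as before.

```python
from pathlib import Path


def load_story(folder: str = "TextFiles", filename: str = "peterrabbit.txt") -> str:
    """Hypothetical helper: read the story as UTF-8 text.

    Path's "/" operator joins the parts with the correct separator for the OS.
    """
    path = Path(folder) / filename
    return path.read_text(encoding="utf-8")


# Usage in the notebook (assumes the file exists relative to the working directory):
# doc = nlp(load_story())
```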

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [9]:
print(list(doc.sents)[2])

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.




In [15]:
for token in list(doc.sents)[2]:
    print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{15}} {str(spacy.explain(token.tag_))}")

They       PRON       PRP             pronoun, personal
lived      VERB       VBD             verb, past tense
with       ADP        IN              conjunction, subordinating or preposition
their      PRON       PRP$            pronoun, possessive
Mother     NOUN       NN              noun, singular or mass
in         ADP        IN              conjunction, subordinating or preposition
a          DET        DT              determiner
sand       NOUN       NN              noun, singular or mass
-          PUNCT      HYPH            punctuation mark, hyphen
bank       NOUN       NN              noun, singular or mass
,          PUNCT      ,               punctuation mark, comma
underneath ADP        IN              conjunction, subordinating or preposition
the        DET        DT              determiner
root       NOUN       NN              noun, singular or mass
of         ADP        IN              conjunction, subordinating or preposition
a          DET        DT              determiner
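In the print statement above, `{token.text:{10}}` uses a nested replacement field for the width. `{10}` simply evaluates to `10`, so it behaves like `{token.text:10}`. The nesting becomes useful when the width is stored in a variable. Below is a small sketch using a few rows copied from the output above. The `format_row` helper and its `widths` parameter are illustrative names only.

```python
def format_row(text: str, pos: str, tag: str, desc: str, widths=(10, 10, 15)) -> str:
    """Left-align the first three columns to the given widths, like the notebook's f-string."""
    w_text, w_pos, w_tag = widths
    # Strings are left-aligned by default; the width comes from a nested field.
    return f"{text:{w_text}} {pos:{w_pos}} {tag:{w_tag}} {desc}"


rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("lived", "VERB", "VBD", "verb, past tense"),
    ("underneath", "ADP", "IN", "conjunction, subordinating or preposition"),
]
for row in rows:
    print(format_row(*row))
```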

**3. Provide a frequency list of POS tags from the entire document**

In [24]:
doc.vocab[9].text

'LIKE_URL'

In [29]:
POS_counts = doc.count_by(spacy.attrs.POS)
# print(POS_counts)

for key, value in POS_counts.items():
    print(f"{key} {doc.vocab[key].text} :{value}")

90 DET :90
96 PROPN :74
85 ADP :125
97 PUNCT :171
93 NUM :9
103 SPACE :99
86 ADV :63
98 SCONJ :19
92 NOUN :172
95 PRON :110
100 VERB :135
84 ADJ :53
89 CCONJ :61
87 AUX :49
94 PART :28
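`count_by` returns a plain dict keyed by integer attribute ID in no particular order, so it is easier to read as a frequency list once it has been sorted. The sketch below uses the counts printed above, keyed by label instead of ID so it runs without spaCy. In the notebook, the same sort could be applied to `POS_counts.items()`, using `doc.vocab[key].text` for the label.

```python
from collections import Counter

# Counts copied from the output above (keys are POS labels, not integer IDs).
pos_counts = Counter({
    "DET": 90, "PROPN": 74, "ADP": 125, "PUNCT": 171, "NUM": 9,
    "SPACE": 99, "ADV": 63, "SCONJ": 19, "NOUN": 172, "PRON": 110,
    "VERB": 135, "ADJ": 53, "CCONJ": 61, "AUX": 49, "PART": 28,
})

# Every token gets exactly one POS, so the counts sum to the token total (len(doc)).
total = sum(pos_counts.values())

# most_common() sorts by count, highest first.
for label, count in pos_counts.most_common():
    print(f"{label:<6} {count:>4}  {100 * count / total:5.2f}%")
```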


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
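HINT: the attribute ID for 'NOUN' depends on the spaCy version; in this notebook it is 92, as the frequency list above and `doc.vocab[92].text` below confirm.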

In [36]:
doc.vocab[92].text

'NOUN'

In [None]:
len(doc)

In [None]:
POS_counts[92]

In [39]:
percent = 100 * POS_counts[92] / len(doc)

# .2f rounds to two decimal places
print(f"{POS_counts[92]}/{len(doc)} = {percent:.2f}%")

172/1258 = 13.67%
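The denominator `len(doc)` counts every token. That includes the 171 punctuation tokens and the 99 `SPACE` tokens produced by the file's line breaks, so 13.67% is the noun share of all tokens rather than of words. The sketch below computes both figures from the counts in the frequency list. `noun_share` and its `exclude` parameter are illustrative names.

```python
def noun_share(counts: dict, exclude: tuple = ()) -> float:
    """Percentage of NOUN tokens, leaving the labels in `exclude` out of the denominator."""
    total = sum(n for label, n in counts.items() if label not in exclude)
    return 100 * counts["NOUN"] / total


# Counts copied from the frequency list in question 3.
pos_counts = {
    "DET": 90, "PROPN": 74, "ADP": 125, "PUNCT": 171, "NUM": 9,
    "SPACE": 99, "ADV": 63, "SCONJ": 19, "NOUN": 172, "PRON": 110,
    "VERB": 135, "ADJ": 53, "CCONJ": 61, "AUX": 49, "PART": 28,
}

print(f"all tokens:      {noun_share(pos_counts):.2f}%")                               # 13.67%
print(f"no PUNCT/SPACE:  {noun_share(pos_counts, exclude=('PUNCT', 'SPACE')):.2f}%")  # 17.41%
```

Excluding punctuation and whitespace leaves 988 tokens, which raises the noun share to about 17.41%.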


**5. Display the Dependency Parse for the third sentence**

In [42]:
displacy.render(list(doc.sents)[2], style="dep", jupyter=True, options={"distance":90})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit*.**

In [44]:
for ent in doc.ents[:2]:
    print(ent.text + " - " + ent.label_ + " - " + str(spacy.explain(ent.label_)))

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [45]:
len([sent for sent in doc.sents])

55

In [None]:
# OR 
len(list(doc.sents))

**8. CHALLENGE: How many sentences contain named entities?**

In [47]:
list_sentences = [nlp(sent.text) for sent in doc.sents]
# Use a distinct loop name so it is not confused with the full-story `doc`
list_NERs = [sent_doc for sent_doc in list_sentences if sent_doc.ents]
len(list_NERs)

36

**9. CHALLENGE: Display the named entity visualization for `list_sentences[0]` from the previous problem**

In [48]:
displacy.render(list_sentences[0], style='ent', jupyter=True)