___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [2]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [3]:
with open('peterrabbit.txt','r') as f:
    doc = nlp(f.read())
doc

The Tale of Peter Rabbit, by Beatrix Potter (1902).

Once upon a time there were four little Rabbits, and their names
were--

          Flopsy,
       Mopsy,
   Cotton-tail,
and Peter.

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.

'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
the fields or down the lane, but don't go into Mr. McGregor's garden:
your Father had an accident there; he was put in a pie by Mrs.
McGregor.'

'Now run along, and don't get into mischief. I am going out.'

Then old Mrs. Rabbit took a basket and her umbrella, and went through
the wood to the baker's. She bought a loaf of brown bread and five
currant buns.

Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
down the lane to gather blackberries:

But Peter, who was very naughty, ran straight away to Mr. McGregor's
garden, and squeezed under the gate!

First he ate some lettuces and some French beans; and then he ate
some radishes;

And

**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag, and the description of the fine-grained tag.**

In [25]:
for token in list(doc.sents)[2]:
    print(f'{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}')

They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      ADJ        PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE                 None
very       ADV        RB         adverb

They         PRON   PRP    pronoun, personal
lived        VERB   VBD    verb, past tense
with         ADP    IN     conjunction, subordinating or preposition
their        ADJ    PRP$   pronoun, possessive
Mother       PROPN  NNP    noun, proper singular
in           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner
sand         NOUN   NN     noun, singular or mass
-            PUNCT  HYPH   punctuation mark, hyphen
bank         NOUN   NN     noun, singular or mass
,            PUNCT  ,      punctuation mark, comma
underneath   ADP    IN     conjunction, subordinating or preposition
the          DET    DT     determiner
root         NOUN   NN     noun, singular or mass
of           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner

            SPACE         None
very         ADV    RB     adverb
big          ADJ    JJ     adjective
fir          NOUN   NN     noun, singular or mass
-            PUNCT 
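The column layout in the listings above comes from the nested width fields in the f-string: `{token.text:{10}}` pads each value to at least ten characters. The sketch below reproduces that behaviour in plain Python on a few rows copied from the output, so it runs without spaCy. The `rows` list and the `format_row` helper are illustrative names, not part of the assessment.

```python
# A few (text, pos, tag, description) rows copied from the output above.
rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("lived", "VERB", "VBD", "verb, past tense"),
    ("underneath", "ADP", "IN", "conjunction, subordinating or preposition"),
]


def format_row(text, pos, tag, description, width=10):
    # {value:{width}} left-aligns a string and pads it to `width` characters.
    # Longer values are not truncated; they push the next column to the right.
    return f"{text:{width}} {pos:{width}} {tag:{width}} {description}"


for row in rows:
    print(format_row(*row))
```

Widening the first column (for example `width=12`, as the second listing appears to do) keeps long tokens such as `underneath` from running into the next column.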

**3. Provide a frequency list of POS tags from the entire document**

In [13]:
pos_counts = doc.count_by(spacy.attrs.POS)
pos_counts

{96: 174,
 99: 182,
 102: 99,
 83: 83,
 84: 127,
 85: 75,
 88: 61,
 89: 90,
 91: 176,
 92: 8,
 93: 36,
 94: 72,
 95: 75}

In [16]:
for k,v in sorted(pos_counts.items()):
    print(f'{k:4}.{doc.vocab[k].text:8}:{v}')

  83.ADJ     :83
  84.ADP     :127
  85.ADV     :75
  88.CCONJ   :61
  89.DET     :90
  91.NOUN    :176
  92.NUM     :8
  93.PART    :36
  94.PRON    :72
  95.PROPN   :75
  96.PUNCT   :174
  99.VERB    :182
 102.SPACE   :99


83. ADJ  : 83
84. ADP  : 127
85. ADV  : 75
88. CCONJ: 61
89. DET  : 90
91. NOUN : 176
92. NUM  : 8
93. PART : 36
94. PRON : 72
95. PROPN: 75
96. PUNCT: 174
99. VERB : 182
102. SPACE: 99
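The list above is ordered by attribute ID. To rank the tags by how often they occur, the same dictionary can be sorted by its values instead. This is a minimal sketch that uses the counts and tag names printed above as literal data, so no spaCy model is needed; `pos_names` is an illustrative stand-in for the `doc.vocab[k].text` lookup, and the numeric IDs are the ones shown in this notebook's output (they may differ in other spaCy versions).

```python
# Counts copied from the pos_counts output above (attribute ID -> count).
pos_counts = {96: 174, 99: 182, 102: 99, 83: 83, 84: 127, 85: 75, 88: 61,
              89: 90, 91: 176, 92: 8, 93: 36, 94: 72, 95: 75}

# ID -> tag name, copied from the printed list; stands in for doc.vocab[k].text.
pos_names = {83: "ADJ", 84: "ADP", 85: "ADV", 88: "CCONJ", 89: "DET",
             91: "NOUN", 92: "NUM", 93: "PART", 94: "PRON", 95: "PROPN",
             96: "PUNCT", 99: "VERB", 102: "SPACE"}

# Sort the (ID, count) pairs by count, most frequent first.
ranked = sorted(pos_counts.items(), key=lambda item: item[1], reverse=True)

for pos_id, count in ranked:
    print(f"{pos_id:4}. {pos_names[pos_id]:6}: {count}")
```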


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 91

In [17]:
noun = pos_counts.get(91)
noun

176

In [19]:
pos_counts.values()

dict_values([174, 182, 99, 83, 127, 75, 61, 90, 176, 8, 36, 72, 75])

In [18]:
sum(pos_counts.values())

1258

In [21]:
noun/sum(pos_counts.values()) *100

13.990461049284578

In [28]:
# or, since every token carries exactly one POS tag, divide by the token count directly:
pos_counts.get(91)/len(doc)*100

13.990461049284578

176/1258 = 13.99%
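The calculation above can be written out as a self-contained sketch, with the counts from question 3 as literal data. The last three lines go one step beyond the question and are only an illustration: because the document contains 99 `SPACE` tokens, one might prefer to leave whitespace out of the denominator, which gives a somewhat higher share. The names `NOUN_ID` and `SPACE_ID` are assumptions introduced here; their values are the IDs shown in this notebook's output.

```python
# Counts copied from the pos_counts output in question 3 (attribute ID -> count).
pos_counts = {96: 174, 99: 182, 102: 99, 83: 83, 84: 127, 85: 75, 88: 61,
              89: 90, 91: 176, 92: 8, 93: 36, 94: 72, 95: 75}

NOUN_ID, SPACE_ID = 91, 102  # IDs as printed in this notebook

# Share of nouns among all tokens, whitespace included.
total = sum(pos_counts.values())
noun_pct = 100 * pos_counts[NOUN_ID] / total
print(f"{pos_counts[NOUN_ID]}/{total} = {noun_pct:.2f}%")

# Illustration only: drop whitespace tokens from the denominator.
non_space_total = total - pos_counts[SPACE_ID]
noun_pct_no_space = 100 * pos_counts[NOUN_ID] / non_space_total
print(f"{pos_counts[NOUN_ID]}/{non_space_total} = {noun_pct_no_space:.2f}%")
```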


**5. Display the Dependency Parse for the third sentence**

In [36]:
displacy.render(list(doc.sents)[2],style='dep',jupyter=True,options={'distance':50,'compact':True})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [41]:
for ent in doc.ents[:2]:
    print(ent.text, ent.label_, spacy.explain(ent.label_))

The Tale of Peter Rabbit WORK_OF_ART Titles of books, songs, etc.
Beatrix Potter PERSON People, including fictional


The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [43]:
len(list(doc.sents))

56

56

**8. CHALLENGE: How many sentences contain named entities?**

In [45]:
# Re-parse each sentence as its own Doc so that each one has its own .ents
list_of_sents = [nlp(sent.text) for sent in doc.sents]
type(list_of_sents[0])

spacy.tokens.doc.Doc

In [47]:
list_of_sents[:3]

[The Tale of Peter Rabbit, by Beatrix Potter (1902).
 , Once upon a time there were four little Rabbits, and their names
 were--
 
           Flopsy,
        Mopsy,
    Cotton-tail,
 and Peter.
 , They lived with their Mother in a sand-bank, underneath the root of a
 very big fir-tree.
 ]

In [52]:
sents_with_ents = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]

In [54]:
len(sents_with_ents)

49

49

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [55]:
list_of_sents[0]

The Tale of Peter Rabbit, by Beatrix Potter (1902).


In [59]:
displacy.render(list_of_sents[0],style='ent',jupyter=True)

### Great Job!