___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [5]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [6]:
with open('peterrabbit.txt') as f:
    doc = nlp(f.read())

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [18]:
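# Note: list(doc.sents)[3] is the fourth sentence spaCy segmented in this run;
# it is the one that begins "They lived with their Mother".
for token in list(doc.sents)[3]:
    print(f"{token.text:{10}} {token.pos_:{10}} {token.tag_:{10}} {spacy.explain(token.tag_)}")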

They       PRON       PRP        pronoun, personal
lived      VERB       VBD        verb, past tense
with       ADP        IN         conjunction, subordinating or preposition
their      PRON       PRP$       pronoun, possessive
Mother     PROPN      NNP        noun, proper singular
in         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner
sand       NOUN       NN         noun, singular or mass
-          PUNCT      HYPH       punctuation mark, hyphen
bank       NOUN       NN         noun, singular or mass
,          PUNCT      ,          punctuation mark, comma
underneath ADP        IN         conjunction, subordinating or preposition
the        DET        DT         determiner
root       NOUN       NN         noun, singular or mass
of         ADP        IN         conjunction, subordinating or preposition
a          DET        DT         determiner

          SPACE      _SP        None
very       ADV        RB         adverb
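
The nested braces in `{token.text:{10}}` are a computed field width; for a constant width they behave the same as `{token.text:10}`. The sketch below uses plain strings rather than spaCy tokens to show the padding, and one possible way to avoid printing `None` when no description is available, as happens for the `_SP` tag above. The `rows` list is copied from the output above (the whitespace token is assumed to be a single newline), and `format_row` is a hypothetical helper, not part of spaCy.

```python
# Rows copied from the output above: (text, coarse POS, fine-grained tag, description).
rows = [
    ("They", "PRON", "PRP", "pronoun, personal"),
    ("lived", "VERB", "VBD", "verb, past tense"),
    ("\n", "SPACE", "_SP", None),  # assumed newline token; its description was None
]

def format_row(text, pos, tag, description, width=10):
    """Left-align the first three fields in fixed-width columns."""
    # repr() keeps whitespace-only tokens such as "\n" visible on a single line.
    shown = repr(text) if text.isspace() else text
    return f"{shown:{width}} {pos:{width}} {tag:{width}} {description or '(no description)'}"

for row in rows:
    print(format_row(*row))
```

With the notebook's `doc`, the same helper could be called as `format_row(token.text, token.pos_, token.tag_, spacy.explain(token.tag_))`.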

**3. Provide a frequency list of POS tags from the entire document**

In [30]:
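# Count tokens per coarse POS tag; the keys are integer attribute IDs
POS_count = doc.count_by(spacy.attrs.POS)
# Map each ID back to its tag name through the vocabulary
for k, v in sorted(POS_count.items()):
    print(f"ID : {k:{5}}. \t POS : {doc.vocab[k].text:{10}} - Counts : {v}")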

ID :    84. 	 POS : ADJ        - Counts : 53
ID :    85. 	 POS : ADP        - Counts : 124
ID :    86. 	 POS : ADV        - Counts : 65
ID :    87. 	 POS : AUX        - Counts : 43
ID :    89. 	 POS : CCONJ      - Counts : 61
ID :    90. 	 POS : DET        - Counts : 95
ID :    92. 	 POS : NOUN       - Counts : 171
ID :    93. 	 POS : NUM        - Counts : 9
ID :    94. 	 POS : PART       - Counts : 31
ID :    95. 	 POS : PRON       - Counts : 105
ID :    96. 	 POS : PROPN      - Counts : 75
ID :    97. 	 POS : PUNCT      - Counts : 173
ID :    98. 	 POS : SCONJ      - Counts : 16
ID :   100. 	 POS : VERB       - Counts : 138
ID :   103. 	 POS : SPACE      - Counts : 99
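
Because `doc.count_by` keys its result by integer attribute ID, each ID has to be mapped back to a name through `doc.vocab`. An alternative is to count the string attribute `pos_` directly. The sketch below is a minimal illustration using `collections.Counter`; `pos_frequencies` is a hypothetical helper, and the stand-in token objects exist only so the example runs without a language model.

```python
from collections import Counter
from types import SimpleNamespace

def pos_frequencies(tokens):
    """Count coarse POS tags by name for any iterable of objects with a .pos_ attribute."""
    return Counter(token.pos_ for token in tokens)

# Stand-in tokens (not real spaCy tokens), tagged as in the sentence output shown earlier.
sample = [
    SimpleNamespace(text="They", pos_="PRON"),
    SimpleNamespace(text="lived", pos_="VERB"),
    SimpleNamespace(text="with", pos_="ADP"),
    SimpleNamespace(text="their", pos_="PRON"),
    SimpleNamespace(text="Mother", pos_="PROPN"),
]

counts = pos_frequencies(sample)
for tag, count in sorted(counts.items()):
    print(f"POS : {tag:{10}} - Counts : {count}")
```

With the notebook's `doc`, the call would be `pos_frequencies(doc)`, since iterating over a `Doc` yields its tokens.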


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92 in the frequency list above (the value can differ between spaCy versions)

In [31]:
len(doc)

1258

In [32]:
POS_count[92]

171

In [35]:
100 * POS_count[92] / len(doc)

13.593004769475357
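
The same arithmetic can be checked from the frequency list in question 3 without hard-coding an attribute ID. The sketch below copies those counts into a dictionary keyed by tag name; `pos_counts` and `share` are names introduced here for illustration. It also shows how the figure changes if whitespace and punctuation tokens are left out of the denominator, which is a different question from the one asked.

```python
# Counts copied from the frequency list in question 3, keyed by tag name.
pos_counts = {
    "ADJ": 53, "ADP": 124, "ADV": 65, "AUX": 43, "CCONJ": 61,
    "DET": 95, "NOUN": 171, "NUM": 9, "PART": 31, "PRON": 105,
    "PROPN": 75, "PUNCT": 173, "SCONJ": 16, "VERB": 138, "SPACE": 99,
}

def share(counts, tag, exclude=()):
    """Percentage of tokens carrying `tag`, ignoring any tags listed in `exclude`."""
    total = sum(n for t, n in counts.items() if t not in exclude)
    return 100 * counts[tag] / total

total_tokens = sum(pos_counts.values())  # agrees with len(doc) above
print(total_tokens)
print(share(pos_counts, "NOUN"))                              # all tokens
print(share(pos_counts, "NOUN", exclude=("SPACE", "PUNCT")))  # no whitespace or punctuation
```

The per-tag counts sum to the 1258 tokens reported by `len(doc)`, so the first percentage reproduces the result above.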

**5. Display the Dependency Parse for the third sentence**

In [36]:
# Same sentence as in question 2 (index 3 in this run's segmentation)
displacy.render(list(doc.sents)[3], style="dep", jupyter=True)

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [39]:
doc.ents[:2]

(The Tale of Peter Rabbit, Beatrix Potter)

In [46]:
for ent in doc.ents[:2]:
    print(f"{ent.text:{30}} {ent.label_:{20}} {spacy.explain(ent.label_)}")

The Tale of Peter Rabbit       WORK_OF_ART          Titles of books, songs, etc.
Beatrix Potter                 PERSON               People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [47]:
len(list(doc.sents))

74

**8. CHALLENGE: How many sentences contain named entities?**

In [49]:
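# Re-parse each sentence as its own Doc so that each one carries its own .ents
list_of_sents = [nlp(sent.text) for sent in doc.sents]
# Keep only the sentences in which at least one named entity was found
list_of_ners = [sent_doc for sent_doc in list_of_sents if sent_doc.ents]
len(list_of_ners)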

25
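
The solution above re-parses each sentence as its own `Doc`, so entity recognition runs a second time on text without its surrounding context, and the count may differ from what the original `doc` contains. A different approach is to compare the token offsets of the sentences and entities already present in `doc`. The sketch below illustrates only that containment logic, using made-up offsets; `count_sentences_with_entities` is a hypothetical helper. With spaCy, the pairs would come from `(sent.start, sent.end)` for `doc.sents` and `(ent.start, ent.end)` for `doc.ents`.

```python
def count_sentences_with_entities(sentence_spans, entity_spans):
    """Count sentences that fully contain at least one entity.

    Both arguments are lists of (start, end) token offsets with an exclusive end.
    """
    return sum(
        any(s_start <= e_start and e_end <= s_end for e_start, e_end in entity_spans)
        for s_start, s_end in sentence_spans
    )

# Made-up offsets for illustration only: three sentences, two entities.
sentences = [(0, 8), (8, 15), (15, 30)]
entities = [(0, 5), (20, 22)]

print(count_sentences_with_entities(sentences, entities))
```

This avoids a second pass through the pipeline, at the cost of reporting whatever the first pass found rather than what a per-sentence parse would find.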

**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]` from the previous problem**

In [50]:
displacy.render(list_of_sents[0], style="ent", jupyter=True)

### Great Job!