# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

```python
# RUN THIS CELL to perform standard imports:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')
```

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

```python
with open('../TextFiles/peterrabbit.txt') as f:
    text = f.read()

# Run the full pipeline (tagging, parsing, named entity recognition) over the story
doc = nlp(text)
```

**2. For every token in the third sentence, print the token text, the coarse-grained POS tag, the fine-grained tag (TAG), and the description of the fine-grained tag.**

```python
# Enter your code here:
# Materialize the sentence generator so sentences can be indexed; doc1[2] is the third sentence
doc1 = list(doc.sents)

print(f'{"Text":12} {"POS":12} {"Tag":10} Explanation')
for token in doc1[2]:
    print(f'{token.text:12} {token.pos_:12} {token.tag_:10} {spacy.explain(token.tag_)}')
```

```
Text         POS          Tag        Explanation
They         PRON         PRP        pronoun, personal
lived        VERB         VBD        verb, past tense
with         ADP          IN         conjunction, subordinating or preposition
their        PRON         PRP$       pronoun, possessive
Mother       NOUN         NN         noun, singular or mass
in           ADP          IN         conjunction, subordinating or preposition
a            DET          DT         determiner
sand         NOUN         NN         noun, singular or mass
-            PUNCT        HYPH       punctuation mark, hyphen
bank         NOUN         NN         noun, singular or mass
,            PUNCT        ,          punctuation mark, comma
underneath   ADP          IN         conjunction, subordinating or preposition
the          DET          DT         determiner
root         NOUN         NN         noun, singular or mass
of           ADP          IN         conjunction, subordinating or preposition
a
```
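
In the table above, each fine-grained tag falls under a single coarse-grained POS. The sketch below makes that grouping explicit. It is plain Python with no spaCy calls: the `rows` list is copied by hand from the 15 complete rows of the output, and the names `rows` and `tags_by_pos` are illustrative only.

```python
from collections import defaultdict

# (text, coarse POS, fine-grained tag) rows copied from the third-sentence output above
rows = [
    ("They", "PRON", "PRP"), ("lived", "VERB", "VBD"), ("with", "ADP", "IN"),
    ("their", "PRON", "PRP$"), ("Mother", "NOUN", "NN"), ("in", "ADP", "IN"),
    ("a", "DET", "DT"), ("sand", "NOUN", "NN"), ("-", "PUNCT", "HYPH"),
    ("bank", "NOUN", "NN"), (",", "PUNCT", ","), ("underneath", "ADP", "IN"),
    ("the", "DET", "DT"), ("root", "NOUN", "NN"), ("of", "ADP", "IN"),
]

# Group the fine-grained tags under the coarse POS each token was assigned
tags_by_pos = defaultdict(set)
for text, pos, tag in rows:
    tags_by_pos[pos].add(tag)

for pos in sorted(tags_by_pos):
    print(f"{pos:6} {sorted(tags_by_pos[pos])}")
```

Here PRON covers both PRP and PRP$, so the POS column alone cannot separate "They" from "their". The TAG column can.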

**3. Provide a frequency list of POS tags from the entire document**

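```python
# Count tokens by coarse-grained POS; the keys are integer attribute IDs
POS_counts = doc.count_by(spacy.attrs.POS)

print("\nPOS Tag Counts:")
for tag_id, count in sorted(POS_counts.items()):
    # doc.vocab[tag_id].text maps the ID back to its label, e.g. 'NOUN'
    print(f'{doc.vocab[tag_id].text:5}: {count}')
```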


```
POS Tag Counts:
ADJ  : 53
ADP  : 125
ADV  : 63
AUX  : 49
CCONJ: 61
DET  : 90
NOUN : 172
NUM  : 9
PART : 28
PRON : 110
PROPN: 74
PUNCT: 171
SCONJ: 19
VERB : 135
SPACE: 99
```
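
`doc.count_by(spacy.attrs.POS)` returns a dictionary that maps each attribute value (an integer ID) to the number of tokens carrying it. As a rough stdlib analogy, and not spaCy's implementation, `collections.Counter` builds the same kind of tally over string labels. The sketch runs it over the POS column of the third sentence as printed in problem 2. It then uses `most_common()` to list the labels by frequency rather than by tag ID.

```python
from collections import Counter

# POS labels of the third sentence, copied from the problem 2 output (15 complete rows)
pos_labels = [
    "PRON", "VERB", "ADP", "PRON", "NOUN", "ADP", "DET", "NOUN",
    "PUNCT", "NOUN", "PUNCT", "ADP", "DET", "NOUN", "ADP",
]

# Counter tallies each label, much as count_by tallies each attribute ID
pos_counts = Counter(pos_labels)

# most_common() orders the labels by count, highest first
for label, count in pos_counts.most_common():
    print(f"{label:5}: {count}")
```

To get the same frequency ordering from the real `POS_counts` dictionary, use `sorted(POS_counts.items(), key=lambda kv: kv[1], reverse=True)`.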


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: in the spaCy version used here, the attribute ID for 'NOUN' is 92 (the same value as `spacy.symbols.NOUN`)

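```python
from spacy.symbols import NOUN  # integer ID for the NOUN tag (92 here)

# Share of all tokens, including punctuation and whitespace, tagged as NOUN
noun_percent = 100 * POS_counts[NOUN] / len(doc)
print(f'{POS_counts[NOUN]}/{len(doc)} = {noun_percent:.4}%')
```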

```
172/1258 = 13.67%
```
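
The percentage divides by `len(doc)`, which counts every token, including the 171 PUNCT and 99 SPACE tokens reported in problem 3. The sketch below reuses those printed counts, copied by hand so that no spaCy is needed. It first checks that the counts sum to 1258, then shows how the figure changes if punctuation and whitespace are left out of the denominator. Which denominator fits best is a judgment call; the assessment does not specify one.

```python
# Counts copied from the problem 3 output
pos_counts = {
    "ADJ": 53, "ADP": 125, "ADV": 63, "AUX": 49, "CCONJ": 61,
    "DET": 90, "NOUN": 172, "NUM": 9, "PART": 28, "PRON": 110,
    "PROPN": 74, "PUNCT": 171, "SCONJ": 19, "VERB": 135, "SPACE": 99,
}

total = sum(pos_counts.values())  # matches len(doc)
noun_pct = 100 * pos_counts["NOUN"] / total

# Alternative denominator: ignore punctuation and whitespace tokens
words_only = total - pos_counts["PUNCT"] - pos_counts["SPACE"]
noun_pct_words = 100 * pos_counts["NOUN"] / words_only

print(f"{pos_counts['NOUN']}/{total} = {noun_pct:.2f}%")             # 172/1258 = 13.67%
print(f"{pos_counts['NOUN']}/{words_only} = {noun_pct_words:.2f}%")  # 172/988 = 17.41%
```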


**5. Display the Dependency Parse for the third sentence**

```python
# 'distance' sets the horizontal spacing (in px) between words in the arc diagram
displacy.render(doc1[2], style='dep', jupyter=True, options={'distance': 100})
```

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

```python
# doc.ents is a tuple of Span objects in document order, so it can be sliced directly
for ent in doc.ents[:2]:
    print(ent.text, ent.label_, spacy.explain(ent.label_))
```

```
The Tale of Peter Rabbit WORK_OF_ART Titles of books, songs, etc.
Beatrix Potter PERSON People, including fictional
```
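
Each item in `doc.ents` is a `Span`, which is a labeled slice of tokens. `ent.start` is the index of its first token and `ent.end` is one past its last token. The sketch below mimics that with a hypothetical token list and hand-written `(start, end, label)` triples. The tokens only approximate how a title line might be tokenized and are not taken from the actual file. Joining tokens with spaces is also an approximation, because the real `ent.text` keeps the original whitespace.

```python
# Hypothetical tokens; spaCy's real tokenization of the file may differ
tokens = ["The", "Tale", "of", "Peter", "Rabbit", ",", "by", "Beatrix", "Potter"]

# (start, end, label) with an exclusive end, like Span.start / Span.end
ents = [(0, 5, "WORK_OF_ART"), (7, 9, "PERSON")]

for start, end, label in ents:
    print(" ".join(tokens[start:end]), label)
```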


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

```python
len(doc1)  # number of sentences detected in the story
```

```
55
```

**8. CHALLENGE: How many sentences contain named entities?**

```python
# sent.ents is empty for a sentence without entities, so its truthiness is enough
sentences_with_entities = sum(1 for sent in doc1 if sent.ents)

print("Number of sentences containing named entities:", sentences_with_entities)
```

```
Number of sentences containing named entities: 35
```
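
`sent.ents` returns only the entities that fall completely within that sentence. Counting sentences with a non-empty `sent.ents` is therefore the same as counting sentences that fully contain at least one entity span. Below is a small stdlib sketch of that containment test. It uses hypothetical token offsets rather than the real parse, and `contains_entity` is an illustrative helper, not a spaCy function.

```python
# Hypothetical sentence boundaries and entity spans, as (start, end) token offsets
sentences = [(0, 10), (10, 18), (18, 30)]
entities = [(0, 5, "WORK_OF_ART"), (7, 9, "PERSON"), (20, 21, "PERSON")]


def contains_entity(sent, ents):
    """True if at least one entity lies entirely inside the sentence span."""
    s_start, s_end = sent
    return any(s_start <= e_start and e_end <= s_end for e_start, e_end, _ in ents)


count = sum(1 for sent in sentences if contains_entity(sent, entities))
print("Sentences containing named entities:", count)
```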


**9. CHALLENGE: Display the named entity visualization for the first sentence that contains named entities**

```python
# Keep only the sentences that contain at least one named entity
sents_with_ents = [sent for sent in doc1 if sent.ents]

displacy.render(sents_with_ents[0], style='ent', jupyter=True)
```
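
`displacy.render(..., style='ent')` highlights each entity in place, using its character offsets (`ent.start_char`, `ent.end_char`). For a plain-text view, such as outside Jupyter, a hypothetical helper like `mark_entities` below can wrap each span in brackets instead. The sample string and its offsets are written by hand for illustration.

```python
def mark_entities(text, ents):
    """Wrap each (start_char, end_char, label) span as [span](LABEL).

    Assumes the spans do not overlap, as is the case for doc.ents.
    """
    pieces, cursor = [], 0
    for start, end, label in sorted(ents):
        pieces.append(text[cursor:start])               # text before the entity
        pieces.append(f"[{text[start:end]}]({label})")  # the labeled entity
        cursor = end
    pieces.append(text[cursor:])                        # trailing text
    return "".join(pieces)


# Hypothetical example using character offsets
sample = "The Tale of Peter Rabbit, by Beatrix Potter"
sample_ents = [(0, 24, "WORK_OF_ART"), (29, 43, "PERSON")]
print(mark_entities(sample, sample_ents))
# [The Tale of Peter Rabbit](WORK_OF_ART), by [Beatrix Potter](PERSON)
```

With a real sentence span, the triples could be built as `[(e.start_char - sent.start_char, e.end_char - sent.start_char, e.label_) for e in sent.ents]` and passed along with `sent.text`. The subtraction is needed because `start_char` is measured from the start of the Doc, not the sentence.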