# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [None]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [None]:
# Read the story using the path given in the hint, then build the Doc
with open('../TextFiles/peterrabbit.txt', 'r', encoding='utf-8') as f:
    text = f.read()
doc = nlp(text)

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained tag (TAG), and the description of the fine-grained tag.**

In [None]:
# Print text, coarse POS, fine-grained tag, and tag description for each token in the third sentence
for token in list(doc.sents)[2]:
    print(f'{token.text:{10}} {token.pos_:{8}} {token.tag_:{6}} {spacy.explain(token.tag_)}')


You        PRON     PRP    pronoun, personal
may        AUX      MD     verb, modal auxiliary
copy       VERB     VB     verb, base form
it         PRON     PRP    pronoun, personal
,          PUNCT    ,      punctuation mark, comma
give       VERB     VB     verb, base form
it         PRON     PRP    pronoun, personal
away       ADV      RB     adverb
or         CCONJ    CC     conjunction, coordinating
re         VERB     VB     verb, base form
-          VERB     VB     verb, base form
use        VERB     VB     verb, base form
it         PRON     PRP    pronoun, personal
under      ADP      IN     conjunction, subordinating or preposition
the        DET      DT     determiner
terms      NOUN     NNS    noun, plural

          SPACE    _SP    whitespace
of         ADP      IN     conjunction, subordinating or preposition
the        DET      DT     determiner
Project    PROPN    NNP    noun, proper singular
Gutenberg  PROPN    NNP    noun, proper singular
License    PROPN    NNP    noun, proper singular
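
The nested braces in the f-string above, such as `{token.text:{10}}`, set a minimum field width, so each column is padded to line up. A whitespace token whose text is a newline breaks that alignment, which is why the `SPACE` row in the output is split across two lines. A minimal sketch of one way around it, using the `!r` conversion to print the escaped form of the text. The sample rows are hand-copied from the output above, and `format_row` is a hypothetical helper, not part of spaCy:

```python
def format_row(text, pos, tag):
    # !r prints repr(text), so a newline shows up as the two characters \n
    # instead of starting a new line; the nested braces set the column widths.
    return f'{text!r:{10}} {pos:{8}} {tag:{6}}'

# Hand-copied sample rows; in the notebook these would come from the tokens.
rows = [('terms', 'NOUN', 'NNS'), ('\n', 'SPACE', '_SP'), ('of', 'ADP', 'IN')]

for row in rows:
    print(format_row(*row))
```

In the notebook the same conversion would go directly into the loop, for example `{token.text!r:{10}}`, at the cost of quotes around every token.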

**3. Provide a frequency list of POS tags from the entire document**

In [None]:
POS_counts = doc.count_by(spacy.attrs.POS)
POS_counts



{92: 862,
 96: 484,
 85: 541,
 90: 450,
 103: 445,
 87: 182,
 95: 267,
 86: 118,
 89: 212,
 84: 243,
 97: 538,
 100: 511,
 93: 55,
 98: 82,
 94: 102,
 101: 81,
 99: 7,
 91: 2}
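
The keys in this dictionary are spaCy's integer IDs for the coarse POS tags, which are hard to read on their own. In the notebook, `doc.vocab[k].text` turns an ID back into its tag name. The sketch below shows the same idea without loading a model: `readable_counts` and the `SAMPLE_NAMES` table are hypothetical stand-ins, and the ID-to-name pairs are assumed from the symbol table of recent spaCy versions, so check them against your installation.

```python
def readable_counts(counts, id_to_name):
    """Return (tag_name, count) pairs sorted from most to least frequent."""
    named = [(id_to_name(pos_id), n) for pos_id, n in counts.items()]
    return sorted(named, key=lambda pair: pair[1], reverse=True)

# Assumed ID-to-name pairs (hypothetical stand-in for the spaCy lookup).
# In the notebook you could pass: lambda k: doc.vocab[k].text
SAMPLE_NAMES = {92: 'NOUN', 97: 'PUNCT', 100: 'VERB', 91: 'INTJ'}

# A few entries hand-copied from the output above.
sample_counts = {92: 862, 97: 538, 100: 511, 91: 2}

table = readable_counts(sample_counts, SAMPLE_NAMES.get)
for name, n in table:
    print(f'{name:<6} {n:>5}')
```

Sorting by count makes the dominant tags easy to spot, which also helps when checking which ID to use in the next question.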

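**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: the attribute ID for 'NOUN' is 92

In [None]:
total_tokens = len(doc)
# 92 is the POS ID for NOUN; ID 91 has only 2 tokens in the counts above, so it cannot be NOUN
noun_percent = 100 * POS_counts[92] / total_tokens
print(f'{noun_percent:.2f}%')


16.63%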
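
A quick sanity check on this figure, using only the counts printed for question 3: every token has exactly one POS tag, so the counts sum to the token total, and the largest entry (ID 92, assumed here to be NOUN as in recent spaCy versions) gives the noun share. This is a sketch with the dictionary hand-copied from the output above, not part of the assessment solution.

```python
# POS counts hand-copied from the output of doc.count_by(spacy.attrs.POS) above.
pos_counts = {
    92: 862, 96: 484, 85: 541, 90: 450, 103: 445, 87: 182,
    95: 267, 86: 118, 89: 212, 84: 243, 97: 538, 100: 511,
    93: 55, 98: 82, 94: 102, 101: 81, 99: 7, 91: 2,
}

NOUN_ID = 92  # assumption: NOUN's ID in recent spaCy versions

# Each token carries exactly one POS tag, so the counts add up to len(doc).
total = sum(pos_counts.values())
noun_share = 100 * pos_counts[NOUN_ID] / total

print(f'{pos_counts[NOUN_ID]} of {total} tokens = {noun_share:.2f}%')
```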


**5. Display the Dependency Parse for the third sentence**

In [None]:
displacy.render(list(doc.sents)[2], style='dep', jupyter=True, options={'distance': 110})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [None]:
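def show_ents(doc):
    """Print each entity's text, label, and label description."""
    if doc.ents:
        for ent in doc.ents:
            print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))
    else:
        print('No named entities found.')

# Print every entity found in the first two sentences (each sentence is re-parsed on its own)
for sent in list(doc.sents)[:2]:
    show_ents(nlp(sent.text))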





Project Gutenberg eBook - PERSON - People, including fictional
The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
the United States - GPE - Countries, cities, states


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [None]:
num_sentences = len(list(doc.sents))

print(f"Number of sentences in the document: {num_sentences}")

Number of sentences in the document: 186


**8. CHALLENGE: How many sentences contain named entities?**

In [None]:
num = 0

for sent in doc.sents:
    if any(token.ent_type_ for token in sent):
        num += 1

print(f"Number of sentences with named entities: {num}")




Number of sentences with named entities: 128


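**9. CHALLENGE: Display the named entity visualization for `list_of_sents[0]`, the first sentence that contains named entities**

In [None]:
# Collect the sentences that contain at least one entity token, then render the first
list_of_sents = [sent for sent in doc.sents if any(token.ent_type_ for token in sent)]
displacy.render(list_of_sents[0], style='ent', jupyter=True)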