___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___

# Parts of Speech Assessment

For this assessment we'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [39]:
# RUN THIS CELL to perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**<br>
> HINT: Use `with open('../TextFiles/peterrabbit.txt') as f:`

In [10]:
with open('../TextFiles/peterrabbit.txt') as f:
    doc = nlp(f.read())

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [40]:
# Select one sentence and print an aligned row for each of its tokens.
# Indexing is zero-based, so [3] is the fourth span yielded by doc.sents;
# use [2] if your model's segmentation puts the target sentence third.
sent3 = list(doc.sents)[3]
for token in sent3:
    print(f'{token.text:{12}} {token.pos_:{6}} {token.tag_:{6}} {spacy.explain(token.tag_)}')


They         PRON   PRP    pronoun, personal
lived        VERB   VBD    verb, past tense
with         ADP    IN     conjunction, subordinating or preposition
their        DET    PRP$   pronoun, possessive
Mother       PROPN  NNP    noun, proper singular
in           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner
sand         NOUN   NN     noun, singular or mass
-            PUNCT  HYPH   punctuation mark, hyphen
bank         NOUN   NN     noun, singular or mass
,            PUNCT  ,      punctuation mark, comma
underneath   ADP    IN     conjunction, subordinating or preposition
the          DET    DT     determiner
root         NOUN   NN     noun, singular or mass
of           ADP    IN     conjunction, subordinating or preposition
a            DET    DT     determiner

            SPACE  _SP    None
very         ADV    RB     adverb
big          ADJ    JJ     adjective
fir          NOUN   NN     noun, singular or mass
-            PUNCT 
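
The nested braces in `{token.text:{12}}` pass the field width as an expression; a literal width such as `{token.text:12}` pads the same way. The sketch below reproduces the alignment with plain strings in place of spaCy tokens. The helper names `format_row` and `format_row_literal` are hypothetical and exist only for this illustration.

```python
def format_row(text, pos, tag, description):
    """Pad the first three fields to widths 12, 6 and 6 (nested-brace form)."""
    return f'{text:{12}} {pos:{6}} {tag:{6}} {description}'


def format_row_literal(text, pos, tag, description):
    """Same layout, written with literal widths in the format spec."""
    return f'{text:12} {pos:6} {tag:6} {description}'


row = format_row('They', 'PRON', 'PRP', 'pronoun, personal')
print(row)

# Strings are left-aligned by default and are never truncated,
# so a value longer than its width simply pushes later columns right.
long_row = format_row('a-very-long-token', 'NOUN', 'NN', 'noun, singular or mass')
print(long_row)
```

A description of `None` (as in the `SPACE` row above) prints as the text `None`, because the last field has no format spec.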

**3. Provide a frequency list of POS tags from the entire document**

In [31]:
POS_counts=doc.count_by(spacy.attrs.POS)

for k,v in sorted(POS_counts.items()):
    print(f'{k:{5}} {doc.vocab[k].text:{8}} {v}')



   84 ADJ      57
   85 ADP      129
   86 ADV      75
   89 CCONJ    61
   90 DET      118
   92 NOUN     166
   93 NUM      8
   94 PART     34
   95 PRON     78
   96 PROPN    75
   97 PUNCT    173
  100 VERB     185
  103 SPACE    99
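
The list above is ordered by attribute ID. To rank tags by frequency instead, one option is `collections.Counter.most_common`. The sketch below copies the counts from the output above into a plain dictionary keyed by tag name rather than calling spaCy, so the numbers are specific to this run; `tag_counts` is a hypothetical name.

```python
from collections import Counter

# Counts copied from the frequency list above (specific to this run).
tag_counts = Counter({
    'ADJ': 57, 'ADP': 129, 'ADV': 75, 'CCONJ': 61, 'DET': 118,
    'NOUN': 166, 'NUM': 8, 'PART': 34, 'PRON': 78, 'PROPN': 75,
    'PUNCT': 173, 'VERB': 185, 'SPACE': 99,
})

# The three most frequent tags, highest count first.
top_three = tag_counts.most_common(3)
for tag, count in top_three:
    print(f'{tag:8} {count}')

# The counts should add up to the number of tokens in the Doc.
total_tokens = sum(tag_counts.values())
print(total_tokens)
```

With the `POS_counts` dictionary from the cell above, `sorted(POS_counts.items(), key=lambda kv: kv[1], reverse=True)` gives the same ordering while keeping the integer IDs as keys.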


**4. CHALLENGE: What percentage of tokens are nouns?**<br>
HINT: in the frequency list above, the attribute ID for 'NOUN' is 92 (the ID may differ between spaCy versions)

In [36]:
# 92 is the attribute ID that the frequency list above reports for NOUN.
percent = 100*POS_counts[92]/len(doc)

print(f'100*{POS_counts[92]}/{len(doc)}={percent:.4}%')

100*166/1258=13.2%
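
The denominator above is `len(doc)`, which also counts whitespace and punctuation tokens. As a rough illustration of how much that choice matters, the sketch below redoes the arithmetic with the counts printed earlier (166 nouns, 1258 tokens, 99 `SPACE`, 173 `PUNCT`). The variable names are hypothetical, and excluding those two tags is an assumption about what counts as a word, not part of the assessment.

```python
# Counts copied from the outputs above (specific to this run).
noun_count = 166
total_tokens = 1258
space_count = 99
punct_count = 173

# Share of nouns among all tokens, as computed in the cell above.
pct_all = 100 * noun_count / total_tokens

# Share of nouns when whitespace and punctuation tokens are left out.
word_tokens = total_tokens - space_count - punct_count
pct_words = 100 * noun_count / word_tokens

print(f'{pct_all:.4}% of all tokens')
print(f'{pct_words:.4}% of {word_tokens} non-space, non-punctuation tokens')
```

The `.4` format spec keeps four significant digits and drops trailing zeros, which is why the first figure prints as `13.2` rather than `13.20`.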


**5. Display the Dependency Parse for the third sentence**

In [52]:
# Render the dependency parse for the same sentence used in question 2.
sentence = list(doc.sents)[3]
displacy.render(sentence, style='dep', jupyter=True, options={'distance': 50})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [63]:
for ent in doc.ents[:2]:
    print(f'{ent.text} {ent.label_} {spacy.explain(ent.label_)}')


Peter Rabbit PERSON People, including fictional
Beatrix Potter PERSON People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [58]:
len(list(doc.sents))

60

**8. CHALLENGE: How many sentences contain named entities?**

In [65]:
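# Re-parse each sentence as its own Doc, then keep the Docs that contain entities.
list_of_sent = [nlp(sent.text) for sent in doc.sents]
# The loop variable is named sent_doc to avoid shadowing the outer `doc`.
list_of_ner = [sent_doc for sent_doc in list_of_sent if sent_doc.ents]
len(list_of_ner)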



39

**9. CHALLENGE: Display the named entity visualization for `list_of_sent[0]` from the previous problem**

In [67]:
displacy.render(list_of_sent[0],style='ent',jupyter=True)

### Great Job!