___
# POS Project 

### A training exercise for learning how to use spaCy

**Copyright by Pierian Data Inc.**
___

I'll be using the short story [The Tale of Peter Rabbit](https://en.wikipedia.org/wiki/The_Tale_of_Peter_Rabbit) by Beatrix Potter (1902). <br>The story is in the public domain; the text file was obtained from [Project Gutenberg](https://www.gutenberg.org/ebooks/14838.txt.utf-8).

In [1]:
import spacy
nlp = spacy.load('en_core_web_sm')
from spacy import displacy

**1. Create a Doc object from the file `peterrabbit.txt`**

In [2]:
with open('peterrabbit.txt') as f:
    DOC = nlp(f.read())

**2. For every token in the third sentence, print the token text, the POS tag, the fine-grained TAG tag, and the description of the fine-grained tag.**

In [28]:
doc_cents = [sent for sent in DOC.sents]
doc_cents[2]

They lived with their Mother in a sand-bank, underneath the root of a
very big fir-tree.


In [4]:
for token in doc_cents[2]:
    print(f'{token.text:{15}} {token.pos_:{5}} {token.tag_:{10}} {spacy.explain(token.tag_)}')

They            PRON  PRP        pronoun, personal
lived           VERB  VBD        verb, past tense
with            ADP   IN         conjunction, subordinating or preposition
their           PRON  PRP$       pronoun, possessive
Mother          NOUN  NN         noun, singular or mass
in              ADP   IN         conjunction, subordinating or preposition
a               DET   DT         determiner
sand            NOUN  NN         noun, singular or mass
-               PUNCT HYPH       punctuation mark, hyphen
bank            NOUN  NN         noun, singular or mass
,               PUNCT ,          punctuation mark, comma
underneath      ADP   IN         conjunction, subordinating or preposition
the             DET   DT         determiner
root            NOUN  NN         noun, singular or mass
of              ADP   IN         conjunction, subordinating or preposition
a               DET   DT         determiner

               SPACE _SP        whitespace
very            ADV   RB       
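
The nested braces in the f-string above (`{token.text:{15}}`) set a minimum column width, which is what lines the columns up. Here is a small sketch of the same formatting without spaCy, using a few rows copied from the output above as plain tuples. The names `rows` and `lines` are introduced here for illustration only.

```python
# Rows copied by hand from the output above: (text, POS, tag, description)
rows = [
    ('They', 'PRON', 'PRP', 'pronoun, personal'),
    ('lived', 'VERB', 'VBD', 'verb, past tense'),
    ('their', 'PRON', 'PRP$', 'pronoun, possessive'),
]

# {text:{15}} pads the text to at least 15 characters; strings are left-aligned by default
lines = [f'{text:{15}} {pos:{5}} {tag:{10}} {description}'
         for text, pos, tag, description in rows]

for line in lines:
    print(line)
```

Writing `{text:>15}` instead would right-align the column.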

**3. Provide a frequency list of POS tags from the entire document**

In [5]:
# token.pos is the integer ID of the coarse POS tag; token.pos_ is its string label
for token in doc_cents[2]:
    print(token.pos)

95
100
85
95
92
85
90
92
97
92
97
85
90
92
85
90
103
86
84
92
97
92
97
103


In [8]:
DOC.count_by(spacy.attrs.POS)

{90: 90,
 96: 74,
 85: 125,
 97: 171,
 93: 9,
 103: 99,
 86: 63,
 98: 19,
 92: 172,
 95: 110,
 100: 135,
 84: 53,
 89: 61,
 87: 49,
 94: 28}

In [10]:
counts = DOC.count_by(spacy.attrs.POS)
counts.items()

dict_items([(90, 90), (96, 74), (85, 125), (97, 171), (93, 9), (103, 99), (86, 63), (98, 19), (92, 172), (95, 110), (100, 135), (84, 53), (89, 61), (87, 49), (94, 28)])

In [14]:
counts = DOC.count_by(spacy.attrs.POS)

# Sort by integer POS ID and look up each ID's string label in the vocab
for key, value in sorted(counts.items()):
    print(f'{key}.\t{DOC.vocab[key].text}\t: {value}')

84.	ADJ	: 53
85.	ADP	: 125
86.	ADV	: 63
87.	AUX	: 49
89.	CCONJ	: 61
90.	DET	: 90
92.	NOUN	: 172
93.	NUM	: 9
94.	PART	: 28
95.	PRON	: 110
96.	PROPN	: 74
97.	PUNCT	: 171
98.	SCONJ	: 19
100.	VERB	: 135
103.	SPACE	: 99
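
The list above is ordered by POS ID. A frequency list is often easier to read when ordered by count. The sketch below is one way to do that with the standard library only. The names `pos_names` and `by_frequency` are introduced here, and both dictionaries are copied by hand from the outputs above rather than computed from `DOC`.

```python
# Counts keyed by integer POS ID, copied from the count_by output above
counts = {90: 90, 96: 74, 85: 125, 97: 171, 93: 9, 103: 99, 86: 63, 98: 19,
          92: 172, 95: 110, 100: 135, 84: 53, 89: 61, 87: 49, 94: 28}

# ID-to-label mapping, copied from the printed list above (illustrative only;
# in the notebook, DOC.vocab[key].text supplies these labels)
pos_names = {84: 'ADJ', 85: 'ADP', 86: 'ADV', 87: 'AUX', 89: 'CCONJ',
             90: 'DET', 92: 'NOUN', 93: 'NUM', 94: 'PART', 95: 'PRON',
             96: 'PROPN', 97: 'PUNCT', 98: 'SCONJ', 100: 'VERB', 103: 'SPACE'}

# Sort the (id, count) pairs by count, most frequent tag first
by_frequency = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for key, value in by_frequency:
    print(f'{pos_names[key]:{6}} {value}')
```

With these counts, NOUN (172) and PUNCT (171) come first and NUM (9) comes last.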


**4. What percentage of tokens are nouns?**

In [17]:
counts.values()

dict_values([90, 74, 125, 171, 9, 99, 63, 19, 172, 110, 135, 53, 61, 49, 28])

In [21]:
counts[92]

172

In [25]:
Noun = round(counts[92]/sum(value for value in counts.values()),3)
Noun


0.137

In [26]:
total = sum(value for value in counts.values())
# Noun holds a fraction (0.137), so format it as a percentage
print(f'{counts[92]}/{total} = {Noun:.1%}')

172/1258 = 13.7%
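
The noun share can be checked by hand from the counts above: 172 of 1258 tokens is a fraction of about 0.137, which is 13.7%. Here is a minimal standard-library sketch of that arithmetic, with the counts copied from the `count_by` output. The names `noun_count` and `noun_share` are introduced here for illustration.

```python
# Counts keyed by integer POS ID, copied from the count_by output above
counts = {90: 90, 96: 74, 85: 125, 97: 171, 93: 9, 103: 99, 86: 63, 98: 19,
          92: 172, 95: 110, 100: 135, 84: 53, 89: 61, 87: 49, 94: 28}

noun_count = counts[92]            # 92 is the ID printed next to NOUN above
total = sum(counts.values())       # every token, including SPACE and PUNCT
noun_share = noun_count / total    # a fraction between 0 and 1

# .3f shows the raw fraction; .1% multiplies by 100 and appends the percent sign
print(f'{noun_count}/{total} = {noun_share:.3f} = {noun_share:.1%}')
# prints: 172/1258 = 0.137 = 13.7%
```

The total includes whitespace and punctuation tokens, so the share among word tokens alone would be higher.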


**5. Display the Dependency Parse for the third sentence**

In [32]:
displacy.render(doc_cents[2],style='dep',jupyter=True,options={'distance':100})

**6. Show the first two named entities from Beatrix Potter's *The Tale of Peter Rabbit***

In [41]:
ent_coll = [ent for ent in DOC.ents]

In [50]:
for ent in ent_coll[:2]:
    print(ent.text+' - ' + ent.label_+' - ' + str(spacy.explain(ent.label_)))

The Tale of Peter Rabbit - WORK_OF_ART - Titles of books, songs, etc.
Beatrix Potter - PERSON - People, including fictional


**7. How many sentences are contained in *The Tale of Peter Rabbit*?**

In [45]:
sum(1 for token in DOC if token.is_sent_start)

55

**8. How many sentences contain named entities?**

In [48]:
doc_cents

[The Tale of Peter Rabbit, by Beatrix Potter (1902).
 ,
 Once upon a time there were four little Rabbits, and their names
 were--
 
           Flopsy,
        Mopsy,
    Cotton-tail,
 and Peter.
 ,
 They lived with their Mother in a sand-bank, underneath the root of a
 very big fir-tree.
 ,
 'Now my dears,' said old Mrs. Rabbit one morning, 'you may go into
 the fields or down the lane, but don't go into Mr. McGregor's garden:
 your Father had an accident there; he was put in a pie by Mrs.
 McGregor.'
 
 'Now run along, and don't get into mischief.,
 I am going out.'
 ,
 Then old Mrs. Rabbit took a basket and her umbrella, and went through
 the wood to the baker's.,
 She bought a loaf of brown bread and five
 currant buns.
 ,
 Flopsy, Mopsy, and Cottontail, who were good little bunnies, went
 down the lane to gather blackberries:
 
 But Peter, who was very naughty, ran straight away to Mr. McGregor's
 garden, and squeezed under the gate!
 ,
 First he ate some lettuces and some French b

In [46]:
sum(1 for sent in doc_cents if sent.ents)

35

**9. Display the named entity visualization for the first sentence, `doc_cents[0]`**

In [49]:
displacy.render(doc_cents[0], style='ent', jupyter=True)

### Fin!