# POS-Tagging and Its Applications

- Part-of-speech (POS) tagging
- The process of tagging each word in a textual input with its appropriate part of speech

Examples

- Noun - A word naming a person, place, thing, or idea
- Verb - A word expressing an action or a state of being
- Adjective - A word that modifies or describes a noun or a pronoun
- Adverb - A word that modifies or describes a verb, adjective, or another adverb
- Pronoun - A word used in place of a noun
- Preposition - A word placed before a noun or pronoun to form a phrase modifying another word in the sentence
- Conjunction - A word that joins words, phrases, or clauses
- Interjection - A word used to express emotion
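spaCy's coarse `.pos_` labels (used below) come from the Universal Dependencies (UPOS) tag set. As a rough sketch, the eight traditional categories above correspond to the following UPOS tags; UPOS splits some of them further and also adds tags with no traditional counterpart here, such as `DET`, `PART`, and `PUNCT`. The helper name is hypothetical.

```python
# Traditional parts of speech mapped to Universal POS (UPOS) tags.
# Some traditional categories split into several UPOS tags.
TRADITIONAL_TO_UPOS = {
    "Noun": ["NOUN", "PROPN"],          # common vs. proper nouns
    "Verb": ["VERB", "AUX"],            # main verbs vs. auxiliaries
    "Adjective": ["ADJ"],
    "Adverb": ["ADV"],
    "Pronoun": ["PRON"],
    "Preposition": ["ADP"],             # adpositions: prepositions and postpositions
    "Conjunction": ["CCONJ", "SCONJ"],  # coordinating vs. subordinating
    "Interjection": ["INTJ"],
}

def traditional_category(upos):
    """Hypothetical helper: return the traditional category of a UPOS tag, or None."""
    for category, tags in TRADITIONAL_TO_UPOS.items():
        if upos in tags:
            return category
    return None

print(traditional_category("PROPN"))  # → Noun
print(traditional_category("DET"))    # → None (no slot in the list above)
```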


Even within English, POS tagging isn't always straightforward: the same word can take different POS tags depending on its context. A simple example is the word *refuse*: used as a verb, it means to decline an offer; used as a noun, it refers to something you throw away, i.e. rubbish.
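One way to see why context matters is to look at the tag of the preceding word. The toy rule below is a hypothetical, hand-written illustration (not how NLTK or spaCy decide), using Penn Treebank tags for the two readings of *refuse*:

```python
# Hypothetical hand-written rule: choose between the verb and noun readings of
# an ambiguous word such as "refuse" from the Penn Treebank tag of the previous word.
VERB_CONTEXT = {"MD", "TO"}          # "would refuse", "to refuse"
NOUN_CONTEXT = {"DT", "JJ", "PRP$"}  # "the refuse", "smelly refuse", "their refuse"

def disambiguate(prev_tag, default="NN"):
    """Return 'VB' or 'NN' for a word that can be either a noun or a verb."""
    if prev_tag in VERB_CONTEXT:
        return "VB"
    if prev_tag in NOUN_CONTEXT:
        return "NN"
    return default  # no useful context: fall back to a default guess

print(disambiguate("MD"))  # "he would refuse"     → VB
print(disambiguate("DT"))  # "the refuse center"   → NN
```

Real taggers learn such context preferences from annotated data instead of hard-coding them.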


In spaCy, the `.pos_` attribute gives a coarse part-of-speech tag, and for a more detailed analysis the `.tag_` attribute adds fine-grained information on top of it. The following table gives the breakdown of the categories spaCy uses to annotate its words.

![](https://i.imgur.com/hXfl5Sa.png)
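The two attributes are related: each fine-grained `.tag_` (Penn Treebank style) corresponds to one coarse `.pos_`. The dictionary below is a partial sketch built only from the `(tag_, pos_)` pairs printed later in this section; it is not spaCy's own table, and newer models assign some tags differently (for example, auxiliaries such as *was* as `AUX`). In spaCy itself, `spacy.explain('NNP')` returns a short human-readable description of a tag.

```python
# Partial fine-grained -> coarse mapping, collected from the spaCy outputs
# shown in this section (not spaCy's complete or current mapping).
TAG_TO_POS = {
    "NN": "NOUN", "NNP": "PROPN", "PRP": "PRON",
    "VB": "VERB", "VBD": "VERB", "VBN": "VERB", "MD": "VERB",
    "JJ": "ADJ", "RB": "ADV", "IN": "ADP", "DT": "DET",
    "CC": "CCONJ", "TO": "PART", "RP": "PART",
    ".": "PUNCT", ",": "PUNCT",
}

def coarse(tag):
    """Map a fine-grained tag to its coarse POS, or 'X' if it is not in the table."""
    return TAG_TO_POS.get(tag, "X")

fine = ["NNP", "CC", "PRP", "VBD", "IN", "DT", "NN", "."]  # .tag_ values of sent_0
print([coarse(t) for t in fine])
# → ['PROPN', 'CCONJ', 'PRON', 'VERB', 'ADP', 'DET', 'NOUN', 'PUNCT']
```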



### Why POS tagging matters

- Speech-to-text conversion and language translation
- Dependency parsing
    - the process of identifying dependencies, or relationships, between words in a sentence or phrase

## POS tagging in Python

NLTK's straightforward API, well suited to experimenting and sandboxing, is what usually makes it an attractive choice for beginners. To get the tags for a sentence, all we have to run is this:

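```python
import nltk
# one-time model downloads (newer NLTK releases name them 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
text = nltk.word_tokenize("And now for something completely different")
nltk.pos_tag(text)
```

Output:

```
[('And', 'CC'),
 ('now', 'RB'),
 ('for', 'IN'),
 ('something', 'NN'),
 ('completely', 'RB'),
 ('different', 'JJ')]
```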
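`nltk.pos_tag` returns a plain list of `(word, tag)` tuples using Penn Treebank tags, so ordinary Python is enough to post-process it. Related tags share a prefix (`NN`, `NNS`, `NNP` are all nouns; `VB`, `VBD`, `VBZ` are all verbs), which makes prefix matching a convenient way to pull out a word class. A minimal sketch, with a hypothetical helper and the output above copied in as data:

```python
# Output of nltk.pos_tag from the example above, copied as plain data.
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'),
          ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

def words_with_tag_prefix(tagged, prefix):
    """Hypothetical helper: return the words whose Penn Treebank tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

print(words_with_tag_prefix(tagged, "NN"))  # nouns   → ['something']
print(words_with_tag_prefix(tagged, "RB"))  # adverbs → ['now', 'completely']
```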

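- To use a particular tagger, construct it from tagged training data: `train_sents` holds the training sentences (lists of `(word, tag)` pairs) you wish to use to train the bigram tagger. On its own, a bigram tagger assigns `None` to any word whose context it never saw during training.

```python
from nltk.corpus import treebank  # example training data; requires nltk.download('treebank')
train_sents = treebank.tagged_sents()[:3000]
bigram_tagger = nltk.BigramTagger(train_sents)
bigram_tagger.tag(text)
```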
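Conceptually, a bigram tagger picks each word's tag from counts of (previous tag, word) pairs seen in training, and NLTK lets it fall back to a simpler tagger through its `backoff` parameter, e.g. `nltk.BigramTagger(train_sents, backoff=nltk.UnigramTagger(train_sents))`. The pure-Python class below is a simplified sketch of that idea with a unigram and default-tag fallback built in; it is not NLTK's implementation, and the class name and tiny training set are made up for illustration.

```python
from collections import Counter, defaultdict

class ToyBigramTagger:
    """Hypothetical simplified bigram tagger: picks the most frequent tag for
    (previous tag, word), falling back to the word's most frequent tag overall,
    then to a default tag for unknown words."""

    def __init__(self, train_sents, default="NN"):
        self.bigram = defaultdict(Counter)   # (prev_tag, word) -> tag counts
        self.unigram = defaultdict(Counter)  # word -> tag counts
        self.default = default
        for sent in train_sents:
            prev = "<S>"  # sentence-start marker
            for word, tag in sent:
                self.bigram[(prev, word)][tag] += 1
                self.unigram[word][tag] += 1
                prev = tag

    def tag(self, words):
        tagged, prev = [], "<S>"
        for word in words:
            if (prev, word) in self.bigram:      # context seen in training
                tag = self.bigram[(prev, word)].most_common(1)[0][0]
            elif word in self.unigram:           # back off to the word alone
                tag = self.unigram[word].most_common(1)[0][0]
            else:                                # unknown word
                tag = self.default
            tagged.append((word, tag))
            prev = tag
        return tagged

# made-up training data showing both readings of "refuse"
train_sents = [
    [("he", "PRP"), ("would", "MD"), ("refuse", "VB")],
    [("the", "DT"), ("refuse", "NN"), ("smells", "VBZ")],
]
tagger = ToyBigramTagger(train_sents)
print(tagger.tag(["the", "refuse"]))    # → [('the', 'DT'), ('refuse', 'NN')]
print(tagger.tag(["would", "refuse"]))  # → [('would', 'MD'), ('refuse', 'VB')]
```

The previous tag is what lets the same word *refuse* receive different tags in different contexts.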

## POS tagging with spaCy

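```python
# set up the model (install it first with: python -m spacy download en_core_web_sm)
import spacy

nlp = spacy.load('en_core_web_sm')  # the old 'en' shortcut no longer works in spaCy v3
```

The outputs below were produced with an older English model, so newer models may assign some tags differently.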

- Sentences we would like to POS-tag:

```python
sent_0 = nlp('Mathieu and I went to the park.')
sent_1 = nlp('If Clement was asked to take out the garbage, he would refuse.')
sent_2 = nlp('Baptiste was in charge of the refuse treatment center.')
sent_3 = nlp('Marie took out her rather suspicious and fishy cat to go fish for fish.')
```

```python
# sent_0
for token in sent_0:
    print((token.text, token.pos_, token.tag_))
```

```
('Mathieu', 'PROPN', 'NNP')
('and', 'CCONJ', 'CC')
('I', 'PRON', 'PRP')
('went', 'VERB', 'VBD')
('to', 'ADP', 'IN')
('the', 'DET', 'DT')
('park', 'NOUN', 'NN')
('.', 'PUNCT', '.')
```


### Results for `sent_0`
- `Mathieu` is a name and is correctly tagged as a proper noun (`PROPN`)
- `went` is a verb (`VERB`, past tense `VBD`)
- `park` is a noun (`NOUN`)

```python
# sent_1
for token in sent_1:
    print((token.text, token.pos_, token.tag_))
```

```
('If', 'ADP', 'IN')
('Clement', 'PROPN', 'NNP')
('was', 'VERB', 'VBD')
('asked', 'VERB', 'VBN')
('to', 'PART', 'TO')
('take', 'VERB', 'VB')
('out', 'PART', 'RP')
('the', 'DET', 'DT')
('garbage', 'NOUN', 'NN')
(',', 'PUNCT', ',')
('he', 'PRON', 'PRP')
('would', 'VERB', 'MD')
('refuse', 'VERB', 'VB')
('.', 'PUNCT', '.')
```


- `refuse` is correctly tagged as a verb (`VB`), here meaning to decline
- `garbage` is a noun (`NOUN`)

```python
# sent_2
for token in sent_2:
    print((token.text, token.pos_, token.tag_))
```

```
('Baptiste', 'PROPN', 'NNP')
('was', 'VERB', 'VBD')
('in', 'ADP', 'IN')
('charge', 'NOUN', 'NN')
('of', 'ADP', 'IN')
('the', 'DET', 'DT')
('refuse', 'ADJ', 'JJ')
('treatment', 'NOUN', 'NN')
('center', 'NOUN', 'NN')
('.', 'PUNCT', '.')
```


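- `refuse` is incorrectly tagged as an adjective (`ADJ`/`JJ`); here it is a noun meaning rubbish
- *refuse treatment center* is the thing `Baptiste` is in charge of
- *refuse*, *treatment*, and *center* are all nouns here, and together they form a *noun phrase*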
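spaCy exposes noun phrases through `Doc.noun_chunks`, which relies on the dependency parse. As a simpler illustration of how POS tags alone can group words into noun phrases, here is a hypothetical tag-pattern chunker (optional determiner, any adjectives, then one or more nouns), run on the coarse tags printed for `sent_2`:

```python
# Hypothetical tag-pattern chunker (not spaCy's noun_chunks algorithm):
# a chunk is an optional DET, any number of ADJ, then one or more NOUN/PROPN.
NOUN_TAGS = {"NOUN", "PROPN"}

def noun_chunks(tagged):
    """Return noun-phrase strings from a list of (word, coarse POS) pairs."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":                   # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":      # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:  # one or more nouns
            k += 1
        if k > j:                                   # pattern matched: emit the chunk
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                                       # no noun here: advance one token
            i += 1
    return chunks

# (token.text, token.pos_) pairs copied from the sent_2 output above
sent_2_tags = [("Baptiste", "PROPN"), ("was", "VERB"), ("in", "ADP"),
               ("charge", "NOUN"), ("of", "ADP"), ("the", "DET"),
               ("refuse", "ADJ"), ("treatment", "NOUN"), ("center", "NOUN"),
               (".", "PUNCT")]
print(noun_chunks(sent_2_tags))
# → ['Baptiste', 'charge', 'the refuse treatment center']
```

Because the pattern accepts adjectives before the nouns, it still recovers *the refuse treatment center* despite the mis-tagging of *refuse*.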

```python
# sent_3
for token in sent_3:
    print((token.text, token.pos_, token.tag_))
```

```
('Marie', 'PROPN', 'NNP')
('took', 'VERB', 'VBD')
('out', 'PART', 'RP')
('her', 'PRON', 'PRP')
('rather', 'ADV', 'RB')
('suspicious', 'ADJ', 'JJ')
('and', 'CCONJ', 'CC')
('fishy', 'ADJ', 'JJ')
('cat', 'NOUN', 'NN')
('to', 'PART', 'TO')
('go', 'VERB', 'VB')
('fish', 'NOUN', 'NN')
('for', 'ADP', 'IN')
('fish', 'NOUN', 'NN')
('.', 'PUNCT', '.')
```


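- This sentence is designed to fool the tagger with different uses of the word *fish*: *fishy* is correctly tagged as an adjective and the final *fish* as a noun, but the *fish* in *go fish* is a verb and is incorrectly tagged as a noun (`NN`)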
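Comparing tags across sentences is a quick way to spot words whose tag depends on context. A minimal sketch (the helper name is hypothetical), fed with the `(token.text, token.tag_)` pairs copied from the `sent_1` and `sent_2` outputs above:

```python
from collections import defaultdict

def ambiguous_words(tagged_sents):
    """Hypothetical helper: collect the tags of each lowercased word and keep
    only the words that received more than one distinct tag."""
    seen = defaultdict(set)
    for sent in tagged_sents:
        for word, tag in sent:
            seen[word.lower()].add(tag)
    return {word: tags for word, tags in seen.items() if len(tags) > 1}

# (token.text, token.tag_) pairs copied from the sent_1 and sent_2 outputs above
sent_1_tags = [("If", "IN"), ("Clement", "NNP"), ("was", "VBD"), ("asked", "VBN"),
               ("to", "TO"), ("take", "VB"), ("out", "RP"), ("the", "DT"),
               ("garbage", "NN"), (",", ","), ("he", "PRP"), ("would", "MD"),
               ("refuse", "VB"), (".", ".")]
sent_2_tags = [("Baptiste", "NNP"), ("was", "VBD"), ("in", "IN"), ("charge", "NN"),
               ("of", "IN"), ("the", "DT"), ("refuse", "JJ"), ("treatment", "NN"),
               ("center", "NN"), (".", ".")]
print(ambiguous_words([sent_1_tags, sent_2_tags]))  # only 'refuse' (VB vs. JJ)
```

This check only surfaces variation the tagger itself produced: *fish* in `sent_3` would not be flagged, because both occurrences were tagged `NN`.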


## Training our own POS-taggers


![](https://i.imgur.com/RsgNAkz.png)


A simple training loop looks like this (this one trains spaCy's named-entity recognizer on two sentences annotated with `ORG` entities):

```python
import random
import spacy

TRAIN_DATA = [
     ("Facebook has been accused of leaking personal data of users.", {'entities': [(0, 8, 'ORG')]}),
     ("Tinder uses sophisticated algorithms to find the perfect match.", {'entities': [(0, 6, 'ORG')]})]

# spaCy v2-style API (the same one used in the tagger example below)
nlp = spacy.blank('en')
ner = nlp.create_pipe('ner')  # a blank model has no pipeline components yet
nlp.add_pipe(ner)
ner.add_label('ORG')
optimizer = nlp.begin_training()
for i in range(20):
    random.shuffle(TRAIN_DATA)
    for text, annotations in TRAIN_DATA:
        nlp.update([text], [annotations], sgd=optimizer)
nlp.to_disk('./model')
```
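Each entity annotation is a `(start, end, label)` triple of character offsets into the text, with `end` exclusive, so `(0, 8, 'ORG')` covers `text[0:8]`, i.e. *Facebook*. Offsets written by hand are easy to get wrong, so a quick check like the following hypothetical helper (plain Python, no spaCy needed) can be worth running before training:

```python
TRAIN_DATA = [
    ("Facebook has been accused of leaking personal data of users.", {'entities': [(0, 8, 'ORG')]}),
    ("Tinder uses sophisticated algorithms to find the perfect match.", {'entities': [(0, 6, 'ORG')]}),
]

def entity_spans(train_data):
    """Hypothetical helper: return (span text, label) for every annotated entity,
    slicing the text with end-exclusive character offsets."""
    spans = []
    for text, annotations in train_data:
        for start, end, label in annotations['entities']:
            spans.append((text[start:end], label))
    return spans

print(entity_spans(TRAIN_DATA))
# → [('Facebook', 'ORG'), ('Tinder', 'ORG')]
```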


Training a POS-tagger works the same way in principle. We will use the example script `train_tagger.py` [18] from spaCy's GitHub repository, which walks through how to do this.

```python
import plac
import random
from pathlib import Path
import spacy


TAG_MAP = {
    'N': {'pos': 'NOUN'},
    'V': {'pos': 'VERB'},
    'J': {'pos': 'ADJ'}
}
```

- Set up the basic imports.
- Initialize the `TAG_MAP` dict.
- We need to define a mapping from the data's part-of-speech tag names to the **universal part-of-speech tag** set.
- In this example we only intend to train nouns, verbs, and adjectives, so we include just these in our tag map.

```python
# a small set; more data results in a better model
# this just gives an idea of what the training data should look like
TRAIN_DATA = [
    ("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}),
    ("Eat blue ham", {'tags': ['V', 'J', 'N']})
]
```
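Each `tags` list must line up one-to-one with the tokens of its text, and every tag must have an entry in `TAG_MAP`, since those are the labels added to the tagger. The hypothetical check below approximates tokenization with `str.split()`, which is only adequate for simple, punctuation-free examples like these; spaCy's tokenizer splits punctuation and contractions differently.

```python
TAG_MAP = {'N': {'pos': 'NOUN'}, 'V': {'pos': 'VERB'}, 'J': {'pos': 'ADJ'}}
TRAIN_DATA = [
    ("I like green eggs", {'tags': ['N', 'V', 'J', 'N']}),
    ("Eat blue ham", {'tags': ['V', 'J', 'N']}),
]

def check_tag_data(train_data, tag_map):
    """Hypothetical helper: report length mismatches (whitespace tokens vs. tags)
    and tags that are missing from tag_map."""
    problems = []
    for text, annotations in train_data:
        words, tags = text.split(), annotations['tags']
        if len(words) != len(tags):
            problems.append(f"{text!r}: {len(words)} words but {len(tags)} tags")
        for tag in tags:
            if tag not in tag_map:
                problems.append(f"{text!r}: unknown tag {tag!r}")
    return problems

print(check_tag_data(TRAIN_DATA, TAG_MAP))  # → [] (no problems found)
```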

The `@plac.annotations` decorator describes the command-line options for the language, output directory, and number of training iterations.


```python
@plac.annotations(
    lang=("ISO Code of language to use", "option", "l", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int))
def main(lang='en', output_dir=None, n_iter=25):
    """Create a new model, set up the pipeline and train the tagger. In order to
    train the tagger with a custom tag map, we're creating a new Language
    instance with a custom vocab.
    """
    nlp = spacy.blank(lang)
    # add the tagger to the pipeline (spaCy v2 API)
    # nlp.create_pipe works for built-ins that are registered with spaCy
    tagger = nlp.create_pipe('tagger')
    # Add the tags. This needs to be done before you start training.
    for tag, values in TAG_MAP.items():
        tagger.add_label(tag, values)
    nlp.add_pipe(tagger)

    optimizer = nlp.begin_training()
    for i in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in TRAIN_DATA:
            nlp.update([text], [annotations], sgd=optimizer, losses=losses)
        print(losses)

    # test the trained model
    test_text = "I like blue eggs"
    doc = nlp(test_text)
    print('Tags', [(t.text, t.tag_, t.pos_) for t in doc])

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc = nlp2(test_text)
        print('Tags', [(t.text, t.tag_, t.pos_) for t in doc])
```

```python
if __name__ == '__main__':
    plac.call(main)
```

Run inside a Jupyter notebook, this cell fails with:

```
ipykernel_launcher.py: error: unrecognized arguments: -f
SystemExit: 2
```

The notebook kernel is started with its own `-f <connection file>` argument, which plac then tries to parse as one of the script's arguments. Inside a notebook, call `main()` directly (for example `main(n_iter=25)`); the `plac.call(main)` entry point is meant for running the code as a script from the command line.
