# POS tagging and Chunking

To help a machine understand a sentence, we first tell it what role each word plays.
Two techniques do that: **P**art **O**f **S**peech (POS) tagging and **chunking**.

- [More info here](https://medium.com/greyatom/learning-pos-tagging-chunking-in-nlp-85f7f811a8cb)


## What is Part of Speech?

"The part of speech explains how a word is used in a sentence. There are 8 main POS tags: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections."

## How to do that?

Many tools can perform this task, and spaCy (again...) does it quite well. When you call its `nlp` object on a text, it runs a complete processing pipeline that includes POS tagging.

#### Let's practice: can you find the POS tag for each word using spaCy?

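```python
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, ...)
nlp = spacy.load("en_core_web_sm")

text = "I am a junior data scientist at Becode and my ultimate dream is to become a famous NLP engineer"

# Run the full pipeline on the text
doc = nlp(text)

for token in doc:
    pos = ...  # TO COMPLETE: the POS tag of the token
    print(token, "--", pos)
```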
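
One possible way to complete the loop above, as a minimal sketch: each spaCy `Token` exposes its coarse-grained tag as `token.pos_` and its fine-grained tag as `token.tag_`, and `spacy.explain` returns a short description of a tag. The helper names `pos_rows` and `demo` are made up for this illustration, and the sketch assumes the `en_core_web_sm` model is installed.

```python
def pos_rows(doc):
    """Return (text, coarse POS, fine-grained tag) for each token of a spaCy Doc."""
    return [(token.text, token.pos_, token.tag_) for token in doc]


def demo():
    # Imported here so the helper above can be read and reused on its own.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumption: the model is installed
    doc = nlp("I am a junior data scientist at Becode")
    for text, pos, tag in pos_rows(doc):
        # spacy.explain gives a human-readable description of a tag (or None)
        print(text, "--", pos, "--", tag, "--", spacy.explain(tag))
```

Calling `demo()` in a notebook cell prints one line per token. Try the exercise yourself before peeking at `pos_rows`.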

## What is chunking?

"Chunking is a process of extracting phrases from unstructured text. Instead of just simple tokens which may not represent the actual meaning of the text, its advisable to use phrases such as “South Africa” as a single word instead of ‘South’ and ‘Africa’ separate words."

## How to do that?

Well, every library has its own way of doing it. Let's see how spaCy does it with [displaCy, its visualization tool](https://spacy.io/usage/visualizers):

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
text = """In ancient Rome, some neighbors live in three adjacent houses. In the center is the house of Senex, who lives there with wife Domina, son Hero, and several slaves, including head slave Hysterium and the musical's main character Pseudolus."""

# Preprocess the text
doc = nlp(text)
# Create a list of sentences
sentence_spans = list(doc.sents)
# Display the spaCy visualizer for each sentence
displacy.render(sentence_spans, style="dep")
```

Now, find out how spaCy chunks the text.

```python
# Print the text's chunks using the Doc object
```
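
As a hedged sketch of one approach: spaCy exposes base noun phrases through `Doc.noun_chunks`, where each chunk is a `Span` whose `root` token is the head word of the phrase. This covers noun phrases only and needs a pipeline with a dependency parser, which `en_core_web_sm` includes. The helper names `noun_chunk_rows` and `demo_chunks` are hypothetical, and the sketch assumes the model is installed.

```python
def noun_chunk_rows(doc):
    """Return (chunk text, head word, head dependency label) for each noun chunk."""
    return [
        (chunk.text, chunk.root.text, chunk.root.dep_)
        for chunk in doc.noun_chunks
    ]


def demo_chunks():
    # Imported here so the helper above can be read and reused on its own.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumption: the model is installed
    doc = nlp("In ancient Rome, some neighbors live in three adjacent houses.")
    for text, head, dep in noun_chunk_rows(doc):
        print(text, "| head:", head, "| dep:", dep)
```

Calling `demo_chunks()` prints one line per noun phrase, so multi-word units are handled as a single chunk rather than as separate tokens.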

## Additional resources
* [Learning POS tagging & chunking in NLP](https://medium.com/greyatom/learning-pos-tagging-chunking-in-nlp-85f7f811a8cb)
* [spaCy API](https://spacy.io/api)