Part-of-Speech Tagging
---
Group Name: Destiny's Child



![speech](images/speech.jpg "First slide")


- Miguel Romero Calvo
- Jenny Kong
- Louise Lai


Outline
---

- What is POS tagging?
    - Old approaches
- Machine Learning Solution
    - Grid search
    - Hyperparameter tuning
- Experimental approach with Deep Learning

## What is Part-Of-Speech tagging?

![POS tagging illustrated](images/posstaggingExample.jpeg "Parts Of Speech Tagging")


Examples of part-of-speech tags 
---
|  Tag  | Description|
|-------| -----------|
|CC	|Coordinating conjunction|
|CD	|Cardinal number|
|DT	|Determiner|
|EX	|Existential there|
|FW	|Foreign word|
|IN	|Preposition or subordinating conjunction|
|JJ	|Adjective|
|JJR|	Adjective, comparative|
|JJS|	Adjective, superlative|
|LS	|List item marker|

## Why is Part-Of-Speech tagging hard?

"They refuse to grant us a refuse permit."
![Refuse](images/refuseBoth.png "Refuse")

How do we know which tag to assign to each "refuse"?
- refUSE (/rəˈfyo͞oz/) is a verb meaning “deny” 
- REFuse (/ˈrefˌyo͞os/) is a noun meaning “trash”



## Why is Part-Of-Speech tagging hard?
![Time Flies](images/timeflies.png "Syntactic Ambiguity")

Ambiguity and context dependence

## Old Approaches

**Rule-based tagging**: <br>

"If word X is preceded by a determiner and followed by a noun, tag it as an adjective." <br>

**Stochastic tagging**:

"The best tag for a given word is determined by the probability that it occurs with the n previous tags."

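Both ideas can be sketched in a few lines. This is only an illustration, not a historical tagger: the toy corpus, the tag names, and the helpers `rule_based_tag`, `train_counts` and `stochastic_tag` are assumptions made up for this example, and the stochastic part looks at just one previous tag (n = 1).

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) sentences, invented for illustration.
TOY_CORPUS = [
    [("they", "PRON"), ("refuse", "VERB"), ("to", "PART"), ("go", "VERB")],
    [("a", "DET"), ("refuse", "NOUN"), ("permit", "NOUN")],
    [("we", "PRON"), ("refuse", "VERB"), ("a", "DET"), ("permit", "NOUN")],
]


def rule_based_tag(prev_tag, next_tag, default="NOUN"):
    """Hand-written rule: between a determiner and a noun, guess adjective."""
    if prev_tag == "DET" and next_tag == "NOUN":
        return "ADJ"
    return default


def train_counts(corpus):
    """Count tag-to-tag transitions and word-to-tag emissions."""
    transitions = defaultdict(Counter)  # previous tag -> Counter of next tags
    emissions = defaultdict(Counter)    # word -> Counter of tags
    for sentence in corpus:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[word][tag] += 1
            prev = tag
    return transitions, emissions


def stochastic_tag(words, transitions, emissions):
    """Greedy bigram tagging: pick the tag that maximizes
    count(word, tag) * (count(prev_tag, tag) + 1)."""
    tags, prev = [], "<s>"
    for word in words:
        candidates = emissions.get(word)
        if not candidates:
            best = "NOUN"  # fallback for unknown words (an assumption)
        else:
            # The +1 keeps unseen transitions from zeroing out the score.
            best = max(candidates,
                       key=lambda t: candidates[t] * (transitions[prev][t] + 1))
        tags.append(best)
        prev = best
    return tags


transitions, emissions = train_counts(TOY_CORPUS)
tagged = stochastic_tag(["they", "refuse", "a", "refuse", "permit"],
                        transitions, emissions)
print(tagged)
```

On this toy corpus the two occurrences of "refuse" receive different tags because the previous tag differs, which is the context dependence the earlier slides point at.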

## What we did
1. A Machine Learning approach (KNN, DecisionTree, LogisticRegression)
2. Deep Learning

Show me the Data
---

Example of one line of data:

![sample data line](images/sampleLine.png "One line of data")


Numbers correspond to these tags:

![POS classes](images/classes.png "Our POS tag classes")



Data
----
![word cloud of frequent words](images/freqwordcloud.png "Word Cloud of Frequent Words")
![barplot of frequent words](images/wordBarplot.png "Bar Plot of Frequent Words")


## Data

![charts of tags](images/tagsBotj.png "Frequent Tags")

|Tag|Description|
|---|---|
|NNPS| Proper noun, plural|
|PDT| Predeterminer|
|JJS| Adjective, superlative|
|JJ| Adjective|
|LS| List item marker|
|PRP| Personal pronoun|
|CD| Cardinal number|
|VBP| Verb, non-3rd person singular present|

# Again, the goal is to accurately predict a tag for each word
*tags: verb, adjective, noun, preposition etc.*

In [2]:
# Step one: feature extraction
import pprint

def features(sentence, index):
    """Build the feature dict for the token at `index` in `sentence` (a list of tokens)."""
    word = sentence[index]
    return {
        'word': word,
        # Position in the sentence
        'is_first': index == 0,
        'is_last': index == len(sentence) - 1,
        # Casing
        'is_capitalized': word[0].upper() == word[0],
        'is_all_caps': word.upper() == word,
        'is_all_lower': word.lower() == word,
        # Leading and trailing characters
        'prefix-1': word[0],
        'prefix-2': word[:2],
        'prefix-3': word[:3],
        'suffix-1': word[-1],
        'suffix-2': word[-2:],
        'suffix-3': word[-3:],
        # Neighbouring words ('' at the sentence boundaries)
        'prev_word': '' if index == 0 else sentence[index - 1],
        'next_word': '' if index == len(sentence) - 1 else sentence[index + 1],
        # Other surface cues
        'has_hyphen': '-' in word,
        'is_numeric': word.isdigit(),
        'capitals_inside': word[1:].lower() != word[1:]
    }

pprint.pprint(features(['This', 'is', 'a', 'sentence'], 0))

{'capitals_inside': False,
 'has_hyphen': False,
 'is_all_caps': False,
 'is_all_lower': False,
 'is_capitalized': True,
 'is_first': True,
 'is_last': False,
 'is_numeric': False,
 'next_word': 'is',
 'prefix-1': 'T',
 'prefix-2': 'Th',
 'prefix-3': 'Thi',
 'prev_word': '',
 'suffix-1': 's',
 'suffix-2': 'is',
 'suffix-3': 'his',
 'word': 'This'}
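
To train a classifier, every token of every tagged sentence has to be turned into one feature dict plus one label. The sketch below shows one possible way to do that. `token_features` is a trimmed, hypothetical stand-in for the feature function above so that the block runs on its own, and `to_dataset` is a helper name assumed for this example.

```python
def token_features(sentence, index):
    """Trimmed, hypothetical stand-in for the full feature function."""
    word = sentence[index]
    return {
        "word": word,
        "is_first": index == 0,
        "suffix-2": word[-2:],
        "prev_word": "" if index == 0 else sentence[index - 1],
    }


def to_dataset(tagged_sentences):
    """Flatten tagged sentences into parallel lists of feature dicts and tags."""
    X, y = [], []
    for tagged in tagged_sentences:
        words = [word for word, _ in tagged]
        for index, (_, tag) in enumerate(tagged):
            X.append(token_features(words, index))
            y.append(tag)
    return X, y


# Illustrative input: one sentence as (word, tag) pairs.
tagged_sentences = [[("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("sentence", "NN")]]
X, y = to_dataset(tagged_sentences)
print(len(X), y)
```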


# Train Test Split 75/25

train: 26,975 lines

test:   8,992 lines
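
A 75/25 split over the 35,967 lines (26,975 + 8,992) can be reproduced with the standard library alone. This is a sketch under assumptions: the slides do not say how the split was made, so the shuffling, the seed, and the helper name `train_test_split_lines` are illustrative, and integer ids stand in for the real lines.

```python
import random


def train_test_split_lines(lines, train_fraction=0.75, seed=0):
    """Shuffle a copy of `lines` and cut it into train and test parts."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    cutoff = int(len(shuffled) * train_fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


# Stand-in for the real data: 35,967 placeholder line ids.
lines = list(range(35_967))
train, test = train_test_split_lines(lines)
print(len(train), len(test))
```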

# Built a Pipeline and Grid Searched models & hyperparameters:

1. Decision Tree Classifier
2. K-NN Classifier
3. Logistic Regression (*penalty, solver, multi_class*)
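
The slides do not show the pipeline itself, so the following is only a sketch of what a scikit-learn pipeline with a grid search could look like. The `DictVectorizer` step, the step names, the toy data and the grid values are all assumptions chosen to keep the example small.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature dict and one tag per token.
X = [
    {"word": "the", "suffix-1": "e", "is_first": True},
    {"word": "dog", "suffix-1": "g", "is_first": False},
    {"word": "barks", "suffix-1": "s", "is_first": False},
    {"word": "a", "suffix-1": "a", "is_first": True},
    {"word": "cat", "suffix-1": "t", "is_first": False},
    {"word": "sleeps", "suffix-1": "s", "is_first": False},
] * 4
y = ["DT", "NN", "VBZ", "DT", "NN", "VBZ"] * 4

# DictVectorizer one-hot encodes string features and passes booleans through as numbers.
pipeline = Pipeline([
    ("vectorizer", DictVectorizer()),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "classifier__criterion": ["gini", "entropy"],
    "classifier__max_depth": [3, None],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
prediction = search.predict([{"word": "dog", "suffix-1": "g", "is_first": False}])
print(prediction)
```

Swapping the classifier step for `KNeighborsClassifier` or `LogisticRegression` only requires changing the step and the keys in `param_grid`.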

Model Comparison
---
|Model|Training Time (s)|Accuracy|F-Score|
|---|---|---|---|
|Base|-|0.1390|0.0001|
|SVM|63.9|0.1395|0.2892|
|KNN|15.3|0.8678|0.8786|
|Decision Tree|3.5|0.9072|0.9069|
|Tuned Decision Tree|3.4|0.9077|0.9065|
|Logistic Regression|6.0|0.9286|0.9134|
|Tuned Logistic Regression|7.2|0.9304|0.9222|

- Performance on unseen data is strong: the best model reaches 0.9304 accuracy
- Tuned Decision Tree and tuned Logistic Regression are both very good (accuracy 0.9077 vs. 0.9304)
- Decision Trees train faster (3.4 s vs. 7.2 s) and are more interpretable
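
Accuracy and F-score can diverge, which helps when reading the table. The sketch below computes accuracy and a macro-averaged F1 by hand on made-up labels. The slides do not state which averaging was used for the F-Score column, so macro averaging is an assumption here.

```python
def accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted tag equals the true tag."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_tag(y_true, y_pred, tag):
    """F1 for a single tag, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == tag and p == tag for t, p in pairs)
    fp = sum(t != tag and p == tag for t, p in pairs)
    fn = sum(t == tag and p != tag for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-tag F1 scores."""
    tags = sorted(set(y_true))
    return sum(f1_for_tag(y_true, y_pred, tag) for tag in tags) / len(tags)


# Made-up labels for illustration only.
y_true = ["DT", "NN", "VBZ", "DT", "NN", "NN"]
y_pred = ["DT", "NN", "NN", "DT", "NN", "VBZ"]
print(round(accuracy(y_true, y_pred), 4), round(macro_f1(y_true, y_pred), 4))
```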

Model Comparison
-----------

![accuracy](images/AccurayGraph.png "Accuracy")

## Experimental approach with Deep Learning

<br>

### Motivations:

#### - Overcome the accuracy plateau (about 0.93) reached by the classical models.

<br>


In [7]:
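import spacy
from spacy import displacy
# spaCy 3 removed the 'en' shortcut; load an installed pipeline by name, e.g. 'en_core_web_sm'
nlp = spacy.load('en_core_web_sm')
doc = nlp("Not immediately close relations")
displacy.render(doc, style='dep', jupyter=True)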

### Goals:

#### 1. Experiment with a new approach based on Transfer Learning and Deep Learning.

#### 2. Contribute to the FastAI library.


<br>
<br>
<br>


## Approach: ULMFiT

<br>

#### Universal Language Model Fine-tuning for Text Classification, Howard and Ruder, May 2018.
                                                                 TRAINING DATASET
##### Language Model
<br>

        Hello, how are _ --> you                                 Wikipedia corpus

##### Sentence Classification
<br>

       “The beauty of me is that I’m very rich.” --> Negative    -


## Approach: ULMFiT

<br>

#### Universal Language Model Fine-tuning for Text Classification, Howard and Ruder, May 2018.
                                                         TRAINING DATASET
##### Language Model
<br>

        Hello, how are _ --> you                         Wikipedia corpus

##### Word Tagging
<br>

        Bob made a book --> noun, verb, article, noun    CoNLL 2003


## Problems with the ULMFiT approach for this task

The current pre-rules, tokenization (spaCy's default), and post-rules are not suitable for this task.

#### Example

    'These aren't mine Jeorge !' --->  [xxmaj, these, are, n't, mine, xxmaj, xxunk, !]
                        
While we need:

    'These aren't mine Jeorge !' --->  [These, aren't, mine, Jeorge, !]

so that each token maps to exactly one label.
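
When the input is already tokenized with spaces, as in the example, plain whitespace splitting keeps one token per label. A minimal sketch, assuming space-separated input: the helper name `align_tokens_with_labels` and the Penn Treebank-style tags are illustrative and not taken from the dataset.

```python
def align_tokens_with_labels(sentence, labels):
    """Split on whitespace so each token keeps its original form,
    then pair every token with its label."""
    tokens = sentence.split()
    if len(tokens) != len(labels):
        raise ValueError(
            f"{len(tokens)} tokens but {len(labels)} labels; cannot align"
        )
    return list(zip(tokens, labels))


# Illustrative tags only.
pairs = align_tokens_with_labels(
    "These aren't mine Jeorge !",
    ["DT", "VBP", "PRP", "NNP", "."],
)
print(pairs)
```

Because nothing is lowercased, split, or replaced by special tokens, the length check catches any sentence where tokens and labels drift apart.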

## Implications of the approach


With the `ULMFiT` approach:

1. The text preprocessing used for pre-training is problem dependent.

With Deep Learning and `word2vec`-type embeddings:

1. Known information is lost when the model is initialized.


## Conclusion

In summary, for this particular task:

* An ML approach with custom preprocessing --> Good performance and relatively quick training.
* A DL approach --> Can potentially achieve better performance, but is expensive in time.

## Thank you !