# Named Entity Recognition
The chatbot should also be able, based on the intent, to label the entities it stores in its dialog management so that its replies are more accurate. In this notebook, I try to extract the entities from every Tweet. This should be added to my pipeline later.

More specifically, given an utterance as input, I want the output to be a dictionary of all the entities in that utterance.
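
As a rough illustration of that input/output contract, here is a minimal sketch based on a plain keyword lookup. The `ENTITY_LEXICON` mapping, its labels, and the `extract_entities` helper are hypothetical names made up for this example. They are not part of the pipeline, and the lookup is only a stand-in for a trained model.

```python
from collections import defaultdict

# Hypothetical lexicon: entity label -> surface forms that signal it.
# These labels and words are assumptions made for illustration only.
ENTITY_LEXICON = {
    "device": {"iphone", "ipad", "mac", "macbook"},
    "software": {"io", "sierra"},
}


def extract_entities(tokens, lexicon=ENTITY_LEXICON):
    """Return {entity_label: [matched tokens]} for one tokenized utterance."""
    found = defaultdict(list)
    for token in tokens:
        for label, words in lexicon.items():
            if token in words:
                found[label].append(token)
    return dict(found)


utterance = ["iphone", "connect", "wifi", "anymore", "since", "io"]
entities = extract_entities(utterance)
print(entities)  # {'device': ['iphone'], 'software': ['io']}
```

A lookup like this cannot generalize to unseen words, which is why the rest of the notebook looks at statistical taggers instead.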

In [4]:
# Data science
import pandas as pd
print(f"Pandas: {pd.__version__}")
import numpy as np
print(f"Numpy: {np.__version__}")

# Deep Learning 
import tensorflow as tf
print(f"Tensorflow: {tf.__version__}")
from tensorflow import keras
print(f"Keras: {keras.__version__}")
import sklearn
print(f"Sklearn: {sklearn.__version__}")

# NER
import spacy
print(f'spaCy: {spacy.__version__}')

# Visualization 
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)

# Cool progress bars
from tqdm.notebook import tqdm  # tqdm.tqdm_notebook is deprecated
tqdm.pandas()  # Register progress_apply on pandas objects to track execution progress

import collections
import yaml

# Reading back in intents
with open(r'objects/intents.yml') as file:
    intents = yaml.load(file, Loader=yaml.FullLoader)

# Reading in representative intents
# with open(r'objects/intents_repr.yml') as file:
#     intents_repr = yaml.load(file, Loader=yaml.FullLoader)
    
# Reading in training data
train = pd.read_pickle('objects/train.pkl')

print(train.head())
print(f'\nintents:\n{intents}')

Pandas: 1.0.5
Numpy: 1.18.5
Tensorflow: 2.2.0
Keras: 2.3.0-tf
Sklearn: 0.23.1
spaCy: 2.3.0



                                           Utterance   Intent
0  [could, drain, beer, quickly, io, update, drai...  Battery
1  [annnd, release, best, update, yet, new, emoji...   Update
2        [iphone, connect, wifi, anymore, since, io]   iphone
3  [able, update, facebook, status, via, siri, io...      app
4  [run, high, sierra, can, not, view, iphone, x,...      mac

intents:
{'battery': ['battery', 'power'], 'forgot_password': ['password', 'account', 'login'], 'lost_replace': ['replace', 'lost', 'gone', 'missing', 'trade', 'trade-in'], 'payment': ['credit', 'card', 'payment', 'pay'], 'repair': ['repair', 'fix', 'broken'], 'update': ['update']}


# Entities to Label:

<img src="visualizations/entity_list.png" alt="Drawing" style="width: 500px;"/>

# NER Methods

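Named entities are real-world objects that are assigned a name, such as a person, a product, or an organization.

NER is often framed as a sequence labeling problem: each token in the utterance is assigned a tag, and the prediction for one token depends on its neighbors.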

Schemes:
* **BIO scheme** (Beginning, Inside, Outside)
* **BILOU scheme** (Beginning, Inside, Last, Outside, Unit) is more rigorous, but requires more training examples.
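
To make the difference between the two schemes concrete, the sketch below converts BIO tags to BILOU tags. The `bio_to_bilou` helper and the `software` / `device` labels are assumptions chosen for illustration, not labels defined in this notebook.

```python
def bio_to_bilou(tags):
    """Convert a list of BIO tags to BILOU tags."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # A B- tag with no continuation is a single-token entity (Unit).
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # An I- tag with no continuation closes the entity (Last).
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


tokens = ["run", "high", "sierra", "can", "not", "view", "iphone"]
bio = ["O", "B-software", "I-software", "O", "O", "O", "B-device"]
bilou = bio_to_bilou(bio)

for token, bio_tag, bilou_tag in zip(tokens, bio, bilou):
    print(f"{token:<8}{bio_tag:<12}{bilou_tag}")
```

BILOU marks the end of an entity and single-token entities explicitly, so the model has more tag types to learn per entity label.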

There are multiple ways to get this done:
* Linear Chain Conditional Random Fields (Linear Chain CRF)
* Maximum Entropy Markov Models
* Bidirectional-LSTM

spaCy's entity recognizer is a good option when a pretrained NER model is enough. In their words:
    
    spaCy features an extremely fast statistical entity recognition system, that assigns labels to contiguous spans of tokens. The default model identifies a variety of named and numeric entities, including companies, locations, organizations and products. You can add arbitrary classes to the entity recognition system, and update the model with new examples.
   
However, I need custom entities that a pretrained model does not cover. A CRF or Rasa's DIET Classifier is a good fit for this. Almost every chatbot will have some custom entities.

# Conditional Random Field
I implement this formula:

<img src="visualizations/CRF.png" alt="Drawing" style="width: 400px;"/>
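
For reference, the linear-chain CRF is commonly written as shown below. This is the textbook form, and its notation may differ from the image above. Here $x$ is the token sequence of length $T$, $y$ is the tag sequence, $f_k$ are feature functions, $\lambda_k$ are their learned weights, and $Z(x)$ normalizes over all candidate tag sequences $y'$.

$$
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
$$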

Sources:
* [NER Using CRF](https://medium.com/data-science-in-your-pocket/named-entity-recognition-ner-using-conditional-random-fields-in-nlp-3660df22e95c)
* [CRF NER with Python](https://www.aitimejournal.com/@akshay.chavan/complete-tutorial-on-named-entity-recognition-ner-using-python-and-keras)
    * [His Git Repo](https://github.com/Akshayc1/named-entity-recognition)
* [Rasa Entity Extraction Docs](https://rasa.com/docs/rasa/nlu/entity-extraction/)
* [Training my own custom NER model](https://towardsdatascience.com/custom-named-entity-recognition-using-spacy-7140ebbb3718)
* [Apple Tagging](https://heartbeat.fritz.ai/natural-language-in-ios-12-customizing-tag-schemes-and-named-entity-recognition-caf2da388a9f) - This guy did what I want to do, but it's super verbose
* [Tagging schemes or types](https://natural-language-understanding.fandom.com/wiki/Named_entity_recognition)
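
In practice, the feature functions of a CRF are usually supplied as one feature dictionary per token, which is the input format used by libraries such as `sklearn-crfsuite`. The sketch below shows one plausible feature set for the tokenized Tweets. The `token_features` and `utterance_features` helpers and the specific features are assumptions for illustration, not the feature set used in this project.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i."""
    token = tokens[i]
    features = {
        "bias": 1.0,
        "token": token,
        "suffix3": token[-3:],
        "is_digit": token.isdigit(),
    }
    # Context features: the neighboring tokens, or sentence-boundary flags.
    if i > 0:
        features["prev_token"] = tokens[i - 1]
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next_token"] = tokens[i + 1]
    else:
        features["EOS"] = True
    return features


def utterance_features(tokens):
    """Return one feature dict per token in the utterance."""
    return [token_features(tokens, i) for i in range(len(tokens))]


X = utterance_features(["iphone", "connect", "wifi"])
print(X[0])
```

Each utterance becomes a list of such dictionaries, paired with a list of tags of the same length.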

In [None]:
# Looks like I have to make my own training data


In [7]:
train

                                              Utterance   Intent
0     [could, drain, beer, quickly, io, update, drai...  Battery
1     [annnd, release, best, update, yet, new, emoji...   Update
2           [iphone, connect, wifi, anymore, since, io]   iphone
3     [able, update, facebook, status, via, siri, io...      app
4     [run, high, sierra, can, not, view, iphone, x,...      mac
...                                                 ...      ...
5995  [phone, work, perfectly, fine, stupid, as, upd...   Update
5996  [dear, please, let, u, downgrade, io, io, make...   iphone
5997  [hey, software, update, battery, go, battery, ...      app
5998  [hi, since, update, download, apps, keep, load...      mac


# Updating training data
Here I add an `entity` label to each Tweet, in addition to its intent.