# Chapter 3 Named Entity Recognition (NER) With BiLSTMs, CRFs & Viterbi Decoding

Named Entity Recognition (NER) is one of the fundamental building blocks of natural language understanding (NLU).
NER tags the names of people, companies, products, and quantities in a piece of text, which is very useful in chatbots, among other applications.

## Named Entity Recognition

The objective of an NER model is to locate the text tokens that are named entities and classify them into categories such as people's names, organizations and companies, physical locations, quantities, monetary quantities, times, dates, and even protein or DNA sequences.

| Type | Example Tag | Example |
| --- | --- | --- |
| Person | PER | Gregory went to the castle. |
| Organization | ORG | WHO just issued an epidemic advisory. |
| Location | LOC | She lives in Seattle. |
| Money | MONEY | You owe me twenty dollars. |
| Percentage | PERCENT | Stocks have risen 10% today. |
| Date | DATE | Let's meet on Wednesday. |
| Time | TIME | Is it 5 pm already? |

Data set examples:
- A collection of entity recognition data sets: https://github.com/juand-r/entity-recognition-datasets
- re3d, from the Defence Science and Technology Laboratory: https://github.com/dstl/re3d

There are a few different ways to build an NER model. Useful input features include:
- Part-of-speech (POS) tags. The POS of a word and of its neighboring words are the most straightforward features to add.
- Word shape features that model letter case (capitalized versus lowercase). These can add a lot of information, principally because many entity types involve proper nouns, such as the names of people and organizations.
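The two feature types above can be sketched in a few lines of plain Python. This is only an illustration: the function names `word_shape` and `token_features`, the `<START>`/`<END>` markers, and the hand-assigned POS tags are assumptions, not part of the chapter.

```python
def word_shape(token: str) -> str:
    """Map each character to a class: X = uppercase, x = lowercase, d = digit."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # punctuation and symbols are kept unchanged
    return "".join(shape)


def token_features(tokens, pos_tags, i):
    """Features for token i: its shape plus its own and its neighbors' POS tags."""
    return {
        "shape": word_shape(tokens[i]),
        "pos": pos_tags[i],
        "prev_pos": pos_tags[i - 1] if i > 0 else "<START>",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "<END>",
    }


tokens = ["She", "lives", "in", "Seattle", "."]
pos_tags = ["PRP", "VBZ", "IN", "NNP", "."]  # assigned by hand for illustration
features = token_features(tokens, pos_tags, 3)
print(features)
```

The capitalized shape of "Seattle" combined with a proper-noun POS tag is the kind of signal that hints at a named entity.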

Another vital feature involves checking whether a word appears in a gazetteer.
A gazetteer is like a database of important geographical entities, for example http://geonames.org/.
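A gazetteer check can be sketched as a set lookup. The tiny `GAZETTEER` set, the function name `in_gazetteer`, and the `max_len` limit are all hypothetical; a real gazetteer would be loaded from a resource such as GeoNames.

```python
# Hypothetical mini-gazetteer with lowercased place names.
GAZETTEER = {"seattle", "paris", "new york"}


def in_gazetteer(tokens, i, gazetteer=GAZETTEER, max_len=3):
    """Return True if any span of up to max_len tokens that covers position i is listed."""
    for start in range(max(0, i - max_len + 1), i + 1):
        for end in range(i + 1, min(len(tokens), start + max_len) + 1):
            span = " ".join(tokens[start:end]).lower()
            if span in gazetteer:
                return True
    return False


tokens = ["She", "moved", "from", "Seattle", "to", "New", "York", "."]
flags = [in_gazetteer(tokens, i) for i in range(len(tokens))]
print(list(zip(tokens, flags)))
```

Checking multi-token spans matters because names such as "New York" would be missed by a word-by-word lookup.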

### The GMB data set
The GMB (Groningen Meaning Bank) data set is not considered a gold standard, because it was built using automatic tagging software, with human raters updating only subsets of the data.

- geo = Geographical entity
- org = Organization
- per = Person
- gpe = Geopolitical entity
- tim = Time indicator
- art = Artifact
- eve = Event
- nat = Natural phenomenon
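These types are attached to tokens with B- (beginning) and I- (inside) prefixes, with O for tokens outside any entity, as in the tag subset used in the CRF section. A minimal sketch of grouping tagged tokens back into entities, assuming that convention; the function name `extract_entities`, the lowercase tag spelling, and the example sentence are illustrative assumptions.

```python
def extract_entities(tokens, tags):
    """Group B-/I- tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues and current_tokens:
            # The open entity ends here, so store it.
            entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix == "B" or (prefix == "I" and not continues):
            # Start a new entity; a stray I- tag is treated like B- (a lenient choice).
            current_type, current_tokens = entity_type, [token]
        elif continues:
            current_tokens.append(token)
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["Gregory", "lives", "in", "New", "York", "."]
tags = ["B-per", "O", "O", "B-geo", "I-geo", "O"]
entities = extract_entities(tokens, tags)
print(entities)
```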

### Loading the data

The chapter spends a lot of space here on downloading the data from a site and extracting it from its directories.

### Normalizing and vectorizing data
This section uses pandas and numpy methods.

- The first step is to load the contents of the processed files into one DataFrame.
- Next, both the text and the NER tags need to be tokenized and encoded into numbers for use in training.
- After encoding, check that the shapes are correct before moving on.
- The labels need one additional step: since there are multiple labels, each label token needs to be one-hot encoded.
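The encoding steps above can be sketched in plain Python. This is a stand-in for illustration, not the chapter's code, which works with pandas and numpy. The helper names `build_index`, `encode`, and `one_hot`, the `<PAD>` entry with id 0, padding to a fixed length, and the two toy sentences are all assumptions.

```python
def build_index(sequences, reserved=("<PAD>",)):
    """Assign an integer id to every distinct item; id 0 is reserved for padding."""
    index = {name: i for i, name in enumerate(reserved)}
    for seq in sequences:
        for item in seq:
            index.setdefault(item, len(index))
    return index


def encode(sequences, index, max_len):
    """Convert items to ids, then truncate or right-pad each sequence to max_len."""
    encoded = []
    for seq in sequences:
        ids = [index[item] for item in seq][:max_len]
        encoded.append(ids + [0] * (max_len - len(ids)))
    return encoded


def one_hot(encoded, num_classes):
    """Replace each label id with a vector holding a single 1 at that id."""
    return [[[1 if k == label else 0 for k in range(num_classes)]
             for label in seq] for seq in encoded]


sentences = [["She", "lives", "in", "Seattle"], ["Gregory", "left"]]
labels = [["O", "O", "O", "B-geo"], ["B-per", "O"]]

word_index = build_index(sentences)
tag_index = build_index(labels)
x = encode(sentences, word_index, max_len=4)
y = one_hot(encode(labels, tag_index, max_len=4), num_classes=len(tag_index))

# Shape check: one label vector per token position in every sentence.
assert len(x) == len(y) and all(len(a) == len(b) for a, b in zip(x, y))
print(x)
print(y[0][3])
```

Note that `encode` raises a `KeyError` for words it has not seen; a real pipeline would map those to an unknown-word id.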

Now, we are ready to build and train a model.

### A BiLSTM model
The workflow has four steps:
- Build the model using layers.
- Split the data into training and test sets.
- Train the model.
- Evaluate the model.
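The split and evaluation steps can be sketched without any framework. The model itself is left out here; the function names `train_test_split` and `token_accuracy`, the 80/20 split, the fixed seed, and the choice to ignore padding positions (id 0) when scoring are assumptions for illustration.

```python
import random


def train_test_split(x, y, test_fraction=0.2, seed=42):
    """Shuffle example indices with a fixed seed and hold out a fraction for testing."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of examples")
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [x[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def token_accuracy(y_true, y_pred, pad_id=0):
    """Fraction of non-padding positions where the predicted tag id is correct."""
    correct = total = 0
    for true_seq, pred_seq in zip(y_true, y_pred):
        for true_tag, pred_tag in zip(true_seq, pred_seq):
            if true_tag == pad_id:
                continue  # padding positions are not scored
            total += 1
            correct += int(true_tag == pred_tag)
    return correct / total if total else 0.0


x = list(range(10))
y = [i * 10 for i in x]
x_train, x_test, y_train, y_test = train_test_split(x, y)
print(len(x_train), len(x_test))
print(token_accuracy([[1, 2, 0], [3, 0, 0]], [[1, 1, 0], [3, 2, 1]]))
```

Scoring padding positions would inflate accuracy, because most padded positions are trivially predictable.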

### Conditional random fields (CRFs)
Consider a subset of NER tags: O, B-Per, I-Per, B-Geo, and I-Geo.
Here we assign a weight to the transition from one tag to another and build a matrix of these weights.
From the weights we can determine how likely it is for a word with one tag to be followed by a word with another tag.

For example, the transition from I-Org to B-Org has a weight of -1.38, implying that this transition is extremely unlikely to happen.
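A minimal sketch of such a weight matrix for the five-tag subset, and of how it scores a tag sequence. All weights below are made up for illustration (a CRF learns them during training), and the helper names are assumptions. A full CRF score also adds a per-token score for each tag, which is omitted here.

```python
TAGS = ["O", "B-Per", "I-Per", "B-Geo", "I-Geo"]
TAG_ID = {tag: i for i, tag in enumerate(TAGS)}

# Hypothetical weights: positive = plausible transition, negative = implausible.
OVERRIDES = {
    ("B-Per", "I-Per"): 1.5,
    ("B-Geo", "I-Geo"): 1.5,
    ("O", "I-Per"): -2.0,
    ("O", "I-Geo"): -2.0,
    ("B-Per", "I-Geo"): -2.0,
    ("B-Geo", "I-Per"): -2.0,
}


def build_transition_matrix(overrides, default=0.0):
    """matrix[i][j] holds the weight of moving from tag i to tag j."""
    matrix = [[default] * len(TAGS) for _ in TAGS]
    for (prev_tag, next_tag), weight in overrides.items():
        matrix[TAG_ID[prev_tag]][TAG_ID[next_tag]] = weight
    return matrix


def transition_score(tags, matrix):
    """Sum the weights of all consecutive tag pairs in the sequence."""
    return sum(matrix[TAG_ID[a]][TAG_ID[b]] for a, b in zip(tags, tags[1:]))


matrix = build_transition_matrix(OVERRIDES)
plausible = ["B-Per", "I-Per", "O", "B-Geo"]
implausible = ["O", "I-Per", "O", "B-Geo"]
print(transition_score(plausible, matrix), transition_score(implausible, matrix))
```

The sequence that starts an entity with an I- tag scores lower, which is how the transition weights discourage invalid tag orders.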


### NER with BiLSTM and CRFs
Here we build the model from Keras layers: a custom model class passes the output of one layer as the input to the next.

Example:

        inputs = self.embedding(text)      # token ids -> embedding vectors
        bilstm = self.biLSTM(inputs)       # context from both directions
        logits = self.dense(bilstm)        # per-token score for each tag
        outputs = self.crf(logits, seq_lengths, training)  # CRF layer on top of the scores

### Viterbi decoding
The Viterbi algorithm takes the predictions for each word in the sequence and uses dynamic programming to find the output tag sequence with the highest overall likelihood.
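A minimal pure-Python sketch of Viterbi decoding, assuming per-token tag scores (`emissions`) and a tag-to-tag weight matrix (`transitions`). The three-tag set and all numbers are made up for illustration, and this is not the chapter's implementation.

```python
def viterbi_decode(emissions, transitions):
    """Return (best tag-id path, its score).

    emissions[t][j]: score of tag j at position t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(emissions[0])
    best = list(emissions[0])  # best[j]: top score of any path ending in tag j
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that gives the best path into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = new_best
        backpointers.append(pointers)
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):  # walk back from the final tag
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


TAGS = ["O", "B-Per", "I-Per"]
emissions = [[0.2, 1.0, 0.9], [0.7, 0.1, 0.6], [1.0, 0.2, 0.3]]
transitions = [
    [0.0, 0.0, -2.0],  # from O: jumping straight to I-Per is penalized
    [0.0, 0.0, 1.0],   # from B-Per: continuing with I-Per is rewarded
    [0.0, 0.0, 0.5],   # from I-Per
]
path, score = viterbi_decode(emissions, transitions)
greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in emissions]
print([TAGS[i] for i in path], [TAGS[i] for i in greedy])
```

Picking the best tag for each token independently (the `greedy` list) gives a different sequence here, because it ignores the transition weights that Viterbi takes into account.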