---
Named Entity Recognition (NER)
---

![](https://researchkb.files.wordpress.com/2014/02/ner.png)

> Knowledge worker adds value to information.   
> \- Peter Drucker 

---

> Data Scientist adds value to data.  
> \- Brian Spiering


By The End Of This Session You Should Be Able To:
----

- Explain how NER builds on POS tagging
- Describe conceptually how to train NER
- Pick the best system for NER tagging

---
NER Overview
---

- NER is a "Strict Type" system for human language



- NER is "easy" in English: capitalization is a strong cue for proper nouns

---
Common NE Types
---

- ORGANIZATION: Georgia-Pacific Corp., WHO
- PERSON: Eddy Bonte, President Obama
- LOCATION: Murray River, Mount Everest
- DATE: June, 2008-06-29
- TIME: two fifty a m, 1:30 p.m.
- MONEY: 175 million Canadian Dollars, GBP 10.40
- PERCENT: twenty pct, 18.75 %
- FACILITY: Washington Monument, Stonehenge
- GPE (Geo-political entity): South East Asia, Midlothian

Feel free to define your own
-----

- PRESIDENT: Trump, Washington, Lincoln
- COUNTRY: Thailand
- POSITION / JOB_TITLE: Product Manager, Data Scientist
- PRODUCT: Apple Watch

---
NER Methods (POS flashback)
---

1. Rule-based, aka make a dictionary
2. Statistical Models, aka using Graphical Models
3. Deep Learning, aka what everyone does now

---
1. Rule-based
----

Use a combination of lists and regular expressions to identify named entities. 

Examples:

```python
# Look-up table: surface form -> entity type
entities = {"Dick": "PERSON",
            "Jane": "PERSON"}
```
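
A minimal sketch of combining a list with regular expressions, as described above. The `GAZETTEER` entries, the `PATTERNS`, and the `tag_entities` helper are all illustrative assumptions, not part of any library:

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "Dick": "PERSON",
    "Jane": "PERSON",
    "Mount Everest": "LOCATION",
}

# Hypothetical patterns for types that are easier to describe than to list.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?\s?%")),
]


def tag_entities(text):
    """Return (start, end, surface, entity_type) matches sorted by position."""
    matches = []
    # List lookup: match each gazetteer entry as a whole word or phrase.
    for name, entity_type in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            matches.append((m.start(), m.end(), m.group(), entity_type))
    # Pattern lookup: match each regular expression anywhere in the text.
    for entity_type, pattern in PATTERNS:
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), m.group(), entity_type))
    return sorted(matches)


for match in tag_entities("Jane climbed Mount Everest on 2008-06-29."):
    print(match)
```

Note that this sketch does nothing about overlapping or ambiguous matches, which is exactly where rule-based systems start to struggle.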

Gazetteers
-----

![](images/gazetteer.png)

> A gazetteer consists of a set of lists containing names of entities such as cities, organizations, days of the week, etc. These lists are used to find occurrences of these names in text, e.g. for the task of named entity recognition.

Gazetteers
-----

[How to make a gazetteer](http://www.aclweb.org/anthology/P08-1047)

Then use it to train other models

![](images/extend)

Gazetteers
-----

__Pros__:

- Simplest model (that could possibly work)
- Minimum Viable Product (MVP)
- Works for most cases overall
- Performs nicely within specific, well-understood, static domains

Gazetteers
-----

__Cons__:

- Deterministic
- Brittle
- Maintaining the lists is labor intensive
- Moving to other languages or domains may involve repeating much of the work.
- Many proper nouns are also valid in other ways (such as Will or Hope). 
- Names of people and places are often the same: Washington (state, D.C., or George) or Cicero (the ancient philosopher, the town in New York, or some other place). Remember: dealing with ambiguity is hard.
- Many names are conjunctions of other names, such as the Scottish Exhibition and Conference Center. It’s not always clear where the name ends.
- It’s difficult to model dependencies between names across a document using rules based on regular expressions.



---
2. Statistical Models
---

Typically the classifier looks at each word in a sentence and decides whether it’s the start of a named entity, the continuation of an already started entity, or not part of a name at all. By combining these predictions, you can use a classifier to identify a sequence of words that make up a name.
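
The start / continuation / outside decisions are commonly written as `B-`, `I-`, and `O` tags. A minimal sketch of the "combining" step, assuming the per-word tags have already been predicted (the `bio_to_spans` helper and the example tags are illustrative, not output from a real model):

```python
def bio_to_spans(tokens, tags):
    """Combine per-token B-/I-/O tags into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # Start of a new entity (a stray I- tag is treated as a start).
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], entity_type
        elif prefix == "I":
            # Continuation of the entity that is already open.
            current_tokens.append(token)
        else:
            # "O": not part of a name, so close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["President", "Obama", "visited", "Mount", "Everest", "."]
tags = ["B-PERSON", "I-PERSON", "O", "B-LOCATION", "I-LOCATION", "O"]
print(bio_to_spans(tokens, tags))
```
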

- Currently most common
- "Good enough" for performance and speed
- Examples:
    - Conditional random fields (CRFs)
    - Hidden Markov model (HMM)
    - Viterbi decoding (the algorithm that finds the best tag sequence for an HMM or CRF)
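
To make the HMM and Viterbi entries concrete, here is a sketch of Viterbi decoding over a toy HMM. Every probability below is invented for illustration, and the `viterbi` function and its `unknown_p` smoothing parameter are assumptions of this sketch rather than a library API:

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (log-prob of the best path ending in s at step t, previous state)
    best = [{}]
    for s in states:
        emit = emit_p[s].get(observations[0], unknown_p)
        best[0][s] = (math.log(start_p[s]) + math.log(emit), None)
    for t in range(1, len(observations)):
        best.append({})
        for s in states:
            emit = emit_p[s].get(observations[t], unknown_p)
            score, prev = max(
                (best[t - 1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            best[t][s] = (score + math.log(emit), prev)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Toy model: all numbers are made up for this example.
states = ["O", "B-PER", "I-PER"]
start_p = {"O": 0.8, "B-PER": 0.19, "I-PER": 0.01}
trans_p = {
    "O": {"O": 0.8, "B-PER": 0.19, "I-PER": 0.01},
    "B-PER": {"O": 0.5, "B-PER": 0.05, "I-PER": 0.45},
    "I-PER": {"O": 0.7, "B-PER": 0.05, "I-PER": 0.25},
}
emit_p = {
    "O": {"visited": 0.5, "the": 0.5},
    "B-PER": {"President": 0.6, "Obama": 0.4},
    "I-PER": {"Obama": 0.9, "Bonte": 0.1},
}

print(viterbi(["President", "Obama", "visited"], states, start_p, trans_p, emit_p))
# → ['B-PER', 'I-PER', 'O']
```

Working in log space avoids numeric underflow on long sentences; a real system would estimate the probabilities from annotated text instead of writing them by hand.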

Statistical Models
-----

__Pros__

- Rule-based methods can be incorporated as features and as such are only one source of information.
- Moving to other languages or domains may only involve minimal code changes.
- It’s easier to model the context within a sentence and in a document.
- The classifier can be retrained to incorporate additional text or other features.
- With sufficient amounts of training data, the performance can be near to human quality, even if people are less than perfect at the task of identifying names. 
- Good NER systems usually tag entities correctly more than 90% of the time in evaluation.
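
As a rough illustration of the first point, a gazetteer lookup can become one feature among several in a per-token feature dictionary. The feature names, the `PERSON_GAZETTEER` set, and the `token_features` helper are assumptions made for this sketch; feature dictionaries of this general shape are a common input format for CRF toolkits:

```python
# Hypothetical gazetteer; in practice this would be loaded from curated lists.
PERSON_GAZETTEER = {"dick", "jane", "obama"}


def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the word and its neighbours."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
        # The rule-based lookup is just one signal among many.
        "in_person_gazetteer": word.lower() in PERSON_GAZETTEER,
        # Context within the sentence: previous and next word.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


sentence = ["President", "Obama", "visited", "WHO"]
print(token_features(sentence, 1))
```

Because the classifier learns how much weight to give each feature, a wrong or missing gazetteer entry degrades the prediction rather than breaking it outright.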



Statistical Models
-----

__Cons__

- The main disadvantage of such approaches is the need for human-annotated data.
- You may also not have enough annotated data to train on.

---
3. Deep Learning via Recurrent Neural Network (RNN)
----

![](https://s3.amazonaws.com/poly-screenshots.angel.co/Project/56/200582/51ca1ecf91f8477a4d8f0a796f6c62c4-original.png)

----
High Level Overview
---

![](images/ie-architecture.png)

Summary
----

- NER is a more specific version of POS tagging
- NER can be done with rules, graphical models, or deep learning

<br>
<br>
----