# Entity linking

## Scope
- Named entity recognition
    - Detect person, organization, location, etc.
- Disambiguation
    - Map each detected entity to the corresponding entity in the knowledge base.
    - For example, “Michael Jordan is a machine learning professor at UC Berkeley.”
        - Michael Jordan is linked to the knowledge base entity for the machine learning professor at the University of California, Berkeley, not to the basketball player of the same name.
        - UC Berkeley is linked to the University of California, Berkeley entity in the knowledge base.
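To make the two stages concrete, here is a minimal sketch of what the pipeline might produce for the example sentence. The `Mention` record, the `PER`/`ORG` labels, and the `KB:...` identifiers are hypothetical placeholders chosen for illustration, not the format of any particular knowledge base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    text: str
    start: int                       # character offset where the mention begins
    end: int                         # character offset just past the mention
    label: str                       # NER type, e.g. "PER" or "ORG"
    entity_id: Optional[str] = None  # knowledge base ID, filled in by disambiguation


def make_mention(sentence: str, surface: str, label: str) -> Mention:
    # Locate the surface form in the sentence to recover its character span.
    start = sentence.find(surface)
    if start < 0:
        raise ValueError(f"{surface!r} not found in sentence")
    return Mention(surface, start, start + len(surface), label)


sentence = "Michael Jordan is a machine learning professor at UC Berkeley."

# Stage 1: named entity recognition finds and types the spans.
mentions = [
    make_mention(sentence, "Michael Jordan", "PER"),
    make_mention(sentence, "UC Berkeley", "ORG"),
]

# Stage 2: disambiguation attaches a knowledge base ID to each span.
# These IDs are hypothetical placeholders, not real knowledge base identifiers.
kb_links = {
    "Michael Jordan": "KB:Michael_I._Jordan",
    "UC Berkeley": "KB:University_of_California,_Berkeley",
}
for m in mentions:
    m.entity_id = kb_links[m.text]
```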

## Metric
- There is a separate metric for each of the three components:
    - Named entity recognition
    - Disambiguation
    - Entity linking as a whole
    
### Named entity recognition (Offline)

<img src="img/entity-linking1.png" style="width:800px;height:800px;">

- Precision = # of correctly recognized named entities / total # of recognized named entities
- Recall = # of correctly recognized named entities / # of named entities in corpus
- F1 score = 2 * precision * recall / (precision + recall)
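The three formulas can be sketched as follows. This assumes exact-match scoring, where a recognized entity counts as correct only if its start, end, and type all match a gold entity; some evaluations also give credit for partial overlaps, which this sketch does not.

```python
from typing import Set, Tuple

# A named entity is identified by (start, end, type); exact-match scoring
# counts a prediction as correct only if all three fields match a gold entity.
Entity = Tuple[int, int, str]


def ner_scores(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {(0, 14, "PER"), (50, 61, "ORG")}
# One exact hit, one truncated span ("UC" only), one spurious entity.
predicted = {(0, 14, "PER"), (50, 52, "ORG"), (20, 36, "MISC")}
p, r, f1 = ner_scores(predicted, gold)
```

Here one of three predictions is correct (precision 1/3) and one of two gold entities is found (recall 1/2), so F1 = 2 · (1/3) · (1/2) / (5/6) = 0.4.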

### Disambiguation (Offline)

<img src="img/entity-linking2.png" style="width:400px;height:200px;">
<img src="img/entity-linking3.png" style="width:400px;height:200px;">
<img src="img/entity-linking4.png" style="width:400px;height:200px;">

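- Recall is not a meaningful separate metric here: the disambiguation step receives the mentions as given and links every one of them, so the number of predictions equals the number of mentions and recall would simply equal precision.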
- Precision = # of mentions correctly linked / # of total mentions
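A minimal sketch of this precision, assuming mentions are keyed by their (start, end) span and that the linker is given the gold mentions. The `KB:...` identifiers are hypothetical.

```python
from typing import Dict, Tuple

# Mentions are keyed by (start, end) span; values are knowledge base IDs.
Span = Tuple[int, int]


def disambiguation_precision(predicted: Dict[Span, str], gold: Dict[Span, str]) -> float:
    # Every gold mention is passed to the linker, so the denominator is the
    # total number of mentions; a mention is correct if its predicted ID matches gold.
    if not gold:
        return 0.0
    correct = sum(1 for span, entity in gold.items() if predicted.get(span) == entity)
    return correct / len(gold)


gold = {
    (0, 14): "KB:Michael_I._Jordan",
    (50, 61): "KB:University_of_California,_Berkeley",
}
# The linker confuses the professor with the basketball player.
predicted = {
    (0, 14): "KB:Michael_Jordan_(basketball)",
    (50, 61): "KB:University_of_California,_Berkeley",
}
precision = disambiguation_precision(predicted, gold)
```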

### Micro average (Offline)
- Sums the counts (TP, FP, FN) over all documents, then computes each metric once from the totals.
- Precision = sum of TP / (sum of TP + sum of FP)
- Recall = sum of TP / (sum of TP + sum of FN)
- The micro-averaged F1 score is computed from the micro-averaged precision and recall above.
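A minimal sketch of micro averaging, assuming each document has already been reduced to (TP, FP, FN) counts. The counts below are made up to show a long document next to a short one; returning 0.0 for an empty denominator is a convention chosen for this sketch.

```python
from typing import List, Tuple

# Per-document counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def micro_scores(docs: List[Counts]) -> Tuple[float, float, float]:
    # Pool the counts over all documents first, then compute each metric once.
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: a long, well-handled document and a short, poorly handled one.
docs = [(90, 10, 10), (1, 3, 3)]
p, r, f1 = micro_scores(docs)
```

The pooled counts are TP = 91, FP = 13, FN = 13, so precision, recall, and F1 are all 91/104 = 0.875. Because counts are pooled, long documents dominate the result.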

### Macro average (Offline)
- Computes each metric independently for each document, then averages over the documents.
- Precision = sum of per-document precision / n, where n is the number of documents
- Recall = sum of per-document recall / n
- The macro-averaged F1 score is computed from the macro-averaged precision and recall above.
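A matching sketch of macro averaging over the same kind of per-document (TP, FP, FN) counts. As in the notes, F1 is taken from the averaged precision and recall; treating a document with an empty denominator as scoring 0.0 is an assumption of this sketch.

```python
from typing import List, Tuple

# Per-document counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def macro_scores(docs: List[Counts]) -> Tuple[float, float, float]:
    if not docs:
        return 0.0, 0.0, 0.0
    # Compute precision and recall separately for every document.
    precisions = [tp / (tp + fp) if tp + fp else 0.0 for tp, fp, _ in docs]
    recalls = [tp / (tp + fn) if tp + fn else 0.0 for tp, _, fn in docs]
    n = len(docs)
    precision = sum(precisions) / n
    recall = sum(recalls) / n
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: a long, well-handled document and a short, poorly handled one.
docs = [(90, 10, 10), (1, 3, 3)]
p, r, f1 = macro_scores(docs)
```

The per-document precisions are 0.9 and 0.25, so macro precision, recall, and F1 are all 0.575. Pooling the same counts instead gives 91/104 = 0.875: macro averaging weights every document equally, so the short document pulls the score down much more.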

## Architecture

<img src="img/entity-linking5.png" style="width:1000px;height:600px;">

### Model
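- Traditional word embeddings such as Word2vec do not capture context: each word has a single vector, so “Jordan” gets the same representation whether it refers to the professor or to the basketball player.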

### ELMo (Embeddings from Language Models)
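- Starts with a context-independent representation of each word, similar in role to Word2vec (ELMo builds it from the word's characters with a CNN).
- These vectors are fed into bidirectional LSTM layers.
- The forward and backward LSTMs have separate parameters: the forward LSTM reads only the left context and the backward LSTM only the right context. Their outputs are concatenated.
- As a result, no single LSTM state conditions on the left and right context simultaneously.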
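The limitation above can be sketched with a toy bidirectional recurrent network. For brevity a plain `tanh` recurrence stands in for the LSTM cell (an LSTM adds gates but has the same left-to-right information flow), and the weights and input vectors are arbitrary illustrative values. The forward half of each output sees only earlier tokens; the two halves are merely concatenated, never combined inside a single state.

```python
import math
from typing import List

Vector = List[float]


def rnn_step(h: Vector, x: Vector, w_h: float = 0.5, w_x: float = 1.0) -> Vector:
    # Toy elementwise recurrence h_t = tanh(w_h * h_{t-1} + w_x * x_t),
    # standing in for an LSTM cell.
    return [math.tanh(w_h * hi + w_x * xi) for hi, xi in zip(h, x)]


def run_direction(xs: List[Vector]) -> List[Vector]:
    # Run the recurrence over the sequence, keeping the state at every position.
    h = [0.0] * len(xs[0])
    states = []
    for x in xs:
        h = rnn_step(h, x)
        states.append(h)
    return states


def bidirectional(xs: List[Vector]) -> List[Vector]:
    forward = run_direction(xs)              # position t has seen x_1..x_t
    backward = run_direction(xs[::-1])[::-1]  # position t has seen x_t..x_n
    # Concatenate the two independent states per position.
    return [f + b for f, b in zip(forward, backward)]


tokens = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
changed = tokens[:2] + [[-0.9, 0.9]]  # change only the last token
a = bidirectional(tokens)
b = bidirectional(changed)
```

Changing the last token leaves the forward half of the first position's representation (`a[0][:2]`) unchanged, while the backward half (`a[0][2:]`) changes.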

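### BERT (Bidirectional Encoder Representations from Transformers)
- Takes input text, which can be a single sentence or a pair of sentences separated by the `[SEP]` token; every sequence starts with a `[CLS]` token.
- Each token (BERT uses WordPiece subword tokens) is converted to an embedding (the sum of token, segment, and position embeddings) and fed into a stack of transformer encoder layers.
- Self-attention processes all tokens simultaneously, so in every layer each token's representation draws on both its left and right context.
- The final layer outputs a contextualized representation of each token.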
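A minimal sketch of how an input pair is laid out before embedding. Lowercased whitespace-split words stand in for real WordPiece tokens (an actual BERT tokenizer may split words further), and `bert_input` is a hypothetical helper, not a library function. The segment IDs and position IDs index the segment and position embeddings that are summed with the token embeddings.

```python
from typing import List, Optional, Tuple


def bert_input(
    sentence_a: List[str], sentence_b: Optional[List[str]] = None
) -> Tuple[List[str], List[int]]:
    # Lay out [CLS] A [SEP] (B [SEP]) and the matching segment IDs:
    # 0 for the first sentence (including [CLS] and its [SEP]), 1 for the second.
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if sentence_b is not None:
        tokens += sentence_b + ["[SEP]"]
        segment_ids += [1] * (len(sentence_b) + 1)
    return tokens, segment_ids


tokens, segments = bert_input(["michael", "jordan", "teaches"], ["at", "berkeley"])
# Position IDs simply count tokens from the start of the sequence.
position_ids = list(range(len(tokens)))
```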

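### NER modeling
- Option 1. Use embeddings generated by a pre-trained BERT model as input features to a separately trained NER model.
- Option 2. Take a pre-trained model and fine-tune it end-to-end on an NER dataset, typically by adding a token-classification layer on top.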
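Either option typically ends with one tag per token, which must be turned back into entity spans. The sketch below assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity). It is lenient: a stray `I-` tag that does not continue an open span of the same type starts a new span.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    # Convert per-token BIO tags into (start, end, type) spans; end is exclusive.
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
        # Otherwise this is an I- tag of the current type, which extends the open span.
    return spans


words = ["Michael", "Jordan", "is", "a", "machine", "learning",
         "professor", "at", "UC", "Berkeley", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "O", "B-ORG", "I-ORG", "O"]
spans = bio_to_spans(tags)
```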

### Disambiguation modeling

#### Candidate generation
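- Build an index that maps terms (surface forms) to knowledge base entities.
- The index should include every term that could plausibly refer to an entity (full names, abbreviations, nicknames), since an entity missing from the candidate set can never be linked.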
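A minimal sketch of such an index as an in-memory dictionary from normalized term to the set of entities it may refer to. The normalization (case folding and whitespace collapsing), the `KB:...` identifiers, and the alias lists are all hypothetical choices for illustration; a production system would populate aliases from the knowledge base itself.

```python
from collections import defaultdict
from typing import Dict, List, Set


def normalize(term: str) -> str:
    # Case-fold and collapse whitespace so "UC  berkeley" matches "UC Berkeley".
    return " ".join(term.lower().split())


def build_alias_index(aliases: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    # Invert entity -> aliases into alias -> set of candidate entities.
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity_id, names in aliases.items():
        for name in names:
            index[normalize(name)].add(entity_id)
    return dict(index)


def candidates(index: Dict[str, Set[str]], mention: str) -> Set[str]:
    # Unknown terms produce no candidates.
    return index.get(normalize(mention), set())


# Hypothetical knowledge base IDs and aliases.
kb_aliases = {
    "KB:Michael_I._Jordan": ["Michael Jordan", "Michael I. Jordan"],
    "KB:Michael_Jordan_(basketball)": ["Michael Jordan", "MJ"],
    "KB:University_of_California,_Berkeley": ["UC Berkeley", "Berkeley"],
    "KB:Berkeley,_California": ["Berkeley"],
}
index = build_alias_index(kb_aliases)
```

An ambiguous term such as "Michael Jordan" returns both candidates; choosing between them is the job of the linking model.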

#### Linking
- Build a model that outputs the probability that a candidate is the true match for a recognized entity mention.
- The model's inputs (the mention with its surrounding context, and the candidate entity) should be represented by BERT/ELMo embeddings.
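A rough sketch of the scoring step, assuming the mention-in-context and each candidate have already been embedded (the short vectors below are made-up stand-ins for BERT/ELMo embeddings). Cosine similarity followed by a softmax over the candidates is used here as a simple stand-in for a trained model; a learned classifier would replace the cosine score, and the `temperature` value is an arbitrary assumption.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link_probabilities(
    mention_vec: List[float],
    candidate_vecs: Dict[str, List[float]],
    temperature: float = 0.1,
) -> Dict[str, float]:
    # Score every candidate, then normalize the scores into probabilities.
    if not candidate_vecs:
        return {}
    scores = {c: cosine(mention_vec, v) / temperature for c, v in candidate_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Made-up embeddings: the mention's context is about machine learning.
mention_vec = [0.9, 0.1, 0.3]
candidate_vecs = {
    "KB:Michael_I._Jordan": [1.0, 0.0, 0.2],
    "KB:Michael_Jordan_(basketball)": [0.0, 1.0, 0.1],
}
probs = link_probabilities(mention_vec, candidate_vecs)
```

The softmax makes the candidates compete for a single link; scoring each candidate independently with a sigmoid would instead allow "none of the above" when every probability is low.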

## Training data generation

### Open datasets
- Named entity recognition
    - CoNLL-2003
- Disambiguation
    - AIDA CoNLL-YAGO Dataset