# Week 6

# ― NER in action with Flair

# Why Flair?

There are several arguments:

+ It builds on PyTorch, a mature and widely used deep-learning framework
+ It's flexible: users can choose among several models
+ It draws upon novel, context-aware models (BERT & friends)
+ It ships with its own embeddings (Flair embeddings)
+ Compared with competing libraries, it's very accurate

<img src='images/_22.png' width=50%>

# Flair's competitors

<img src='images/_21.png' width=50%>

Source: https://towardsdatascience.com/benchmark-ner-algorithm-d4ab01b2d4c3

# Prodigy, the annotation tool from the makers of spaCy

<img src='images/_23.png' width=50%>

Source: https://prodi.gy/

# Installation of Flair

Flair depends on a fair number of complex libraries. Here are some tips for
getting through the installation successfully:

+ Create a dedicated environment
+ Prefer Python 3.6 or 3.7
+ Install PyTorch (`torch`) first
+ Run `pip install flair` (even if you are a Conda user)
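
Before installing, it can help to confirm which interpreter and packages the new environment actually sees. The following is a minimal sketch that uses only the standard library; the helper name `check_environment` is made up for illustration and is not part of Flair or PyTorch.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "flair")):
    """Report the Python version and whether each package can be found."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

If `torch` shows up as `False`, install PyTorch before running `pip install flair`.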

# Load Flair + some modules

In [57]:
import flair.datasets
from flair.models import SequenceTagger
from flair.embeddings import WordEmbeddings
from flair.embeddings import FlairEmbeddings

In [58]:
tagger = SequenceTagger.load('ner')
framing = SequenceTagger.load('frame')
glove_embedding = WordEmbeddings('glove')
flair_embedding_forward = FlairEmbeddings('news-forward')
corpus = flair.datasets.UD_ENGLISH()

2020-06-24 13:46:56,798 loading file /home/simone/.flair/models/en-ner-conll03-v0.4.pt
2020-06-24 13:46:57,852 loading file /home/simone/.flair/models/en-frame-ontonotes-v0.4.pt
2020-06-24 13:47:00,731 Reading data from /home/simone/.flair/datasets/ud_english
2020-06-24 13:47:00,731 Train: /home/simone/.flair/datasets/ud_english/en_ewt-ud-train.conllu
2020-06-24 13:47:00,732 Dev: /home/simone/.flair/datasets/ud_english/en_ewt-ud-dev.conllu
2020-06-24 13:47:00,733 Test: /home/simone/.flair/datasets/ud_english/en_ewt-ud-test.conllu


# Sentences are the fundamental unit of analysis in Flair

In [59]:
from flair.data import Sentence

In [60]:
# sample sentence (by S. Johnson)
document = 'when a man is tired of London, he is tired of life'

In [61]:
# make a Flair sentence
sentence = Sentence(document)

In [62]:
# print sentence
print(sentence)

Sentence: "when a man is tired of London, he is tired of life"   [− Tokens: 12]


In [63]:
# ...OR run NER over sentence
tagger.predict(sentence)

[Sentence: "when a man is tired of London, he is tired of life"   [− Tokens: 12  − Token-Labels: "when a man is tired of London, <S-LOC> he is tired of life"]]

# Tokenizer

In [64]:
# tokenize your sentence
sentence = Sentence(document, use_tokenizer=True)

In [65]:
# inspect tokens
for token in sentence:
    print(token)

Token: 1 when
Token: 2 a
Token: 3 man
Token: 4 is
Token: 5 tired
Token: 6 of
Token: 7 London
Token: 8 ,
Token: 9 he
Token: 10 is
Token: 11 tired
Token: 12 of
Token: 13 life


Notes:

+ the segtok library is the default tokenizer of Flair
+ custom tokenizers can be used
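
To see why the token count changes from 12 to 13 once a tokenizer is used, compare a plain whitespace split with a punctuation-aware split. This is a rough sketch in pure Python: the regular expression is only a stand-in for illustration, not how segtok works internally, and both function names are hypothetical.

```python
import re

document = 'when a man is tired of London, he is tired of life'


def whitespace_tokenize(text):
    """Split on whitespace only: 'London,' stays a single token."""
    return text.split()


def punctuation_aware_tokenize(text):
    """Split into runs of word characters or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


print(len(whitespace_tokenize(document)))         # 12
print(len(punctuation_aware_tokenize(document)))  # 13
```

The extra token is the comma after "London", which the whitespace split leaves attached to the word.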

# Text annotation

In [69]:
# sample sentence
sentence = Sentence('Red wine is my favourite')

# get the first token of the sentence (index 0; Flair prints it as "Token: 1")
token = sentence[0]

# add label
token.add_tag('ner', 'color')

# get the 'ner' tag of the token
tag = token.get_tag('ner')

# print token
print(f'"{token}" is tagged as "{tag.value}" with confidence score "{tag.score}"')

"Token: 1 Red" is tagged as "color" with confidence score "1.0"


# Adding labels

In [70]:
sentence = Sentence('France is the current world cup winner.')

# this sentence has multiple "topic" labels
sentence.add_label('topic', 'sports')
sentence.add_label('topic', 'soccer')

# this sentence has a single "language" label
sentence.add_label('language', 'English')

print(sentence)

Sentence: "France is the current world cup winner."   [− Tokens: 7  − Sentence-Labels: {'topic': [sports (1.0), soccer (1.0)], 'language': [English (1.0)]}]


In [71]:
for label in sentence.labels:
    print(label)

sports (1.0)
soccer (1.0)
English (1.0)


# Tagging with pre-trained models

In [72]:
sentence = Sentence('George Washington went to Washington .')

# predict NER tags
tagger.predict(sentence)

# print sentence with predicted tags
print(sentence.to_tagged_string())

George <B-PER> Washington <E-PER> went to Washington <S-LOC> .


In [73]:
for entity in sentence.get_spans('ner'):
    print(entity)

Span [1,2]: "George Washington"   [− Labels: PER (0.9968)]
Span [5]: "Washington"   [− Labels: LOC (0.9994)]


In [74]:
from pprint import pprint
pprint(sentence.to_dict(tag_type='ner'))

{'entities': [{'end_pos': 17,
               'labels': [PER (0.9968)],
               'start_pos': 0,
               'text': 'George Washington'},
              {'end_pos': 36,
               'labels': [LOC (0.9994)],
               'start_pos': 26,
               'text': 'Washington'}],
 'labels': [],
 'text': 'George Washington went to Washington .'}
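
The tags in the tagged string follow a BIOES-style scheme (`B-` begins an entity, `E-` ends it, `S-` marks a single-token entity), and the spans group those tokens together. The sketch below shows how such tags could be turned into spans with character offsets. It is an illustration in pure Python, not Flair's implementation; `bioes_to_spans` is a hypothetical helper, and it assumes tokens are separated by single spaces.

```python
def bioes_to_spans(tokens, tags):
    """Group BIOES-tagged tokens into entity spans with character offsets.

    Assumes the tokens are joined by single spaces, as in the example.
    """
    spans = []
    current = None  # span under construction, if any
    offset = 0
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        offset = end + 1  # skip the separating space
        prefix, _, label = tag.partition("-")
        if prefix in ("B", "S"):
            current = {"text": token, "label": label,
                       "start_pos": start, "end_pos": end}
        elif prefix in ("I", "E") and current and current["label"] == label:
            current["text"] += " " + token
            current["end_pos"] = end
        else:
            current = None  # an "O" tag or an inconsistent sequence
        if current and prefix in ("S", "E"):
            spans.append(current)
            current = None
    return spans


tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "E-PER", "O", "O", "S-LOC", "O"]
for span in bioes_to_spans(tokens, tags):
    print(span)
```

For this sentence the offsets agree with the `start_pos` and `end_pos` values in the dictionary above.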


# Word sense disambiguation

In [75]:
# make two English sentences
sentence_1 = Sentence('George returned to Berlin to return his hat .')
sentence_2 = Sentence('He had a look at different hats .')

# predict semantic frames (verb senses) with the 'frame' model
framing.predict(sentence_1)
framing.predict(sentence_2)

# print sentences with the predicted frame tags
pprint(sentence_1.to_tagged_string())
pprint(sentence_2.to_tagged_string())

('George <_> returned <return.01> to <_> Berlin <_> to <_> return <return.02> '
 'his <_> hat <_> . <_>')
'He <_> had <have.03> a <_> look <look.01> at <_> different <_> hats <_> . <_>'
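
Most tokens carry the placeholder tag `<_>`, so it is often handy to keep only the tokens that received a frame. The sketch below parses a tagged string like the one printed above. It assumes every token is followed by exactly one `<tag>`, and `parse_tagged_string` is a hypothetical helper, not a Flair function.

```python
import re


def parse_tagged_string(tagged):
    """Split 'token <tag> token <tag> ...' into (token, tag) pairs."""
    return re.findall(r"(\S+) <([^>]+)>", tagged)


tagged = ("George <_> returned <return.01> to <_> Berlin <_> to <_> "
          "return <return.02> his <_> hat <_> . <_>")

pairs = parse_tagged_string(tagged)
# keep only the tokens that were assigned a frame
frames = [(token, tag) for token, tag in pairs if tag != "_"]
print(frames)  # [('returned', 'return.01'), ('return', 'return.02')]
```

The two senses of "return" (coming back vs. giving back) end up with different frame identifiers.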


# Embeddings

## Glove embeddings

In [77]:
# sample sentence
sentence = Sentence('Red wine is my favourite')

# embed a sentence using glove.
glove_embedding.embed(sentence)

# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)

Token: 1 Red
tensor([-0.3002,  0.5015, -0.1275, -0.8164,  0.3361,  0.3221, -0.0474,  0.0371,
        -0.6158, -0.2233, -0.3913, -0.3189,  0.8709,  0.7445,  0.2371,  0.3177,
         0.6132, -0.4816,  0.5545, -0.4877, -0.1187,  0.1520, -0.4388,  0.0452,
         0.6666,  0.6442, -0.2181, -0.2422,  0.1765, -0.7179,  0.4889,  0.2287,
         0.0800,  0.1224,  0.1864,  0.2052, -0.3514,  0.8317,  0.8658,  0.3340,
         0.4451, -0.9813, -0.1045, -0.1020,  0.6549,  0.1068, -0.0953,  0.5637,
         0.0488, -0.1084,  0.1054,  0.0412, -0.2939,  1.0227, -0.8657, -2.5878,
        -0.5008,  0.9758,  1.5560,  0.4521, -0.5428,  0.8199, -0.6083,  0.1992,
         0.7497, -0.3914,  0.0605, -0.0569, -0.0121,  0.0621,  0.0706, -0.4798,
        -0.8661, -0.5934,  0.5765,  0.9837, -0.0351,  0.4203, -0.4059,  0.3510,
         0.8739, -0.0694, -0.6869,  0.1860, -0.3690, -0.0218, -0.1014, -0.0376,
         0.5682,  0.7438, -0.2871, -1.0705, -0.5070, -0.1258, -0.9040, -0.2559,
        -1.3706,  0.1731,  

## Flair embeddings

Contextual string embeddings are powerful embeddings that capture latent syntactic-semantic information that goes beyond standard word embeddings.

They differ from standard word embeddings in two key ways:

1. they are trained without any explicit notion of words and thus fundamentally model words as sequences of characters
2. they are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use.
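
A toy example may make the second point concrete. The sketch below reads a text character by character and keeps a running state; a word is then represented by the state reached after its last character. The "state" here is just a running letter count chosen for readability. It is an assumption for illustration only and carries none of the linguistic information a trained character language model would learn; both function names are hypothetical.

```python
import string


def forward_char_states(text):
    """Toy forward pass: after each character, the state is the running
    count of every letter seen so far (a stand-in for a hidden state)."""
    state = [0] * 26
    states = []
    for char in text.lower():
        if char in string.ascii_lowercase:
            state[ord(char) - ord("a")] += 1
        states.append(tuple(state))
    return states


def contextual_embeddings(text):
    """Embed each whitespace-separated word as the state reached after
    its last character, so the vector depends on the preceding text."""
    states = forward_char_states(text)
    embeddings, position = [], 0
    for word in text.split():
        start = text.index(word, position)
        end = start + len(word)
        embeddings.append((word, states[end - 1]))
        position = end
    return embeddings


embedded = contextual_embeddings("George Washington went to Washington")
first, second = embedded[1][1], embedded[4][1]
# same word, different preceding context, different vectors
print(first == second)  # False
```

A static lookup table would return the same vector for both occurrences of "Washington"; here the two vectors differ because the text read before each occurrence differs.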

In [78]:
# init embedding
flair_embedding_forward = FlairEmbeddings('news-forward')

# create a sentence
sentence = Sentence('Red wine is my favourite')

# embed words in sentence
flair_embedding_forward.embed(sentence)

[Sentence: "Red wine is my favourite"   [− Tokens: 5]]

In [79]:
# now check out the embedded tokens.
for token in sentence:
    print(token)
    print(token.embedding)

Token: 1 Red
tensor([-0.0002,  0.0018, -0.0321,  ...,  0.0012,  0.0179,  0.0066])
Token: 2 wine
tensor([-0.0008,  0.0027,  0.0160,  ..., -0.0028, -0.0090,  0.0397])
Token: 3 is
tensor([ 0.0008, -0.0015,  0.0603,  ..., -0.0050,  0.0121,  0.0107])
Token: 4 my
tensor([ 8.9246e-05,  8.9543e-05,  2.1088e-02,  ..., -6.4441e-04,
         4.2823e-02,  1.8494e-03])
Token: 5 favourite
tensor([-9.1581e-04, -4.5748e-05,  1.9024e-02,  ..., -3.2326e-04,
        -1.3051e-04,  2.0492e-02])


The following word embeddings are currently supported: 

| Class | Type | Paper | 
| ------------- | -------------  | -------------  | 
| [`BytePairEmbeddings`](/resources/docs/embeddings/BYTE_PAIR_EMBEDDINGS.md) | Subword-level word embeddings | [Heinzerling and Strube (2018)](https://www.aclweb.org/anthology/L18-1473)  |
| [`CharacterEmbeddings`](/resources/docs/embeddings/CHARACTER_EMBEDDINGS.md) | Task-trained character-level embeddings of words | [Lample et al. (2016)](https://www.aclweb.org/anthology/N16-1030) |
| [`ELMoEmbeddings`](/resources/docs/embeddings/ELMO_EMBEDDINGS.md) | Contextualized word-level embeddings | [Peters et al. (2018)](https://aclweb.org/anthology/N18-1202)  |
| [`FastTextEmbeddings`](/resources/docs/embeddings/FASTTEXT_EMBEDDINGS.md) | Word embeddings with subword features | [Bojanowski et al. (2017)](https://aclweb.org/anthology/Q17-1010)  |
| [`FlairEmbeddings`](/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) | Contextualized character-level embeddings | [Akbik et al. (2018)](https://www.aclweb.org/anthology/C18-1139/)  |
| [`OneHotEmbeddings`](/resources/docs/embeddings/ONE_HOT_EMBEDDINGS.md) | Standard one-hot embeddings of text or tags | - |
| [`PooledFlairEmbeddings`](/resources/docs/embeddings/FLAIR_EMBEDDINGS.md) | Pooled variant of `FlairEmbeddings` |  [Akbik et al. (2019)](https://www.aclweb.org/anthology/N19-1078/)  | 
| [`TransformerWordEmbeddings`](/resources/docs/embeddings/TRANSFORMER_EMBEDDINGS.md) | Embeddings from pretrained [transformers](https://huggingface.co/transformers/pretrained_models.html) (BERT, XLM, GPT, RoBERTa, XLNet, DistilBERT etc.) | [Devlin et al. (2018)](https://www.aclweb.org/anthology/N19-1423/) [Radford et al. (2018)](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)  [Liu et al. (2019)](https://arxiv.org/abs/1907.11692) [Dai et al. (2019)](https://arxiv.org/abs/1901.02860) [Yang et al. (2019)](https://arxiv.org/abs/1906.08237) [Lample and Conneau (2019)](https://arxiv.org/abs/1901.07291) |  
| [`WordEmbeddings`](/resources/docs/embeddings/CLASSIC_WORD_EMBEDDINGS.md) | Classic word embeddings | - |

# Training a model

## Scenario A: using annotated (prepared) datasets

<img src='images/_24.png' width=50%>

## Scenario B: using your own annotated data

```console
George N B-PER
Washington N I-PER
went V O
to P O
Washington N B-LOC

Sam N B-PER
Houston N I-PER
stayed V O
home N O
```
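
Each line holds one token with its annotations in whitespace-separated columns, and a blank line separates sentences. To make the format concrete, here is a small pure-Python sketch that parses it. `read_column_format` is a hypothetical helper written for illustration; it is not how Flair reads the files internally.

```python
def read_column_format(text, columns):
    """Parse blank-line-separated sentences of whitespace-separated columns.

    `columns` maps a column index to a name, e.g. {0: 'text', 2: 'ner'}.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append({name: fields[index] for index, name in columns.items()})
    if current:
        sentences.append(current)
    return sentences


sample = """George N B-PER
Washington N I-PER
went V O
to P O
Washington N B-LOC

Sam N B-PER
Houston N I-PER
stayed V O
home N O
"""

columns = {0: 'text', 1: 'pos', 2: 'ner'}
sentences = read_column_format(sample, columns)
print(len(sentences))   # 2
print(sentences[0][0])  # {'text': 'George', 'pos': 'N', 'ner': 'B-PER'}
```

The `columns` mapping plays the same role as the one passed to `ColumnCorpus`: it tells the reader which column holds the text and which holds each annotation layer.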

In [56]:
from flair.data import Corpus
from flair.datasets import ColumnCorpus

# define columns
columns = {0: 'text', 1: 'pos', 2: 'ner'}

# this is the folder in which train, test and dev files reside
data_folder = '/path/to/data/folder'

# init a corpus using column format, data folder and the names of the train, dev and test files
# (uncomment once data_folder points to real data)
# corpus: Corpus = ColumnCorpus(data_folder, columns,
#                               train_file='train.txt',
#                               test_file='test.txt',
#                               dev_file='dev.txt')

## Let's get back to Scenario A.

In [53]:
from flair.data import Corpus
from flair.datasets import UD_ENGLISH
from flair.embeddings import (CharacterEmbeddings, FlairEmbeddings,
                              StackedEmbeddings, WordEmbeddings)

# 1. get the corpus
corpus: Corpus = UD_ENGLISH().downsample(0.1)
print(corpus)

# 2. what tag do we want to predict?
tag_type = 'pos'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
print(tag_dictionary)

# 4. initialize embeddings
embedding_types = [

    WordEmbeddings('glove'),

    # uncomment this line to use character embeddings
    # CharacterEmbeddings(),

    # uncomment these lines to use Flair embeddings
    # FlairEmbeddings('news-forward'),
    # FlairEmbeddings('news-backward'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# 5. initialize sequence tagger
from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)

# 6. initialize trainer
from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# 7. start training
trainer.train('resources/taggers/example-pos',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)

2020-06-24 11:34:59,511 Reading data from /home/simone/.flair/datasets/ud_english
2020-06-24 11:34:59,512 Train: /home/simone/.flair/datasets/ud_english/en_ewt-ud-train.conllu
2020-06-24 11:34:59,512 Dev: /home/simone/.flair/datasets/ud_english/en_ewt-ud-dev.conllu
2020-06-24 11:34:59,513 Test: /home/simone/.flair/datasets/ud_english/en_ewt-ud-test.conllu
Corpus: 1254 train + 200 dev + 208 test sentences
Dictionary with 53 tags: <unk>, O, PRP, RB, VBP, IN, VB, VBN, JJR, NNP, ., DT, JJ, NN, ,, VBD, NNS, CC, VBG, MD, EX, CD, PRP$, WP, POS, VBZ, TO, WRB, JJS, UH
2020-06-24 11:35:11,812 ----------------------------------------------------------------------------------------------------
2020-06-24 11:35:11,813 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): WordEmbeddings('glove')
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=100, out_features=100, bias=True)
  (rnn): LSTM(100, 256, 

{'test_score': 0.834558093346574,
 'dev_score_history': [0.26642501132759405,
  0.4461538461538461,
  0.539768864717879,
  0.6363636363636364,
  0.6601853945286005,
  0.6957110609480812,
  0.7068654019873533,
  0.7168960072185879,
  0.7465583389754006,
  0.753780185059806,
  0.7657555906934719,
  0.778505305938135,
  0.780388612742883,
  0.7717317678934297,
  0.7920523820275457,
  0.7978314885927265,
  0.7847300655071154,
  0.8006314839873703,
  0.8094808126410835,
  0.8121896162528216,
  0.8159855497855046,
  0.8220415537488708,
  0.8189616252821671,
  0.8131917777275808],
 'train_loss_history': [53.158365631103514,
  40.7812376499176,
  33.89535856246948,
  28.774631690979003,
  25.575098514556885,
  23.40834789276123,
  21.724063682556153,
  20.738570284843444,
  19.54983322620392,
  19.030347990989686,
  18.23484809398651,
  17.77282521724701,
  16.75617399215698,
  16.35137629508972,
  16.097274255752563,
  15.75730516910553,
  15.434643626213074,
  15.073633074760437,
  14.601285