# Neural NER with flair
In this hands-on session, we use [flair](https://github.com/zalandoresearch/flair), a state-of-the-art framework for NLP sequence labeling tasks such as named entity recognition (NER).
The repository contains several tutorials that show how to train models on your own data.
However, training requires GPUs and several hours of compute, which is not feasible on this machine, so here we only apply models that have already been trained.


## Using a standard flair model for English
You can experiment with a few sentences of your own. The model is trained on the English CoNLL-2003 data (entity types PER, LOC, ORG and MISC). For details, see the [documentation](https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_2_TAGGING.md#list-of-pre-trained-sequence-tagger-models).

In [None]:
from flair.data import Sentence
from flair.models import SequenceTagger

# make a sentence
sentence = Sentence('I love Utrecht.', use_tokenizer=True)

# load the NER tagger
tagger = SequenceTagger.load('ner')

In [None]:
# run NER over sentence
tagger.predict(sentence)

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)

Show the full prediction as a dictionary, including each recognized entity's label and confidence score:

In [None]:
print(sentence.to_dict(tag_type='ner'))

### Contextualized embeddings at work...
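Playing with ambiguous words: the same string "Washington" can refer to a person or to a place. Because flair's embeddings are contextualized, the tagger can in principle assign different labels to the two occurrences.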

In [None]:
sentence = Sentence('Washington went to Washington .')
tagger.predict(sentence)
print(sentence.to_dict(tag_type='ner'))

## Applying a purely character-based NER model trained on the French QUAERO corpus
We trained a purely character-based NER model on the QUAERO corpus.
The underlying character language model was trained on 19th-century Swiss newspaper texts and on French Wikipedia.
Download the model (250 MB) and a few scripts for running and testing it:

In [None]:
! git clone https://gitlab.ifi.uzh.ch/siclemat/dh2019-ner-tutorial-flair-quaero-material.git ~/flair-quaero

In [None]:
%cd ~/flair-quaero

Let's look at some real newspaper data from the 19th century:

In [None]:
!head -n 20 data.d/test_short.txt

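Tag a verticalized file (one token per line) that contains reference annotations, so that the output can be evaluated.
The output has one token per line, with four space-separated columns: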
 1. Token
 2. Gold NER IOB tag
 3. Computed NER IOB tag
 4. Probability/confidence of computed IOB tag

In [None]:
! python lib/flair_ner_tagger.py \
  --model resources.d/taggers/ner/pressfr-wikifr/raw-stringemb-crf/best-model.pt \
  data.d/test_short.txt
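The four-column output is convenient for quick error analysis. The following is a minimal sketch, assuming each token line holds exactly the four space-separated columns listed above (token, gold tag, computed tag, confidence) and that other lines, such as blank sentence separators, can be ignored; `find_errors` is a hypothetical helper, not part of the tutorial scripts. It lists the tokens whose computed tag differs from the gold tag, most confident mistakes first.

```python
from typing import List, Tuple


def find_errors(lines: List[str]) -> List[Tuple[str, str, str, float]]:
    """Return (token, gold, computed, confidence) for every mistagged token.

    Assumes 4 space-separated columns per token line; any other line
    (e.g. a blank sentence separator) is skipped.
    """
    errors = []
    for line in lines:
        cols = line.split()
        if len(cols) != 4:
            continue  # not a token line in the assumed format
        token, gold, computed, confidence = cols
        if gold != computed:
            errors.append((token, gold, computed, float(confidence)))
    # Confident mistakes first: they often point to systematic problems.
    return sorted(errors, key=lambda e: e[3], reverse=True)


# Illustrative lines in the assumed format (made up, not real tagger output).
sample = [
    "Genève B-LOC B-LOC 0.99",
    "et O O 0.98",
    "Berne B-LOC B-PER 0.71",
    "",
    "Druey B-PER O 0.55",
]
for row in find_errors(sample):
    print(row)
```

In the notebook, the tagger's output can be captured as a list of lines with IPython's shell-capture syntax, e.g. `lines = !python lib/flair_ner_tagger.py --model resources.d/taggers/ner/pressfr-wikifr/raw-stringemb-crf/best-model.pt data.d/test_short.txt`, and then passed to `find_errors`; lines that do not have four columns are skipped.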

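Keep only the first three columns (token, gold tag, computed tag), which is the input format expected by the CoNLL evaluation script `conlleval.pl`, and save them to a file.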

In [None]:
! python lib/flair_ner_tagger.py \
  --model resources.d/taggers/ner/pressfr-wikifr/raw-stringemb-crf/best-model.pt \
  data.d/test_short.txt | cut -d " " -f 1,2,3 > test_short_ner_tagged.txt

In [None]:
! head test_short_ner_tagged.txt

In [None]:
! perl lib/conlleval.pl < test_short_ner_tagged.txt
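`conlleval.pl` scores entities (chunks), not individual tokens: a predicted entity counts as correct only if its type and both of its boundaries match a gold entity exactly. Precision is the share of predicted entities that are correct, recall is the share of gold entities that were found, and F1 is their harmonic mean. The sketch below re-implements this idea in simplified form for IOB tags, assuming the three-column file written above; `extract_chunks` and `evaluate` are hypothetical helpers for illustration, not a replacement for `conlleval.pl`, which supports more tagging schemes and also reports token accuracy and per-type scores.

```python
from typing import List, Set, Tuple

Chunk = Tuple[int, int, str]  # (start, end exclusive, entity type)


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Decode IOB tags into entity chunks.

    A chunk starts at B-X, or at I-X when the previous tag is O or has a
    different type; it continues over following I-X tags of the same type.
    """
    chunks = set()
    start, ctype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final chunk
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == ctype
        if ctype is not None and not continues:
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if prefix in ("B", "I") and not continues:
            start, ctype = i, etype
    return chunks


def evaluate(lines: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 for 'token gold computed' lines."""
    gold_tags, pred_tags = [], []
    for line in lines:
        cols = line.split()
        if len(cols) < 3:
            # Blank line = sentence boundary: treat as O so no chunk spans it.
            gold_tags.append("O")
            pred_tags.append("O")
            continue
        gold_tags.append(cols[1])
        pred_tags.append(cols[2])
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


sample = ["Genève B-LOC B-LOC", "et O O", "Vaud B-LOC O", "", "M. O B-PER", "Druey B-PER I-PER"]
p, r, f = evaluate(sample)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# precision=0.50 recall=0.33 F1=0.40
```

In the sample, "M. Druey" is predicted as one PER entity while the gold entity is "Druey" alone, so the boundary mismatch counts as both a false positive and a false negative. To score the real file, pass `open("test_short_ner_tagged.txt", encoding="utf-8").read().splitlines()` to `evaluate`; for plain IOB input the overall figures should usually agree with conlleval's, which it prints as percentages.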

### Possible hands-on
Modify the input file `data.d/test_short.txt` in the Jupyter text editor (e.g. add or remove OCR noise), rerun the tagging and evaluation cells above, and observe how the tags and scores change.
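Noise can also be injected programmatically, which makes such experiments repeatable. The sketch below assumes that `data.d/test_short.txt` has one token per line, with space-separated columns and the token in the first column (check this against the `head` output above); the confusion table and the helper names `add_ocr_noise` and `write_noisy_copy` are hypothetical and only mimic typical OCR errors. Only the token column is altered, so the gold tags stay aligned with their tokens.

```python
import random
from typing import Dict, List

# Hypothetical character confusions of the kind OCR often produces
# (illustrative only, not measured on the QUAERO or Swiss newspaper data).
CONFUSIONS: Dict[str, str] = {"e": "c", "l": "1", "o": "0", "s": "f", "u": "n", "m": "rn"}


def add_ocr_noise(lines: List[str], rate: float = 0.1, seed: int = 0) -> List[str]:
    """Replace characters of the first (token) column with OCR-like confusions.

    Each confusable character is replaced with probability `rate`; the other
    columns (e.g. the gold tag) and blank lines are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible noise
    noisy = []
    for line in lines:
        if not line.strip():
            noisy.append(line)
            continue
        token, *rest = line.split(" ")
        token = "".join(
            CONFUSIONS[ch] if ch in CONFUSIONS and rng.random() < rate else ch
            for ch in token
        )
        noisy.append(" ".join([token, *rest]))
    return noisy


def write_noisy_copy(src: str, dst: str, rate: float = 0.1) -> None:
    """Write a noisy copy of a verticalized file; the original is left untouched."""
    with open(src, encoding="utf-8") as f:
        lines = f.read().splitlines()
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(add_ocr_noise(lines, rate)) + "\n")


print(add_ocr_noise(["Lausanne B-LOC", "", "le O"], rate=1.0))
# ['Lanfannc B-LOC', '', '1c O']
```

For example, `write_noisy_copy("data.d/test_short.txt", "data.d/test_short_noisy.txt", rate=0.2)` would create a noisy copy that can be passed to `lib/flair_ner_tagger.py` in place of the original; varying `rate` gives a rough picture of how quickly the scores degrade.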


## Using the French Quaero model on historical Swiss newspaper texts
Let's test the model trained on French newspapers on some historical Swiss newspaper texts.

In [None]:
%cd ~/flair-quaero
french_tagger = SequenceTagger.load('resources.d/taggers/ner/pressfr-wikifr/raw-stringemb-crf/best-model.pt')

In [None]:
! cat ~/datasets/impresso/raw/GDL-1848-07-11-a-i0001.txt

In [None]:
sentence = Sentence('CONFÉDÉRATION SUISSE. DIÈTE FÉDÉRALE .. Séance du 6 juillet.', use_tokenizer=True)

In [None]:
french_tagger.predict(sentence)

In [None]:
print(sentence.to_dict(tag_type='ner'))
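The cells above tag a single sentence copied by hand from the article. To tag a whole article, the raw text first has to be cut into sentence-sized pieces. Below is a deliberately naive sketch, assuming the raw file is plain text in which words may be hyphenated across line breaks; `split_sentences` is a hypothetical helper, and its regular expressions will mis-split after abbreviations such as "M." and may wrongly join genuine hyphenated compounds that happen to fall at a line end.

```python
import re
from typing import List


def split_sentences(raw: str) -> List[str]:
    """Naively split raw newspaper text into sentence strings."""
    # Re-join words hyphenated across line breaks: "juil-\nlet" -> "juillet".
    text = re.sub(r"(\w)-\n\s*(\w)", r"\1\2", raw)
    # Collapse remaining line breaks and runs of spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Split after ".", "!" or "?" when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


raw = "CONFÉDÉRATION SUISSE. DIÈTE FÉDÉRALE ..\nSéance du 6 juil-\nlet."
for s in split_sentences(raw):
    print(s)
```

With flair, each piece could then be wrapped as `Sentence(s, use_tokenizer=True)` and the resulting list passed to `french_tagger.predict(sentences)`, since `predict` accepts a list of sentences as well as a single one. Reading the article file with `encoding="utf-8"` is an assumption about how it is stored.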

## Next steps
Work through more of the tutorial at https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_1_BASICS.md