### Tutorial: using the fine-tuned est-roberta NER model

This tutorial shows how to use the fine-tuned est-roberta NER model with EstNLTK.

### Prerequisites

To run this code, you'll need:

* Python 3.7+
* [estnltk v1.6.9+](https://github.com/estnltk/estnltk)
* [pytorch v1.7.0+](https://pytorch.org/)
* [transformers v4.0.0+](https://huggingface.co/transformers)
* Python modules `data_preprocessing` and `bert_ner_tagger` from [this repository](https://github.com/soras/vk_ner_lrec_2022)


In [1]:
import os
from estnltk import Text

# Load preprocessing utils
from data_preprocessing import TokenizationPreprocessorFixed
preprocessor = TokenizationPreprocessorFixed()

There are two ways to initialize `BertNERTagger`:

In [2]:
# Option 1: load model from the local directory
bert_models_dir = 'bert_models'
bertner_model_location = os.path.join( bert_models_dir, 'model_est-roberta_10_bs16_lr5e-05_ep8' )
assert os.path.exists(bertner_model_location)

# Load BertNERTagger with the model
from bert_ner_tagger import BertNERTagger
bert_ner_tagger = BertNERTagger(bert_tokenizer_location=bertner_model_location, 
                                bert_ner_location=bertner_model_location, output_layer='bert_ner',
                                token_level=False, do_lower_case=False, use_fast=False)

In [3]:
# Option 2: download the model from the Hugging Face Hub
from bert_ner_tagger import BertNERTagger
hf_model_id = 'tartuNLP/est-roberta-hist-ner'
bert_ner_tagger = BertNERTagger(bert_tokenizer_location=hf_model_id, 
                                bert_ner_location=hf_model_id, output_layer='bert_ner',
                                token_level=False, do_lower_case=False, use_fast=False)
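
The two options can also be combined into a fallback: use the local directory when it exists, and the Hugging Face model id otherwise. The sketch below is only an illustration. `resolve_model_location` is a hypothetical helper, not part of `bert_ner_tagger`; it merely picks the string that would then be passed as both `bert_tokenizer_location` and `bert_ner_location`.

```python
import os

def resolve_model_location(local_dir, hf_model_id):
    """Return local_dir if it is an existing directory, else hf_model_id.

    Hypothetical helper for illustration; not part of bert_ner_tagger.
    """
    return local_dir if os.path.isdir(local_dir) else hf_model_id

# Directory name and model id as used in the two options above
local_dir = os.path.join('bert_models', 'model_est-roberta_10_bs16_lr5e-05_ep8')
model_location = resolve_model_location(local_dir, 'tartuNLP/est-roberta-hist-ner')
print(model_location)
```

With this approach the `assert os.path.exists(...)` check from Option 1 is no longer needed, since a missing directory simply selects the remote model.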

Now we can use the tagger:

In [4]:
# Create analysable text
text = Text('Kaewas Aru Ropka rentnik G. Sarw, et tema wana teender Peter Mitt teda see sügis warastanu. Kaebusalune Peter Mitt ette kutsutu wastas, et tema Aru Ropka mõisas oma teenistuse ajal kiige wähemat warastanu ei olla.')

# Preprocess text and apply NER
preprocessor.preprocess(text)
bert_ner_tagger.tag(text)

# Browse resulting layer
text.bert_ner

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| bert_ner | bert_tokens, nertag | | | True | 7 |

| text | bert_tokens | nertag |
|---|---|---|
| Aru | ['▁Aru'] | LOC |
| Ropka | ['▁Ro', 'p', 'ka'] | LOC_ORG |
| G. Sarw | ['▁G', '.', '▁Sar', 'w'] | PER |
| Peter Mitt | ['▁Peter', '▁Mit', 't'] | PER |
| Peter Mitt | ['▁Peter', '▁Mit', 't'] | PER |
| Aru | ['▁Aru'] | LOC_ORG |
| Ropka mõisas | ['▁Ro', 'p', 'ka', '▁mõisas'] | LOC_ORG |
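
For a quick overview of the output, the tags can be tallied. The sketch below copies the `text` and `nertag` columns from the output above into a plain Python list, so it runs without EstNLTK; the variable names are illustrative.

```python
from collections import Counter

# (text, nertag) pairs copied from the bert_ner layer shown above
entities = [
    ('Aru', 'LOC'),
    ('Ropka', 'LOC_ORG'),
    ('G. Sarw', 'PER'),
    ('Peter Mitt', 'PER'),
    ('Peter Mitt', 'PER'),
    ('Aru', 'LOC_ORG'),
    ('Ropka mõisas', 'LOC_ORG'),
]

# Count how many entities received each tag
tag_counts = Counter(tag for _, tag in entities)
for tag, count in tag_counts.most_common():
    print(tag, count)
```

Note that the same surface form can receive different tags: `Aru` is tagged `LOC` in the first sentence and `LOC_ORG` in the second.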


In [5]:
# Display entities in text
text.bert_ner.display()

In [6]:
# Exact locations of entities in text
text.bert_ner[['start', 'end', 'text', 'nertag']]

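| | start | end | text | nertag |
|---|---|---|---|---|
| 0 | 7 | 10 | Aru | LOC |
| 1 | 11 | 16 | Ropka | LOC_ORG |
| 2 | 25 | 32 | G. Sarw | PER |
| 3 | 55 | 65 | Peter Mitt | PER |
| 4 | 104 | 114 | Peter Mitt | PER |
| 5 | 144 | 147 | Aru | LOC_ORG |
| 6 | 148 | 160 | Ropka mõisas | LOC_ORG |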
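
The `start` and `end` values are character offsets into the original string, with `end` exclusive, so each entity can be recovered by slicing. The following sketch checks this in plain Python, using the input sentence and the offsets copied from the output above; `raw` and `rows` are illustrative names and EstNLTK is not required.

```python
# The input text from the tutorial, split across lines for readability
raw = ('Kaewas Aru Ropka rentnik G. Sarw, et tema wana teender Peter Mitt '
       'teda see sügis warastanu. Kaebusalune Peter Mitt ette kutsutu wastas, '
       'et tema Aru Ropka mõisas oma teenistuse ajal kiige wähemat warastanu '
       'ei olla.')

# (start, end, nertag) triples copied from the output above
rows = [
    (7, 10, 'LOC'),
    (11, 16, 'LOC_ORG'),
    (25, 32, 'PER'),
    (55, 65, 'PER'),
    (104, 114, 'PER'),
    (144, 147, 'LOC_ORG'),
    (148, 160, 'LOC_ORG'),
]

# Slice the raw text with each offset pair to recover the entity string
for start, end, tag in rows:
    print(f'{raw[start:end]!r:16} {tag}')
```

The same offsets can be used to highlight or replace entities in the source text without re-running the tagger.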
