# This Jupyter notebook tests how well the [NER model](https://huggingface.co/dbmdz/flair-historic-ner-onb) for historic German performs on our Travelogues corpus.
## Please note the comments in each cell.


First, install flair and the other packages needed to use this model.
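
Before running the import cell, it can help to see which packages are still missing. The following is only a minimal sketch: the helper name `missing_packages` is hypothetical, and the package list is taken from the imports used in this notebook (`correct_ocr` is assumed to be a local module on the Python path).

```python
from importlib.util import find_spec
from typing import List


def missing_packages(names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed or not on the path
    return [name for name in names if find_spec(name) is None]


# Modules imported by this notebook; anything printed here still needs to be installed
print(missing_packages(["flair", "spacy", "correct_ocr"]))
```

An empty list means all three modules can be imported.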

In [4]:
from flair.data import Sentence
from flair.models import SequenceTagger
import spacy
from spacy.tokens import Doc

from correct_ocr import single_characters, delete_specials, correct_s

The following cell is only needed if spacy.load fails, for example because the language model is not installed yet.

In [None]:
import spacy.cli
spacy.cli.download("de_core_news_md")
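
The download step can also be tied directly to a failed load. This is a sketch under assumptions: `load_with_fallback` is a hypothetical helper, and it relies on spacy.load raising `OSError` when the model is not installed. In some notebook environments a kernel restart may still be needed after the download.

```python
from typing import Any, Callable


def load_with_fallback(name: str,
                       load: Callable[[str], Any],
                       download: Callable[[str], Any]) -> Any:
    """Try to load a model; if it is missing, download it once and retry."""
    try:
        return load(name)
    except OSError:
        # The model was not found, so fetch it and try loading again
        download(name)
        return load(name)


# Possible usage in this notebook (not executed here):
# nlp = load_with_fallback("de_core_news_md", spacy.load, spacy.cli.download)
```

Passing the load and download functions as arguments keeps the helper independent of spaCy, so it can be tried out without the model being present.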

This part will try out the NER on sentences from the Travelogues texts.

In [None]:
# Load the German language model for the spaCy pipeline
nlp = spacy.load("de_core_news_md")

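# Read in an excerpt of the file (characters 100 to 2000); the text can be noisy OCR
# encoding='utf-8' is an assumption about how the file was saved
with open('../data/test/bossmann_gvinea_1708.txt', 'r', encoding='utf-8') as f:
    file: str = f.read()[100:2000]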

# Corrections as implemented by @Lisa Braune
file = correct_s(file)
file = single_characters(file)
file = delete_specials(file)


# Run the text through the spaCy pipeline and split it into sentences
doc: Doc = nlp(file)
sents: list = [sent.text for sent in doc.sents]

tagger: SequenceTagger = SequenceTagger.load("dbmdz/flair-historic-ner-onb")

for sent in sents:
    # Wrap each sentence string in a flair Sentence so the tagger can process it
    sentence = Sentence(sent)
    tagger.predict(sentence)
    print(sentence.to_tagged_string())
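
To get an overview instead of reading every tagged sentence, the recognised entities can be counted per label. The sketch below uses hypothetical example data and a hypothetical helper `count_labels`. With flair, the (text, label) pairs could probably be collected from `sentence.get_spans("ner")`, but the exact attribute names depend on the flair version.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_labels(entities: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each entity label occurs in (text, label) pairs."""
    return Counter(label for _text, label in entities)


# Hypothetical example data, not actual model output.
# With flair, the pairs might be built as:
# [(span.text, span.tag) for span in sentence.get_spans("ner")]
example = [("Guinea", "LOC"), ("Bossmann", "PER"), ("Holland", "LOC")]
print(count_labels(example))
```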


In [None]:
# Tag the whole excerpt as a single Sentence and print the result as a dictionary
sentence = Sentence(file)
tagger.predict(sentence)
print(sentence.to_dict())
