# This Jupyter notebook tests how well the [NER model](https://huggingface.co/dbmdz/flair-historic-ner-onb) for historic German performs on our Travelogues corpus.
## Please read the comments given in each cell.


First, install flair and the other packages needed to use this model.
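
Before running the import cell, it can help to check which packages are actually importable. The following is a minimal sketch, not part of the original notebook: the helper `missing_packages` is a hypothetical name, and the list only covers `flair` and `spacy`, since `correct_ocr` appears to be a local module of this project rather than an installable package.

```python
import importlib.util

# Packages imported by this notebook (correct_ocr is assumed to be a local module).
REQUIRED = ["flair", "spacy"]


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```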

In [4]:
from flair.data import Sentence
from flair.models import SequenceTagger
import spacy
from spacy.tokens import Doc

from correct_ocr import single_characters, delete_specials, correct_s

The following cell is only needed if spacy.load fails, for example because the de_core_news_md model has not been downloaded yet.

In [None]:
import spacy.cli
spacy.cli.download("de_core_news_md")

This part runs the NER tagger on sentences from one of the Travelogues texts.
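
Noisy OCR can contain page-number residue such as runs of four-digit tokens, which the sentence splitter still emits as sentences. As a possible pre-filter, here is a minimal sketch that keeps only sentences in which enough tokens contain letters. The helper names `alphabetic_ratio` and `is_taggable` and the threshold of 0.5 are assumptions for illustration, not something the notebook itself uses.

```python
def alphabetic_ratio(text: str) -> float:
    """Return the share of whitespace-separated tokens that contain a letter."""
    tokens = text.split()
    if not tokens:
        return 0.0
    with_letters = sum(1 for tok in tokens if any(ch.isalpha() for ch in tok))
    return with_letters / len(tokens)


def is_taggable(text: str, min_ratio: float = 0.5) -> bool:
    """Hypothetical filter: keep a sentence if at least min_ratio of its tokens contain letters."""
    return alphabetic_ratio(text) >= min_ratio


examples = [
    "0001 0002 0003 0004 0005 0006 Abbildung 0007",
    "Heyrathen und Begra bnissen auch allen hieselbst befindlichen Thieren",
    "Regiment",
]

# Only sentences that pass the filter would be handed to the tagger.
kept = [s for s in examples if is_taggable(s)]
print(kept)
```

The threshold is a judgment call: a lower value keeps more residue, a higher value risks dropping short title fragments that mix words and numbers.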

In [5]:
# Load the German language model for the spaCy pipeline
nlp = spacy.load("de_core_news_md")

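# Read in an excerpt of the file, which can be noisy OCR
# (the UTF-8 encoding is an assumption; adjust it if the file is stored differently)
with open('../data/test/bossmann_gvinea_1708.txt', 'r', encoding='utf-8') as f:
    file: str = f.read()[100:2000]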

# Corrections as implemented by @Lisa Braune
file = correct_s(file)
file = single_characters(file)
file = delete_specials(file)


# Run the text through the spaCy pipeline and split it into sentences
doc: Doc = nlp(file)
sents: list[str] = [sent.text for sent in doc.sents]

# Load the flair NER tagger for historic German
tagger: SequenceTagger = SequenceTagger.load("dbmdz/flair-historic-ner-onb")

for sent in sents:
    # Wrap each sentence string in a flair Sentence so the tagger can annotate it
    sentence = Sentence(sent)
    tagger.predict(sentence)
    print(sentence.to_tagged_string())




2022-12-01 08:37:25,705 loading file /Users/sarahreb/.flair/models/flair-historic-ner-onb/63111d37e8f19b08b01200ec38cd2b093d72026e56bbe99a7b25b6e3f8b7da8d.d53b1d9a206921442955a318ba5bbef2af5aabb93c4713d1ed3b8fe8c28cda3f
2022-12-01 08:37:29,722 SequenceTagger predicts: Dictionary with 16 tags: <unk>, O, S-PER, S-LOC, B-PER, E-PER, S-ORG, B-LOC, E-LOC, I-PER, B-ORG, E-ORG, I-LOC, I-ORG, <START>, <STOP>
Sentence: "0001 0002 0003 0004 0005 0006 Abbildung 0007"
Sentence: "Abbildung 0008 0009 Reyse nach GVINEA , oder ausfu hrliche Beschreibung dasiger GoldGruben ElephantenZa hn und SclavenHandels nebst derer Einwohner" → ["GVINEA"/LOC]
Sentence: "Sitten Religion"
Sentence: "Regiment"
Sentence: "Kriegen"
Sentence: "Heyrathen und Begra bnissen auch allen hieselbst befindlichen Thieren so bishero in Europa unbekandt gewesen ." → ["Europa"/LOC]
Sentence: "Jm Frantzo sischen herausgegeben durch Wilhelm Boßmann gewesenen Rahtsherrn OberKauffmann und Landes" → ["Wilhelm Boßmann"/PER]
Sentence: "Unt

KeyboardInterrupt: 
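
To get a quick overview of what the tagger found, the printed lines can be tallied by label. The following is a minimal sketch that parses lines in the format shown in the output above; the helper `parse_tagged_line` and the regular expression are assumptions based on that format, and the sample lines are copied from the output rather than newly produced.

```python
import re
from collections import Counter

# Matches entries such as "GVINEA"/LOC in the bracketed list after the arrow.
ENTITY_PATTERN = re.compile(r'"([^"]+)"/([A-Z]+)')


def parse_tagged_line(line: str) -> list[tuple[str, str]]:
    """Return (entity text, label) pairs from one printed line; empty if none were tagged."""
    _, arrow, labels = line.rpartition("→")
    if not arrow:
        return []
    return ENTITY_PATTERN.findall(labels)


lines = [
    'Sentence: "Sitten Religion"',
    'Sentence: "Abbildung 0008 0009 Reyse nach GVINEA , oder ausfu hrliche Beschreibung" → ["GVINEA"/LOC]',
    'Sentence: "Heyrathen und Begra bnissen auch allen hieselbst befindlichen Thieren" → ["Europa"/LOC]',
    'Sentence: "Jm Frantzo sischen herausgegeben durch Wilhelm Boßmann" → ["Wilhelm Boßmann"/PER]',
]

entities = [pair for line in lines for pair in parse_tagged_line(line)]
counts = Counter(label for _, label in entities)
print(entities)
print(counts)
```

Inside the notebook itself, reading the spans directly from each tagged `Sentence` (for example with `sentence.get_spans("ner")`) is likely more robust than parsing printed text; the sketch above is mainly useful when only the saved output is available.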

In [None]:
# Tag the whole excerpt as a single Sentence and print the result as a dictionary
sentence = Sentence(file)
tagger.predict(sentence)
print(sentence.to_dict())
