This project creates and stores data sets for Finnish Named Entity Recognition.
Conversion from HTML to plain text relies on the Python library html2text.
Tokenization relies on hfst3.
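The HTML-to-text step can be sketched in Python as follows. This is a minimal sketch, assuming the html2text package's HTML2Text class. The option values (ignore_links, ignore_images, ignore_emphasis, body_width = 0) and the helper name html_to_text are illustrative assumptions, not this project's actual settings. If html2text is not installed, the sketch falls back to a crude extractor built on Python's standard html.parser module. That fallback is a swapped-in substitute, not what the project uses.

```python
# Sketch of converting an HTML article to plain text before tokenization.
# Assumption: the third-party html2text package (pip install html2text);
# the option values below are illustrative, not this project's settings.
from html.parser import HTMLParser

try:
    import html2text
except ImportError:  # hypothetical fallback so the sketch runs without html2text
    html2text = None

# Block-level tags that the fallback turns into line breaks (illustrative choice).
_BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li", "tr"}


class _TextExtractor(HTMLParser):
    """Crude stdlib fallback: keeps text nodes, turns block tags into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the plain-text content of an HTML document (hypothetical helper)."""
    if html2text is not None:
        converter = html2text.HTML2Text()
        converter.ignore_links = True     # keep anchor text, drop link targets
        converter.ignore_images = True    # drop image markup
        converter.ignore_emphasis = True  # no **bold** or _italic_ markers
        converter.body_width = 0          # do not hard-wrap lines
        return converter.handle(html)
    # Fallback path: collect text, collapse whitespace, drop empty lines.
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<p>Nokia julkaisi <b>uuden</b> puhelimen.</p>"
    print(html_to_text(sample))
```

Turning off emphasis markers and line wrapping keeps the output close to running text, which is likely easier for a downstream tokenizer such as hfst3 to handle; this is a guess about the pipeline, not documented behaviour of the project.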
+++ Labeled data
The labeled data can be found in DATASET/XYZ, where DATASET is one of the available data sets (currently only digitoday).