Prodigy is a new tool for radically efficient machine teaching. It addresses the big remaining problem: annotation and training.
Prodigy is not free, but you can submit a request for a research license here.
Greek language and Prodigy
Unfortunately, for Greek there were datasets available only for the POS and dependency taggers, not for NER, so we had to create the data ourselves.
Prodigy helped a lot with this. The final NER data can be found here.
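For reference, each line in the exported JSONL file follows Prodigy's annotation format. A hypothetical example (the character offsets depend on the actual text):

{"text": "Η εταιρεία άνοιξε γραφεία στην Αθήνα.", "spans": [{"start": 31, "end": 36, "label": "GPE"}], "answer": "accept"}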
Get info about your dataset(s)
python3 -m prodigy stats ner_train -l
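If you prefer to inspect a dataset programmatically, Prodigy also exposes its database from Python. A minimal sketch, assuming a local Prodigy v1 installation and that the ner_train dataset exists:

from prodigy.components.db import connect

# Connect to the local Prodigy database (SQLite by default)
db = connect()
examples = db.get_dataset("ner_train")
print(len(examples))          # number of stored annotations
print(examples[0]["text"])    # text of the first annotated example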
Drop a dataset
python3 -m prodigy drop ner_dev
Create a dataset, import existing annotations and train
python3 -m prodigy dataset ner
python3 -m prodigy db-in ner ner.jsonl
python3 -m prodigy ner.batch-train ner el_core_web_sm --output models/ner/ --label "ORG, PRODUCT, LOC, GPE, EVENT, PERSON" --no-missing --dropout 0.2 --n-iter 15
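The trained model is saved as a regular spaCy model and can be loaded as such. A minimal sketch (the Greek sentence is just an illustrative example):

import spacy

# Load the model written by ner.batch-train
nlp = spacy.load("models/ner/")
doc = nlp("Η Google άνοιξε γραφεία στην Αθήνα.")
print([(ent.text, ent.label_) for ent in doc.ents])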
Retrain NER from scratch
First, you will need to annotate the dataset manually:
python3 -m prodigy ner.manual ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
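Here, path_to_data is a text source, e.g. a JSONL file with one {"text": ...} object per line. A minimal sketch for producing such a file from raw text (file names are hypothetical):

import json

# Convert one sentence per line of raw text into Prodigy's JSONL input format
with open("raw_texts.txt", encoding="utf-8") as f_in, \
     open("path_to_data.jsonl", "w", encoding="utf-8") as f_out:
    for line in f_in:
        line = line.strip()
        if line:
            f_out.write(json.dumps({"text": line}, ensure_ascii=False) + "\n")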
After collecting a significant number of annotations, you can start using the model's predictions to speed up the annotation procedure:
python3 -m prodigy ner.make-gold ner_train el_core_web_sm path_to_data
When the performance of your model is good enough, you can use another recipe, ner.teach, to speed up annotation even further:
python3 -m prodigy ner.teach ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
Finally, train the model on the collected annotations:
python3 -m prodigy ner.batch-train ner_train el_core_web_sm --output models/small_with_entities --n-iter 20 --eval-split 0.2 --dropout 0.2
Note: the optional --no-missing argument treats the annotations as complete (any token not marked as an entity is assumed not to be one) and usually improves performance, but it should only be used when the dataset contains complete annotations, e.g. from ner.manual or ner.make-gold, rather than binary ner.teach decisions.
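To sanity-check the trained model outside of Prodigy, you can score it on held-out gold annotations. A sketch assuming the spaCy v2.x API of that era and hypothetical evaluation examples:

import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

nlp = spacy.load("models/small_with_entities")
scorer = Scorer()

# Hypothetical held-out examples: (text, [(start, end, label), ...])
examples = [
    ("Η Google άνοιξε γραφεία στην Αθήνα.", [(2, 8, "ORG"), (29, 34, "GPE")]),
]

for text, entities in examples:
    gold = GoldParse(nlp.make_doc(text), entities=entities)
    scorer.score(nlp(text), gold)

print(scorer.scores["ents_p"], scorer.scores["ents_r"], scorer.scores["ents_f"])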