The nested NER task on Papers with Code.
Common benchmark datasets for nested NER are GENIA and ACE2005.
The dataset has 29 categories:
NUMBER, WORK_OF_ART, PROFESSION, LANGUAGE, PRODUCT, DISEASE, MONEY, NATIONALITY, ORGANIZATION, DATE, AWARD, DISTRICT, FACILITY, AGE, LOCATION, PERSON, STATE_OR_PROVINCE, EVENT, COUNTRY, LAW, PENALTY, FAMILY, TIME, PERCENT, CRIME, IDEOLOGY, ORDINAL, CITY, RELIGION
Category stats per train/dev/test: TBA
To convert the dataset into different formats, see notebooks/dataset_conversion.ipynb
Pre-converted datasets (version of 10.04.2021):
- jsonlines for Biaffine NER - link
- json for Pyramid NER - link
- conll-like for seq2seq NER - link
- json for mrc-for-flat-nested-ner - link
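As a rough sketch of what such a conversion can look like, the snippet below emits one record in a Biaffine-NER-style jsonlines layout. The field names (`doc_key`, `sentences`, `ners`) and the token-indexed span format are assumptions about that repo's input schema; notebooks/dataset_conversion.ipynb is authoritative.

```python
import json

def to_biaffine_jsonlines(doc_key, sentences, entities):
    """Build one jsonlines record (assumed Biaffine NER layout).

    sentences: list of token lists, one per sentence.
    entities: per-sentence lists of [start, end, label] spans,
              with inclusive token indices.
    """
    record = {"doc_key": doc_key, "sentences": sentences, "ners": entities}
    # ensure_ascii=False keeps Cyrillic tokens readable in the output file
    return json.dumps(record, ensure_ascii=False)
```

One line per document is then written to a `.jsonlines` file; the other target formats (Pyramid json, conll-like, mrc json) would each need their own writer in the same spirit.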
The main metric is F1-score. A named entity is counted as correct only when both its boundary and its category are predicted exactly.
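The exact-match scoring rule above fits in a few lines; this is a minimal sketch (the helper name and the `(start, end, category)` tuple representation are illustrative, not taken from any of the repos):

```python
def ner_f1(gold, pred):
    """Entity-level precision/recall/F1 with exact matching:
    a predicted entity is a true positive only if its (start, end)
    boundary and its category both match a gold entity.

    gold, pred: iterables of (start, end, category) tuples.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, predicting the right span with the wrong category (CITY vs COUNTRY) yields both a false positive and a false negative, so `ner_f1({(0, 2, "PERSON"), (5, 6, "CITY")}, {(0, 2, "PERSON"), (5, 6, "COUNTRY")})` gives 0.5 / 0.5 / 0.5.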
Biaffine NER
link to article
link to implementation (tensorflow>=2.2)
link to original implementation (tensorflow<2.0 and Python 2)
- Look through the experiment configurations in experiments.conf
- Then run:
cd repos/biaffine-ner
python train.py experiment5
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - link
To evaluate and run inference:
cd repos/biaffine-ner
python evaluate.py experiment5
python inference.py experiment5 data/10.04.2021/train.jsonlines logs/experiment5/exp5-train-inference.jsonlines
Pyramid NER
link to original implementation (pytorch)
train-dummy.sh - training with fasttext embeddings only
train.sh - training with ruBERT and fasttext embeddings
If you want to use BERT, precompute the BERT embeddings first.
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - link
How to run inference: TBA
Seq2seq NER, aka Neural Architectures for Nested NER through Linearization
link to article
link to original implementation (tensorflow<2.0)
train-dummy.sh - training with fasttext embeddings only
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - TBA
How to run inference: TBA
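The core idea of the linearization approach is to flatten nested entities into a single tag sequence by concatenating the tags of all entities covering each token. A minimal sketch of that idea (the BIO prefixes and the `|` separator here are simplifying assumptions; the paper's exact encoding scheme differs in detail):

```python
def linearize(n_tokens, entities):
    """Flatten nested entities into one tag string per token.

    entities: list of (start, end, label) with inclusive token
    indices, outer entities listed before inner ones.
    """
    tags = [[] for _ in range(n_tokens)]
    for start, end, label in entities:
        for i in range(start, end + 1):
            prefix = "B" if i == start else "I"
            tags[i].append(f"{prefix}-{label}")
    # tokens outside every entity get the usual "O" tag
    return ["|".join(t) if t else "O" for t in tags]
```

E.g. a 3-token ORGANIZATION whose first token is also a CITY linearizes to `["B-ORGANIZATION|B-CITY", "I-ORGANIZATION", "I-ORGANIZATION"]`, which a standard seq2seq tagger can then predict token by token.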
TODO: add links to files with per-category results.
Results on the test set (dataset version of 10.04.2021):
| MODEL | PREC | REC | F1 | SCORES | CHECKPOINT |
| --- | --- | --- | --- | --- | --- |
| biaffine-ner + fasttext | 78.80 | 71.80 | 75.13 | link | link |
| biaffine-ner + fasttext + ruBERT | 81.92 | 71.54 | 76.38 | link | link |
| pyramid-ner + fasttext | 72.70 | 63.01 | 67.51 | link | link |
| pyramid-ner + fasttext + ruBERT | 77.73 | 70.97 | 74.19 | link | link |
| seq2seq-ner + fasttext | 74.01 | 71.51 | 72.74 | TBA | link |
| seq2seq-ner + fasttext + ruBERT | TBA | TBA | TBA | TBA | TBA |