The nested NER task on Papers with Code.
Common benchmark datasets for nested NER are GENIA and ACE2005.
The dataset has 29 categories:
NUMBER, WORK_OF_ART, PROFESSION, LANGUAGE, PRODUCT, DISEASE, MONEY, NATIONALITY, ORGANIZATION, DATE, AWARD, DISTRICT, FACILITY, AGE, LOCATION, PERSON, STATE_OR_PROVINCE, EVENT, COUNTRY, LAW, PENALTY, FAMILY, TIME, PERCENT, CRIME, IDEOLOGY, ORDINAL, CITY, RELIGION
Category stats per train/dev/test: TBA
To convert the dataset into different formats, see notebooks/dataset_conversion.ipynb
Pre-converted datasets (version of 10.04.2021):
- jsonlines for Biaffine NER - link
- json for Pyramid NER - link
- conll-like for seq2seq NER - link
- json for mrc-for-flat-nested-ner - link
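As a rough sketch of what such a conversion can look like, the snippet below emits one record in a Biaffine-NER-style jsonlines layout. The field names (`doc_key`, `sentences`, `ners`) and the token-indexed span format are assumptions about that repo's input schema; notebooks/dataset_conversion.ipynb is authoritative.

```python
import json

def to_biaffine_jsonlines(doc_key, sentences, entities):
    """Build one jsonlines record (assumed Biaffine NER layout).

    sentences: list of token lists, one per sentence.
    entities: per-sentence lists of [start, end, label] spans,
              with inclusive token indices.
    """
    record = {"doc_key": doc_key, "sentences": sentences, "ners": entities}
    # ensure_ascii=False keeps Cyrillic tokens readable in the output file
    return json.dumps(record, ensure_ascii=False)
```

One line per document is then written to a `.jsonlines` file; the other target formats (Pyramid json, conll-like, mrc json) would each need their own writer in the same spirit.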
The main metric is F1-score. A named entity is counted as correct only when both its boundary and its category are predicted exactly.
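The exact-match scoring rule above fits in a few lines; this is a minimal sketch (the helper name and the `(start, end, category)` tuple representation are illustrative, not taken from any of the repos):

```python
def ner_f1(gold, pred):
    """Entity-level precision/recall/F1 with exact matching:
    a predicted entity is a true positive only if its (start, end)
    boundary and its category both match a gold entity.

    gold, pred: iterables of (start, end, category) tuples.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, predicting the right span with the wrong category (CITY vs COUNTRY) yields both a false positive and a false negative, so `ner_f1({(0, 2, "PERSON"), (5, 6, "CITY")}, {(0, 2, "PERSON"), (5, 6, "COUNTRY")})` gives 0.5 / 0.5 / 0.5.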
Biaffine NER
link to article
link to implementation (tensorflow>=2.2)
link to original implementation (tensorflow<2.0 and Python 2)
- Look through the experiment configurations in experiments.conf
- Then run:
cd repos/biaffine-ner
python train.py experiment5
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - link
To evaluate and run inference:
cd repos/biaffine-ner
python evaluate.py experiment5
python inference.py experiment5 data/10.04.2021/train.jsonlines logs/experiment5/exp5-train-inference.jsonlines
Pyramid NER
link to original implementation (pytorch)
train-dummy.sh - training with fasttext embeddings only
train.sh - training with ruBERT and fasttext embeddings
If you want to use BERT, precompute the BERT embeddings first.
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - link
How to run inference: TBA
Seq2seq NER, aka Neural Architectures for Nested NER through Linearization
link to article
link to original implementation (tensorflow<2.0)
train-dummy.sh - training with fasttext embeddings only
- char-cnn + fasttext embeddings (cc300ru) - link
- char-cnn + fasttext (cc300ru) + ruBERT embeddings - TBA
How to run inference: TBA
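The core idea of the linearization approach is to flatten nested entities into a single tag sequence by concatenating the tags of all entities covering each token. A minimal sketch of that idea (the BIO prefixes and the `|` separator here are simplifying assumptions; the paper's exact encoding scheme differs in detail):

```python
def linearize(n_tokens, entities):
    """Flatten nested entities into one tag string per token.

    entities: list of (start, end, label) with inclusive token
    indices, outer entities listed before inner ones.
    """
    tags = [[] for _ in range(n_tokens)]
    for start, end, label in entities:
        for i in range(start, end + 1):
            prefix = "B" if i == start else "I"
            tags[i].append(f"{prefix}-{label}")
    # tokens outside every entity get the usual "O" tag
    return ["|".join(t) if t else "O" for t in tags]
```

E.g. a 3-token ORGANIZATION whose first token is also a CITY linearizes to `["B-ORGANIZATION|B-CITY", "I-ORGANIZATION", "I-ORGANIZATION"]`, which a standard seq2seq tagger can then predict token by token.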
TODO: add links to files with per-category results.
Results on the test set (dataset version of 10.04.2021):
| MODEL | PREC | REC | F1 | SCORES | CHECKPOINT |
| --- | --- | --- | --- | --- | --- |
| biaffine-ner + fasttext | 78.80 | 71.80 | 75.13 | link | link |
| biaffine-ner + fasttext + ruBERT | 81.92 | 71.54 | 76.38 | link | link |
| pyramid-ner + fasttext | 72.70 | 63.01 | 67.51 | link | link |
| pyramid-ner + fasttext + ruBERT | 77.73 | 70.97 | 74.19 | link | link |
| seq2seq-ner + fasttext | 74.01 | 71.51 | 72.74 | TBA | link |
| seq2seq-ner + fasttext + ruBERT | TBA | TBA | TBA | TBA | TBA |