List of nested NER benchmarks

See the nested NER task on paperswithcode.
Commonly used general-domain datasets for nested NER are GENIA and ACE2005.

Dataset

The dataset has 29 entity categories:
NUMBER, WORK_OF_ART, PROFESSION, LANGUAGE, PRODUCT, DISEASE, MONEY, NATIONALITY, ORGANIZATION, DATE, AWARD, DISTRICT, FACILITY, AGE, LOCATION, PERSON, STATE_OR_PROVINCE, EVENT, COUNTRY, LAW, PENALTY, FAMILY, TIME, PERCENT, CRIME, IDEOLOGY, ORDINAL, CITY, RELIGION

Category stats per train/dev/test: TBA
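
Until those statistics are published, approximate per-category counts can be computed from the converted data. A minimal sketch, assuming the Biaffine NER jsonlines layout described under Preconverted datasets below:

```python
# count_categories.py - hypothetical helper, not part of the repo.
import json
from collections import Counter
from pathlib import Path

def category_counts(path):
    """Count entity labels in a Biaffine-NER-style .jsonlines file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            doc = json.loads(line)
            for sent_ners in doc.get("ners", []):    # one span list per sentence
                for start, end, label in sent_ners:  # [start, end, TYPE]
                    counts[label] += 1
    return counts

for split in ("train", "dev", "test"):
    path = Path("data/10.04.2021") / f"{split}.jsonlines"  # path is an assumption
    if path.exists():
        print(split, category_counts(path).most_common(5))
```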

To convert the dataset into different formats, see notebooks/dataset_conversion.ipynb.
Preconverted datasets (version of 10.04.2021):

  • jsonlines for Biaffine NER - link (format sketched after this list)
  • json for Pyramid NER - link
  • CoNLL-like for seq2seq NER - link
  • json for mrc-for-flat-nested-ner - link
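
For orientation, the jsonlines layout expected by Biaffine NER puts one document per line; the field names and inclusive token-level spans below are assumptions based on the usual conventions of that implementation:

```python
# One document per line of the .jsonlines file; layout assumed from the
# usual biaffine-ner conventions, with inclusive token-level spans.
example = {
    "doc_key": "doc_0001",
    "sentences": [["Moscow", "City", "Court", "upheld", "the", "verdict", "."]],
    # One span list per sentence, each span as [start, end, TYPE].
    # Note the nesting: "Moscow" (CITY) sits inside tokens 0-2,
    # "Moscow City Court" (ORGANIZATION).
    "ners": [[[0, 2, "ORGANIZATION"], [0, 0, "CITY"]]],
}
```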

Language models

Methods

The main metric is F1-score. A named entity is counted as correct only when both its boundary and its category are predicted correctly.
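
In other words, gold and predicted annotations are compared as sets of exact (sentence, start, end, category) tuples. A minimal sketch of this scoring; the tuple layout is an illustrative assumption:

```python
# Exact-match nested NER scoring: a span counts as a true positive only if
# both its boundaries and its category match a gold span exactly.
def prf1(gold_spans, pred_spans):
    """gold_spans / pred_spans: iterables of (sent_id, start, end, label)."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one exact match, one boundary error.
gold = [(0, 0, 2, "ORGANIZATION"), (0, 0, 0, "CITY")]
pred = [(0, 0, 2, "ORGANIZATION"), (0, 0, 1, "CITY")]
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```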

Biaffine NER

link to article
link to implementation (tensorflow>=2.2)
link to original implementation (tensorflow<2.0 and python 2)

Training

  1. Look through the experiment configurations in experiments.conf
  2. Run:
cd repos/biaffine-ner
python train.py experiment5

Pretrained models

  • char-cnn + fasttext embeddings (cc300ru) - link
  • char-cnn + fasttext (cc300ru) + ruBERT embeddings - link

Inference and evaluation

cd repos/biaffine-ner
python evaluate.py experiment5
python inference.py experiment5 data/10.04.2021/train.jsonlines logs/experiment5/exp5-train-inference.jsonlines
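
The predictions can then be inspected directly from the output file. A minimal sketch, assuming the output mirrors the input jsonlines layout with predicted spans in an "ners" field over document-level token indices (both the field name and the indexing are assumptions, so check the actual output of inference.py):

```python
# inspect_predictions.py - hypothetical helper; the output schema of
# inference.py is an assumption (input layout plus predicted spans).
import json

with open("logs/experiment5/exp5-train-inference.jsonlines", encoding="utf-8") as f:
    for line in f:
        doc = json.loads(line)
        # Assumes spans index the flattened, document-level token sequence.
        tokens = [tok for sent in doc["sentences"] for tok in sent]
        for sent_ners in doc.get("ners", []):
            for start, end, label in sent_ners:
                print(f"{label:20s} {' '.join(tokens[start:end + 1])}")
```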

Pyramid NER

link to article

link to original implementation (pytorch)

Training

  • train-dummy.sh - training with fasttext embeddings only
  • train.sh - training with ruBERT and fasttext embeddings

If you want to use BERT, precompute the BERT embeddings first.
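
A hypothetical sketch of such a precomputation step using Hugging Face transformers; the model id, subword pooling, and on-disk pickle format are all assumptions, so adapt it to whatever layout the Pyramid NER loader actually expects:

```python
# precompute_bert.py - hypothetical sketch; the on-disk format Pyramid NER
# expects is an assumption, adapt to the repo's actual embedding loader.
import json
import pickle
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "DeepPavlov/rubert-base-cased"  # ruBERT; the model id is an assumption
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

@torch.no_grad()
def embed_tokens(words):
    """Return one vector per word by averaging its subword embeddings."""
    enc = tokenizer(words, is_split_into_words=True,
                    return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state[0]   # (num_subwords, 768)
    word_ids = enc.word_ids(0)
    vecs = []
    for i in range(len(words)):
        idx = [j for j, w in enumerate(word_ids) if w == i]
        # Fall back to the [CLS] vector if the word was truncated away.
        vecs.append(hidden[idx].mean(0) if idx else hidden[0])
    return torch.stack(vecs)                     # (len(words), 768)

embeddings = []
with open("data/10.04.2021/train.jsonlines", encoding="utf-8") as f:
    for line in f:
        doc = json.loads(line)
        embeddings.append([embed_tokens(s).numpy() for s in doc["sentences"]])
with open("train.bert.pkl", "wb") as out:
    pickle.dump(embeddings, out)
```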

Pretrained models

  • char-cnn + fasttext embeddings (cc300ru) - link
  • char-cnn + fasttext (cc300ru) + ruBERT embeddings - link

Inference and evaluation

How to run inference: TBA

Seq2Seq

a.k.a. Neural Architectures for Nested NER through Linearization.
link to article
link to original implementation (tensorflow<2.0)
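
The core idea of the linearization: each token is assigned the labels of every entity covering it (BILOU scheme, nested entities included), which reduces nested NER to plain sequence tagging. A toy illustration; the inside-out label order and the "|" separator are assumptions rather than the paper's exact scheme:

```python
# Toy linearization of nested entities into per-token label strings.
def bilou(start, end, label, pos):
    """BILOU tag for position `pos` inside the inclusive span [start, end]."""
    if start == end:
        return f"U-{label}"
    if pos == start:
        return f"B-{label}"
    return f"L-{label}" if pos == end else f"I-{label}"

tokens = ["Moscow", "City", "Court", "upheld", "the", "verdict", "."]
entities = [(0, 2, "ORGANIZATION"), (0, 0, "CITY")]  # nested, inclusive spans

linearized = []
for i, tok in enumerate(tokens):
    labels = [bilou(s, e, lab, i) for s, e, lab in entities if s <= i <= e]
    linearized.append("|".join(labels) or "O")
print(list(zip(tokens, linearized)))
# [('Moscow', 'B-ORGANIZATION|U-CITY'), ('City', 'I-ORGANIZATION'),
#  ('Court', 'L-ORGANIZATION'), ('upheld', 'O'), ('the', 'O'),
#  ('verdict', 'O'), ('.', 'O')]
```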

Training

  • train-dummy.sh - training with fasttext embeddings only

Pretrained models

  • char-cnn + fasttext embeddings (cc300ru) - link
  • char-cnn + fasttext (cc300ru) + ruBERT embeddings - TBA

Inference and evaluation

How to run inference: TBA

Results

Per-category results: link to files with per-category results.

Results on the test set (dataset version of 10.04.2021). All values are percentages.

Model | Precision | Recall | F1 | Scores | Checkpoint
--- | --- | --- | --- | --- | ---
biaffine-ner + fasttext | 78.8 | 71.8 | 75.13 | link | link
biaffine-ner + fasttext + ruBERT | 81.92 | 71.54 | 76.38 | link | link
pyramid-ner + fasttext | 72.70 | 63.01 | 67.51 | link | link
pyramid-ner + fasttext + ruBERT | 77.73 | 70.97 | 74.19 | link | link
seq2seq-ner + fasttext | 74.01 | 71.51 | 72.74 | TBA | link
seq2seq-ner + fasttext + ruBERT | TBA | TBA | TBA | TBA | TBA
