nleguillarme/snr_tools_and_methods

Evaluation of methods and tools for taxonomic NER (species name recognition)

This repository contains the evaluation scripts, Docker images, and links to the corpora used for the paper "TaxoNERD: deep neural models for the recognition of taxonomic entities in the ecological and evolutionary literature" (link).

In this paper, we propose TaxoNERD, a new tool for taxonomic NER based on deep neural networks, and compare it with existing taxon name recognition systems.

Corpora

The corpora can be publicly accessed at the following links:

Corpus | Text Genre | Standard | Entities | Publication
Linnaeus | Scientific Article | Gold | species | link
S800 | Scientific Article | Gold | species | link
COPIOUS | Scientific Article | Gold | taxon, geographical location, habitat, temporal expression, and person | link
Bacteria Biotope | Scientific Article | Gold | microorganism, habitat, geographical location, phenotype | link

Preprocessing

For ease of use, all corpus preprocessing operations are collected in a single Jupyter notebook.

Train/test/dev split

  • LINNAEUS: we used the train, test and validation sets of Giorgi and Bader, 2018.
  • S800: we used this script to generate the subsets.
  • COPIOUS: the COPIOUS corpus is already split into train, test and validation sets.
  • Bacteria Biotope (BB) task: we used the validation set for testing, and randomly split the train set into train/validation subsets with an 85:15 ratio.
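
As an illustration of the 85:15 random split described for the BB task, here is a minimal Python sketch. It is not the code from the preprocessing notebook: the function name `split_train_validation`, the fixed seed, and the placeholder document identifiers are all assumptions made for the example.

```python
import random


def split_train_validation(documents, validation_fraction=0.15, seed=42):
    """Randomly split a list of documents into train and validation subsets.

    Hypothetical helper: the name, default seed and rounding rule are
    assumptions for illustration, not taken from the repository.
    """
    shuffled = list(documents)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    train = shuffled[n_validation:]
    return train, validation


# Placeholder identifiers standing in for the documents of the BB train set.
doc_ids = [f"doc_{i}" for i in range(100)]
train_ids, validation_ids = split_train_validation(doc_ids)
print(len(train_ids), len(validation_ids))
```

Fixing the seed keeps the subsets identical across runs, which matters when several tools are compared on the same validation data.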

Images

To simplify the installation of existing taxonomic NER tools, which are written in different languages, we provide a Dockerfile for each tool. You will therefore need Docker to run the evaluation scripts. The evaluation scripts include the code for building the Docker images, so you do not have to build the images yourself.

Evaluation

All scripts used for evaluation are provided as Jupyter notebooks, one per evaluated method.
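
For readers unfamiliar with how NER output is commonly scored, the sketch below computes exact-match precision, recall and F1 over entity spans. This is an assumption about a typical scoring scheme, not a description of what the notebooks do: the function `precision_recall_f1`, the `(document, start, end)` span format and the sample spans are all hypothetical.

```python
def precision_recall_f1(gold_spans, predicted_spans):
    """Exact-match scores over (document_id, start, end) entity spans.

    Hypothetical helper for illustration only; the notebooks in this
    repository may use a different matching criterion or span format.
    """
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)  # spans matching exactly
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans: two of the three predictions match a gold annotation.
gold = [("doc1", 0, 12), ("doc1", 30, 42), ("doc2", 5, 17)]
predicted = [("doc1", 0, 12), ("doc2", 5, 17), ("doc2", 40, 48)]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```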
