This repository contains the evaluation scripts, Docker images, and links to the corpora used for the paper *TaxoNERD: deep neural models for the recognition of taxonomic entities in the ecological and evolutionary literature* (link).
In this paper, we propose TaxoNERD, a new tool for taxonomic named entity recognition (NER) based on deep neural networks, and compare it with existing taxon name recognition systems.
The corpora are publicly available at the following links:
Corpora | Text Genre | Standard | Entities | Publication |
---|---|---|---|---|
Linnaeus | Scientific Article | Gold | species | link |
S800 | Scientific Article | Gold | species | link |
COPIOUS | Scientific Article | Gold | taxon, geographical location, habitat, temporal expression, and person | link |
Bacteria Biotope | Scientific Article | Gold | microorganism, habitat, geographical location, phenotype | link |
All corpus pre-processing steps are collected in a single Jupyter notebook for ease of use:
- LINNAEUS: we used the train, test and validation sets from Giorgi and Bader (2018).
- S800: we used this script to generate the subsets.
- COPIOUS: the COPIOUS corpus is already split into train, test and validation sets.
- Bacteria Biotope (BB) task: we used the validation set for testing, and randomly split the train set into train and validation subsets with an 85:15 ratio.
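The random 85:15 split used for the BB task can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it assumes the split is done at the document level, and the seed value and the document IDs in the usage example are hypothetical (the seed behind the published split is not stated here).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    docs: Sequence[T], val_ratio: float = 0.15, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Randomly split documents into (train, validation) lists.

    The default val_ratio of 0.15 gives an 85:15 split. The seed is an
    assumption chosen for reproducibility, not the value used in the paper.
    """
    if not 0.0 < val_ratio < 1.0:
        raise ValueError("val_ratio must be strictly between 0 and 1")
    shuffled = list(docs)                   # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)   # local RNG: same seed, same split
    n_val = round(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    # Hypothetical document IDs standing in for the BB train set.
    doc_ids = [f"BB-doc-{i}" for i in range(100)]
    train, val = train_val_split(doc_ids)
    print(len(train), len(val))
```

Splitting whole documents rather than individual sentences keeps sentences from the same abstract out of both subsets, which avoids leaking near-duplicate context into validation.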
To simplify the installation of existing taxonomic NER tools, which are written in different languages, we provide a Dockerfile for each tool. You will therefore need Docker to run the evaluation scripts. The evaluation scripts include the code that builds the Docker images, so you do not have to build them manually.
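For readers who want to run one of the dockerized tools outside the notebooks, the sketch below shows the general pattern of building an image from a tool's Dockerfile and running it with a corpus directory mounted into the container. It only uses standard `docker build` (`-t`, `-f`) and `docker run` (`--rm`, `-v`) options; the image tag, Dockerfile path, mount point and tool arguments are hypothetical placeholders, not the names used in this repository.

```python
import os
import subprocess
from typing import List, Sequence


def docker_build_cmd(image: str, dockerfile: str, context: str = ".") -> List[str]:
    """Return a `docker build` command that tags the image built from `dockerfile`."""
    return ["docker", "build", "-t", image, "-f", dockerfile, context]


def docker_run_cmd(
    image: str, host_dir: str, container_dir: str, args: Sequence[str]
) -> List[str]:
    """Return a `docker run` command with `host_dir` bind-mounted at `container_dir`.

    --rm deletes the container when it exits. The host path is made absolute so
    that -v treats it as a bind mount rather than as a named volume.
    """
    volume = f"{os.path.abspath(host_dir)}:{container_dir}"
    return ["docker", "run", "--rm", "-v", volume, image, *args]


def run(cmd: List[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical tool image and paths; adapt to the Dockerfile you want to use.
    image = "taxonomic-ner/example-tool"
    print(" ".join(docker_build_cmd(image, "tools/example-tool/Dockerfile")))
    print(" ".join(docker_run_cmd(image, "corpora", "/data", ["/data/input.txt"])))
```

Returning the commands as lists (instead of shell strings) avoids quoting problems with paths that contain spaces and makes them easy to inspect before passing them to `run`.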
All scripts used for evaluation are provided as Jupyter notebooks, one per evaluated method.
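As an illustration of how predicted taxon mentions can be compared with gold annotations, here is a minimal sketch of exact-match, entity-level precision, recall and F1, a common scoring scheme for NER. It is an assumption for illustration only: the notebooks may use a different matching criterion (for example, partial overlap), and the `(doc_id, start, end)` span representation is hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (document ID, start offset, end offset).
Span = Tuple[str, int, int]


def entity_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1.

    A predicted span counts as a true positive only if a gold span has exactly
    the same document ID and character offsets.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("d1", 0, 10), ("d1", 20, 30), ("d2", 5, 12)}
    pred = {("d1", 0, 10), ("d2", 5, 13)}  # second span is off by one character
    print(entity_prf(gold, pred))
```

Under exact matching, a boundary error such as the off-by-one span above counts as both a false positive and a false negative, which is why relaxed (overlap-based) matching is sometimes reported alongside it.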