2017BioNLPEvaluation

This repository contains code and data for the following article:

Wojciech Kusa, Michael Spranger. External Evaluation of Event Extraction Classifiers for Automatic Pathway Curation: An extended study of the mTOR pathway. In Proceedings of the 2017 Workshop on Biomedical Natural Language Processing (BioNLP 2017), pages 247–256. Association for Computational Linguistics, 2017.

1. Installation

This software was tested on Ubuntu 16.04.

1.1 Python

This project requires Python 2.7.

$ conda create -n bioNLP2017 python=2.7
$ conda activate bioNLP2017
(bioNLP2017) $ pip install -r requirements.txt

1.2 TEES

This project uses version 2.2.1 of the Turku Event Extraction System (TEES). To install all of its dependencies (classifiers, models, corpora and preprocessing tools), run:

(bioNLP2017) $ python tees/configure.py

TEES installs the following dependencies:

GENIA Sentence Splitter
BANNER named entity recognizer
BLLIP parser
Stanford Parser

The Stanford Parser and the BLLIP parser require Java, g++, flex and Ruby. Use the following command to install them if they are missing on your system:

(bioNLP2017) $ sudo apt-get install g++ ruby flex default-jre

After successfully installing TEES, export the path to its local settings file:

(bioNLP2017) $ export TEES_SETTINGS=/home/${USER}/.tees_local_settings.py
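Before moving on, it can help to confirm that the variable is set and points at an existing file. The following is a minimal sketch, not part of this repository; the helper name `tees_settings_status` is hypothetical, and it only inspects the environment without touching TEES itself.

```python
import os


def tees_settings_status(env=None):
    """Return (path, exists) for the file named by TEES_SETTINGS.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative only and is not provided by TEES.
    """
    env = os.environ if env is None else env
    path = env.get("TEES_SETTINGS")
    if not path:
        return None, False
    return path, os.path.isfile(os.path.expanduser(path))


if __name__ == "__main__":
    path, exists = tees_settings_status()
    if path is None:
        print("TEES_SETTINGS is not set")
    elif not exists:
        print("TEES_SETTINGS points to a missing file: {}".format(path))
    else:
        print("TEES settings found at {}".format(path))
```

Note that `export` only affects the current shell session, so the check has to run in the same session (or the export has to be added to your shell profile).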

2. Data

2.1 Training data

Training datasets were created using three different sources:

ANN - 60 abstracts of scientific papers from the PubMed database related to the mTOR pathway map
GE11 - 908 abstracts and full texts of scientific papers used in the BioNLP ST 2011 GENIA Event Extraction task
PC13 - 260 abstracts of scientific papers used in the BioNLP ST 2013 Pathway Curation task

All training datasets are stored in the data/ directory:

data/GE11-train.tar.gz - standalone GE11
data/GE11_mTOR-ann-train.tar.gz - GE11+ANN - combined GE11 and ANN
data/GE11_PC13_mTOR-ann-train.tar.gz - GE11+PC13+ANN - combined GE11, PC13 and ANN
data/PC13_mTOR-ann-train.tar.gz - PC13+ANN - combined PC13 and ANN
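To see what each archive contains without unpacking it, the standard `tarfile` module is enough. This is a small sketch under the assumption that the archives sit in `data/` under the file names listed above; the `TRAIN_ARCHIVES` mapping and the `list_archive_members` helper are hypothetical names introduced here, and the repository's own `preprocess.py` may handle the archives differently.

```python
import os
import tarfile

# Dataset label -> archive file name, as listed in this README.
TRAIN_ARCHIVES = {
    "GE11": "GE11-train.tar.gz",
    "GE11+ANN": "GE11_mTOR-ann-train.tar.gz",
    "GE11+PC13+ANN": "GE11_PC13_mTOR-ann-train.tar.gz",
    "PC13+ANN": "PC13_mTOR-ann-train.tar.gz",
}


def list_archive_members(label, data_dir="data"):
    """Return the member names of one training archive, or None if it is missing."""
    archive = os.path.join(data_dir, TRAIN_ARCHIVES[label])
    if not os.path.isfile(archive):
        return None
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    for label in sorted(TRAIN_ARCHIVES):
        members = list_archive_members(label)
        if members is None:
            print("{}: archive not found".format(label))
        else:
            print("{}: {} entries".format(label, len(members)))
```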

The "GE11-Devel BioNLP ST2011" dataset was used for hyperparameter optimization of all classifiers.

2.2 Test data

The test data consists of 449 full-text papers mentioned in the mTOR pathway map. The original PDFs were downloaded and converted to raw txt files using CERMINE.

The test data, as preprocessed txt files, can be downloaded from here. Extract the documents to the data/evaluation_mTOR_full/ directory.
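A quick sanity check after extraction is to count the documents in that directory. The sketch below assumes one `.txt` file per paper, which this README does not state explicitly; the helper name `count_test_documents` is hypothetical.

```python
import glob
import os

# Number of full-text papers stated in this README.
EXPECTED_PAPERS = 449


def count_test_documents(directory="data/evaluation_mTOR_full", pattern="*.txt"):
    """Count files matching `pattern` directly inside `directory`."""
    return len(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    found = count_test_documents()
    print("found {} txt files, expected {}".format(found, EXPECTED_PAPERS))
```

A mismatch does not necessarily mean the download is broken, since the one-file-per-paper layout is only an assumption.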

3. Training and testing event extraction models

First, preprocess the training datasets:

(bioNLP2017) $ python preprocess.py

To train an event extraction model, run:

(bioNLP2017) $ python train.py \
--output_path results/GE11-SVM/ \
--classifier svm \
--train_data data/GE11-train/GE11-train.xml

To run predictions on the mTOR papers with the model trained in the previous step:

(bioNLP2017) $ python predict.py \
--model_path results/GE11-SVM/ \
--input_data data/evaluation_mTOR_full/ \
--output_path results/GE11-SVM/evaluation_mTOR_full/
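The train and predict steps can also be chained from a small driver script. The following is a sketch only: it uses just the command-line flags shown above, the function names are hypothetical, and it defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def build_train_command(train_xml, output_path, classifier="svm"):
    """Build the train.py invocation using the flags shown in this README."""
    return ["python", "train.py",
            "--output_path", output_path,
            "--classifier", classifier,
            "--train_data", train_xml]


def build_predict_command(model_path, input_data="data/evaluation_mTOR_full/"):
    """Build the predict.py invocation; predictions go below the model directory."""
    output_path = model_path.rstrip("/") + "/evaluation_mTOR_full/"
    return ["python", "predict.py",
            "--model_path", model_path,
            "--input_data", input_data,
            "--output_path", output_path]


def run_pipeline(train_xml, output_path, classifier="svm", dry_run=True):
    """Print the two commands and, unless dry_run is set, execute them in order."""
    commands = [
        build_train_command(train_xml, output_path, classifier),
        build_predict_command(output_path),
    ]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check_call raises CalledProcessError if a step fails,
            # so prediction never runs on a failed training step.
            subprocess.check_call(command)
    return commands


if __name__ == "__main__":
    run_pipeline("data/GE11-train/GE11-train.xml", "results/GE11-SVM/")
```

Calling `run_pipeline(..., dry_run=False)` from the repository root inside the `bioNLP2017` environment would execute both steps.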

If you don't want to train your own models, you can use the models included in TEES, e.g.:

(bioNLP2017) $ python predict.py \
--model_path /home/${USER}/.tees/models/GE11-devel \
--input_data data/evaluation_mTOR_full/ \
--output_path results/GE11/SVM/evaluation_mTOR_full/

3.1 Event extraction results

Results from all 17 pretrained models from the paper can be downloaded from here.

4. Evaluation

Evaluation was done with scripts from sbnlp/mTOR-evaluation.
