This project provides a convolutional neural network model for relation extraction.

See before filing an issue or creating a pull request.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.


  1. Copy the project to your local machine
git clone
  2. Install the required packages
pip install -r requirements.txt
  3. Install stanford_corenlp_pywrapper

  4. Follow the instructions to install geniatagger to GENIATAGGER_PATH

  5. Download Stanford CoreNLP and unpack the compressed file to CORENLP_PATH

  6. Download the word2vec model to WORD2VEC_PATH

Prepare the dataset

The program needs three separate datasets in JSON format: training, development, and test. Each file contains sentences with annotations and relations. deeprel_schema.json describes the data format, and the examples folder contains sample files.

To validate the dataset format, run

jsonschema -i examples/aimed-dev.json deeprel_schema.json
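
The same check applies to all three splits. Below is a minimal sketch that builds one `jsonschema -i` command per dataset file. The training and test file names are assumptions for illustration (only examples/aimed-dev.json is named above), and the helper `validation_commands` is hypothetical, not part of this project. It only prints the commands; each list can be passed to `subprocess.run` to execute it.

```python
def validation_commands(dataset_paths, schema_path="deeprel_schema.json"):
    """Build one `jsonschema -i DATASET SCHEMA` command per dataset file."""
    return [["jsonschema", "-i", str(path), str(schema_path)]
            for path in dataset_paths]


# Hypothetical file names: only aimed-dev.json is mentioned in this README.
datasets = [
    "examples/aimed-train.json",
    "examples/aimed-dev.json",
    "examples/aimed-test.json",
]

for command in validation_commands(datasets):
    # Print the command line; run it with subprocess.run(command, check=True).
    print(" ".join(command))
```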

Prepare the configuration file

The program needs INI_FILE to configure the locations of Genia tagger, Stanford CoreNLP, etc. An example INI_FILE can be found in examples. It is good practice to place INI_FILE in the same folder as model_dir, but this is not required.
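
To illustrate how such a file can be read, here is a minimal sketch using Python's standard `configparser`. The section name `paths`, the key names, and every path value are assumptions made for this example; the example file in examples is the authoritative reference for the real layout.

```python
import configparser

# Hypothetical INI content: section name, keys, and paths are illustrative only.
EXAMPLE_INI = """
[paths]
geniatagger_path = /opt/geniatagger
corenlp_path = /opt/stanford-corenlp
word2vec_path = /data/word2vec.bin
model_dir = /data/model
"""


def load_paths(ini_text):
    """Parse INI text and return the entries of the (assumed) [paths] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return dict(parser["paths"])


paths = load_paths(EXAMPLE_INI)
for name, value in paths.items():
    print(f"{name} = {value}")
```

To read a file from disk instead of a string, `parser.read(path)` can replace `parser.read_string(ini_text)`.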

Preparse the datasets

In most cases, running the following program parses the datasets and creates the input matrices for training and testing.

python -pfvmsd INI_FILE

The program will generate intermediate files in model_dir specified in the INI_FILE.

  • all - store parsed documents in JSON
  • DATASET.npz - input matrix of sentences
  • DATASET-sp.npz - input matrix of shortest paths between two annotations
  • DATASET-doc.npz - input matrix of doc2vec
  • vocabs.json - vocabulary
  • word2vec.npz - maps from words to vectors
  • pos.npz - maps from part-of-speech tags to vectors
  • chunk.npz - maps from chunks to vectors
  • arg1_dis.npz - maps from the distances between argument1 and current word to vectors
  • arg2_dis.npz - maps from the distances between argument2 and current word to vectors
  • dependency.npz - maps from dependencies to vectors
  • type.npz - maps from named entities to vectors
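
After preparsing, it can help to confirm that every file in the list above was written. The following is a minimal sketch, assuming that DATASET in the file names stands for a dataset name such as `train`, `dev`, or `test` (these names are assumptions, as are the helper functions, which are not part of this project).

```python
import tempfile
from pathlib import Path

# File names taken from the list above; they do not depend on the dataset.
SHARED_FILES = [
    "vocabs.json", "word2vec.npz", "pos.npz", "chunk.npz",
    "arg1_dis.npz", "arg2_dis.npz", "dependency.npz", "type.npz",
]
# Suffixes appended to each dataset name (DATASET.npz, DATASET-sp.npz, ...).
PER_DATASET_SUFFIXES = [".npz", "-sp.npz", "-doc.npz"]


def expected_outputs(dataset_names):
    """List every intermediate entry expected in model_dir."""
    names = ["all"] + list(SHARED_FILES)
    for dataset in dataset_names:
        names.extend(dataset + suffix for suffix in PER_DATASET_SUFFIXES)
    return names


def missing_outputs(model_dir, dataset_names):
    """Return the expected entries that do not exist in model_dir."""
    root = Path(model_dir)
    return [name for name in expected_outputs(dataset_names)
            if not (root / name).exists()]


# Demonstration on a temporary folder that contains only vocabs.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "vocabs.json").write_text("{}")
    for name in missing_outputs(tmp, ["train", "dev", "test"]):
        print("missing:", name)
```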

You can also run the program step by step, so you can modify and check different parts of the inputs. For example, to check how different parsers affect performance, you can replace the parse tree in each JSON file in all and run -fvmstd to regenerate the matrices.

python deeprel/ -h

Usage: [options] INI_FILE
    --log <str>  Log option. One of DEBUG, INFO, WARNING, ERROR, and CRITICAL. [default: INFO]
    -p           preparse [default: False]
    -f           create features [default: False]
    -v           create vocabularies [default: False]
    -m           create matrix [default: False]
    -s           create shortest path matrix [default: False]
    -d           create doc2vec [default: False]
    -t           test matrix format [default: False]
    -k           skip pre-parsed documents [default: False]
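
The single-letter options can be combined into one argument, as in -pfvmsd or -fvmstd. The sketch below shows one way to assemble such an argument from named steps. The flag letters come from the usage text above; the step names and the helper `combined_flag` are hypothetical and exist only for this illustration.

```python
# Flag letters from the usage text; the step names on the left are made up here.
STEP_FLAGS = {
    "preparse": "p",
    "features": "f",
    "vocabularies": "v",
    "matrix": "m",
    "shortest_path": "s",
    "doc2vec": "d",
    "test": "t",
    "skip_preparsed": "k",
}


def combined_flag(steps):
    """Join the flag letters of the given steps into one option string."""
    if not steps:
        raise ValueError("at least one step is required")
    unknown = [step for step in steps if step not in STEP_FLAGS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return "-" + "".join(STEP_FLAGS[step] for step in steps)


# Full pipeline, matching the command shown earlier in this section.
print(combined_flag(
    ["preparse", "features", "vocabularies", "matrix", "shortest_path", "doc2vec"]
))  # -pfvmsd
```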

Train the model

python deeprel/ INI_FILE

The program will train a CNN model using the training and development sets. The model will be stored in the model_dir specified in INI_FILE.

Test the model

python deeprel/ model_dir

This will print a report of results using the model and test set.



Please read CONTRIBUTING for details on our code of conduct, and the process for submitting pull requests to us.


See LICENSE.txt.


This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine. We are also grateful to Robert Leaman for the helpful discussion.


  • Peng Y, Rios A, Kavuluru R, Lu Z. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database. 2018, 1-9. bay073.
  • Peng Y, Lu Z. Deep learning for extracting protein-protein interactions from biomedical literature. In Proceedings of BioNLP workshop. 2017.

