This repository contains the source code of DEXTER, a system to automatically extract Gene-Disease Associations from biomedical abstracts. The work was originally presented in the paper by Gupta et al. 'DEXTER: Disease-Expression Relation Extraction from Text'. This repository contains a reproduced version of the system described in the above-mentioned paper.
If you make use of this code in your work, please kindly cite the following paper:
@inproceedings{menotti2023dexter,
author = {Menotti, Laura},
booktitle = {Proc. of the 2nd Italian Conference on Big Data and Data Science (ITADATA 2023)},
series = {CEUR-WS Proceedings},
volume = {3606},
publisher = {CEUR},
title = {Reproducibility and Generalization of a Relation Extraction System for Gene-Disease Associations},
year = {2023},
URL = {https://ceur-ws.org/Vol-3606/invited78.pdf}
}
Contents
DEXTER is based on the SpaCy library therefore make sure to install the spacy library and the "en-sci-sm" model from SciSpaCy before running the code. A full list of library requirements is available in the file requirements.txt.
To install the requirements run:
pip install -r requirements.txt
DEXTER takes as input a csv file with the following column names:
- PMID: PubMed ID of the abstract.
- Sentence: sentence to be parsed.
The output is a csv files with the following columns:
- PMID: PubMed ID of the abstract where the sentence is contained.
- geneMen: gene mentioned in the sentence.
- geneID: identifier of the gene mentioned in the sentence.
- DOID: DOID of the associated disease.
- DOID_Name: name of the associated disease.
- DiseaseMention: associated disease mention in the sentence
- DiseaseDetectedFrom: location of the disease (i.e, Sentence_ARG, Title, Sentence)
- ExpressionLevel: gene expression level (UP/DOWN)
- SentenceType: sentence type (TypeA/TypeB)
- Sample1: Compared Entity 1 extracted by the RE module
- Sample2: Compared Entity 2 extracted by the RE module (NA for TypeB sentences)
- Sentence: sentence text.
To execute the code run:
cd py
python dexter_pipeline.py [path_to_input_file] [path_to_output_file]
If you wish to run the code on the original data, unzip the data folder and run:
cd py
python dexter_pipeline.py ../data/input/DEXTER_DATA.csv ../data/output/[filename].csv