DEXTER

This repository contains the source code of DEXTER, a system that automatically extracts Gene-Disease Associations from biomedical abstracts. DEXTER was originally presented by Gupta et al. in the paper 'DEXTER: Disease-Expression Relation Extraction from Text'; the code here is a reproduction of the system described in that paper.

If you use this code in your work, please cite the following paper:

@inproceedings{menotti2023dexter,
	author = {Menotti, Laura},
	booktitle = {Proc. of the 2nd Italian Conference on Big Data and Data Science (ITADATA 2023)},
	series = {CEUR-WS Proceedings},
	volume = {3606},
	publisher = {CEUR},
	title = {Reproducibility and Generalization of a Relation Extraction System for Gene-Disease Associations},
	year = {2023},
	url = {https://ceur-ws.org/Vol-3606/invited78.pdf}
}

Contents

System Requirements
Data Format
Running DEXTER

System Requirements

DEXTER is built on the spaCy library, so install spaCy and the scispaCy model "en_core_sci_sm" before running the code. The full list of required libraries is given in requirements.txt.

To install the requirements run:

pip install -r requirements.txt
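Before launching the pipeline, it can help to confirm that the scispaCy model is importable. The snippet below is a minimal sketch and not part of the DEXTER code: it assumes the model is installed as a Python package named `en_core_sci_sm` (the usual way spaCy-style models are distributed), and the helper `model_installed` is hypothetical.

```python
import importlib.util


def model_installed(package_name: str = "en_core_sci_sm") -> bool:
    """Return True if a package with this name can be found for import.

    Hypothetical helper: assumes the scispaCy model is installed as an
    importable package called en_core_sci_sm.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if model_installed():
        print("en_core_sci_sm found")
    else:
        print("en_core_sci_sm missing: install it before running DEXTER")
```

Once the check passes, `spacy.load("en_core_sci_sm")` is the standard spaCy call for loading an installed model package.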

Data Format

DEXTER takes as input a CSV file with the following columns:

PMID: PubMed ID of the abstract.
Sentence: sentence to be parsed.
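An input file in this format can be produced with Python's standard csv module. This is a minimal sketch under assumptions: it assumes a comma-separated file with a header row and one sentence per row, `write_dexter_input` is a hypothetical helper, and the PMID and sentence in the demo are placeholders, not real data.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_dexter_input(rows: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write (PMID, Sentence) pairs as a CSV with the two input columns.

    Hypothetical helper: assumes DEXTER reads a comma-separated file with
    a header row naming the PMID and Sentence columns.
    """
    writer = csv.writer(out)
    writer.writerow(["PMID", "Sentence"])
    for pmid, sentence in rows:
        # csv.writer quotes fields that contain commas or quotes.
        writer.writerow([pmid, sentence])


if __name__ == "__main__":
    buffer = io.StringIO()
    # Placeholder row for illustration only.
    write_dexter_input(
        [("00000000", "Gene X is upregulated in tumor tissue compared to normal tissue.")],
        buffer,
    )
    print(buffer.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")` so the csv module controls line endings.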

The output is a CSV file with the following columns:

PMID: PubMed ID of the abstract where the sentence is contained.
geneMen: gene mentioned in the sentence.
geneID: identifier of the gene mentioned in the sentence.
DOID: DOID of the associated disease.
DOID_Name: name of the associated disease.
DiseaseMention: mention of the associated disease in the sentence.
DiseaseDetectedFrom: where the disease was detected (Sentence_ARG, Title, or Sentence).
ExpressionLevel: gene expression level (UP/DOWN).
SentenceType: sentence type (TypeA/TypeB).
Sample1: first compared entity extracted by the RE (relation extraction) module.
Sample2: second compared entity extracted by the RE module (NA for TypeB sentences).
Sentence: sentence text.
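A quick way to inspect the output is to count how often each gene/disease/expression triple occurs. The sketch below is illustrative only: it assumes the output is comma-separated with a header row using the column names listed above, and `count_associations` is a hypothetical helper, not part of DEXTER. The demo rows are made-up placeholders.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO, Tuple


def count_associations(f: TextIO) -> Dict[Tuple[str, str, str], int]:
    """Count rows per (geneMen, DOID_Name, ExpressionLevel).

    Hypothetical helper: assumes a header row with the column names
    documented for the DEXTER output file.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        counts[(row["geneMen"], row["DOID_Name"], row["ExpressionLevel"])] += 1
    return dict(counts)


if __name__ == "__main__":
    # Placeholder rows for illustration; real output has more columns.
    sample = (
        "PMID,geneMen,DOID_Name,ExpressionLevel\n"
        "1,GENE_A,disease_x,UP\n"
        "2,GENE_A,disease_x,UP\n"
    )
    print(count_associations(io.StringIO(sample)))
```

Extra columns in the real output are simply ignored by `csv.DictReader` lookups, so the same function works on the full file.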

Running DEXTER

To run DEXTER, execute:

cd py
python dexter_pipeline.py [path_to_input_file] [path_to_output_file]

To run the code on the original data, extract data.zip in the repository root and run:

cd py
python dexter_pipeline.py ../data/input/DEXTER_DATA.csv ../data/output/[filename].csv
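For scripted or batch runs, the same command can be issued from Python. This is a minimal sketch, not part of the repository: `build_command` and `run_dexter` are hypothetical helpers, and the sketch assumes the pipeline is invoked exactly as above, with `py` as the working directory and the input and output paths as the two positional arguments.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(input_csv: str, output_csv: str) -> List[str]:
    """Build the argv for dexter_pipeline.py; paths are passed through unchanged."""
    return [sys.executable, "dexter_pipeline.py", input_csv, output_csv]


def run_dexter(input_csv: str, output_csv: str, repo_root: str = ".") -> None:
    """Run the pipeline from the py/ folder, mirroring the `cd py` step.

    Relative paths such as ../data/... are resolved against py/,
    because that is the working directory of the subprocess.
    """
    subprocess.run(
        build_command(input_csv, output_csv),
        cwd=Path(repo_root) / "py",
        check=True,  # raise CalledProcessError if the pipeline fails
    )
```

For example, `run_dexter("../data/input/DEXTER_DATA.csv", "../data/output/results.csv", repo_root="/path/to/DEXTER")` would reproduce the command above, where `results.csv` and the repository path are placeholders.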
