CSCI 493.71: Big Data
Project I: Model of HetioNet
Khan_Rafi: Khinshan Khan and Shakil Rafi
- Python 3+
- mongo 4+
- neo4j 3.5+
- Java 8
- pymongo 3.9.0
- py2neo 4.3.0
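To confirm that an environment matches the versions listed above, a small check along these lines can help. It is only a sketch: the helper names installed_versions and report are made up for illustration and are not part of the project, and importlib.metadata requires Python 3.8 or newer.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Versions listed in the requirements above.
REQUIRED = {"pymongo": "3.9.0", "py2neo": "4.3.0"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(required=REQUIRED):
    """Return printable lines comparing installed and listed versions."""
    lines = ["Python %d.%d" % sys.version_info[:2]]
    for name, version in installed_versions(required).items():
        if version is None:
            status = "not installed"
        elif version == required[name]:
            status = "matches"
        else:
            status = "differs from the listed version"
        lines.append("%s: %s (listed %s, %s)" % (name, version, required[name], status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```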
The following instructions were tested on Arch Linux:
- start the MongoDB service:
sudo systemctl enable --now mongodb.service
- start the neo4j service:
sudo systemctl start neo4j.service
- in the file /etc/neo4j/neo4j.conf, make sure that dbms.directories.import is set to /var/lib/neo4j/import
- the directory /var/lib/neo4j/import/ should exist and your user should have read and write access to it:
  - by default, this directory will be owned by the neo4j user and the neo4j group
  - add your user to the neo4j group (log out and back in for the change to take effect):
    sudo usermod -a -G neo4j $(whoami)
- the default neo4j username is neo4j and the password is password
  - modify utils/common.py to match your neo4j username and password
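Before running the app, it can be useful to check that both services accept connections with the configured settings. The following is a minimal sketch, not the contents of utils/common.py: the NEO4J and MONGO dictionaries and the helper names are hypothetical, and the ports are the stock defaults (27017 for MongoDB, 7687 for Neo4j Bolt), which may differ on your machine.

```python
import socket

# Hypothetical settings mirroring the credentials mentioned above;
# the real utils/common.py may use different names and structure.
NEO4J = {"host": "localhost", "port": 7687, "user": "neo4j", "password": "password"}
MONGO = {"host": "localhost", "port": 27017}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def neo4j_uri(settings=NEO4J):
    """Build the Bolt URI from the settings dictionary."""
    return "bolt://%s:%d" % (settings["host"], settings["port"])


def connect_neo4j(settings=NEO4J):
    """Open a py2neo Graph; py2neo is imported lazily so the checks
    above still run on a machine where it is not installed."""
    from py2neo import Graph
    return Graph(neo4j_uri(settings), auth=(settings["user"], settings["password"]))


def preflight():
    """Report which of the two services is reachable."""
    return {
        "mongodb": port_open(MONGO["host"], MONGO["port"]),
        "neo4j": port_open(NEO4J["host"], NEO4J["port"]),
    }


if __name__ == "__main__":
    for service, reachable in preflight().items():
        print("%s: %s" % (service, "reachable" if reachable else "not reachable"))
```

A service reported as not reachable usually means it has not been started yet or listens on a non-default port.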
From download:
cd Khan_Rafi
python app.py
From GitHub:
git clone https://github.com/kkhan01/hetio-net
cd hetio-net
python app.py
This project models Hetionet to answer the following queries:
- Given a disease, what is its name, what are the names of drugs that can treat or palliate it, what are the names of genes that cause it, and where does it occur? Obtain and output this information in a single query.
  - This is done by creating a MongoDB database that stores every disease as a document with its relevant information.
- Suppose that a drug can treat a disease if the drug or one of its similar drugs up-regulates/down-regulates a gene, while the location where the disease occurs down-regulates/up-regulates that gene in the opposite direction. Find all drugs that can treat new diseases (i.e. the missing edges between drug and disease). Obtain and output the drug-disease pairs in a single query.
  - This is done by creating a neo4j database representing a graph of Hetionet and then querying the relevant paths.
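For the first query, the one-document-per-disease idea can be sketched as follows. This is an illustration under stated assumptions, not the project's actual loader: it assumes nodes arrive as (id, name, kind) triples and edges as (source, metaedge, target) triples, it reads "genes that cause this disease" as DaG edges, and the field names, the hetionet database name, and the diseases collection name are all hypothetical.

```python
# metaedge -> (document field, position of the disease in the (source, target) pair)
EDGE_FIELDS = {
    "CtD": ("treated_by", 1),    # compound -> disease
    "CpD": ("palliated_by", 1),  # compound -> disease
    "DaG": ("genes", 0),         # disease -> gene (assumed meaning of "cause")
    "DlA": ("anatomy", 0),       # disease -> anatomy
}


def build_disease_documents(nodes, edges):
    """Fold nodes and edges into one self-contained document per disease."""
    nodes = list(nodes)
    names = {node_id: name for node_id, name, _kind in nodes}
    docs = {}
    for node_id, name, kind in nodes:
        if kind == "Disease":
            docs[node_id] = {"_id": node_id, "name": name, "treated_by": [],
                             "palliated_by": [], "genes": [], "anatomy": []}
    for source, metaedge, target in edges:
        if metaedge not in EDGE_FIELDS:
            continue  # edge type is irrelevant to the disease documents
        field, disease_pos = EDGE_FIELDS[metaedge]
        pair = (source, target)
        disease_id, other_id = pair[disease_pos], pair[1 - disease_pos]
        if disease_id in docs:
            docs[disease_id][field].append(names.get(other_id, other_id))
    return list(docs.values())


def store_and_find(documents, disease_id):
    """Insert the documents and fetch one disease with a single find_one call.
    pymongo is imported lazily; a MongoDB server on localhost is assumed."""
    from pymongo import MongoClient
    collection = MongoClient("localhost", 27017)["hetionet"]["diseases"]
    collection.delete_many({})  # start from an empty collection
    collection.insert_many(documents)
    return collection.find_one({"_id": disease_id})
```

Because every related name is stored inside the disease document, answering the query needs only one lookup by disease id and no joins.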
The edge types are abbreviated as follows:
- CrC = Compound Resembles Compound
- CtD = Compound Treats Disease
- CpD = Compound Palliates Disease
- CuG = Compound Upregulates Gene
- CbG = Compound Binds Gene
- CdG = Compound Downregulates Gene
- DrD = Disease Resembles Disease
- DlA = Disease Localizes Anatomy
- DuG = Disease Upregulates Gene
- DaG = Disease Associates Gene
- DdG = Disease Downregulates Gene
- AuG = Anatomy Upregulates Gene
- AeG = Anatomy Expresses Gene
- AdG = Anatomy Downregulates Gene
- Gr>G = Gene Regulates Gene
- GcG = Gene Covaries Gene
- GiG = Gene Interacts Gene
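Using these abbreviations, the rule behind the second query can be sketched in two forms: a plain-Python reference that runs on a toy edge list, and a Cypher query of the same shape. Both are illustrations rather than the project's actual code. The Cypher text assumes node labels Compound, Gene, Anatomy, and Disease, relationship types named after the abbreviations, and a name property on every node; the graph built by this project may be modeled differently.

```python
from collections import defaultdict

# Hypothetical Cypher for the same rule; labels, relationship types and the
# name property are assumptions about how the graph was loaded.
CYPHER = """
MATCH (drug:Compound)-[:CrC*0..1]-(:Compound)-[:CuG]->(:Gene)<-[:AdG]-(:Anatomy)<-[:DlA]-(d:Disease)
WHERE NOT (drug)-[:CtD]->(d)
RETURN DISTINCT drug.name AS drug, d.name AS disease
UNION
MATCH (drug:Compound)-[:CrC*0..1]-(:Compound)-[:CdG]->(:Gene)<-[:AuG]-(:Anatomy)<-[:DlA]-(d:Disease)
WHERE NOT (drug)-[:CtD]->(d)
RETURN DISTINCT drug.name AS drug, d.name AS disease
"""


def candidate_pairs(edges):
    """Return sorted (drug, disease) pairs implied by the rule but not yet
    linked by CtD. edges is an iterable of (source, metaedge, target)."""
    by_type = defaultdict(set)
    for source, metaedge, target in edges:
        by_type[metaedge].add((source, target))

    similar = defaultdict(set)  # CrC is treated as undirected
    for a, b in by_type["CrC"]:
        similar[a].add(b)
        similar[b].add(a)

    up, down = defaultdict(set), defaultdict(set)  # gene -> regulating compounds
    for compound, gene in by_type["CuG"]:
        up[gene].add(compound)
    for compound, gene in by_type["CdG"]:
        down[gene].add(compound)

    diseases_at = defaultdict(set)  # anatomy -> diseases located there
    for disease, anatomy in by_type["DlA"]:
        diseases_at[anatomy].add(disease)

    known = by_type["CtD"]
    pairs = set()
    # Opposite directions: anatomy down vs. compound up, and vice versa.
    for anatomy_edge, regulators in (("AdG", up), ("AuG", down)):
        for anatomy, gene in by_type[anatomy_edge]:
            for compound in regulators.get(gene, ()):
                for drug in {compound} | similar.get(compound, set()):
                    for disease in diseases_at.get(anatomy, ()):
                        if (drug, disease) not in known:
                            pairs.add((drug, disease))
    return sorted(pairs)


def run_query(graph):
    """Run CYPHER on a py2neo Graph and return the rows as dictionaries."""
    return graph.run(CYPHER).data()
```

The plain-Python version is handy for checking the query result on a small hand-built graph before running it against the full database.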