## Working with DRKG in Deep Graph Library (DGL)
This notebook shows how to build a heterograph from DRKG in DGL and gives some examples of queries on the resulting heterograph. For more information about using DGL, please refer to https://www.dgl.ai/

In [1]:
import pandas as pd
import numpy as np
import dgl
import sys
sys.path.insert(1, '../utils')
from utils import download_and_extract
download_and_extract()
drkg_file = '../data/drkg/drkg.tsv'
df = pd.read_csv(drkg_file, sep ="\t", header=None)
triplets = df.values.tolist()

Using backend: pytorch


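Assign an ID to each node (entity). Create a dictionary keyed by node type; each value is itself a dictionary mapping a node of that type to an integer ID. IDs are assigned separately for each node type, starting from 0 in order of first appearance.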

In [2]:
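entity_dictionary = {}
def insert_entry(entry, ent_type, dic):
    # Create the per-type dictionary the first time a node type is seen.
    if ent_type not in dic:
        dic[ent_type] = {}
    # The next free ID for this node type is the number of nodes seen so far.
    ent_n_id = len(dic[ent_type])
    if entry not in dic[ent_type]:
        dic[ent_type][entry] = ent_n_id
    return dic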

for triple in triplets:
    src = triple[0]
    split_src = src.split('::')
    src_type = split_src[0]
    dest = triple[2]
    split_dest = dest.split('::')
    dest_type = split_dest[0]
    insert_entry(src,src_type,entity_dictionary)
    insert_entry(dest,dest_type,entity_dictionary)
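
To make the resulting structure concrete, here is a minimal sketch that applies the same ID-assignment logic to three made-up triples. The entity names, the relation names, and the helper `assign_ids` are illustrative assumptions, not rows or functions from DRKG.

```python
# Toy triples in DRKG's (head, relation, tail) layout; all names are made up.
toy_triplets = [
    ["Gene::1", "toy::interacts::Gene:Gene", "Gene::2"],
    ["Compound::A", "toy::targets::Compound:Gene", "Gene::1"],
    ["Compound::B", "toy::targets::Compound:Gene", "Gene::2"],
]

def assign_ids(triplets):
    """Map each entity to a per-type integer ID in order of first appearance."""
    ids = {}
    for head, _, tail in triplets:
        for entity in (head, tail):
            ent_type = entity.split("::")[0]
            type_ids = ids.setdefault(ent_type, {})
            # setdefault leaves the ID of an already-seen entity untouched.
            type_ids.setdefault(entity, len(type_ids))
    return ids

toy_entities = assign_ids(toy_triplets)
print(toy_entities)
```

Each node type gets its own ID space, so `Gene::1` and `Compound::A` both receive ID 0 without conflict.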

Create a dictionary of relations: the key is the relation, given as a (source node type, relation name, destination node type) tuple, and the value is the list of (source node ID, destination node ID) tuples.

In [3]:
edge_dictionary={}
for triple in triplets:
    src = triple[0]
    split_src = src.split('::')
    src_type = split_src[0]
    dest = triple[2]
    split_dest = dest.split('::')
    dest_type = split_dest[0]
    
    src_int_id = entity_dictionary[src_type][src]
    dest_int_id = entity_dictionary[dest_type][dest]
    
    pair = (src_int_id,dest_int_id)
    etype = (src_type,triple[1],dest_type)
    # Append the pair, creating the list the first time this edge type is seen.
    edge_dictionary.setdefault(etype, []).append(pair)
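
As an illustration of the shape this cell produces, the following sketch builds the same kind of relation dictionary from made-up triples and IDs. The toy data and the variable names are assumptions chosen for the example; only the layout mirrors the cell above.

```python
from collections import defaultdict

# Made-up triples and per-type IDs, in the same shape as the cells above.
toy_triplets = [
    ["Gene::1", "toy::interacts::Gene:Gene", "Gene::2"],
    ["Compound::A", "toy::targets::Compound:Gene", "Gene::1"],
    ["Compound::B", "toy::targets::Compound:Gene", "Gene::2"],
]
toy_entities = {
    "Gene": {"Gene::1": 0, "Gene::2": 1},
    "Compound": {"Compound::A": 0, "Compound::B": 1},
}

toy_edges = defaultdict(list)
for head, relation, tail in toy_triplets:
    head_type = head.split("::")[0]
    tail_type = tail.split("::")[0]
    # Translate entity names into their per-type integer IDs.
    pair = (toy_entities[head_type][head], toy_entities[tail_type][tail])
    toy_edges[(head_type, relation, tail_type)].append(pair)

toy_edges = dict(toy_edges)
for etype, pairs in toy_edges.items():
    print(etype, pairs)
```

The keys are (source type, relation, destination type) tuples, which is the form `dgl.heterograph` is given in the next cell.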

## Create a DGL heterograph using the dictionary of relations

In [4]:
graph = dgl.heterograph(edge_dictionary)
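
When node counts are not supplied, DGL, as its documentation describes it, takes the number of nodes of each type to be the largest ID seen in the edge lists plus one. The sketch below checks, on made-up data, that this agrees with the sizes of the per-type ID dictionaries. The toy data and variable names are assumptions; the DGL call itself is left as a comment so the example runs without DGL installed.

```python
# Made-up per-type IDs and edge lists, in the same shape as the cells above.
toy_entities = {
    "Gene": {"Gene::1": 0, "Gene::2": 1},
    "Compound": {"Compound::A": 0, "Compound::B": 1},
}
toy_edges = {
    ("Gene", "toy::interacts::Gene:Gene", "Gene"): [(0, 1)],
    ("Compound", "toy::targets::Compound:Gene", "Gene"): [(0, 0), (1, 1)],
}

# Node count per type, taken directly from the ID dictionaries.
num_nodes_dict = {ntype: len(ids) for ntype, ids in toy_entities.items()}

# Largest ID seen per node type in the edge lists, plus one.
inferred = {}
for (src_type, _, dst_type), pairs in toy_edges.items():
    for src_id, dst_id in pairs:
        inferred[src_type] = max(inferred.get(src_type, 0), src_id + 1)
        inferred[dst_type] = max(inferred.get(dst_type, 0), dst_id + 1)

print(num_nodes_dict)
print(inferred == num_nodes_dict)

# With DGL installed, the counts could be passed explicitly:
# graph = dgl.heterograph(toy_edges, num_nodes_dict=num_nodes_dict)
```

Passing `num_nodes_dict` explicitly is mainly useful when some nodes have no edges, since such nodes cannot be inferred from the edge lists alone.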

## Print the statistics of the created graph

Number of nodes for each node-type

In [5]:
total_nodes = 0
for ntype in graph.ntypes:
    print(ntype, '\t', graph.number_of_nodes(ntype))
    total_nodes += graph.number_of_nodes(ntype)
print("Graph contains {} nodes from {} node-types.".format(total_nodes, len(graph.ntypes)))

Anatomy 	 400
Atc 	 4048
Biological Process 	 11381
Cellular Component 	 1391
Compound 	 24313
Disease 	 5103
Gene 	 39220
Molecular Function 	 2884
Pathway 	 1822
Pharmacologic Class 	 345
Side Effect 	 5701
Symptom 	 415
Tax 	 215
Graph contains 97238 nodes from 13 node-types.


Number of edges for each relation (edge-type)

In [6]:
total_edges = 0
for etype in graph.etypes:
    print(etype, '\t', graph.number_of_edges(etype))
    total_edges += graph.number_of_edges(etype)
print("Graph contains {} edges from {} edge-types.".format(total_edges, len(graph.etypes)))

bioarx::HumGenHumGen:Gene:Gene 	 58094
bioarx::VirGenHumGen:Gene:Gene 	 535
bioarx::DrugVirGen:Compound:Gene 	 1165
bioarx::DrugHumGen:Compound:Gene 	 24501
bioarx::Covid2_acc_host_gene::Disease:Gene 	 332
bioarx::Coronavirus_ass_host_gene::Disease:Gene 	 129
DGIDB::INHIBITOR::Gene:Compound 	 5971
DGIDB::ANTAGONIST::Gene:Compound 	 3006
DGIDB::OTHER::Gene:Compound 	 11070
DGIDB::AGONIST::Gene:Compound 	 3012
DGIDB::BINDER::Gene:Compound 	 143
DGIDB::MODULATOR::Gene:Compound 	 243
DGIDB::BLOCKER::Gene:Compound 	 979
DGIDB::CHANNEL BLOCKER::Gene:Compound 	 352
DGIDB::ANTIBODY::Gene:Compound 	 188
DGIDB::POSITIVE ALLOSTERIC MODULATOR::Gene:Compound 	 618
DGIDB::ALLOSTERIC MODULATOR::Gene:Compound 	 317
DGIDB::ACTIVATOR::Gene:Compound 	 316
DGIDB::PARTIAL AGONIST::Gene:Compound 	 75
DRUGBANK::x-atc::Compound:Atc 	 15750
DRUGBANK::ddi-interactor-in::Compound:Compound 	 1379271
DRUGBANK::target::Compound:Gene 	 19158
DRUGBANK::enzyme::Compound:Gene 	 4923
DRUGBANK::carrier::Compound:Gene 	 7
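
Because each relation name starts with its data source followed by `::`, the per-relation counts can also be rolled up by source. The sketch below does this for a handful of counts copied from the output above; the dictionary `edge_counts` and the variable names are assumptions made for the example.

```python
from collections import Counter

# A few (relation, edge count) pairs copied from the output above.
edge_counts = {
    "bioarx::HumGenHumGen:Gene:Gene": 58094,
    "bioarx::VirGenHumGen:Gene:Gene": 535,
    "DGIDB::INHIBITOR::Gene:Compound": 5971,
    "DGIDB::ANTAGONIST::Gene:Compound": 3006,
    "DRUGBANK::enzyme::Compound:Gene": 4923,
    "DRUGBANK::carrier::Compound:Gene": 7,
}

edges_per_source = Counter()
for relation, count in edge_counts.items():
    # The data source is the text before the first '::'.
    source = relation.split("::")[0]
    edges_per_source[source] += count

for source, count in edges_per_source.items():
    print(source, "\t", count)
```

On the full graph, `edge_counts` could be filled with `{etype: graph.number_of_edges(etype) for etype in graph.etypes}` instead of the hand-copied subset.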

Printing the graph directly with print(graph) also shows a summary of the graph.

In [7]:
print(graph)

Graph(num_nodes={'Anatomy': 400, 'Atc': 4048, 'Biological Process': 11381, 'Cellular Component': 1391, 'Compound': 24313, 'Disease': 5103, 'Gene': 39220, 'Molecular Function': 2884, 'Pathway': 1822, 'Pharmacologic Class': 345, 'Side Effect': 5701, 'Symptom': 415, 'Tax': 215},
      num_edges={('Gene', 'bioarx::HumGenHumGen:Gene:Gene', 'Gene'): 58094, ('Gene', 'bioarx::VirGenHumGen:Gene:Gene', 'Gene'): 535, ('Compound', 'bioarx::DrugVirGen:Compound:Gene', 'Gene'): 1165, ('Compound', 'bioarx::DrugHumGen:Compound:Gene', 'Gene'): 24501, ('Disease', 'bioarx::Covid2_acc_host_gene::Disease:Gene', 'Gene'): 332, ('Disease', 'bioarx::Coronavirus_ass_host_gene::Disease:Gene', 'Gene'): 129, ('Gene', 'DGIDB::INHIBITOR::Gene:Compound', 'Compound'): 5971, ('Gene', 'DGIDB::ANTAGONIST::Gene:Compound', 'Compound'): 3006, ('Gene', 'DGIDB::OTHER::Gene:Compound', 'Compound'): 11070, ('Gene', 'DGIDB::AGONIST::Gene:Compound', 'Compound'): 3012, ('Gene', 'DGIDB::BINDER::Gene:Compound', 'Compound'): 143, ('Gen