## Working with CTKG in Deep Graph Library (DGL)
This notebook shows how to build a heterograph from CTKG in DGL and gives some examples of queries on the resulting heterograph. For more information about using DGL, please refer to https://www.dgl.ai/.
This notebook builds on the notebook from DRKG: https://github.com/gnn4dr/DRKG

In [None]:
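import pandas as pd
import numpy as np
import dgl
import sys
sys.path.insert(1, 'utils')
from utils import download_and_extract
# Uncomment to run the download helper if 'ctkg.tsv' is not available locally.
#download_and_extract()
ctkg_file = 'ctkg.tsv'
# The file has no header row; each row is a (source, relation, destination) triplet.
df = pd.read_csv(ctkg_file, sep="\t", header=None)
triplets = df.values.tolist()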

Assign an integer ID to each node (entity). The result is a dictionary keyed by node type; each value is itself a dictionary that maps a node name to its ID within that type.

In [None]:
entity_dictionary = {}
def insert_entry(entry, ent_type, dic):
    # Create the per-type dictionary the first time a node type is seen.
    if ent_type not in dic:
        dic[ent_type] = {}
    # The next free ID is the number of nodes already registered for this type.
    ent_n_id = len(dic[ent_type])
    if entry not in dic[ent_type]:
        dic[ent_type][entry] = ent_n_id
    return dic

for triple in triplets:
    # The text before the first '::' in a node name is its node type.
    src = triple[0]
    split_src = src.split('::')
    src_type = split_src[0]
    dest = triple[2]
    split_dest = dest.split('::')
    dest_type = split_dest[0]
    insert_entry(src, src_type, entity_dictionary)
    insert_entry(dest, dest_type, entity_dictionary)
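
To make the resulting structure concrete, the following minimal sketch applies the same ID-assignment idea to three made-up triplets. The node and relation names are hypothetical and are not taken from CTKG. It also builds a reverse ID-to-name lookup, which may be handy when interpreting node IDs returned by later queries.

```python
# Toy triplets in the same 'type::identifier' form; all names are hypothetical.
toy_triplets = [
    ["Drug::d1", "treats", "Disease::x1"],
    ["Drug::d2", "treats", "Disease::x1"],
    ["Drug::d1", "interacts", "Drug::d2"],
]

toy_entities = {}
for src, _, dst in toy_triplets:
    for node in (src, dst):
        node_type = node.split("::")[0]
        ids = toy_entities.setdefault(node_type, {})
        # Assign the next free ID only the first time a node is seen.
        ids.setdefault(node, len(ids))

# Reverse lookup: node type -> list where position i holds the name of node i.
toy_id_to_name = {
    node_type: sorted(ids, key=ids.get) for node_type, ids in toy_entities.items()
}

print(toy_entities)
print(toy_id_to_name)
```

IDs are local to each node type, so `Drug::d1` and `Disease::x1` can both have ID 0 without conflict.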

Create a dictionary of relations: each key is a (source node type, relation, destination node type) tuple and each value is the list of (source node ID, destination node ID) tuples for that relation.

In [None]:
edge_dictionary = {}
for triple in triplets:
    src = triple[0]
    split_src = src.split('::')
    src_type = split_src[0]
    dest = triple[2]
    split_dest = dest.split('::')
    dest_type = split_dest[0]

    # Look up the per-type integer IDs assigned in the previous cell.
    src_int_id = entity_dictionary[src_type][src]
    dest_int_id = entity_dictionary[dest_type][dest]

    pair = (src_int_id, dest_int_id)
    # Key edges by (source type, relation, destination type).
    etype = (src_type, triple[1], dest_type)
    if etype in edge_dictionary:
        edge_dictionary[etype].append(pair)
    else:
        edge_dictionary[etype] = [pair]
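
As an illustration of the shape of this dictionary, here is a small sketch on made-up data. The triplets and the node-ID dictionary are hypothetical and are not taken from CTKG, and `defaultdict` is used here only as a compact alternative to the explicit `if`/`else`.

```python
from collections import defaultdict

# Hypothetical triplets and a matching node-ID dictionary (not from CTKG).
toy_triplets = [
    ["Drug::d1", "treats", "Disease::x1"],
    ["Drug::d2", "treats", "Disease::x1"],
    ["Drug::d1", "interacts", "Drug::d2"],
]
toy_entities = {
    "Drug": {"Drug::d1": 0, "Drug::d2": 1},
    "Disease": {"Disease::x1": 0},
}

toy_edges = defaultdict(list)
for src, rel, dst in toy_triplets:
    src_type = src.split("::")[0]
    dst_type = dst.split("::")[0]
    # Key: (source type, relation, destination type); value: list of ID pairs.
    toy_edges[(src_type, rel, dst_type)].append(
        (toy_entities[src_type][src], toy_entities[dst_type][dst])
    )

for etype, pairs in toy_edges.items():
    print(etype, pairs)
```

Each pair holds IDs that are only meaningful together with the node types in the key, which is why the key carries both the source and the destination type.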

## Create a DGL heterograph using the dictionary of relations

In [None]:
graph = dgl.heterograph(edge_dictionary)
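
When no explicit node counts are passed, `dgl.heterograph` infers the number of nodes of each type as the largest node ID that appears in the edge lists for that type, plus one. The sketch below reproduces that count in plain Python on a hypothetical toy edge dictionary. It does not call DGL, and the helper name `infer_num_nodes` and the toy data are illustrative assumptions only.

```python
def infer_num_nodes(edge_dict):
    """Return {node type: largest node ID seen + 1} for an edge dictionary."""
    counts = {}
    for (src_type, _, dst_type), pairs in edge_dict.items():
        for src_id, dst_id in pairs:
            counts[src_type] = max(counts.get(src_type, 0), src_id + 1)
            counts[dst_type] = max(counts.get(dst_type, 0), dst_id + 1)
    return counts

# Hypothetical edge dictionary in the same format as edge_dictionary.
toy_edges = {
    ("Drug", "treats", "Disease"): [(0, 0), (1, 0)],
    ("Drug", "interacts", "Drug"): [(0, 1)],
}
toy_counts = infer_num_nodes(toy_edges)
print(toy_counts)
```

In this notebook every node comes from a triplet and IDs are assigned consecutively from 0, so the same computation on `edge_dictionary` should agree with `len(entity_dictionary[ntype])` for each node type; a mismatch would point to a problem in the ID assignment.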

## Print the statistics of the created graph

Number of nodes for each node-type

In [None]:
total_nodes = 0
for ntype in graph.ntypes:
    print(ntype, '\t', graph.number_of_nodes(ntype))
    total_nodes += graph.number_of_nodes(ntype)
print("Graph contains {} nodes from {} node-types.".format(total_nodes, len(graph.ntypes)))

Number of edges for each relation (edge-type)

In [None]:
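total_edges = 0
# Note: if the same relation name is used between several node-type pairs,
# DGL treats it as ambiguous here; iterate over graph.canonical_etypes instead.
for etype in graph.etypes:
    print(etype, '\t', graph.number_of_edges(etype))
    total_edges += graph.number_of_edges(etype)
print("Graph contains {} edges from {} edge-types.".format(total_edges, len(graph.etypes)))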

Printing the graph itself (`print(graph)`) also shows a summary of the graph.

In [None]:
print(graph)