### Load Knowledge graph

In [None]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

In [None]:
fp = "/content/drive/MyDrive/Project/data/"

df = pd.read_csv(fp+"triples_full.csv")
df.head()

Unnamed: 0,head_entity,relation,tail_entity,head_label,tail_label
0,ARF5,interacts_with,ACAP2,gene,gene
1,ARF5,interacts_with,RAB1A,gene,gene
2,ARF5,interacts_with,COPE,gene,gene
3,ARF5,interacts_with,ACAP1,gene,gene
4,ARF5,interacts_with,COPZ1,gene,gene


### Create graph object
The following code creates a graph from the triples DataFrame by adding the nodes and connecting them with the edges listed in each row. NetworkX then lets us run graph operations on the result.


In [None]:
G = nx.MultiDiGraph()

for _, row in df.iterrows():
    head = row['head_entity']
    relation = row['relation']
    tail = row['tail_entity']
    head_label = row['head_label']
    tail_label = row['tail_label']
    
    # Add edge
    G.add_edge(head, tail, relation=relation)
    
    # Add label to head node if it doesn't already have one
    if 'label' not in G.nodes[head]:
        G.nodes[head]['label'] = head_label
    
    # Add label to tail node if it doesn't already have one
    if 'label' not in G.nodes[tail]:
        G.nodes[tail]['label'] = tail_label
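To make the construction step easier to check, here is a minimal sketch on a toy table. The rows and the `DRUG_X` node are hypothetical; only the column layout follows `triples_full.csv`. It shows that a `MultiDiGraph` keeps parallel edges between the same pair of nodes, one per relation, and that the first label seen for a node is retained.

```python
import networkx as nx
import pandas as pd

# Toy triples (hypothetical values, same column layout as triples_full.csv)
toy = pd.DataFrame({
    "head_entity": ["ARF5", "ARF5", "DRUG_X", "DRUG_X"],
    "relation": ["interacts_with", "interacts_with", "targets", "inhibits"],
    "tail_entity": ["ACAP2", "RAB1A", "ARF5", "ARF5"],
    "head_label": ["gene", "gene", "drug", "drug"],
    "tail_label": ["gene", "gene", "gene", "gene"],
})

toy_G = nx.MultiDiGraph()
for row in toy.itertuples(index=False):
    # One edge per triple; parallel edges are kept in a MultiDiGraph
    toy_G.add_edge(row.head_entity, row.tail_entity, relation=row.relation)
    # setdefault keeps the first label assigned to a node
    toy_G.nodes[row.head_entity].setdefault("label", row.head_label)
    toy_G.nodes[row.tail_entity].setdefault("label", row.tail_label)

print(toy_G.number_of_nodes(), toy_G.number_of_edges())
# Both relations from DRUG_X to ARF5 survive as separate edges
drug_relations = sorted(d["relation"] for _, _, d in toy_G.edges("DRUG_X", data=True))
print(drug_relations)
```

`itertuples` is used here only because it tends to be faster than `iterrows` on large tables; the resulting graph is the same.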

- `predicted_drugs.txt` contains the list of drugs predicted by the ML model
- `DEGs_AD.txt` contains the list of differentially expressed genes (DEGs) from Alzheimer's disease (AD) brain tissue vs. control

In [None]:
with open(fp+'predicted_drugs.txt') as f:
    drugs = [line.strip() for line in f.readlines()]

with open(fp+'DEGs_AD.txt') as f:
    genes = [line.strip() for line in f.readlines()]

disease = ["MESH:D000544"] # Id for Alzheimer's Disease

In [None]:
len(genes)

529

There are 529 DEGs from the AD vs. control analysis. Let's check how many of them are present in the knowledge graph.

In [None]:
degs_in_kg = [gene for gene in genes if gene in G.nodes()]
len(degs_in_kg)

497

- 32 DEGs are not present in the graph
- The next step is to create a subgraph (subnetwork) consisting of the AD node, the top predicted drugs, and the `degs_in_kg`
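Before moving on, it can help to see which DEGs are missing and whether the gene list contains repeated entries, since both affect the counts above. A minimal sketch on hypothetical stand-in data (`toy_genes` and `toy_kg_nodes` are made up; in the notebook they would be `genes` and `set(G.nodes())`):

```python
# Hypothetical stand-ins for `genes` and the node set of G
toy_genes = ["APP", "PSEN1", "LOC123", "APP"]
toy_kg_nodes = {"APP", "PSEN1", "MAPT"}

# Genes from the list that have no node in the graph
missing = sorted(set(toy_genes) - toy_kg_nodes)
# Number of repeated entries in the gene list
n_duplicates = len(toy_genes) - len(set(toy_genes))

print(missing)
print(n_duplicates)
```

Missing symbols are often aliases or outdated names, so listing them is usually the first step toward recovering them.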

In [None]:
subgraph_nodes = drugs + degs_in_kg + disease

Include the one-hop neighbours of the drug nodes in the subgraph so that it has enough connections. I could also add two-hop neighbours, or the neighbours of `degs_in_kg`, but that becomes computationally expensive. This analysis is for demonstration purposes and is not a comprehensive study. Note that the expansion code in the next cell is currently commented out.

In [None]:
#for drug in drugs:
#    neighbors = list(nx.neighbors(G, drug))
#    subgraph_nodes.extend(neighbors)

# remove duplicates
#subgraph_nodes = list(set(subgraph_nodes))
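If the expansion is re-enabled, one detail is worth checking: on a directed graph, `nx.neighbors` follows outgoing edges only, so nodes that point to a drug are not picked up. The sketch below, on a hypothetical two-edge graph, contrasts it with `nx.all_neighbors`, which returns both predecessors and successors. Which behaviour is wanted depends on how the triples are oriented, so treat this as an illustration rather than a recommendation.

```python
import networkx as nx

# Hypothetical toy graph: one edge out of the drug, one edge into it
toy_G = nx.MultiDiGraph()
toy_G.add_edge("DRUG_X", "GENE_A", relation="targets")
toy_G.add_edge("GENE_B", "DRUG_X", relation="metabolizes")

# Successors only (what nx.neighbors returns on a directed graph)
out_only = set(toy_G.neighbors("DRUG_X"))
# Predecessors and successors
both = set(nx.all_neighbors(toy_G, "DRUG_X"))

print(sorted(out_only))
print(sorted(both))
```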

In [None]:
len(subgraph_nodes)

### Subgraph

In [None]:
subgraph = G.subgraph(subgraph_nodes)
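`G.subgraph` returns the induced subgraph: only edges whose two endpoints are both in the node list are kept. A minimal sketch with hypothetical nodes shows that a path running through an excluded node disappears, which is one reason a subgraph built from drugs, DEGs, and the disease node alone can be sparse. The result is also a read-only view of `G`; calling `.copy()` gives an independent graph.

```python
import networkx as nx

# Hypothetical chain: DRUG_X -> GENE_A -> GENE_B -> DISEASE
toy_G = nx.MultiDiGraph()
toy_G.add_edge("DRUG_X", "GENE_A", relation="targets")
toy_G.add_edge("GENE_A", "GENE_B", relation="interacts_with")
toy_G.add_edge("GENE_B", "DISEASE", relation="associated_with")

# GENE_B is left out, so both edges touching it are dropped
view = toy_G.subgraph(["DRUG_X", "GENE_A", "DISEASE"])
kept_edges = list(view.edges())
print(kept_edges)

# An independent, editable copy of the view
standalone = view.copy()
standalone.add_edge("GENE_A", "DISEASE", relation="hypothetical")
print(standalone.number_of_edges(), view.number_of_edges())
```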

### Closeness Centrality

Closeness centrality measures how "close" or "central" a node is in a network. The closeness centrality of a node u is the reciprocal of the average shortest-path distance to u over all n-1 reachable nodes.

$$
C(u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}
$$


 
where d(v, u) is the shortest-path distance between v and u, and n-1 is the number of nodes reachable from u.

Generally, a higher closeness centrality value indicates that a node is closer, in terms of average shortest-path distance, to all other nodes in the network. Such a node can reach other nodes more quickly, which can imply better access to, or influence over, information and resources in the network.
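A small worked example may help when reading the scores. Two defaults of `nx.closeness_centrality` matter here: on directed graphs it uses incoming distances (paths that lead to u), and with `wf_improved=True` it multiplies the formula above by the fraction of nodes that can reach u. In a sparse or disconnected graph this scaling can make the values very small. The toy graph below is hypothetical: a path a -> b -> c plus an isolated node d.

```python
import networkx as nx

toy = nx.DiGraph()
toy.add_edges_from([("a", "b"), ("b", "c")])
toy.add_node("d")  # isolated node

# Default: Wasserman-Faust scaling by the reachable fraction
scaled = nx.closeness_centrality(toy)
# Plain formula: (n - 1) / sum of distances, over reachable nodes only
raw = nx.closeness_centrality(toy, wf_improved=False)

# For c: a and b reach it at distances 2 and 1, so raw = 2 / 3.
# Two of the three other nodes reach c, so scaled = (2 / 3) * (2 / 3).
print(raw["c"], scaled["c"])
# Nothing points to a, so its score is 0
print(scaled["a"])
```

Comparing the `raw` and `scaled` values for the same node is a quick way to see how much of a low score comes from poor reachability rather than long paths.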





In [None]:
res = nx.closeness_centrality(subgraph)

# Note: this reuses the name `df`, replacing the triples table loaded earlier
df = pd.DataFrame(list(res.items()), columns=["Drug", "Closeness Centrality"])

# Keep only the predicted drugs and rank them in descending order
df = (
    df[df.Drug.isin(drugs)]
    .sort_values(by="Closeness Centrality", ascending=False)
    .reset_index(drop=True)
)

df.head()

Unnamed: 0,Drug,Closeness Centrality
0,5300-03-8,0.003407
1,4759-48-2,0.003407
2,302-79-4,0.003407
3,50-02-2,0.001704
4,378-44-9,0.001704


Select the top 20 candidates as the final result.

In [None]:
df1 = df.head(20)

In [None]:
drug_info = pd.read_csv(fp+"drugs_TTD.csv")
drug_info.head()

In [None]:
drugs_pred = drug_info[(drug_info.cas_rn.isin(df1.Drug))]
drugs_pred = drugs_pred[["drug_name", "status","cas_rn", "smiles"]].reset_index(drop=True)
drugs_pred

Unnamed: 0,drug_name,status,cas_rn,smiles
0,Triclosan,Approved,3380-34-5,C1=CC(=C(C=C1Cl)O)OC2=C(C=C(C=C2)Cl)Cl
1,Isotretinoin,Approved,4759-48-2,CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
2,Tretinoin,Approved,302-79-4,CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CC(=O)O)C)C
3,Doxazosin,Approved,74191-85-8,COC1=C(C=C2C(=C1)C(=NC(=N2)N3CCN(CC3)C(=O)C4CO...
4,Pioglitazone,Approved,111025-46-8,CCC1=CN=C(C=C1)CCOC2=CC=C(C=C2)CC3C(=O)NC(=O)S3
5,Dexamethasone,Approved,50-02-2,CC1CC2C3CCC4=CC(=O)C=CC4(C3(C(CC2(C1(C(=O)CO)O...
6,Niclosamide,Approved,50-65-7,C1=CC(=C(C=C1[N+](=O)[O-])Cl)NC(=O)C2=C(C=CC(=...
7,Hydralazine,Approved,86-54-4,C1=CC=C2C(=C1)C=NN=C2NN
8,Hydrocortisone,Approved,50-23-7,CC12CCC(=O)C=C1CCC3C2C(CC4(C3CCC4(C(=O)CO)O)C)O
9,Cabergoline,Approved,81409-90-7,CCNC(=O)N(CCCN(C)C)C(=O)C1CC2C(CC3=CNC4=CC=CC2...


Save the predicted drugs to file. The next step is to validate the drug categories against the literature.

In [None]:
drugs_pred.to_csv(fp+"pred_drugs_top20.csv", index = False)

Write the subgraph to file so that it can be visualised in Cytoscape.

In [None]:
import csv

subgraph_edges = subgraph.edges(data=True)

# Open a CSV file for writing
with open(fp+'subgraph.csv', 'w', newline='') as csvfile:
    fieldnames = ['source', 'relation', 'target', 'head_label', 'tail_label']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    # Write each edge to the CSV file
    for edge in subgraph_edges:
        source = edge[0]
        target = edge[1]
        relation = edge[2]['relation']
        head_label = subgraph.nodes[source]['label']
        tail_label = subgraph.nodes[target]['label']
        writer.writerow({'source': source, 'relation': relation, 'target': target, 'head_label': head_label, 'tail_label': tail_label})

print("Subgraph has been saved as subgraph.csv")


Subgraph has been saved as subgraph.csv


### Add current indication of pred_drugs

In [None]:
drug_indication = pd.read_csv(fp+"drug_indication.csv")
drug_indication = drug_indication[drug_indication.cas_rn.isin(drugs_pred.cas_rn.values)]
drug_indication

In [None]:
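# Map CAS number (4th column) to indication (2nd column); .iloc makes the positional lookup explicit
cas2dx = {row.iloc[3]: row.iloc[1] for _, row in drug_indication.iterrows()}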

In [None]:
drugs_pred["Indication"] = drugs_pred.cas_rn.map(cas2dx)
drugs_pred
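One caveat with a dictionary lookup like `cas2dx`: if a drug appears on several rows of the indication table, only the last row's value is kept. The sketch below shows one possible alternative that joins all indications per CAS number before mapping. The data are made up, and the column name `indication` is an assumption, since the real column names of `drug_indication.csv` are not shown here.

```python
import pandas as pd

# Hypothetical indication table; "indication" is an assumed column name
toy_ind = pd.DataFrame({
    "cas_rn": ["50-02-2", "50-02-2", "86-54-4"],
    "indication": ["Inflammation", "Asthma", "Hypertension"],
})

# Collapse multiple rows per drug into one semicolon-separated string
cas2all = toy_ind.groupby("cas_rn")["indication"].agg("; ".join)

toy_pred = pd.DataFrame({"cas_rn": ["50-02-2", "86-54-4", "302-79-4"]})
# Drugs without any listed indication end up as NaN
toy_pred["Indication"] = toy_pred["cas_rn"].map(cas2all)
print(toy_pred)
```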

In [None]:
drugs_pred.to_excel(fp+'predicted_drugs.xlsx', index=False)

# Thank You