# On the foundations of network-based drug repurposing

## The paradigm of network-based drug repurposing (NetDR)

1. Construct **heterogeneous disease-gene-protein-drug networks**.
    - **Disease-gene** edges indicate disease-gene associations.
    - **Gene-protein** edges indicate that a protein is encoded by a gene.
    - **Disease-protein** edges indicate that a protein is encoded by a gene associated with a disease.
    - **Drug-protein** edges indicate that a drug targets a protein.
    - **Protein-protein** edges indicate PPIs.
    - **Gene-gene** edges indicate that the encoded proteins interact.


2. Suggest **drug repurposing candidates** $C$ for disease $D$ such that:
    - Drug $C$ targets a protein which is associated with disease $D$ (or close to a protein associated with $D$).
    - Drug $C$ is indicated for a disease $D^\prime$ which is close to disease $D$.
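
The first criterion can be illustrated with a minimal sketch. It is not the pipeline used in this work: the edge list, the node-name prefixes (`disease:`, `gene:`, `protein:`, `drug:`), and the `max_hops` cutoff are all hypothetical choices made for illustration. The second criterion (drugs indicated for a nearby disease $D^\prime$) is not covered here.

```python
from collections import defaultdict, deque

# Hypothetical toy edges; every identifier below is made up for illustration.
edges = [
    ("disease:D", "gene:g1"),      # disease-gene association
    ("gene:g1", "protein:p1"),     # gene encodes protein
    ("protein:p1", "protein:p2"),  # PPI
    ("drug:C1", "protein:p2"),     # drug targets a protein near the disease module
    ("drug:C2", "protein:p3"),     # drug targets an unrelated protein
    ("protein:p3", "protein:p4"),  # PPI
]

def build_graph(edges):
    """Undirected adjacency sets for the heterogeneous network."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def disease_proteins(graph, disease):
    """Proteins encoded by genes that are associated with the disease."""
    proteins = set()
    for gene in graph[disease]:
        if gene.startswith("gene:"):
            proteins |= {n for n in graph[gene] if n.startswith("protein:")}
    return proteins

def candidates(graph, disease, max_hops=1):
    """Drugs targeting a protein within max_hops PPI steps of a disease protein."""
    dist = {p: 0 for p in disease_proteins(graph, disease)}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the assumed cutoff
        for nb in graph[node]:
            if nb.startswith("protein:") and nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return sorted({n for p in dist for n in graph[p] if n.startswith("drug:")})

graph = build_graph(edges)
print(candidates(graph, "disease:D"))
```

With `max_hops=0`, only drugs that directly target a disease-associated protein would be returned; larger values widen the neighborhood and thereby the candidate list.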

## Background

- NetDR is **intuitively appealing** because it leverages **molecular disease profiles** for drug prediction.
- **However:** The underlying **data cannot necessarily be trusted**: literature bias in PPI networks, possibly misleading symptom- or organ-based disease ontologies, etc.

## Research question

- Is there a **data-informed foundation** for NetDR **beyond its intuitive appeal**? 
- Is there **systematic evidence** in the data which would lend **prior plausibility** to drug-repurposing candidates generated via NetDR?

## Overall approach

- Compare heterogeneous networks used by NetDR with three **reference networks**:
    - A **disease-disease network**, where edges indicate **comorbidity** and are attributed with the $\phi$-correlation.
    - A **disease-disease network**, where edges indicate that there is at least one drug indicated for both diseases and edges are attributed with the Jaccard index over the **shared drugs**.
    - A **drug-drug network**, where edges indicate that there is at least one common indication and edges are attributed with the Jaccard index over the **shared indications**.
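
As a rough sketch of the two edge attributes, the functions below compute the Jaccard index over two sets and the $\phi$-correlation from patient counts. The count-based $\phi$ formula is the standard one for two binary variables; how the counts are obtained in the actual comorbidity data is not specified here, and the names `n_both`, `n_1`, `n_2`, `n_total` as well as the toy `indications` mapping are assumptions.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def phi_correlation(n_both, n_1, n_2, n_total):
    """Phi coefficient of two binary diagnoses.

    n_both: patients with both diseases; n_1, n_2: patients with each disease;
    n_total: all patients. Returns 0.0 if the denominator vanishes.
    """
    denom = sqrt(n_1 * n_2 * (n_total - n_1) * (n_total - n_2))
    return (n_both * n_total - n_1 * n_2) / denom if denom else 0.0

# Hypothetical disease -> indicated drugs mapping.
indications = {
    "D1": {"drugA", "drugB"},
    "D2": {"drugB", "drugC"},
    "D3": {"drugD"},
}

# Shared-drug disease-disease network: an edge exists only if at least one drug is shared.
shared_drug_edges = {
    (d1, d2): jaccard(indications[d1], indications[d2])
    for d1, d2 in combinations(sorted(indications), 2)
    if indications[d1] & indications[d2]
}
print(shared_drug_edges)
```

The drug-drug reference network follows the same pattern with the mapping inverted (drug to set of indications).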

## Distances vs. reference networks: test protocol (I)

### Disease view

1. For all disease pairs $(D_1,D_2)$, compute **shortest-path distances** in the network with **disease-gene** and **gene-gene** edges (ignoring all disease nodes except $D_1$ and $D_2$, so that the path from $D_1$ to $D_2$ contains only genes as inner nodes).
2. Relate the obtained distances to **comorbidity data** (binary and $\phi$-correlation) and **shared drug indications** (binary and Jaccard index).
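
The restricted shortest-path computation in step 1 can be sketched as a breadth-first search that only expands through permitted inner nodes. This is an illustrative sketch on an unweighted toy graph; the function name `restricted_distance`, the `is_inner` predicate, and the node names are hypothetical.

```python
from collections import deque

def restricted_distance(graph, source, target, is_inner):
    """BFS distance from source to target using only nodes with is_inner(node) as inner nodes.

    Returns float('inf') if no such path exists.
    """
    if source == target:
        return 0
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb == target:
                return dist[node] + 1
            if nb not in dist and is_inner(nb):
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return float("inf")

# Hypothetical toy network: diseases D1..D3, genes g1..g4.
graph = {
    "D1": {"g1"},
    "D2": {"g4"},
    "D3": {"g1", "g4"},
    "g1": {"D1", "D3", "g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3", "D2", "D3"},
}

def is_gene(node):
    return node.startswith("g")

# Genes only as inner nodes: D1-g1-g2-g3-g4-D2.
print(restricted_distance(graph, "D1", "D2", is_gene))
# Without the restriction, the shortcut through disease D3 would be used.
print(restricted_distance(graph, "D1", "D2", lambda node: True))
```

The same helper applies to the drug view if the predicate admits only protein nodes as inner nodes.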

### Drug view

1. For all drug pairs $(C_1,C_2)$, compute **shortest-path distances** in the network with **drug-protein** and **protein-protein** edges (ignoring all drug nodes except $C_1$ and $C_2$, so that the path from $C_1$ to $C_2$ contains only proteins as inner nodes).
2. Relate the obtained distances to **shared indications** (binary and Jaccard index).

## Distances vs. reference networks: test protocol (II)

### Alternative hypotheses for disease view

- Distances for disease pairs with comorbidity edges or shared drug edges are **significantly shorter** than distances for disease pairs without such edges.
- The distances are **negatively associated** with the $\phi$-correlation and the Jaccard index.

### Alternative hypotheses for drug view

- Distances for drug pairs with shared indication edges are **significantly shorter** than distances for drug pairs without such edges.
- The distances are **negatively associated** with the Jaccard index.
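
The two kinds of hypotheses could be tested in several ways, and the statistical tests actually used are not stated here. The sketch below therefore shows one possible choice under that assumption: a one-sided permutation test on the difference in mean distance, and a Spearman rank correlation between distances and edge scores. All data values are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(with_edge, without_edge, n_perm=2000, seed=0):
    """One-sided p-value for H1: mean(with_edge) < mean(without_edge)."""
    rng = random.Random(seed)
    observed = mean(with_edge) - mean(without_edge)
    pooled = list(with_edge) + list(without_edge)
    k = len(with_edge)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate strictly positive

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical distances for pairs with and without a reference edge.
dist_with_edge = [2, 2, 3, 3, 2, 3]
dist_without_edge = [4, 5, 4, 6, 5, 4]
print(permutation_p_value(dist_with_edge, dist_without_edge))

# Hypothetical distances and Jaccard scores for pairs that have a reference edge.
distances = [1, 2, 3, 4, 5]
scores = [0.9, 0.7, 0.5, 0.2, 0.1]
print(spearman(distances, scores))
```

A negative rank correlation together with a small permutation $p$-value would support the alternative hypotheses; with many pairs, a rank-based two-sample test would be a cheaper alternative to permutation.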

## Distances vs. reference networks: reference edge vs. no reference edge

<img src="distances_vs_reference_edges.png">

**Null-hypothesis rejected:** Distances for disease pairs with comorbidity edges or shared drug edges are significantly shorter.

## Distances vs. reference data: reference edge scores

<img src="distances_vs_reference_scores.png" width="1200">

**Null-hypothesis rejected:** The distances are negatively associated with the $\phi$-correlation and the Jaccard index.

## Similarities with reference networks: test protocol (I)

## Similarities with reference networks: global distances

<img src="difference_global_distances.png">

**Null-hypotheses rejected:** For all network pairs and all distance types, the **global distances for the real networks were always smaller** than the distances for the randomized counterparts.

## Similarities with reference networks: global view on local distances

<img src="local_distances.png">

**Null-hypotheses rejected only for the drug-drug network and normalized rank distance in the comparison against the comorbiditome:** For all other network pairs and all distance types, the **local distances for the real networks were not significantly smaller** than the distances for the randomized counterparts.

## Similarities with reference networks: local view on local distances

<img src="local_empirical_p_values.png">

**Null-hypotheses rejected for fewer than 50 % of nodes:** Even without multiple-testing adjustment, the local empirical $p$-values **reached the 0.05 threshold only for 46.29 %** of all nodes.
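
For orientation, a per-node empirical $p$-value of this kind can be computed by comparing the observed local distance with the distances obtained for randomized counterparts. The sketch below is one common way to do this and not necessarily the exact procedure used here; the `+1` correction, the function names, and the toy values are assumptions.

```python
def empirical_p_value(observed, randomized):
    """Share of randomized distances at least as small as the observed one (+1 corrected)."""
    hits = sum(1 for r in randomized if r <= observed)
    return (hits + 1) / (len(randomized) + 1)

def fraction_significant(observed_by_node, randomized_by_node, alpha=0.05):
    """Fraction of nodes whose unadjusted empirical p-value reaches alpha, plus the p-values."""
    p_values = {
        node: empirical_p_value(observed_by_node[node], randomized_by_node[node])
        for node in observed_by_node
    }
    significant = [node for node, p in p_values.items() if p <= alpha]
    return len(significant) / len(p_values), p_values

# Hypothetical local distances: one observed value and 99 randomized values per node.
observed_by_node = {"node_a": 0.1, "node_b": 0.5}
randomized_by_node = {
    "node_a": [0.5] * 99,               # observed distance is smaller than every random one
    "node_b": [0.4] * 50 + [0.6] * 49,  # observed distance is unremarkable
}

fraction, p_values = fraction_significant(observed_by_node, randomized_by_node)
print(fraction, p_values)
```

Note that the smallest attainable $p$-value is bounded by the number of randomized networks, so the count of randomizations limits what an adjustment for multiple testing could still detect.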

## Conclusions

- **Globally**, the heterogeneous networks used in NetDR clearly mirror the information contained in independent reference networks based on comorbidities and shared drug-disease indications.
- Hence, we have shown that **NetDR does have a data-informed foundation** beyond its intuitive appeal and that the obtained candidate drugs **do have prior plausibility**.
- **However:** If we zoom in on individual diseases or drugs, the correlation with the reference networks is often no longer visible.
- Consequently, **individual predictions must always be scrutinized** for plausibility **by domain experts**.
- Hence, user-friendly **expert-in-the-loop** solutions are indispensable in NetDR.