## Link Prediction

Link prediction is the problem of predicting the existence of a link between two nodes in a network.

<center><img src="images/link_prediction_problem.png" width="300"></center>

It has many applications, including:
- **friend recommendation in social networks:** The goal is to predict which pairs of users, who are not currently friends, are most likely to become friends based on the existing structure of the network and possibly other user-specific data.
- **co-authorship prediction in citation networks:** The goal is to forecast potential future collaborations between authors based on the existing structure and patterns of the network. In this context, the network is a graph where nodes represent authors (or papers), and edges represent co-authorship or citation relationships between them.
- **movie recommendation on Netflix:** The goal is to predict how likely a user is to form a "link" (i.e., an interaction such as watching, rating, or liking) with a movie they have not yet interacted with, and then to recommend the movies the user is most likely to enjoy based on their past behavior and the behavior of other users with similar preferences.
- **protein interaction prediction in biological networks:** The goal is to predict which pairs of proteins in a biological network are likely to interact. This is crucial for understanding the molecular mechanisms underlying various biological processes and for identifying new targets for drug discovery.
- **drug response prediction:** The goal is to predict how different drugs will interact with various targets (such as proteins, genes, or cells) and whether these interactions will result in a therapeutic response or an adverse effect. Predicting the effectiveness or toxicity of a drug on a specific biological target can help in personalized medicine, drug discovery, and understanding disease mechanisms.

### Traditional Link Prediction Methods

These methods can be categorized into three classes: 
- heuristic methods 
- latent-feature methods
- content-based methods

The link prediction problem has been studied extensively, leading to the development of numerous techniques. We will first explore popular heuristics, which score a pair of nodes using local or global neighborhood information. Can you think of a simple rule of thumb to predict whether two nodes should be connected?

        
- Common neighbors: This heuristic is based on the intuition that "the more neighbors you have in common, the more likely you are to be connected". It simply counts the number of neighbors that two nodes $x$ and $y$ have in common, where $\mathcal{N}(x)$ denotes the set of neighbors of node $x$:

\begin{equation*}
f_{CN}(x,y) = |\mathcal{N}(x) \cap \mathcal{N}(y)|
\end{equation*}
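
As a minimal sketch of the common-neighbors score, the snippet below stores a small undirected graph as a dictionary of neighbor sets and ranks every pair of nodes that is not yet connected. The graph `toy_graph` and the helper name `common_neighbors_score` are made up for illustration; they are assumptions, not part of any particular library.

```python
from itertools import combinations

# Hypothetical toy graph: node -> set of neighbors (undirected, so symmetric).
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}


def common_neighbors_score(graph, x, y):
    """Return |N(x) ∩ N(y)|, the number of neighbors shared by x and y."""
    return len(graph[x] & graph[y])


# Candidate links are the node pairs that are not already connected.
candidates = [
    (x, y) for x, y in combinations(sorted(toy_graph), 2) if y not in toy_graph[x]
]

# Score each candidate and rank from most to least likely link.
scores = {(x, y): common_neighbors_score(toy_graph, x, y) for x, y in candidates}
ranking = sorted(scores.items(), key=lambda item: -item[1])

for pair, score in ranking:
    print(pair, score)
```

On this toy graph the pair `("b", "d")` ranks first with two common neighbors (`a` and `c`), so it would be the first link suggested by this heuristic.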

- Jaccard coefficient: The Jaccard coefficient measures the proportion of shared neighbors between two nodes. It builds on the idea of common neighbors but normalizes the count by the total number of distinct neighbors of the two nodes (the size of the union of their neighborhoods). As a result, for the same number of common neighbors, this method favors pairs of nodes with fewer neighbors over pairs with a high degree.

\begin{equation*}
f_{Jaccard}(x,y) = \frac{|\mathcal{N}(x) \cap \mathcal{N}(y)|}{|\mathcal{N}(x) \cup \mathcal{N}(y)|}
\end{equation*}
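
To see the effect of the normalization, here is a small sketch that computes the Jaccard coefficient on a made-up graph. The edge list, the node names, and the helper `jaccard_score` are illustrative assumptions. The two hubs `h1` and `h2` share two neighbors but each has three others, while `p` and `q` share both of their only two neighbors.

```python
from collections import defaultdict

# Hypothetical undirected edge list, chosen only to illustrate the normalization.
edges = [
    ("h1", "s1"), ("h1", "s2"), ("h1", "u1"), ("h1", "u2"), ("h1", "u3"),
    ("h2", "s1"), ("h2", "s2"), ("h2", "v1"), ("h2", "v2"), ("h2", "v3"),
    ("p", "t1"), ("p", "t2"),
    ("q", "t1"), ("q", "t2"),
]

# Build the neighbor sets N(x); each edge is added in both directions.
neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def jaccard_score(neighbors, x, y):
    """Return |N(x) ∩ N(y)| / |N(x) ∪ N(y)|, or 0.0 if both nodes are isolated."""
    shared = neighbors[x] & neighbors[y]
    union = neighbors[x] | neighbors[y]
    return len(shared) / len(union) if union else 0.0


hub_score = jaccard_score(neighbors, "h1", "h2")    # 2 shared / 8 distinct = 0.25
small_score = jaccard_score(neighbors, "p", "q")    # 2 shared / 2 distinct = 1.0
print(hub_score, small_score)
```

Both pairs have exactly two common neighbors, so the common-neighbors heuristic cannot tell them apart, whereas the Jaccard coefficient scores the low-degree pair `(p, q)` four times higher than the hub pair `(h1, h2)`.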
