# Link Prediction Task and Features

Ref: https://www.youtube.com/watch?v=4dVwlE9jYxY&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=6

## Two formulations of the link prediction task

### 1. Links Missing at Random (MAR)

- The Scenario: Imagine you have a social network where some friendships are hidden from the data. Maybe the connection was never recorded, or it is private. The goal is to predict these missing links from the connections that are observed.

- The Assumption: The missing links are assumed to be missing uniformly at random, so which links are hidden does not depend on the nodes involved or on the rest of the network. In practice the task is usually set up by removing a random subset of links from a known network and checking whether the method recovers them.

- Example: If two people with similar interests in the same city have no recorded friendship, MAR treats that gap as one that could have been hidden by chance, not as evidence that the two are not friends.
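The MAR setup can be sketched as a random hold-out split. This is a minimal illustration, not a reference implementation: the function name `hide_random_edges`, the toy edge set, and the 50% hidden fraction are all assumptions made for the example.

```python
import random


def hide_random_edges(edges, fraction, seed=0):
    """Split an undirected edge set into (observed, hidden) uniformly at random."""
    rng = random.Random(seed)
    shuffled = sorted(edges)  # sort first so the shuffle is reproducible
    rng.shuffle(shuffled)
    n_hidden = int(len(shuffled) * fraction)
    hidden = set(shuffled[:n_hidden])
    observed = set(shuffled[n_hidden:])
    return observed, hidden


# Hypothetical toy network; each pair is one undirected friendship.
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")}
observed, hidden = hide_random_edges(edges, fraction=0.5)

# A link predictor only sees `observed`; `hidden` is the ground truth it is scored on.
print(len(observed), len(hidden))
```

Because the hidden links are drawn without looking at the nodes involved, every existing link is equally likely to be held out, which is exactly the "missing at random" assumption.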

### 2. Links Over Time (LOT)

- The Scenario: Now, imagine a network where connections evolve over time. Think of a social network where friendships form, break, and change. Given the network observed up to some point in time, the goal is to predict which links will appear (or disappear) in a later period.

- The Assumption: The evolution of links is influenced by factors like time, user behavior, and network dynamics.

- Example: If two people were friends in the past but haven't interacted recently, LOT would consider this information to predict if their friendship will continue or fade.
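One common way to set up the LOT task is to split timestamped edges into an earlier training window and a later test window. The sketch below assumes edges come as `(u, v, t)` triples with integer timestamps; the function name `split_by_time` and the cut-off values are hypothetical, and only newly appearing links are modeled, not links that disappear.

```python
def split_by_time(timed_edges, t_train_end, t_test_end):
    """Return (train_edges, new_test_edges) from an iterable of (u, v, t) triples."""
    train = {frozenset((u, v)) for u, v, t in timed_edges if t <= t_train_end}
    test = {
        frozenset((u, v))
        for u, v, t in timed_edges
        if t_train_end < t <= t_test_end
    }
    # Keep only links that did not already exist in the training window.
    return train, test - train


# Hypothetical interaction log: (person, person, timestamp).
timed_edges = [
    ("a", "b", 1), ("b", "c", 2), ("a", "c", 5),
    ("a", "b", 6), ("c", "d", 7), ("d", "e", 12),
]
train, new_links = split_by_time(timed_edges, t_train_end=3, t_test_end=10)
print(sorted(tuple(sorted(e)) for e in new_links))
```

A predictor is fitted on `train` and evaluated on how highly it ranks the pairs in `new_links`. The repeated `a`-`b` interaction at time 6 is excluded because that link already exists in the training window.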

### Key Differences:

- MAR focuses on static networks: It assumes the network structure is fixed, and the task is to fill in missing pieces.

- LOT focuses on dynamic networks: It considers the temporal aspect of connections, predicting how they change over time.


## Link prediction using proximity methods

1. Define "Proximity": We need to figure out what "close" means in the context of our network. This could be:

- Physical proximity: For example, people living in the same city or country.
- Social proximity: People sharing similar interests, belonging to the same groups, or having common friends.
- Content proximity: Websites with similar content, documents with overlapping keywords, or products with similar features.

2. Measure Proximity: We use various metrics to quantify how close two entities are. Some common ones include:

- Shortest path distance: The number of hops (connections) between two nodes in a network.
- Common neighbors: The number of shared connections between two nodes.
- Jaccard similarity: The size of the overlap between two sets (shared neighbors, interests, or keywords) divided by the size of their union.
- Cosine similarity: A measure of the angle between two vectors representing entities, often used for content-based proximity.

3. Predict Links: Based on the proximity scores, we predict links between entities that are "close" together. For example:

- Low shortest path distance: Two nodes with a short path between them are more likely to be connected.
- Many common neighbors: Nodes with many shared connections are more likely to be linked.
- High Jaccard similarity: Entities with similar features are more likely to be connected.
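The three steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the graph is undirected and stored as a dict of neighbor sets, Jaccard similarity is computed over neighbor sets, and all function names and the toy edge list are made up for the example.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected edge list -> dict mapping each node to its set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def shortest_path_length(adj, source, target):
    """Hop count found by breadth-first search; None if target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def rank_candidates(adj, score):
    """Score every unconnected pair and sort from most to least likely link."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
ranking = rank_candidates(adj, common_neighbors)
print(ranking[0])  # the pair with the most common neighbors
```

Here `a` and `d` share two neighbors (`b` and `c`), so that pair ranks first. Passing `jaccard` instead of `common_neighbors` reuses the same ranking step with a different proximity score. For shortest path distance a smaller value means closer, so it would need to be negated before being used as a score.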

### Example:

Imagine a social network where people are connected based on their interests. Two people with many shared interests (high Jaccard similarity) are more likely to be friends than two people with very different interests.
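As a small worked version of this example, Jaccard similarity can be computed directly on sets of interests. The names and interests below are invented purely for illustration.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical users and their interests.
interests = {
    "alice": {"hiking", "chess", "jazz"},
    "bob": {"hiking", "chess", "cooking"},
    "carol": {"football", "painting"},
}

sim_alice_bob = jaccard_similarity(interests["alice"], interests["bob"])
sim_alice_carol = jaccard_similarity(interests["alice"], interests["carol"])
print(sim_alice_bob, sim_alice_carol)  # 0.5 0.0
```

Alice and Bob share 2 of 4 distinct interests, giving 0.5, while Alice and Carol share none, giving 0.0. Under a proximity method, Alice and Bob would be ranked as the more likely friendship.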

### Advantages of Proximity Methods:

- Intuitive: The concept of proximity is easy to understand and relate to.
- Simple to implement: Many proximity metrics are straightforward to calculate.
- Effective for various network types: They can be applied to social networks, knowledge graphs, and other types of networks.

### Limitations:

- Limited to local information: Proximity methods primarily consider local connections and may miss global patterns.
- Can be sensitive to network structure: The effectiveness of proximity methods can vary depending on the network's topology.
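The "local information" limitation shows up even on a tiny graph. In this sketch (a hypothetical four-node path `a - b - c - d`), a common-neighbors score cannot distinguish a pair that is three hops apart from a pair that is not connected at all, because both score zero.

```python
def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


# Path graph a - b - c - d, stored as a dict of neighbor sets.
path_adj = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

score_ac = common_neighbors(path_adj, "a", "c")  # a and c share neighbor b
score_ad = common_neighbors(path_adj, "a", "d")  # a and d share no neighbor
print(score_ac, score_ad)  # 1 0
```

Any pair of nodes more than two hops apart gets a score of zero under this measure, no matter how the rest of the network is structured.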