# NetworKit Link Prediction

Link prediction is concerned with estimating the probability that an edge exists between two nodes in a graph. The `linkprediction` module provides link prediction algorithms as well as sampling algorithms for graphs. The results of the sampling algorithms can be passed to the link prediction algorithms.

This notebook briefly introduces the range of link prediction algorithms available in NetworKit, then demonstrates how to use the sampling algorithms together with the link prediction algorithms, and concludes with some metrics to evaluate the accuracy of the predictions.

In [None]:
import networkit as nk

# Link prediction algorithms

## Adamic/Adar Index

The Adamic/Adar index predicts links in a social network according to the amount of shared links between two nodes. The index sums up the reciprocals of the logarithm of the degree of all common neighbors of u and v.
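
To make the definition concrete, here is a minimal pure-Python sketch of the sum described above. The toy `adjacency` dict and the helper `adamic_adar` are hypothetical names used only for illustration; this is not NetworKit's implementation, and it assumes `u` and `v` are distinct nodes so that every common neighbor has degree at least 2.

```python
import math

# Hypothetical toy graph as an adjacency dict (illustration only, not NetworKit).
adjacency = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}


def adamic_adar(adj, u, v):
    """Sum 1 / log(degree(w)) over all common neighbors w of u and v.

    Assumes u != v, so each common neighbor has degree >= 2 and log(degree) > 0.
    """
    common = adj[u] & adj[v]
    return sum(1.0 / math.log(len(adj[w])) for w in common)


# Nodes 1 and 3 share the neighbors 0 and 2, both of degree 3.
score = adamic_adar(adjacency, 1, 3)
print(score)
```

Low-degree common neighbors contribute more to the score than hubs, because the reciprocal of the logarithm shrinks as the degree grows.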

The constructor, [AdamicAdarIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=adamic#networkit.linkprediction.AdamicAdarIndex), expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the Adamic/Adar index of that node pair.

In [None]:
# Read graph
G = nk.graphio.readGraph("../input/karate.graph", nk.Format.METIS)

In [None]:
# Initialize algorithm
aai = nk.linkprediction.AdamicAdarIndex(G)

In [None]:
# Get the Adamic/Adar index of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Adamic/Adar of node {} and node {} = {}".format(i, i + 1, aai.run(i, i + 1)))

## Algebraic Distance Index

The Algebraic distance index assigns a distance value to pairs of nodes according to their structural closeness in the graph.

The constructor, [AlgebraicDistanceIndex(G, numberSystems, numberIterations, omega=0.5, norm=2)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=algeb#networkit.linkprediction.AlgebraicDistanceIndex), expects a graph followed by the number of systems to use for algebraic iteration and the number of iterations in each system. `omega` is the overrelaxation parameter, while `norm` is the norm factor of the extended algebraic distance. The maximum norm is selected by setting `norm` to 0.

After initialization, the `preprocess()` method must be called before `run(u, v)` is executed. `run` takes a pair of nodes (u, v) and returns the algebraic distance index of that node pair.

In [None]:
# Initialize algorithm
adi = nk.linkprediction.AlgebraicDistanceIndex(G, 2, 200)
adi.preprocess()

In [None]:
# Get the algebraic distance index of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Algebraic distance index of node {} and node {} = {}".format(i, i + 1, adi.run(i, i + 1)))

## Common Neighbors Index

The Common neighbors index calculates the number of common neighbors of a node-pair in a given graph. 
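
As a rough illustration of what this index measures, the following pure-Python sketch intersects two neighbor sets. The `adjacency` dict and the helper `common_neighbors` are hypothetical and stand in for a graph; they are not part of NetworKit.

```python
# Hypothetical toy graph as an adjacency dict (illustration only, not NetworKit).
adjacency = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}


def common_neighbors(adj, u, v):
    """Count the nodes that are adjacent to both u and v."""
    return len(adj[u] & adj[v])


# Nodes 1 and 3 are not adjacent but share the neighbors 0 and 2.
count = common_neighbors(adjacency, 1, 3)
print(count)
```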

The constructor, [CommonNeighborsIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=common#networkit.linkprediction.CommonNeighborsIndex), expects a graph to work on. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the number of common neighbors of u and v.

In [None]:
# Initialize algorithm
cni = nk.linkprediction.CommonNeighborsIndex(G)

In [None]:
# Get the number of common neighbors of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Common neighbors between node {} and node {} = {}".format(i, i + 1, cni.run(i, i + 1)))

## Neighbors Measure Index

The Neighbors measure index returns the number of connections between neighbors of the given nodes u and v.

The [NeighborsMeasureIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=neighborsme#networkit.linkprediction.NeighborsMeasureIndex) constructor expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the neighbors measure index of u and v.

In [None]:
# Initialize algorithm
nmi = nk.linkprediction.NeighborsMeasureIndex(G)

In [None]:
# Get the neighbors measure index of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Neighbors measure index between node {} and node {} = {}" .format(i, i+1, nmi.run(i, i+1)))

## Preferential Attachment Index

The Preferential attachment index suggests that the more connected a node is, the more likely it is to receive new links.

The [PreferentialAttachmentIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=preferential#networkit.linkprediction.PreferentialAttachmentIndex) constructor expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the product of the cardinalities of the neighborhoods of u and v.
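
The product of neighborhood sizes can be sketched in a few lines of plain Python. The `adjacency` dict and the helper `preferential_attachment` are hypothetical names for illustration and are not NetworKit's implementation.

```python
# Hypothetical toy graph as an adjacency dict (illustration only, not NetworKit).
adjacency = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}


def preferential_attachment(adj, u, v):
    """Multiply the neighborhood sizes (degrees) of u and v."""
    return len(adj[u]) * len(adj[v])


# Node 0 has 3 neighbors and node 1 has 2 neighbors.
score = preferential_attachment(adjacency, 0, 1)
print(score)
```

Note that the score depends only on the two degrees, so it is the same whether or not u and v share any neighbors.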

In [None]:
# Initialize algorithm
pai = nk.linkprediction.PreferentialAttachmentIndex(G)

In [None]:
# Get the preferential attachment index of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Preferential attachment index between node {} and node {} = {}" .format(i, i+1, pai.run(i, i+1)))

## Resource Allocation Index

The Resource allocation index sums the reciprocals of the degrees of all common neighbors of u and v.

The constructor, [ResourceAllocationIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=resource#networkit.linkprediction.ResourceAllocationIndex), expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the resource allocation index of the node pair.
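
For intuition, here is a minimal pure-Python sketch that assumes the usual definition of the resource allocation index: the sum of 1/degree(w) over all common neighbors w of u and v. The `adjacency` dict and the helper `resource_allocation` are hypothetical and not part of NetworKit.

```python
# Hypothetical toy graph as an adjacency dict (illustration only, not NetworKit).
adjacency = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}


def resource_allocation(adj, u, v):
    """Sum 1 / degree(w) over all common neighbors w of u and v (assumed definition)."""
    common = adj[u] & adj[v]
    return sum(1.0 / len(adj[w]) for w in common)


# Nodes 1 and 3 share the neighbors 0 and 2, both of degree 3.
score = resource_allocation(adjacency, 1, 3)
print(score)
```

Compared with Adamic/Adar, this variant penalizes high-degree common neighbors more strongly because it divides by the degree itself rather than by its logarithm.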

In [None]:
# Initialize algorithm
rai = nk.linkprediction.ResourceAllocationIndex(G)

In [None]:
# Get the resource allocation index of the first 5 consecutive node pairs (i, i+1)
for i in range(5):
    print("Resource allocation index of node {} and node {} = {}" .format(i, i+1, rai.run(i, i+1)))

## Same Community Index

The Same community index determines whether two nodes u and v are in the same community.

The constructor, [SameCommunityIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=samecommunity#networkit.linkprediction.SameCommunityIndex), expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns `1` if the two nodes are in the same community and `0` otherwise.

In [None]:
# Initialize algorithm
sni = nk.linkprediction.SameCommunityIndex(G)

In [None]:
# Get same community index of some nodes
for i in range(5):
    print("Node {} and node {} community index = {}" .format(i, i+1, sni.run(i, i+1)))

## Total Neighbors Index

The Total neighbors index returns the number of nodes in the neighborhood-union of nodes u and v.
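
The neighborhood union can be illustrated with a short pure-Python sketch. The `adjacency` dict and the helper `total_neighbors` are hypothetical names; how NetworKit treats u and v themselves when they are adjacent is not stated here, so this sketch simply counts every node in the union.

```python
# Hypothetical toy graph as an adjacency dict (illustration only, not NetworKit).
adjacency = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}


def total_neighbors(adj, u, v):
    """Count the nodes in the union of the neighborhoods of u and v."""
    return len(adj[u] | adj[v])


# Nodes 1 and 3 have identical neighborhoods {0, 2}, so the union has 2 nodes.
count = total_neighbors(adjacency, 1, 3)
print(count)
```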

The [TotalNeighborsIndex(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=totalneighb#networkit.linkprediction.TotalNeighborsIndex) constructor expects a graph. After initialization, call the `run(u, v)` method. It takes a pair of nodes (u, v) and returns the total neighbors index of u and v.

In [None]:
# Initialize algorithm
tni = nk.linkprediction.TotalNeighborsIndex(G)

In [None]:
# Get total neighbors index of some nodes
for i in range(5):
    print("Total neighbors between node {} and node {} = {}" .format(i, i+1, tni.run(i, i+1)))

# Link sampling and link prediction

This section shows how to use the sampling algorithms and the link prediction algorithms alongside each other, i.e., how to pass the results of the samplers to the link predictors. As an example, we use the Random Link Sampler, the Missing Links Finder and the Katz Index.

The Katz index assigns a pair of nodes a similarity score that is based on the sum of the weighted number of paths of length $l$ where $l$ is smaller than a given limit.
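
The following pure-Python sketch shows one way to compute such a score. It rests on several assumptions that are not confirmed by the text above: it counts walks (nodes may repeat) via powers of the adjacency matrix, it includes lengths 1 through `max_path_length`, and it weights a walk of length $l$ by `damping` raised to the power $l$. The helpers `matmul` and `katz_score` are hypothetical and are not NetworKit's implementation.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_score(adj_matrix, u, v, max_path_length=5, damping=0.005):
    """Sum damping**l * (number of walks of length l from u to v) for l = 1..max_path_length."""
    walks = [row[:] for row in adj_matrix]  # walks of length 1
    weight = damping
    score = 0.0
    for _ in range(max_path_length):
        score += weight * walks[u][v]
        walks = matmul(walks, adj_matrix)  # extend every walk by one edge
        weight *= damping
    return score


# Hypothetical toy graph: the path 0 - 1 - 2 as an adjacency matrix.
path_graph = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# One walk of length 2 and two walks of length 4 connect nodes 0 and 2.
score = katz_score(path_graph, 0, 2, max_path_length=4, damping=0.5)
print(score)
```

A small damping value makes short connections dominate the score, which is why the default of 0.005 strongly favors node pairs that are only a few hops apart.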

The class has two [constructors](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=katzindex#networkit.linkprediction.KatzIndex) that take different parameters:
   1. `KatzIndex(G, maxPathLength=5, dampingValue=0.005)` takes a graph, followed by the maximum length of paths to consider, and the damping value.
   2. `KatzIndex(maxPathLength=5, dampingValue=0.005)` only takes the maximum length of paths to consider and damping value.
   
The [RandomLinkSampler(G, numLinks)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=randomlinksampler#networkit.linkprediction.RandomLinkSampler) provides methods to randomly sample a number of edges from a given graph. `numLinks` defines the number of links the returned graph should contain, so the sampler returns a graph with `numLinks` links taken from the given graph. The example below instead calls `byPercentage(G, percentage)`, which specifies the share of links to keep as a fraction of the links in `G`.
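
Conceptually, sampling a training graph amounts to drawing a random subset of the edges. The sketch below illustrates that idea on a plain edge list using Python's `random` module. The helper `sample_links`, its `seed` parameter and the toy `edges` list are hypothetical; this is not how NetworKit implements the sampler.

```python
import random


def sample_links(edges, num_links, seed=None):
    """Return num_links distinct edges drawn uniformly at random from edges."""
    rng = random.Random(seed)
    return rng.sample(edges, num_links)


# Hypothetical toy edge list.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]

# Keep 3 of the 5 edges as the "training" edges; the rest act as held-out links.
sampled = sample_links(edges, 3, seed=42)
held_out = [e for e in edges if e not in sampled]
print(sampled, held_out)
```

The held-out edges are the links a predictor is then expected to recover from the sampled graph.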

The [MissingLinksFinder(G)](https://networkit.github.io/dev-docs/python_api/linkprediction.html?highlight=missing#networkit.linkprediction.MissingLinksFinder) finds missing links in the given graph. The `findAtDistance(k)` function returns all missing links whose endpoints are at distance k in the graph.
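
The idea of finding missing links at a given distance can be sketched with a breadth-first search from every node. The helper `missing_links_at_distance` and the toy `adjacency` dict are hypothetical and not NetworKit's implementation; the sketch assumes an undirected graph and `k >= 2`, so that every pair at distance `k` is guaranteed to be non-adjacent.

```python
from collections import deque


def missing_links_at_distance(adj, k):
    """Return all node pairs (u, v) with u < v whose shortest-path distance is k.

    Assumes an undirected graph and k >= 2, so the returned pairs are not edges.
    """
    found = []
    for source in sorted(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # no need to explore beyond distance k
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        found.extend((source, target) for target, d in sorted(dist.items())
                     if d == k and source < target)
    return found


# Hypothetical toy graph: the path 0 - 1 - 2 - 3.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

at_distance_two = missing_links_at_distance(adjacency, 2)
print(at_distance_two)
```

Pairs at distance 2 are a common choice of candidates because they already share at least one neighbor.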

In [None]:
# Read graph
G = nk.graphio.readGraph("../input/jazz.graph", nk.Format.METIS)

In [None]:
# Sample graph
trainingGraph = nk.linkprediction.RandomLinkSampler.byPercentage(G, 0.7)

In [None]:
# Find missing links
missingLinks = nk.linkprediction.MissingLinksFinder(trainingGraph).findAtDistance(2)

In [None]:
# Run link prediction
predictions = nk.linkprediction.KatzIndex(G).runOn(missingLinks)

In [None]:
# Verify the ordering of consecutive predictions
for p in range(len(predictions)-1):
    a, b = predictions[p]
    c, d = predictions[p+1]
    assert (a < c or b < d)