# NetworKit Sparsification Tutorial

The sparsification module provides two categories of algorithms: those that compute edge scores and those that sparsify an input graph. This notebook demonstrates both.

All sparsification algorithms rely on edge scores, so the edges of the graph must be indexed. Call the [indexEdges()](https://networkit.github.io/dev-docs/python_api/networkit.html?highlight=indexedges#networkit.Graph.indexEdges) method if the edges of your graph are not indexed. The `scores()` method, which can be called after running an edge score algorithm, returns an edge attribute that holds, for each edge, the maximum parameter value such that the edge is contained in the sparsified graph.

After creating a sparsifier, call [getSparsifiedGraph(G, parameter, attribute)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=getsparsif#networkit.sparsification.Sparsifier.getSparsifiedGraph) or [getSparsifiedGraphOfSize(G, edgeRatio, attribute)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=getsparsif#networkit.sparsification.Sparsifier.getSparsifiedGraphOfSize) to obtain the sparsified graph. `parameter` determines the degree of sparsification, while `edgeRatio` is the target edge ratio of the sparsified graph. `attribute` is an optional, previously calculated edge attribute. If none is provided, the sparsifier attempts to calculate one.
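To make the relationship between a score threshold and a target edge ratio concrete, here is a minimal plain-Python sketch that does not use NetworKit. The helper `threshold_for_ratio` and the score values are hypothetical and exist only for illustration; the way NetworKit itself searches for a suitable parameter may differ.

```python
def threshold_for_ratio(scores, edge_ratio):
    """Return a threshold so that keeping all scores >= threshold
    retains roughly edge_ratio of the edges (hypothetical helper)."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Number of edges we would like to keep, at least one
    keep = max(1, round(edge_ratio * len(scores)))
    ranked = sorted(scores, reverse=True)
    return ranked[keep - 1]

# Made-up edge scores; the list position plays the role of the edge id
scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 1.0]
threshold = threshold_for_ratio(scores, 0.2)
kept = [eid for eid, s in enumerate(scores) if s >= threshold]
print(threshold, kept)
```

If several edges share the threshold score, more edges than requested are kept, which is one reason the resulting size is only approximately the target.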

In [None]:
import networkit as nk

In [None]:
G = nk.readGraph("../input/jazz.graph", nk.Format.METIS)
G.indexEdges()
G.numberOfEdges()

We shall pass the same `edgeRatio` to all sparsifiers. As a result, all sparsified graphs should be approximately the same size regardless of the sparsification method that we use.

In [None]:
targetRatio = 0.2

## Forest Fire 

The Forest Fire sparsifier implements a variant of the Forest Fire sparsification approach that is based on random walks.

### Edge Scores

The `ForestFireScore(G, pf, tebr)` constructor expects a graph, the probability `pf` that neighboring nodes will burn as well, and the target burn ratio `tebr`: the fire keeps burning until `tebr * numberOfEdges` edges have been burnt.

In [None]:
# Initialize
ffs = nk.sparsification.ForestFireScore(G, 0.6, 5.0)
# Run
ffs.run()
# Get edge scores
attributes = ffs.scores()
print(attributes[:10])
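The burning process described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not NetworKit's implementation: the function `forest_fire_scores`, the traversal order, and the normalization by the largest burn count are all assumptions made for this example.

```python
import random

def forest_fire_scores(adj, edge_ids, pf, target_burnt_ratio, seed=42):
    """Count how often each edge burns and normalize the counts to [0, 1]."""
    rng = random.Random(seed)
    m = len(edge_ids)
    if m == 0 or pf <= 0 or target_burnt_ratio <= 0:
        return [0.0] * m
    burnt = [0] * m
    nodes = list(adj)
    edges_burnt = 0
    # Start new fires until enough edges have burnt in total
    while edges_burnt < target_burnt_ratio * m:
        start = rng.choice(nodes)
        visited = {start}
        active = [start]
        while active:
            v = active.pop()
            candidates = [u for u in adj[v] if u not in visited]
            # Keep burning edges to unvisited neighbors while the coin flip succeeds
            while candidates and rng.random() < pf:
                u = candidates.pop(rng.randrange(len(candidates)))
                visited.add(u)
                active.append(u)
                burnt[edge_ids[frozenset((u, v))]] += 1
                edges_burnt += 1
    peak = max(burnt)  # assumption: scores are scaled by the most-burnt edge
    return [b / peak for b in burnt]

# Small made-up graph; the list position is the edge id
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
edge_ids = {frozenset(e): eid for eid, e in enumerate(edges)}
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

fire_scores = forest_fire_scores(adj, edge_ids, pf=0.6, target_burnt_ratio=5.0)
print(fire_scores)
```

With a burn ratio above 1, as in the cell above, the same edge can burn in several fires, so edges that are reached often end up with higher scores.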

### Sparsification

The `ForestFireSparsifier(burnProbability, targetBurntRatio)` constructor expects the probability `burnProbability` that neighboring nodes will burn as well, and the target burn ratio `targetBurntRatio`: the fire keeps burning until `targetBurntRatio * numberOfEdges` edges have been burnt.

In [None]:
# Initialize
fireSparsifier = nk.sparsification.ForestFireSparsifier(0.6, 5.0)
# Get sparsified graph
fireGraph = fireSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), fireGraph.numberOfEdges())

## Global Threshold Filter 

The Global Threshold Filter calculates a sparsified graph by filtering globally using a constant threshold value and a given edge attribute.

The [GlobalThresholdFilter(G, attribute, e, above)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=globalth#networkit.sparsification.GlobalThresholdFilter) constructor expects a graph, a list of edge attribute values, a threshold value `e`, and a Boolean value `above`. If `above` is set to `True`, all edges with an attribute value equal to or above `e` are kept in the sparsified graph. The `calculate` method returns the sparsified graph.

### Sparsification

In [None]:
# Initialize
gtf = nk.sparsification.GlobalThresholdFilter(G, attributes, 0.2, False)
# Run
newG = gtf.calculate()
print(G.numberOfEdges(), newG.numberOfEdges())
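The filtering rule can be illustrated without NetworKit. In the sketch below, `global_threshold_filter` is a hypothetical helper, and the behavior for `above=False` (keeping edges with a value equal to or below the threshold) is an assumption, since the text above only describes the `True` case.

```python
def global_threshold_filter(edges, attribute, threshold, above=True):
    """Return the edges whose attribute value passes the global threshold.

    edges: list of (u, v) pairs; the list position is the edge id.
    attribute: list of values indexed by edge id.
    """
    if above:
        return [e for eid, e in enumerate(edges) if attribute[eid] >= threshold]
    # Assumption: with above=False, values equal to or below the threshold are kept
    return [e for eid, e in enumerate(edges) if attribute[eid] <= threshold]

# Made-up edges and attribute values
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
attribute = [0.1, 0.2, 0.5, 0.9]

kept_above = global_threshold_filter(edges, attribute, 0.2, above=True)
kept_below = global_threshold_filter(edges, attribute, 0.2, above=False)
print(kept_above)
print(kept_below)
```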

## Local Degree

The LocalDegree sparsification approach is based on the idea of hub nodes. This attributizer calculates for each edge the maximum parameter value such that the edge is still contained in the sparsified graph.

### Edge Scores

The [LocalDegreeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=local%20degree#networkit.sparsification.LocalDegreeScore) constructor expects a graph.

In [None]:
# Initialize
lds = nk.sparsification.LocalDegreeScore(G)
# Run
lds.run()
# Get edge scores
ldsScores = lds.scores()
print(ldsScores[:10])
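One way to read the hub idea is sketched below in plain Python: each node ranks its neighbors by degree, and an edge scores highly if it leads to one of the top-ranked neighbors of either endpoint. The function `local_degree_scores`, the formula `1 - log(rank) / log(degree)`, and the tie-breaking by node id are assumptions made for this illustration and may not match NetworKit's implementation in every detail.

```python
import math

def local_degree_scores(adj):
    """Return {frozenset({u, v}): score}; higher scores survive stronger sparsification."""
    scores = {}
    for v, neighbors in adj.items():
        d = len(neighbors)
        # Rank neighbors from highest to lowest degree (ties broken by node id)
        ranked = sorted(neighbors, key=lambda u: (-len(adj[u]), u))
        for rank, u in enumerate(ranked, start=1):
            score = 1.0 if d <= 1 else 1.0 - math.log(rank) / math.log(d)
            edge = frozenset((u, v))
            # An edge keeps the better of the scores seen from its two endpoints
            scores[edge] = max(score, scores.get(edge, 0.0))
    return scores

# Made-up graph: node 0 is a hub, nodes 1 and 2 are also connected to each other
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
degree_scores = local_degree_scores(adj)
for edge, score in sorted(degree_scores.items(), key=lambda item: sorted(item[0])):
    print(sorted(edge), round(score, 3))
```

In this toy graph every edge incident to the hub receives the maximum score, while the edge between the two lower-degree nodes receives the minimum.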

### Sparsification

In [None]:
# Initialize
localDegSparsifier = nk.sparsification.LocalDegreeSparsifier()
# Get sparsified graph
localDegGraph = localDegSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), localDegGraph.numberOfEdges())

## Local Similarity

This attributizer calculates for each edge the maximum parameter value such that the edge is still contained in the sparsified graph.

### Edge Scores

The [LocalSimilarityScore(G, triangles)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=local#networkit.sparsification.LocalSimilarityScore) constructor expects a graph and the previously calculated edge triangle counts of the graph.

The number of triangles that each edge belongs to can be computed using the [TriangleEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=triangle#networkit.sparsification.TriangleEdgeScore) algorithm.

In [None]:
# Compute triangles in G
tes = nk.sparsification.TriangleEdgeScore(G)
tes.run()
triangles = tes.scores()
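For intuition, the triangle count of an edge equals the number of neighbors its two endpoints have in common. The following plain-Python sketch computes this on a small made-up graph; `triangle_edge_counts` is a hypothetical helper and is not how NetworKit computes the values internally.

```python
def triangle_edge_counts(adj):
    """Return {(u, v): number of triangles containing edge (u, v)} with u < v."""
    counts = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Every common neighbor w closes the triangle u-v-w
                counts[(u, v)] = len(set(adj[u]) & set(adj[v]))
    return counts

# Made-up graph: a triangle 0-1-2 with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle_counts = triangle_edge_counts(adj)
print(triangle_counts)
```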

In [None]:
# Compute Local Similarity Score
lss = nk.sparsification.LocalSimilarityScore(G, triangles)
# Run
lss.run()
# Get edge scores
lss.scores()[:10]

### Sparsification

In [None]:
# Initialize
similaritySparsifier = nk.sparsification.LocalSimilaritySparsifier()
# Get sparsified graph
similarityGraph = similaritySparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), similarityGraph.numberOfEdges())

## Random Edge Score

This attributizer generates a random edge attribute. Each edge is assigned a random value in [0,1].

### Edge Scores

The [RandomEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=randomedge#networkit.sparsification.RandomEdgeScore) constructor expects a graph.

In [None]:
# Initialize
res = nk.sparsification.RandomEdgeScore(G)
# Run
res.run()
# Get edge scores
randomEdgeScores = res.scores()
print(randomEdgeScores[:10])

### Sparsification

In [None]:
# Initialize
randomEdgeSparsifier = nk.sparsification.RandomEdgeSparsifier()
# Get sparsified graph
randomGraph = randomEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), randomGraph.numberOfEdges())

## Random Node Edge Score

This attributizer returns edge attributes where each value is selected uniformly at random from [0,1].

### Edge Scores

The [RandomNodeEdgeScore(G)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=randomnode#networkit.sparsification.RandomNodeEdgeScore) constructor expects a graph.

In [None]:
# Initialize
rn = nk.sparsification.RandomNodeEdgeScore(G)
# Run
rn.run()
# Get edge scores
randomNodeScores = rn.scores()
print(randomNodeScores[:10])

### Sparsification

In [None]:
# Initialize
randomNodeEdgeSparsifier = nk.sparsification.RandomNodeEdgeSparsifier()
# Get sparsified graph
randomNodeGraph = randomNodeEdgeSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), randomNodeGraph.numberOfEdges())

## SCAN Structural Similarity Score

This algorithm is a Structural Clustering Algorithm for Networks (SCAN) whose goal is to find clusters, hubs, and outliers in large networks.

### Edge Scores

The [SCANStructuralSimilarityScore(G, triangles)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=scan#networkit.sparsification.SCANStructuralSimilarityScore) constructor expects a graph and previously calculated edge triangle counts of the graph.

In [None]:
# Initialize
scan = nk.sparsification.SCANStructuralSimilarityScore(G, triangles)
# Run
scan.run()
# Get edge scores
scanScores = scan.scores()
print(scanScores[:10])
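As a rough illustration of what a structural similarity measures, the sketch below follows the definition from the original SCAN paper: the size of the overlap of the closed neighborhoods of both endpoints, divided by the geometric mean of their sizes. The helper `scan_similarity` is hypothetical; NetworKit derives its values from the triangle counts, and its exact normalization may differ from this sketch.

```python
import math

def scan_similarity(adj):
    """Return {(u, v): structural similarity} for every edge with u < v."""
    # Closed neighborhood: the neighbors of a node plus the node itself
    closed = {v: set(neighbors) | {v} for v, neighbors in adj.items()}
    similarity = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                shared = len(closed[u] & closed[v])
                similarity[(u, v)] = shared / math.sqrt(len(closed[u]) * len(closed[v]))
    return similarity

# Made-up graph: a triangle 0-1-2 with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
similarity = scan_similarity(adj)
for edge, value in sorted(similarity.items()):
    print(edge, round(value, 3))
```

Edges inside the triangle score higher than the edge to the pendant node, which matches the intuition that edges within dense clusters are structurally more similar.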

### Sparsification

In [None]:
# Initialize
scanSparsifier = nk.sparsification.SCANSparsifier()
# Get sparsified graph
scanGraph = scanSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), scanGraph.numberOfEdges())

## Simmelian Overlap Score

This is an implementation of the parametric variant of Simmelian Backbones. It calculates for each edge the minimum parameter value such that the edge is still contained in the sparsified graph. 

### Edge Scores

The [SimmelianOverlapScore(G, triangles, maxRank)](https://networkit.github.io/dev-docs/python_api/sparsification.html?highlight=simmelian#networkit.sparsification.SimmelianOverlapScore) constructor expects a graph, the previously calculated edge triangle counts, and the maximum rank `maxRank` that is considered for the overlap calculation.

In [None]:
# Initialize
sos = nk.sparsification.SimmelianOverlapScore(G, triangles, 5)
# Run
sos.run()
# Get edge scores
sosScores = sos.scores()
print(sosScores[:10])
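The role of the maximum rank can be sketched in plain Python. This is a simplified reading of the parametric idea and not NetworKit's implementation: each node ranks its neighbors by the triangle count of the connecting edge, and the overlap of an edge is the number of neighbors that appear in the top `max_rank` lists of both endpoints. The helper `simmelian_overlap` and the tie-breaking by node id are assumptions.

```python
def simmelian_overlap(adj, max_rank):
    """Return {(u, v): overlap of the top-ranked neighbor lists} with u < v."""
    def triangles(u, v):
        # Triangle count of edge (u, v) = number of common neighbors
        return len(set(adj[u]) & set(adj[v]))

    top = {}
    for v in adj:
        # Strongest ties first; ties are broken by node id (assumption)
        ranked = sorted(adj[v], key=lambda u: (-triangles(u, v), u))
        top[v] = set(ranked[:max_rank])

    return {(u, v): len(top[u] & top[v])
            for u in adj for v in adj[u] if u < v}

# Made-up graph: a 4-clique on nodes 0..3 with a pendant node 4 attached to node 3
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4], 4: [3]}
overlap = simmelian_overlap(adj, max_rank=3)
for edge, value in sorted(overlap.items()):
    print(edge, value)
```

Edges inside the clique share two top-ranked neighbors, while the edge to the pendant node shares none, so a minimum-overlap threshold of 1 or more would remove it.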

### Sparsification

In [None]:
# Initialize (note: this class is the non-parametric Simmelian sparsifier)
simmelianSparsifier = nk.sparsification.SimmelianSparsifierNonParametric()
# Get sparsified graph
simmelianGraph = simmelianSparsifier.getSparsifiedGraphOfSize(G, targetRatio)
# Compare graphs
print(G.numberOfEdges(), simmelianGraph.numberOfEdges())