<img src="https://raw.githubusercontent.com/networkit/networkit/master/docs/logo/logo_color.png" alt="Drawing" style="width: 600px;"/>

# Requirements:

In [None]:
!pip install numpy==1.26.4

After executing the cell above, go to "Runtime" -> "Restart Session" (once). Otherwise the new (and needed) numpy version will not be recognized by Colab.

In [None]:
!wget https://raw.githubusercontent.com/fabratu/nd24/main/networkit-11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
!wget https://raw.githubusercontent.com/fabratu/nd24/main/karate
!wget https://raw.githubusercontent.com/fabratu/nd24/main/PGPgiantcompo.graph

In [None]:
%pip install --force-reinstall ./networkit-11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
%pip install seaborn plotly ipycytoscape networkx tabulate

The two cells above also need to be executed only once before you start with the content. Enjoy!

# NetworKit Introduction

First, we import NetworKit as a Python module. The underlying C++ core is loaded automatically (similar to scipy, sklearn, numpy, ...).


In [None]:
import networkit as nk

## Creating a graph

Let us start by reading a network from a file on disk. Throughout this tutorial, we work with the `PGPgiantcompo` network, a social network/web of trust in which nodes are PGP keys and an edge represents a signature from one key on another. It is distributed with NetworKit as a good starting point.

Network datasets are available in different formats. NetworKit supports many popular formats, like Edgelist, SNAP, MatrixMarket and more. `PGPgiantcompo.graph` is stored in the so-called `METIS` format, so we can read it from disk with the `METISGraphReader`.

In [None]:
# Read a METIS Graph (sparse adjacency matrix)
G = nk.graphio.METISGraphReader().read('PGPgiantcompo.graph')

If we don't know the format of a network, there is also a convenience function in the top-level namespace that tries to guess the input format and selects the appropriate reader:

In [None]:
G = nk.readGraph('PGPgiantcompo.graph')

In [None]:
# Print basic information about the network

nk.overview(G)

### Graph from Numpy, Scipy and Pandas data

In addition to reading a graph from a file, it is also possible to create a graph from input data in COO (coordinate) format by using `nk.GraphFromCoo(...)`. The parameter syntax is related to `scipy.sparse.coo_array`. We start by manually defining the `row`, `col` and `data` arrays (where `A[row[k], col[k]] = data[k]`) and pass them as a tuple of the form `(data, (row, col))`.

In [None]:
import numpy as np
import scipy as sc
import pandas as pd

# Start with numpy
row = np.array([0, 1, 2])
col = np.array([1, 2, 0])
data = np.array([1.0, 1.0, 1.0])

GData = nk.GraphFromCoo((data,(row,col)))

nk.overview(GData)
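
To make the COO convention concrete, the following sketch rebuilds the adjacency matrix from the same three triplets using plain NumPy only. It assumes the graph is undirected, so each entry is mirrored; the variable names `A` and `A_undirected` are illustrative and not part of NetworKit.

```python
import numpy as np

# Same triplets as in the cell above: entry k describes one edge.
row = np.array([0, 1, 2])
col = np.array([1, 2, 0])
data = np.array([1.0, 1.0, 1.0])

# Number of nodes implied by the largest index that occurs.
n = int(max(row.max(), col.max())) + 1

# COO semantics: A[row[k], col[k]] = data[k]
A = np.zeros((n, n))
A[row, col] = data

# Assumption: in an undirected graph, edge (u, v) is also edge (v, u).
A_undirected = np.maximum(A, A.T)
print(A_undirected)
```

The three triplets therefore describe a triangle on the nodes 0, 1 and 2.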

For speedup purposes, it is possible to also pass the number of expected nodes as a parameter. Due to the tiny size of our example graph, the difference is very small in this case. However, for the majority of use cases, providing n results in a much faster creation of the graph.

It is also possible to omit the `data` array. In this case, the resulting edge weights will all be `1.0`.

In [None]:
row = np.random.randint(50, size=50)
col = np.random.randint(50, size=50)

GData = nk.GraphFromCoo((row,col), n = 50)

In [None]:
nk.overview(GData)

Additionally we can also create a graph from a `scipy.sparse.coo_matrix`.

In [None]:
data = np.random.random((50,))

# From SciPy
S = sc.sparse.coo_matrix((data, (row, col)), dtype = np.double)

GData = nk.GraphFromCoo(S, n=50)
nk.overview(GData)

A common use case involves data handling via pandas. Since the underlying data structure is easily transformed to arrays, we can also create graphs from pandas DataFrames. In our example we define a set of people (Alice, Bob, Carol, Dan, Erin and Frank) and relationships between them. Each row in the DataFrame describes a (directed) relation from `Person_1` to `Person_2`. For example, Alice has a relationship to Carol (not necessarily the other way around, hence directed).

In [None]:
# From pandas
persons = pd.CategoricalDtype(categories=['Alice', 'Bob', 'Carol', 'Dan', 'Erin', 'Frank'], ordered=True)
relations = [('Alice', 'Carol'), ('Bob', 'Dan'), ('Dan', 'Erin'), ('Carol', 'Frank')]

friends_df = pd.DataFrame(relations, columns=['Person_1', 'Person_2']).astype(persons)
print(friends_df)

In [None]:
GData = nk.GraphFromCoo((friends_df['Person_1'].cat.codes.to_numpy(dtype=np.uint, copy = False), friends_df['Person_2'].cat.codes.to_numpy(dtype=np.uint, copy = False)), n = len(persons.categories), directed = True)

nk.overview(GData)
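
The graph only stores integer node IDs, so it helps to see how the categorical codes map names to IDs and back. The sketch below uses pandas only; the helper dictionary `id_to_name` is an illustrative name introduced here, not something NetworKit provides.

```python
import pandas as pd

persons = pd.CategoricalDtype(
    categories=['Alice', 'Bob', 'Carol', 'Dan', 'Erin', 'Frank'], ordered=True)
relations = [('Alice', 'Carol'), ('Bob', 'Dan'), ('Dan', 'Erin'), ('Carol', 'Frank')]
friends_df = pd.DataFrame(relations, columns=['Person_1', 'Person_2']).astype(persons)

# Each name is encoded as its position in the category list.
sources = friends_df['Person_1'].cat.codes.tolist()
targets = friends_df['Person_2'].cat.codes.tolist()
print(list(zip(sources, targets)))

# Reverse lookup: translate node IDs back into names.
id_to_name = dict(enumerate(persons.categories))
print([(id_to_name[u], id_to_name[v]) for u, v in zip(sources, targets)])
```

Keeping such a lookup around makes it possible to report results of later graph algorithms in terms of the original names.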

## The Graph Object

![Graph](https://upload.wikimedia.org/wikipedia/commons/b/bf/Undirected.svg)


`Graph` is the central class of NetworKit. An object of this type represents an undirected or directed, optionally weighted network.

Coming back to `PGPgiantcompo.graph`, let us inspect several of the methods that the class provides.

In [None]:
print(G.numberOfNodes(), G.numberOfEdges())

Nodes are simply integer indices, and edges are pairs of such indices.

In [None]:
for u in G.iterNodes():
    if u > 5:
        print('...')
        break
    print(u)

In [None]:
i = 0
for u, v in G.iterEdges():
    if i > 5:
        print('...')
        break
    print(u, v)
    i += 1


Now that we have created a graph, we can start to play around with it. Say we want to remove the node with the node ID 10, that is, the eleventh node, since IDs start at 0. We can easily do so using `Graph.removeNode(node u)`.

In [None]:
G.removeNode(10)

In [None]:
# 10 has been deleted
print(G.hasNode(10))

The node has been removed from the graph; however, the node IDs are not adjusted to match the new number of nodes. Hence, if we want to restore the node we previously removed from G, we can do so with `Graph.restoreNode(node u)`, passing the original node ID.

In [None]:
G.restoreNode(10)
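
The behaviour of stable node IDs can be pictured with a small toy model in plain Python. This is only a sketch under the assumption that removal marks an ID as unused instead of renumbering the remaining nodes; the class `ToyNodeStore` and its methods are hypothetical and do not reproduce NetworKit's actual implementation.

```python
class ToyNodeStore:
    """Toy model of ID-stable node storage (hypothetical, not NetworKit code)."""

    def __init__(self, n):
        # One flag per node ID; the ID is simply the list index.
        self.exists = [True] * n

    def remove_node(self, u):
        self.exists[u] = False  # the ID stays reserved, no renumbering

    def restore_node(self, u):
        self.exists[u] = True

    def has_node(self, u):
        return 0 <= u < len(self.exists) and self.exists[u]

    def number_of_nodes(self):
        return sum(self.exists)

    def upper_node_id_bound(self):
        return len(self.exists)


store = ToyNodeStore(12)
store.remove_node(10)
print(store.has_node(10), store.has_node(11))
print(store.number_of_nodes(), store.upper_node_id_bound())
store.restore_node(10)
print(store.has_node(10), store.number_of_nodes())
```

In this model the node count drops after a removal while the largest usable ID stays the same, which is why code that loops over ID ranges should check `hasNode` first.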

## Node and Edge Attributes

It is possible to attach attributes to the nodes of a NetworKit graph with `attachNodeAttribute`. Attributes in Python can be of type str, float, or int. In C++ it is possible to add arbitrary types. We will use here the Python interface.

In [None]:
GAttr = nk.readGraph("karate")

# Create a new attribute named 'taste' of type 'str'
att = GAttr.attachNodeAttribute("taste", str)

# Set attribute values by interacting with the attribute object. It references the specific attribute storage for the graph.
att[0] = "sweet" # Attribute of node 0
att[1] = "umami" # Attribute of node 1

# Get attribute value
for u in GAttr.iterNodes():
    try:
        print(f"Attribute of node {u} is {att[u]}")
    except ValueError:
        print(f"Node {u} has no attribute")
        break    


In the same way, it is also possible to add edge attributes to a NetworKit graph with `attachEdgeAttribute`. Attributes can be of type str, float, or int. Note that the edges of the graph have to be indexed.


It is possible to access the attributes both by edge index and by endpoints. Note: Access by edge index can be much slower compared to access by endpoints, therefore best use `att[u, v]` for access.

In [None]:
# Add some edges
GAttr = nk.Graph(4)
GAttr.addEdge(0,1)
GAttr.addEdge(1,2)
GAttr.addEdge(0,3)

# Call indexEdges once (all edges inserted afterwards will also get indexed)
GAttr.indexEdges()

# Create a new attribute named 'rating' of type 'float'
try:
    attrEdge = GAttr.getEdgeAttribute("rating", float)
except Exception:
    attrEdge = GAttr.attachEdgeAttribute("rating", float)

# Set attribute values
attrEdge[0, 1] = 8.1 # Attribute of edge 0-1
attrEdge[0, 3] = 2.1 # Attribute of edge 0-3

# Get attribute value by edge endpoints (fast)
for u,v in GAttr.iterEdges():
    try:
        print(f"Attribute of edge ({u},{v}) is {attrEdge[v,u]}")
    except ValueError:
        print(f"Edge ({u},{v}) has no attribute")
        break    



### Convert networkx to networkit (with attributes)

In [None]:
import networkx as nx

# 0 - 1
# |   |
# 3   2

Gnx = nx.Graph()
Gnx.add_edge(0, 1)
Gnx.add_edge(1, 2)
Gnx.add_edge(0, 3)

Gnx.nodes[0]["taste"] = "sweet"
Gnx.nodes[1]["taste"] = "umami"

Gnx[0][1]["rating"] = 8.1
Gnx[0][3]["rating"] = 2.1

In [None]:
# Convert the networkx graph to networkit, transferring the data
Gnk = nk.nxadapter.nx2nk(Gnx, data=True)
tasteAttr = Gnk.getNodeAttribute("taste", str)
ratingAttr = Gnk.getEdgeAttribute("rating", float)

print(f"Attribute of node 0: {tasteAttr[0]}")
print(f"Attribute of edge 0-3: {ratingAttr[0,3]}")

In [None]:
Gnx[1][2]["rating"] = 1.0

# Do the same but use "rating" as an edge weight
Gnk = nk.nxadapter.nx2nk(Gnx, weightAttr="rating", data=True)
print(f"Weight of edge 0-1: {Gnk.weight(0,1)}")

## Connected Components

![Components](https://upload.wikimedia.org/wikipedia/commons/3/38/Equivalentie.svg)

A connected component is a set of nodes in which each pair of nodes is connected by a path. We use this component extraction as an example of how algorithms are designed in NetworKit. An algorithm is always an object, which usually is created by providing the reference (parameter) to a given `Graph` object. 

```
myAlgorithm = nk.module.Algorithm(Graph)
```

This creation also includes the necessary setup steps. Afterwards we call `Algorithm.run()` to execute the computationally heavy logic. Finally, we can retrieve the results (how depends on the category of the algorithm).

```
myAlgorithm.run()
myResult = myAlgorithm.results()
```

We use this scheme to determine the connected components of a graph:

In [None]:
nk.overview(G)

In [None]:


cc = nk.components.ConnectedComponents(G)
cc.run()
print("number of components ", cc.numberOfComponents())
v = 0
print("component of node ", v , ": " , cc.componentOfNode(v))
print("map of component sizes: ", cc.getComponentSizes())
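
To illustrate both the concept and the construct/run/query scheme, here is a minimal pure-Python sketch that finds connected components with breadth-first search. The class `ToyConnectedComponents` and its method names are made up for this illustration; NetworKit's own implementation is written in C++ and may work differently.

```python
from collections import deque


class ToyConnectedComponents:
    """Illustrative BFS component finder (hypothetical, not NetworKit code)."""

    def __init__(self, num_nodes, edges):
        # Setup step: build adjacency lists for an undirected graph.
        self.adj = [[] for _ in range(num_nodes)]
        for u, v in edges:
            self.adj[u].append(v)
            self.adj[v].append(u)
        self.component = None
        self.num_components = 0

    def run(self):
        # Heavy step: label every node with the index of its component.
        self.component = [-1] * len(self.adj)
        label = 0
        for start in range(len(self.adj)):
            if self.component[start] != -1:
                continue  # already reached from an earlier start node
            self.component[start] = label
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v in self.adj[u]:
                    if self.component[v] == -1:
                        self.component[v] = label
                        queue.append(v)
            label += 1
        self.num_components = label
        return self

    def number_of_components(self):
        return self.num_components

    def component_of_node(self, u):
        return self.component[u]

    def component_sizes(self):
        sizes = {}
        for label in self.component:
            sizes[label] = sizes.get(label, 0) + 1
        return sizes


# A path 0-1-2, a single edge 3-4 and the isolated node 5.
cc_toy = ToyConnectedComponents(6, [(0, 1), (1, 2), (3, 4)]).run()
print(cc_toy.number_of_components(), cc_toy.component_sizes())
```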

## Visualization

The `vizbridges` module provides the `widgetFromGraph` function, which creates and returns Python widgets for graph visualization. By default, a graph is plotted in 2D using the Python package ipycytoscape. If the parameter `dimension` is set to 3D, the graph is plotted in 3D using plotly. For this to work, one or both of the packages have to be installed on the machine where the Jupyter backend is running. There is an optional parameter for node scores or a partition (e.g. the result of a centrality or community detection algorithm). If provided, the nodes are colored according to their partition membership or score.


### Visualize in 2D using Cytoscape

When plotting a graph in 2D with Cytoscape, the internal layout algorithm of Cytoscape is used. This, together with the performance of the plugin, makes this visualization suitable for graphs with up to around 500 nodes. For larger graphs, it is recommended to use the 3D visualization. So we first load and visualize a smaller example, the well-known karate graph.

Link: https://cytoscape.org/


In [None]:
from networkit import vizbridges
from google.colab import output
output.enable_custom_widget_manager()

# Read a KONECT graph (adjacency list)
G = nk.readGraph('karate')
nk.overview(G)

# Visualize the Karate graph
nk.vizbridges.widgetFromGraph(G)

### Visualize in 3D using Plotly

When plotting a graph in 3D with Plotly, the Maxent-Stress layout from `networkit.viz.MaxentStress` is used. With a moderate to decent client, graph visualizations with up to 50k nodes are possible. Note: The time it takes to generate the widget scales with the number of nodes.

Link: https://plotly.com/


In [None]:
# Visualize the graph in 3D
myWidget = nk.vizbridges.widgetFromGraph(G, dimension = nk.vizbridges.Dimension.Three)
myWidget.show()


# Additional Stuff

## Degree Distribution

![DD](https://upload.wikimedia.org/wikipedia/commons/9/97/UndirectedDegrees.svg)


Node degree, the number of edges connected to a node, is one of the most studied properties of networks. Types of networks are often characterized in terms of their distribution of node degrees. We obtain and visualize the degree distribution of our example network as follows.

In [None]:
nk.plot.degreeDistribution(G)

As we can see, most of the nodes are connected to only a few edges. It might be beneficial to visualize the degrees in a log-log-plot:

In [None]:
import numpy
import matplotlib.pyplot as plt
dd = sorted(nk.centrality.DegreeCentrality(G).run().scores(), reverse=True)
degrees, numberOfNodes = numpy.unique(dd, return_counts=True)
plt.xscale("log")
plt.xlabel("degree")
plt.yscale("log")
plt.ylabel("number of nodes")
plt.plot(degrees, numberOfNodes)
plt.show()
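
What the plot shows can be reproduced by hand on a tiny example. The sketch below counts, in plain Python, how many nodes have each degree; the edge list is invented for illustration and has nothing to do with the tutorial's networks.

```python
from collections import Counter

# Small made-up undirected graph given as an edge list.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

# The degree of a node is the number of edges it appears in.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: how many nodes have each degree value.
# Note: nodes without any edge would not show up in this edge-based count.
distribution = Counter(degree.values())
for d, count in sorted(distribution.items()):
    print(f"degree {d}: {count} node(s)")
```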


## Community Detection

![Community](https://upload.wikimedia.org/wikipedia/commons/f/f4/Network_Community_Structure.svg)

This section demonstrates the community detection capabilities of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network.

Code for community detection is contained in the `community` module. The module provides a top-level function to quickly perform community detection with a suitable algorithm and print some stats about the result.


In [None]:
nk.community.detectCommunities(G)

The function prints some statistics and returns the partition object representing the communities in the network as an assignment of node to community label. 

`Modularity` is the primary measure for the quality of a community detection solution. The value is in the range [-0.5,1] and usually depends both on the performance of the algorithm and the presence of distinctive community structures in the network.
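
As a rough illustration of the measure, the following sketch computes the standard modularity of an unweighted, undirected graph in plain Python: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints that belong to it. The function `modularity` and the example graph are illustrative assumptions; NetworKit's own implementation may support additional options such as edge weights.

```python
def modularity(edges, community):
    """Modularity of a node-to-community mapping for an unweighted, undirected graph."""
    m = len(edges)
    internal = {}  # number of edges with both endpoints in the community
    volume = {}    # sum of the degrees of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (vol / (2 * m)) ** 2
               for c, vol in volume.items())


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

print(modularity(edges, two_groups))  # 5/14, roughly 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph along the bridge scores clearly higher than putting all nodes into one community, which matches the intuition behind the measure.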

### Choice of Algorithm

The community detection function used a good default choice for an algorithm: PLM, our parallel implementation of the well-known Louvain method. It yields a high-quality solution at reasonably fast running times. But there are other choices, like the Leiden algorithm. This leads to a modularity score similar to that of the default, but the communities are more evenly sized. However, as is often the case, the results depend strongly on the network at hand. For `PGPgiantcompo`, Leiden leads to inferior results.

In [None]:
nk.community.detectCommunities(G, algo=nk.community.ParallelLeiden(G, randomize=False, iterations=100))

So instead, let’s capture the result of the default function call.

In [None]:
communities = nk.community.detectCommunities(G)

## Visualizing bigger graphs (different options)

A note on a data structure called `Partition`. A partition is a data structure that, in our context, places the graph nodes into disjoint subsets. The community detection algorithm produces such a partition object. We can feed it into the plotting mechanism in order to visualize the communities of our graph.

Therefore, we first load the `PGPgiantcompo` network again and then visualize both the network and the communities. Note that the generation of the 3D layout can take some seconds.

In [None]:
# Only execute this cell in your own environment. Colab is likely too constrained on resources.
# myWidget = nk.vizbridges.widgetFromGraph(G, dimension = nk.vizbridges.Dimension.Three, nodePartition = communities)
# myWidget.show()

Oops!

As we can see, the visualization is colorful and shows the different communities; however, extracting information is difficult due to the density and the number of nodes involved. We can approach this problem in two ways, for example:

(i) Sparsification of the graph

(ii) Looking only at a subset of nodes (subgraph) and investigating it further

### First method: Sparsify the graph

In [None]:
# Initialize the algorithm
jaccard = nk.sparsification.JaccardSimilaritySparsifier()

# Set the ratio of preserved edges
targetRatio = 0.25

# Get sparsified graph
G.indexEdges()
GSparsified = jaccard.getSparsifiedGraphOfSize(G, targetRatio)

nk.overview(GSparsified)
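
The idea behind Jaccard-based sparsification can be sketched in plain Python: score each edge by the overlap of its endpoints' neighborhoods and keep the best-scoring fraction. This is a simplified illustration under the assumption of a single global cut-off; the exact scoring and filtering used by `JaccardSimilaritySparsifier` may differ, and all names below are invented for the example.

```python
# Small made-up graph: a triangle 0-1-2 with the pendant edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Build neighbor sets for every node.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)


def jaccard(u, v):
    """Shared neighbors divided by all neighbors of u and v."""
    common = neighbors[u] & neighbors[v]
    union = neighbors[u] | neighbors[v]
    return len(common) / len(union)


scores = {(u, v): jaccard(u, v) for u, v in edges}

# Assumption: keep the target fraction of edges with the highest scores.
target_ratio = 0.5
keep = max(1, round(target_ratio * len(edges)))
kept_edges = sorted(edges, key=lambda e: scores[e], reverse=True)[:keep]

print(scores)
print(kept_edges)
```

Edges inside the triangle share neighbors and score higher, while the pendant edge scores zero and is dropped first.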

The new sparsified graph is the same as before, but with fewer connections between the nodes. This also means that some nodes may no longer be able to reach other nodes. For further analysis it often makes sense to extract the largest connected component, that is, the largest group of nodes that can all reach each other.

In [None]:

GSparsifiedCC = nk.components.ConnectedComponents.extractLargestConnectedComponent(GSparsified, compactGraph = True)
nk.overview(GSparsifiedCC)

In [None]:
communitiesSparsified = nk.community.detectCommunities(GSparsifiedCC)

myWidget = nk.vizbridges.widgetFromGraph(GSparsifiedCC, dimension = nk.vizbridges.Dimension.Three, nodePartition = communitiesSparsified)
myWidget.show()

### Second method: Subgraph

NetworKit supports the creation of subgraphs from an original graph and a set of nodes. This is useful if you want to analyze certain communities of a graph, and it also helps with visualization. Let’s say that community 2 of the above result is of further interest, so we want a new graph that consists of the nodes and intra-cluster edges of community 2.


In [None]:
c2 = communities.getMembers(2)
GSub = nk.graphtools.subgraphFromNodes(G, c2, compact=True)

nk.overview(GSub)
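
Conceptually, a subgraph induced by a node set keeps exactly those edges whose two endpoints are both selected. The plain-Python sketch below shows this on an invented edge list and also relabels the nodes to consecutive IDs, which is an assumption about what compaction does; the relabeling order chosen here is illustrative only.

```python
# Small made-up graph and a selected set of nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]
selected = {1, 3, 4}

# Induced subgraph: keep an edge only if both endpoints are selected.
induced = [(u, v) for u, v in edges if u in selected and v in selected]

# Assumed compaction: map the surviving IDs to 0..k-1 in ascending order.
new_id = {old: new for new, old in enumerate(sorted(selected))}
compacted = [(new_id[u], new_id[v]) for u, v in induced]

print(induced)
print(compacted)
```

After compaction the original IDs are lost, so keep a mapping such as `new_id` if results have to be related back to the full graph.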

In [None]:
communities2 = nk.community.detectCommunities(GSub)

nk.vizbridges.widgetFromGraph(GSub, dimension = nk.vizbridges.Dimension.Two, nodePartition=communities2)