# NML'22 tutorial 4: Manipulating graphs with NetworkX


Now that you are familiar with defining a graph object in NetworkX (from edges or features), this session explores some of the most common network models, examines their basic properties, and compares them.

## 1 Creating graphs using network models

In [None]:
%matplotlib inline
import collections

import numpy as np
from scipy import spatial
from matplotlib import pyplot as plt
import networkx as nx

Create an Erdős-Rényi graph with $N=100$ vertices, and a probability of connecting each pair of vertices equal to $p=0.15$.

In [None]:
N = 100  # number of nodes
p = 0.15  # probability of connection
er = nx.erdos_renyi_graph(N, p)

You can retrieve the adjacency matrix of the graph from the `Graph` object `er` as follows:

In [None]:
er_adj = nx.adjacency_matrix(er, range(N))
er_adj = er_adj.todense()

You can now visualise the adjacency matrix:

In [None]:
plt.spy(er_adj);
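As a quick sanity check on the adjacency matrix, the sketch below verifies two properties of an undirected graph: the matrix is symmetric, and its entries sum to twice the number of edges. It uses `nx.to_numpy_array` instead of the sparse route above, and the graph size and seed are arbitrary choices made for illustration.

```python
import networkx as nx
import numpy as np

# Small Erdős-Rényi graph; size and seed are arbitrary, chosen for reproducibility.
G = nx.erdos_renyi_graph(20, 0.3, seed=0)

# Dense adjacency matrix as a NumPy array, rows/columns ordered by node label.
A = nx.to_numpy_array(G, nodelist=sorted(G.nodes()))

is_symmetric = np.array_equal(A, A.T)   # undirected graph -> symmetric matrix
edges_from_matrix = int(A.sum()) // 2   # each edge is counted twice in A
degrees_from_matrix = A.sum(axis=1)     # row sums are the node degrees

print(is_symmetric, edges_from_matrix, G.number_of_edges())
```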

## 2 Plotting graphs

With NetworkX and Matplotlib we can also plot a graph. For example, we can plot the Erdős-Rényi graph that we created before as follows:

In [None]:
nx.draw(er)

### 2.1 Exercise

Create a Watts-Strogatz graph and plot it.

In [None]:
# Create a Watts-Strogatz graph.
# your code here
#ws =  

In [None]:
N = 100
k = 10  # regularity (number of nearest neighbors)
p = 0.2  # randomness (rewiring probability)
ws = nx.watts_strogatz_graph(N, k, p)
nx.draw(ws)

## 3 Modifying graphs

Adding or removing nodes and edges is straightforward. If we add an edge between nodes that do not exist yet, they are created automatically.

In [None]:
er.add_node(100)

In [None]:
er.nodes()

You can add or remove nodes and edges either one at a time or as a collection:
* Adding nodes with:
    * `G.add_node`: One node at a time
    * `G.add_nodes_from`: A container of nodes
* Adding edges with:
    * `G.add_edge`: One edge at a time
    * `G.add_edges_from`: A container of edges
* Removing nodes with:
    * `G.remove_node`: One node at a time
    * `G.remove_nodes_from`: A container of nodes
* Removing edges with:
    * `G.remove_edge`: One edge at a time
    * `G.remove_edges_from`: A container of edges

You can get the number of edges with `G.size()`.

Add an edge between two non-existent vertices, then remove nodes 0 to 49. Draw the graph after each change.

In [None]:
er.add_edge(101, 102)
nx.draw(er)

In [None]:
er.size()

In [None]:
er.remove_nodes_from(range(50))
nx.draw(er)
er.nodes()

In [None]:
er.size() # returns number of edges
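The same operations can be followed more easily on a tiny graph where the result can be checked by hand. The sketch below uses a 4-node path graph (an arbitrary choice for illustration) to show that `add_edge` creates missing nodes and that removing nodes also removes their incident edges.

```python
import networkx as nx

G = nx.path_graph(4)          # nodes 0..3, edges (0,1), (1,2), (2,3)

G.add_edge(10, 11)            # neither node exists yet; both are created
nodes_after_add = sorted(G.nodes())
edges_after_add = G.size()

G.remove_nodes_from([0, 1])   # also removes the edges (0,1) and (1,2)
nodes_after_remove = sorted(G.nodes())
edges_after_remove = G.size()

print(nodes_after_add, edges_after_add)
print(nodes_after_remove, edges_after_remove)
```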

## 4 Degree distribution

`G.degree()` returns a ``DegreeView`` object with pairs of nodes and their degree.
If we specify a node, `G.degree(node)` will return the degree of that node.
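A minimal sketch of both access patterns, using a small star graph (chosen here only because its degrees are easy to verify). It also checks that the degrees sum to twice the number of edges.

```python
import networkx as nx

G = nx.star_graph(4)  # node 0 is the hub, nodes 1..4 are leaves

degree_view = G.degree()                     # iterable of (node, degree) pairs
degree_dict = dict(degree_view)              # convenient node -> degree mapping
hub_degree = G.degree(0)                     # degree of a single node
degree_sum = sum(d for _, d in degree_view)  # equals 2 * number of edges

print(degree_dict, hub_degree, degree_sum)
```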

Create an Erdős-Rényi network and plot a histogram of node degrees.  

In [None]:
N = 100  # number of nodes
p = 0.15  # probability of connection
er = nx.erdos_renyi_graph(N, p)

In [None]:
d = er.degree()
print(d)

In [None]:
# Erdős-Rényi node degree histogram.
degree_sequence = sorted([d for n, d in er.degree()], reverse=True)  # degree sequence: creating a sorted list
degreeCount = collections.Counter(degree_sequence)
deg, count = zip(*degreeCount.items())

fig, ax = plt.subplots()
ax.bar(deg, count)
ax.set_title("Degree Histogram")
ax.set_ylabel("Count")
ax.set_xlabel("Degree");

### 4.1 Fitting a distribution

Try to fit a Poisson distribution.

In [None]:
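import math

# Poisson distribution.
def poisson(mu, k):
    # math.factorial is used because np.math is deprecated and absent from recent NumPy releases.
    return np.exp(-mu) * mu**k / math.factorial(int(k))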

In [None]:
fig, ax = plt.subplots()
ax.bar(deg, count, label='Histogram')

# Poisson distribution
# your code here

ax.legend()
ax.set_title("Degree Histogram")
ax.set_ylabel("Count")
ax.set_xlabel("Degree");

In [None]:
fig, ax = plt.subplots()
ax.bar(deg, count, label='Histogram')

# Poisson distribution
mu = 2 * er.size() / N  # average degree
ks = np.arange(1, np.max(deg) + 1)
expected_count = [N * poisson(mu, i) for i in ks]  # expected number of nodes with degree i
ax.plot(ks, expected_count, color='r', label='Poisson distribution')

ax.legend()
ax.set_title("Degree Histogram")
ax.set_ylabel("Count")
ax.set_xlabel("Degree");

In [None]:
print('average degree={}'.format(2*er.size()/N))

We observe that $\langle k \rangle \ll N$, so the Poisson distribution is a good approximation.

Let's try with a higher probability of connection:

In [None]:
N = 100  # number of nodes
p = 0.75  # probability of connection
er = nx.erdos_renyi_graph(N, p)
d = er.degree()
degree_sequence = sorted([d for n, d in er.degree()], reverse=True)  # degree sequence: creating a sorted list
degreeCount = collections.Counter(degree_sequence)
deg, count = zip(*degreeCount.items())

When the average degree increases and is no longer much smaller than the number of nodes, the Poisson distribution is no longer a good fit.
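One way to quantify this without sampling a graph: the degree of a node in an Erdős-Rényi graph follows a Binomial($N-1$, $p$) distribution, and the Poisson distribution with the same mean only approximates it when $p$ is small. The sketch below compares the two for both values of $p$ using the total variation distance, restricted to degrees $0$ to $N-1$. The helper `poisson_gap` is a hypothetical name introduced here, and `scipy.stats` stands in for the hand-written `poisson` function.

```python
import numpy as np
from scipy import stats

N = 100
k = np.arange(N)  # possible degrees 0..N-1

def poisson_gap(p):
    """Total variation distance (over degrees 0..N-1) between the
    Binomial(N-1, p) degree distribution and a Poisson with the same mean."""
    binom_pmf = stats.binom.pmf(k, N - 1, p)
    poisson_pmf = stats.poisson.pmf(k, (N - 1) * p)
    return 0.5 * np.abs(binom_pmf - poisson_pmf).sum()

gap_sparse = poisson_gap(0.15)
gap_dense = poisson_gap(0.75)
print(f"p=0.15: {gap_sparse:.3f}   p=0.75: {gap_dense:.3f}")
```

A larger value means a worse approximation, so the gap for $p=0.75$ should come out clearly larger than the one for $p=0.15$.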

### 4.2 Exercise

Let's go back to the Watts-Strogatz network.

In [None]:
N = 100
k = 10  # regularity (number of nearest neighbors)
p = 0.2  # randomness (rewiring probability)
ws = nx.watts_strogatz_graph(N, k, p)

Calculate the average distance $\langle d \rangle$ (average shortest path length) of the graph.

In [None]:
# your code here

In [None]:
dAvg = nx.average_shortest_path_length(ws)
print(dAvg)

Now estimate it using the small-world property, $\langle d \rangle \approx \frac{\ln N}{\ln \langle k \rangle}$.

In [None]:
# your code here

In [None]:
avg_degree = 2 * ws.size() / N
dAvg_small_world = np.log(N) / np.log(avg_degree)
print(dAvg_small_world)
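Note that this estimate does not depend on $p$. As far as the NetworkX generator is concerned, rewiring replaces one edge with another, so the edge count stays at $Nk/2$ and the average degree stays at $k$. The sketch below checks this for a few values of $p$ (the seed is an arbitrary choice) and evaluates the estimate, which for $N=100$ and $k=10$ is $\ln 100 / \ln 10 = 2$.

```python
import networkx as nx
import numpy as np

N, k = 100, 10

# Edge count for a ring lattice (p=0), a partly rewired and a fully rewired graph.
edge_counts = [nx.watts_strogatz_graph(N, k, p, seed=1).size() for p in (0.0, 0.2, 1.0)]

avg_degree = 2 * edge_counts[0] / N        # equals k
d_est = np.log(N) / np.log(avg_degree)     # small-world estimate of <d>

print(edge_counts, avg_degree, d_est)
```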

Plot the absolute error of the small-world estimate of $\langle d \rangle$ as a function of the randomness $p$:

In [None]:
probs = np.linspace(0, 1, 50)
err = np.zeros(probs.shape)
for idx, p in enumerate(probs):
    ws = nx.watts_strogatz_graph(N, k, p)
    dAvg = nx.average_shortest_path_length(ws)
    err[idx] = np.abs(dAvg - dAvg_small_world)

fig, ax = plt.subplots()
ax.plot(probs, err, label='small world error')
ax.set_title("small world error")
ax.set_xlabel("randomness")
ax.set_ylabel("<d> estimation error");
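One caveat: `nx.average_shortest_path_length` raises an error on a disconnected graph, which rewiring can in principle produce (this is unlikely with $k=10$). A hedged variant of the loop, sketched below, uses `nx.connected_watts_strogatz_graph`, which retries generation until the graph is connected. The three values of $p$ and the seed are arbitrary choices for illustration.

```python
import networkx as nx

N, k = 100, 10
avg_dist = {}
for p in (0.0, 0.1, 1.0):
    # Retries up to `tries` times until a connected graph is produced.
    G = nx.connected_watts_strogatz_graph(N, k, p, tries=100, seed=42)
    avg_dist[p] = nx.average_shortest_path_length(G)

for p, d in avg_dist.items():
    print(f"p={p:.1f}  <d>={d:.3f}")
```

Fixing the seed also makes the curve reproducible between runs, which helps when comparing against the estimate.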


Calculate the average clustering coefficient $C$ of the WS graph with NetworkX:

In [None]:
N = 100
k = 10  # regularity (number of nearest neighbors)
p = 0.2  # randomness (rewiring probability)
ws = nx.watts_strogatz_graph(N, k, p)
C = nx.average_clustering(ws)
print(C)

Estimate it using the random network model, $C = \frac{\langle k \rangle}{N}$:

In [None]:
avg_degree = 2*ws.size()/N
C_est = avg_degree/N 
print(C_est)

Try gradually increasing the number of nodes $N$ and observe what happens.
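A possible way to run this experiment is sketched below. It keeps $k$ and $p$ fixed, doubles $N$ a few times, and prints the measured clustering coefficient next to the random-network estimate. The list of sizes and the seed are assumptions made for illustration.

```python
import networkx as nx

k, p = 10, 0.2
results = []
for N in (100, 200, 400, 800):
    G = nx.watts_strogatz_graph(N, k, p, seed=0)
    C_measured = nx.average_clustering(G)
    C_random = (2 * G.size() / N) / N   # random-network estimate <k> / N
    results.append((N, C_measured, C_random))
    print(f"N={N:4d}  C={C_measured:.3f}  <k>/N={C_random:.4f}")
```

The estimate $\langle k \rangle / N$ shrinks as $N$ grows, while the measured value is governed by the local ring structure rather than by $N$, so the gap between the two should widen.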