# Centrality measures for the Karate Club network

In [None]:
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import pandas as pd
import seaborn as sns

In this exercise, we will get familiar with common centrality measures by applying them to a small, famous network: Zachary's Karate Club.

The network captures 34 karate club members and their interactions outside the club. A conflict between "John A" (administrator) and "Mr. Hi" (instructor) led to the club splitting. Half the members joined Mr. Hi's new club, while the others either found a new instructor or quit karate. Zachary correctly predicted which group every member but one would join after the split. This network is regularly used as a benchmark for community detection, something you'll learn about next week.

We will use networkx to compute the following centrality measures for the network:

1) Betweenness centrality $bc(i)$:
The number of shortest paths that pass through node $i$. If there are several shortest paths between a given pair of nodes, that pair contributes the fraction of those paths that contain $i$. Betweenness scores are normalized by $(N-1)(N-2)$, the number of ordered node pairs $(s,t)$ with $s \neq t$ that do not contain $i$; pairs containing $i$ are excluded because paths starting or ending at node $i$ do not contribute to its betweenness. If $\sigma_{st}$ is the number of shortest paths from $s$ to $t$ and $\sigma_{sit}$ the number of those paths that contain $i$, then
$$
bc(i) = \frac{1}{(N-1)(N-2)}\sum_{s\neq i} \sum_{t\neq i,\, t\neq s} \dfrac{\sigma_{sit}}{\sigma_{st}}.
$$

2) Closeness centrality $C(i)$:
The inverse of the average shortest-path distance $d(i,v)$ from node $i$ to every other node $v$:
$$
C(i) = \dfrac{N-1}{\sum\limits_{v\neq i}d(i,v)}.
$$

3) Eigenvector centrality $e(i)$:
Eigenvector centrality is a generalization of degree that takes into account the degrees of the node's neighbors and, recursively, the degrees of the neighbors of its neighbors, and so on. The centrality of node $i$ is the $i$-th element of the eigenvector of the adjacency matrix associated with the largest eigenvalue.
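
To make the three definitions concrete, here is a minimal sketch that computes each measure directly from its formula and compares the result with the NetworkX implementation. The helper names (`brute_force_betweenness`, `brute_force_closeness`, `leading_eigenvector`) are made up for this illustration, the brute-force approach is only practical for small networks, and edge weights are ignored throughout.

```python
import itertools

import networkx as nx
import numpy as np


def brute_force_betweenness(G):
    """Betweenness from the definition: enumerate all shortest paths."""
    n = G.number_of_nodes()
    bc = dict.fromkeys(G, 0.0)
    for s, t in itertools.permutations(G, 2):  # ordered pairs with s != t
        paths = list(nx.all_shortest_paths(G, s, t))
        for path in paths:
            for i in path[1:-1]:  # interior nodes only, so i != s and i != t
                bc[i] += 1.0 / len(paths)  # adds up to sigma_sit / sigma_st
    return {i: total / ((n - 1) * (n - 2)) for i, total in bc.items()}


def brute_force_closeness(G):
    """Closeness from the definition (assumes a connected network)."""
    n = G.number_of_nodes()
    closeness = {}
    for i in G:
        dist = nx.shortest_path_length(G, source=i)  # {node: hops}, d(i, i) = 0
        closeness[i] = (n - 1) / sum(dist.values())
    return closeness


def leading_eigenvector(G):
    """Eigenvector of the unweighted adjacency matrix for the largest eigenvalue."""
    nodes = list(G)
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigenvalues in ascending order
    v = np.abs(eigenvectors[:, -1])  # fix the sign so that all entries are positive
    return dict(zip(nodes, v / np.linalg.norm(v)))


G = nx.karate_club_graph()
nx_bc = nx.betweenness_centrality(G)
nx_cc = nx.closeness_centrality(G)
nx_ec = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-10)

my_bc = brute_force_betweenness(G)
my_cc = brute_force_closeness(G)
my_ec = leading_eigenvector(G)

max_diff_bc = max(abs(my_bc[i] - nx_bc[i]) for i in G)
max_diff_cc = max(abs(my_cc[i] - nx_cc[i]) for i in G)
max_diff_ec = max(abs(my_ec[i] - nx_ec[i]) for i in G)
print(max_diff_bc, max_diff_cc, max_diff_ec)
```

All three differences should be close to zero. The eigenvector comparison uses a tighter `tol` than the default because NetworkX computes this measure by power iteration, which is only approximate.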

Let's first generate the network and visualize it (no modifications to the code below needed, just run it).

In [None]:
net=nx.karate_club_graph() # the network is so famous that it even has its own networkx generator...
pos = nx.spring_layout(net) # calculate the drawing coordinates of nodes separately for future use
nx.draw(net,pos);

## a. Calculating and visualizing centralities using NetworkX

Use NetworkX to compute the betweenness, closeness, and eigenvector centralities for the Karate Club. Then, visualize the network with the nodes colored according to each centrality measure, to get an understanding of where the central nodes are. The lighter the node color, the higher its centrality.

In [None]:
# NO NEED TO MODIFY THIS CELL. Yields a color corresponding to x (scaled between minval and maxval).
def get_color(x, minval, maxval, cmap_name='viridis'):
    # Normalize x between minval and maxval
    norm = mcolors.Normalize(vmin=minval, vmax=maxval)
    
    # Get the colormap
    cmap = plt.get_cmap(cmap_name)
    
    # Map the normalized value to a color
    color = cmap(norm(x))
    
    return color
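
As a small illustration of what `get_color` does internally, the sketch below rescales a few values to the range [0, 1] and looks them up in the colormap. The dictionary `values` holds made-up scores, not real centralities.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

values = {"a": 2.0, "b": 5.0, "c": 8.0}  # hypothetical scores for three nodes

# Linear rescaling: the smallest value maps to 0, the largest to 1
norm = mcolors.Normalize(vmin=min(values.values()), vmax=max(values.values()))
cmap = plt.get_cmap("viridis")

scaled = {k: float(norm(v)) for k, v in values.items()}
hex_colors = {k: mcolors.to_hex(cmap(norm(v))) for k, v in values.items()}

print(scaled)      # position of each value within the colormap
print(hex_colors)  # dark purple at the low end, yellow at the high end
```

Because the scaling is relative to `minval` and `maxval`, the same color in two different panels can stand for very different absolute centrality values.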

In [None]:
def get_centrality_measures(network, tol=1e-6):
    """
    Calculates three centrality measures (betweenness, closeness, and 
    eigenvector centrality) for the nodes of the given network.
    Returns the centralities as three dictionaries {node_id:centrality}.

    Parameters
    ----------
    network: nx.Graph
        The network for which the centrality measures are calculated.
    tol: float
        Tolerance parameter for calculating the eigenvector centrality.
        You don't need to touch this, except if the eigenvector centrality doesn't converge.

    Returns
    -------
    betweenness: dictionary
        The betweenness centrality of the nodes in the network.
    closeness: dictionary
        The closeness centrality of the nodes in the network.
    eigenvector: dictionary
        The eigenvector centrality of the nodes in the network.
    """

    #TODO: Use NetworkX functions to obtain, for each centrality measure, 
    # a dictionary where the keys are the names of the nodes and the values are 
    # the value of centrality of the corresponding node. 
    # You'll find the functions easily at https://networkx.org/documentation/stable/reference/algorithms/centrality.html

    # YOUR CODE HERE
    betweenness = nx.betweenness_centrality(network)
    closeness = nx.closeness_centrality(network)
    eigenvector = nx.eigenvector_centrality(network, tol=tol)

    return [betweenness, closeness, eigenvector]
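
Once the centralities are available as dictionaries, ranking the nodes is a one-liner. The sketch below lists the three nodes with the highest betweenness; it assumes that the nodes carry a `club` attribute, as they do in the NetworkX version of this network, and falls back to `None` otherwise.

```python
import networkx as nx

net = nx.karate_club_graph()
betweenness = nx.betweenness_centrality(net)

# Sort node ids by their centrality value, highest first
top3 = sorted(betweenness, key=betweenness.get, reverse=True)[:3]

for node in top3:
    club = net.nodes[node].get("club")  # faction label, if present
    print(node, club, round(betweenness[node], 3))
```

The same pattern works for the closeness and eigenvector dictionaries, which makes it easy to check whether the different measures agree on the most central members.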

In [None]:
# NO NEED TO MODIFY THIS CELL

centralities=get_centrality_measures(net)
centrality_names=['betweenness centrality','closeness centrality','eigenvector centrality']

# generate centrality color lists because networkx drawing cannot handle dictionaries (why oh why)

centrality_color_lists=[]

for c in centralities:
    minval=min(c.values())
    maxval=max(c.values())
    templist=[]
    for node in net:
        nodecolor=get_color(c[node],minval,maxval,cmap_name='viridis')
        templist.append(nodecolor)
    centrality_color_lists.append(templist)

figsize = (12, 2.5)
fig, axes = plt.subplots(1, 3, figsize=figsize)

for i,(centrality_color_list, centrality_name) in enumerate(zip(centrality_color_lists,centrality_names)):
    ax = axes[i]
    node_size = 80 * figsize[0] * figsize[1] / net.number_of_nodes()
    nx.draw(net, ax=ax, pos=pos, 
            node_size=node_size, node_color=centrality_color_list, edge_color='dimgray', edgecolors='k')
    ax.set_title(centrality_name)
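
As an alternative to building color lists by hand, the drawing functions also accept a list of numbers for `node_color` together with a colormap. The sketch below is one possible way to do this for closeness centrality and adds a colorbar; the fixed `seed` and the figure size are arbitrary choices made for this example.

```python
import matplotlib.pyplot as plt
import networkx as nx

net = nx.karate_club_graph()
pos = nx.spring_layout(net, seed=42)  # fixed seed gives a reproducible layout

closeness = nx.closeness_centrality(net)
values = [closeness[node] for node in net]  # same order as the nodes are drawn in

fig, ax = plt.subplots(figsize=(4, 3))
nx.draw(net, pos=pos, ax=ax, node_size=60,
        node_color=values, cmap="viridis", vmin=min(values), vmax=max(values),
        edge_color="dimgray", edgecolors="k")

# A ScalarMappable with the same normalization provides the colorbar
sm = plt.cm.ScalarMappable(norm=plt.Normalize(min(values), max(values)), cmap="viridis")
fig.colorbar(sm, ax=ax, label="closeness centrality")
```

The colorbar makes the actual value range visible, which the color lists alone do not show.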

In [None]:
figure_filename='Karate_club_centralities.pdf' # run this if you want to save the figure
fig.savefig(figure_filename)

## b. Correlations between centralities
Next, we'll have a look at how the various centrality measures correlate. We'll also include the node degree here. Modify the code below to get a list of node degrees and run it to produce a grid that contains scatterplots between all measures. Here, we'll use Pandas and Seaborn, which are both very handy tools for any budding data scientist.   

In [None]:

df=pd.DataFrame()

degree_list=[]
betweenness_list=[]
closeness_list=[]
eigenvector_list=[]

# TODO: fill the above lists with node degrees and centrality measures (for node in net: ...)
# use the three centrality dictionaries from above (the variable centralities is a list of three dictionaries, one for each centrality)

# YOUR CODE HERE
for node in net:
    degree_list.append(net.degree[node])
    betweenness_list.append(centralities[0][node])
    closeness_list.append(centralities[1][node])
    eigenvector_list.append(centralities[2][node])

# Build the DataFrame directly; concatenating onto an empty frame is unnecessary
df = pd.DataFrame({'degree': degree_list,
                   'betweenness': betweenness_list,
                   'closeness': closeness_list,
                   'eigenvector': eigenvector_list})

# Plot pairwise relationships
grid = sns.PairGrid(df)
grid.map_offdiag(sns.scatterplot, alpha=0.6)
grid.add_legend()
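
To complement the scatterplots with numbers, one option is to compute a correlation matrix. The sketch below builds the table straight from the dictionaries (pandas aligns them by node id) and uses Spearman rank correlation, which is an assumption of this example rather than a requirement of the exercise.

```python
import networkx as nx
import pandas as pd

net = nx.karate_club_graph()

# Each dictionary becomes one column; the node ids become the index
df = pd.DataFrame({
    "degree": dict(net.degree()),
    "betweenness": nx.betweenness_centrality(net),
    "closeness": nx.closeness_centrality(net),
    "eigenvector": nx.eigenvector_centrality(net, tol=1e-6),
})

# Rank correlation between every pair of measures
spearman = df.corr(method="spearman")
print(spearman.round(2))
```

Rank correlation only asks whether two measures order the nodes similarly, so it is less sensitive to the few very high values that dominate the scatterplots.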

In [None]:
# Save figure in the current directory
grid.figure.savefig('centrality_pairwise_scatter.pdf')