### CS4423 - Networks
Prof. Götz Pfeiffer<br />
School of Mathematics, Statistics and Applied Mathematics<br />
NUI Galway

#### 3. Centrality Measures

# Lecture 9: Closeness and Betweenness Centrality

We study two more centrality measures:
* **closeness** centrality and
* **betweenness** centrality

and compare them with the centralities studied earlier, using
the marital ties graph of the Florentine families as the example.

Import the packages (including a `Queue` for BFS) and set standard drawing options:

In [None]:
import networkx as nx
import pandas as pd
from queue import Queue
opts = { "with_labels": True, "node_color": 'y'}

Next, recover the graph `G` of marital ties between Florentine families, together with the node attributes we have already determined.

In [None]:
# Note: `read_yaml` was removed in networkx 2.6, so this cell assumes an older version.
G = nx.read_yaml("data/florentine.yml")
nx.draw(G, **opts)

In [None]:
G.nodes['Medici']

In [None]:
G.number_of_nodes()

## Closeness Centrality

A node $x$ in a network can be regarded as central if it is **close** to (many) other nodes,
as it can then quickly interact with them.  A simple way to measure closeness in this sense
is based on the sum of all the distances to the other nodes, as follows.

<div class="alert alert-danger">

**Definition (Closeness Centrality).**
In a simple, **connected** graph $G$, the **closeness centrality** $c_i^C$ of node $i$
is defined as
$$
c_i^C = \Bigl(\sum_j d_{ij}\Bigr)^{-1}.
$$

The **normalized closeness centrality** of node $i$, defined as
$$
C_i^C = (n-1) c_i^C
$$
takes values in the interval $[0, 1]$.
</div>
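
As a small worked example (not part of the original lecture), consider the path graph $a - b - c - d$ on $n = 4$ nodes.  Summing the distances from $a$ and from $b$ gives
$$
\sum_j d_{aj} = 1 + 2 + 3 = 6, \qquad \sum_j d_{bj} = 1 + 1 + 2 = 4,
$$
so that
$$
c_a^C = \tfrac16, \quad C_a^C = \tfrac36 = 0.5, \qquad c_b^C = \tfrac14, \quad C_b^C = \tfrac34 = 0.75.
$$
By symmetry, $d$ and $c$ have the same values as $a$ and $b$, so the two inner nodes are the more central ones.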

How to compute $c_i^C$? 

**BFS again.**  

* The following Python function, taken from a previous lecture,
uses BFS to compute shortest distances.
* It takes a graph $G = (X, E)$ and a vertex $x \in X$
as its arguments. 
* It returns a **dictionary**, which assigns to each node its distance to $x$.

In [None]:
def distances(G, x):
    
    # 1. init: set up the dictionary and a queue
    dists = { y: None for y in G }
    Q = Queue()
    dists[x] = 0
    Q.put(x)
    
    # 2. loop
    while not Q.empty():
        y = Q.get()
        for z in G.neighbors(y):
            if dists[z] is None:
                dists[z] = dists[y] + 1
                Q.put(z)
    
    # 3. stop here
    return dists

* For example, the distances from node `'Medici'` to all nodes in `G`:

In [None]:
d = distances(G, 'Medici')
print(d)

* The isolated node `'Pucci'`:

In [None]:
d = distances(G, 'Pucci')
print(d)

* If the sum of the distances is $0$ (why?), computing the closeness will most likely
cause a **division-by-zero error**.

* So let's remove the isolated node from `G` and update `n`, the number of nodes.

In [None]:
G.remove_node('Pucci')
n = G.number_of_nodes()
n
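
Why the removal is needed can be seen on a toy example.  The sketch below is not from the lecture: it uses a plain adjacency dictionary `adj` (a hypothetical stand-in for a `networkx` graph), a helper named `bfs_distances`, and `collections.deque` in place of `Queue`.  For an isolated node, the only reachable node is the node itself, so the sum of the distances over the reachable nodes is $0$ and its reciprocal cannot be formed.

```python
from collections import deque

def bfs_distances(adj, x):
    """BFS distances from x; unreachable nodes keep the value None."""
    dists = {y: None for y in adj}
    dists[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dists[z] is None:
                dists[z] = dists[y] + 1
                queue.append(z)
    return dists

# hypothetical toy graph: one edge a - b, plus an isolated node 'lonely'
adj = {"a": ["b"], "b": ["a"], "lonely": []}

d = bfs_distances(adj, "lonely")
total = sum(v for v in d.values() if v is not None)  # reachable nodes only

try:
    closeness_lonely = 1 / total
except ZeroDivisionError:
    closeness_lonely = None  # closeness is undefined for an isolated node

print(d, total, closeness_lonely)
```

Summing `d.values()` directly would fail even earlier, with a `TypeError`, because of the `None` entries for the unreachable nodes.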

In [None]:
nx.draw(G, **opts)

* Use `distances` to compute the normalized closeness centrality according to the above
definition.

In [None]:
closeness = { x : (n-1)/sum(distances(G, x).values()) for x in G }
closeness

* Compare the results to the `networkx` version of closeness:

In [None]:
nx.closeness_centrality(G)

In [None]:
nx.closeness_centrality(G) == closeness

* Let's add those measurements to the table.

In [None]:
nx.set_node_attributes(G, closeness, '$C_i^C$')

In [None]:
pd.DataFrame.from_dict(
    dict(G.nodes(data=True)), 
    orient='index'
).sort_values('degree', ascending=False)

## Betweenness Centrality

When interactions between non-adjacent agents in a network depend
on middle men (sitting on shortest paths between these agents), **power comes
to those in the middle**.  Betweenness centrality measures centrality
in terms of the number of shortest paths a node lies on.

<div class="alert alert-danger">
    
**Definition (Betweenness Centrality).**
In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$
is defined as
$$
c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}},
$$
where $n_{jk}$ denotes the **number** of shortest paths from
node $j$ to node $k$, and where $n_{jk}(i)$ denotes the
number of those shortest paths **passing through** node $i$.

The **normalized betweenness centrality** of node $i$, defined as
$$
C_i^B = \frac{c_i^B}{(n-1)(n-2)}
$$
takes values in the interval $[0, 1]$.
</div>
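
As a small worked example (not part of the original lecture), take the cycle $1 - 2 - 3 - 4 - 1$ on $n = 4$ nodes and $i = 2$.  The only pairs $(j, k)$ with a shortest path through node $2$ are $(1, 3)$ and $(3, 1)$.  In both cases there are $n_{jk} = 2$ shortest paths (one via node $2$, one via node $4$), of which $n_{jk}(2) = 1$ passes through node $2$.  Hence
$$
c_2^B = \tfrac12 + \tfrac12 = 1, \qquad C_2^B = \frac{1}{(4-1)(4-2)} = \frac16.
$$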

* Note that $(n-1)(n-2)$ is the largest number of shortest paths between pairs of distinct nodes that a given node could possibly sit on.

* How to compute $C_i^B$?

**BFS once more.**  This time as a Python function which returns a **dictionary** that contains, for each node $y$, a list of **immediate predecessors** of $y$
in a shortest path from $x$ to $y$.  We have already seen that this is another piece of information that BFS can determine
on the fly: when constructing a **spanning tree** while traversing a graph, we need to remember **one**
immediate predecessor for each newly discovered node.  Here we determine and remember **all** immediate
predecessors, requiring little if any extra work.

From these lists of predecessors, one can then recursively reconstruct **all shortest paths** from $x$ to $y$.
We still need to keep track of the shortest path lengths in order to decide which neighbors of $y$
actually are predecessors of $y$:

In [None]:
def predecessors(G, x):
    
    # 1. init: set up the two dictionaries and queue
    dists = { y: None for y in G }
    preds = { y: [] for y in G }
    Q = Queue()
    dists[x] = 0
    Q.put(x)
    
    # 2. loop
    while not Q.empty():
        y = Q.get()
        for z in G.neighbors(y):
            if dists[z] is None:
                dists[z] = dists[y] + 1
                preds[z].append(y)
                Q.put(z)
            elif dists[z] > dists[y]:
                preds[z].append(y)
    
    # 3. stop here
    return preds

In [None]:
p = predecessors(G, 'Medici')
p
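
A node can have more than one predecessor, which is easiest to see on a cycle.  The following sketch is not from the lecture: it works on a plain adjacency dictionary `adj` (a hypothetical stand-in for a `networkx` graph), uses `collections.deque` in place of `Queue`, and calls its helper `bfs_predecessors`.  It tests `dists[z] == dists[y] + 1` where the function above tests `dists[z] > dists[y]`.  In a BFS on an unweighted graph the two tests agree, since the distances of neighbors differ by at most $1$.

```python
from collections import deque

def bfs_predecessors(adj, x):
    """For each node, list its immediate predecessors on shortest paths from x."""
    dists = {y: None for y in adj}
    preds = {y: [] for y in adj}
    dists[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dists[z] is None:            # z is discovered for the first time
                dists[z] = dists[y] + 1
                preds[z].append(y)
                queue.append(z)
            elif dists[z] == dists[y] + 1:  # another shortest route into z
                preds[z].append(y)
    return preds

# hypothetical toy graph: the 4-cycle 1 - 2 - 3 - 4 - 1
adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
preds = bfs_predecessors(adj, 1)
print(preds)  # {1: [], 2: [1], 3: [2, 4], 4: [1]}
```

Node `3` sits opposite the start node `1` and can be reached equally fast via `2` or via `4`, so both are recorded as its predecessors.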

In [None]:
nx.draw(G, **opts)

Using the **predecessor lists** with respect to $x$, the **shortest paths** from $x$ to $y$ can be enumerated recursively:
* if $y = x$: the shortest path from $x$ to itself is the empty path starting and ending at $x$.
* else, if $y \neq x$ then each shortest path from $x$ to $y$ travels through
  exactly one of $y$'s predecessors ... and ends in $y$.

So, in formulas, $S_x(x) = \{(x)\}$ and
$$
S_x(y) = \{ p + (y) : p \in S_x(z),\, z \in \mathrm{pre}_x(y)\}
$$

In [None]:
def shortest_paths(x, y, pre):
    if x == y:
        return [[x]]
    paths = []
    for z in pre[y]:
        for path in shortest_paths(x, z, pre):
            paths.append(path + [y])
    return paths

In [None]:
def spaths(x, y, pre):
    if x == y:
        return [[x]]
    # the loop over z must come first, so that z is bound when spaths(x, z, pre) is called
    return [ p + [y] for z in pre[y] for p in spaths(x, z, pre) ]

In [None]:
shortest_paths('Medici', 'Bischeri', p)
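
The recursion can be checked by hand on a tiny input.  In the sketch below, `shortest_paths` is copied from the cell above, while the predecessor dictionary `pre` is a hypothetical example, written out by hand for the 4-cycle $1 - 2 - 3 - 4 - 1$ with start node $1$.

```python
def shortest_paths(x, y, pre):
    """All shortest paths from x to y, rebuilt from the predecessor lists."""
    if x == y:
        return [[x]]
    paths = []
    for z in pre[y]:
        for path in shortest_paths(x, z, pre):
            paths.append(path + [y])
    return paths

# hypothetical predecessor lists for the 4-cycle 1 - 2 - 3 - 4 - 1, start node 1
pre = {1: [], 2: [1], 3: [2, 4], 4: [1]}

paths = shortest_paths(1, 3, pre)
print(paths)  # [[1, 2, 3], [1, 4, 3]]
```

Node `3` has two predecessors, and each contributes one shortest path, in line with the formula for $S_x(y)$.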

* Now compute betweenness:

In [None]:
betweenness = { x : 0.0 for x in G }

In [None]:
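for x in G:
    pre = predecessors(G, x)
    for y in G:
        paths = shortest_paths(x, y, pre)
        njk = len(paths)*(n-1)*(n-2)  # number of shortest paths, times the normalization factor
        for path in paths:            # named `path` so that the dictionary `p` from above is not overwritten
            for z in path[1:-1]:      # exclude endpoints
                betweenness[z] += 1/njk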

In [None]:
betweenness
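
To check the whole procedure against values that can be worked out by hand, here is a self-contained sketch on a toy graph.  It is not from the lecture: the adjacency dictionary `adj` and the names `bfs_predecessors` and `normalized_betweenness` are assumptions made for illustration, and `collections.deque` replaces `Queue`.  The graph is the 4-cycle $1 - 2 - 3 - 4 - 1$ with an extra node $5$ attached to node $1$.

```python
from collections import deque

def bfs_predecessors(adj, x):
    """Immediate predecessors of each node on shortest paths from x."""
    dists = {y: None for y in adj}
    preds = {y: [] for y in adj}
    dists[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dists[z] is None:
                dists[z] = dists[y] + 1
                preds[z].append(y)
                queue.append(z)
            elif dists[z] == dists[y] + 1:
                preds[z].append(y)
    return preds

def shortest_paths(x, y, pre):
    """All shortest paths from x to y, rebuilt from the predecessor lists."""
    if x == y:
        return [[x]]
    return [path + [y] for z in pre[y] for path in shortest_paths(x, z, pre)]

def normalized_betweenness(adj):
    """Normalized betweenness by explicit path enumeration (small graphs, n >= 3)."""
    n = len(adj)
    scale = (n - 1) * (n - 2)
    result = {v: 0.0 for v in adj}
    for x in adj:
        pre = bfs_predecessors(adj, x)
        for y in adj:
            paths = shortest_paths(x, y, pre)
            for path in paths:
                for z in path[1:-1]:  # inner nodes only, endpoints excluded
                    result[z] += 1 / (len(paths) * scale)
    return result

# hypothetical toy graph: 4-cycle 1 - 2 - 3 - 4 - 1 with a pendant node 5 at node 1
adj = {1: [2, 4, 5], 2: [1, 3], 3: [2, 4], 4: [1, 3], 5: [1]}
bc = normalized_betweenness(adj)
print(bc)
```

By hand, node $1$ lies on all shortest paths between $5$ and each of $2, 3, 4$ (6 ordered pairs) and on half of the shortest paths between $2$ and $4$, so $c_1^B = 7$ and $C_1^B = 7/12$.  Enumerating all shortest paths is only practical for small graphs, as their number can grow quickly.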

* Compare the results to the `networkx` version of betweenness:

In [None]:
nx.betweenness_centrality(G)

In [None]:
nx.betweenness_centrality(G) == betweenness
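
Comparing two dictionaries of floating point numbers with `==` can yield `False` even when the values agree up to rounding, for instance when the same sums are accumulated in a different order.  A comparison with a tolerance is more robust.  The helper `dicts_close` below is a hypothetical name, not part of `networkx`, and the values are made up for illustration.

```python
import math

def dicts_close(d1, d2, rel_tol=1e-9, abs_tol=1e-12):
    """True if both dictionaries have the same keys and pairwise close values."""
    return d1.keys() == d2.keys() and all(
        math.isclose(d1[k], d2[k], rel_tol=rel_tol, abs_tol=abs_tol) for k in d1
    )

# illustrative values only: the same quantity obtained in two different ways
a = {"u": 0.1 + 0.2, "v": 0.0}
b = {"u": 0.3, "v": 0.0}

print(a == b)             # False: 0.1 + 0.2 is not exactly 0.3 in floating point
print(dicts_close(a, b))  # True
```

With this helper, the check above could be written as `dicts_close(nx.betweenness_centrality(G), betweenness)`.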

In [None]:
nx.draw(G, **opts)

* Finally, let's add the normalized betweenness centralities as attributes to the
nodes of the graph, and display the resulting table.

In [None]:
nx.set_node_attributes(G, betweenness, '$C_i^B$')

In [None]:
pd.DataFrame.from_dict(
    dict(G.nodes(data=True)), 
    orient='index'
).sort_values('degree', ascending=False)

##  Summary

There are many different ways to be important.  As a node in a network, you are important if
* you have **many friends** (degree centrality)
* you have **important friends** (eigenvector centrality)
* you are **close** to many (closeness centrality)
* many interactions **pass through** you (betweenness centrality).

##  Code Corner

### `networkx`

* `read_yaml`: [[doc]](https://networkx.github.io/documentation/stable/reference/readwrite/generated/networkx.readwrite.nx_yaml.read_yaml.html#networkx.readwrite.nx_yaml.read_yaml)


* `closeness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html#networkx.algorithms.centrality.closeness_centrality)
   
    
* `betweenness_centrality`: [[doc]](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#networkx.algorithms.centrality.betweenness_centrality)

## Exercises

1. Recall that $C_i^C$ is the normalized closeness centrality of node $i$.  Why
   is $0 \leq C_i^C \leq 1$?  When is $C_i^C = 1$?  Is $C_i^C$ ever $0$?

2. Recall that $C_i^B$ is the normalized betweenness centrality of node $i$.
   Why is $0 \leq C_i^B \leq 1$?  When is $C_i^B = 1$?  Is $C_i^B$ ever $0$?
   
3. Determine the closeness centrality and the betweenness centrality of the nodes in some
   random trees.  What do you observe?
   
4. Compute the closeness centrality and the betweenness centrality of the nodes of the Petersen graph.
   What do you observe?