# Betweenness Centrality

**Assumption:** important nodes connect other nodes.

Recall: the distance between two nodes is the length of the shortest path between them


Example: The distance between nodes 34 and 2 is 2:
Path 1: 34-31-2
Path 2: 34-14-2
Path 3: 34-20-2

Nodes 31, 14, and 20 each lie on a shortest path between nodes 34 and 2.
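The example can be checked directly. The sketch below assumes the example refers to Zachary's karate club graph with nodes relabeled 1 to 34, the same graph the code cells in these notes use; the variable names are illustrative.

```python
import networkx as nx

# Assumption: the example graph is the karate club graph, relabeled 1..34.
G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)

# Distance = number of edges on a shortest path (edge weights are ignored).
distance = nx.shortest_path_length(G, source=34, target=2)

# Enumerate every shortest path and collect the node in the middle of each.
paths = sorted(nx.all_shortest_paths(G, source=34, target=2))
intermediates = sorted(path[1] for path in paths)

print(distance)
for path in paths:
    print("-".join(str(node) for node in path))
```

Each path has exactly one intermediate node here because the distance is 2.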

$$C_{btw}(v)=\sum_{s,t \in N}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$$

* $\sigma_{s,t}$ is the number of shortest paths between nodes $s$ and $t$
* $\sigma_{s,t}(v)$ is the number of those shortest paths that pass through node $v$
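The definition can be turned into code almost literally. The following is a brute-force sketch for undirected graphs, not the algorithm NetworkX uses internally; the function name `betweenness_from_definition` is hypothetical, and pairs with no connecting path are simply skipped.

```python
from itertools import combinations

import networkx as nx


def betweenness_from_definition(G, v, endpoints=False):
    """Sum sigma_st(v) / sigma_st over unordered pairs {s, t} of an undirected graph."""
    total = 0.0
    for s, t in combinations(G.nodes, 2):
        if not endpoints and v in (s, t):
            continue  # skip pairs in which v is itself an endpoint
        if not nx.has_path(G, s, t):
            continue  # no shortest path exists, so the term is left out
        paths = list(nx.all_shortest_paths(G, s, t))
        sigma_st = len(paths)                          # all shortest s-t paths
        sigma_st_v = sum(1 for p in paths if v in p)   # those containing v
        total += sigma_st_v / sigma_st
    return total


P = nx.path_graph(3)  # 0 - 1 - 2
middle = betweenness_from_definition(P, 1)
reference = nx.betweenness_centrality(P, normalized=False, endpoints=False)[1]
print(middle, reference)
```

Enumerating all shortest paths is far slower than what a library does, so this is only useful for checking small examples by hand.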

![](images/3-2.2.png)

**Endpoints:** we can either include or exclude the pairs in which node $v$ is itself an endpoint ($s$ or $t$) in the computation of $C_{btw}(v)$.

* If we exclude node $v$ (in this case node B) as an endpoint:

![](images/3-2.3.png)

$C_{btw}(B) = \frac{\sigma_{A,D}(B)}{\sigma_{A,D}} + \frac{\sigma_{A,C}(B)}{\sigma_{A,C}} + \frac{\sigma_{C,D}(B)}{\sigma_{C,D}} = \frac{1}{1} + \frac{1}{1} + \frac{0}{1} =2 $
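The three terms of this sum can be reproduced pair by pair. The figure's edge list is not given in the text, so the graph below is an assumption: the edges A-B, B-C, B-D, and C-D were chosen only because they are consistent with the values in the sum.

```python
from itertools import combinations

import networkx as nx

# Assumed edge list, picked to match the worked sum (not taken from the figure).
G = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")])

v = "B"
terms = {}
for s, t in combinations(sorted(G.nodes), 2):
    if v in (s, t):
        continue  # endpoints excluded: skip pairs that contain B
    paths = list(nx.all_shortest_paths(G, s, t))
    # Store (number of shortest paths through B, number of shortest paths).
    terms[(s, t)] = (sum(v in p for p in paths), len(paths))

for (s, t), (through_v, total) in terms.items():
    print(f"sigma_{s},{t}({v}) / sigma_{s},{t} = {through_v}/{total}")

c_btw_B = sum(through_v / total for through_v, total in terms.values())
print(c_btw_B)
```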

![](images/3-2.4.png)

There is 1 shortest path between A and D, and that path passes through B.

![](images/3-2.5.png)

![](images/3-2.6.png)

* If we include node $v$ (in this case node B) as an endpoint:

![](images/3-2.7.png)

![](images/3-2.8.png)
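In NetworkX the two conventions correspond to the `endpoints` parameter of `nx.betweenness_centrality`. The sketch below compares them on an assumed four-node graph (edges A-B, B-C, B-D, C-D), which may differ from the graph in the figures.

```python
import networkx as nx

# Assumed edge list; the figures' graph is not spelled out in the text.
G = nx.Graph([("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")])

# normalized=False keeps the raw sums so they can be compared with hand counts.
excluded = nx.betweenness_centrality(G, normalized=False, endpoints=False)["B"]
included = nx.betweenness_centrality(G, normalized=False, endpoints=True)["B"]

# With endpoints included, the pairs (A, B), (B, C), and (B, D) each add 1.
print(excluded, included)
```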


## Disconnected Nodes

* **Assumption:** important nodes connect other nodes

$C_{btw}(v) = \sum_{s,t\in N}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$

What if not all nodes can reach each other?

![](images/3-2.9.png)

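Node D cannot be reached by any other node in the above example, hence $\sigma_{A,D} = 0$, which makes the term $\frac{\sigma_{A,D}(v)}{\sigma_{A,D}}$, and with it the definition above, undefined. When computing betweenness centrality, we therefore only consider pairs of nodes $s, t$ that have at least one path between them.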
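One common way to handle unreachable pairs, and the behavior NetworkX shows in practice, is to leave them out of the sum. The sketch below illustrates this on a hypothetical directed graph in which D has no incoming edges; the edge list and the helper name `betweenness_reachable_pairs` are assumptions, not taken from the figure.

```python
from itertools import permutations

import networkx as nx

# Hypothetical directed graph: D has no incoming edges, so no node can reach it.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "B")])


def betweenness_reachable_pairs(G, v):
    """Sum sigma_st(v) / sigma_st over ordered pairs (s, t) that have a path."""
    total = 0.0
    for s, t in permutations(G.nodes, 2):  # ordered pairs in a directed graph
        if v in (s, t) or not nx.has_path(G, s, t):
            continue  # skip endpoint pairs and pairs with sigma_st = 0
        paths = list(nx.all_shortest_paths(G, s, t))
        total += sum(v in p for p in paths) / len(paths)
    return total


can_reach_D = [s for s in G if s != "D" and nx.has_path(G, s, "D")]
score_B = betweenness_reachable_pairs(G, "B")
reference_B = nx.betweenness_centrality(G, normalized=False, endpoints=False)["B"]
print(can_reach_D, score_B, reference_B)
```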

**Example**: What is the betweenness centrality of node B, without including it as an endpoint?


![](images/3-2.10.png)


![](images/3-2.11.png)

### Normalization

* **Assumption:** important nodes connect other nodes.


$C_{btw}(v) = \sum_{s,t\in N}\frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$

* **Normalization**: betweenness centrality values will be larger in graphs with many nodes. To control for this, we divide centrality values by the number of pairs of nodes in the graph (excluding $v$):

* $\frac{1}{2}(|N|-1)(|N|-2)$ in undirected graphs

* $(|N|-1)(|N|-2)$ in directed graphs
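As a quick check of the undirected formula, the sketch below divides the raw scores by $\frac{1}{2}(|N|-1)(|N|-2)$ and compares the result with what NetworkX returns for `normalized=True`. It uses the karate club graph from these notes; the variable names are illustrative.

```python
import networkx as nx

G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)

n = G.number_of_nodes()
# Number of unordered node pairs that exclude v, for an undirected graph.
pairs_excluding_v = (n - 1) * (n - 2) // 2

raw = nx.betweenness_centrality(G, normalized=False, endpoints=False)
normalized = nx.betweenness_centrality(G, normalized=True, endpoints=False)

# Largest difference between the library's normalized value and raw / pairs.
max_gap = max(abs(normalized[v] - raw[v] / pairs_excluding_v) for v in G)
print(n, pairs_excluding_v, max_gap)
```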


![](images/3-2.12.png)



In [6]:
import networkx as nx

G=nx.karate_club_graph()
G=nx.convert_node_labels_to_integers(G,first_label=1)
btwnCent=nx.betweenness_centrality(G, normalized=True, endpoints=False)

import operator

sorted(btwnCent.items(),key=operator.itemgetter(1),reverse=True)[0:5]

[(1, 0.43763528138528146),
 (34, 0.30407497594997596),
 (33, 0.145247113997114),
 (3, 0.14365680615680618),
 (32, 0.13827561327561325)]

### Betweenness Centrality is computationally complex

Computing the betweenness centrality of all nodes can be very computationally expensive.

Depending on the algorithm, this computation can take up to $O(|N|^3)$ time.

![](images/3-2.13.png)

**Approximation:** rather than computing betweenness centrality based on all pairs of nodes $s, t$, we can approximate it based on a sample of nodes.

In [8]:
btwncent_approx = nx.betweenness_centrality(G, normalized=True, endpoints = False, k=10)
# k is the number of sampled nodes used for the approximation

sorted(btwncent_approx.items(),key=operator.itemgetter(1),reverse=True)[0:5]

[(1, 0.43444444444444436),
 (34, 0.3526871392496393),
 (32, 0.24312545093795096),
 (33, 0.19847132034632037),
 (3, 0.1410150613275613)]
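Because the `k` nodes are sampled at random, repeated runs can return different values and even a different ranking. A minimal sketch, assuming a NetworkX version whose `betweenness_centrality` accepts the `seed` argument: fixing the seed makes the approximation repeatable, and the exact scores are computed alongside for comparison.

```python
import networkx as nx

G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)

# Same seed, same sample of k nodes, hence identical results.
run_a = nx.betweenness_centrality(G, k=10, normalized=True, endpoints=False, seed=0)
run_b = nx.betweenness_centrality(G, k=10, normalized=True, endpoints=False, seed=0)

# Exact computation over all nodes, for comparison with the approximation.
exact = nx.betweenness_centrality(G, normalized=True, endpoints=False)

top5_approx = sorted(run_a, key=run_a.get, reverse=True)[:5]
top5_exact = sorted(exact, key=exact.get, reverse=True)[:5]
print(top5_approx)
print(top5_exact)
```

The seed value 0 is arbitrary. How close the two rankings are depends on which nodes happen to be sampled.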

We can also restrict the computation to a **subset** of source nodes and a subset of target nodes.

![](images/3-2.14.png)

In [10]:
btwnCent_subset = nx.betweenness_centrality_subset(G, [34,33,21,30,16,27,15,23,10],[1,4,13,11,6,12,17,7], normalized=True)

sorted(btwnCent_subset.items(),key=operator.itemgetter(1),reverse=True)[0:5]

[(1, 0.04899515993265994),
 (34, 0.028807419432419434),
 (3, 0.018368205868205867),
 (33, 0.01664712602212602),
 (9, 0.014519450456950456)]
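To see how the subset version relates to the full computation, the sketch below passes every node as both source and target. Under that choice the unnormalized subset scores should agree with `nx.betweenness_centrality`, up to floating-point error; the variable names are illustrative.

```python
import networkx as nx

G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)
all_nodes = list(G.nodes)

full = nx.betweenness_centrality(G, normalized=False)

# Using all nodes as sources and as targets covers every pair (s, t).
subset_all = nx.betweenness_centrality_subset(
    G, sources=all_nodes, targets=all_nodes, normalized=False
)

max_gap = max(abs(full[v] - subset_all[v]) for v in G)
print(max_gap)
```

With smaller source and target lists, only shortest paths that start at a source and end at a target contribute, which is why the subset scores in these notes are much smaller than the full ones.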

We can also use **edges**.

We can use betweenness centrality to find important edges instead of nodes:

$C_{btw}(e) = \sum_{s,t\in N}\frac{\sigma_{s,t}(e)}{\sigma_{s,t}}$, where 

$\sigma_{s,t}$ = the number of shortest paths between nodes s and t.

$\sigma_{s,t}(e)$ = the number of shortest paths between nodes s and t that pass through edge e.
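The edge version of the definition can be checked by brute force in the same way as the node version. This is a sketch for small undirected graphs only; the helper name `edge_betweenness_from_definition` is hypothetical, and a four-node path graph stands in as the example.

```python
from itertools import combinations

import networkx as nx


def edge_betweenness_from_definition(G, u, w):
    """Sum sigma_st(e) / sigma_st over unordered pairs {s, t}, for e = {u, w}."""
    total = 0.0
    for s, t in combinations(G.nodes, 2):
        if not nx.has_path(G, s, t):
            continue  # unreachable pairs contribute nothing
        paths = list(nx.all_shortest_paths(G, s, t))
        # A path uses the edge if u and w appear next to each other on it.
        uses_edge = sum(
            any({a, b} == {u, w} for a, b in zip(p, p[1:])) for p in paths
        )
        total += uses_edge / len(paths)
    return total


P = nx.path_graph(4)  # 0 - 1 - 2 - 3
middle_edge = edge_betweenness_from_definition(P, 1, 2)
end_edge = edge_betweenness_from_definition(P, 0, 1)

reference = nx.edge_betweenness_centrality(P, normalized=False)
# Undirected edges are keyed by one orientation, so try both.
reference_middle = reference.get((1, 2), reference.get((2, 1)))
print(middle_edge, end_edge, reference_middle)
```

The middle edge scores highest because every pair with one node on each side of it must cross it.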

In [11]:
btwnCent_edge = nx.edge_betweenness_centrality(G, normalized=True)
sorted(btwnCent_edge.items(),key=operator.itemgetter(1),reverse=True)[0:5]

[((1, 32), 0.1272599949070537),
 ((1, 7), 0.07813428401663695),
 ((1, 6), 0.07813428401663694),
 ((1, 3), 0.0777876807288572),
 ((1, 9), 0.07423959482783014)]

In [12]:
btwnCent_edge_subset = nx.edge_betweenness_centrality_subset(G, [34,33,21,30,16,27,15,23,10],[1,4,13,11,6,12,17,7], normalized=True)

sorted(btwnCent_edge_subset.items(),key=operator.itemgetter(1),reverse=True)[0:5]

[((1, 9), 0.01366536513595337),
 ((1, 32), 0.01366536513595337),
 ((14, 34), 0.012207509266332794),
 ((1, 3), 0.01211343123107829),
 ((1, 6), 0.012032085561497326)]

![](images/3-2.15.png)