In [77]:
import networkx as nx

# **Node Importance**
There are different ways of thinking about "importance":

* High degree -> number of connections
* High average proximity to other nodes
* High fraction of shortest paths that pass through the node

**Centrality Measures** identify the most important nodes in a network:

* Influential nodes in a social network.
* Nodes that disseminate information to many nodes or prevent epidemics
* Hubs in a transportation network
* Important pages on the Web
* Nodes that prevent the network from breaking up

## **Degree Centrality**
**Assumption**: Important nodes have many connections -> number of connections.

#### **Undirected Networks**
Use the node's degree, normalized by the maximum possible degree $|N|-1$:

$C_{deg}(v) = {{d_v}\over{|N|-1}}$

where $N$ is the set of nodes in the network and $d_v$ is the degree of node $v$

In [78]:
G = nx.karate_club_graph()
G = nx.convert_node_labels_to_integers(G, first_label=1)
degCent = nx.degree_centrality(G)

# Highest degree centrality node and value
max_node = max(degCent, key=degCent.get)
max_node, degCent[max_node]

(34, 0.5151515151515151)
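As a sanity check on the formula, the same values can be computed directly from node degrees. This is a minimal sketch; the names `manual_deg_cent`, `library_deg_cent` and `max_abs_diff` are introduced here only for illustration.

```python
import networkx as nx

# Same graph as above: Zachary's karate club, relabeled to start at 1
G = nx.convert_node_labels_to_integers(nx.karate_club_graph(), first_label=1)

n = G.number_of_nodes()

# C_deg(v) = d_v / (|N| - 1), computed directly from the degree view
manual_deg_cent = {v: d / (n - 1) for v, d in G.degree()}

# Compare against the library result node by node
library_deg_cent = nx.degree_centrality(G)
max_abs_diff = max(abs(manual_deg_cent[v] - library_deg_cent[v]) for v in G)

print(manual_deg_cent[34], max_abs_diff)
```

Node 34 has 17 neighbors among the other 33 nodes, which gives the 17/33 reported above.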

#### **Directed Networks** 

<img src='../assets/directed_graph.png' width=300px>

##### **In-degree centrality**
$C_{deg}(v) = {{d_{v}^{in}}\over{|N|-1}}$

where $N$ is the set of nodes in the network and $d_{v}^{in}$ is the in-degree of node $v$

In [79]:
D = nx.read_adjlist(
    '../assets/directed_graph.txt', 
    nodetype=str,
    create_using=nx.DiGraph()
)

# Highest in-degree centrality node and value
inDegCent = nx.in_degree_centrality(D)
max_node = max(inDegCent, key=inDegCent.get)
max_node, inDegCent[max_node]

('E', 0.21428571428571427)

##### **Out-degree centrality**

$C_{deg}(v) = {{d_{v}^{out}}\over{|N|-1}}$

where $N$ is the set of nodes in the network and $d_{v}^{out}$ is the out-degree of node $v$

In [80]:
# Highest out-degree centrality node and value
outDegCent = nx.out_degree_centrality(D)
max_node = max(outDegCent, key=outDegCent.get)
max_node, outDegCent[max_node]

('I', 0.2857142857142857)
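The in- and out-degree formulas can be checked by hand on a small graph. The sketch below uses a hypothetical five-node digraph `T` (an assumption for illustration, not the graph stored in `../assets/directed_graph.txt`) and compares the manual computation with the NetworkX functions used above.

```python
import networkx as nx

# Hypothetical toy digraph, chosen only to illustrate the formulas
T = nx.DiGraph([
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("A", "L"), ("B", "L"), ("L", "M"),
])

n = T.number_of_nodes()

# In-degree centrality: incoming edges divided by |N| - 1
manual_in = {v: d / (n - 1) for v, d in T.in_degree()}

# Out-degree centrality: outgoing edges divided by |N| - 1
manual_out = {v: d / (n - 1) for v, d in T.out_degree()}

library_in = nx.in_degree_centrality(T)
library_out = nx.out_degree_centrality(T)

print(manual_in["L"], manual_out["A"], manual_out["M"])
```

In this toy graph `L` has two incoming edges and `M` has no outgoing edges, so `L` scores 2/4 on in-degree while `M` scores 0 on out-degree.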

## **Closeness Centrality**
**Assumption**: Important nodes are close to other nodes.

#### **Undirected Networks**

${C_{close}(v)} = {{|N|-1}\over{\sum_{u \in N \setminus \{v\}}{d(u, v)}}}$

where $N$ is the set of nodes in the network and $d(u, v)$ is the length of the shortest path between $u$ and $v$

In [81]:
closeCent = nx.closeness_centrality(G)

# Highest closeness centrality node and value
max_node = max(closeCent, key=closeCent.get)
max_node, closeCent[max_node]

(1, 0.5689655172413793)

In [82]:
# Check against the formula: (|N| - 1) / sum of shortest-path lengths from max_node
(len(G.nodes()) - 1)/sum(nx.shortest_path_length(G, max_node).values())

0.5689655172413793

#### **Directed Networks**
How to measure the closeness centrality of a node when it cannot reach all other nodes?
<img src='../assets/directed_graph.png' width=300px>

##### 1 - **Consider only nodes that the node $L$ can reach**

${C_{close}(L)} = {{|R(L)|}\over{\sum_{u \in R(L)}{d(L, u)}}}$

where $R(L)$ is the set of nodes $L$ can reach

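$C_{close}(L) = {1\over1} = 1$, since $L$ can only reach $M$, at a shortest-path distance of 1. This gives $L$ the maximum possible score even though it reaches a single node, which motivates the normalization below.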

##### 2 - **Consider only nodes that the node $L$ can reach and normalize by the fraction of nodes $L$ can reach**

${C_{close}(L)} = [{{|R(L)|}\over{|N|-1}}]{{|R(L)|}\over{\sum_{u \in R(L)}{d(L, u)}}}$

where $R(L)$ is the set of nodes $L$ can reach

$C_{close}(L) = [{1\over14}]{1\over1} \approx 0.071$

In [89]:
closeCent = nx.closeness_centrality(D)

# Highest closeness centrality node and value
max_node = max(closeCent, key=closeCent.get)
max_node, closeCent[max_node]

('L', 0.32625482625482627)
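The value reported for `L` above differs from the hand-computed 0.071. A likely explanation is that, in recent NetworkX releases, `closeness_centrality` on a directed graph uses inward distances (paths leading into a node), whereas the two options above use outward distances from $L$. The sketch below illustrates both options on a hypothetical five-node digraph `T` (an assumption for illustration, not the graph in the figure); `outward_closeness` is a helper defined here, not a NetworkX function. It also shows that applying `closeness_centrality` to the reversed graph, with `wf_improved` switched off or on, should reproduce options 1 and 2 respectively.

```python
import networkx as nx

# Hypothetical toy digraph in which L can only reach M
T = nx.DiGraph([
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("A", "L"), ("B", "L"), ("L", "M"),
])


def outward_closeness(graph, v, normalize_by_reach):
    """Closeness of v based on the nodes v can reach (outward distances)."""
    # Shortest-path lengths from v to every reachable node (v itself is 0)
    dist = nx.single_source_shortest_path_length(graph, v)
    reach = len(dist) - 1          # |R(v)|, excluding v itself
    total = sum(dist.values())     # sum of d(v, u) over u in R(v)
    if total == 0:
        return 0.0                 # v reaches no other node
    score = reach / total          # option 1
    if normalize_by_reach:
        score *= reach / (len(graph) - 1)   # option 2: scale by |R(v)| / (|N| - 1)
    return score


option1_L = outward_closeness(T, "L", normalize_by_reach=False)
option2_L = outward_closeness(T, "L", normalize_by_reach=True)

# Library equivalents: reverse the graph so inward distances become outward ones
rev = T.reverse()
nx_option1_L = nx.closeness_centrality(rev, u="L", wf_improved=False)
nx_option2_L = nx.closeness_centrality(rev, u="L", wf_improved=True)

# Default call on the original graph: based on paths leading into L
inward_L = nx.closeness_centrality(T, u="L")

print(option1_L, option2_L, nx_option1_L, nx_option2_L, inward_L)
```

In the toy graph `L` reaches one node out of the four others, so option 1 gives 1 and option 2 gives 1/4, mirroring the 1 and 1/14 computed for the graph in the figure.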