# Directed and Weighted Centrality

Several measures can be used to quantify centrality, or node importance, in directed and/or weighted networks.

In [None]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option("display.precision", 3)

In this notebook we will use data from the widely-studied Enron email corpus, which is described [here](https://en.wikipedia.org/wiki/Enron_Corpus).

First we load the data and create a network representing the Enron internal email network. Each line contains a sender email address, a recipient email address, and the number of emails sent from the sender to the recipient.

It makes sense to represent this data as a directed network, with edge weights indicating the number of emails from sender to recipient.

In [None]:
# read all lines; the context manager closes the file automatically
with open("enron.edgelist", "r") as fin:
    lines = fin.readlines()

In [None]:
g = nx.DiGraph()
for line in lines:
    parts = line.strip().split("\t")
    num_emails = int(parts[2])
    g.add_edge( parts[0], parts[1], weight=num_emails )

In [None]:
g.number_of_nodes(), g.number_of_edges()
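
As an alternative sketch, the same parsing could be delegated to NetworkX's `parse_edgelist` function. The sample lines and addresses below are hypothetical stand-ins for the contents of `enron.edgelist`, assuming the same tab-separated layout of sender, recipient and email count.

```python
import networkx as nx

# hypothetical sample lines: sender <TAB> recipient <TAB> number of emails
sample_lines = [
    "alice@example.com\tbob@example.com\t3",
    "bob@example.com\talice@example.com\t1",
    "alice@example.com\tcarol@example.com\t7",
]

# parse each line into a directed edge, storing the third column
# as an integer edge attribute named 'weight'
g_sample = nx.parse_edgelist(
    sample_lines,
    delimiter="\t",
    create_using=nx.DiGraph(),
    nodetype=str,
    data=[("weight", int)],
)

print(g_sample.number_of_nodes(), g_sample.number_of_edges())
print(g_sample["alice@example.com"]["carol@example.com"]["weight"])
```

Because the graph is directed, the edge from `alice` to `bob` and the edge from `bob` to `alice` are stored separately, each with its own weight.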

### Unweighted In-Degrees and Out-Degrees

First, we could look at the simple count of incoming edges - i.e. the **unweighted in-degree**. In this context, the number indicates the number of unique individuals from whom a person has received emails.

In [None]:
# get a dictionary of in-degree scores for all nodes
in_degrees = dict(g.in_degree())
in_degrees

Examine statistics and distribution of unweighted in-degree values:

In [None]:
indeg = pd.Series(in_degrees)
print('In-degree range: [%d, %d]' % (indeg.min(), indeg.max() ) )
print('Mean in-degree: %.2f' % indeg.mean() )
print('Median in-degree: %d' % indeg.median() )

In [None]:
ax = indeg.plot.hist(figsize=(12,6), fontsize=14, legend=None, color="darkred", bins=20, zorder=3)
ax.yaxis.grid()
ax.set_ylabel("Number of Nodes", fontsize=14)
ax.set_xlabel("Unweighted In-Degree", fontsize=14);

Get the top 10 nodes ranked by in-degree. Who received emails from the largest number of unique senders in the company?

In [None]:
indeg.sort_values(ascending=False).head(10)

Next, look at the simple count of outgoing edges - i.e. the **unweighted out-degree**. This corresponds to the number of unique individuals to whom each person has sent emails.

In [None]:
# get a dictionary of out-degree scores for all nodes
out_degrees = dict(g.out_degree())

We see that the lowest out-degree is 0 - i.e. some nodes represent people who sent no emails during the time period covered by this dataset.

In [None]:
outdeg = pd.Series(out_degrees)
print('Out-degree range: [%d, %d]' % (outdeg.min(), outdeg.max() ) )
print('Mean out-degree: %.2f' % outdeg.mean() )
print('Median out-degree: %d' % outdeg.median() )

In [None]:
ax = outdeg.plot.hist(figsize=(12,6), fontsize=14, legend=None, color="darkgreen", bins=20, zorder=3)
ax.yaxis.grid()
ax.set_ylabel("Number of Nodes", fontsize=14)
ax.set_xlabel("Unweighted Out-Degree", fontsize=14);

Get the top 10 nodes ranked by out-degree. Who sent emails to the largest number of unique addresses?

In [None]:
outdeg.sort_values(ascending=False).head(10)
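
To make the difference between the two measures concrete, here is a minimal sketch on a hypothetical three-node network (the node names are made up for illustration). A node can rank highly on one measure and have a score of zero on the other.

```python
import networkx as nx

# toy directed network (hypothetical): a emails b and c, b emails c
toy = nx.DiGraph()
toy.add_edges_from([("a", "b"), ("a", "c"), ("b", "c")])

# in-degree counts unique senders, out-degree counts unique recipients
toy_in = dict(toy.in_degree())
toy_out = dict(toy.out_degree())

print("in-degree: ", toy_in)
print("out-degree:", toy_out)
```

Here `a` sends to two people but receives from nobody, while `c` receives from two people but sends to nobody.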

### Weighted In-Degrees and Out-Degrees

So far we have not considered the number of emails sent between employees, just the number of edges. We can also look at weighted degrees, which are based on email counts in this case.

Calculate the weighted in-degree - i.e. the total number of e-mails received by each employee:

In [None]:
# get a dictionary of in-degree scores for all nodes, using values from the 'weight' attribute
win_degrees = dict(g.in_degree(weight="weight"))

We can see from the statistics of these scores that the range of values is much larger when we take into account weights:

In [None]:
windeg = pd.Series(win_degrees)
print('Weighted in-degree range: [%d, %d]' % (windeg.min(), windeg.max() ) )
print('Mean weighted in-degree: %.2f' % windeg.mean() )
print('Median weighted in-degree: %d' % windeg.median() )

In [None]:
ax = windeg.plot.hist(figsize=(12,6), fontsize=14, legend=None, color="darkred", bins=20, zorder=3)
ax.yaxis.grid()
ax.set_ylabel("Number of Nodes", fontsize=14)
ax.set_xlabel("Weighted In-Degree", fontsize=14);

Which employees received the most emails in the company during this time period?

In [None]:
windeg.sort_values(ascending=False).head(10)
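
As a small worked check on a hypothetical toy network, the weighted in-degree of a node is simply the sum of the `weight` values on its incoming edges, whereas the unweighted in-degree only counts those edges.

```python
import networkx as nx

# toy weighted network (hypothetical): (sender, recipient, number of emails)
toy_w = nx.DiGraph()
toy_w.add_weighted_edges_from([("a", "c", 3), ("b", "c", 4), ("a", "b", 1)])

# unweighted: number of incoming edges; weighted: sum of incoming edge weights
toy_indeg = dict(toy_w.in_degree())
toy_windeg = dict(toy_w.in_degree(weight="weight"))

# manual check for node c: add up the weights on its incoming edges
manual_c = sum(d["weight"] for _, _, d in toy_w.in_edges("c", data=True))

print(toy_indeg)
print(toy_windeg)
print(manual_c)
```

Node `c` has two incoming edges, but it received 3 + 4 = 7 emails in total.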

An analogous measure, **weighted out-degree**, is based on the outgoing edges of a node, with each edge weighted by its weight value. Here this corresponds to the total number of emails sent by each person.

In [None]:
# get a dictionary of weighted out-degree scores for all nodes
wout_degrees = dict(g.out_degree(weight="weight"))

In [None]:
woutdeg = pd.Series(wout_degrees)
print('Weighted out-degree range: [%d, %d]' % (woutdeg.min(), woutdeg.max() ) )
print('Mean weighted out-degree: %.2f' % woutdeg.mean() )
print('Median weighted out-degree: %d' % woutdeg.median() )

In [None]:
ax = woutdeg.plot.hist(figsize=(12,6), fontsize=14, legend=None, color="darkgreen", bins=20, zorder=3)
ax.yaxis.grid()
ax.set_ylabel("Number of Nodes", fontsize=14)
ax.set_xlabel("Weighted Out-Degree", fontsize=14);

From the distribution plot and the node rankings, we see that one user accounts for 38% of all emails sent:

In [None]:
woutdeg.sort_values(ascending=False).head(10)

In [None]:
# calculate percentage from this user
100.0 * (woutdeg["pete.davis@enron.com"]/woutdeg.sum())

If we remove this individual from the series, we can get a clearer view of the distribution of weighted out-degree scores for the rest of the employees.

In [None]:
woutdeg2 = woutdeg.drop("pete.davis@enron.com")
ax = woutdeg2.plot.hist(figsize=(12,6), fontsize=14, legend=None, color="darkgreen", bins=20, zorder=3)
ax.yaxis.grid()
ax.set_ylabel("Number of Nodes", fontsize=14)
ax.set_xlabel("Weighted Out-Degree", fontsize=14);
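
The same steps can be sketched without hard-coding an address, by letting pandas identify the top sender. The counts below are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# hypothetical weighted out-degree scores (emails sent per person)
toy_sent = pd.Series({"a": 80, "b": 10, "c": 6, "d": 4})

# percentage of all sent emails attributable to each person
shares = 100.0 * toy_sent / toy_sent.sum()

# find the dominant sender and drop them to inspect the remaining scores
top_sender = toy_sent.idxmax()
rest = toy_sent.drop(top_sender)

print(shares)
print(top_sender, list(rest.index))
```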

### Weighted Centrality Measures

NetworkX contains implementations of various centrality measures which can take into account edge weights.

For example, we can compute **weighted eigenvector centrality**, which takes into account the strength of connections on edges, by specifying the attribute to use for weights:

In [None]:
w_eigs = dict(nx.eigenvector_centrality(g, weight="weight"))
# convert dictionary to a Series
weig = pd.Series(w_eigs)
# get top 20
weig.sort_values(ascending=False).head(20)
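
To see the effect of the weights, the sketch below compares unweighted and weighted eigenvector centrality on a hypothetical three-node network. Note that for directed graphs NetworkX computes this measure from in-edges, so a node scores highly when it receives strong connections from other high-scoring nodes. The `max_iter=1000` setting is an illustrative choice, not a value taken from this notebook.

```python
import networkx as nx

# toy network (hypothetical): every pair is connected in both directions,
# but the edge from b to c is much heavier than the rest
toy_e = nx.DiGraph()
toy_e.add_weighted_edges_from([
    ("a", "b", 1), ("b", "a", 1),
    ("a", "c", 1), ("c", "a", 1),
    ("b", "c", 5), ("c", "b", 1),
])

# without weights the three nodes are structurally identical
eig_plain = nx.eigenvector_centrality(toy_e, max_iter=1000)
# with weights, c benefits from the heavy incoming edge from b
eig_weighted = nx.eigenvector_centrality(toy_e, weight="weight", max_iter=1000)

print(eig_plain)
print(eig_weighted)
```

In the unweighted case all three nodes receive the same score, while in the weighted case `c` scores roughly twice as high as `a` and `b`.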

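Similarly, we can calculate **weighted betweenness centrality**, where shortest paths are computed using edge weights as distances. Note that this means an edge with a higher email count is treated as a longer, more costly step.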

In [None]:
w_bets = dict(nx.betweenness_centrality(g, weight="weight"))
# convert dictionary to a Series
wbet = pd.Series(w_bets)
# get top 20
wbet.sort_values(ascending=False).head(20)
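
Since the weights here are email counts, one might prefer that frequent contact makes two people "closer" rather than further apart. A common workaround, sketched below on a hypothetical toy network, is to store an inverted weight in a separate attribute and pass that attribute to `betweenness_centrality`. The attribute name `distance` and the choice of `1 / weight` are assumptions made for illustration, not part of the original analysis.

```python
import networkx as nx

# toy network (hypothetical): a and c exchange few emails directly,
# but both are in heavy contact via b
toy_b = nx.DiGraph()
toy_b.add_weighted_edges_from([("a", "b", 10), ("b", "c", 10), ("a", "c", 1)])

# add an inverse-weight attribute so that more emails means a shorter distance
for u, v, d in toy_b.edges(data=True):
    d["distance"] = 1.0 / d["weight"]

# raw counts as distances: the direct edge a -> c (cost 1) beats the route via b (cost 20)
bc_count = nx.betweenness_centrality(toy_b, weight="weight", normalized=False)
# inverted counts as distances: the route via b (cost 0.2) beats the direct edge (cost 1)
bc_dist = nx.betweenness_centrality(toy_b, weight="distance", normalized=False)

print(bc_count)
print(bc_dist)
```

With raw counts, `b` lies on no shortest path between other nodes. With inverted counts, `b` sits on the shortest path from `a` to `c`.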