## Tutorial 4. Network Modularity: Quantitative History

Created by Emanuel Flores-Bautista, 2018. All code contained in this notebook is licensed under the [Creative Commons Attribution 4.0 License](https://creativecommons.org/licenses/by/4.0/).

This tutorial can be accessed here: https://programminghistorian.org/lessons/exploring-and-analyzing-network-data-with-python

### The data set: the Quaker Society of Friends. 

> ##### Before there were Facebook friends, there was the Society of Friends, known as the Quakers. Founded in England in the mid-seventeenth century, the Quakers were Protestant Christians who dissented from the official Church of England and promoted broad religious toleration, preferring Christians’ supposed “inner light” and consciences to state-enforced orthodoxy. Quakers’ numbers grew rapidly in the mid- to late-seventeenth century and their members spread through the British Isles, Europe, and the New World colonies—especially Pennsylvania, founded by Quaker leader William Penn and the home of your four authors.

>##### Since scholars have long linked Quakers’ growth and endurance to the effectiveness of their networks, the data used in this tutorial is a list of names and relationships among the earliest seventeenth-century Quakers. This dataset is derived from the Oxford Dictionary of National Biography and from the ongoing work of the Six Degrees of Francis Bacon project, which is reconstructing the social networks of early modern Britain (1500-1700).

> ##### Each Quaker node also has a number of associated attributes including historical significance, gender, birth/death dates, and SDFB ID—a unique numerical identifier that will enable you to cross-reference nodes in this dataset with the original Six Degrees of Francis Bacon dataset, if desired. Here are the first few lines:

In [None]:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx 
from operator import itemgetter
import TCD19_utils as TCD
TCD.set_plotting_style_2()
import community #Python Louvain package

%matplotlib inline
%config InlineBackend.figure_format = 'svg'

In [None]:
x= pd.read_csv('../data/quakers_nodelist.csv')

In [None]:
x.head()

In [None]:
y= pd.read_csv('../data/quakers_edgelist.csv')

In [None]:
y.head()

In [None]:
net = nx.from_pandas_edgelist(y, source= 'Source', target = 'Target')

In [None]:
plt.figure(figsize =(9, 8.5))
# `edge_size` is not a NetworkX drawing keyword; `width` sets the edge line width.
nx.draw_circular(net, node_color="lightgreen", node_size=50, width=1, edge_color="lightgrey",
                 with_labels=True, font_color="black", font_size=8);

In [None]:
cc = nx.clustering(net)

In [None]:
cc_d= sorted(cc.items(), key= lambda cc: cc[1], reverse= True)[0:20]

In [None]:
# net.degree() yields (node, degree) pairs; keep only the degrees.
net_degree_distribution = [degree for _, degree in net.degree()]

In [None]:
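# sns.distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the replacement.
sns.histplot(net_degree_distribution, stat='density', kde=True, color='lightgreen')
plt.xlabel('Degree')
plt.ylabel('PDF')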

In [1]:
from TCD19_utils import net_stats

In [None]:
net_stats(net)

In [None]:
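# Largest connected component (connected_component_subgraphs was removed in NetworkX 2.4).
trn_lcc = net.subgraph(max(nx.connected_components(net), key=len)).copy()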

In [None]:
nx.average_shortest_path_length(trn_lcc)

### Hubs in the network

In [None]:
eigen_cen= nx.eigenvector_centrality(net)
eigen_cen= sorted(eigen_cen.items(), key= lambda cc: cc[1], reverse= True)[:10]
eigen_cen

We can see that the most central node in the network is George Fox. He may in fact be the most famous Quaker in history. Does he look familiar?

In [None]:
from IPython.display import Image

Image(url='http://www.abingtonmeeting.org/wp-content/uploads/2014/05/George_Fox.jpg')

In fact, all of these hubs have an interesting history. You can search for them online to find out more.
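
For intuition about why a well-connected node ranks first: eigenvector centrality gives a node a high score when its neighbors also have high scores. Below is a minimal sketch of that idea as power iteration on a toy graph. The graph `toy_adj` and the helper `power_iteration_centrality` are illustrative assumptions, not the NetworkX implementation and not the Quaker data.

```python
import math

def power_iteration_centrality(adj, max_iter=200, tol=1e-10):
    """Eigenvector centrality of an undirected graph given as {node: set_of_neighbors}."""
    nodes = list(adj)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Each node keeps its own score and adds its neighbors' scores, i.e. x <- (A + I) x.
        # A + I has the same eigenvectors as A, and the shift avoids oscillation.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical toy graph: a hub tied to three nodes, two of which also know each other.
toy_adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
}

toy_centrality = power_iteration_centrality(toy_adj)
for name, value in sorted(toy_centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(value, 3))
```

In this sketch the hub comes out on top, and `c` scores lowest because its only tie is to the hub.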

Let's look back at the data.

In [None]:
x.head()

Let's make our network richer and add some of our variables as attributes. First we'll make each column into a `pd.Series` object, and then turn it into a dictionary.

In [None]:
gender_dict= pd.Series(x['Gender'].values,index=x.Name).to_dict()
birth_dict= pd.Series(x['Birthdate'].values,index=x.Name).to_dict()
death_dict = pd.Series(x['Deathdate'].values,index=x.Name).to_dict()
id_dict = pd.Series(x['ID'].values,index=x.Name).to_dict()
his_sig= pd.Series(x['Historical Significance'].values,index=x.Name).to_dict()
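
To make the shape of these dictionaries concrete, here is a small sketch that builds the same kind of name-to-attribute mapping using only the standard library. The two rows and the names `toy_csv`, `gender_by_name`, and `id_by_name` are made up for illustration and are not taken from the Quaker data. Only a subset of the column names used above appears.

```python
import csv
import io

# Hypothetical two-row node list using a subset of the real column names.
toy_csv = io.StringIO("Name,Gender,ID\nPerson A,male,1\nPerson B,female,2\n")
rows = list(csv.DictReader(toy_csv))

# One dictionary per attribute, keyed by name, like gender_dict and id_dict above.
gender_by_name = {row["Name"]: row["Gender"] for row in rows}
id_by_name = {row["Name"]: int(row["ID"]) for row in rows}

print(gender_by_name)
print(id_by_name)
```

Keying every dictionary by the same `Name` column is what lets `nx.set_node_attributes` match each value to the right node.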

Let's add our attributes to the network. 

In [None]:
nx.set_node_attributes(net, name = 'gender',values= gender_dict)
nx.set_node_attributes(net, name = 'birth',values= birth_dict)
nx.set_node_attributes(net, name = 'death',values= death_dict)
nx.set_node_attributes(net, name = 'id',values= id_dict)
nx.set_node_attributes(net, name = 'his_sig',values= his_sig)

Now we can easily look up any node's attributes, for example its ID.

In [None]:
ID = nx.get_node_attributes(net, 'id')

In [None]:
ID['George Keith']

### Running the Louvain Clustering Algorithm 

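The beauty of clustering networks in NetworkX with the `community` module (the python-louvain package imported above) is that we can do it in one line of code. `best_partition` returns a dictionary that maps each node to an integer community label.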

In [None]:
communities = community.best_partition(net)

In [None]:
communities
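
The Louvain method chooses these labels by greedily maximizing modularity, a score that compares the number of edges inside each community with the number expected if edges were placed at random. As a hedged sketch of what is being optimized, the function below computes modularity for a given partition of a toy graph. The edge list, the partitions, and the helper name `modularity_score` are assumptions for illustration only.

```python
from collections import defaultdict

def modularity_score(edges, partition):
    """Modularity Q of an undirected, unweighted graph without self-loops.

    edges: list of (u, v) pairs; partition: {node: community_label}.
    Q = sum over communities of (internal_edges / m) - (degree_sum / (2 m)) ** 2
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]

two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_block = {node: 0 for node in two_triangles}

q_split = modularity_score(toy_edges, two_triangles)
q_single = modularity_score(toy_edges, one_block)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.357), while lumping everything into one community gives Q = 0, so the split is the better partition under this score.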

Let's add the cluster labels as a node attribute named `modularity`.

In [None]:
nx.set_node_attributes(net, values= communities, name='modularity')

In [None]:
eigenvector_dict = nx.eigenvector_centrality(net)
nx.set_node_attributes(net, values= eigenvector_dict, name= 'eigenvector')

In [None]:
sorted(eigenvector_dict.items(), key= lambda cc: cc[1], reverse= True)[:10]

In [None]:
# neighbors() returns an iterator in NetworkX 2.x; wrap it in list() to display the names.
list(net.neighbors('Alexander Parker'))

Let's extract module zero.

In [None]:
cluster_0 = [n for n in net.nodes() if net.nodes[n]['modularity'] == 0]

Let's collect the eigenvector centrality of every node in module 0.

In [None]:
class0_eigenvector = {n: net.nodes[n]['eigenvector'] for n in cluster_0}

In [None]:
class0_sorted_by_eigenvector = sorted(class0_eigenvector.items(), key=itemgetter(1), reverse=True)

In [None]:
print("Modularity Class 0 Sorted by Eigenvector Centrality:")
for node in class0_sorted_by_eigenvector[:5]:
    print("Name:", node[0], "| Eigenvector Centrality:", node[1])
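
The same steps can be applied to every module at once rather than to module 0 only. A minimal sketch, assuming a partition dictionary of the form `{node: label}` and a score dictionary of the form `{node: value}`; the toy values and the helper names `group_by_community` and `top_nodes` are hypothetical.

```python
from collections import defaultdict

def group_by_community(partition):
    """Invert {node: label} into {label: [nodes]}."""
    groups = defaultdict(list)
    for node, label in partition.items():
        groups[label].append(node)
    return dict(groups)

def top_nodes(groups, scores, k=2):
    """For each community, return its k highest-scoring nodes."""
    return {
        label: sorted(members, key=scores.get, reverse=True)[:k]
        for label, members in groups.items()
    }

# Hypothetical toy inputs standing in for `communities` and `eigenvector_dict`.
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
toy_values = {"a": 0.2, "b": 0.5, "c": 0.1, "d": 0.4, "e": 0.3}

toy_groups = group_by_community(toy_partition)
toy_top = top_nodes(toy_groups, toy_values, k=2)
print(toy_groups)
print(toy_top)
```

With the real data, passing `communities` and `eigenvector_dict` to these helpers should list the most central people in each module.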

### Other centrality measures.

Finally, let's take it home with three other centrality measures: closeness, betweenness, and degree.

In [None]:
cc= nx.closeness_centrality(net)
closeCen= sorted(cc.items(), key= lambda cc: cc[1], reverse= True)[:10]
closeCen

In [None]:
bc = nx.betweenness_centrality(net)
betweenness_top10 = sorted(bc.items(), key=lambda item: item[1], reverse=True)[:10]
betweenness_top10

In [None]:
dc = nx.degree_centrality(net)
degree_top10 = sorted(dc.items(), key=lambda item: item[1], reverse=True)[:10]
degree_top10
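
To show what two of these measures compute, here is a hedged, standard-library sketch of degree and closeness centrality on a toy path graph. It assumes a connected, undirected, unweighted graph stored as `{node: set_of_neighbors}`. The graph and the helper names are illustrative, and betweenness is left out to keep the example short.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def toy_degree_centrality(adj):
    """Fraction of the other n - 1 nodes that each node is directly tied to."""
    n = len(adj)
    return {u: len(neighbors) / (n - 1) for u, neighbors in adj.items()}

def toy_closeness_centrality(adj):
    """(n - 1) divided by the sum of distances to all other nodes (connected graph assumed)."""
    n = len(adj)
    return {u: (n - 1) / sum(bfs_distances(adj, u).values()) for u in adj}

# Hypothetical toy graph: the path a - b - c - d - e.
path_adj = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

path_degree = toy_degree_centrality(path_adj)
path_closeness = toy_closeness_centrality(path_adj)
print(path_degree)
print(path_closeness)
```

On the path, the middle node `c` has the highest closeness because it is at most two steps from everyone, even though `b`, `c`, and `d` all share the same degree.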

### Conclusions

We saw that the Louvain clustering algorithm is easy to run in Python. We also saw how to extract the clusters and how to pull attributes, such as centrality measures, for the nodes of a particular module. Community detection has many applications. What will you use it for?