## Tutorial 1. Network Science: diffusion of microfinance in rural villages in India

Created by Emanuel Flores-Bautista 2018.  All code contained in this notebook is licensed under the [Creative Commons License 4.0](https://creativecommons.org/licenses/by/4.0/).

This tutorial was extracted from the HarvardX course Python for Research. You can find the course [here](https://www.edx.org/es/course/using-python-research-harvardx-ph526x-0). It uses the data from **A. Banerjee _et al._** (2013), The Diffusion of Microfinance. *Science* 26 Jul 2013: Vol. 341, Issue 6144, 1236498. [DOI: 10.1126/science.1236498](http://science.sciencemag.org/content/341/6144/1236498.long). You can download the paper [here](https://economics.mit.edu/files/7781).

In [None]:
## This IPython magic command allows graphs to be plotted in the notebook
%matplotlib inline
## This command sets the figure format to SVG
%config InlineBackend.figure_format = 'svg'

import numpy as np
import networkx as nx
import seaborn as sns 
import matplotlib.pyplot as plt
import TCD19_utils as TCD

TCD.set_plotting_style_2()

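First, let's practice some NetworkX and look at the Erdős-Rényi random graph. Here we build one with 150 nodes in which each pair of nodes is connected with probability 0.1.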

In [None]:
net = nx.erdos_renyi_graph(150, 0.1)


In [None]:
type(net)
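
For intuition about what `nx.erdos_renyi_graph(n, p)` produces, here is a minimal sketch that builds the same kind of graph by hand. The helper name `gnp_sketch` is hypothetical (it is not part of NetworkX), and the loop is written for clarity rather than speed.

```python
import itertools
import random

import networkx as nx


def gnp_sketch(n, p, seed=None):
    """Build a G(n, p) graph by flipping a biased coin for every node pair."""
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:  # keep each possible edge with probability p
            G.add_edge(u, v)
    return G


n, p = 150, 0.1
sketch = gnp_sketch(n, p, seed=0)
expected_edges = p * n * (n - 1) / 2  # mean number of edges in G(n, p)
print(sketch.number_of_edges(), expected_edges)
```

The number of edges fluctuates around `expected_edges` from one seed to the next.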

Let's visualize our network.

In [None]:
## line_color and edge_size are not NetworkX drawing options, so they are dropped
nx.draw_kamada_kawai(net, edge_color='lightgrey', node_color='lightgreen',
        node_size=70)

The degree of each node (its number of edges) is returned by the `net.degree()` method.

In [None]:
net.degree()

In [None]:
type(net.degree())

In NetworkX 2.0 and later this is a `DegreeView` of (node, degree) pairs rather than a plain dictionary, so let's extract the degree values. Now we can plot a histogram of the degree distribution.

In [None]:
x = []  ## list storing the degree of every node

for node, degree in net.degree():
    x.append(degree)
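
As a small illustration of the degree view, the sketch below uses a toy path graph (an assumption, not the network above) and shows two common ways of turning the view into plain Python objects.

```python
import networkx as nx

G = nx.path_graph(4)  # toy graph: 0 - 1 - 2 - 3

degree_view = G.degree()                # iterable of (node, degree) pairs
degree_dict = dict(degree_view)         # {node: degree}
degrees = [d for _, d in degree_view]   # degree sequence only

print(degree_dict)
print(degrees)
```

The list comprehension is equivalent to the explicit loop above, just more compact.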


In [None]:
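## sns.distplot is deprecated in recent seaborn releases; histplot with a KDE overlay is the replacement
ax = sns.histplot(x, kde=True, stat='density', color='lightgreen')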
ax.set_title('Degree Distribution')
ax.set_xlabel('degree (no. of connections)')
ax.set_ylabel('PDF')

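We can see that the distribution is approximately normal. Each node's degree follows a binomial distribution with mean $\mu = (n-1)p \approx np$ and variance $\sigma^2 = (n-1)p(1-p)$, which is close to $N(\mu, \sigma^2)$ for these parameters.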
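
A quick numerical check of this statement is sketched below: it compares the empirical mean and variance of the degrees with the binomial values $(n-1)p$ and $(n-1)p(1-p)$. The seed is an arbitrary choice made only for reproducibility.

```python
import numpy as np
import networkx as nx

n, p = 150, 0.1
G = nx.erdos_renyi_graph(n, p, seed=42)
degrees = np.array([d for _, d in G.degree()])

# Binomial(n - 1, p) moments for the degree of a single node
expected_mean = (n - 1) * p
expected_var = (n - 1) * p * (1 - p)

print("mean degree: %.2f (expected %.2f)" % (degrees.mean(), expected_mean))
print("degree variance: %.2f (expected %.2f)" % (degrees.var(), expected_var))
```

The empirical values should sit close to the expected ones, though not exactly on them, since this is a single random draw.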

How would you do it using Matplotlib? Find out and write the code.

In [None]:
##Write your code here 

### Scale-free networks: Barabási-Albert model

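The Barabási-Albert model grows a network by preferential attachment: each new node connects to $m$ existing nodes, chosen with probability proportional to their current degree.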

https://youtu.be/prjl7wYvX4g
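
To make the mechanism concrete, here is a minimal sketch of preferential attachment. The function name `preferential_attachment_sketch` and the choice of a small complete graph as the starting network are assumptions for illustration; this is not how NetworkX implements its generator.

```python
import random

import networkx as nx


def preferential_attachment_sketch(n, m, seed=None):
    """Grow a graph one node at a time; targets are drawn proportionally to degree."""
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)  # small starting network (an assumption)
    # Each node appears in this list once per edge end, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    stubs = [node for edge in G.edges() for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # draw until we have m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            G.add_edge(new_node, t)
            stubs.extend([new_node, t])
    return G


toy = preferential_attachment_sketch(1000, 4, seed=0)
degrees = [d for _, d in toy.degree()]
print("min degree:", min(degrees), "max degree:", max(degrees))
```

Early nodes accumulate edges faster than late ones, which is what produces a few very high-degree hubs.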

In [None]:

n_nodes = 1000
m = 4 # number of edges each new node attaches to existing nodes

net = nx.barabasi_albert_graph(n_nodes, m) 

In [None]:
x = [degree for _, degree in net.degree()]  ## degree of every node

## fill=True replaces the deprecated shade=True argument
sns.kdeplot(x, color='lightgreen', fill=True)
plt.title('Barabási-Albert model', fontsize=21)
plt.xlabel('Degree')
plt.ylabel('PDF');
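
One way to see the heavy tail of this distribution is to compare the largest degree with that of an Erdős-Rényi graph of the same size and roughly the same mean degree. The sketch below does this; the seeds are arbitrary and the exact numbers will vary with them.

```python
import networkx as nx

n_nodes, m = 1000, 4
ba = nx.barabasi_albert_graph(n_nodes, m, seed=1)

# Erdős-Rényi graph with (approximately) the same mean degree as the BA graph
mean_degree = 2 * ba.number_of_edges() / n_nodes
er = nx.erdos_renyi_graph(n_nodes, mean_degree / (n_nodes - 1), seed=1)

ba_max = max(d for _, d in ba.degree())
er_max = max(d for _, d in er.degree())
print("mean degree: %.2f" % mean_degree)
print("max degree, Barabási-Albert: %d" % ba_max)
print("max degree, Erdős-Rényi: %d" % er_max)
```

The Barabási-Albert graph has hubs whose degree is many times the mean, whereas the Erdős-Rényi degrees stay close to it.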

## Analysis of the microfinance network

Okay, let's start by loading the adjacency matrices corresponding to two villages.

In [None]:
pwd

In [None]:
## Load the adjacency matrices

A1= np.loadtxt("../data/adj_allVillageRelationships_vilno_1.csv", delimiter=",")
A2= np.loadtxt("../data/adj_allVillageRelationships_vilno_2.csv", delimiter=",")

The `nx.to_networkx_graph()` function builds a graph from several input formats, including a NumPy adjacency matrix.

In [None]:
## convert the adjacency matrices into graphs in one step
G1= nx.to_networkx_graph(A1)
G2= nx.to_networkx_graph(A2)
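
As a small illustration of this conversion, the sketch below applies `nx.to_networkx_graph()` to a hand-written 3 x 3 adjacency matrix. The matrix is a toy example, not the village data.

```python
import numpy as np
import networkx as nx

# Toy symmetric adjacency matrix for an undirected path 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

G = nx.to_networkx_graph(A)
print(sorted(G.edges()))
print(G.number_of_nodes(), G.number_of_edges())
```

Each nonzero entry `A[i, j]` becomes an edge between nodes `i` and `j`, and nodes are labeled by their row index.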

Let's compute some summary statistics for each network, including the clustering coefficient.

In [None]:
def net_stats(G):
    """Print summary statistics of G and plot its degree and clustering histograms."""

    ## degree of every node in the graph
    net_degree_distribution = [degree for _, degree in G.degree()]

    print("Number of nodes in the network: %d" %G.number_of_nodes())
    print("Number of edges in the network: %d" %G.number_of_edges())
    print("Avg node degree: %.2f" %np.mean(net_degree_distribution))
    print('Avg clustering coefficient: %.2f'%nx.average_clustering(G))
    print('Network density: %.2f'%nx.density(G))

    
    fig, axes = plt.subplots(1,2, figsize=(10,4))
    
    axes[0].hist(list(net_degree_distribution), bins=20, color = 'lightgreen')
    axes[0].set_xlabel("Degree $k$")
    #axes[0].set_ylabel("$P(k)$")
    
    axes[1].hist(list(nx.clustering(G).values()), bins= 20, color = 'lightgrey')
    axes[1].set_xlabel("Clustering Coefficient $C$")
    #axes[1].set_ylabel("$P(k)$")
    axes[1].set_xlim([0,1])
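
To see what the clustering coefficient and density measure, here is a worked example on a toy graph (an assumption for illustration): a triangle with one extra node attached.

```python
import networkx as nx

# Toy graph: a triangle (0, 1, 2) with one extra node hanging off node 2
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

clustering = nx.clustering(G)               # per-node clustering coefficient
avg_clustering = nx.average_clustering(G)   # mean over all nodes
density = nx.density(G)                     # edges present / edges possible

print(clustering)
print(avg_clustering, density)
```

Nodes 0 and 1 have a coefficient of 1 because their two neighbors are connected to each other. Node 2 has three neighbors but only one of the three possible neighbor pairs is linked, giving 1/3. Node 3 has a single neighbor, so its coefficient is 0.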
          

In [None]:
net_stats(G1)

In [None]:
net_stats(G2)

The degree histograms suggest that our networks resemble scale-free networks, so we expect a few highly connected hubs alongside many nodes with few connections.

## Largest Connected Component (LCC)

Q: How large is the largest connected component in our graph? 

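`nx.connected_component_subgraphs()` was removed in NetworkX 2.4. Instead, we use `nx.connected_components()`, a generator function that yields the set of nodes of each component, and build each subgraph with `G1.subgraph()`.

In [None]:
gen = (G1.subgraph(c).copy() for c in nx.connected_components(G1))

## get the first component by calling next() on the generator

g = next(gen)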

type(g)


We can get the number of nodes in this component with the `len()` function.

In [None]:
len(g)

However, if we call `next()` again, we get the next connected component, and so on. Note that the components are not yielded in order of size, so the first one is not guaranteed to be the largest.

In [None]:
g = next(gen)

In [None]:
len(g)
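
The generator behavior can be sketched on a toy graph with two components (an assumption for illustration). It shows that the first component yielded need not be the largest.

```python
import networkx as nx

# Toy graph with two components: {0, 1} and {2, 3, 4}
G = nx.Graph([(0, 1), (2, 3), (3, 4)])

components = nx.connected_components(G)  # generator of node sets
first = next(components)
second = next(components)
print(len(first), len(second))

# Sorting by size makes the order explicit, largest first
by_size = sorted(nx.connected_components(G), key=len, reverse=True)
print([len(c) for c in by_size])
```

Here the smaller component comes out first, because components are discovered in node order rather than by size.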

We can extract the LCC more neatly by calling the `max()` function with `len` as the `key` argument.

In [None]:
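## nx.connected_component_subgraphs() was removed in NetworkX 2.4, so we build the subgraphs ourselves
g1_lcc = max((G1.subgraph(c).copy() for c in nx.connected_components(G1)), key=len)

g2_lcc = max((G2.subgraph(c).copy() for c in nx.connected_components(G2)), key=len)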

g1_lcc


We can estimate the proportion of the LCC with respect to the original graph.

In [None]:
## proportion of nodes in the LCC relative to the full graph
len(g2_lcc)/ len(G2)

In [None]:
len(g1_lcc)/ len(G1)
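
The two steps above (extracting the LCC and computing its share of nodes) can be wrapped in a small helper. The function name `lcc_fraction` and the toy graph are hypothetical, used only to illustrate the idea.

```python
import networkx as nx


def lcc_fraction(G):
    """Return the largest connected component (as a subgraph) and its share of nodes."""
    largest_nodes = max(nx.connected_components(G), key=len)
    lcc = G.subgraph(largest_nodes).copy()
    return lcc, lcc.number_of_nodes() / G.number_of_nodes()


toy = nx.Graph([(0, 1), (2, 3), (3, 4)])
toy.add_node(5)  # isolated node: a component of size 1
lcc, fraction = lcc_fraction(toy)
print(sorted(lcc.nodes()), fraction)
```

Calling `lcc_fraction(G1)` or `lcc_fraction(G2)` would give the same quantities as the cells above.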

Just to practice, let's explore the eigenvector centrality in NetworkX. We'll return to other centrality measures in the following lectures.

In [None]:
ec= nx.eigenvector_centrality(G1)
## sort the (node, centrality) pairs by centrality and keep the top 10
eigen_centrality = sorted(ec.items(), key=lambda item: item[1], reverse=True)[:10]
eigen_centrality

Comparing this list with the highest-degree nodes shows that different metrics pick out largely the same hubs.
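
One way to check this kind of agreement is to compare the top-ranked nodes under two centrality measures. The sketch below does so on a synthetic Barabási-Albert graph (an assumption, since the village data is not loaded here); the helper `top_k` is hypothetical.

```python
import networkx as nx

# Toy scale-free graph, not the village data
G = nx.barabasi_albert_graph(200, 2, seed=1)


def top_k(scores, k=10):
    """Return the k nodes with the highest score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [node for node, _ in ranked[:k]]


top_eigen = top_k(nx.eigenvector_centrality(G, max_iter=1000))
top_degree = top_k(nx.degree_centrality(G))
overlap = set(top_eigen) & set(top_degree)
print("nodes in both top-10 lists:", sorted(overlap))
```

Replacing `G` with `G1` would apply the same comparison to the first village network.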

In [None]:
plt.figure()
nx.draw(g1_lcc, edge_color='lightgrey', node_color= 'lightgreen', node_size=13);