<img src="pics/Evans_Logo.png" width="500">

# Spring 2015

# Computational Tools for Public Management and Policy Making: 

## *Introduction to Social Network Analysis in Python*

### *José Manuel Magallanes, Ph.D.*
**email: magajm@uw.edu**

* Visiting Professor at Evans School of Public Policy and Governance and Senior Data Science Fellow at eScience Institute, University of Washington, Seattle
* Professor of Political Science and Public Policy Methodology, Pontificia Universidad Católica del Perú, Lima


Plan for this session:
1. [Importing Data and Building a Network](#part1) 
2. [Exploring Network, agents and groups](#part2) 
3. [Exporting the Network](#part3) 

<a id='part1'></a>
## 1. Importing Data

You need to know the format of your data before importing it. If your files are already formatted as a network, this step is less important. But if you receive a file from which you need to create the network, the job can be difficult unless you are aware of the simple details shared below.

1.1 **Importing an Edge List:**

An *edge list* is a common way to gather information on a network. Its format is shown below:

<img src="pics/edgelist.png" width="500">

This data is about cosponsorship: it connects legislators who presented a bill together.
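
To make the format concrete, here is a minimal sketch in plain Python (no Pandas or Networkx) of what an edge list encodes. The legislator names and weights are invented for illustration; they are not taken from the cosponsorship file.

```python
from collections import defaultdict

# Hypothetical edge list: (from, to, weight). Names and weights are invented.
toy_edges = [
    ("LegislatorA", "LegislatorB", 3),
    ("LegislatorA", "LegislatorC", 1),
    ("LegislatorB", "LegislatorC", 2),
]

# Each row is one undirected link, so we record it in both directions.
toy_neighbors = defaultdict(dict)
for source, target, weight in toy_edges:
    toy_neighbors[source][target] = weight
    toy_neighbors[target][source] = weight

print(sorted(toy_neighbors))                 # the nodes
print(sorted(toy_neighbors["LegislatorA"]))  # who LegislatorA is linked to
```

Every row becomes one link, and the extra columns (here, the weight) become attributes of that link.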

We will use Pandas to get the data:

In [None]:
#name and location of file:
fileEdges='data/cosponsorshipEdges.csv'

# This reads the CSV file. Not a network yet.
import pandas
EdgesAsDF=pandas.read_csv(fileEdges)

We now have a data frame in Python, and we will use that information to build the network.
To build it, we will use **Networkx**:

In [None]:
#importing and giving alias:
import networkx as net

attributesToInclude=['weight','status']
NWfromEdges=net.from_pandas_dataframe(EdgesAsDF, 'to', 'from',attributesToInclude)

In [None]:
# Here you can visualize your import:

# very important
%matplotlib inline 

net.draw(NWfromEdges)

In [None]:
# How many nodes:
len(NWfromEdges.nodes())

In [None]:
# How many edges:
len(NWfromEdges.edges())

1.2 **Importing from Adjacency Matrix:**

The data below represents the people on the boards of the most important companies of Peru. The format tells you if they are on the same company board. It does not represent counts.
<img src="pics/matrix.png" width="900">

As you see, this is a matrix where 1 indicates two people are connected (both are part of a company board), and 0 otherwise.
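
As a rough sketch of how such a matrix translates into links, the plain-Python example below reads a tiny 0/1 matrix. The three names and the matrix values are invented for illustration and do not come from the data file.

```python
# Hypothetical 3x3 adjacency matrix; the names are invented for illustration.
toy_names = ["Ana", "Luis", "Rosa"]
toy_matrix = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# The matrix is symmetric, so reading the upper triangle is enough.
toy_links = []
for i in range(len(toy_names)):
    for j in range(i + 1, len(toy_names)):
        if toy_matrix[i][j] == 1:
            toy_links.append((toy_names[i], toy_names[j]))

print(toy_links)
```

Each 1 above the diagonal becomes one undirected link between the person in the row and the person in the column.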

First, get the data:

In [None]:
# Getting the matrix (edges):
EdgesAsDF = pandas.read_csv('data/dataFigueroa.csv', index_col=0) # col 0 (first is index)

In [None]:
EdgesAsDF.shape # square?

As we have more columns than rows, there might be some extra info in one of the columns. Let's see:

In [None]:
EdgesAsDF.columns[-1:-5:-1] # start:end:increment

The adjacency matrix should not include the extra column ("Multinacional") from the data frame, so we drop it:

In [None]:
varsToDrop=['Multinacional']
adjacency=EdgesAsDF.drop(varsToDrop,axis=1) 

To create the network, we first use the numeric values, then add the names to the nodes.

In [None]:
#These are the node names
nodeLabels=adjacency.index.tolist()

In [None]:
# the column names should give the same labels; let's save them
nodeLabels=list(adjacency)

In [None]:
# now create the network
NWfromMatrix = net.Graph(adjacency.values)  #adjacency.values has no names in the nodes
net.draw_random(NWfromMatrix,with_labels=True)

Relabelling is easy, if we understand the node structure:

In [None]:
NWfromMatrix.nodes(data=True)

The nodes are numbers. According to Networkx, we should use the function **relabel_nodes**. This function needs a dictionary where the **key is the current node label** and the **value is the new label**:
<br>
* <font color='blue'>**NWfromMatrix = net.relabel_nodes(NWfromMatrix, mapping)**</font>

Using the command above will make the network ready, but let's see how we get the "mapping".

In [None]:
# so we need a dict like this:
changingLabels={0:"John",1:"Tania"}
changingLabels

In [None]:
# we can NOT do this by hand!
# Let's think about a strategy:

oldNames=[0,1]
newNames=["John","Tania"]
zip(oldNames,newNames) #list of tuples:

In [None]:
#can I convert the above to a dict?
dict(zip(oldNames,newNames))

My mapping strategy of old-new names is clear now:

In [None]:
size=len(NWfromMatrix.nodes())
mapping=dict(zip(range(size), nodeLabels))

In [None]:
#take a look:
mapping

In [None]:
# Finally add labels to nodes (relabelling):
NWfromMatrix = net.relabel_nodes(NWfromMatrix, mapping)

In [None]:
net.draw_random(NWfromMatrix,with_labels=True)

1.3 **Importing from Adjacency List:**

Here is an adjacency list:
<img src="pics/adjacency.png" width="900">

In an adjacency list, only the first value in a row is linked to the other ones in the same row. For example, the third row says that Eritrea has had conflicts with Ethiopia and Djibouti. It does not mean the conflict was with both of them at the same time, nor that Ethiopia had a conflict with Djibouti.
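
The reading rule above can be sketched in plain Python. The row is the Eritrea example from the description, typed here by hand as comma-separated text (an assumption about how the file stores it).

```python
# One row taken from the description above, written as comma-separated text.
toy_row = "Eritrea,Ethiopia,Djibouti"

# The first value is the focal node; every other value is one of its neighbors.
values = toy_row.split(",")
focal, others = values[0], values[1:]
row_edges = [(focal, other) for other in others]

print(row_edges)
```

Notice that no link between Ethiopia and Djibouti is created: the row only describes the connections of its first value.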

Importing is simple:

In [None]:
NWfromAdjList=net.read_adjlist("data/warsAdjlist.csv",delimiter=',') # no pandas this time.
net.draw_circular(NWfromAdjList)

## 2. Exploration <a id='part2'></a>

Let's go back to the data on Peruvian elites we were using a while ago:
<img src="pics/elites.png" width="900">

I used this data previously, when importing the adjacency matrix. Now it is time to explore it.

### 2.0 A previous step

Remember we had an attribute we dropped, but now it is time to use it.

In [None]:
# The adjacency matrix did not include the nodes attributes.
EdgesAsDF['Multinacional'].head()

So the network does not have that information:

In [None]:
NWfromMatrix.nodes(data=True)[:5]

Networkx has the function **set_node_attributes** to take care of that, but we need to prepare the attribute as a **mapping** using a dict, as we did before to relabel the nodes:
* We have the node values here: **EdgesAsDF.index**
* We have the attribute here: **EdgesAsDF['Multinacional']**

So, let's **zip** them into a dict!


In [None]:
dict(zip(EdgesAsDF.index,EdgesAsDF['Multinacional']))

As this procedure worked well, let's save it into an object:

In [None]:
EdgesAsDF['Names'] = EdgesAsDF.index

In [None]:
attributeToAdd=dict(zip(EdgesAsDF.index,EdgesAsDF['Multinacional']))

In [None]:
net.set_node_attributes(NWfromMatrix, 'Multinational', attributeToAdd)

It should have worked:

In [None]:
NWfromMatrix.nodes(data=True)

Now we are ready to explore the network as a whole, its communities, and the nodes.

### 2.1 Exploring the NETWORK

Is this network of businessmen **connected**?

If it is not connected, there are subgroups that do not interact with each other.
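
To see what "connected" means in practice, here is a minimal plain-Python sketch that finds the groups with a breadth-first search. The toy network and the helper `find_components` are invented for illustration; this is not how Networkx is implemented, just one way to get the same idea.

```python
from collections import deque

# Hypothetical toy network as a dict of neighbor sets; node names are invented.
toy_groups = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},   # first group
    "D": {"E"}, "E": {"D"},                    # second group
}

def find_components(graph):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        group = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(group)
    return components

toy_components = find_components(toy_groups)
print(len(toy_components) == 1)  # connected only if there is a single component
print(toy_components)
```

Here the toy network is not connected: nobody in the first group can reach anybody in the second one.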

In [None]:
net.is_connected(NWfromMatrix)

In [None]:
net.number_connected_components(NWfromMatrix)

In [None]:
# let's compute: net.connected_components(NWfromMatrix)
# and see those elements:

for c in net.connected_components(NWfromMatrix):
    print c, '\n'

Visuals can help:

In [None]:
import matplotlib.pyplot as plt

totalColors=net.number_connected_components(NWfromMatrix)

colorsSelected = plt.get_cmap('Paired',totalColors)

c = net.number_connected_components(NWfromMatrix)
pos=net.spring_layout(NWfromMatrix,k=0.1)
C = net.connected_component_subgraphs(NWfromMatrix)
i=0
for g in C:
    net.draw(g,pos,node_color=colorsSelected(i))
    i+=1

As this context does not have ONE connected network but several components, we can pay attention to the Giant Component:

In [None]:
NWfromMatrix_giant = max(net.connected_component_subgraphs(NWfromMatrix), key=len)

In [None]:
#take a look:
net.draw(NWfromMatrix_giant)

**Knowing this network (Giant Component)**

In [None]:
#number of nodes
len(NWfromMatrix_giant.nodes())

In [None]:
#number of edges
len(NWfromMatrix_giant.edges())

In [None]:
# Density: 
#from 0 to 1, where 1 makes it a 'complete' network: there is a link between every pair of nodes.
net.density(NWfromMatrix_giant) 
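
For an undirected network, density is the number of existing links divided by the number of possible links. A small worked example, with made-up counts, could look like this:

```python
# Hypothetical undirected network with 4 nodes and 3 links (invented counts).
n_nodes = 4
n_links = 3

# The maximum number of links among n nodes is n*(n-1)/2.
max_links = n_nodes * (n_nodes - 1) / 2.0
toy_density = n_links / max_links
print(toy_density)
```

With 4 nodes there are 6 possible links, so 3 links give a density of one half.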

In [None]:
# The clustering coefficient of a node is a way to measure if my own connections are connected among them.
# The average clustering coefficient tells you the average of those values.
net.average_clustering(NWfromMatrix_giant)
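
The idea behind the clustering coefficient can be sketched by hand on a toy network. The network and the helper `local_clustering` below are invented for illustration; nodes with fewer than two connections are given a value of zero in this sketch.

```python
from itertools import combinations

# Hypothetical toy network (dict of neighbor sets); names are invented.
toy_friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def local_clustering(graph, node):
    """Share of pairs of the node's neighbors that are themselves linked."""
    friends = graph[node]
    if len(friends) < 2:
        return 0.0
    pairs = list(combinations(sorted(friends), 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / float(len(pairs))

toy_clustering = {node: local_clustering(toy_friends, node) for node in toy_friends}
toy_average = sum(toy_clustering.values()) / len(toy_clustering)
print(toy_clustering)
print(toy_average)
```

Node A has three pairs of neighbors but only one of those pairs (B and C) is linked, so its coefficient is one third.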

In [None]:
# Shortest path (average)
# shows the average number of steps it takes to get from one node to another.

net.average_shortest_path_length(NWfromMatrix_giant)
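
As a hedged illustration of what this average means, the sketch below counts steps with a breadth-first search on a toy chain of four nodes. The network and the helper `steps_from` are invented for illustration and assume a connected network.

```python
from collections import deque
from itertools import combinations

# Hypothetical chain A - B - C - D (dict of neighbor sets); names are invented.
toy_chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

def steps_from(graph, start):
    """Breadth-first search: number of steps from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Shortest path length for every pair of different nodes, then the average.
pair_lengths = [steps_from(toy_chain, u)[v]
                for u, v in combinations(sorted(toy_chain), 2)]
toy_average_path = sum(pair_lengths) / float(len(pair_lengths))
print(pair_lengths)
print(toy_average_path)
```

The six pairs need 1, 2, 3, 1, 2 and 1 steps, so on average it takes about 1.67 steps to get from one node to another.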

* **Random networks** have *small shortest path* and *small clustering coefficient*...Is this the case?
* The high clustering coefficient would suggest a **small world**, as most nodes are not neighbors of one another, but most nodes can be reached from every other in few steps.

In [None]:
# Transitivity

# How probable is it that two businessmen with a common business friend are also friends?
net.transitivity(NWfromMatrix_giant)

In [None]:
# Assortativity (degree)
# A measure to see if nodes are connecting to other nodes similar in degree.  
# closer to 1 means higher assortativity, closer to -1 disassortativity; while 0 means no assortativity.
net.degree_assortativity_coefficient(NWfromMatrix_giant)

In [None]:
# you can also compute assortativity using an attribute of interest.
net.attribute_assortativity_coefficient(NWfromMatrix_giant,'Multinational')

More plotting:

In [None]:
# coloring the nodes by attribute:
color_map = plt.get_cmap("cool")  # color palette
valuesForColors=[n[1]['Multinational'] for n in NWfromMatrix_giant.nodes(data=True)]
net.draw(NWfromMatrix_giant,cmap=color_map,node_color=valuesForColors)

### 2.2 Exploration of network communities

A **clique** is a set of nodes in which every node is connected to every other one, so it can be understood as a very well connected community.
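
A quick way to check the idea is to test whether every pair in a set of nodes is linked. The toy network and the helper `is_clique` below are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy network (dict of neighbor sets); names are invented.
toy_board = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def is_clique(graph, nodes):
    """True when every pair in `nodes` is directly linked."""
    return all(v in graph[u] for u, v in combinations(nodes, 2))

abc_is_clique = is_clique(toy_board, ["A", "B", "C"])  # every pair is linked
abd_is_clique = is_clique(toy_board, ["A", "B", "D"])  # B and D are not linked
print(abc_is_clique)
print(abd_is_clique)
```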

In [None]:
# How many cliques
net.graph_number_of_cliques(NWfromMatrix_giant)

In [None]:
for c in net.enumerate_all_cliques(NWfromMatrix_giant):
    print c

In [None]:
# the number of nodes in the biggest cliques
max([len(c) for c in net.enumerate_all_cliques(NWfromMatrix_giant)])

In [None]:
# which are:
[c for c in net.enumerate_all_cliques(NWfromMatrix_giant) if len(c) == 8]

In [None]:
# COMMUNITY DETECTION (set of nodes densely connected internally)

# based on: https://perso.uclouvain.be/vincent.blondel/research/louvain.html
# pip install python-louvain

import community 
parts = community.best_partition(NWfromMatrix_giant)
parts

'parts' is a dictionary, so we can use it to add an attribute:

In [None]:
net.set_node_attributes(NWfromMatrix_giant, 'community', parts)

In [None]:
pos=net.spring_layout(NWfromMatrix, k=0.2) 

valuesForColors=[n[1]['community'] for n in NWfromMatrix_giant.nodes(data=True)]

## To control size of plot:
# import matplotlib.pyplot as plot
# plot.figure(figsize=(8,8))

plt.axis("off")
net.draw_networkx_nodes(NWfromMatrix_giant,pos,cmap = plt.get_cmap("cool"), node_color = valuesForColors, 
                  node_size = 50, with_labels = False)

# edges
net.draw_networkx_edges(NWfromMatrix_giant,pos,width=1.0,alpha=0.2)
plt.show()

### 2.3 Node level exploration

In [None]:
#Central nodes: degree

from operator import itemgetter
NodeDegree=sorted(NWfromMatrix_giant.degree().items(), key=itemgetter(1),reverse=True)
NodeDegree[:5]

In [None]:
# Ego network of Hub?
HubNode,HubDegree=NodeDegree[0]
HubEgonet=net.ego_graph(NWfromMatrix_giant,HubNode)
pos=net.spring_layout(HubEgonet)
net.draw(HubEgonet,pos,node_color='b',node_size=800,with_labels=True, alpha=0.5,node_shape='^')
net.draw_networkx_nodes(HubEgonet,pos,nodelist=[HubNode],node_size=2000,node_color='r')
plt.show()

In [None]:
# minimum number of nodes that must be removed to disconnect the network?
net.node_connectivity(NWfromMatrix_giant)

In [None]:
#who can break the network?
list(net.articulation_points(NWfromMatrix_giant))

In [None]:
# Highlighting an articulation node ('Bentin') in the giant component:
pos=net.spring_layout(NWfromMatrix_giant,k=0.5)
net.draw(NWfromMatrix_giant,pos,node_color='b',node_size=800,with_labels=True, alpha=0.5,node_shape='^')
net.draw_networkx_nodes(NWfromMatrix_giant,pos,nodelist=['Bentin'],node_size=2000,node_color='r')
plt.show()

In [None]:
# Computing centrality measures:
degr=net.degree_centrality(NWfromMatrix_giant)  # based on connections count
clos=net.closeness_centrality(NWfromMatrix_giant) # "speed" to access the rest
betw=net.betweenness_centrality(NWfromMatrix_giant) # "control flow" among the network nodes
eige=net.eigenvector_centrality(NWfromMatrix_giant) # central nodes connected to central nodes (influential?)
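
To get a feeling for the simplest of these measures, degree centrality can be computed by hand as the number of connections of a node divided by the maximum it could have (all the other nodes). The toy network below is invented for illustration.

```python
# Hypothetical toy network (dict of neighbor sets); names are invented.
toy_ties = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

# A node can be connected to at most n-1 other nodes.
max_ties = len(toy_ties) - 1
toy_degree = {node: len(friends) / float(max_ties)
              for node, friends in toy_ties.items()}
print(toy_degree)
```

Node A is linked to everybody else, so it gets the maximum value of 1, while D, with a single connection, gets one third.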

In [None]:
# measures into a data frame:
Centrality=[ [rich, degr[rich],clos[rich],betw[rich],eige[rich]] for rich in NWfromMatrix_giant]
headers=['Businessman','Degree','Closeness','Betweenness','Eigenvector']
DFCentrality=pandas.DataFrame(Centrality,columns=headers)

In [None]:
DFCentrality.head()

Representing these nodes (step by step):

In [None]:
import matplotlib.pyplot as plot

plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness)

In [None]:
# size of dot
plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14)

In [None]:
plot.figure(figsize=(20,20)) # size of plot
plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14)

In [None]:
# color of point
plot.figure(figsize=(20,20))
plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14,
c=DFCentrality.Eigenvector,cmap=plt.get_cmap('YlOrRd'))

In [None]:

plot.figure(figsize=(20,20))
plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14,
c=DFCentrality.Eigenvector,cmap=plt.get_cmap('YlOrRd'))


# ANNOTATING DOTS:

for i in range(len(DFCentrality.index)):
    plot.annotate(DFCentrality['Businessman'][i], 
                  (DFCentrality['Betweenness'][i],DFCentrality['Closeness'][i]),
                  fontsize=18,color="orange")

In [None]:
plot.figure(figsize=(20,20))
plot.scatter(DFCentrality.Betweenness, DFCentrality.Closeness,s=(DFCentrality.Degree+1.3)**14,
c=DFCentrality.Eigenvector,cmap=plt.get_cmap('YlOrRd'))

for i in range(len(DFCentrality.index)):
    plot.annotate(DFCentrality['Businessman'][i], 
                  (DFCentrality['Betweenness'][i],DFCentrality['Closeness'][i]),
                  fontsize=18,color="orange")

# TITLES:

plot.title("scatterplot (size for degree of node, color for eigenvector centrality)")
plot.xlabel("betweenness")
plot.ylabel("closeness")

<a id='part3'></a>
## 3. Exporting the Network

In [None]:
net.write_graphml(NWfromMatrix, "data/ElitePeru.graphml",encoding='utf-8')
net.write_gexf(NWfromMatrix, "data/ElitePeru.gexf",encoding='utf-8')

In [None]:
# check the type of the attribute stored in the nodes:
type(NWfromMatrix.nodes(data=True)[1][1]["Multinational"])

In [None]:
# convert the attribute of every node to a plain Python integer:
for i in range(len(NWfromMatrix.nodes(data=True))):
    NWfromMatrix.nodes(data=True)[i][1]["Multinational"]=int(NWfromMatrix.nodes(data=True)[i][1]["Multinational"])

In [None]:
net.write_graphml(NWfromMatrix, "data/ElitePeru.graphml",encoding='utf-8')
net.write_gexf(NWfromMatrix, "data/ElitePeru.gexf",encoding='utf-8')