<h1> Insulin protein network </h1>
    

- In this notebook you will apply the functions from the ***previous notebook*** to a protein-protein interaction network, and learn a few more concepts and ways to manipulate graphs in Python.

Relevant literature:

<a href="https://www.nature.com/articles/s41467-019-09177-y"> Network-based prediction of protein interactions </a>

### Import libraries

In [None]:
import networkx as nx
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import warnings

import csv
from operator import itemgetter

import pandas as pd

from statsmodels.distributions.empirical_distribution import ECDF


warnings.filterwarnings('ignore')

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Loading data

- Load the two CSV files: 
        insulinnetwork_nodelist.csv (contains the protein names) 
        insulinnetwork_edgelist.csv (contains the protein-protein interactions and an interaction score)

`pandas.read_csv` is the easiest way to do this.

In [None]:
nodes = pd.read_csv('data/insulinnetwork_nodelist.csv')
nodes = nodes.Name.values # returns array of the values in column "Name"

In [None]:
edges = pd.read_csv('data/insulinnetwork_edgelist.csv')
edges = [(row['Source'], row['Target']) for idx,row in edges.iterrows()] # make a pair for values in each row
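To make the edge-list step concrete, here is a minimal sketch that builds the same kind of `(source, target)` pairs using only the standard library. The column names `Source` and `Target` come from the cell above; the three rows are invented for illustration and are not the real insulin data.

```python
import csv
import io

# In-memory stand-in for an edge-list file (rows are made up for illustration).
toy_csv = io.StringIO("Source,Target\nINS,AKT1\nINS,EGF\nAKT1,EGF\n")

# Each row becomes one (source, target) pair, just like the list built with pandas.
toy_edges = [(row["Source"], row["Target"]) for row in csv.DictReader(toy_csv)]
print(toy_edges)
```

A list of pairs in this shape is what `G.add_edges_from` accepts.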

Inspect our objects, then use them to construct a NetworkX graph:

In [None]:
print(nodes)
print('-------------')
print(edges)

In [None]:
# create a new graph
G = nx.Graph() 

# add nodes and edges
G.add_nodes_from(nodes) 
G.add_edges_from(edges)

<div class='alert alert-warning'>
    <h4>Exercise 1. Basic information and drawing the network</h4>
</div>

The network has the protein Insulin at its center: every other protein is connected to Insulin by an edge, and the other proteins may also be connected to each other. This is called an **ego network**.
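The ego-network property can be checked directly from an edge list. The sketch below uses plain Python and a toy edge list that is invented for illustration (it is not the real insulin data); `is_ego_of` is a hypothetical helper name.

```python
# Toy edge list, invented for illustration.
toy_edges = [("INS", "AKT1"), ("INS", "EGF"), ("INS", "GCG"), ("AKT1", "EGF")]

def is_ego_of(ego, edges):
    """Return True if `ego` shares an edge with every other node in `edges`."""
    nodes = {n for edge in edges for n in edge}
    # For every edge touching the ego, keep the node at the other end.
    neighbors = {v if u == ego else u for u, v in edges if ego in (u, v)}
    neighbors.discard(ego)  # ignore self-loops
    return neighbors == nodes - {ego}

print(is_ego_of("INS", toy_edges))   # True
print(is_ego_of("AKT1", toy_edges))  # False
```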

Answer the questions below by applying some of the commands you learned in the previous notebook. You will learn the most by typing the code manually; to save time you can choose to copy-paste it instead. 

<div class='alert alert-warning'>
       <span style=" font-weight: bold;"> 1.a. </span>Display the number of nodes and edges in the network
</div>

In [None]:
# Ex1 a)


In [None]:
# %load solutions/ex2_1a.py

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">1.b.</span> Print the first five edges in the network. Do 'AKT1' and 'EGF' interact?
</div>

In [None]:
# b)


In [None]:
# %load solutions/ex2_1b.py

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">1.c. </span>Draw the network with labels
</div>

In [None]:
# c)


In [None]:
# %load solutions/ex2_1c.py

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">1.d. </span>List the neighbors of 'AKT1'
</div>

In [None]:
# d)


In [None]:
# %load solutions/ex2_1d.py

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">1.e. </span>Draw the subnetwork for the protein 'AKT1'
</div>

In [None]:
# e)


In [None]:
# %load solutions/ex2_1e.py

### Degree centrality

In the notebook <a href='./0_Concepts_in_network_theory.ipynb'>0_Concepts_in_network_theory</a> we introduced some important graph metrics. Centralities are metrics that quantify the importance of a node. Degree centrality is simply the number of edges a node has.
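As a sketch of what a degree count involves, the following plain-Python snippet tallies degrees from a toy edge list (invented for illustration). It also shows the normalized variant, degree divided by the number of other nodes, which is the quantity `nx.degree_centrality` reports.

```python
from collections import Counter

# Toy edge list, invented for illustration.
toy_edges = [("INS", "AKT1"), ("INS", "EGF"), ("INS", "GCG"), ("AKT1", "EGF")]

# Every edge adds one to the degree of both of its endpoints.
degree = Counter()
for u, v in toy_edges:
    degree[u] += 1
    degree[v] += 1

# Normalized degree: fraction of the other nodes each node is connected to.
# Note: nodes without any edge never appear in an edge list and are missed here.
n = len(degree)
normalized = {node: d / (n - 1) for node, d in degree.items()}

print(dict(degree))
print(normalized)
```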

In [None]:
# get the degrees in dictionary format
ds = dict(G.degree)
print(ds)

# access only the values and plot them (as a list, so matplotlib gets a plain sequence)
plt.hist(list(ds.values()))
plt.show()

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">1.f. </span>Rank the top 5 proteins with the highest degree centrality
</div>

In [None]:
# f)


In [None]:
# %load solutions/ex2_1f.py

<div class='alert alert-warning'>
    <span style=" font-weight: bold;">
        1.g. </span> Plot the network with node sizes proportional to their degree centrality (the number of edges adjacent to the node; higher centrality = bigger node). </div>
        

**Hint:** Use the arguments `nodelist` and `node_size` in `nx.draw`. *node_size* takes either one value (size for all nodes) or a list as input. The order has to match and that's why using *nodelist* is a good idea.

In [None]:
# g)


In [None]:
# %load solutions/ex2_1g.py

### ECDF

We plot an empirical cumulative distribution function (ECDF), like in notebook 1. 
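For intuition, an ECDF can be sketched in a few lines of plain Python: sort the values and assign to the i-th smallest value the fraction i/n. This is an illustrative sketch on made-up degrees, not the statsmodels implementation.

```python
def ecdf(values):
    """Return sorted values and, for each, the fraction of data at or below it."""
    xs = sorted(values)
    n = len(xs)
    ys = [(i + 1) / n for i in range(n)]
    return xs, ys

# Made-up degree values for illustration.
toy_degrees = [3, 2, 2, 1]
x, y = ecdf(toy_degrees)
print(x)  # [1, 2, 2, 3]
print(y)  # [0.25, 0.5, 0.75, 1.0]
```

The input has to be the numeric values themselves; passing node names instead of degrees would not give a meaningful curve.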

In [None]:
# use the degree values; list(ds) alone would give the node names (the dict keys)
ecdf = ECDF(list(ds.values()))
x,y = ecdf.x, ecdf.y

plt.scatter(x,y)
plt.title('Degree Centralities')

<div class='alert alert-warning'>
    <h4>Exercise 2.</h4> What does the shape of the curve tell you? (answer in words)
</div>

In [None]:
#Ex 2)


In [None]:
# %load solutions/ex2_2.py

## Layout and directions


Now we want to look at different layouts and include directions on the edges.

First we draw a standard graph from our data:

In [None]:
nx.draw(G, with_labels=True)
plt.show()

## Topological versus physical space
Most networks do not "exist" in physical space: a protein interaction network is abstract, whereas an air travel network has nodes fixed in physical space. All networks can, however, be said to exist in **topological** space, where the position of the nodes is only a feature of how *we* decide to plot them, not a property of the network itself. A well-chosen layout can simplify the graph and make it easier to analyse and interpret, for example by plotting interacting nodes closer together. We can choose between dozens of different layout algorithms. Here we start with the Fruchterman-Reingold algorithm, which is a force-directed layout.
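To see that node positions are a plotting choice rather than a property of the network, here is a minimal sketch that computes positions on a unit circle by hand. The helper `circular_positions` is hypothetical and only mimics the idea of a circular layout; the node names are a toy selection.

```python
import math

def circular_positions(nodes, radius=1.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }

toy_nodes = ["INS", "AKT1", "EGF", "GCG"]
pos = circular_positions(toy_nodes)
for node, (px, py) in pos.items():
    print(f"{node}: ({px:.2f}, {py:.2f})")
```

A dictionary of this shape, mapping each node to an `(x, y)` pair, is what the `pos` argument of `nx.draw` expects, so any rule for assigning coordinates gives a valid layout of the same graph.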

In [None]:
# Fruchterman Reingold
nx.draw(G, node_size=1200, node_color='lightblue',
    linewidths=0.25, font_size=10, font_weight='bold', with_labels=True, pos=nx.fruchterman_reingold_layout(G))
plt.title("fruchterman_reingold")
plt.show()


You can also try out some of the other layout options in NetworkX. Instead of Fruchterman-Reingold, try circular, random and spectral.

In [None]:
def plot_graph(G, layout):
    nx.draw(G, node_size=1200, node_color='lightblue',
    linewidths=0.25, font_size=10, font_weight='bold', with_labels=True, pos=layout(G))
    plt.title(layout.__name__)
    plt.show()

<div class='alert alert-warning'>
<h4>Exercise 3. </h4> Use the function above (plot_graph) and try out three different layouts from `nx`: circular, random and spectral.
</div>

In [None]:
# Ex3


In [None]:
# %load solutions/ex2_3.py

---
## Directed graphs

In the graphs we have made so far, the edges had no direction. In the previous notebook, if Mads was playing with Anna, then Anna was also playing with Mads. If we are instead dealing with situations where, for example, one protein activates another, we would like to add a direction to the relationship, going from the activating protein towards the activated protein. We do that by making a directed graph.

In networkx we use the command `nx.DiGraph()` to indicate that we now want to make a **directed** graph. 

Be aware that an interaction can go in both directions, which is drawn as an arrow at both ends of the edge.
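The difference between one-way and two-way relationships can be sketched with a plain dictionary of successors. The directed edges below are invented for illustration and say nothing about the real biology; `is_reciprocal` is a hypothetical helper.

```python
from collections import defaultdict

# Toy directed edges (source, target), invented for illustration.
toy_directed_edges = [("INS", "AKT1"), ("AKT1", "INS"), ("INS", "EGF")]

# Map every source node to the set of nodes it points to.
successors = defaultdict(set)
for source, target in toy_directed_edges:
    successors[source].add(target)

def is_reciprocal(u, v):
    """True if there is an edge u -> v and also an edge v -> u."""
    return v in successors[u] and u in successors[v]

print(is_reciprocal("INS", "AKT1"))  # True: arrows at both ends
print(is_reciprocal("INS", "EGF"))   # False: only INS -> EGF
```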

In [None]:
# make a directed graph
G = nx.DiGraph()

# add nodes and edges
G.add_nodes_from(nodes)
G.add_edges_from(edges)

In [None]:
# draw the directed graph
nx.draw(G, node_size=1200, node_color='lightblue',
        linewidths=0.25, font_size=10, 
        font_weight='bold', with_labels=True, 
        pos=nx.circular_layout(G))

plt.title("Directed graph")
plt.show()

<div class='alert alert-warning'>
<h4>Exercise 4:</h4> Now try to make a directed network with a circular layout, green nodes and labels in a white font. 
</div>

In [None]:
# Ex4


In [None]:
# %load solutions/ex2_4.py

##  Density, shortest paths and betweenness centrality

We will now introduce a few more terms related to graphs and show how networks can be represented as a matrix plot instead of a graph drawing. 

We start by calculating the density of the network. Whereas centrality describes individual nodes, density describes the whole network: it is the **proportion of the potential connections** in a network **that are actual connections**, i.e. how many of the possible edges are present. For this network we will see that around one sixth of all the potential connections are actually present.

<div class='alert alert-warning'>
    <h4>Exercise 5. </h4>Calculate the density of G. 
</div>

In [None]:
# Ex5


In [None]:
# %load solutions/ex2_5.py

Usually there is already a method that does what we are looking for:

In [None]:
nx.density(G)
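The value returned by `nx.density` can be checked by hand. The sketch below writes out the formula in plain Python: the number of edges divided by the number of possible edges, which is n(n-1)/2 node pairs for an undirected graph and n(n-1) ordered pairs for a directed one. The node and edge counts are toy numbers, not those of the insulin network.

```python
def density(n_nodes, n_edges, directed=False):
    """Fraction of the possible edges that are present."""
    if n_nodes < 2:
        return 0.0
    possible = n_nodes * (n_nodes - 1)  # ordered pairs (directed case)
    if not directed:
        possible /= 2                   # unordered pairs (undirected case)
    return n_edges / possible

# Toy example: 4 nodes and 4 edges.
print(density(4, 4))                 # 4 of 6 possible undirected edges
print(density(4, 4, directed=True))  # 4 of 12 possible directed edges
```

With the same number of edges, treating a graph as directed halves its density, which is worth keeping in mind when comparing values before and after switching to `nx.DiGraph`.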

In [None]:
# does the fraction of white squares match the density computed above?
npg = nx.to_numpy_array(G)
plt.imshow(npg, cmap='gray')

We can also study the paths through the network. Often we are most interested in the shortest path between two nodes, e.g. the shortest way a signal can travel from one protein to another. Here we find the shortest path from GCG to EGF. 

In [None]:
nx.shortest_path(G, 'GCG', 'EGF')
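For unweighted graphs, a shortest path can be found with a breadth-first search. The following is a minimal plain-Python sketch of that idea, not the NetworkX implementation; the directed toy graph is invented for illustration, and the hypothetical helper returns `None` when the target cannot be reached.

```python
from collections import deque

def bfs_shortest_path(successors, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    previous = {source: None}  # node -> node we arrived from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back along the recorded predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in successors.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Toy directed graph as {source: [targets]}, invented for illustration.
toy_successors = {"GCG": ["INS"], "INS": ["AKT1", "EGF"], "AKT1": ["EGF"]}
print(bfs_shortest_path(toy_successors, "GCG", "EGF"))  # ['GCG', 'INS', 'EGF']
```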

<div class='alert alert-warning'>
<h4>Exercise 6. </h4> What happens if we take the shortest path the other way around - from EGF to GCG?
</div>

In [None]:
# Ex6


In [None]:
# %load solutions/ex2_6.py

<div class='alert alert-warning'>
<h4>Exercise 7.</h4> Change the graph from a directed graph to an undirected graph and then try the same.
    
**Hint.** *Make a new undirected graph, passing the existing graph as a parameter.*
 </div>   



In [None]:
# Ex7


In [None]:
# %load solutions/ex2_7.py

<div class='alert alert-warning'>
Select two arbitrary proteins of your own and find the shortest path between them.
</div>

In [None]:
G.nodes

In [None]:
# Shortest path from __ to __


---
## Betweenness centrality

Knowing the concepts of shortest path and centrality, we can now measure the betweenness centrality, a centrality measure based on shortest paths. The **betweenness centrality** of a node (here a protein) **counts the shortest paths between other pairs of nodes that pass through it**; when several equally short paths connect a pair, each contributes a fraction. With `normalized=True`, as used below, the counts are rescaled to lie between 0 and 1. 

The higher the betweenness centrality of a node, the more control it has over the network, because more information passes through it. In an ego-centered network the ego (here Insulin) is expected to have the highest betweenness centrality. 
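To make the definition concrete, here is a brute-force sketch of betweenness centrality for a small undirected graph in plain Python. It is meant for intuition only and is far less efficient than what NetworkX does; the toy graph is invented for illustration, and the scaling by (n-1)(n-2)/2 is, as far as I understand it, the normalization NetworkX applies to undirected graphs.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Distance and number of shortest paths from `source` to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]  # extend every shortest path to `node`
    return dist, sigma

def betweenness(adj):
    """Normalized betweenness for an undirected graph with more than 2 nodes."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = dict.fromkeys(adj, 0.0)
    for s, t in combinations(adj, 2):
        (dist_s, sigma_s), (dist_t, sigma_t) = info[s], info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest s-t path if the two legs add up exactly.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    n = len(adj)
    scale = (n - 1) * (n - 2) / 2  # number of node pairs that exclude v
    return {v: b / scale for v, b in score.items()}

# Toy undirected graph as an adjacency dict, invented for illustration.
toy_adj = {
    "INS": ["AKT1", "EGF", "GCG"],
    "AKT1": ["INS", "EGF"],
    "EGF": ["INS", "AKT1"],
    "GCG": ["INS"],
}
toy_btw = betweenness(toy_adj)
print(toy_btw)
```

In this toy graph only the ego node lies between other pairs, so it is the only node with a non-zero score.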

In [None]:
btws = nx.betweenness_centrality(G, normalized=True)

# plot the betweenness of each node
plt.bar(list(btws.keys()), list(btws.values()))
plt.xticks(rotation=90)  # keep the protein names readable

<div class='alert alert-warning'>
    <h4>Exercise 8.</h4> Which of the proteins in the insulin network has the second highest betweenness centrality?
</div>

In [None]:
# Ex8


In [None]:
# %load solutions/ex2_8.py

## Networks as matrices and arrays

We will end this notebook by demonstrating how to make a matrix plot of a network. We use the `MatrixPlot` class from nxviz for this. In the matrix plot, the first row and the last column represent the same node, and so on.

In [None]:
from nxviz import MatrixPlot
m = MatrixPlot(G)

Whether the plot is symmetric depends on the type of graph: an undirected graph always gives a symmetric plot, while a directed graph generally gives an asymmetric one. 
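The link between graph type and matrix symmetry can be sketched with a hand-built adjacency matrix in plain Python. The helper names and the toy nodes and edges are invented for illustration.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a 0/1 adjacency matrix as a list of lists, rows = sources."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1  # undirected edges count both ways
    return matrix

def is_symmetric(matrix):
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

toy_nodes = ["INS", "AKT1", "EGF"]
toy_edges = [("INS", "AKT1"), ("INS", "EGF")]

undirected = adjacency_matrix(toy_nodes, toy_edges)
directed = adjacency_matrix(toy_nodes, toy_edges, directed=True)
print(is_symmetric(undirected))  # True
print(is_symmetric(directed))    # False
```

A directed graph only gives a symmetric matrix if every edge is matched by an edge in the opposite direction.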

<div class='alert alert-warning'>
    <h4>Exercise 9. </h4>Make a new MatrixPlot, but with the directed graph
</div>

In [None]:
# Ex9


In [None]:
# %load solutions/ex2_9.py

We can also use `nx.to_numpy_array` to turn the graph into a NumPy array instead of a matrix plot.

In [None]:
# note that the nodes might be in different order
A1 = nx.to_numpy_array(G)
A1

---
Next, you can explore the following links:

https://python.quantecon.org/sir_model.html

https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/