# Introduction to graph analysis with networkx

## Graphs are everywhere

![Routing graph](img/17_node_mesh_network.png)


![Family tree](img/Familia_Curie.png)

![Social network](img/social_network.png)


<img src="img/Semantic_Net.svg" alt="Semantic graph" style="height: 700px;"/>

## What are graphs?

### Definition
- A graph is a pair G = (V, E), where V is a set whose elements are called vertices (singular: vertex) or nodes, and E is a set of two-element sets of vertices (each containing two distinct vertices), whose elements are called edges (sometimes links or lines).
- A directed graph or digraph is a graph in which edges have orientations.
- A complete graph is a graph in which each pair of vertices is joined by an edge. A complete graph contains all possible edges.
- A weighted graph or a network is a graph in which a number (the weight) is assigned to each edge.

### Further terms
- Centrality: a measure used to identify the most important vertices within a graph
- Component: a maximal subgraph of an undirected graph in which any two vertices are connected to each other by a path
- Complete graph: every node is connected to every other node
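
To make the set-based definition concrete, here is a minimal sketch in plain Python, independent of networkx. The names `V`, `E` and `complete_edges` are chosen for illustration only.

```python
from itertools import combinations

# A graph as a pair (V, E): a set of vertices and a set of two-element sets
V = {1, 2, 3, 4}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}

# The complete graph on V contains every two-element subset of V
complete_edges = {frozenset(pair) for pair in combinations(V, 2)}

print(len(complete_edges))  # n * (n - 1) / 2 = 6 edges for n = 4
print(E <= complete_edges)  # every edge set on V is a subset of the complete one
```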


## Preparation

In [None]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

In [None]:
%matplotlib inline

In [None]:
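def get_node_size(value_dict):
    """Map node values linearly to plot sizes between 300 (minimum) and 900 (maximum)."""
    value_array = np.array(list(value_dict.values()), dtype=float)
    value_range = value_array.max() - value_array.min()
    if value_range == 0:
        # All values are equal: avoid dividing by zero and use the smallest size
        return [300.0] * len(value_array)
    node_size = (2 * (value_array - value_array.min()) / value_range + 1) * 300
    return node_size.tolist()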
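
Written as a formula, the helper above rescales each value $v$ linearly, so that the smallest value is drawn with size 300 and the largest with size 900:

```latex
\text{size}(v) = 300 \left( 1 + 2 \, \frac{v - v_{\min}}{v_{\max} - v_{\min}} \right)
```

The expression is undefined when all values are equal, because then $v_{\max} = v_{\min}$.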

## networkx package structure

- networkx.{Graph, DiGraph, MultiGraph, MultiDiGraph}: Basic graph classes
- networkx.algorithms.*: Functions to evaluate and analyse a graph structure
- networkx.classes.function.*: Get graph properties via function calls
- networkx.generators.*: Generate specific types of graphs or random graphs, plus some existing datasets
- networkx.linalg.*: Calculate some derived matrix properties of a graph
- networkx.convert.*: Conversion from/to different Python data types
- networkx.drawing.*: (Basic) layout and plotting functions

## Calling package functions
Almost every function can be called as:
```
nx.function_name(G, additional_arguments)
```
where G is the graph you want to analyse.

## Defining graphs in networkx

### Undirected

In [None]:
G = nx.Graph() 

# Add a node
G.add_node(1) 
G.add_nodes_from([2,3]) # You can also add a list of nodes by passing a list argument

# Add edges 
G.add_edge(1,2)

e = (2,3)
G.add_edge(*e) # * unpacks the tuple
G.add_edges_from([(1,2), (1,3)]) # Just like nodes we can add edges from a list

### Directed

In [None]:
G = nx.DiGraph()

# Defining nodes and edges is the same as in the Graph example:
G.add_nodes_from([1, 2,3])

G.add_edge(1,2)

G.add_edges_from([(1,2), (1,3), (2,3)])

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(G, with_labels=True, arrowsize=30, node_size=1000, width=3)

### With weights

In [None]:
G = nx.Graph()
G.add_nodes_from([1, 2, 3, 4])
G.add_weighted_edges_from([(1, 4, 5.), (2, 3, 0.5), (1, 2, 1.), (3, 4, 3.)])

In [None]:
nx.attr_matrix(G, edge_attr='weight')

In [None]:
labels = nx.get_edge_attributes(G, 'weight')

In [None]:
pos = nx.spring_layout(G)
plt.figure(figsize=(16,12)) 
nx.draw_networkx_nodes(G, pos, node_size=600)
nx.draw_networkx_edges(G, pos, width=list(labels.values()))
nx.draw_networkx_labels(G, pos);
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels, font_size=20);  # show only the weight on each edge

### Accessing graph properties

In [None]:
G.nodes()

In [None]:
G.edges()

## Creating a graph with the conversion functions

In [None]:
edges = pd.read_csv('data/out.moreno_innovation_innovation', sep=' ', names=['from_node', 'to_node'], skiprows=2)

In [None]:
edges

In [None]:
digraph = nx.from_pandas_edgelist(edges,'from_node', 'to_node', create_using=nx.DiGraph)

In [None]:
pd.DataFrame((nx.adjacency_matrix(digraph).todense()))

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_networkx(digraph, with_labels=True)

## Random Graphs

### Erdős–Rényi networks
Each possible edge is included independently with constant probability $p$
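
The sampling rule can be sketched in a few lines of plain Python. This is an illustration of the model, not how networkx implements it; the function name `sample_gnp` and its `seed` parameter are assumptions made for this example.

```python
import random
from itertools import combinations

def sample_gnp(n, p, seed=None):
    """Return the edge list of one random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each of the n * (n - 1) / 2 possible edges is kept independently with probability p
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

edges = sample_gnp(50, 0.1, seed=0)
# The expected number of edges is p * n * (n - 1) / 2 = 122.5 for n = 50, p = 0.1
print(len(edges))
```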

In [None]:
erdos_renyi = nx.random_graphs.erdos_renyi_graph(50, 0.1)

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_networkx(erdos_renyi, with_labels=True, node_size=600, width=2)

### $\log(n)/n$ phase transition

In [None]:
# For each edge probability, estimate over 200 trials how often the graph has exactly one component
probs = np.arange(0.01, 0.05001, 0.0025)
has_one_component = [
    np.mean([nx.number_connected_components(nx.random_graphs.erdos_renyi_graph(100, p)) < 2 for _ in range(200)])
    for p in probs
]

In [None]:
plt.figure(figsize=(16, 12))
plt.plot(probs, has_one_component, lw=4)
plt.xlabel('Edge probability', fontsize=25)
plt.ylabel('Frequency of one component', fontsize=25)
plt.tick_params(labelsize=20)
plt.show()
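
For the 100-node graphs simulated above, the $\log(n)/n$ threshold from the heading can be evaluated directly. This is a small sketch using the natural logarithm; the transition is only sharp in the limit of large $n$, so the simulated curve should be expected to rise gradually around this value rather than jump.

```python
import math

n = 100  # number of nodes used in the experiment above
threshold = math.log(n) / n
print(round(threshold, 4))  # 0.0461
```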

### Growing random networks

### Preferential attachment (Barabási–Albert)

In [None]:
bara_albert = nx.barabasi_albert_graph(50, 2)

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_networkx(bara_albert, with_labels=True, node_size=600, width=2)
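
The idea behind preferential attachment is that each new node connects to `m` existing nodes, chosen with probability proportional to their current degree. The following is a simplified sketch of that growth rule in plain Python, not networkx's implementation; the function name, the starting configuration (`m` isolated nodes) and the `seed` parameter are assumptions made for this example.

```python
import random

def preferential_attachment(n, m, seed=None):
    """Grow a graph on nodes 0..n-1 and return its edge list."""
    rng = random.Random(seed)
    edges = []
    endpoints = []            # each node appears once per incident edge
    targets = list(range(m))  # the first new node connects to all m initial nodes
    for new in range(m, n):
        edges.extend((new, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Drawing from `endpoints` picks a node with probability proportional to its degree
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges

edges = preferential_attachment(50, 2, seed=0)
print(len(edges))  # (n - m) * m = 96
```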

## Graph analysis

In [None]:
positions = nx.spring_layout(bara_albert)

### Edge density
Number of edges in the graph divided by the number of edges in the complete graph on the same vertices
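
As a hand-computed illustration for an undirected graph, the sketch below reuses the edge list of the small weighted example graph defined earlier (weights omitted). With $m$ edges and $n$ nodes, the density is $m / (n(n-1)/2)$.

```python
# Edge list of the weighted example graph defined earlier (weights omitted)
edges = [(1, 4), (2, 3), (1, 2), (3, 4)]
nodes = {node for edge in edges for node in edge}

n, m = len(nodes), len(edges)
max_edges = n * (n - 1) / 2  # number of edges in the complete undirected graph
density = m / max_edges
print(density)  # 4 of 6 possible edges, i.e. two thirds
```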

In [None]:
nx.density(bara_albert)

### Dijkstra paths
Shortest path from one vertex to another vertex, computed with Dijkstra's algorithm
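
To show what the algorithm does, here is a minimal sketch of Dijkstra's algorithm in plain Python, applied to the small weighted example graph defined earlier. The function name `shortest_weighted_path` and the adjacency-dict representation are assumptions made for this example.

```python
import heapq

def shortest_weighted_path(adj, source, target):
    """Return (path, length) of a shortest path for non-negative edge weights."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        raise ValueError("target is not reachable from source")
    # Walk the predecessor links back from the target
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

# Weighted example graph: edges (1, 4, 5.0), (2, 3, 0.5), (1, 2, 1.0), (3, 4, 3.0)
adj = {1: {4: 5.0, 2: 1.0}, 2: {3: 0.5, 1: 1.0}, 3: {2: 0.5, 4: 3.0}, 4: {1: 5.0, 3: 3.0}}
path, length = shortest_weighted_path(adj, 1, 4)
print(path, length)  # [1, 2, 3, 4] 4.5
```

The detour via nodes 2 and 3 (total weight 4.5) beats the direct edge between 1 and 4 (weight 5).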

In [None]:
dij_path = nx.dijkstra_path(bara_albert, source=49, target=20)

In [None]:
dij_path

### Average shortest path length
Average length of the shortest paths over all pairs of vertices

In [None]:
nx.average_shortest_path_length(bara_albert)

### Node degree
Degree centrality: the degree of a node divided by the maximum possible degree $n - 1$

In [None]:
node_degree = nx.degree_centrality(bara_albert)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(bara_albert, positions, with_labels=True, arrowsize=30, node_size=get_node_size(node_degree), 
                 node_color=get_node_size(node_degree))
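
As a hand-computed illustration, the sketch below divides each node's degree by $n - 1$ on a small example graph. The adjacency dict `adj` is hypothetical data chosen for this example.

```python
# Small undirected example graph as an adjacency dict (hypothetical data)
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

n = len(adj)
degree_centrality = {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}
print(degree_centrality)
```

Node 0 is connected to every other node and therefore gets the maximum value of 1.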

### Degree histogram

In [None]:
degrees = nx.degree_histogram(bara_albert)

In [None]:
plt.figure(figsize=(16, 12))
plt.plot(list(range(len(degrees))), np.array(degrees)/len(bara_albert.nodes), lw=4)
plt.xlabel('Degree', fontsize=25)
plt.ylabel('Degree frequency', fontsize=25)
plt.tick_params(labelsize=20)
plt.show()

### Closeness centrality
Inverse of the average shortest-path distance to all other nodes in the graph
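
The definition can be checked by hand on a small unweighted graph. The following sketch assumes a connected graph and uses breadth-first search for the distances; the adjacency dict `adj` and the function names are chosen for this example.

```python
from collections import deque

# Small undirected example graph as an adjacency dict (hypothetical data)
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    dist = bfs_distances(adj, node)
    # Assumes a connected graph, so every other node is reachable
    return (len(adj) - 1) / sum(dist.values())

print({node: closeness(adj, node) for node in adj})
```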

In [None]:
closeness = nx.closeness_centrality(bara_albert)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(bara_albert, positions, with_labels=True, arrowsize=30, node_size=get_node_size(closeness), 
                 node_color=get_node_size(closeness))

### Betweenness centrality
Fraction of shortest paths between pairs of nodes that pass through the node

In [None]:
betweenness = nx.betweenness_centrality(bara_albert, normalized=True, endpoints=True)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(bara_albert, positions, with_labels=True, arrowsize=30, node_size=get_node_size(betweenness), 
                 node_color=get_node_size(betweenness))

### Eigenvector centrality
- Given by the eigenvector belonging to the largest eigenvalue of the adjacency matrix
- Captures the importance of the nodes a node is connected to
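
One way to approximate this eigenvector is power iteration: repeatedly replace each node's score by the sum of its neighbours' scores and renormalise. The sketch below is a plain-Python illustration on a small hypothetical graph, not the method used by `eigenvector_centrality_numpy`. Plain power iteration can fail to converge on bipartite graphs; the example graph contains a triangle, so it is not bipartite.

```python
import math

# Small undirected example graph as an adjacency dict (hypothetical data)
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

def power_iteration(adj, iterations=200):
    x = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Multiply by the adjacency matrix: each node sums its neighbours' scores
        new = {node: sum(x[nb] for nb in adj[node]) for node in adj}
        norm = math.sqrt(sum(value * value for value in new.values()))
        x = {node: value / norm for node, value in new.items()}
    return x

centrality = power_iteration(adj)
print(centrality)
```

Node 0 receives the highest score, and node 3, whose only neighbour is node 0, receives the lowest.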

In [None]:
eigenvec_cen = nx.eigenvector_centrality_numpy(bara_albert)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(bara_albert, positions, with_labels=True, arrowsize=30, node_size=get_node_size(eigenvec_cen), 
                 node_color=get_node_size(eigenvec_cen))

### Clustering
Fraction of pairs of a node's neighbors that have an edge with each other (friends are also friends)
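
A hand-computed sketch of this local clustering coefficient on a small hypothetical graph follows. Returning 0 for nodes with fewer than two neighbours is a convention assumed here.

```python
from itertools import combinations

# Small undirected example graph as an adjacency dict (hypothetical data)
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

def local_clustering(adj, node):
    neighbors = adj[node]
    if len(neighbors) < 2:
        return 0.0  # assumed convention: no neighbour pairs to compare
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for u, v in pairs if v in adj[u])
    return linked / len(pairs)

print({node: local_clustering(adj, node) for node in adj})
```

Node 0 has three neighbour pairs, of which only (1, 2) is connected, so its coefficient is one third.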

In [None]:
clustering = nx.clustering(bara_albert)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(bara_albert, positions, with_labels=True, arrowsize=30, node_size=get_node_size(clustering), 
                 node_color=get_node_size(clustering))

### Minimum spanning tree
Tree with the smallest total edge weight that connects all vertices. Since `bara_albert` has no edge weights, every edge counts with weight 1.
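
Edge weights make the idea easier to see. The sketch below applies Kruskal's algorithm, one standard way to build such a tree, to the small weighted example graph defined earlier. The function name `kruskal` and the union-find helper are written for this example.

```python
def kruskal(nodes, weighted_edges):
    """Return the edges of a minimum spanning tree (or forest)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the representative, shortening the path on the way
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    # Consider edges from lightest to heaviest; keep those that join two components
    for u, v, w in sorted(weighted_edges, key=lambda edge: edge[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

edges = [(1, 4, 5.0), (2, 3, 0.5), (1, 2, 1.0), (3, 4, 3.0)]
tree = kruskal([1, 2, 3, 4], edges)
print(tree)
print(sum(w for _, _, w in tree))  # 4.5
```

The heaviest edge (1, 4) is left out because nodes 1 and 4 are already connected through the lighter edges.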

In [None]:
msp = nx.minimum_spanning_tree(bara_albert)

In [None]:
plt.figure(figsize=(16,12))
nx.draw_networkx(msp, positions, with_labels=True, arrowsize=30, node_size=600, width=2)

## Graph layout

### Circular layout

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_circular(digraph, with_labels=True, node_size=600, width=2)

In [None]:
nx.number_connected_components(digraph.to_undirected())

### Kamada Kawai layout
- Edges are of roughly equal length, with as few edge crossings as possible

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_kamada_kawai(digraph, with_labels=True, node_size=600, width=2)

### Spring layout

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_spring(digraph, with_labels=True, node_size=600, width=2)

### Spectral layout

In [None]:
plt.figure(figsize=(16,12)) 
nx.draw_spectral(digraph, with_labels=True, node_size=2000, width=2)

### Alternative formulation (calculate layout independently)

In [None]:
pos = nx.kamada_kawai_layout(digraph)
plt.figure(figsize=(16,12))
nx.draw_networkx(digraph, pos=pos, with_labels=True, node_size=600, width=2)