# Analyzing Network Data with networkx

For documentation and the developers' tutorial, see the [networkx documentation](https://networkx.github.io/documentation/stable/).

In [None]:
import numpy as np
import networkx as nx

G = nx.Graph() # initialize empty graph

# construct intransitive triad
G.add_node(0) # add a node labeled '0'
G.add_nodes_from([1,2]) # add two more nodes, respectively labeled '1' and '2'
G.add_edges_from([[0, 1], [0, 2]]) # add edges 01 and 02

Let's check that we did, indeed, construct an intransitive triad.

In [None]:
print(G.has_edge(0,1))
print(G.has_edge(0,2))
print(G.has_edge(1,2))
print("{} nodes, {} edges".format(G.number_of_nodes(), G.number_of_edges()))

In [None]:
print(G.nodes)
print(G.edges)

${\tt G.degree[0]}$ outputs the degree of node 0.
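
As a small illustration of how the degree view behaves, the sketch below uses a separate star graph so that ${\tt G}$ is left untouched. The name ${\tt S}$ and the choice of graph are assumptions made only for this example.

```python
import networkx as nx

# star graph: hub node 0 connected to leaves 1, 2, 3
S = nx.star_graph(3)

hub_degree = S.degree[0]      # degree of a single node
degree_map = dict(S.degree)   # the degree view converts to a {node: degree} dict

print(hub_degree)
print(degree_map)
```

The view can also be iterated directly, yielding (node, degree) pairs.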

**Exercise:** Create a numpy array of node degrees, one for each node. Try to do this in just one line (recall list comprehensions). Then output the mean, standard deviation, min, and max of the degrees.

In [None]:
# deleting nodes
G.remove_node(0)
print("{} nodes, {} edges".format(G.number_of_nodes(), G.number_of_edges()))

There are many functions available for generating random graphs. The most basic one generates an Erdős-Rényi graph with a fixed number of nodes and edges:

In [None]:
# random graph with n nodes and m links
n = 100
m = 70
ER = nx.gnm_random_graph(n, m)

print("{} nodes, {} edges".format(ER.number_of_nodes(), ER.number_of_edges()))
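
To give a sense of the other generators, here is a minimal sketch comparing three of them. The parameter values and the ${\tt seed}$ are arbitrary choices for illustration, not values from this tutorial.

```python
import networkx as nx

n = 100

# G(n, m): exactly m edges placed uniformly at random
gnm = nx.gnm_random_graph(n, 70, seed=0)

# G(n, p): each possible edge is included independently with probability p
gnp = nx.gnp_random_graph(n, 0.014, seed=0)

# Barabasi-Albert preferential attachment: each new node attaches to 2 existing nodes
ba = nx.barabasi_albert_graph(n, 2, seed=0)

for name, H in [("gnm", gnm), ("gnp", gnp), ("BA", ba)]:
    print("{}: {} nodes, {} edges".format(name, H.number_of_nodes(), H.number_of_edges()))
```

Only the first generator fixes the number of edges in advance. In the second, the edge count varies from draw to draw.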

## Saving and Loading

A good format for saving network data is adjlist, which stores each node on its own line, followed by its neighbors.

In [None]:
nx.write_adjlist(ER, "test.adjlist")
ER2 = nx.read_adjlist("test.adjlist", nodetype=int) # node labels are read back as strings unless nodetype is given
print("{} nodes, {} edges".format(ER2.number_of_nodes(), ER2.number_of_edges()))

If you have node or edge data, you can use graphml instead, but this produces bigger files. Also, the Gephi visualization program doesn't read adjlist files, so you'll need to export as graphml or edgelist. Edgelist is the most compact file format because it consists only of a list of dyads (edges), but for the same reason it drops all isolates.

In [None]:
nx.write_graphml(ER, "test.graphml")
ER2 = nx.read_graphml("test.graphml", node_type=int) # convert node labels back to integers
print("{} nodes, {} edges".format(ER2.number_of_nodes(), ER2.number_of_edges()))

nx.write_edgelist(ER, "test.edgelist")
ER2 = nx.read_edgelist("test.edgelist", nodetype=int) # isolates are not in the file, so fewer nodes come back
print("{} nodes, {} edges".format(ER2.number_of_nodes(), ER2.number_of_edges()))
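
The difference between the formats is easiest to see on a tiny graph. The sketch below writes to a temporary directory rather than the working directory. The graph ${\tt H}$ and the file names are hypothetical and used only for illustration.

```python
import os
import tempfile
import networkx as nx

# small graph: one edge plus an isolate (node 2)
H = nx.Graph()
H.add_nodes_from([0, 1, 2])
H.add_edge(0, 1)

with tempfile.TemporaryDirectory() as tmp:
    adj_path = os.path.join(tmp, "small.adjlist")
    edge_path = os.path.join(tmp, "small.edgelist")

    nx.write_adjlist(H, adj_path)
    nx.write_edgelist(H, edge_path)

    # nodetype=int converts labels back to integers (the default is strings)
    from_adj = nx.read_adjlist(adj_path, nodetype=int)
    from_edge = nx.read_edgelist(edge_path, nodetype=int)

print(sorted(from_adj.nodes))   # isolate survives the adjlist round trip
print(sorted(from_edge.nodes))  # isolate is lost in the edgelist round trip
```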

Open test.adjlist in a plain-text editor to see what the format looks like.
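
The same lines can also be inspected from Python. This is a minimal sketch on a hypothetical four-node graph ${\tt H}$; the file written by ${\tt write\_adjlist}$ additionally starts with a few comment lines beginning with #.

```python
import networkx as nx

# path graph 0 - 1 - 2 plus an isolated node 3
H = nx.path_graph(3)
H.add_node(3)

# each line is a node followed by its neighbors not already listed on an earlier line
lines = list(nx.generate_adjlist(H))
for line in lines:
    print(line)
```

Because each edge is written only once, a node whose neighbors all appear earlier in the file gets a line containing just its own label, exactly like an isolate.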

## Network Statistics

Functions for outputting some core summary statistics:

In [None]:
G = nx.complete_graph(10)
G.remove_edges_from([[0,1], [1,2], [2,3]])
G.add_nodes_from(range(11,16))

# clustering
print("average clustering: {}".format(nx.average_clustering(G)))

# average degree
import numpy as np
DegreeList = np.array([G.degree[i] for i in G.nodes])
print("average degree: {}".format(DegreeList.mean()))

# giant component
giant_size = len(max(nx.connected_components(G), key=len))
print("frac in giant component: {}".format(giant_size / G.number_of_nodes()))

# isolates
isolates = list(nx.isolates(G))
print("{} isolates in G".format(len(isolates)))

# number of components
print("{} components in G".format(nx.number_connected_components(G)))
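
To make these statistics concrete, the sketch below evaluates them on a graph small enough to check by hand: a triangle plus one isolated node. The graph ${\tt T}$ and the variable names are assumptions for this example only.

```python
import networkx as nx

# triangle on nodes 0, 1, 2 plus one isolated node 3
T = nx.complete_graph(3)
T.add_node(3)

# triangle nodes have clustering 1, the isolate counts as 0: (1 + 1 + 1 + 0) / 4
avg_clustering = nx.average_clustering(T)

# each edge contributes 2 to the sum of degrees
avg_degree = 2 * T.number_of_edges() / T.number_of_nodes()

# the largest component is the triangle: 3 of 4 nodes
giant_frac = len(max(nx.connected_components(T), key=len)) / T.number_of_nodes()

n_isolates = len(list(nx.isolates(T)))
n_components = nx.number_connected_components(T)

print(avg_clustering, avg_degree, giant_frac, n_isolates, n_components)
```

Note that an isolate counts as its own component, so isolates inflate the component count and pull down both the average clustering and the average degree.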

**Exercise:** Import the Facebook data from the file ${\tt facebook.adjlist}$ on Blackboard. Construct a list containing these five statistics for the Facebook graph, and print the list.

## Degree Distribution

Let's plot the degree distribution of an Erdős-Rényi graph on a log-log scale.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import relfreq

# generate graph
n = 5000
p = 5/n
G = nx.fast_gnp_random_graph(n, p)
print(G) # summary statistics (nx.info was removed in networkx 3.0)

**Exercise:** Construct a list named ${\tt degrees}$ containing the degree of each node, excluding isolates.

In [None]:
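# empirical frequency of degrees
# with isolates excluded (min degree 1), numbins = max degree gives unit-width bins centered on 1, 2, ..., max
EmpDist,lowerlimit,binsize,exceptions = relfreq(degrees, numbins=np.max(degrees))
EmpDist = EmpDist[EmpDist > 0] # drop empty bins so the frequencies line up with np.unique(degrees)
print(np.vstack([np.unique(degrees), EmpDist])) # first row lists unique values of degree
                                                # second row lists empirical frequencies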

To take the log of the elements of an array ${\tt x}$, use ${\tt np.log(x)}$. To get the unique elements of an array ${\tt x}$, use ${\tt np.unique(x)}$. 
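
A quick sketch of both functions on a toy array (the values in ${\tt x}$ are made up for illustration). The optional ${\tt return\_counts}$ argument is an alternative way to get frequencies and is not the method used in the cell above.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1, 1])

values = np.unique(x)          # sorted distinct values
log_values = np.log(values)    # natural log, applied elementwise

# return_counts=True also reports how often each distinct value occurs
values2, counts = np.unique(x, return_counts=True)
freqs = counts / counts.sum()  # relative frequencies, summing to 1

print(values)
print(log_values)
print(freqs)
```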

**Exercise:** To plot the degree distribution on a log-log scale, use ${\tt plt.plot(x, y, }$'.'${\tt )}$, where ${\tt x}$ is the log of the degrees and ${\tt y}$ is the log of the empirical frequencies. Construct ${\tt x}$ and ${\tt y}$ and plot them.