<p style="text-align:center">
PSY 394U <b>Data Analytics with Python</b>, Spring 2018


<img style="width: 400px; padding: 0px;" src="https://github.com/sathayas/JupyterAnalyticsSpring2018/blob/master/images/Title_pics.png?raw=true" alt="title pics"/>

</p>

<p style="text-align:center; font-size:40px; margin-bottom: 30px;"><b> Network statistics </b></p>

<p style="text-align:center; font-size:18px; margin-bottom: 32px;"><b>March 29, 2018</b></p>

<hr style="height:5px;border:none" />

# 1. How big is a network?
<hr style="height:1px;border:none" />

## Number of nodes and edges
The size of a network can be summarized by its numbers of nodes and edges, which we call $n$ and $m$. Both can be obtained by applying the **`len()`** function to the results of the **`.nodes()`** and **`.edges()`** methods of a graph. Here are two examples.
  * **`G_karate`**: From **`karate.gml`**, Zachary's karate club network
  * **`G_netsci`**: From **`netscience.gml`**, network science co-authorship network

*Both data sets are available from [Mark Newman](http://www-personal.umich.edu/~mejn/netdata/)*

`<NetSize.py>`

In [2]:
import networkx as nx
import numpy as np

# loading network data
G_karate = nx.read_gml('karate.gml', label='id')  # Karate network
G_netsci = nx.read_gml('netscience.gml')  # network science co-authorship


# Network sizes
print('Network sizes')
print("Zachary's karate network, n:", len(G_karate.nodes()), sep='')
print("Zachary's karate network, m:", len(G_karate.edges()), sep='')

print("Network science co-authorship network, n:",
      len(G_netsci.nodes()), sep='')
print("Network science co-authorship network, m:",
      len(G_netsci.edges()), sep='')


Network sizes
Zachary's karate network, n:34
Zachary's karate network, m:78
Network science co-authorship network, n:1589
Network science co-authorship network, m:2742
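
As a cross-check on the counts above, here is a minimal sketch that needs no data files. It assumes the copy of Zachary's karate club network that ships with `networkx` (`nx.karate_club_graph()`) instead of `karate.gml`, and it uses the graph methods `number_of_nodes()` and `number_of_edges()`, which should give the same values as `len()` on `.nodes()` and `.edges()`.

```python
import networkx as nx

# Assumption: the built-in karate club graph stands in for karate.gml
G = nx.karate_club_graph()

# Counting via len() on the node and edge views
n_len = len(G.nodes())
m_len = len(G.edges())

# Counting via the dedicated graph methods
n = G.number_of_nodes()
m = G.number_of_edges()

print("n (len):", n_len, " n (method):", n)
print("m (len):", m_len, " m (method):", m)
```

Either approach works; the dedicated methods just state the intent a little more directly.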


## Giant component size

In a network data set, there is no guarantee that all nodes are connected as a single network; some nodes may be disconnected from the rest. Thus, in addition to the network size, we can also examine the size of the giant component, that is, the number of nodes in the largest connected component in the data set. `networkx` has no single function that returns the giant component size directly. We will use the function **`connected_components`**, which returns a generator of the connected components of the network, each given as a set of nodes. (Older versions of `networkx` also provided **`connected_component_subgraphs`**, but it was removed in version 2.4.) A generator provides a sequence of items you can use in a `for` loop. We first build a list of connected component sizes, then take the maximum as the giant component size.

In [3]:
# Giant component sizes
print('Giant component sizes')
# each connected component is returned as a set of nodes
listCC_karate = [len(c) for c in nx.connected_components(G_karate)]
listCC_netsci = [len(c) for c in nx.connected_components(G_netsci)]
print("Zachary's karate network, GC:", max(listCC_karate), sep='')
print("Network science co-authorship network, GC:",
      max(listCC_netsci), sep='')

Giant component sizes
Zachary's karate network, GC:34
Network science co-authorship network, GC:379


As you can see, the giant component of the karate network includes all 34 nodes, whereas the giant component of the network science network includes only 379 of its 1589 nodes (about 24%).
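
To make the idea of a giant component concrete, here is a small sketch on a hypothetical toy graph (`G_toy` and its node labels are made up for illustration and are not part of the course data). It lists the component sizes, extracts the largest component as its own graph, and computes the fraction of nodes it contains.

```python
import networkx as nx

# Hypothetical toy graph: a 4-node path, a triangle, and one isolated node
G_toy = nx.Graph()
G_toy.add_edges_from([(1, 2), (2, 3), (3, 4)])   # component with 4 nodes
G_toy.add_edges_from([(5, 6), (6, 7), (7, 5)])   # component with 3 nodes
G_toy.add_node(8)                                # isolated node

# connected_components yields one set of nodes per component
components = sorted(nx.connected_components(G_toy), key=len, reverse=True)
sizes = [len(c) for c in components]

# The giant component is the largest set; copy it out as a separate graph
gc_nodes = components[0]
G_gc = G_toy.subgraph(gc_nodes).copy()

# Fraction of all nodes that belong to the giant component
gc_fraction = len(gc_nodes) / G_toy.number_of_nodes()

print("Component sizes:", sizes)
print("Giant component size:", G_gc.number_of_nodes())
print("Fraction of nodes in giant component:", gc_fraction)
```

Extracting the giant component as a separate graph is useful because some later metrics, such as path lengths, are only defined on a connected network.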

### Exercise
The following network data sets are available for you:

* Metrics
   * Size metrics (nodes, edges)
   * Giant component
   * Connectivity metrics (degree, assortativity)
      * Scale-free
   * Distance metrics (path lengths, diameter)
   * Clustering metrics (clustering coefficient)
   * Small-world
* Random deletion vs targeted attack
