# Task 1
## Basic graph statistics
### Part 1: Tutorial

First, we need to import the SNAP library (`snap`) and matplotlib's pyplot:


In [None]:
import snap
from matplotlib import pyplot as plt
%matplotlib inline

Let's create a random graph with 100 vertices and 1000 edges:

In [None]:
g = snap.GenRndGnm(snap.PUNGraph, 100, 1000)

#### Degree distribution:

Now, let's compute the degree distribution as (degree, node count) pairs:

In [None]:
count_vec = snap.TIntPrV()
snap.GetOutDegCnt(g, count_vec)

And print the distribution:

In [None]:
for p in count_vec:
    print("degree %d: count %d" % (p.GetVal1(), p.GetVal2()))

Let's plot it:

In [None]:
x = [int(p.GetVal1()) for p in count_vec]  # degrees
y = [int(p.GetVal2()) for p in count_vec]  # number of nodes with that degree

plt.bar(x, y)
plt.xlabel("degree")
plt.ylabel("number of nodes")
plt.show()
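
To make the meaning of the (degree, count) pairs concrete, here is a minimal pure-Python sketch that builds the same kind of pairs from a small edge list. The helper `degree_counts` and the toy graph are hypothetical and not part of SNAP; nodes without any edge are not covered by this sketch.

```python
from collections import Counter

def degree_counts(edges):
    """Return sorted (degree, number of nodes) pairs for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        # every undirected edge adds one to the degree of both endpoints
        degree[u] += 1
        degree[v] += 1
    # count how many nodes share each degree value
    counts = Counter(degree.values())
    return sorted(counts.items())

# toy graph: a triangle 0-1-2 with an extra node 3 attached to node 0
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
pairs = degree_counts(toy_edges)
for deg, cnt in pairs:
    print("degree %d: count %d" % (deg, cnt))
```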

#### Clustering coefficient:

Now, let's calculate the local clustering coefficient as a function of node degree for this graph.

First, declare a new vector of float pairs that will hold the (degree, clustering coefficient) results:

In [None]:
clustering_vec = snap.TFltPrV()

Now calculate the average clustering coefficient; the same call also fills `clustering_vec` with the distribution itself:

In [None]:
avg_cf = snap.GetClustCf(g, clustering_vec)

Let's print it out:

In [None]:
print("Average clustering coefficient: %f" % avg_cf)
print("")
for pair in clustering_vec:
    print("degree: %d, clustering coefficient: %f" % (pair.GetVal1(), pair.GetVal2()))

Let's plot the degree-clustering coefficient distribution:

In [None]:
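x = [int(p.GetVal1()) for p in clustering_vec]  # degrees
y = [p.GetVal2() for p in clustering_vec]       # clustering coefficient per degree

plt.bar(x, y)
plt.xlabel("degree")
plt.ylabel("clustering coefficient")
plt.show()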
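
For intuition, the local clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected. The following is a minimal pure-Python sketch of that definition on a toy adjacency dictionary; `local_clustering` and `toy_adj` are hypothetical names, and treating nodes with fewer than two neighbours as having coefficient 0 is an assumption of this sketch.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are connected to each other."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: fewer than two neighbours gives 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# toy graph: a triangle 0-1-2 with an extra node 3 attached to node 0
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
coeffs = {n: local_clustering(toy_adj, n) for n in toy_adj}
avg = sum(coeffs.values()) / len(coeffs)
print("Average clustering coefficient: %f" % avg)
```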

#### Degree centrality:

The degree distribution above is already built from the node degrees, which are the unnormalized degree centrality. SNAP can also compute the degree centrality of a single node, normalized into [0, 1]. Let's calculate it for every node and plot it:

In [None]:
x = []
y = []
for node in g.Nodes():
    deg_centrality = snap.GetDegreeCentr(g, node.GetId())
    print("node: %d centrality: %f" % (node.GetId(), deg_centrality))
    x.append(node.GetId())
    y.append(deg_centrality)

plt.bar(x, y)
plt.show()
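
As a rough illustration of the normalization, the sketch below divides each node's degree by the largest possible degree, n - 1. This is the usual textbook normalization and is assumed here; `degree_centrality` and `toy_adj` are hypothetical names used only for this example.

```python
def degree_centrality(adj, node):
    """Degree of `node` divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n <= 1:
        return 0.0
    return len(adj[node]) / float(n - 1)

# toy graph: a triangle 0-1-2 with an extra node 3 attached to node 0
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
centrality = {n: degree_centrality(toy_adj, n) for n in toy_adj}
for n in sorted(centrality):
    print("node: %d centrality: %f" % (n, centrality[n]))
```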

#### Closeness and farness centrality:

We calculate these centralities for every node in our random graph and then plot them:

In [None]:
x = []
yc = []
yf = []

for node in g.Nodes():
    farness = snap.GetFarnessCentr(g, node.GetId())
    closeness = snap.GetClosenessCentr(g, node.GetId())
    print("node: %d closeness: %f farness: %f" % (node.GetId(), closeness, farness))
    x.append(node.GetId())
    yc.append(closeness)
    yf.append(farness)
    
plt.bar(x, yc)
plt.show()
plt.bar(x, yf)
plt.show()
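
To see where these two numbers come from, here is a minimal pure-Python sketch based on breadth-first search. It assumes one common definition: farness is the mean shortest-path distance from a node to all other reachable nodes, and closeness is its reciprocal. SNAP's exact normalization may differ, and all names below are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def farness(adj, node):
    """Mean distance from `node` to the other reachable nodes (assumed definition)."""
    dist = bfs_distances(adj, node)
    others = len(dist) - 1
    return sum(dist.values()) / float(others) if others > 0 else 0.0

def closeness(adj, node):
    """Reciprocal of farness; 0 for an isolated node."""
    far = farness(adj, node)
    return 1.0 / far if far > 0 else 0.0

# toy graph: a path 0 - 1 - 2 - 3
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
far0, close0 = farness(path_adj, 0), closeness(path_adj, 0)
far1, close1 = farness(path_adj, 1), closeness(path_adj, 1)
print("node 0 closeness: %f farness: %f" % (close0, far0))
print("node 1 closeness: %f farness: %f" % (close1, far1))
```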

#### Betweenness centrality:

Unlike the previous measures, the betweenness centrality is not requested node by node: a single call computes it for all vertices at once.

The SNAP function calculates the centralities for both nodes and edges.

Let's create the hash tables that will receive the node and edge centralities:

In [None]:
nodes_btw_centr = snap.TIntFltH()
edges_btw_centr = snap.TIntPrFltH()

Now, let's calculate the betweenness centrality (this fills both hash tables):

In [None]:
snap.GetBetweennessCentr(g, nodes_btw_centr, edges_btw_centr, 1.0)

The last parameter tells SNAP how much to approximate the value: 1.0 gives the exact value, while smaller numbers trade accuracy for speed.

Let's print out and plot the centrality:

In [None]:
x = [node for node in nodes_btw_centr]
y = [nodes_btw_centr[node] for node in nodes_btw_centr]

for i in range(len(x)):
    print('Node %d centrality %f' % (x[i], y[i]))

plt.bar(x, y)
plt.show()
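
For reference, node betweenness counts how many shortest paths between other node pairs pass through a node. Below is a minimal pure-Python sketch of Brandes' algorithm for unweighted, undirected graphs. It is an illustration and not SNAP's implementation; the function name, the toy graph and the choice to count each unordered pair once are assumptions, so the scale of the values may differ from what SNAP reports.

```python
from collections import deque

def betweenness(adj):
    """Exact node betweenness (Brandes' algorithm), each unordered pair counted once."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # walk back from the farthest nodes and accumulate dependencies
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / float(sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # in an undirected graph every pair was visited from both endpoints
    return {v: c / 2.0 for v, c in bc.items()}

# toy graph: a path 0 - 1 - 2 - 3
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
btw = betweenness(path_adj)
for n in sorted(btw):
    print('Node %d centrality %f' % (n, btw[n]))
```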

#### Eigenvector centrality:

The calculation of the eigenvector centrality is very similar to that of the betweenness centrality:

In [None]:
nodes_eig_centr = snap.TIntFltH()
snap.GetEigenVectorCentr(g, nodes_eig_centr)

Let's print it out and plot:

In [None]:
x = [node for node in nodes_eig_centr]
y = [nodes_eig_centr[node] for node in nodes_eig_centr]

for i in range(len(x)):
    print('Node %d centrality %f' % (x[i], y[i]))

plt.bar(x, y)
plt.show()
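
The idea behind eigenvector centrality is that a node is important if its neighbours are important. A minimal sketch of this idea is power iteration: repeatedly replace every score by the sum of the neighbours' scores and rescale. The code below is pure Python with hypothetical names; the unit-length scaling and the fixed iteration count are assumptions, and SNAP may scale its values differently.

```python
import math

def eigenvector_centrality(adj, iterations=100):
    """Power iteration on the adjacency structure, scaled to unit Euclidean length."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # each node collects the current scores of its neighbours
        new = {v: sum(score[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        if norm == 0:
            return score  # graph without edges: nothing to propagate
        score = {v: x / norm for v, x in new.items()}
    return score

# toy graph: a triangle 0-1-2 with an extra node 3 attached to node 0
# (not bipartite, so the iteration settles on a single vector)
toy_adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
eig = eigenvector_centrality(toy_adj)
for n in sorted(eig):
    print('Node %d centrality %f' % (n, eig[n]))
```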

### Part 2: Task

For the graphs web-Stanford and roadNet-PA:

* calculate the average clustering coefficient (*),
* calculate the degree distribution and plot it,
* calculate all the centrality measures mentioned above (except betweenness centrality) for every 1000th node (Stanford) or every 10000th node (PA), then print and plot them.

(*) For the clustering coefficient, use the SNAP function:

```avg_coeff = snap.GetClustCf(graph)```

We load the graphs for you below:

In [None]:
g1 = snap.LoadEdgeList(snap.PUNGraph, "./data/web-Stanford.txt", 0, 1)
g2 = snap.LoadEdgeList(snap.PUNGraph, "./data/roadNet-PA.txt", 0, 1)
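
One possible reading of "every 1000th node" is to take every k-th node in iteration order. A minimal sketch of that selection, using a hypothetical helper `every_kth` (not part of SNAP) and a plain range as a stand-in for real node ids:

```python
from itertools import islice

def every_kth(items, k):
    """Return every k-th element of an iterable, starting with the first one."""
    return list(islice(items, 0, None, k))

node_ids = range(25)              # stand-in for the node ids of a loaded graph
sample = every_kth(node_ids, 10)  # keeps positions 0, 10, 20
print(sample)
```

With a loaded graph, the same helper could be fed the ids produced by `node.GetId()` for each `node` in `g1.Nodes()`.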