## scratch

1. Read .gml file
2. Degree histogram
3. Density
4. Degree centrality
5. Closeness centrality
6. Betweenness centrality
7. Degree assortativity coefficient
8. Degree Pearson correlation coefficient
9. Clustering coefficient
10. Average node connectivity

- MultiDiGraph to MultiGraph
    - directed = "graph"
    - undirected = "ugraph"

In [1]:
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import os
from glob import glob

gml_files = glob('../output/network/*/*.gml')

# graph = nx.read_gml('../data/graph/article1.gml')
# print(len(gml_files))
# gml_files
# gml_files[0]

In [None]:
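def calculate_graph_inf(graph, name):
    """Print a short summary (name, type, node count, edge count) of a graph."""
    graph.name = name
    # nx.info() was removed in NetworkX 3.0, so the summary is built by hand
    print('Name:', graph.name)
    print('Type:', type(graph).__name__)
    print('Number of nodes:', graph.number_of_nodes())
    print('Number of edges:', graph.number_of_edges())
    
    ## plot spring layout
    # plt.figure(figsize=(11,11))
    # nx.draw_spring(graph, arrows=True, with_labels=True)

In [None]:
# only the first two files are processed (see the break below);
# `graph` and `ugraph` from the last iteration are reused by the cells that follow
for graph_num, gml_graph in enumerate(gml_files):
    graph = nx.read_gml(gml_graph)
    ugraph = graph.to_undirected() # to undirected
    (filepath, filename) = os.path.split(gml_graph)
    print('-' * 40)
    print(gml_graph)
    calculate_graph_inf(graph, filename)
    calculate_graph_inf(ugraph, filename)
    if graph_num == 1:
        break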

- - -

### Undirected network

In [None]:
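# undirected version that keeps only reciprocal edges (present in both directions);
# edges that run in one direction only are dropped by reciprocal=True

ugraph = graph.to_undirected(reciprocal=True)
print("nodes =", ugraph.number_of_nodes(), "edges =", ugraph.number_of_edges())
print(list(ugraph.edges()))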
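
To see which edges a `reciprocal=True` conversion drops, one option is to compare it with the default conversion. The sketch below uses a hypothetical three-node `DiGraph` named `toy` instead of the GML data, and `edge_set` is an illustrative helper, not part of NetworkX.

```python
import networkx as nx

# Hypothetical toy graph: 0 <-> 1 is reciprocal, 1 -> 2 is one-way.
toy = nx.DiGraph([(0, 1), (1, 0), (1, 2)])

plain = toy.to_undirected()                  # keeps every edge, ignoring direction
mutual = toy.to_undirected(reciprocal=True)  # keeps only edges present in both directions

def edge_set(g):
    """Return the edges of g as a set of order-independent node pairs."""
    return {frozenset(e) for e in g.edges()}

# Edges that survive the default conversion but not the reciprocal one.
dropped = edge_set(plain) - edge_set(mutual)
print(sorted(tuple(sorted(e)) for e in dropped))  # [(1, 2)]
```

Both conversions keep all three nodes; only the edge list differs.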

- - -

### Degree histogram
`nx.degree_histogram` returns a list in which the value at index `d` is the number of nodes with degree `d`.
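
A small sketch of how to read that list, using a hypothetical four-node path graph (`toy`) rather than the GML data:

```python
import networkx as nx

# Path 0 - 1 - 2 - 3: the two end nodes have degree 1, the two inner nodes degree 2.
toy = nx.path_graph(4)

hist = nx.degree_histogram(toy)
print(hist)  # [0, 2, 2]

# Index = degree, value = number of nodes with that degree.
for degree, count in enumerate(hist):
    print("degree", degree, "->", count, "node(s)")
```

The list always starts at degree 0, so its length is the maximum degree plus one.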

In [None]:
# returns a list of frequencies of degrees
print("undirected graph =", nx.degree_histogram(ugraph))
print("directed graph =", nx.degree_histogram(graph))

In [None]:
# only for undirected type

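# dict(...) works whether degree() returns a dict (NetworkX 1.x) or a DegreeView (2.x and later)
degree_sequence = sorted(dict(ugraph.degree()).values(), reverse=True) # degree sequence
# print("Degree sequence", degree_sequence)
dmax = max(degree_sequence)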

plt.loglog(degree_sequence,'b-',marker='o')
plt.title("Degree rank plot")
plt.ylabel("degree")
plt.xlabel("rank")

# draw graph in inset
plt.axes([0.45,0.45,0.45,0.45])
# largest connected component (connected_component_subgraphs was removed in NetworkX 2.4)
Gcc = ugraph.subgraph(max(nx.connected_components(ugraph), key=len))
pos=nx.spring_layout(Gcc)
plt.axis('off')
nx.draw_networkx_nodes(Gcc,pos,node_size=20)
nx.draw_networkx_edges(Gcc,pos,alpha=0.4)

plt.show()

### Density
Notes: The density is 0 for a graph without edges and 1 for a complete graph. The density of multigraphs can be higher than 1. Self loops are counted in the total number of edges so graphs with self loops can have density higher than 1.
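
As a quick check of these notes: for an undirected graph with n nodes and m edges, `nx.density` computes 2m / (n(n-1)). The graphs below are hypothetical toy inputs, not the GML data.

```python
import networkx as nx

path = nx.path_graph(4)          # 4 nodes, 3 edges
complete = nx.complete_graph(4)  # 4 nodes, 6 edges

print(nx.density(path))      # 2*3 / (4*3) = 0.5
print(nx.density(complete))  # 2*6 / (4*3) = 1.0

# Parallel edges all count towards m, so a multigraph can exceed 1.
multi = nx.MultiGraph()
multi.add_edges_from([(0, 1), (0, 1), (0, 1)])
print(nx.density(multi))     # 2*3 / (2*1) = 3.0
```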

In [None]:
print "undirected graph =", nx.density(ugraph)
print "directed graph =", nx.density(graph)

### Degree centrality
The degree centrality of a node v is the fraction of the other nodes it is connected to.
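
A minimal illustration on a hypothetical star graph (one hub, three leaves), where the values are easy to verify by hand:

```python
import networkx as nx

star = nx.star_graph(3)  # hub 0 connected to leaves 1, 2, 3

dc = nx.degree_centrality(star)
# Hub: 3 neighbours out of 3 other nodes; each leaf: 1 out of 3.
print(round(dc[0], 3), round(dc[1], 3))  # 1.0 0.333

mean_dc = sum(dc.values()) / len(dc)
print(round(mean_dc, 3))  # (1 + 3 * 1/3) / 4 = 0.5
```

Averaging over nodes, as done here, hides how unevenly the scores are distributed, so it can help to inspect the per-node values as well.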

In [None]:
# get all the values of the dictionary (the per-node centrality scores),
# turn them into a numpy array via list() (in Python 3, .values() is a view, not a list),
# and take the mean of the numpy array

print("Mean degree centrality (directed) =", np.array(list(nx.degree_centrality(graph).values())).mean())
print("Mean degree centrality (undirected) =", np.array(list(nx.degree_centrality(ugraph).values())).mean())

### Closeness centrality
The closeness centrality of a node u is the reciprocal of the sum of the shortest-path distances from u to all n-1 other nodes. Because that sum depends on the number of nodes in the graph, closeness is normalized by the sum of the minimum possible distances, n-1.
Higher values of closeness indicate higher centrality.
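
The definition can be checked by hand on a hypothetical three-node path, where n-1 = 2:

```python
import networkx as nx

path = nx.path_graph(3)  # 0 - 1 - 2
closeness = nx.closeness_centrality(path)

# Node 1 is one step from both other nodes:  (n-1) / (1 + 1) = 2 / 2 = 1.0
# Node 0 is one and two steps away:          (n-1) / (1 + 2) = 2 / 3
for node, score in sorted(closeness.items()):
    print(node, round(score, 3))
```

The middle node gets the highest score because it is, on average, nearest to every other node.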

In [None]:
# clo_cen = np.array(nx.closeness_centrality(graph).values()).mean()
# nx.closeness_centrality(graph)
# print "Closeness centrality (directed) =", np.array(nx.closeness_centrality(graph).values()).mean()
# print "Closeness centrality (undirected) =", np.array(nx.closeness_centrality(ugraph).values()).mean()

a = nx.closeness_centrality(graph)
dfIn=pd.DataFrame.from_dict(a,orient='index')
dfIn.columns = ['closeness centrality']
dfIn = dfIn.sort_values(by=['closeness centrality'])
dfIn

### Betweenness centrality
The betweenness centrality of a node v is the sum, over all pairs of nodes, of the fraction of shortest paths between the pair that pass through v.
The cell below computes shortest-path betweenness centrality for every node.
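
A worked example on a hypothetical four-node path shows both the raw pair counts and the normalized values that `nx.betweenness_centrality` returns by default:

```python
import networkx as nx

path = nx.path_graph(4)  # 0 - 1 - 2 - 3

raw = nx.betweenness_centrality(path, normalized=False)
scaled = nx.betweenness_centrality(path)  # normalized=True is the default

# Node 1 lies on the only shortest path for the pairs (0, 2) and (0, 3): raw score 2.
# For an undirected graph the normalization divides by (n-1)(n-2)/2 = 3.
print(raw[1], round(scaled[1], 3))  # 2.0 0.667
print(raw[0], scaled[0])            # 0.0 0.0 (end nodes lie on no path between others)
```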

In [None]:
# nx.betweenness_centrality(graph)
# bet_cen = np.array(nx.betweenness_centrality(graph).values()).mean()

# print "Betweenness centrality (directed) =", nx.betweenness_centrality(graph)
print("Mean betweenness centrality (directed) =", np.array(list(nx.betweenness_centrality(graph).values())).mean())
print("Mean betweenness centrality (undirected) =", np.array(list(nx.betweenness_centrality(ugraph).values())).mean())

a = nx.betweenness_centrality(graph)
dfIn=pd.DataFrame.from_dict(a,orient='index')
dfIn.columns = ['betweenness centrality']
dfIn = dfIn.sort_values(by=['betweenness centrality'])
dfIn

### Current-flow betweenness centrality
Current-flow betweenness centrality uses an electrical current model for information spreading, in contrast to betweenness centrality, which uses shortest paths. It is also known as random-walk betweenness centrality.

In [None]:
# run for largest component
# graph must be connected
# print nx.current_flow_betweenness_centrality(graph)
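
Since the graph must be connected, one possible approach is to reduce the network to its largest connected component first. The sketch below is an assumption about how that could be done, not the notebook's method: both helper names are hypothetical, parallel edges are collapsed with `nx.Graph(...)`, and `nx.current_flow_betweenness_centrality` needs SciPy to be installed.

```python
import networkx as nx

def largest_component_simple(g):
    """Return the largest connected component of g as a simple undirected Graph."""
    simple = nx.Graph(g)  # drops edge direction and collapses parallel edges
    biggest = max(nx.connected_components(simple), key=len)
    return simple.subgraph(biggest).copy()

def current_flow_on_largest_component(g):
    """Current-flow betweenness restricted to the largest component (requires SciPy)."""
    return nx.current_flow_betweenness_centrality(largest_component_simple(g))

# Hypothetical toy input: component {0, 1, 2} with a duplicated edge, and a separate pair {3, 4}.
toy = nx.MultiDiGraph([(0, 1), (0, 1), (1, 2), (3, 4)])
core = largest_component_simple(toy)
print(sorted(core.nodes()), core.number_of_edges())  # [0, 1, 2] 2
```

Calling `current_flow_on_largest_component(graph)` would return a dict keyed by node, covering only the nodes of the largest component.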

### Degree assortativity coefficient

In [None]:
#deg_ac = nx.degree_assortativity_coefficient(graph)
print("Degree assortativity coefficient (directed) =", nx.degree_assortativity_coefficient(graph))
print("Degree assortativity coefficient (undirected) =", nx.degree_assortativity_coefficient(ugraph))
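
For orientation on how to read the coefficient: it ranges from -1 to 1, and negative values mean that high-degree nodes tend to attach to low-degree nodes. A hypothetical star graph is the extreme case, because every edge joins the hub (degree 3) to a leaf (degree 1):

```python
import networkx as nx

star = nx.star_graph(3)  # hub 0 joined to leaves 1, 2, 3

r = nx.degree_assortativity_coefficient(star)
print(round(r, 6))  # -1.0, perfectly disassortative
```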

### Clustering coefficient

In [None]:
# (cannot be multigraph)
# nx.average_clustering(ugraph)
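
Because clustering cannot be computed on a multigraph, one workaround (an assumption here, not something the notebook does) is to collapse parallel edges into a simple `Graph` first. The toy multigraph below is hypothetical.

```python
import networkx as nx

# Triangle 0-1-2 with a duplicated 0-1 edge, plus a pendant node 3 attached to node 2.
multi = nx.MultiGraph([(0, 1), (0, 1), (1, 2), (0, 2), (2, 3)])

simple = nx.Graph(multi)  # the two parallel 0-1 edges collapse into one
avg = nx.average_clustering(simple)

# Per-node clustering: nodes 0 and 1 -> 1, node 2 -> 1/3, node 3 -> 0; mean = 7/12.
print(simple.number_of_edges(), round(avg, 4))  # 4 0.5833
```

Collapsing parallel edges discards edge multiplicity, so the result describes the simplified topology only.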

### Average node connectivity
The average connectivity $\bar{\kappa}$ of a graph G is the average of local node connectivity over all pairs of nodes of G.
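
Two hypothetical toy graphs make the definition concrete, since every pair of nodes in each has the same local connectivity:

```python
import networkx as nx

path = nx.path_graph(3)    # every pair is joined by exactly one node-independent path
cycle = nx.cycle_graph(4)  # every pair is joined by two (one in each direction round the cycle)

avg_path = nx.average_node_connectivity(path)
avg_cycle = nx.average_node_connectivity(cycle)
print(avg_path, avg_cycle)  # 1.0 2.0
```

The function solves a flow problem for each pair of nodes, so it can be slow on larger graphs.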

In [None]:
# nx.edge_connectivity(graph)
# nx.node_connectivity(graph)

# avg_node_con = nx.average_node_connectivity(graph)
print("Average node connectivity (directed) =", nx.average_node_connectivity(graph))
print("Average node connectivity (undirected) =", nx.average_node_connectivity(ugraph))

In [None]:
# intersection_all()
# return a new graph that contains only the edges that exist in all graphs
# all supplied graphs must have the same node set
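
A minimal sketch of `nx.intersection_all`, assuming two hypothetical graphs `g1` and `g2` built over the same node set; in the notebook the inputs would be the graphs read from the GML files.

```python
import networkx as nx

# Two hypothetical graphs over the same node set {0, 1, 2, 3}.
g1 = nx.Graph([(0, 1), (1, 2), (2, 3)])
g2 = nx.Graph([(0, 1), (2, 3), (0, 3)])

common = nx.intersection_all([g1, g2])

# Normalize edge orientation before printing so the output is order-independent.
shared = sorted(tuple(sorted(e)) for e in common.edges())
print(shared)  # [(0, 1), (2, 3)]
```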