## Notebook Content

This notebook does two things:
1. Merges the 61 daily graphs into a single graph (graph_61).
2. Builds a weighted edge-list DataFrame for graph_61 (graph_weighted_edgeList_61.pkl).

In [48]:
import pickle
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from dataprep.eda import create_report

## 1. Merge all the daily graphs

In [49]:
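# Load the 61 pickled daily graphs (Graph_0 ... Graph_60)
graphs = []
for i in range(61):
    g = pd.read_pickle('day_graphs/Graph_%s' % i)
    graphs.append(g)
# Take the union of all nodes and edges across the daily graphs
G = nx.compose_all(graphs)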

In [50]:
print("The graph G has ", len(G.nodes), " nodes and ", len(G.edges), " edges.")

The graph G has  24564  nodes and  236790  edges.
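
To make the merge semantics concrete, here is a minimal sketch on two made-up graphs (`day_a` and `day_b` are illustrative names, not files from this project). `nx.compose_all` takes the union of nodes and edges, so an edge that occurs on several days is stored only once in the merged graph. That is why the per-edge counts are computed separately in section 2.

```python
import networkx as nx

# Two hypothetical "daily" graphs that share the edge (0, 1).
day_a = nx.Graph()
day_a.add_edges_from([(0, 1), (1, 2)])

day_b = nx.Graph()
day_b.add_edges_from([(0, 1), (2, 3)])

# compose_all returns the union of all nodes and all edges.
merged = nx.compose_all([day_a, day_b])

print(sorted(merged.nodes))       # [0, 1, 2, 3]
print(merged.number_of_edges())   # 3: the shared edge (0, 1) is kept once
```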


In [51]:
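# Save the merged graph as a pickle (nx.write_gpickle is unavailable from NetworkX 3.0 onward)
with open('data/graph_61', 'wb') as f:
    pickle.dump(G, f, pickle.HIGHEST_PROTOCOL)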

## 2. Create the dataframe edge list

##### Compute the weight of each edge in the final graph as the number of daily graphs in which that edge appears

First, create the edge list of Graph_0 to serve as the initial edge list.

In [52]:
accepted_nodes = list(G.nodes)

In [53]:
g_0 = pd.read_pickle('day_graphs/Graph_0')
edgeList_0 = nx.to_pandas_edgelist(g_0, source='source_node', target='target_node')
edgeList_0['weight'] = 1
edgeList_0

| | source_node | target_node | weight |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 1 | 0 | 2 | 1 |
| 2 | 3 | 4 | 1 |
| 3 | 3 | 5 | 1 |
| 4 | 6 | 7 | 1 |
| 5 | 8 | 9 | 1 |
| 6 | 8 | 10 | 1 |


Then, for each subsequent daily graph, increment the weight of every edge that has already appeared, and add a new row for every edge that has not appeared before.
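
As a toy illustration of this update rule before it is applied to the real data, the sketch below uses two invented single-day edge lists (`day_0` and `day_1` are hypothetical). Stacking them and summing `weight` per `(source_node, target_node)` pair increments repeated edges and keeps new edges at weight 1.

```python
import pandas as pd

# Hypothetical edge lists for two days; every observed edge starts with weight 1.
day_0 = pd.DataFrame({'source_node': [0, 0], 'target_node': [1, 2], 'weight': [1, 1]})
day_1 = pd.DataFrame({'source_node': [0, 3], 'target_node': [1, 4], 'weight': [1, 1]})

# Stack both days, then sum the weights of identical (source, target) pairs.
combined = (
    pd.concat([day_0, day_1])
    .groupby(['source_node', 'target_node'])
    .sum()
    .reset_index()
)
print(combined)
```

Here the edge (0, 1) appears on both days and ends up with weight 2, while (0, 2) and (3, 4) keep weight 1.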

In [54]:
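# Collect one edge list per day, each edge carrying weight 1
edge_lists = [edgeList_0]
for i in range(1, 61):
    g_i = pd.read_pickle('day_graphs/Graph_%s' % i)
    edgeList_i = nx.to_pandas_edgelist(g_i, source='source_node', target='target_node')
    edgeList_i['weight'] = 1
    edge_lists.append(edgeList_i)
# Sum the weights of identical (source, target) pairs in a single pass
edgeList = pd.concat(edge_lists).groupby(['source_node', 'target_node']).sum().reset_index()
edgeList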

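| | source_node | target_node | weight |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 1 | 0 | 2 | 1 |
| 2 | 1 | 20 | 1 |
| 3 | 1 | 27 | 2 |
| 4 | 1 | 34 | 1 |
| ... | ... | ... | ... |
| 255593 | 24540 | 3321 | 1 |
| 255594 | 24543 | 3028 | 1 |
| 255595 | 24543 | 8477 | 1 |
| 255596 | 24549 | 4787 | 1 |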
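
The last index shown above is 255596, which suggests 255,597 rows, whereas G reported 236,790 edges. One possible explanation, assuming the daily graphs are undirected `nx.Graph` objects, is that the same edge is listed as (u, v) on one day and as (v, u) on another, so the two orientations are counted as separate rows. This has not been verified against the data. If it holds, a sketch like the following could merge the two orientations; `canonicalize_undirected` is a hypothetical helper, and it assumes node labels are comparable integers as in the table above. It should not be applied if the graphs are directed.

```python
import numpy as np
import pandas as pd

def canonicalize_undirected(edge_list):
    """Return a copy with source_node <= target_node on every row, weights re-summed."""
    out = edge_list.copy()
    lo = np.minimum(out['source_node'], out['target_node'])
    hi = np.maximum(out['source_node'], out['target_node'])
    out['source_node'] = lo
    out['target_node'] = hi
    # Rows that now describe the same undirected edge are merged by summing weights.
    return out.groupby(['source_node', 'target_node'], as_index=False)['weight'].sum()

# Toy data: (1, 27) and (27, 1) describe the same undirected edge.
toy = pd.DataFrame({'source_node': [1, 27, 5], 'target_node': [27, 1, 6], 'weight': [2, 1, 1]})
canonical = canonicalize_undirected(toy)
print(canonical)
```

On the notebook's data the call would be `canonicalize_undirected(edgeList)`, after which `len(...)` could be compared with `len(G.edges)`.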


In [55]:
edgeList.to_pickle('data/graph_weighted_edgeList_61.pkl')

In [56]:
source = list(edgeList['source_node'])
target = list(edgeList['target_node'])
nodes_in_edgeList = set(source+target)
print(f"The edge list contains {len(nodes_in_edgeList)} unique nodes across its source and target columns.")

The edge list contains 24564 unique nodes across its source and target columns.


In [57]:
graph_nodes = list(G.nodes)
print("The graph has ", len(graph_nodes), " unique nodes")

The graph has  24564  unique nodes
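
Equal counts do not by themselves prove that the two node sets are identical. A slightly stronger check, sketched below with a hypothetical helper `same_node_set`, compares the sets directly. On the notebook's data it would be called as `same_node_set(edgeList, G.nodes)`.

```python
import pandas as pd

def same_node_set(edge_list, graph_nodes):
    """True if the nodes used in the edge list are exactly the given graph nodes."""
    in_edges = set(edge_list['source_node']) | set(edge_list['target_node'])
    return in_edges == set(graph_nodes)

# Toy data: both candidate node lists have four nodes, but only the first matches.
toy_edges = pd.DataFrame({'source_node': [0, 3], 'target_node': [1, 4]})
print(same_node_set(toy_edges, [0, 1, 3, 4]))  # True
print(same_node_set(toy_edges, [0, 1, 3, 5]))  # False: same count, different nodes
```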


In [58]:
create_report(edgeList).show_browser()
