<h1>
<center>Graph Dataset Construction</center>
</h1>

<font size="3">
This notebook takes as input the pickle files that hold the graph data and converts them into the format required by the current graph kernels approach.

    
In more detail: 

Input: A list of 5074 dataframes, where each dataframe describes the graph at one timestep. Every dataframe has 222 rows, one per parking sector, and its columns hold the node features.

Output: A list of 5074 lists. Each inner list has 3 elements: [set, dict1, dict2]
- Set: Describes the graph edges as a set of (int, int) tuples, for example (node1, node2)
- Dict1: Describes the node features in the format {key (int) : value (list of features)}, for example {node1 : [f1, f2, ..., fn]}
- Dict2: Describes the edge weights in the format {key (tuple) : value (int)}, for example {(node1, node2) : weight}
</font>
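
<font size="3">
To make the output format concrete, here is a minimal sketch of a single graph in the [set, dict1, dict2] layout. The three nodes, their feature values, and the weights are made up for illustration and do not come from the parking data.
</font>

```python
# Hypothetical toy graph with three nodes (all values are illustrative).
edges = {(1, 2), (2, 3)}                       # Set: edges as (node1, node2) tuples
node_features = {                              # Dict1: node id -> list of feature values
    1: [0.1, 0.0, 12.0],
    2: [0.4, 1.0, 30.0],
    3: [0.9, 0.0, 8.0],
}
edge_weights = {(1, 2): 5, (2, 3): 2}          # Dict2: edge tuple -> weight

graph = [edges, node_features, edge_weights]   # one timestep
dataset = [graph]                              # the full output is a list of such graphs

# Sanity checks: every edge has a weight and both endpoints have features.
assert set(edge_weights) == edges
assert all(u in node_features and v in node_features for u, v in edges)
```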

<font size="3">
In the input data, the node ids that represent the parking sectors are the same at every timestep.
The present approach requires node ids to be unique across all graphs. For this reason, we map the original ids of each graph to new ids that are not reused by any other graph.
<br>  
<br>
- PS1: At the end of this notebook, we apply the same process to the Chickenpox dataset, so that we have 2 graph datasets for experiments with our methods.  
<br>
- PS2: The train and test targets are processed in a subsequent notebook.
</font>
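
<font size="3">
The id mapping described above can be sketched as a running offset: each graph receives the next block of consecutive ids. The helper name `assign_unique_ids` and its parameters are assumptions made for this illustration; the notebook itself builds the mapping inside its preprocessing functions.
</font>

```python
import numpy as np

def assign_unique_ids(num_graphs, nodes_per_graph, start=1):
    """Return one {original_id: unique_id} dict per graph (hypothetical helper)."""
    mappings = []
    next_id = start
    for _ in range(num_graphs):
        # Consecutive block of ids reserved for this graph only.
        new_ids = np.arange(next_id, next_id + nodes_per_graph)
        mappings.append({old: int(new) for old, new in enumerate(new_ids)})
        next_id += nodes_per_graph
    return mappings

mappings = assign_unique_ids(num_graphs=2, nodes_per_graph=3)
print(mappings[0])  # {0: 1, 1: 2, 2: 3}
print(mappings[1])  # {0: 4, 1: 5, 2: 6}
```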

## Generals

<font size="3"> 
Package imports and project path configuration. 
</font>

In [None]:
import numpy as np
import pandas as pd
import pickle
from tqdm import tqdm

project_path = '/Users/nickkarras/PycharmProjects/Graph_Based_SVR'

<font size="3"> 
Dataset paths. 
</font>

In [None]:
train_dataset_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Train_Dataset_Graph.pkl'
test_dataset_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Test_Dataset_Graph.pkl'
edges_path_park = project_path+'/Data/ParkingViolationPrediction/Init/Edges_Weights.csv'

train_dataset_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Train_data.pkl'
test_dataset_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Test_data.pkl'
edges_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Edges.csv' 

<font size="3"> 
Function that saves objects to pickle files. 
</font>

In [None]:
def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)
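
<font size="3">
A saved file can be checked with a round trip. The sketch below repeats `save_object` so that it runs on its own and adds a `load_object` counterpart, which is a hypothetical helper that does not appear in this notebook. The toy graph and the temporary file name are also illustrative.
</font>

```python
import os
import pickle
import tempfile

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object (not part of the notebook)."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# Toy graph in the [set, dict1, dict2] layout.
toy = [{(1, 2)}, {1: [0.1], 2: [0.4]}, {(1, 2): 5}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_graph.pkl')
    save_object(toy, path)
    restored = load_object(path)

# Sets, dicts and tuples survive pickling unchanged.
assert restored == toy
```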

## Core functionality

<font size="3"> 
The following function takes as input the path to the edges CSV file and the node id mapping of one graph (original id to new id).

It replaces the node ids in the edge list according to the given mapping and returns the set of edges and the edge weights for that graph.
</font>

In [None]:
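def edge_weights_preprocessing(edges_path,mapping):
    # Load the edge list (expects columns Node1, Node2 and Weights)
    # and replace both endpoints with the new node ids.
    edges_df = pd.read_csv(edges_path,sep=',', index_col=0)
    edges_df["Node1"] = edges_df["Node1"].map(mapping)
    edges_df["Node2"] = edges_df["Node2"].map(mapping)
    
    # Edge set: one (node1, node2) tuple per row.
    edges = edges_df.drop(['Weights'], axis=1)
    tmp_ls = edges.to_numpy().tolist()
    edges = set(map(tuple,tmp_ls))
    
    # Edge weights keyed by the (node1, node2) tuple.
    # Note: to_dict('records') returns a list, so the weights dict is wrapped in a one-element list.
    edge_weights = edges_df.copy()
    edge_weights['Combinations'] = list(zip(edges_df.Node1, edges_df.Node2))
    edge_weights = edge_weights[['Combinations','Weights']]
    edge_weights = edge_weights.set_index('Combinations').T.to_dict('records')
    
    return edges,edge_weights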
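
<font size="3">
The remapping step can be tried on a small in-memory table instead of the CSV file. This is a sketch under assumptions: the edge rows and the mapping are made up, and the weights dict is built with `dict(zip(...))`, which is a simpler alternative to the `to_dict('records')` call used in the function above.
</font>

```python
import pandas as pd

# Hypothetical edge table standing in for the CSV contents.
edges_df = pd.DataFrame({
    'Node1': [0, 1],
    'Node2': [1, 2],
    'Weights': [5, 2],
})
mapping = {0: 4, 1: 5, 2: 6}  # original id -> new id for one graph

# Replace both endpoints with their new ids.
edges_df['Node1'] = edges_df['Node1'].map(mapping)
edges_df['Node2'] = edges_df['Node2'].map(mapping)

# Convert to plain Python ints and tuples before building the set and dict.
pairs = list(map(tuple, edges_df[['Node1', 'Node2']].to_numpy().tolist()))
edges = set(pairs)
edge_weights = dict(zip(pairs, edges_df['Weights'].tolist()))

print(edges)         # contains (4, 5) and (5, 6)
print(edge_weights)  # {(4, 5): 5, (5, 6): 2}
```

<font size="3">
Note that `Series.map` yields NaN for any id missing from the mapping, so it is worth confirming that the mapping covers every node in the edge list.
</font>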

<font size="3">
The following function creates the node features dict for each graph and assigns the new node ids.
It also creates a list that maps the old node ids to the new ones.
</font>

In [None]:
def node_features_preprocesing_parking(path):
    with open(path, 'rb') as inp:
        node_features = pickle.load(inp)
    names1 = ['Node_Id','Date_Sin','Holidays','Capacity','temp','humidity','Week_Day_Sin','Month_Sin',
              'Real_Time','Γενικό Νοσοκομείο Θεσσαλονίκης «Γ. Γεννηματάς»', 'Λιμάνι' ,'Δημαρχείο Θεσσαλονίκης',
              'Λευκός Πύργος','Αγορά Καπάνι','Λαδάδικα','Πλατεία Άθωνος','Πλατεία Αριστοτέλους','Ροτόντα',
              'Πλατεία Αγίας Σοφίας','Πλατεία Αντιγονιδών','Μουσείο Μακεδονικού Αγώνα','Πλατεία Ναυαρίνου',
              'Πάρκο ΧΑΝΘ','Ιερός Ναός Αγίου Δημητρίου','ΔΕΘ','ΑΠΘ','Άγαλμα Ελευθερίου Βενιζέλου',
              'Ρωμαϊκή Αγορά Θεσσαλονίκης','Predictions']
    names2 = ['Init_Node_Id','Node_Id']
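    mapping_list = []
    n1 = 1  # next unused node id
    for i in tqdm (range (0,len(node_features))):
        # Sort the sectors so that the row position can serve as the original node id.
        node_features[i] = node_features[i].sort_values("Slot_id")
        node_features[i] = node_features[i].reset_index()  
        node_features[i]['Init_Node_Id'] = node_features[i].index
        # Assign a block of consecutive ids that no other graph uses.
        node_features[i]['Node_Id'] = np.arange(n1, n1+len(node_features[i]))
        n1 = n1 + len(node_features[i])        
        # to_dict('records') returns a one-element list: [{old id: new id}].
        mapping_list.append(node_features[i][names2].set_index('Init_Node_Id').T.to_dict('records'))
        
        # Keep the selected columns and build {node id: [feature values]}.
        node_features[i] = node_features[i][names1]
        node_features[i] = node_features[i].set_index('Node_Id').T.to_dict('list')
    return node_features,mapping_list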
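
<font size="3">
The conversion from a dataframe to the node features dict can be illustrated on a toy frame. The three rows and their values are made up, and only two of the feature columns are used. Unlike the function above, the sketch calls `reset_index(drop=True)` to avoid carrying the old index along.
</font>

```python
import numpy as np
import pandas as pd

# Hypothetical frame for one timestep (values are illustrative).
frame = pd.DataFrame({
    'Slot_id': [30, 10, 20],
    'Capacity': [8.0, 12.0, 30.0],
    'temp': [21.5, 21.5, 21.5],
})

# Sort by sector so that the row order is the same at every timestep.
frame = frame.sort_values('Slot_id').reset_index(drop=True)
frame['Node_Id'] = np.arange(1, 1 + len(frame))

# Transposing puts node ids on the columns, so to_dict('list')
# yields {node id: [feature values]}.
features = frame[['Node_Id', 'Capacity', 'temp']].set_index('Node_Id').T.to_dict('list')
print(features)  # {1: [12.0, 21.5], 2: [30.0, 21.5], 3: [8.0, 21.5]}
```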

<font size="3">
The following function combines the 2 functions above and creates the list of final graphs.
</font>

In [None]:
def create_graph(dataset_path,edges_path,mode):
    G = []
    print (f"Creating {mode} Node Features")
    node_features_dict,mapping_list = node_features_preprocesing_parking(dataset_path)
    print (f"Create {mode} Graphs List")
    for i in tqdm (range (0,len(node_features_dict))):
        graph_list=[]
        edges_set,edge_weights_dict = edge_weights_preprocessing(edges_path,mapping_list[i][0])
        graph_list.append(edges_set)
        graph_list.append(node_features_dict[i])
        graph_list.append(edge_weights_dict)
        G.append(graph_list)
    return G
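
<font size="3">
Since the approach depends on node ids being unique across graphs, a quick check on the resulting list can be useful. The helper `check_unique_node_ids` below is a hypothetical addition, not part of the notebook, and it is shown on toy graphs in the [set, dict1, dict2] layout.
</font>

```python
def check_unique_node_ids(graphs):
    """Return the total node count, or raise ValueError on an inconsistency (hypothetical helper)."""
    seen = set()
    for index, (edges, node_features, _edge_weights) in enumerate(graphs):
        ids = set(node_features)
        # A node id must not appear in more than one graph.
        overlap = seen & ids
        if overlap:
            raise ValueError(f"graph {index} reuses node ids {sorted(overlap)}")
        # Every edge endpoint must belong to the same graph.
        for u, v in edges:
            if u not in ids or v not in ids:
                raise ValueError(f"graph {index} has edge ({u}, {v}) with an unknown node")
        seen |= ids
    return len(seen)

toy_graphs = [
    [{(1, 2)}, {1: [0.1], 2: [0.4]}, {(1, 2): 5}],
    [{(3, 4)}, {3: [0.2], 4: [0.3]}, {(3, 4): 5}],
]
print(check_unique_node_ids(toy_graphs))  # 4
```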

## Create and Save Graphs Parking Data

In [None]:
G_train = create_graph(train_dataset_path_park,edges_path_park,'Train')
save_object(G_train, 'Data/ParkingViolationPrediction/G_Train.pkl')
G_test = create_graph(test_dataset_path_park,edges_path_park,'Test')
save_object(G_test, 'Data/ParkingViolationPrediction/G_Test.pkl')

## Create and Save Graphs Chickenpox Data

<font size="3">
The following function creates the node features dict for each Chickenpox graph and assigns the new node ids.
It also creates a list that maps the old node ids to the new ones.
</font>

In [None]:
def node_features_preprocesing_chicken(path):
    with open(path, 'rb') as inp:
        node_features = pickle.load(inp)
        
    names = ['Init_Node_Id','Node_Id']
    mapping_list = []
    n1 = 1
    for i in tqdm (range (0,len(node_features))):
        node_features[i]['Init_Node_Id'] = node_features[i].index
        node_features[i]['Node_Id'] = np.arange(n1, n1+len(node_features[i]))
        n1 = n1 + len(node_features[i])        
        mapping_list.append(node_features[i][names].set_index('Init_Node_Id').T.to_dict('records'))
        node_features[i] = node_features[i].set_index('Node_Id').T.to_dict('list')
    return node_features,mapping_list

<font size="3">
The following function combines the Chickenpox node features function with the edge preprocessing function and creates the list of final graphs.
</font>

In [None]:
def create_graph_chic(dataset_path,edges_path,mode):
    G = []
    print (f"Creating {mode} Node Features")
    node_features_dict,mapping_list = node_features_preprocesing_chicken(dataset_path)
    print (f"Create {mode} Graphs List")
    for i in tqdm (range (0,len(node_features_dict))):
        graph_list=[]
        edges_set,edge_weights_dict = edge_weights_preprocessing(edges_path,mapping_list[i][0])
        graph_list.append(edges_set)
        graph_list.append(node_features_dict[i])
        graph_list.append(edge_weights_dict)
        G.append(graph_list)
    return G

In [None]:
G_train = create_graph_chic(train_dataset_path_chic,edges_path_chic,'Train')
save_object(G_train, 'Data/Chickenpox/G2_Train.pkl')
G_test = create_graph_chic(test_dataset_path_chic,edges_path_chic,'Test')
save_object(G_test, 'Data/Chickenpox/G2_Test.pkl')