<h1>
<center> Experiments with SVM Kernels And K-Nearest Neighbor</center>
</h1>

<font size="3"> 
In this notebook, we train and test several regression models from the sklearn library: support vector regression (SVR) with different kernels, and k-nearest neighbors (KNN).

All experiments use a non-graph-based approach: the information on edges and edge weights is not used.

The data are converted to a common dataframe format, and only the raw data are used for testing.

In summary, this notebook compares the sklearn models with the graph kernel technique that we developed in the notebook "Graph_Kernel_SVR.ipynb".

Note: we use the ParkingViolation and Chickenpox datasets for our experiments.
</font>

## Generals

<font size="3"> 
Package imports and system configuration.
</font>

In [None]:
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn import svm
import pandas as pd
import pickle
from tqdm import tqdm
from datetime import datetime as dt
import tensorflow as tf
from sklearn import metrics
from sklearn.model_selection import RandomizedSearchCV
import matplotlib.pyplot as plt
import multiprocessing

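# Leave two cores free, but never drop below one (n_jobs=0 is not a valid value for sklearn)
cores = max(1, multiprocessing.cpu_count()-2)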
project_path = '/Users/nickkarras/PycharmProjects/Graph_Based_SVR'

<font size="3"> 
Dataset paths.
</font>

In [None]:
#ParkingViolation
train_set_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Train_Dataset_Graph.pkl'
test_set_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Test_Dataset_Graph.pkl'
train_targets_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Train_Targets.csv'
test_targets_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Test_Targets.csv'
train_mask_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Train_Mask.csv'
test_mask_path_park = project_path + '/Data/ParkingViolationPrediction/Init/Test_Mask.csv'

#Chickenpox
train_set_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Train_data.pkl'
test_set_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Test_data.pkl'
train_targets_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Train_targets.csv'
test_targets_path_chic = project_path + '/Data/Chickenpox/Init/Chickenpox_Test_targets.csv'
train_mask_path_chic = None
test_mask_path_chic = None

## Data Preprocessing 

<font size="3"> 
A function that takes the data paths as input and returns the data or a subset of them.
</font>

In [None]:
def data_load(train_set_path,test_set_path,train_targets_path,test_targets_path,use_mask,train_mask_path,test_mask_path,subset,train_size,test_size):
    with open(train_set_path, 'rb') as inp:
        train_set = pickle.load(inp)
    with open(test_set_path, 'rb') as inp:
        test_set = pickle.load(inp)
      
    train_targets = pd.read_csv(train_targets_path,index_col=0)
    test_targets = pd.read_csv(test_targets_path,index_col=0)
    if use_mask:
        train_mask = pd.read_csv(train_mask_path,index_col=0)
        test_mask = pd.read_csv(test_mask_path,index_col=0)
    else:
        train_mask = None
        test_mask = None 
        
    if subset:
        train_set,test_set,train_targets,test_targets,train_mask,test_mask = get_subset(train_set,test_set,
                                        train_targets,test_targets,use_mask,train_mask,test_mask,train_size,test_size)
    
    return train_set,test_set,train_targets,test_targets,train_mask,test_mask  

<font size="3"> 
A function that takes the datasets and returns a subset of each one according to the given sizes.
</font>

In [None]:
def get_subset(train_set,test_set,train_targets,test_targets,use_mask,train_mask,test_mask,train_size,test_size):
    train_set = train_set[0:train_size]
    test_set = test_set[0:test_size]
    train_targets = train_targets.iloc[:,:train_size]
    test_targets = test_targets.iloc[:,:test_size]
    if use_mask:
        train_mask = train_mask.iloc[:,:train_size]
        test_mask = test_mask.iloc[:,:test_size]
    else:
        train_mask = None
        test_mask = None
        
    return train_set,test_set,train_targets,test_targets,train_mask,test_mask
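
<font size="3">
The subsetting above can be illustrated on toy data. This is only a sketch: the names <code>snapshots</code>, <code>targets</code> and <code>subset_size</code> are hypothetical, and the layout (one dataframe per snapshot, one target column per snapshot) is an assumption inferred from how the functions in this notebook index their inputs.
</font>

```python
import pandas as pd

# Hypothetical stand-ins: 4 snapshots with 3 rows each,
# and a targets frame with one column per snapshot.
snapshots = [pd.DataFrame({"feature": [i, i + 1, i + 2]}) for i in range(4)]
targets = pd.DataFrame({f"t{i}": [0.1 * i] * 3 for i in range(4)})

subset_size = 2

# A list slice keeps the first `subset_size` snapshot dataframes,
# while .iloc[:, :subset_size] keeps the matching target columns.
snapshots_subset = snapshots[:subset_size]
targets_subset = targets.iloc[:, :subset_size]

print(len(snapshots_subset), targets_subset.shape)
```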

<font size="3"> 
A function that takes a list of dataframes describing the features, a dataframe with the targets, and a mask.

It keeps only the rows where the mask equals 1 and returns the raw X and y in dataframe format.
</font>

In [None]:
def data_preprocess(data_set,y,mask):
    names = ['Date_Sin','Holidays','Capacity','temp','humidity','Week_Day_Sin','Month_Sin','Real_Time','Γενικό Νοσοκομείο Θεσσαλονίκης «Γ. Γεννηματάς»', 'Λιμάνι' ,'Δημαρχείο Θεσσαλονίκης','Λευκός Πύργος','Αγορά Καπάνι','Λαδάδικα','Πλατεία Άθωνος','Πλατεία Αριστοτέλους','Ροτόντα','Πλατεία Αγίας Σοφίας','Πλατεία Αντιγονιδών','Μουσείο Μακεδονικού Αγώνα','Πλατεία Ναυαρίνου','Πάρκο ΧΑΝΘ','Ιερός Ναός Αγίου Δημητρίου','ΔΕΘ','ΑΠΘ','Άγαλμα Ελευθερίου Βενιζέλου','Ρωμαϊκή Αγορά Θεσσαλονίκης','Predictions']
    for i in tqdm (range (0,len(data_set))):
        data_set[i] = data_set[i].sort_values("Slot_id")
        data_set[i] = data_set[i].set_index("Slot_id")
        data_set[i] = data_set[i][names]
        
        data_set[i] = data_set[i].join(y.iloc[:,i])
        # Rename the joined column; set_axis returns a new frame by default, so the deprecated `inplace` argument is omitted
        data_set[i] = data_set[i].set_axis([*data_set[i].columns[:-1], 'Target'], axis=1)
        
        data_set[i] = data_set[i].join(mask.iloc[:,i])
        data_set[i] = data_set[i].set_axis([*data_set[i].columns[:-1], 'Mask'], axis=1)
        
        data_set[i] = data_set[i].loc[data_set[i]['Mask'] == 1]
        
    data_set = pd.concat(data_set).reset_index()
    data_set = data_set.drop(['Mask','Slot_id'], axis=1)
    
    X = data_set.drop(['Target'], axis=1)
    y = data_set['Target']
    data_set = []
    return X,y
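
<font size="3">
The flattening step can be sketched on a two-snapshot toy example. The helper <code>flatten_snapshots</code> and the toy frames are hypothetical and only mirror the idea of the function above (attach target and mask by slot, drop masked-out rows, stack all snapshots); the only real column names reused are <code>Slot_id</code> and <code>Capacity</code>.
</font>

```python
import pandas as pd

def flatten_snapshots(snapshots, targets, mask):
    """Hypothetical helper: stack per-snapshot frames into one X/y pair."""
    frames = []
    for i, frame in enumerate(snapshots):
        frame = frame.sort_values("Slot_id").set_index("Slot_id")
        # Column i of targets/mask belongs to snapshot i; alignment is by slot index.
        frame = frame.assign(Target=targets.iloc[:, i], Mask=mask.iloc[:, i])
        frames.append(frame[frame["Mask"] == 1])
    flat = pd.concat(frames).reset_index(drop=True)
    return flat.drop(columns=["Target", "Mask"]), flat["Target"]

# Toy data: two snapshots, two slots; slot 2 is masked out in the first snapshot.
snapshots = [
    pd.DataFrame({"Slot_id": [2, 1], "Capacity": [20, 10]}),
    pd.DataFrame({"Slot_id": [1, 2], "Capacity": [11, 21]}),
]
targets = pd.DataFrame({"t0": [0.1, 0.2], "t1": [0.3, 0.4]}, index=[1, 2])
mask = pd.DataFrame({"t0": [1, 0], "t1": [1, 1]}, index=[1, 2])

X_toy, y_toy = flatten_snapshots(snapshots, targets, mask)
print(X_toy)
print(y_toy)
```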

<font size="3"> 
A function that applies a randomized search with 3-fold CV in order to find the best hyper-parameters of an SVR model.

Finally, it returns the fitted search object with the results.
</font>

In [None]:
def svr_random_grid_search_cv(parameters,X_train,y_train,cores):
    SVR = svm.SVR()
    SVR_random = RandomizedSearchCV(estimator = SVR,param_distributions=parameters,n_iter=16,cv=3,verbose=2,
                                    random_state=42,n_jobs=cores,scoring='neg_mean_absolute_error')
    SVR_random.fit(X_train,y_train)
    print(f"\nThe best model was fitted with the following parameters: {SVR_random.best_params_}")
    return SVR_random
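
<font size="3">
A minimal sketch of the same search on synthetic data, assuming a reduced and purely illustrative parameter space (<code>X_toy</code>, <code>y_toy</code> and <code>param_distributions</code> are hypothetical). When every parameter is given as a list and <code>n_iter</code> equals the number of combinations, each combination is evaluated once. The same holds for the settings used later in this notebook, where 4 kernels x 2 tolerances x 2 gamma values give 16 combinations and <code>n_iter=16</code>.
</font>

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

# 2 x 2 = 4 combinations; with n_iter=4 every combination is tried once.
param_distributions = {"kernel": ["rbf", "linear"], "tol": [0.01, 0.001]}

search = RandomizedSearchCV(
    SVR(),
    param_distributions=param_distributions,
    n_iter=4,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=42,
)
search.fit(X_toy, y_toy)

# The score is a negated MAE, so flip the sign to read it as an error.
print(search.best_params_, -search.best_score_)
```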

<font size="3"> 
A function that trains an SVM regression model with the best parameters found by the search.

The function prints the error metrics (MAE, MSE) for the train and test sets and returns the test-set metrics together with the test predictions.
</font>

In [None]:
def svr_model_training(X_train,X_test,y_train,y_test,SVR_random):
    start = dt.now()
    SVR = SVR_random.best_estimator_
    SVR.fit(X_train, y_train)
    running_secs = int((dt.now() - start).total_seconds())
    print (f"\nSVR training finished successfully in {running_secs} seconds")
    
    # Error metrics on the train set
    y_train_predicted = SVR.predict(X_train)
    train_MAE = round(metrics.mean_absolute_error(y_train, y_train_predicted),5)
    print (f"The Mean Absolute Error (MAE) calculated for the train set is: {train_MAE}")
    train_MSE = round(metrics.mean_squared_error(y_train, y_train_predicted),5)
    print (f"The Mean Squared Error (MSE) calculated for the train set is: {train_MSE}")
    
    # Error metrics on the test set
    y_predicted = SVR.predict(X_test)
    MAE = round(metrics.mean_absolute_error(y_test, y_predicted),5)
    print (f"The Mean Absolute Error (MAE) calculated for the test set is: {MAE}")
    MSE = round(metrics.mean_squared_error(y_test, y_predicted),5)
    print (f"The Mean Squared Error (MSE) calculated for the test set is: {MSE}")
    return MAE,MSE,y_predicted
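
<font size="3">
For reference, the two error metrics reported above can be checked by hand on a small example. The arrays below are made-up values used only for illustration.
</font>

```python
import numpy as np
from sklearn import metrics

# Made-up true values and predictions.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MAE is the mean of the absolute errors; MSE is the mean of the squared errors.
mae_manual = np.mean(np.abs(y_true - y_pred))
mse_manual = np.mean((y_true - y_pred) ** 2)

# Both should agree with the sklearn implementations.
mae_sklearn = metrics.mean_absolute_error(y_true, y_pred)
mse_sklearn = metrics.mean_squared_error(y_true, y_pred)

print(mae_manual, mse_manual)
```

<font size="3">
Because MSE squares each error, the single error of 2 dominates it, while MAE weights all errors linearly.
</font>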

<font size="3"> 
A function that trains a KNeighborsRegressor model with k = 101 neighbors.

The function prints the error metrics (MAE, MSE) for the train and test sets and returns the test-set metrics.
</font>

In [None]:
def knn_model_training(X_train,X_test,y_train,y_test):
    start = dt.now()
    KNN = KNeighborsRegressor(n_neighbors=101)
    KNN.fit(X_train, y_train)
    running_secs = int((dt.now() - start).total_seconds())
    
    # Error metrics on the train set
    y_train_predicted = KNN.predict(X_train)
    print (f"\nKNN training finished successfully in {running_secs} seconds")
    train_MAE = round(metrics.mean_absolute_error(y_train, y_train_predicted),5)
    print (f"The Mean Absolute Error (MAE) calculated for the train set is: {train_MAE}")
    train_MSE = round(metrics.mean_squared_error(y_train, y_train_predicted),5)
    print (f"The Mean Squared Error (MSE) calculated for the train set is: {train_MSE}")
    
    # Error metrics on the test set
    y_predicted = KNN.predict(X_test)
    MAE = round(metrics.mean_absolute_error(y_test, y_predicted),5)
    print (f"The Mean Absolute Error (MAE) calculated for the test set is: {MAE}")
    MSE = round(metrics.mean_squared_error(y_test, y_predicted),5)
    print (f"The Mean Squared Error (MSE) calculated for the test set is: {MSE}")
    return MAE,MSE
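
<font size="3">
As a small illustration of what the regressor does with its default uniform weights, the prediction for a query point is the average target of its k nearest training points. The data and k = 3 below are toy assumptions; the notebook itself uses k = 101, which requires at least 101 training samples.
</font>

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Four one-dimensional training points whose target equals the feature.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0]])
y_toy = np.array([0.0, 1.0, 2.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=3).fit(X_toy, y_toy)

# The 3 nearest neighbors of 1.2 are 1.0, 2.0 and 0.0,
# so the prediction is the mean of their targets.
prediction = knn.predict(np.array([[1.2]]))[0]
print(prediction)
```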

<font size="3">
A function that takes the predictions and the true values and plots them against each other.
</font>

In [None]:
def plot_actuals_predictions(predicted_value,true_value,data_name):
    plt.figure(figsize=(7,7))
    plt.scatter(true_value, predicted_value, c='crimson')
    p1 = max(max(predicted_value), max(true_value))
    p2 = min(min(predicted_value), min(true_value))
    plt.title('Actual vs Predicted Values')
    plt.plot([p1, p2], [p1, p2], 'b-')
    plt.xlabel('True Values', fontsize=12)
    plt.ylabel('Predictions', fontsize=12)
    plt.axis('equal')
    #plt.savefig('Exports/SVR_Predictions_' + data_name + '.pdf')
    plt.show()

<font size="3">
A function that creates a dataframe with the cross-validation results and saves it to a CSV file.
</font>

In [None]:
def get_results_df(SVR_random,project_path,data_name):
    results = pd.DataFrame(SVR_random.cv_results_)
    results = results.sort_values(by=['rank_test_score'])
    results = results.reset_index()
    results = results.drop(['index'], axis=1)
    results.to_csv(project_path +'/Exports/CV_results_'+ data_name +'.csv')
    print('\nAll cross-validation results are described by the dataframe below:')
    return results
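
<font size="3">
A short sketch of how such a results table can be read. The dictionary <code>cv_results</code> is a hand-written, hypothetical stand-in with made-up scores; it reuses three of the keys that a fitted search exposes in <code>cv_results_</code>. Since the search is scored with <code>neg_mean_absolute_error</code>, the mean test score is a negated MAE.
</font>

```python
import pandas as pd

# Hypothetical stand-in for SVR_random.cv_results_ (made-up scores).
cv_results = {
    "params": [{"kernel": "rbf"}, {"kernel": "linear"}, {"kernel": "poly"}],
    "mean_test_score": [-0.30, -0.25, -0.40],
    "rank_test_score": [2, 1, 3],
}

results_toy = (
    pd.DataFrame(cv_results)
    .sort_values("rank_test_score")
    .reset_index(drop=True)
)

# Flip the sign so the column reads as a plain MAE (lower is better).
results_toy["mean_test_MAE"] = -results_toy["mean_test_score"]
print(results_toy[["params", "rank_test_score", "mean_test_MAE"]])
```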

## 1. Functionality Combinations for Parking Data

### 1. Data

In [None]:
train_set,test_set,train_targets,test_targets,train_mask,test_mask =  data_load(train_set_path_park,test_set_path_park,
                                train_targets_path_park,test_targets_path_park,True,train_mask_path_park,test_mask_path_park,False,25,6)

X_train,y_train = data_preprocess(train_set,train_targets,train_mask)
X_test,y_test = data_preprocess(test_set,test_targets,test_mask)

### 1. SVR

In [None]:
parameters = {'kernel': ['rbf','linear','sigmoid','poly'],'tol': [0.01,0.001],'gamma': ['scale','auto']}

SVR_random = svr_random_grid_search_cv(parameters,X_train,y_train,cores)
svr_MAE,svr_MSE,y_predicted = svr_model_training(X_train,X_test,y_train,y_test,SVR_random)
plot_actuals_predictions(y_predicted,y_test,'Parking')

results = get_results_df(SVR_random,project_path,'Parking')
results

### 1. KNN

In [None]:
knn_MAE,knn_MSE = knn_model_training(X_train,X_test,y_train,y_test)

## 2. Functionality Combinations for ChickenPox Data

In [None]:
def data_preprocess_chic(data_set,y):
    for i in tqdm (range (0,len(data_set))):       
        data_set[i] = data_set[i].join(y.iloc[:,i])
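        # Rename the joined column; set_axis returns a new frame by default, so the deprecated `inplace` argument is omitted
        data_set[i] = data_set[i].set_axis([*data_set[i].columns[:-1], 'Target'], axis=1)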
        
    data_set = pd.concat(data_set).reset_index()
    
    X = data_set.drop(['Target'], axis=1)
    y = data_set['Target']
    data_set = []
    return X,y

### 2. Data

In [None]:
train_set,test_set,train_targets,test_targets,train_mask,test_mask =  data_load(train_set_path_chic,test_set_path_chic,
                                train_targets_path_chic,test_targets_path_chic,False,train_mask_path_chic,test_mask_path_chic,False,20,5)

X_train,y_train = data_preprocess_chic(train_set,train_targets)
X_test,y_test = data_preprocess_chic(test_set,test_targets)

### 2. SVR

In [None]:
parameters = {'kernel': ['rbf','linear','sigmoid','poly'],'tol': [0.01,0.001],'gamma': ['scale','auto']}

SVR_random = svr_random_grid_search_cv(parameters,X_train,y_train,cores)
svr_MAE,svr_MSE,y_predicted = svr_model_training(X_train,X_test,y_train,y_test,SVR_random)
plot_actuals_predictions(y_predicted,y_test,'ChickePox')

results = get_results_df(SVR_random,project_path,'ChickePox')
results

### 2. KNN

In [None]:
knn_MAE,knn_MSE = knn_model_training(X_train,X_test,y_train,y_test)