## Prediction

This notebook is used exclusively for prediction. The steps are:
- Load the trained neural network.
- Import the data subset to predict (the predictors are standardized and reduced with PCA first).
- Predict directly, since the network is already trained.
- Parse the results into tabular form and produce a simple plot of them.
- Write the prediction data to the DB; this table is the one shown in the dashboard.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA
from tensorflow import keras
from sqlalchemy import create_engine
import psycopg2

In [2]:
#import data to predict (from DB)
con = create_engine('postgresql://postgres:##########@ispacevm58.researchstudio.at:5555/ai4mob')
data_pred = pd.read_sql_table('ready_to_pred', con)
data_pred = data_pred[data_pred['day_id'] == 105]
data_pred.shape

(1040, 18)
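The cell above loads the whole `ready_to_pred` table and filters it in pandas. As a hedged alternative, the filter can be pushed into the query so that only the rows for one day are transferred. The sketch below uses an in-memory SQLite database as a stand-in for the PostgreSQL instance, and the inserted rows are illustrative, not real data.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database as a stand-in for the PostgreSQL instance (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ready_to_pred (haltestelle_nr INTEGER, day_id INTEGER, hour_id INTEGER)"
)
# Illustrative rows only: three different days for one stop
conn.executemany(
    "INSERT INTO ready_to_pred VALUES (?, ?, ?)",
    [(5000205, 104, 9), (5000205, 105, 9), (5000205, 105, 10), (5000205, 106, 9)],
)

# Let the database do the filtering; the day is passed as a bound parameter
day_to_predict = 105
data_pred = pd.read_sql_query(
    "SELECT * FROM ready_to_pred WHERE day_id = ?",
    conn,
    params=(day_to_predict,),
)
print(data_pred.shape)  # (2, 3)
```

The `?` placeholder is SQLite's parameter style. With a SQLAlchemy engine and psycopg2, the query would typically be wrapped in `sqlalchemy.text` with a named parameter instead.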

In [3]:
#take predictors (columns 8 onward), standardize them and reduce them to 6 principal components
X = data_pred.iloc[:, 8:].values
sc = StandardScaler()
X = sc.fit_transform(X)
pca = PCA(n_components=6)
X = pca.fit_transform(X)
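Note that the cell above fits a new `StandardScaler` and a new `PCA` on the prediction subset itself. If the network was trained on features transformed with statistics from the training data, the usual practice is to fit the transformers once on the training data and only call `transform` at prediction time. The sketch below illustrates that pattern on synthetic arrays; `X_train` and `X_new` are hypothetical stand-ins, and how the original training notebook preprocessed its data is not known from this section.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): 10 predictors, as in columns 8 onward
X_train = rng.normal(loc=80.0, scale=25.0, size=(500, 10))
X_new = rng.normal(loc=80.0, scale=25.0, size=(20, 10))

# Fit the scaler and the PCA once, on the training data only
preprocess = make_pipeline(StandardScaler(), PCA(n_components=6))
preprocess.fit(X_train)

# At prediction time, reuse the fitted statistics: transform, do not fit again
X_new_reduced = preprocess.transform(X_new)
print(X_new_reduced.shape)  # (20, 6)
```

A fitted pipeline like this could be saved next to the model with `joblib.dump` and restored with `joblib.load`, so that the prediction notebook applies exactly the same transformation as training.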

The prediction table loaded above was filtered to a single date: 15 April 2021 (day_id 105), which is used here as a sample.
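The filter value 105 and the date 15 April 2021 agree if `day_id` is read as the day of the year, which is an assumption here (the table definition is not shown). A minimal sketch of that conversion, using a hypothetical helper `day_of_year_to_date`:

```python
from datetime import date, timedelta


def day_of_year_to_date(year: int, day_id: int) -> date:
    """Convert a 1-based day-of-year number into a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_id - 1)


sample_date = day_of_year_to_date(2021, 105)
print(sample_date.isoformat())  # 2021-04-15
```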

In [4]:
#import trained NN
NNmodel = keras.models.load_model('../src/NeuralNetwork_model.h5')

In [5]:
pred = NNmodel.predict(X)

In [6]:
pred

array([[7.3088268e-03, 9.4913059e-01, 4.0801320e-02, 2.0571097e-03,
        7.0216577e-04],
       [7.5354110e-03, 9.5282602e-01, 3.7223741e-02, 1.7727783e-03,
        6.4212701e-04],
       [6.8133050e-03, 9.5218521e-01, 3.8612723e-02, 1.7785360e-03,
        6.1018293e-04],
       ...,
       [1.0859453e-01, 8.4309810e-01, 4.2504285e-02, 3.0340503e-03,
        2.7689158e-03],
       [1.4114228e-01, 8.1402797e-01, 3.8434029e-02, 3.2945643e-03,
        3.1010592e-03],
       [5.5094888e-03, 8.9216661e-01, 9.7303271e-02, 3.8621845e-03,
        1.1585050e-03]], dtype=float32)
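Each row of this array has five values, one per delay class, and the rows shown sum to roughly 1, which is consistent with a softmax output layer (an assumption, since the model definition is not part of this notebook). The sketch below takes three of the rows printed above and reads off the most likely column and its probability.

```python
import numpy as np

# Three rows copied from the prediction output above
probs = np.array(
    [
        [7.3088268e-03, 9.4913059e-01, 4.0801320e-02, 2.0571097e-03, 7.0216577e-04],
        [1.0859453e-01, 8.4309810e-01, 4.2504285e-02, 3.0340503e-03, 2.7689158e-03],
        [5.5094888e-03, 8.9216661e-01, 9.7303271e-02, 3.8621845e-03, 1.1585050e-03],
    ],
    dtype=np.float32,
)

row_sums = probs.sum(axis=1)      # each should be close to 1 for a softmax output
top_column = probs.argmax(axis=1)  # column index with the highest probability
top_prob = probs.max(axis=1)       # how confident the network is in that column

print(top_column.tolist())  # [1, 1, 1]
```

The column index is only a position in the output layer. Which delay class it stands for depends on the order of the categories used when the labels were one-hot encoded.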

In [7]:
#FOR EVALUATION - need to retrieve y values
#one-hot encode the observed delay classes so that they match the network's output layout
ohe = OneHotEncoder()
y = data_pred.delay_class.values.reshape(-1,1)
y = ohe.fit_transform(y).toarray()

In [8]:
NNmodel.evaluate(X, y)



[0.5427826642990112,
 0.8211538195610046,
 0.8269794583320618,
 0.8134615421295166]

Once the pre-trained neural network has produced predictions for this sample data, the predicted class probabilities are converted back to class labels and added to the final visualization dataframe.

In [9]:
#convert the class probabilities back to class labels (inverse_transform picks the highest-probability column per row)
re_pred = ohe.inverse_transform(pred)
re_pred = re_pred.ravel()
pred = re_pred.tolist()
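Passing probabilities to `OneHotEncoder.inverse_transform` works because the encoder takes the position of the largest value in each row and maps it to the matching category. The sketch below shows this on made-up labels and probabilities (both hypothetical) and compares the result with an explicit `argmax` lookup.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical observed labels covering five delay classes
observed = np.array([0, 1, 2, 3, 4, 1, 1]).reshape(-1, 1)
ohe = OneHotEncoder()
ohe.fit(observed)

# Hypothetical network output: one probability per class and row
probs = np.array(
    [
        [0.01, 0.95, 0.03, 0.005, 0.005],
        [0.60, 0.30, 0.05, 0.03, 0.02],
    ]
)

# Decoding through the encoder ...
decoded = ohe.inverse_transform(probs).ravel()
# ... gives the same labels as looking up the argmax in the fitted categories
manual = ohe.categories_[0][probs.argmax(axis=1)]

print(decoded.tolist())  # [1, 0]
```

One caveat: the encoder only knows the classes present in the data it was fitted on. If a day contained fewer than five distinct classes, the number of encoded columns would no longer match the five network outputs.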

In [10]:
data_pred['predicted'] = pred

In [11]:
data_pred.head()

| | index | haltestelle_nr | day_hour | date_id | day_id | hour_id | delay_sec | delay_class | prev_day_del | prev_2day_del | prev_week_del | prev_2week_del | prev_3week_del | prev_4week_del | mean_delay | median_delay | min_delay | max_delay | predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 330 | 613 | 5000205 | 2021-04-15 09 | 20210415 | 105 | 9 | 80.25 | 1 | 73.0 | 64.25 | 130.857143 | 51.428571 | 70.5 | 44.0 | 89.628968 | 75.285714 | 51.428571 | 140.666667 | 1 |
| 414 | 732 | 5000205 | 2021-04-15 10 | 20210415 | 105 | 10 | 69.428571 | 1 | 71.125 | 100.333333 | 95.0 | 96.285714 | 48.142857 | 57.142857 | 88.457341 | 95.642857 | 38.0 | 130.0 | 1 |
| 498 | 851 | 5000205 | 2021-04-15 11 | 20210415 | 105 | 11 | 46.25 | 1 | 94.5 | 66.666667 | 45.6 | 74.4 | 65.75 | 94.0 | 88.588889 | 84.45 | 45.6 | 132.166667 | 1 |
| 577 | 969 | 5000205 | 2021-04-15 12 | 20210415 | 105 | 12 | 66.5 | 1 | 45.666667 | 121.444444 | 83.333333 | 80.375 | 86.857143 | 79.75 | 82.704861 | 81.854167 | 45.666667 | 121.444444 | 1 |
| 654 | 1087 | 5000205 | 2021-04-15 13 | 20210415 | 105 | 13 | 73.0 | 1 | 87.3 | 74.0 | 52.5 | 76.0 | 47.222222 | 57.222222 | 68.17619 | 67.7 | 52.5 | 87.3 | 1 |


At this point, a new field must be added to the table stating whether each prediction was right or wrong. This field will be displayed on the dashboard to show the model's performance. Since Grafana does not yet support text values here, the right/wrong labels are encoded as 1 and 0 (1 = right, 0 = wrong).

In [12]:
#adding result field: 1 = predicted class matches the observed class, 0 = it does not
data_pred['result'] = (data_pred['delay_class'] == data_pred['predicted']).astype(int)
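Because right and wrong predictions are stored as 1 and 0, the mean of the `result` column is the share of correct predictions, which gives a quick cross-check against the accuracy reported by the model. The sketch below illustrates this on a small made-up frame; `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical observed and predicted delay classes
toy = pd.DataFrame(
    {
        "delay_class": [1, 1, 2, 0, 1],
        "predicted": [1, 1, 1, 0, 1],
    }
)

# 1 where the prediction matches the observed class, 0 otherwise
toy["result"] = (toy["delay_class"] == toy["predicted"]).astype(int)

# With a 1/0 encoding, the mean is the fraction of correct predictions
hit_rate = toy["result"].mean()

print(toy["result"].tolist())  # [1, 1, 0, 1, 1]
print(hit_rate)  # 0.8
```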

In [13]:
#clean data table to upload it to DB
data_pred = data_pred[['day_hour', 'hour_id', 'haltestelle_nr', 'predicted', 'result']]

In [15]:
#upload to DB
data_pred.to_sql('sample_pred', con, if_exists='replace')
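By default `to_sql` also writes the dataframe index as an extra column, which the dashboard table may not need. The sketch below shows the same upload with `index=False` and reads the table back to check its columns. It uses an in-memory SQLite database as a stand-in for the PostgreSQL instance (an assumption), with two rows taken from the preview above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database as a stand-in for the PostgreSQL instance (assumption)
conn = sqlite3.connect(":memory:")

# Two rows in the layout of the cleaned prediction table
sample = pd.DataFrame(
    {
        "day_hour": ["2021-04-15 09", "2021-04-15 10"],
        "hour_id": [9, 10],
        "haltestelle_nr": [5000205, 5000205],
        "predicted": [1, 1],
        "result": [1, 1],
    }
)

# index=False keeps the dataframe index out of the database table
sample.to_sql("sample_pred", conn, if_exists="replace", index=False)

# Read the table back to confirm what was stored
stored = pd.read_sql_query("SELECT * FROM sample_pred", conn)
print(list(stored.columns))
# ['day_hour', 'hour_id', 'haltestelle_nr', 'predicted', 'result']
```

Since `if_exists='replace'` drops and recreates the table on every run, the dashboard always shows only the most recently uploaded day.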