## **Convolutional Neural Network (CNN) model for Forecasting ML Models**

In this page, the CNN model is trained and tested for stream temperature prediction, using air temperature (Ta), wind (Wind), solar radiation (SR), relative humidity (HR), day of the year (DY), streamflow (Flow), precipitation (pp), and the shade factor (SF) as predictor variables. The daily stream temperature (Tw) is the only response variable.

Clean variables

In [None]:
### Clean variables
from IPython import get_ipython
get_ipython().run_line_magic('reset', '-sf')  # '.magic()' is deprecated in recent IPython versions

### **Reading of data, process and separation in train and test**

In [None]:
### Read Data in CSV format for temporal analysis
from pandas import read_csv
import numpy as np

# Dir:  D:\research\ML_model\new_data
#raw_d = read_csv('006_sb31_8v_norm_2012_2018.csv', header=0, index_col=0)  # For SB 31
#raw_d = read_csv('006_sb59_8v_norm_2006_2012.csv', header=0, index_col=0) # For SB 59
raw_d = read_csv('002_sb31_norm_2012_2017_for_train_test.csv', header=0, index_col=0)  # For SB 31

### Drop variables that are not needed
data = raw_d.drop(['scenario'],axis=1)
var_tested = list(data.columns.values)
print ("Variables to be tested:", list(data.columns.values)) # Print headers

### convert df pandas to np-array
data = np.asarray(data, dtype=np.float32) # Convert pandas df to np array

### split into Predictor and Response variables
X = data[:, :-1]
y = data[:, -1]
print ("Data dimension (X), (y):", X.shape, y.shape)

# Split into train and test (Random split)
from sklearn.model_selection import train_test_split
# Test-size: 30%, Train-size: 70%
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=1)
print("Training and test data dimensions: \n train_x, test_x, train_y, test_y: \n",
      train_x.shape, test_x.shape, train_y.shape, test_y.shape)

Variables to be tested: ['Ta', 'Wind', 'SR', 'HR', 'DY', 'Flow', 'pp', 'SF', 'Tw']
Data dimension (X), (y): (6438, 8) (6438,)
Training and test data dimensions: 
 train_x, test_x, train_y, test_y: 
 (4506, 8) (1932, 8) (4506,) (1932,)
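
The shapes printed above follow from the 70/30 split. As a minimal sketch, assuming scikit-learn rounds a fractional `test_size` up to the next whole sample and gives the remaining samples to the training set, the counts can be reproduced without loading any data. The helper `split_sizes` is hypothetical and not part of the notebook.

```python
import math


def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever is left.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 6438 rows are reported for the SB 31 file above
n_train, n_test = split_sizes(6438)
print(n_train, n_test)
```

For the 6438-row file this gives 4506 training rows and 1932 test rows, which matches the shapes printed by the cell.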


### Main libraries

In [None]:
import random
from numpy import mean, std
from math import sqrt
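from keras.layers import Dense, BatchNormalization, Dropout
from keras.layers import Conv1D, MaxPooling1D, Flatten  # needed by the CNN layers in the tuning loop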
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error
from tensorflow.keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl


### **Tuning of Hyperparameters**
Eleven hyperparameters were tuned: activation function, kernel_initializer (initial weights), optimizer, learning_rate, n_epochs, batch_size, number of neurons, first set of hidden layers (layers1), second set of hidden layers (layers2), normalization, and dropout.

Several iterations were performed to find the best set of hyperparameters.

Running this part can take several hours (more than 30 hours on some occasions).

Results are displayed in a table for a better comparison of the performance of hyperparameters.
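
The tuning described above is a random search: each iteration draws one value per hyperparameter and trains a model with that combination. A minimal sketch of the sampling step alone, with no model training, could look like the following. The names `SEARCH_SPACE` and `sample_config` are illustrative assumptions; the candidate lists are copied from the tuning cell, and values that the cell keeps fixed (learning rate, epochs, batch size, neurons) are left out.

```python
import random

# Candidate values copied from the tuning cell of this notebook
SEARCH_SPACE = {
    "activation": ["relu", "sigmoid", "softplus", "softsign", "tanh",
                   "selu", "elu", "exponential", "LeakyReLU"],
    "init_w": ["normal", "uniform", "zeros"],
    "optimizer": ["SGD", "Adam", "RMSprop", "Adadelta", "Adagrad",
                  "Adamax", "Nadam", "Ftrl"],
    "layers1": [0, 1, 2, 3],
    "layers2": [0, 1, 2, 3],
    "normalization": [False, True],
    "dropout": [False, True],
}


def sample_config(rng):
    """Draw one random hyperparameter combination."""
    config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    # Dropout rate between 0.50 and 0.80, as in the tuning cell
    config["drop_rate"] = rng.randint(50, 80) / 100
    return config


rng = random.Random(1)  # seeded so the drawn sets can be reproduced
configs = [sample_config(rng) for _ in range(10)]
for number, config in enumerate(configs, start=1):
    print("Set #", number, config)
```

Seeding the generator is a design choice of this sketch; the notebook itself draws from the unseeded global `random` module, so its sets differ between runs.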

In [None]:
act_all, opt_all, l_rate_all, n_epochs_all, batch_size_all = [],[],[],[],[]
n_neur_all, layers1_all, layers2_all =[],[],[]
normaliz_d_all, drop_d_all, drop_rate_all = [],[],[]
perfm1, perfm2 = [],[]

for i in range(10):
  print ("Set # "+str(i+1)+" in process .................")
  ### Set of parameters -------------------------------------------
  ### Parameters to be Tuned
  ### Activation, init_w, optimizer, learning_rate, batch_size, n_epochs,
  ### n_neur, layers1, layers2, normalization, dropout

  ### Hyperparameters
  # 1. Activation selection
  activat_n = ['relu','sigmoid','softplus','softsign','tanh','selu','elu','exponential','LeakyReLU']
  #activat_n = ['relu']
  activat_id = random.randint(0, len(activat_n)-1)
  activat_s = activat_n[activat_id]

  # 2. kernel_initializer (initial weights)
  init_w = ['normal','uniform','zeros']
  #init_w = ['uniform']
  init_w_id = random.randint(0, len(init_w)-1)
  init_w_s = init_w[init_w_id]

  # 3 & 4. Optimizer and learning-rate selection
  l_rate = 0.005 # random.randint(1,200)/1000
  opt1 = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl']
  #opt1 = ['Adagrad']
  opt_id = random.randint(0, len(opt1)-1)
  opt2= {'Adam':Adam(learning_rate=l_rate), 'SGD':SGD(learning_rate=l_rate),
                  'RMSprop':RMSprop(learning_rate=l_rate), 'Adadelta':Adadelta(learning_rate=l_rate),
                  'Adagrad':Adagrad(learning_rate=l_rate), 'Adamax':Adamax(learning_rate=l_rate),
                  'Nadam':Nadam(learning_rate=l_rate), 'Ftrl':Ftrl(learning_rate=l_rate)}

  #print ("opt_id:",opt_id, ";   Optmzr:", opt2[opt1[opt_id]].__dict__["_name"])

  # 5 & 6. n_epochs, batch_size,
  n_epochs = 1500 # random.randint(100, 500)
  batch_size = 100 # random.randint(100, 300)

  # 7. number of neurons
  n_neur = 250 # random.randint(100, 300)

  # 8 & 9 number of layers
  layers1 = random.randint(0, 3) # 1 #
  layers2 = random.randint(0, 3) # 1 #

  # 10. normalization
  normaliz_d = random.randint(0, 1) # 1 #

  # 11. dropout
  drop_d = random.randint(0, 1) # 1 #
  drop_rate = random.randint(50, 80)/100 # 0.5 - 0.8 good values
  ### End set of parameters -----------------------------

  ### Model definition
  #params = [activationL, init_weight, optimizerL, n_epochs, batch_size, learning_rate]
  n_input = len(train_x[0]) # Number of input variables
  model = Sequential()

  ### Hidden layer 0
  #model.add(Dense(n_neur, activation = activat_s, input_dim=n_input, kernel_initializer =init_w_s))# 'uniform'))

  ### CNN layer 0
  #b_model.add(Dense(250, activation = 'relu', input_dim=n_input, kernel_initializer = 'uniform'))
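  filters= 128 # 256 # Suggested between 30 and 128
  kernel_size_cnn1 = 3
  kernel_size_cnn2 = 1

  # The first kernel cannot be longer than the number of input variables
  if n_input < 3:
    kernel_size_cnn1 = 1

  # 'conv2_d' was not defined in the original cell; a random 0/1 toggle is
  # assumed here, following the pattern used for normaliz_d and drop_d
  conv2_d = random.randint(0, 1)

  model.add(Conv1D(filters, kernel_size_cnn1, activation='relu', input_shape=(n_input, 1))) # n_filters=128, n_kernel=3

  if conv2_d > 0.5:
    model.add(Conv1D(filters, kernel_size_cnn2, activation='relu'))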

  model.add(MaxPooling1D(padding='same'))
  model.add(Flatten())


  if normaliz_d > 0.5:
    model.add(BatchNormalization())

  ### Hidden layer 1
  for _ in range(layers1):  # '_' avoids reusing the outer loop variable 'i'
    model.add(Dense(n_neur, activation = activat_s))
    #model.add(Dense(n_neur, activation = activat_s))
    #model.add(Dense(n_neur, activation = activat_s))

  if drop_d > 0.5:
    model.add(Dropout(drop_rate, seed=123))

  ### Hidden layer 2
  for _ in range(layers2):  # '_' avoids reusing the outer loop variable 'i'
    model.add(Dense(n_neur, activation = activat_s))
    #model.add(Dense(n_neur, activation = activat_s))

  ### Output layer
  model.add(Dense(1)) #model.add(Dense(1, activation='sigmoid'))
  #model.compile(loss='mse', optimizer=params[2]) # Compiling the model
  #model.compile(loss='mse', optimizer=Adam(learning_rate = l_rate)) # 0.8279
  model.compile(loss='mse', optimizer = opt2[opt1[opt_id]]) # 0.8279

  ### Fit the CNN model

  # More about 'EarlyStopping': https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
  # "batch_size" can range from 1 to n_samples. Ref: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
  early_stop = EarlyStopping(monitor='loss', patience=20, verbose=1, mode='auto') # Training stops after 20 epochs ('patience') without improvement in the training loss
  hist_model = model.fit(train_x, train_y,validation_data=(test_x, test_y),
                        epochs=n_epochs, batch_size=batch_size, verbose=0, callbacks=[early_stop])
  #print(hist_model.history['loss']) # Get list of 'loss' at each epoch
  #all_loss.append(hist_model.history['loss']) # For other error reports see: https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
  #all_val_loss.append(hist_model.history['val_loss']) # This loss is from the 'test' data

  print (activat_id, activat_n[activat_id])
  print (opt_id, opt1[opt_id])  # optimizer name taken from the list, not from a private attribute
  print ("learning_rate:", l_rate)
  print ("n_epochs, batch_size:", n_epochs, batch_size)
  print ("neurons, L1, L2: ", n_neur, layers1, layers2)
  #print ("Normalization:", normaliz_d)
  print ("Normalization:", ("Yes" if normaliz_d else "No"))
  #print ("Dropout:", drop_d, drop_rate)
  print ("Dropout:", ("Yes" if drop_d else "No"), " ", (drop_rate if drop_d else ""))
  print ("Performance", round(hist_model.history['loss'][-1],4)," ", round(hist_model.history['val_loss'][-1],4))
  print ()

  act_all.append(activat_n[activat_id])
  opt_all.append(opt1[opt_id])
  l_rate_all.append(l_rate)
  n_epochs_all.append(n_epochs)
  batch_size_all.append(batch_size)
  n_neur_all.append(n_neur)
  layers1_all.append(layers1)
  layers2_all.append(layers2)
  normaliz_d_all.append("Yes" if normaliz_d else "No")
  drop_d_all.append("Yes" if drop_d else "No")
  drop_rate_all.append(drop_rate)
  perfm1.append(round(hist_model.history['loss'][-1],3))
  perfm2.append(round(hist_model.history['val_loss'][-1],3))
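
The loop above relies on the Keras `EarlyStopping` callback with `monitor='loss'` and `patience=20`. As a simplified pure-Python sketch of the patience rule only (it ignores options such as `min_delta` and `restore_best_weights`, and the function name `stopping_epoch` is hypothetical), training stops once the monitored loss has failed to improve for `patience` consecutive epochs:

```python
def stopping_epoch(losses, patience):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified rule: an epoch counts as an improvement only if its loss
    is strictly lower than the best loss seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            wait = 0          # improvement resets the counter
        else:
            wait += 1         # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None               # ran through all epochs without stopping


# Placeholder loss curve for illustration only, not taken from the notebook
history = [5.0, 4.0, 3.0, 3.0, 3.1, 3.2]
print(stopping_epoch(history, patience=2))
```

Because the callback monitors the training loss rather than `val_loss`, it reacts to stalled optimization and not to overfitting on the test data.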



In [None]:
### Print results of tuning
headers = ['Train_err','Test_err','Activat','optimz','learning_r','Epochs','Batch',
           'Neurons','Layers1','Layers2','Normalzn','Dropout','Drop_r']
scores = [perfm1,perfm2,act_all, opt_all, l_rate_all,
      n_epochs_all, batch_size_all, n_neur_all, layers1_all,
      layers2_all, normaliz_d_all, drop_d_all, drop_rate_all
      ]
scores_t = list(map(list, zip(*scores)))
#print (scores_t)

from tabulate import tabulate
print(tabulate(scores_t, headers=headers, tablefmt='orgtbl'))
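
The table above lists the sets in the order they were run. A small sketch, assuming the same column-wise lists as in the cell, shows how the rows could be ranked by test error before printing so that the best set appears first. The helper `rank_by_test_error` is hypothetical, and the values below are placeholders, not results from this notebook.

```python
def rank_by_test_error(columns, test_err_index=1):
    """Transpose column lists into rows and sort them by test error."""
    rows = [list(row) for row in zip(*columns)]
    return sorted(rows, key=lambda row: row[test_err_index])


# Placeholder values for illustration only
train_err = [0.90, 0.70, 0.80]
test_err = [0.95, 0.85, 0.75]
activation = ["tanh", "relu", "elu"]

ranked = rank_by_test_error([train_err, test_err, activation])
for row in ranked:
    print(row)
```

The ranked rows can be passed to `tabulate` in place of `scores_t`.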

## **Modeling employing tuned parameters**
Using the model with tuned parameters, stream temperature predictions were performed for eleven sets of predictor variables.

In [None]:
import random
from numpy import mean, std
from math import sqrt
from keras.layers import Dense, BatchNormalization, Dropout
from keras.models import Sequential
from keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error
from tensorflow.keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl

In [None]:
### Read Data in CSV format for temporal analysis
from pandas import read_csv
import numpy as np

# Dir:  D:\research\ML_model\new_data
#raw_d = read_csv('006_sb31_8v_norm_2012_2018.csv', header=0, index_col=0)  # For SB 31
#raw_d = read_csv('006_sb59_8v_norm_2006_2012.csv', header=0, index_col=0) # For SB 59
#raw_d = read_csv('002_sb31_norm_2012_2017_for_train_test.csv', header=0, index_col=0)  # For SB 31
raw_d = read_csv('004_sb59_norm_2006_2011_for_train_test.csv', header=0, index_col=0)  # For SB 59

### Drop variables that are not needed
data = raw_d.drop(['scenario'],axis=1)
var_tested = list(data.columns.values)
print ("Variables to be tested:", list(data.columns.values)) # Print headers

### convert df pandas to np-array
data = np.asarray(data, dtype=np.float32) # Convert pandas df to np array

### split into Predictor and Response variables
X = data[:, :-1]
y = data[:, -1]
print ("Data dimension (X), (y):", X.shape, y.shape)

# Split into train and test (Random split)
from sklearn.model_selection import train_test_split
# Test-size: 30%, Train-size: 70%
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=1)
print("Training and test data dimensions: \n train_x, test_x, train_y, test_y: \n",
      train_x.shape, test_x.shape, train_y.shape, test_y.shape)

Variables to be tested: ['Ta', 'Wind', 'SR', 'HR', 'DY', 'Flow', 'pp', 'SF', 'Tw_59']
Data dimension (X), (y): (5850, 8) (5850,)
Training and test data dimensions: 
 train_x, test_x, train_y, test_y: 
 (4095, 8) (1755, 8) (4095,) (1755,)


In [None]:
from keras.layers import Conv1D, MaxPooling1D, Flatten  # 'keras.layers.convolutional' is not available in recent Keras versions

### CNN model definition with best parameters (This setting is not automatically updated)
def best_nn_model(n_inp):
  n_input = n_inp #len(train_x[0]) # Number of input variables
  b_model = Sequential()

  ### CNN layer 0
  #b_model.add(Dense(250, activation = 'relu', input_dim=n_input, kernel_initializer = 'uniform'))
  filters= 128 # 256 # Suggested between 30 and 128

  # kernel_size_cnn1 must be <= n_inputs. If n_inputs=2 => kernel_size_cnn1 = 2 or 1
  #kernel_size_cnn1 = 3 # if n_predictors < 3 => kernel_size_cnn1 = n_predictors
  #kernel_size_cnn2 = 1

  kernel_size_cnn1 = 3 if n_inp >= 3 else n_inp
  kernel_size_cnn2 = 3 if n_inp >= 5 else 1
  #! print ("n_inp:",n_inp, "    kernel 1:",kernel_size_cnn1, "      kernel 2:",kernel_size_cnn2)

  b_model.add(Conv1D(filters, kernel_size_cnn1, activation='relu', input_shape=(n_input, 1))) #n_filters=256, n_kernel=3
  b_model.add(Conv1D(filters, kernel_size_cnn2, activation='relu'))
  b_model.add(MaxPooling1D(padding='same'))
  b_model.add(Flatten())

  #b_model.add(BatchNormalization())

  ### Hidden layer 1
  b_model.add(Dense(250, activation = 'relu'))
  b_model.add(Dense(250, activation = 'relu'))
  b_model.add(Dense(250, activation = 'relu'))
  #model.add(Dropout(drop_rate, seed=123))

  ### Hidden layer 2
  b_model.add(Dense(250, activation = 'relu'))
  #b_model.add(Dense(250, activation = 'relu'))
  #b_model.add(Dense(250, activation = 'relu'))

  ### Output layer
  b_model.add(Dense(1))

  b_model.compile(loss='mse', optimizer=Adagrad(learning_rate = 0.005))
  return b_model
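
The kernel-size rules in `best_nn_model` exist because each convolution shortens the sequence of predictors. A minimal sketch of that arithmetic, assuming the Keras defaults of `padding='valid'` and stride 1 for `Conv1D` and `pool_size=2` for `MaxPooling1D`, is shown below. The helper `flattened_size` is hypothetical and builds no model.

```python
import math


def flattened_size(n_inp, filters=128):
    """Number of values reaching the first Dense layer for n_inp predictors."""
    # Same kernel rules as in best_nn_model
    k1 = 3 if n_inp >= 3 else n_inp
    k2 = 3 if n_inp >= 5 else 1

    length = n_inp - k1 + 1          # first Conv1D, 'valid' padding
    length = length - k2 + 1         # second Conv1D, 'valid' padding
    if length < 1:
        raise ValueError("kernel sizes are too large for this input length")
    length = math.ceil(length / 2)   # MaxPooling1D(pool_size=2, padding='same')
    return length * filters          # Flatten: positions x filters


for n in range(1, 9):
    print(n, "predictors ->", flattened_size(n), "flattened values")
```

Under these assumptions every input length from 1 to 8 leaves at least one position after the two convolutions, which is the condition the kernel-size rules are meant to guarantee.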

In [None]:
### Get combination of children
import itertools, copy, heapq

data = raw_d.drop(['scenario'],axis=1)

n_vars = 1  # <------------- SET
root = ['Ta']
listvar = ['Flow', 'DY', 'SF','SR','pp','HR','Wind']
listvar2 = copy.deepcopy(listvar)
c72 = list(itertools.combinations(listvar2,n_vars-1))
print (c72)
print (len(c72))
set_n = []
sets = []
for sid in c72:
  #nnsid = root + list(sid)+['Tw']
  nnsid = root + list(sid)+['Tw_59']
  print ("set_of_vars:", nnsid)
  set_n.append(nnsid)
  st_d = data[nnsid]
  sets.append(st_d)

print (len(sets))

[()]
1
set_of_vars: ['Ta', 'Tw_59']
1
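
The cell above always keeps `Ta` as the root predictor and adds `n_vars - 1` of the seven remaining variables. The sketch below wraps the same `itertools.combinations` logic in a function so that the number of sets for a given `n_vars` can be checked before training. The function name `predictor_sets` is an assumption introduced for illustration.

```python
import itertools


def predictor_sets(root, candidates, n_vars, target):
    """Build every column list made of the root, n_vars - 1 candidates, and the target."""
    return [root + list(combo) + [target]
            for combo in itertools.combinations(candidates, n_vars - 1)]


candidates = ['Flow', 'DY', 'SF', 'SR', 'pp', 'HR', 'Wind']
for n_vars in range(1, 9):
    n_sets = len(predictor_sets(['Ta'], candidates, n_vars, 'Tw_59'))
    print(n_vars, "predictors ->", n_sets, "sets")
```

Since each set is trained `n_repeats` times, the count gives a quick estimate of how many models a chosen `n_vars` will fit.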


In [None]:
set_n

[['Ta', 'Flow', 'SF', 'SR', 'Wind', 'Tw_59'],
 ['Ta', 'Flow', 'SF', 'pp', 'HR', 'Tw_59'],
 ['Ta', 'Flow', 'SF', 'pp', 'Wind', 'Tw_59'],
 ['Ta', 'Flow', 'SF', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'Flow', 'SR', 'pp', 'HR', 'Tw_59'],
 ['Ta', 'Flow', 'SR', 'pp', 'Wind', 'Tw_59'],
 ['Ta', 'Flow', 'SR', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'Flow', 'pp', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'SR', 'pp', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'SR', 'HR', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'SR', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'pp', 'HR', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'pp', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'SF', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'SR', 'pp', 'HR', 'Tw_59'],
 ['Ta', 'DY', 'SR', 'pp', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'SR', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'DY', 'pp', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'SF', 'SR', 'pp', 'HR', 'Tw_59'],
 ['Ta', 'SF', 'SR', 'pp', 'Wind', 'Tw_59'],
 ['Ta', 'SF', 'SR', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'SF', 'pp', 'HR', 'Wind', 'Tw_59'],
 ['Ta', 'SR', 'pp', 'HR', 'Win

In [None]:
from termcolor import colored
from sklearn.model_selection import train_test_split
data = raw_d.drop(['scenario'],axis=1)

#sets = [set1,set2,set3,set4,set5,set6,set7]
labels=[str(r) for r in sets]
sets_names = [f'M{i}' for i in range(1, len(sets) + 1)]  # one label per predictor set
#print("Models to be tested: ", sets_names)

names = set_n

scores_all_train_sets = []
scores_all_test_sets = []
ccc = 0
for sett in sets:
  #print ("Model M"+str(ccc+1)+"......")
  print (colored("Model M"+str(ccc+1)+"......"+str(names[ccc][:-1]), 'red'))
  sett = np.asarray(sett, dtype=np.float32) # Convert pandas df to np array

  ### split into Predictor and Response variables
  Xn = sett[:, :-1]
  yn = sett[:, -1]
  print ("Data dimension (Xn), (yn):", Xn.shape, yn.shape)

  # Split into train and test (Random split)

  # Test-size: 30%, Train-size: 70%
  train_xn, test_xn, train_yn, test_yn = train_test_split(Xn, yn, test_size=0.3, random_state=1)
  print("Training and test data dimensions: \n train_x, train_y, test_x, test_y: \n",
        train_xn.shape, test_xn.shape, train_yn.shape, test_yn.shape)

  n_epochs = 1500   #  <-----  you might SET
  batch_size = 100

  all_loss, all_val_loss = [],[]
  all_preds, all_scores_train, all_scores_test = [], [], []

  n_repeats = 5  #     <----------------------- SET
  for i in range(n_repeats):
    print ("Sim", i+1)
    b_model = best_nn_model(len(train_xn[0]))   #  -----> call model
    early_stop = EarlyStopping(monitor='loss', patience=20, verbose=0, mode='auto') # Training stops after 20 epochs ('patience') without improvement in the training loss
    hist_model = b_model.fit(train_xn, train_yn,validation_data=(test_xn, test_yn),
                          epochs=n_epochs, batch_size=batch_size, verbose=0, callbacks=[early_stop])
    #print(hist_model.history['loss']) # Get list of 'loss' at each epoch
    all_loss.append(hist_model.history['loss']) # For other error reports see: https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
    all_val_loss.append(hist_model.history['val_loss']) # This loss is from the 'test' data

    print ("Loss:   ", hist_model.history['loss'][-1])
    print ("Val_loss:", hist_model.history['val_loss'][-1])


    ### Prediction
    ### Prediction on train
    pred_yn_train = b_model.predict(train_xn, verbose=0)
    pred_yn_train_flat = [item for sublist in pred_yn_train.tolist() for item in sublist]
    #print ("Prediction on train data (Sim",j+1, ") \n", pred_y_train_flat)
    all_preds.append(pred_yn_train_flat)
    #pred_y_test_flat.to_csv('predict.csv')  # save updated dataset

    # Prediction Error. Root mean squared error or RMSE
    error_train = sqrt(mean_squared_error(train_yn, pred_yn_train))#error = measure_rmse(test, predictions)
    #print("RMSE train (sim",j+1,"):",'  %.3f' % error_train) #E:

    all_scores_train.append(error_train) #all_scores.append(scores)
    #print (all_scores)

    ### Prediction on test
    pred_yn_test = b_model.predict(test_xn, verbose=0)
    pred_yn_test_flat = [item for sublist in pred_yn_test.tolist() for item in sublist]
    #print ("Prediction on test data (Sim",j+1, ") \n", pred_y_test_flat)
    all_preds.append(pred_yn_test_flat)
    #pred_y_test_flat.to_csv('predict.csv')  # save updated dataset

    # Prediction Error. Root mean squared error or RMSE
    error_test = sqrt(mean_squared_error(test_yn, pred_yn_test))#error = measure_rmse(test, predictions)
    #print("RMSE test(sim",j+1,"):",'   %.3f' % error_test) #E:

    all_scores_test.append(error_test) #all_scores.append(scores)
    #print (all_scores)
    print ()

  #print ("Tested variables (Header):", var_tested) # print column names
  # summarize and plot scores
  print ("Note: loss is MSE")
  scores_m, score_std = mean(all_scores_test), std(all_scores_test)
  print('%s: %.3f RMSE_avge (+/- %.3f)' % ('Average of '+str(n_repeats)+" repetitions", scores_m, score_std))


  #all_sets.append(all_scores_test) # Store scores of all models (M1, M2,...) as a list of lists
  scores_all_train_sets.append(all_scores_train) # Store train scores of all models (M1, M2,...) as a list of lists
  scores_all_test_sets.append(all_scores_test) # Store test scores of all models (M1, M2,...) as a list of lists

  print ()
  ccc += 1
  if ccc%10 == 0:
    print ("Preliminary print")
    print (scores_all_train_sets)
    print (scores_all_test_sets)

print (scores_all_train_sets)
print (scores_all_test_sets)

Model M1......['Ta']
Data dimension (Xn), (yn): (5850, 1) (5850,)
Training and test data dimensions: 
 train_x, train_y, test_x, test_y: 
 (4095, 1) (1755, 1) (4095,) (1755,)
Sim 1
Loss:    2.210084915161133
Val_loss: 2.1816301345825195

Sim 2
Loss:    2.2057275772094727
Val_loss: 2.1699931621551514

Sim 3
Loss:    2.2049720287323
Val_loss: 2.174150228500366

Sim 4
Loss:    2.209399700164795
Val_loss: 2.1659343242645264

Sim 5
Loss:    2.2306597232818604
Val_loss: 2.2189114093780518

Note: loss is MSE
Average of 5 repetitions: 1.477 RMSE_avge (+/- 0.006)

[[1.48825834161544, 1.4823845618657165, 1.4838341781789195, 1.4825316377194078, 1.499803450740757]]
[[1.4770341553816353, 1.4730897462726873, 1.4744999118622515, 1.4717114400191043, 1.4896012507430336]]
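
The reported loss is MSE, while the summary line reports RMSE. As a sketch, assuming the final `val_loss` of each repetition equals the test MSE of that repetition, the summary above can be recovered from the five `Val_loss` values printed for M1. The notebook itself computes RMSE from the predictions with `mean_squared_error`; the square-root shortcut below is only a cross-check.

```python
from math import sqrt
from statistics import fmean, pstdev

# Final Val_loss (MSE) of the five repetitions printed above for M1
val_loss = [2.1816301345825195, 2.1699931621551514, 2.174150228500366,
            2.1659343242645264, 2.2189114093780518]

rmse = [sqrt(v) for v in val_loss]   # RMSE is the square root of MSE
rmse_mean = fmean(rmse)
rmse_std = pstdev(rmse)              # population std, the default of numpy.std

print('%.3f RMSE_avge (+/- %.3f)' % (rmse_mean, rmse_std))
```

The rounded mean and spread agree with the "Average of 5 repetitions" line printed by the cell.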


In [None]:
from google.colab import files
import pandas as pd
all_sets_df = pd.DataFrame(scores_all_train_sets)
all_sets_df.to_csv('SB31_cnn_train.csv')
all_sets_df = pd.DataFrame(scores_all_test_sets)
all_sets_df.to_csv('SB31_cnn_test.csv')

In [None]:
#st1 = ['Ta', 'DY', 'SR', 'HR','Tw']
st20 = ['Ta', 'DY', 'SR', 'pp','HR', 'Wind', 'Tw_59']
st21 = ['Ta', 'SF', 'SR', 'pp', 'HR', 'Wind', 'Tw_59']

set_n = [st20, st21]

st_d20 = data[st20]
st_d21 = data[st21]

sets = [st_d20, st_d21]

In [None]:
from termcolor import colored
from sklearn.model_selection import train_test_split
data = raw_d.drop(['scenario'],axis=1)

#sets = [set1,set2,set3,set4,set5,set6,set7]
labels=[str(r) for r in sets]
sets_names = [f'M{i}' for i in range(1, len(sets) + 1)]  # one label per predictor set
#print("Models to be tested: ", sets_names)

names = set_n

scores_all_train_sets = []
scores_all_test_sets = []
ccc = 0
for sett in sets:
  #print ("Model M"+str(ccc+1)+"......")
  print (colored("Model M"+str(ccc+1)+"......"+str(names[ccc][:-1]), 'red'))
  sett = np.asarray(sett, dtype=np.float32) # Convert pandas df to np array

  ### split into Predictor and Response variables
  Xn = sett[:, :-1]
  yn = sett[:, -1]
  print ("Data dimension (Xn), (yn):", Xn.shape, yn.shape)

  # Split into train and test (Random split)

  # Test-size: 30%, Train-size: 70%
  train_xn, test_xn, train_yn, test_yn = train_test_split(Xn, yn, test_size=0.3, random_state=1)
  print("Training and test data dimensions: \n train_x, train_y, test_x, test_y: \n",
        train_xn.shape, test_xn.shape, train_yn.shape, test_yn.shape)

  n_epochs = 1500   #  <-----  you might SET
  batch_size = 100

  all_loss, all_val_loss = [],[]
  all_preds, all_scores_train, all_scores_test = [], [], []

  n_repeats = 5  #     <----------------------- SET
  for i in range(n_repeats):
    print ("Sim", i+1)
    b_model = best_nn_model(len(train_xn[0]))   #  -----> call model
    early_stop = EarlyStopping(monitor='loss', patience=20, verbose=1, mode='auto') # Training stops after 20 epochs ('patience') without improvement in the training loss
    hist_model = b_model.fit(train_xn, train_yn,validation_data=(test_xn, test_yn),
                          epochs=n_epochs, batch_size=batch_size, verbose=0, callbacks=[early_stop])
    #print(hist_model.history['loss']) # Get list of 'loss' at each epoch
    all_loss.append(hist_model.history['loss']) # For other error reports see: https://machinelearningmastery.com/display-deep-learning-model-training-history-in-keras/
    all_val_loss.append(hist_model.history['val_loss']) # This loss is from the 'test' data

    print ("Loss:   ", hist_model.history['loss'][-1])
    print ("Val_loss:", hist_model.history['val_loss'][-1])


    ### Prediction
    ### Prediction on train
    pred_yn_train = b_model.predict(train_xn, verbose=0)
    pred_yn_train_flat = [item for sublist in pred_yn_train.tolist() for item in sublist]
    #print ("Prediction on train data (Sim",j+1, ") \n", pred_y_train_flat)
    all_preds.append(pred_yn_train_flat)
    #pred_y_test_flat.to_csv('predict.csv')  # save updated dataset

    # Prediction Error. Root mean squared error or RMSE
    error_train = sqrt(mean_squared_error(train_yn, pred_yn_train))#error = measure_rmse(test, predictions)
    #print("RMSE train (sim",j+1,"):",'  %.3f' % error_train) #E:

    all_scores_train.append(error_train) #all_scores.append(scores)
    #print (all_scores)

    ### Prediction on test
    pred_yn_test = b_model.predict(test_xn, verbose=0)
    pred_yn_test_flat = [item for sublist in pred_yn_test.tolist() for item in sublist]
    #print ("Prediction on test data (Sim",j+1, ") \n", pred_y_test_flat)
    all_preds.append(pred_yn_test_flat)
    #pred_y_test_flat.to_csv('predict.csv')  # save updated dataset

    # Prediction Error. Root mean squared error or RMSE
    error_test = sqrt(mean_squared_error(test_yn, pred_yn_test))#error = measure_rmse(test, predictions)
    #print("RMSE test(sim",j+1,"):",'   %.3f' % error_test) #E:

    all_scores_test.append(error_test) #all_scores.append(scores)
    #print (all_scores)
    print ()

  #print ("Tested variables (Header):", var_tested) # print column names
  # summarize and plot scores
  print ("Note: loss is MSE")
  scores_m, score_std = mean(all_scores_test), std(all_scores_test)
  print('%s: %.3f RMSE_avge (+/- %.3f)' % ('Average of '+str(n_repeats)+" repetitions", scores_m, score_std))


  #all_sets.append(all_scores_test) # Store scores of all models (M1, M2,...) as a list of lists
  scores_all_train_sets.append(all_scores_train) # Store train scores of all models (M1, M2,...) as a list of lists
  scores_all_test_sets.append(all_scores_test) # Store test scores of all models (M1, M2,...) as a list of lists

  print ()
  ccc += 1

  if ccc%10 == 0:
    print ("Preliminary print")
    print (scores_all_train_sets)
    print (scores_all_test_sets)

print (scores_all_train_sets)
print (scores_all_test_sets)


Model M1......['Ta', 'Flow']
Data dimension (Xn), (yn): (5850, 2) (5850,)
Training and test data dimensions: 
 train_x, train_y, test_x, test_y: 
 (4095, 2) (1755, 2) (4095,) (1755,)
Sim 1
Epoch 164: early stopping
Loss:    1.8376686573028564
Val_loss: 1.911328673362732

Sim 2
Epoch 218: early stopping
Loss:    1.8203673362731934
Val_loss: 1.9008475542068481

Sim 3
Epoch 126: early stopping
Loss:    1.8375784158706665
Val_loss: 1.9079190492630005

Sim 4
Epoch 125: early stopping
Loss:    1.8425848484039307
Val_loss: 1.9177422523498535

Sim 5
Epoch 195: early stopping
Loss:    1.8223576545715332
Val_loss: 1.89572274684906

Note: loss is MSE
Average of 5 repetitions: 1.381 RMSE_avge (+/- 0.003)

Model M2......['Ta', 'DY']
Data dimension (Xn), (yn): (5850, 2) (5850,)
Training and test data dimensions: 
 train_x, train_y, test_x, test_y: 
 (4095, 2) (1755, 2) (4095,) (1755,)
Sim 1
Epoch 355: early stopping
Loss:    1.697168231010437
Val_loss: 1.672511100769043

Sim 2
Epoc

In [None]:
from google.colab import files
import pandas as pd
all_sets_df = pd.DataFrame(scores_all_train_sets)
all_sets_df.to_csv('SB31_cnn_train.csv')
all_sets_df = pd.DataFrame(scores_all_test_sets)
all_sets_df.to_csv('SB31_cnn_test.csv')


In [None]:
### Plot of results
import matplotlib.pyplot as plt
scores_all_models = scores_all_test_sets # test RMSE of every model ('all_sets' is never defined above)
#sets_names = ['M-32-21', 'M-256-21','M-256-31','M-32-31','M-128-31'] # It is computed above
### Plot performance of all the models
fig = plt.figure(figsize=(8, 4), dpi=80)
plt.boxplot(scores_all_models, labels=[f'M{i}' for i in range(1, len(scores_all_models) + 1)], showmeans=True,
            meanprops={"marker":"s","markersize":"4","markerfacecolor":"white",
                       "markeredgecolor":"blue", "markeredgewidth":"2"})

plt.xlabel("CNN Model")
plt.ylabel("RMSE")
plt.grid()
plt.show()

fig.savefig('cnn_sb31.png', dpi=540)

### Resources to print as pdf

In [None]:
!apt-get install texlive texlive-xetex texlive-latex-extra pandoc
!pip install pypandoc
!sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-plain-generic

In [None]:
!jupyter nbconvert --to pdf /content/drive/MyDrive/Colab_Notebooks/MLP_models_for_classes/011_cnn_sb31_submit2.ipynb

[NbConvertApp] Converting notebook /content/drive/MyDrive/Colab_Notebooks/MLP_models_for_classes/011_cnn_sb31_submit2.ipynb to pdf
[NbConvertApp] Support files will be in 011_cnn_sb31_submit2_files/
[NbConvertApp] Making directory ./011_cnn_sb31_submit2_files
[NbConvertApp] Writing 77222 bytes to ./notebook.tex
[NbConvertApp] Building PDF
[NbConvertApp] Running xelatex 3 times: ['xelatex', './notebook.tex', '-quiet']
[NbConvertApp] Running bibtex 1 time: ['bibtex', './notebook']
[NbConvertApp] PDF successfully created
[NbConvertApp] Writing 165716 bytes to /content/drive/MyDrive/Colab_Notebooks/MLP_models_for_classes/011_cnn_sb31_submit2.pdf
