<a href="https://colab.research.google.com/github/mariajosemv/Skopt-hyperparameter-tutorial/blob/master/scikit_optimize_regression_tutorial.ipynb." target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initialization

In [60]:
import pandas as pd
import numpy as np
import io
from google.colab import files

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [61]:
%cd '/content/drive/My Drive/Colab Notebooks/curso-redes-neuronales/proyecto-del-curso'
%ls

/content/drive/My Drive/Colab Notebooks/curso-redes-neuronales/proyecto-del-curso
cars.parquet        data-engineer.ipynb             experiment
craiglist_cars.csv  design-training-and-evaluation  experiment-back-up


In [62]:
cars = pd.read_parquet('cars.parquet')

# Train, validation and test split

In [63]:
target = cars['price']
cars.drop('price', axis=1, inplace=True)

In [64]:
from sklearn.model_selection import train_test_split
# Train/test 80:20
x_train, x_test, y_train, y_test = train_test_split(cars, target, test_size=0.2,random_state=2020)
# Train/Validation 90:10
x_train, x_val, y_train, y_val = train_test_split(x_train,y_train, test_size=0.1, random_state=2020)

print("Shape of x_train:",x_train.shape)
print("Shape of x_test:",x_test.shape)
print("Shape of x_val:",x_val.shape)
print("Shape of y_train:",y_train.shape)
print("Shape of y_test:",y_test.shape)
print("Shape of y_val:",y_val.shape)

Shape of x_train: (312869, 99)
Shape of x_test: (86909, 99)
Shape of x_val: (34764, 99)
Shape of y_train: (312869,)
Shape of y_test: (86909,)
Shape of y_val: (34764,)
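Splitting 80:20 and then 90:10 leaves roughly 72% of the rows for training, 8% for validation and 20% for testing. A quick check of that arithmetic, using only the shapes printed above (the variable names are illustrative):

```python
# Row counts taken from the shapes printed above.
n_train, n_val, n_test = 312869, 34764, 86909
n_total = n_train + n_val + n_test

# Fraction of the full dataset that ends up in each subset.
fractions = {
    "train": n_train / n_total,
    "val": n_val / n_total,
    "test": n_test / n_total,
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f}")
```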


## Standardization

In [65]:
print(x_train.shape)
print(y_train.shape)

(312869, 99)
(312869,)


In [66]:
y_train = y_train.values.reshape(-1,1)
y_test = y_test.values.reshape(-1,1)
y_val = y_val.values.reshape(-1,1)

In [67]:
from sklearn.preprocessing import StandardScaler

# scaler for x
scaler = StandardScaler()
scaler.fit(x_train) 
x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)
x_test_scaled = scaler.transform(x_test)

# scaler for y
scaler2 = StandardScaler()
scaler2.fit(y_train) 
y_train_scaled = scaler2.transform(y_train)
y_val_scaled = scaler2.transform(y_val)
y_test_scaled = scaler2.transform(y_test)
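`StandardScaler` subtracts the training mean and divides by the training standard deviation (population form, `ddof=0`), then reuses those same two numbers for the validation and test sets. A minimal plain-Python sketch of that idea for a single column follows. The prices and helper names are made up for illustration, and this is not scikit-learn's implementation:

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Return (mean, std) computed from the training values only."""
    return fmean(values), pstdev(values)

def transform(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

def inverse_transform(values, mean, std):
    """Map standardized values back to the original units."""
    return [v * std + mean for v in values]

train_prices = [2000.0, 8000.0, 10000.0, 18000.0, 22000.0]  # hypothetical
val_prices = [5000.0, 12000.0]                                # hypothetical

mean, std = fit_scaler(train_prices)
train_scaled = transform(train_prices, mean, std)
val_scaled = transform(val_prices, mean, std)  # reuses the training statistics

restored = inverse_transform(val_scaled, mean, std)
print(restored)
```

Fitting on the training set alone keeps information from the validation and test sets out of the preprocessing step.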

# Estimated Architecture

- The input data has 99 features, so the starting number of neurons is 99 x 2 = 198, rounded to the nearest power of two: $2^8 = 256$.
- As we are facing a regression problem, the output activation function will be **Linear**, the loss will be **MSE** and the tracked metric will be **MAE**.

**Important**:

```
K.clear_session()
tensorflow.compat.v1.reset_default_graph()    
```

Running these two lines of code can solve a lot of the TensorFlow errors that seem impossible to read. They clear much of the state TensorFlow has stored.

> Run this code (i.e. the function `reset()`) before every hyperparameter search or anything else that creates a new Keras/TensorFlow model.



In [None]:
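import tensorflow
from tensorflow.keras import backend as K

def reset():
  # Drop the global Keras state and the TF1-style default graph.
  K.clear_session()
  tensorflow.compat.v1.reset_default_graph()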

reset()

# Optimization Functions
Tutorial guide:
- [Medium](https://medium.com/@crawftv/parameter-hyperparameter-tuning-with-bayesian-optimization-7acf42d348e1)
- [GitHub](https://github.com/crawftv/Skopt-hyperparameter-tutorial/blob/master/scikit_optimize_tutorial.ipynb)

See also:
- [Blairhudson](http://blairhudson.com/blog/posts/optimising-hyper-parameters-efficiently-with-scikit-optimize/)

In [68]:
!pip install scikit-optimize
import skopt

from skopt import gbrt_minimize, gp_minimize
from skopt.utils import use_named_args
from skopt.space import Real, Categorical, Integer  

import tensorflow
from tensorflow.keras import backend as K



## 1. Defining hyperparameters to optimize

In [69]:
import math
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import plot_model

input_shape= x_train_scaled[0].shape
input_shape[0]

potency = int(round(math.log(input_shape[0]*2,2),0))
potency

2**potency

256

In [70]:
dim_num_dense_layers = Integer(low=1, high=4, name='num_dense_layers')
dim_num_input_nodes = Integer(low=1, high=512, name='num_input_nodes')
dim_num_dense_nodes = Integer(low=1, high=2**potency, name='num_dense_nodes')
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
dim_batch_size = Integer(low=16, high=2048, name='batch_size')
# not added to `dimensions` below; the fitness function trains for a fixed 5 epochs
dim_num_epochs = Integer(low=3,high=100, name='num_epochs')

dimensions = [dim_num_dense_layers,
              dim_num_input_nodes,
              dim_num_dense_nodes,
              dim_activation,
              dim_batch_size,
              #dim_adam_decay,
             ]
default_parameters = [1,2**potency, 13, 'relu',64]
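The optimizers pass each candidate to the objective as a plain list ordered like `dimensions`, and `use_named_args` turns that list into keyword arguments by dimension name. That is why `default_parameters` must follow the same order. A plain-Python sketch of the mapping (the helper `to_kwargs` is hypothetical and not part of scikit-optimize):

```python
# Dimension names, in the same order as the `dimensions` list.
dimension_names = [
    "num_dense_layers",
    "num_input_nodes",
    "num_dense_nodes",
    "activation",
    "batch_size",
]

def to_kwargs(point, names=dimension_names):
    """Pair each value of a search point with its dimension name."""
    if len(point) != len(names):
        raise ValueError("point and dimensions must have the same length")
    return dict(zip(names, point))

# The default starting point used in this notebook (2**potency == 256).
default_point = [1, 256, 13, "relu", 64]
print(to_kwargs(default_point))
```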

## 2. Create models that will be tested

In [71]:
def create_model(num_dense_layers,num_input_nodes,
                 num_dense_nodes, activation):
    #start the model making process and create our first layer
    model = Sequential()
    model.add(Dense(num_input_nodes, input_shape= input_shape, activation=activation
                   ))
    #create a loop making a new dense layer for the amount passed to this model.
    #naming the layers helps avoid tensorflow error deep in the stack trace.
    for i in range(num_dense_layers):
        name = 'layer_dense_{0}'.format(i+1)
        model.add(Dense(num_dense_nodes,
                        activation=activation,
                        name=name
                 ))
    # add dropout to avoid overfitting
    model.add(Dropout(0.2))
    
    #add our regression layer.
    model.add(Dense(1,activation='linear'))
    
    #setup our optimizer and compile
    model.compile(optimizer = "adam",loss="mse",metrics=["mean_absolute_error"])

    return model

## Fitness function

We use `create_model` to build the model, fit it, print the MAE, and delete the model.

In [74]:
@use_named_args(dimensions=dimensions)
def fitness(num_dense_layers, num_input_nodes, 
            num_dense_nodes,activation, batch_size):

    model = create_model(num_dense_layers=num_dense_layers,
                         num_input_nodes=num_input_nodes,
                         num_dense_nodes=num_dense_nodes,
                         activation=activation,
                        )
    

    # named blackbox because it represents the structure
    blackbox = model.fit(x=x_train_scaled,
                        y=y_train_scaled,
                        epochs=5,
                        batch_size=batch_size,
                        validation_split=0.10,
                        )
    
    # take the training MAE of the last epoch; the validation score is
    # stored under 'val_mean_absolute_error'
    mae = blackbox.history['mean_absolute_error'][-1]
    # Print the regression metric
    print()
    print(f"mae: {mae}")
    print()

    # Delete the Keras model with these hyper-parameters from memory.
    del model
    
    # Clear the Keras session, otherwise it will keep adding new
    # models to the same TensorFlow graph each time we create
    # a model with a different set of hyper-parameters.
    K.clear_session()
    tensorflow.compat.v1.reset_default_graph()    
    # the optimizer minimizes the returned score, so we return the MAE as is
    return mae
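Because `fit` is called with `validation_split`, the Keras `History` object records both training and validation metrics, the latter under keys prefixed with `val_`. A small sketch with a made-up history dictionary (all numbers are invented, and the helper name is hypothetical) showing how either score could be read:

```python
# Hypothetical contents of `blackbox.history` after 3 epochs (invented numbers).
history = {
    "loss": [0.52, 0.47, 0.45],
    "mean_absolute_error": [0.50, 0.46, 0.44],
    "val_loss": [0.55, 0.50, 0.49],
    "val_mean_absolute_error": [0.52, 0.48, 0.47],
}

def last_epoch_score(history, metric="mean_absolute_error", validation=True):
    """Return the last-epoch value of `metric` from the validation or training curve."""
    key = f"val_{metric}" if validation else metric
    return history[key][-1]

print(last_epoch_score(history, validation=False))  # training MAE
print(last_epoch_score(history, validation=True))   # validation MAE
```

Scoring on the validation curve is the usual choice when the goal is to pick hyperparameters that generalize rather than ones that fit the training data best.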

# Start the search!

## Gaussian Process

In [75]:
gp_result = gp_minimize(func=fitness,
                            dimensions=dimensions,
                            n_calls=12,
                            noise= 0.01,
                            n_jobs=-1,
                            kappa = 5,
                            x0=default_parameters)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4684898555278778

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4366424083709717

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.43136730790138245

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.42775803804397583

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.48450008034706116

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.42936038970947266

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.43255022168159485

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4532989263534546

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.43098586797714233

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.42819249629974365

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.41710367798805237

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4418030381202698



In [76]:
# the optimal parameters are:
gp_result.x

[4, 145, 239, 'relu', 418]

In [77]:
final_gp_results = pd.concat([pd.DataFrame(gp_result.x_iters, 
                        columns = ["hidden layers",
                                   "input layer nodes",
                                   "hidden layer nodes",
                                    "activation function",
                                   "batch size"]),
(pd.Series(gp_result.func_vals, name="mae"))], axis=1)

final_gp_results

| | hidden layers | input layer nodes | hidden layer nodes | activation function | batch size | mae |
|---|---|---|---|---|---|---|
| 0 | 1 | 256 | 13 | relu | 64 | 0.46849 |
| 1 | 3 | 135 | 133 | relu | 1698 | 0.436642 |
| 2 | 4 | 209 | 80 | relu | 913 | 0.431367 |
| 3 | 2 | 216 | 185 | relu | 26 | 0.427758 |
| 4 | 3 | 391 | 56 | sigmoid | 1877 | 0.4845 |
| 5 | 3 | 462 | 118 | relu | 2005 | 0.42936 |
| 6 | 2 | 430 | 184 | relu | 1792 | 0.43255 |
| 7 | 1 | 95 | 200 | relu | 1681 | 0.453299 |
| 8 | 2 | 325 | 229 | relu | 1813 | 0.430986 |
| 9 | 3 | 136 | 87 | relu | 23 | 0.428192 |


In [78]:
# call the best model
model1 = create_model(gp_result.x[0],gp_result.x[1],gp_result.x[2],gp_result.x[3])
print(model1.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 145)               14500     
_________________________________________________________________
layer_dense_1 (Dense)        (None, 239)               34894     
_________________________________________________________________
layer_dense_2 (Dense)        (None, 239)               57360     
_________________________________________________________________
layer_dense_3 (Dense)        (None, 239)               57360     
_________________________________________________________________
layer_dense_4 (Dense)        (None, 239)               57360     
_________________________________________________________________
dropout (Dropout)            (None, 239)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 2

In [79]:
model1.fit(x_train_scaled,y_train_scaled, epochs=50)
model1.evaluate(x_test_scaled,y_test_scaled)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


[0.38941797614097595, 0.3912051320075989]

## Gradient Boosted Regression Trees

In [84]:
gbrt_result = gbrt_minimize(func=fitness,
                            dimensions=dimensions,
                            n_calls=12,
                            n_jobs=-1,
                            x0=default_parameters)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.46120312809944153

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4563150405883789

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.44255489110946655

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4303363561630249

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.41654181480407715

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.46644139289855957

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4484395384788513

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.49194595217704773

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.44907909631729126

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.42472919821739197

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.46765390038490295

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

mae: 0.4310031831264496



In [85]:
final_gbrt_results = pd.concat([pd.DataFrame(gbrt_result.x_iters, 
                        columns = ["hidden layers",
                                   "input layer nodes",
                                   "hidden layer nodes",
                                    "activation function",
                                   "batch size"]),
(pd.Series(gbrt_result.func_vals, name="mae"))], axis=1)

final_gbrt_results

| | hidden layers | input layer nodes | hidden layer nodes | activation function | batch size | mae |
|---|---|---|---|---|---|---|
| 0 | 1 | 256 | 13 | relu | 64 | 0.461203 |
| 1 | 2 | 224 | 31 | sigmoid | 186 | 0.456315 |
| 2 | 3 | 172 | 58 | relu | 1604 | 0.442555 |
| 3 | 4 | 153 | 218 | relu | 1698 | 0.430336 |
| 4 | 3 | 320 | 240 | relu | 201 | 0.416542 |
| 5 | 2 | 427 | 241 | sigmoid | 1184 | 0.466441 |
| 6 | 1 | 141 | 206 | relu | 1544 | 0.44844 |
| 7 | 4 | 290 | 107 | sigmoid | 2043 | 0.491946 |
| 8 | 1 | 293 | 62 | relu | 1416 | 0.449079 |
| 9 | 3 | 162 | 223 | relu | 1180 | 0.424729 |


In [86]:
gbrt_result.x

[3, 320, 240, 'relu', 201]
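The point reported in `gbrt_result.x` is the evaluated point with the lowest objective value. A sketch that reproduces the selection from the ten rows displayed in the results table (the result object also holds the remaining calls, which are not listed there):

```python
# Rows copied from the GBRT results table:
# [hidden layers, input layer nodes, hidden layer nodes, activation, batch size]
x_iters = [
    [1, 256, 13, "relu", 64],
    [2, 224, 31, "sigmoid", 186],
    [3, 172, 58, "relu", 1604],
    [4, 153, 218, "relu", 1698],
    [3, 320, 240, "relu", 201],
    [2, 427, 241, "sigmoid", 1184],
    [1, 141, 206, "relu", 1544],
    [4, 290, 107, "sigmoid", 2043],
    [1, 293, 62, "relu", 1416],
    [3, 162, 223, "relu", 1180],
]
func_vals = [0.461203, 0.456315, 0.442555, 0.430336, 0.416542,
             0.466441, 0.44844, 0.491946, 0.449079, 0.424729]

# Index of the smallest MAE, then the hyperparameters evaluated at that call.
best_index = min(range(len(func_vals)), key=func_vals.__getitem__)
best_point = x_iters[best_index]
print(best_index, best_point, func_vals[best_index])
```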

In [87]:
# call the best model
model2 = create_model(gbrt_result.x[0],gbrt_result.x[1],gbrt_result.x[2],gbrt_result.x[3])
print(model2.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 320)               32000     
_________________________________________________________________
layer_dense_1 (Dense)        (None, 240)               77040     
_________________________________________________________________
layer_dense_2 (Dense)        (None, 240)               57840     
_________________________________________________________________
layer_dense_3 (Dense)        (None, 240)               57840     
_________________________________________________________________
dropout (Dropout)            (None, 240)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 241       
Total params: 224,961
Trainable params: 224,961
Non-trainable params: 0
__________________________________________________
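The parameter counts in the summary can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Dropout` adds none. A short sketch (the helper name is illustrative):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Layer widths of the GBRT-selected model: 99 inputs, then 320, 240, 240, 240, 1.
widths = [99, 320, 240, 240, 240, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)
print(per_layer, total)
```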

In [88]:
model2.fit(x_train_scaled,y_train_scaled, epochs=50)
model2.evaluate(x_test_scaled,y_test_scaled)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


[0.3929002285003662, 0.3984869420528412]

In [83]:
reset()

# Predictions

In [92]:
real=pd.DataFrame(y_train)
predict_gp = model1.predict(pd.DataFrame(x_train_scaled))
desregularization_gp = scaler2.inverse_transform(predict_gp)
pred_escal_gp =pd.DataFrame(desregularization_gp)
print(f"Predictions with the model optimized by Gaussian Process")
for i in range(0,5):
	print("Real=%s, Prediction=%s" % (real[0][i], pred_escal_gp[0][i]))

Predictions with the model optimized by Gaussian Process
Real=18650, Prediction=16810.457
Real=9950, Prediction=9598.377
Real=2000, Prediction=3817.151
Real=7999, Prediction=6342.5933
Real=23999, Prediction=26151.236


In [99]:
result = model1.evaluate(x_test_scaled,y_test_scaled)
for i in range(len(model1.metrics_names)):
    print("Metric ",model1.metrics_names[i],":",
          str(round(result[i],2)))

Metric  loss : 0.39
Metric  mean_absolute_error : 0.39
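These metrics are in standardized units because the target was scaled with `scaler2`. Since standardization divides by the target's standard deviation, an MAE in scaled units converts back to price units by multiplying by that standard deviation (exposed by scikit-learn as `scaler2.scale_[0]`), and an MSE by its square. A sketch with a hypothetical standard deviation follows. The real value is not shown in this notebook, and the helper names are illustrative:

```python
def mae_to_original_units(mae_scaled, target_std):
    """Undo the division by the standard deviation applied to the target."""
    return mae_scaled * target_std

def mse_to_original_units(mse_scaled, target_std):
    """Squared errors scale with the square of the standard deviation."""
    return mse_scaled * target_std ** 2

# Hypothetical value; in the notebook this would be scaler2.scale_[0].
target_std = 10000.0

print(mae_to_original_units(0.39, target_std))
print(mse_to_original_units(0.39, target_std))
```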


In [90]:
predict_gbrt = model2.predict(pd.DataFrame(x_train_scaled))
desregularization_gbrt = scaler2.inverse_transform(predict_gbrt)
pred_escal_gbrt =pd.DataFrame(desregularization_gbrt)
print("Predictions with the model optimized by Gradient Boosted Regression Trees")
for i in range(0,5):
	print("Real=%s, Prediction=%s" % (real[0][i], pred_escal_gbrt[0][i]))

Predictions with the model optimized by Gradient Boosted Regression Trees
Real=18650, Prediction=13963.95
Real=9950, Prediction=8337.7
Real=2000, Prediction=4979.528
Real=7999, Prediction=6125.268
Real=23999, Prediction=24421.33


In [97]:
result = model2.evaluate(x_test_scaled,y_test_scaled)
for i in range(len(model2.metrics_names)):
    print("Metric ",model2.metrics_names[i],":",
          str(round(result[i],2)))

Metric  loss : 0.39
Metric  mean_absolute_error : 0.4


In [98]:
reset()

# Conclusions

For this dataset, the model found by Gradient Boosted Regression Trees optimization is the simpler one (three hidden layers instead of four) and gives nearly the same error metrics as the model found by Gaussian Process optimization.