<a href="https://colab.research.google.com/github/shivendr7/VFD_NOMA/blob/main/VFD_NOMA_Evaluation(initializers)UsingGridSearchCV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Hyperparameter tuning is an essential part of any machine learning project and one of the most time-consuming. Even for the simplest models it can take hours to find the optimal parameters, not to mention neural networks, which can take days, weeks, or even longer to optimize.

In [1]:
import numpy as np
import tensorflow as tf
from keras.layers import Dense, Dropout, BatchNormalization
from keras.models import Sequential
from keras.losses import MeanSquaredError, CosineSimilarity, MeanAbsoluteError, MeanSquaredLogarithmicError, MeanAbsolutePercentageError, Huber
from keras.optimizers import Adam

##GridSearch
The first and simplest method to try is GridSearchCV, which is included in sklearn.model_selection. This approach simply tries every combination of the supplied parameter values one by one and chooses the one with the best cross-validation result.
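The exhaustive enumeration described above can be sketched in plain Python. This is only an illustration of the idea, not how GridSearchCV is implemented: `toy_score` is a hypothetical stand-in for a cross-validation score, and the parameter names and values are assumptions chosen for the example.

```python
from itertools import product

def toy_score(n_estimators, learning_rate):
    # Hypothetical stand-in for a cross-validation score (higher is better).
    return -((n_estimators - 300) / 100) ** 2 - 100 * (learning_rate - 0.1) ** 2

# Assumed example grid: 4 x 3 = 12 combinations.
param_grid = {
    "n_estimators": [100, 200, 300, 400],
    "learning_rate": [0.01, 0.1, 1.0],
}

# Build every combination of parameter values, one dict per candidate.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Score each candidate and keep the best one.
scores = [toy_score(**params) for params in candidates]
best_index = max(range(len(candidates)), key=scores.__getitem__)
best_params = candidates[best_index]

print(len(candidates), "candidates, best:", best_params)
```

Every extra parameter multiplies `len(candidates)` by its number of values, which is the main cost of this approach.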

This approach has several drawbacks:

- It is very slow: you try ALL combinations of ALL parameters, which takes a lot of time. Every additional parameter multiplies the number of iterations you need to complete. Imagine adding a new parameter with 10 possible values to the grid: the parameter may turn out to be meaningless, yet the computation time increases tenfold.
- It works only with discrete values. If the global optimum is at n_estimators=550 but you run GridSearchCV from 100 to 1000 with a step of 100, you will never reach the optimal point.
- You need to know or guess the approximate location of the optimum to complete the search in a reasonable time.

You can overcome some of these drawbacks: you can run the grid search parameter by parameter, or run it several times, starting from a broad grid with large steps and then narrowing the boundaries and decreasing the step size on each iteration. But it will still be computationally intensive and slow.
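As a minimal sketch of the coarse-to-fine idea, the snippet below searches a single parameter twice: first on the 100-to-1000 grid with step 100, then on a narrower grid around the coarse winner. The objective `toy_score` is hypothetical and is assumed to peak at n_estimators=550, matching the example above; the helper `grid_search_1d` is likewise just an illustration.

```python
def toy_score(n_estimators):
    # Hypothetical objective with its optimum at n_estimators=550.
    return -abs(n_estimators - 550)

def grid_search_1d(low, high, step):
    # Evaluate every grid point in [low, high] and return the best one.
    return max(range(low, high + 1, step), key=toy_score)

# Coarse pass: step 100 can only land on 500 or 600, never on 550.
coarse_best = grid_search_1d(100, 1000, 100)

# Fine pass: one coarse step either side of the coarse winner, step 10.
fine_best = grid_search_1d(coarse_best - 100, coarse_best + 100, 10)

print(coarse_best, fine_best)
```

The two passes evaluate 10 + 21 = 31 points, whereas a single grid with step 10 over the whole range would need 91.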

Let's estimate the time to do the Grid Search in our case. Let's suppose we want our grid to consist of 20 possible values of 'n_estimators' (100 to 2000), 19 values of 'max_depth' (2 to 20), and 5 values of 'learning_rate' (10e-4 to 0.1).

This means we need to compute cross_val_score 20*19*5 = 1,900 times. If one computation takes ~0.5-1.0 second, the grid search will last ~16-32 minutes. That is too long for a dataset with ~400 data points.
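The arithmetic behind this estimate can be checked with a few lines. The value counts and the 0.5-1.0 second range come from the text above; the exact grid spacing (steps of 100 for n_estimators, steps of 1 for max_depth) is an assumption.

```python
# Number of values per parameter (spacing is assumed, counts are from the text).
n_estimators_values = len(range(100, 2001, 100))  # 20 values: 100, 200, ..., 2000
max_depth_values = len(range(2, 21))              # 19 values: 2, 3, ..., 20
learning_rate_values = 5

n_combinations = n_estimators_values * max_depth_values * learning_rate_values

# Total wall-clock time for the assumed 0.5 to 1.0 second per evaluation.
minutes_low = n_combinations * 0.5 / 60
minutes_high = n_combinations * 1.0 / 60

print(f"{n_combinations} evaluations, {minutes_low:.1f} to {minutes_high:.1f} minutes")
```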

In [2]:
!wget 'https://raw.githubusercontent.com/shivendr7/VFD_NOMA/main/Data_P1_100000samples.csv'

--2021-07-05 15:23:28--  https://raw.githubusercontent.com/shivendr7/VFD_NOMA/main/Data_P1_100000samples.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5587697 (5.3M) [text/plain]
Saving to: ‘Data_P1_100000samples.csv’


2021-07-05 15:23:29 (66.1 MB/s) - ‘Data_P1_100000samples.csv’ saved [5587697/5587697]



In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
df=pd.read_csv('Data_P1_100000samples.csv')
X=np.array(df[df.columns[:9]])
y=np.array(df[df.columns[-1]])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=30)
X_train.shape, y_test.shape

((67000, 9), (33000,))

In [9]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

In [7]:
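def create_model(init='he_uniform'):
  # Build a fully connected network with 9 input features and a single output;
  # `init` sets the kernel initializer of every hidden Dense layer.
  mape=MeanAbsolutePercentageError()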
  model=Sequential()
  model.add(Dense(128, activation='relu', kernel_initializer=init, input_shape=(9,), kernel_regularizer='l2'))
  #model.add(Dropout(0.08))
  #model.add(Dense(128, activation='relu', kernel_initializer='he_uniform', kernel_regularizer='l2')) #-0
  #model.add(Dropout(0.04))
  #model.add(BatchNormalization())
  model.add(Dense(64, activation='relu', kernel_initializer=init, kernel_regularizer='l2'))
  model.add(Dropout(0.02))
  #model.add(BatchNormalization())
  model.add(Dense(64, activation='relu', kernel_initializer=init, kernel_regularizer='l2'))
  #model.add(Dropout(0.04))
  model.add(BatchNormalization())  
  model.add(Dense(32, activation='relu', kernel_initializer=init, kernel_regularizer='l2'))
  model.add(Dropout(0.02))
  #model.add(BatchNormalization())
  model.add(Dense(32, activation='relu', kernel_initializer=init, kernel_regularizer='l2'))
  #model.add(Dropout(0.02))
  #model.add(BatchNormalization())
  model.add(Dense(1))
  h_loss=Huber(delta=6)
  model.compile(loss=[mape, h_loss], optimizer=Adam(learning_rate=0.001))
  return model

In [18]:
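# KerasClassifier is the classification wrapper; the same module also provides
# KerasRegressor, which is the wrapper meant for continuous targets like this one.
model=KerasClassifier(build_fn=create_model, verbose=0, batch_size=25, epochs=50, validation_split=0.3)
# Compare two kernel initializers with 3-fold cross-validation.
init_mode=['uniform', 'he_uniform']
param_grid=dict(init=init_mode)
grid=GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, scoring='neg_mean_absolute_error')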

In [19]:
results=grid.fit(X_train, y_train)

In [21]:
grid_result=results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: -0.165893 using {'init': 'uniform'}
-0.165893 (0.000139) with: {'init': 'uniform'}
-0.165893 (0.000139) with: {'init': 'he_uniform'}


##Random Search
Random Search is on average more effective than Grid Search.

Main advantages:

- It does not waste time on meaningless parameters: on every step, random search varies all parameters at once.
- On average, it finds near-optimal parameters much faster than grid search.
- It is not limited to a grid when optimizing continuous parameters.

Disadvantages:

- It may not find the globally optimal parameters on a grid.
- All steps are independent. No step uses the results gathered so far, although they can be useful. For example, once a good solution is found, searching around it may yield an even better point than trying completely random candidates.
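A minimal sketch of random search, using only the standard library, might look like the following. The objective `toy_score`, the parameter ranges, and the helper `random_search` are all assumptions made for illustration; in practice the score would come from cross-validation.

```python
import math
import random

def toy_score(n_estimators, learning_rate):
    # Hypothetical objective: best near n_estimators=550 and learning_rate=0.01.
    return -((n_estimators - 550) / 100) ** 2 - (math.log10(learning_rate) + 2) ** 2

def random_search(n_iter, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    best_params, best_score = None, -math.inf
    for _ in range(n_iter):
        # Every step draws ALL parameters anew; no step uses earlier results.
        params = {
            "n_estimators": rng.randint(100, 1000),     # any integer, not just grid points
            "learning_rate": 10 ** rng.uniform(-4, -1),  # continuous, log-uniform
        }
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(n_iter=50)
print(best_params, best_score)
```

Because each draw is independent of the previous ones, increasing `n_iter` can only keep or improve the best score found, but the search never deliberately explores the neighbourhood of a good point.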