<a href="https://colab.research.google.com/github/pkmariya/Scaler01/blob/master/hpt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Hyperparameter Tuning Example in ML


Hyperparameter tuning is the process of choosing the configuration settings that control how a machine learning (ML) or deep learning (DL) model is structured and trained. Hyperparameters are not learned from the data; they are set before training starts and can have a significant effect on the model's performance.

Hyperparameters can include:

1.   Learning rate: The step size used when updating the model's parameters at each optimization step.
2.   Number of epochs: The number of times the learning algorithm works through the entire training dataset.
3.   Batch size: The number of training examples used to compute one parameter update.
4.   Network architecture specifics: Such as the number of layers and the number of units per layer in neural networks.
5.   Regularization parameters: Such as weight decay coefficients or dropout rates.
6.   Optimization algorithms: Such as Adam, SGD, RMSprop, etc.

The goal of hyperparameter tuning is to find the set of hyperparameters that yields the best performance on a given task. This is typically measured by a performance metric such as accuracy, F1 score, or mean squared error, depending on the problem at hand.

The methods used for hyperparameter tuning can generally be applied both in ML and DL, and they include:

**Grid Search**: An exhaustive search that systematically goes through every combination of hyperparameters in a predefined grid. It's computationally expensive, especially with a large number of hyperparameters.
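The exhaustive enumeration can be sketched in a few lines of plain Python. The search space and the `toy_score` function below are hypothetical stand-ins for a real model and validation metric; only the enumeration pattern is the point.

```python
from itertools import product

# Hypothetical search space, chosen only for illustration
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

def toy_score(params):
    """Stand-in for a real validation score (higher is better)."""
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 64) / 1000

names = list(grid)
# Every combination of values: 3 learning rates x 2 batch sizes = 6 candidates
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

# Score every candidate and keep the best one
best = max(candidates, key=toy_score)
print(len(candidates), best)
```

The number of candidates is the product of the list lengths, which is why the cost grows so quickly as hyperparameters are added.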

**Random Search**: Instead of trying out every possible combination, random search selects random combinations of hyperparameters to try. This method can sometimes find a good set of hyperparameters more quickly than grid search.
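A minimal sketch of random search, assuming a hypothetical two-parameter space and a made-up `toy_score` in place of a real validation metric. The trial budget (20 here) is fixed in advance and does not depend on how many values each hyperparameter could take.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def sample_params():
    """Draw one random configuration from the hypothetical search space."""
    return {
        # Log-uniform draw between 1e-4 and 1e-1
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def toy_score(params):
    """Stand-in for a real validation score (higher is better)."""
    return -abs(math.log10(params["learning_rate"]) + 2) - abs(params["batch_size"] - 64) / 1000

# Evaluate a fixed budget of random configurations and keep the best
trials = [sample_params() for _ in range(20)]
best = max(trials, key=toy_score)
print(best)
```

Sampling the learning rate on a log scale is a common choice because useful values often span several orders of magnitude.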

**Bayesian Optimization**: This approach models the performance function of the hyperparameters using a Gaussian process or a similar probabilistic model and then selects hyperparameters to try based on how likely they are to yield a performance improvement.

**Gradient-based Optimization**: For some kinds of hyperparameters, it’s possible to compute the gradient with respect to the hyperparameters and optimize them directly. This is more common in deep learning with specific types of hyperparameters.

**Evolutionary Algorithms**: These algorithms simulate the process of natural selection to select, mutate, and recombine hyperparameters in search of the most effective configuration.

**Hyperband**: A bandit-based approach that dynamically allocates resources to hyperparameter configurations and rapidly eliminates the poor-performing ones while concentrating more resources on promising configurations.
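The elimination idea can be illustrated with successive halving, the subroutine that Hyperband repeats with different starting budgets. The sketch below is not full Hyperband: it runs a single bracket, and the configurations and `toy_score` function are hypothetical placeholders for real training runs.

```python
import random

rng = random.Random(0)

def toy_score(config, budget):
    """Hypothetical validation score after training `config` for `budget` epochs."""
    return config["quality"] * (1 - 0.5 ** budget)

# Eight hypothetical configurations with a hidden "quality" to discover
initial = [{"id": i, "quality": rng.random()} for i in range(8)]

configs = list(initial)
budget = 1
while len(configs) > 1:
    # Evaluate all survivors at the current budget, best first
    ranked = sorted(configs, key=lambda c: toy_score(c, budget), reverse=True)
    # Keep the top half and double the budget for the next round
    configs = ranked[: len(ranked) // 2]
    budget *= 2

winner = configs[0]
print(winner)
```

Most of the total budget ends up spent on the few configurations that survive the early, cheap rounds.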

**Population-based Training (PBT)**: An approach that trains a population of models with different hyperparameters concurrently, and periodically adjusts these hyperparameters based on the performance of the models.

In practice, the choice of hyperparameter tuning method depends on the size of the hyperparameter space, the computational resources available, the nature of the problem, and the time constraints. For deep learning, due to the larger computational cost and more extensive hyperparameter space, methods like Bayesian optimization, Hyperband, and PBT are often preferred, as they can be more efficient compared to grid or random search.

In [2]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report

In [8]:
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

In [9]:
# Split the dataset into training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [10]:
# Define the Model
svc = SVC()

#### Tuning the parameters

In the context of support vector machines (SVMs), the parameters C, kernel, and gamma are critical hyperparameters that can significantly influence the performance of the model. When we perform hyperparameter tuning, we aim to find the best combination of these hyperparameters that will yield the highest accuracy (or another performance metric) for our model on the given data.

Here's what each hyperparameter represents:

C (Regularization parameter): The C parameter trades off correct classification of training examples against maximization of the decision function's margin. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors. C can be thought of as the "cost" of misclassification. A large C gives you low bias and high variance, while a small C will give you higher bias and lower variance.
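The trade-off between margin and training accuracy can be observed directly. The sketch below is an illustration under assumptions not made in the notebook: it uses a linear kernel, keeps only the two overlapping Iris classes so that the problem is binary, and measures the margin width as 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Keep versicolor and virginica only: a binary, not perfectly separable problem
mask = y != 0
X2, y2 = X[mask], y[mask]

margins = {}
for C in (0.01, 1, 100):
    model = SVC(kernel="linear", C=C).fit(X2, y2)
    # For a linear SVM the margin width is 2 / ||w||
    margins[C] = 2 / np.linalg.norm(model.coef_)
    print(f"C={C}: margin width {margins[C]:.3f}, "
          f"training accuracy {model.score(X2, y2):.3f}")
```

A small C tolerates more margin violations in exchange for a wider margin; a large C narrows the margin to fit the training points more closely.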

kernel: The kernel type to be used in the SVM algorithm. The kernel function implicitly maps the input data into a higher-dimensional space where a linear separator may be found if the data is not linearly separable in the original space. Common kernel functions include:

- linear: Linear kernel (no transformation).
- rbf: Radial basis function kernel (default), good for non-linear data.
- poly: Polynomial kernel.
- sigmoid: Sigmoid kernel.

gamma (Kernel coefficient for 'rbf', 'poly', and 'sigmoid'): Defines how far the influence of a single training example reaches. Low values mean 'far' and high values mean 'close'. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors.
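To make the "reach" of gamma concrete, the sketch below evaluates the RBF kernel, exp(-gamma * d^2), for two points a fixed distance d apart. The distance of 1.0 and the gamma values are arbitrary choices for illustration.

```python
import math

def rbf_kernel_value(distance, gamma):
    """RBF similarity exp(-gamma * distance**2) between two points."""
    return math.exp(-gamma * distance ** 2)

distance = 1.0  # hypothetical distance between a sample and a support vector
for gamma in (0.001, 0.1, 10):
    print(f"gamma={gamma}: similarity {rbf_kernel_value(distance, gamma):.4f}")
```

With a low gamma the similarity stays close to 1 even at this distance, so each support vector influences a wide region; with a high gamma it drops to nearly 0, so the influence is very local.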


In this dictionary:

C is set to be tested with the values [1, 10, 100, 1000].

kernel is set to try all four listed types: ['linear', 'rbf', 'poly', 'sigmoid'].

gamma is set to try 'scale' (uses 1 / (n_features * X.var()) as value of gamma), 'auto' (uses 1 / n_features), and the specific values [0.001, 0.0001].

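The GridSearchCV class from scikit-learn will then perform a grid search over all possible combinations of these hyperparameter values, which in this case is 4 (C values) x 4 (kernel types) x 4 (gamma values) = 64 candidate combinations. Each candidate is evaluated with 5-fold cross-validation (cv=5), so the search runs 64 x 5 = 320 fits, plus one final refit of the best candidate on the full training set. Note that the linear kernel ignores gamma, so the candidates that differ only in gamma for kernel='linear' are effectively duplicates.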
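The candidate count can be checked with scikit-learn's `ParameterGrid`, which enumerates a grid the same way GridSearchCV does. The second grid is a possible alternative, not what the notebook uses: it is written as a list of dictionaries so that gamma is only varied for the kernels that actually use it.

```python
from sklearn.model_selection import ParameterGrid

C_values = [1, 10, 100, 1000]
gammas = ["scale", "auto", 0.001, 0.0001]

# The grid used in this notebook: every kernel is crossed with every gamma
full_grid = ParameterGrid({
    "C": C_values,
    "kernel": ["linear", "rbf", "poly", "sigmoid"],
    "gamma": gammas,
})
n_full = len(full_grid)
n_fits = n_full * 5  # 5-fold cross-validation, excluding the final refit

# Alternative: skip gamma for the linear kernel, which ignores it
split_grid = ParameterGrid([
    {"kernel": ["linear"], "C": C_values},
    {"kernel": ["rbf", "poly", "sigmoid"], "C": C_values, "gamma": gammas},
])
n_split = len(split_grid)

print(n_full, n_fits, n_split)
```

The list-of-dictionaries form can be passed to GridSearchCV directly in place of a single dictionary.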

In [11]:
# Define the grid of hyperparameter values to search with cross-validation
tuned_parameters = {
    'C': [1, 10, 100, 1000],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid'],
    'gamma': ['scale', 'auto', 0.001, 0.0001],
}

In [12]:
# Instantiate GridSearchCV object
clf = GridSearchCV(svc, tuned_parameters, cv=5, scoring='accuracy')

In [13]:
# Fit the grid search to the data
clf.fit(X_train, y_train)

In [14]:
# View the best parameters found by GridSearchCV
print("Best parameters found on training set: ")
print(clf.best_params_)

Best parameters found on training set: 
{'C': 1, 'gamma': 'scale', 'kernel': 'linear'}
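Beyond `best_params_`, the fitted search object also exposes the cross-validated score of every candidate. The sketch below repeats the search on a reduced grid (an assumption, chosen only to keep the example fast) and lists the candidates from best to worst.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Reduced grid for illustration: 2 C values x 2 kernels = 4 candidates
search = GridSearchCV(
    SVC(), {"C": [1, 10], "kernel": ["linear", "rbf"]}, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

print(f"Best mean CV accuracy: {search.best_score_:.3f}")

# cv_results_ holds one entry per candidate; sort by rank (1 = best)
results = search.cv_results_
order = results["rank_test_score"].argsort()
for i in order:
    print(results["params"][i], round(results["mean_test_score"][i], 3))
```

Comparing the mean scores shows how close the runners-up are, which is useful on a small dataset where several candidates may perform almost identically.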


In [15]:
# Evaluate the best model found on the test set
print("Detailed Classification Report:")
y_true, y_pred = y_test, clf.predict(X_test)
print(classification_report(y_true, y_pred))

Detailed Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



### Hyperparameter Tuning Example in DL

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

In [4]:
!pip install keras-tuner

Collecting keras-tuner
  Downloading keras_tuner-1.4.6-py3-none-any.whl (128 kB)
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.6 kt-legacy-1.0.5


In [5]:
from keras_tuner.tuners import RandomSearch

In [6]:
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
x_train = x_train.reshape(-1, 28*28)
x_test = x_test.reshape(-1, 28*28)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [7]:
# Define a model builder function
def build_model(hp):
  model = keras.Sequential()
  model.add(layers.Input(shape=(784,)))

  # Tune the number of units in the first Dense layer
  hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
  model.add(layers.Dense(units=hp_units, activation='relu'))

  # Tune the learning rate for the optimizer
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])

  model.add(layers.Dense(10, activation='softmax'))

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

  return model

In [8]:
# Instantiate the tuner
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld'
)

In [9]:
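# Create a callback that stops training when the validation loss has not improved for 5 consecutive epochs
# (with only 5 epochs per trial, as in the search in this notebook, this patience is never exhausted)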
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

In [10]:
# Execute the hyperparameter search
tuner.search(x_train, y_train, epochs=5, validation_split=0.2, callbacks=[stop_early])

Trial 5 Complete [00h 03m 02s]
val_accuracy: 0.9588333169619242

Best val_accuracy So Far: 0.9763611157735189
Total elapsed time: 00h 13m 42s


In [11]:
# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]


In [16]:
print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")


The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 352 and the optimal learning rate for the optimizer
is 0.001.



In [15]:
# Build the model with the optimal hyperparameters and train it on the data for 5 epochs
model = tuner.hypermodel.build(best_hps)
history = model.fit(x_train, y_train, epochs=5, validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
