# Hyperparameter Tuning and Neural Architecture Search

## Introduction

Getting good performance from a neural network depends on choosing hyperparameters well and designing an effective architecture. Hyperparameter tuning and Neural Architecture Search (NAS) aim to automate these choices. This tutorial covers techniques for hyperparameter tuning, surveys automated architecture search methods, and gives practical examples with code. It also covers the underlying mathematics and references key papers that have shaped the field.

## Table of Contents

1. [Understanding Hyperparameters](#1)
   - [What are Hyperparameters?](#1.1)
   - [Common Hyperparameters in Neural Networks](#1.2)
2. [Hyperparameter Tuning Techniques](#2)
   - [Grid Search](#2.1)
   - [Random Search](#2.2)
   - [Bayesian Optimization](#2.3)
   - [Hyperband](#2.4)
3. [Implementing Hyperparameter Tuning](#3)
   - [Using Scikit-Learn](#3.1)
   - [Using Keras Tuner](#3.2)
4. [Neural Architecture Search (NAS)](#4)
   - [What is NAS?](#4.1)
   - [Reinforcement Learning for NAS](#4.2)
   - [Evolutionary Algorithms for NAS](#4.3)
5. [Implementing NAS with AutoKeras](#5)
   - [AutoKeras Overview](#5.1)
   - [Example Implementation](#5.2)
6. [Latest Developments in NAS](#6)
   - [Efficient NAS](#6.1)
   - [Neural Architecture Optimization](#6.2)
7. [Conclusion](#7)
8. [References](#8)

<a id="1"></a>
## 1. Understanding Hyperparameters

<a id="1.1"></a>
### What are Hyperparameters?

Hyperparameters are configuration variables set before training a model. They are not learned from the data but significantly influence the model's performance. Examples include learning rates, number of layers, batch sizes, and activation functions.

<a id="1.2"></a>
### Common Hyperparameters in Neural Networks

- **Learning Rate (\( \eta \))**: Controls the step size during optimization.
- **Batch Size**: Number of samples processed before the model is updated.
- **Number of Layers and Neurons**: Determines the depth and width of the network.
- **Activation Functions**: Functions like ReLU, Sigmoid, or Tanh applied to neurons.
- **Dropout Rate**: Fraction of neurons to drop during training to prevent overfitting.
- **Weight Initialization Methods**: Strategies for initializing network weights.

**Mathematical Representation of Learning Rate:**

During gradient descent optimization, weights are updated as:

\[
    w_{t+1} = w_t - \eta \nabla L(w_t)
\]

Where:

- \( w_t \): Current weights
- \( \eta \): Learning rate
- \( \nabla L(w_t) \): Gradient of the loss function
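
To make the update rule concrete, here is a minimal sketch in plain Python. The quadratic loss \( L(w) = (w - 3)^2 \), the starting point, and the two learning rates are illustrative assumptions chosen only to show how the step size changes the outcome.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly apply w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Hypothetical loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def grad_loss(w):
    return 2.0 * (w - 3.0)


# A small learning rate approaches the minimum at w = 3.
w_good = gradient_descent(grad_loss, w0=0.0, learning_rate=0.1, steps=100)

# A learning rate that is too large overshoots further on every step.
w_bad = gradient_descent(grad_loss, w0=0.0, learning_rate=1.1, steps=100)

print("learning_rate=0.1 ->", w_good)
print("learning_rate=1.1 ->", w_bad)
```

For this particular loss, each step multiplies the distance to the minimum by \( 1 - 2\eta \), so \( \eta = 0.1 \) shrinks it while \( \eta = 1.1 \) grows it.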

<a id="2"></a>
## 2. Hyperparameter Tuning Techniques

<a id="2.1"></a>
### Grid Search

Grid Search exhaustively tries all combinations from a predefined set of hyperparameter values.

- **Advantages**:
  - Simple to implement
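  - Guarantees finding the best combination within the grid, as measured by the chosen validation score
- **Disadvantages**:
  - Computationally expensive: the number of combinations is the product of the number of values tried for each hyperparameter
  - Not feasible with large numbers of hyperparameters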
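
The cost of exhaustive enumeration is easy to see in a short sketch. The search space below is hypothetical (the names and values are chosen for illustration only); the point is that three hyperparameters with three values each already give 27 configurations, each of which would need a full training run.

```python
import itertools

# Hypothetical search space: names and values are for illustration only.
grid = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
    "dropout": [0.0, 0.3, 0.5],
}

# Grid search evaluates the Cartesian product of all value lists.
names = list(grid)
combinations = [
    dict(zip(names, values)) for values in itertools.product(*grid.values())
]

print("Number of configurations:", len(combinations))
print("First configuration:", combinations[0])
```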

<a id="2.2"></a>
### Random Search

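Random Search samples hyperparameter configurations independently from specified distributions.

- **Advantages**:
  - Often more efficient than Grid Search for the same budget, especially when only a few hyperparameters strongly affect performance
  - Can find good hyperparameters with fewer iterations
- **Disadvantages**:
  - May miss optimal values, since each trial ignores the results of earlier ones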

**Reference:**

- Bergstra, J., & Bengio, Y. (2012). *Random Search for Hyper-Parameter Optimization*. Journal of Machine Learning Research, 13, 281–305.
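
As a rough illustration of sampling from distributions rather than a fixed grid, the sketch below draws configurations with Python's standard `random` module. The hyperparameter names and ranges are assumptions made for this example; the learning rate is sampled log-uniformly, a common choice for scale-like parameters.

```python
import random


def sample_config(rng):
    """Draw one configuration from hypothetical search distributions."""
    return {
        # Log-uniform between 1e-4 and 1e-1: sample the exponent uniformly.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        # Uniform over a continuous range.
        "dropout": rng.uniform(0.0, 0.5),
        # Uniform over a discrete set.
        "batch_size": rng.choice([32, 64, 128]),
    }


rng = random.Random(0)  # fixed seed so the draws are reproducible
trials = [sample_config(rng) for _ in range(10)]

for config in trials[:3]:
    print(config)
```

Unlike a grid, every trial here tests a new value of each hyperparameter, so ten trials probe ten distinct learning rates.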

<a id="2.3"></a>
### Bayesian Optimization

Bayesian Optimization builds a probabilistic model to estimate the performance of hyperparameters and selects the next hyperparameters to evaluate based on past results.

- **Advantages**:
  - Efficient in exploring the hyperparameter space
  - Incorporates prior knowledge
- **Disadvantages**:
  - Computational overhead in building the model

**Mathematical Concept:**

A common choice is a Gaussian Process (GP) as the surrogate model of the objective function \( f \). The next hyperparameter configuration \( x \) is chosen by maximizing an acquisition function \( a(x) \):

\[
    x_{\text{next}} = \arg\max_{x} a(x \mid \mathcal{D})
\]

Where \( \mathcal{D} \) represents the data observed so far (configurations and their measured performance).
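
One widely used acquisition function is expected improvement (EI). The sketch below computes EI for a maximization problem from a surrogate's posterior mean and standard deviation, then picks the candidate with the largest value. The candidate learning rates and their posterior statistics are made-up numbers standing in for a fitted GP, so this shows the selection step only, not a full Bayesian optimization loop.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def expected_improvement(mu, sigma, best_so_far):
    """EI for maximization, given posterior mean `mu` and std `sigma`."""
    if sigma == 0.0:
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


# Hypothetical posterior over validation accuracy: x -> (mean, std).
posterior = {
    0.001: (0.90, 0.01),  # already well explored, little uncertainty
    0.01: (0.88, 0.08),   # slightly lower mean but much more uncertain
    0.1: (0.70, 0.05),    # confidently poor
}
best_so_far = 0.90

scores = {
    x: expected_improvement(mu, sigma, best_so_far)
    for x, (mu, sigma) in posterior.items()
}
x_next = max(scores, key=scores.get)

print("EI per candidate:", scores)
print("Next point to evaluate:", x_next)
```

With these numbers the uncertain candidate wins: its mean is slightly below the incumbent, but its large standard deviation leaves room for improvement. This is how the acquisition function trades exploration against exploitation.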

<a id="2.4"></a>
### Hyperband

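Hyperband is a bandit-based approach that starts many configurations on a small budget (for example, a few epochs), stops the worst performers early, and reallocates the freed resources to the promising ones.

- **Advantages**:
  - Efficient resource allocation
  - Can handle a large number of hyperparameters
- **Disadvantages**:
  - Requires choosing a maximum per-configuration budget and a reduction factor
  - Early stopping can discard configurations that start slowly but would perform well with more training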

**Reference:**

- Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). *Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization*. Journal of Machine Learning Research, 18(185), 1–52.
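
Hyperband is built on a subroutine called successive halving. The sketch below implements only that subroutine, not the full algorithm, which runs several such brackets with different trade-offs between the number of configurations and the starting budget. The `toy_score` function is a hypothetical stand-in for training a model for `budget` epochs and returning its validation accuracy.

```python
import math
import random


def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta of configurations each round, with eta times the budget."""
    survivors = list(configs)
    budget = min_budget
    schedule = []  # (number of configurations, budget per configuration)
    while len(survivors) > 1:
        schedule.append((len(survivors), budget))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], schedule


def toy_score(learning_rate, budget):
    """Hypothetical score: peaks at learning_rate = 1e-2, improves with budget."""
    quality = 1.0 - (math.log10(learning_rate) + 2.0) ** 2 / 9.0
    return quality * (1.0 - 0.5 ** budget)


rng = random.Random(0)
configs = [10 ** rng.uniform(-4, -1) for _ in range(27)]

best, schedule = successive_halving(configs, toy_score)
print("Schedule (configs, budget):", schedule)
print("Selected learning rate:", best)
```

In this toy setup the ranking of configurations is the same at every budget, so early stopping never discards the eventual winner. Real learning curves can cross, which is why Hyperband hedges across several brackets.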

<a id="3"></a>
## 3. Implementing Hyperparameter Tuning

<a id="3.1"></a>
### Using Scikit-Learn

Scikit-Learn provides tools for Grid Search and Random Search.

**Example: Grid Search with Scikit-Learn**

```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Load data
data = load_iris()
X = data.data
y = data.target

# Define the parameter grid. gamma only affects the RBF kernel, so the
# linear kernel gets its own sub-grid to avoid redundant fits.
param_grid = [
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]},
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
]

# Create a base model
svc = SVC()

# Instantiate the grid search with 5-fold cross-validation on all CPU cores
grid_search = GridSearchCV(estimator=svc, param_grid=param_grid, cv=5,
                           n_jobs=-1, verbose=2)

# Fit the grid search to the data
grid_search.fit(X, y)

# Best parameters and their mean cross-validated accuracy
print("Best Parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)

# Best estimator, refit on the full dataset
best_model = grid_search.best_estimator_
```
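
Scikit-Learn's random search counterpart is `RandomizedSearchCV`. The following is a minimal sketch on the same Iris data; the log-uniform ranges for `C` and `gamma` and the choice of 20 iterations are assumptions for illustration, not tuned recommendations.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions to sample from, instead of fixed lists of values.
param_distributions = {
    "C": loguniform(1e-1, 1e2),
    "gamma": loguniform(1e-3, 1e0),
}

# n_iter sets how many configurations are sampled and cross-validated.
random_search = RandomizedSearchCV(
    estimator=SVC(kernel="rbf"),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    random_state=0,
)
random_search.fit(X, y)

print("Best Parameters:", random_search.best_params_)
print("Best CV accuracy:", random_search.best_score_)
```

The budget is set directly through `n_iter`, independent of how many hyperparameters are searched.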

<a id="3.2"></a>
### Using Keras Tuner

Keras Tuner is a library for hyperparameter tuning with TensorFlow and Keras.

**Example: Hyperparameter Tuning with Keras Tuner**

```python
# Install Keras Tuner (if not already installed)
# !pip install keras-tuner

import tensorflow as tf
from tensorflow import keras
from keras_tuner import RandomSearch  # the package was formerly imported as `kerastuner`

# Load data (e.g., MNIST). The MNIST test split serves as validation data here.
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_val = x_val.astype('float32') / 255.0

# Define the model builder function
def build_model(hp):
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    # Tune the width of the hidden layer: 32, 64, ..., 512 units
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(keras.layers.Dense(units=hp_units, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    # Tune the learning rate over three candidate values
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Instantiate the tuner: 5 sampled configurations, each trained 3 times
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='my_dir',
    project_name='helloworld')

# Perform hyperparameter search
tuner.search(x_train, y_train,
             epochs=5,
             validation_data=(x_val, y_val))

# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best number of units: {best_hps.get('units')}")
print(f"Best learning rate: {best_hps.get('learning_rate')}")
```

<a id="4"></a>
## 4. Neural Architecture Search (NAS)

<a id="4.1"></a>
### What is NAS?

Neural Architecture Search automates the process of designing neural network architectures.

- **Goal**: Find optimal architectures without manual intervention.
- **Challenges**: High computational cost and large search space.

**Reference:**

- Zoph, B., & Le, Q. V. (2017). *Neural Architecture Search with Reinforcement Learning*. [arXiv:1611.01578](https://arxiv.org/abs/1611.01578)

<a id="4.2"></a>
### Reinforcement Learning for NAS

- **Controller RNN**: Generates architectural hyperparameters.
- **Training Process**:
  - Sample an architecture.
  - Train it on the dataset.
  - Use the performance as a reward to update the controller.

**Mathematical Representation:**

The controller aims to maximize the expected reward \( J(\theta) \):

\[
    J(\theta) = \mathbb{E}_{P(a; \theta)} [R(a)]
\]

- \( \theta \): Parameters of the controller
- \( a \): Architecture sampled
- \( R(a) \): Reward (e.g., validation accuracy)
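
The sample, train, reward loop can be sketched with a REINFORCE-style policy gradient on a toy problem. This is a heavy simplification and not the method from the paper: the controller here is a single softmax over three hypothetical architecture choices rather than an RNN, and the fixed `REWARDS` list stands in for actually training each architecture and measuring its validation accuracy.

```python
import math
import random

# Hypothetical validation accuracy for each of three architecture choices.
REWARDS = [0.2, 0.5, 0.9]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_controller(steps=2000, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(REWARDS)  # controller parameters (one logit per choice)
    baseline = 0.0                # moving average of rewards, reduces variance
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an architecture a ~ P(a; theta).
        a = rng.choices(range(len(probs)), weights=probs)[0]
        reward = REWARDS[a]       # stand-in for "train it and measure accuracy"
        advantage = reward - baseline
        # Gradient of log P(a; theta) for a softmax policy: one_hot(a) - probs.
        for i in range(len(theta)):
            grad_log_p = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += learning_rate * advantage * grad_log_p
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(theta)


probs = train_controller()
print("Final sampling probabilities:", [round(p, 3) for p in probs])
```

Over training, the controller shifts probability mass toward the choice with the highest reward, which is the behavior the objective \( J(\theta) \) asks for.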

<a id="4.3"></a>
### Evolutionary Algorithms for NAS

- **Population-Based Methods**: Evolve architectures over generations.
- **Operations**:
  - Mutation
  - Crossover
- **Selection**: Based on fitness scores (e.g., accuracy).

**Reference:**

- Real, E., et al. (2019). *Regularized Evolution for Image Classifier Architecture Search*. [arXiv:1802.01548](https://arxiv.org/abs/1802.01548)
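
The population-based loop can be illustrated with a small mutation-only sketch in the spirit of the aging evolution described in the reference above: sample a few individuals, mutate the fittest of the sample, add the child, and remove the oldest individual. Everything specific here is an assumption for illustration: architectures are tuples of four layer widths, and `fitness` is a made-up function rather than a trained model's accuracy. Crossover is omitted.

```python
import collections
import random

WIDTHS = [16, 32, 64, 128]  # hypothetical choices for each layer's width


def fitness(arch):
    """Made-up stand-in for validation accuracy; best when every width is 64."""
    return -sum(abs(width - 64) for width in arch)


def mutate(arch, rng):
    """Return a copy of `arch` with one randomly chosen layer width resampled."""
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(WIDTHS)
    return tuple(child)


def aging_evolution(population_size=10, sample_size=3, cycles=50, seed=0):
    rng = random.Random(seed)
    population = collections.deque(
        tuple(rng.choice(WIDTHS) for _ in range(4)) for _ in range(population_size)
    )
    best = max(population, key=fitness)
    for _ in range(cycles):
        # Tournament selection: the fittest of a random sample becomes the parent.
        parent = max(rng.sample(list(population), sample_size), key=fitness)
        child = mutate(parent, rng)
        population.append(child)
        population.popleft()  # remove the oldest individual, not the worst
        best = max(best, child, key=fitness)
    return best


best = aging_evolution()
print("Best architecture found:", best, "fitness:", fitness(best))
```

Removing the oldest individual rather than the worst means an architecture only persists if its descendants keep being selected.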

<a id="5"></a>
## 5. Implementing NAS with AutoKeras

<a id="5.1"></a>
### AutoKeras Overview

AutoKeras is an open-source library for AutoML, which includes NAS capabilities.

- **Features**:
  - Automatic model search
  - Easy-to-use interface
- **Installation**:
  - `pip install autokeras`

<a id="5.2"></a>
### Example Implementation

```python
# Install AutoKeras (if not already installed)
# !pip install autokeras

import autokeras as ak
import tensorflow as tf

# Load data (e.g., CIFAR-10)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize data
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Define the model
autokeras_model = ak.ImageClassifier(
    max_trials=3,  # It tries 3 different models.
    overwrite=True)

# Fit the model
autokeras_model.fit(x_train, y_train, epochs=5)

# Evaluate the best model; evaluate() returns the loss followed by the metrics
loss, accuracy = autokeras_model.evaluate(x_test, y_test)
print(f"Test accuracy: {accuracy}")
```

**Explanation**:

- **max_trials**: Number of different models to try.
- **ImageClassifier**: Automatically searches for the best model for image classification.

<a id="6"></a>
## 6. Latest Developments in NAS

<a id="6.1"></a>
### Efficient NAS

**Efficient NAS** aims to reduce the computational cost of architecture search.

- **Techniques**:
  - Weight sharing among architectures
  - One-shot models

**Reference:**

- Pham, H., et al. (2018). *Efficient Neural Architecture Search via Parameter Sharing*. [arXiv:1802.03268](https://arxiv.org/abs/1802.03268)
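
The idea of weight sharing can be shown with a purely conceptual sketch: candidate architectures look up their parameters in one shared store instead of each owning a private copy. The operation names, the three-layer search space, and the tiny weight lists are all assumptions for illustration; a real system would store framework tensors and train them with gradient descent.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool"]  # hypothetical candidate operations
NUM_LAYERS = 3

rng = random.Random(0)

# One shared parameter store, keyed by (layer index, operation name).
shared_weights = {
    (layer, op): [rng.gauss(0.0, 1.0) for _ in range(4)]
    for layer in range(NUM_LAYERS)
    for op in OPS
}


def weights_for(architecture):
    """Look up the shared weights used by an architecture (one op per layer)."""
    return [shared_weights[(layer, op)] for layer, op in enumerate(architecture)]


arch_a = ("conv3x3", "maxpool", "conv5x5")
arch_b = ("conv3x3", "conv5x5", "conv5x5")

weights_a = weights_for(arch_a)
weights_b = weights_for(arch_b)

# Simulate a training update on arch_a's first layer.
weights_a[0][0] += 1.0

# arch_b sees the update because both use conv3x3 at layer 0.
print("Layer 0 shared:", weights_a[0] is weights_b[0])
print("Layer 1 shared:", weights_a[1] is weights_b[1])
```

Because a new candidate inherits weights that earlier candidates already trained, it can be scored without training from scratch, which is where the cost savings come from.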

<a id="6.2"></a>
### Neural Architecture Optimization

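**Neural Architecture Optimization**, as used here, refers to combining NAS with network pruning and quantization.

- **Goals**:
  - Optimize architectures for deployment on resource-constrained devices.
  - Reduce search cost. Progressive NAS, cited below, does this by searching cell structures in order of increasing complexity and using a learned surrogate model to decide which candidates to train.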

**Reference:**

- Liu, C., et al. (2018). *Progressive Neural Architecture Search*. [arXiv:1712.00559](https://arxiv.org/abs/1712.00559)

<a id="7"></a>
## 7. Conclusion

Hyperparameter tuning and Neural Architecture Search are vital for optimizing neural network performance. While hyperparameter tuning focuses on finding the best settings for predefined models, NAS automates the design of optimal architectures. With the advancement of tools like Keras Tuner and AutoKeras, these processes have become more accessible. Ongoing research continues to make these methods more efficient and practical for real-world applications.

<a id="8"></a>
## 8. References

1. Bergstra, J., & Bengio, Y. (2012). *Random Search for Hyper-Parameter Optimization*. Journal of Machine Learning Research, 13, 281–305.
2. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). *Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization*. Journal of Machine Learning Research, 18(185), 1–52.
3. Zoph, B., & Le, Q. V. (2017). *Neural Architecture Search with Reinforcement Learning*. [arXiv:1611.01578](https://arxiv.org/abs/1611.01578)
4. Real, E., et al. (2019). *Regularized Evolution for Image Classifier Architecture Search*. [arXiv:1802.01548](https://arxiv.org/abs/1802.01548)
5. Pham, H., et al. (2018). *Efficient Neural Architecture Search via Parameter Sharing*. [arXiv:1802.03268](https://arxiv.org/abs/1802.03268)
6. Liu, C., et al. (2018). *Progressive Neural Architecture Search*. [arXiv:1712.00559](https://arxiv.org/abs/1712.00559)

---

This notebook provides a comprehensive overview of hyperparameter tuning and neural architecture search. You can run the code cells to see how these techniques are implemented and experiment with different settings and tools.