# **Hyperparameter Tuning**

| | |
|-|-|
| Author(s) | [Keeyana Jones](https://github.com/keeyanajones/) |

## **Overview**

Hyperparameter tuning, also called hyperparameter optimization, is the process of finding the set of hyperparameters that gives a machine learning model its best performance on a given dataset.

### **What are Hyperparameters?**

Before diving into tuning, it's crucial to understand how hyperparameters differ from model parameters:
- **Model Parameters:** These are the internal variables or weights that a machine learning model learns from the training data. Examples include the coefficients in a linear regression model or the weights and biases in a neural network. These are learned during the training process.  
- **Hyperparameters:** These are configuration settings that are external to the model and are not learned from the data. Instead, they are set before the training process begins and control how the model learns or its overall structure. 
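
To make the distinction concrete, here is a minimal sketch that fits a line by gradient descent. The names `fit_line`, `LEARNING_RATE`, and `N_EPOCHS`, the toy data, and the chosen values are illustrative assumptions. The learning rate and epoch count are set by hand before training (hyperparameters), while `w` and `b` are learned from the data (model parameters).

```python
# Hyperparameters: chosen by us before training starts (illustrative values).
LEARNING_RATE = 0.05
N_EPOCHS = 2000


def fit_line(xs, ys, learning_rate, n_epochs):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # model parameters: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys, LEARNING_RATE, N_EPOCHS)
print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

Changing `LEARNING_RATE` or `N_EPOCHS` changes how training proceeds, but the training loop itself never updates them; only `w` and `b` are adjusted.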

### **Examples of Hyperparameters:**
- Neural Networks:
   - Learning rate
   - Number of layers
   - Number of neurons/units per layer
   - Activation functions
   - Optimizer (e.g., Adam, SGD, RMSprop)
   - Batch size
   - Number of epochs
   - Dropout rate
   - Regularization strength (L1, L2)

- Support Vector Machines (SVMs):
   - Kernel type (e.g., linear, RBF, polynomial)
   - Regularization parameter (C)
   - Gamma for RBF kernel

- Decision Trees/Random Forests/Gradient Boosting:
   - Max depth of trees
   - Number of estimators (trees) 
   - Minimum samples per leaf
   - Learning rate for boosting
   
- K-Means Clustering:
   - Number of clusters (k) 

### **Why is Hyperparameter Tuning Important?**

The choice of hyperparameters can dramatically impact a model's performance.
- **Underfitting/Overfitting:** Incorrect hyperparameters can lead to a model that either underfits (too simple, can't capture patterns) or overfits (too complex, memorizes training data but performs poorly on new data).
- **Convergence Speed:** The learning rate, for instance, dictates how quickly an optimization algorithm converges. If it is too high, the optimizer may overshoot the minimum or diverge; if it is too low, training may take too long to converge.
- **Generalization:** Well-tuned hyperparameters help the model generalize better to unseen data.
- **Resource Efficiency:** Optimal hyperparameters can lead to faster training times and more efficient use of computational resources.
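
The convergence point can be illustrated with a toy problem. The sketch below minimizes `f(x) = x**2` by gradient descent with three learning rates; the function `gradient_descent`, the step count, and the specific rates are assumptions chosen for illustration, not values from any particular model.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # f'(x) = 2x
    return x


# Too low, reasonable, and too high learning rates for this toy problem.
for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: final x = {gradient_descent(lr):.3g}")
```

With the smallest rate, `x` has barely moved from its starting point after 50 steps; with the middle rate it is very close to the minimum at 0; with the largest rate each step overshoots and the iterate grows in magnitude instead of shrinking.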

### **How is Hyperparameter Tuning Done?**

Hyperparameter tuning involves defining a search space for each hyperparameter and then using a strategy to explore that space to find the best combination. Evaluation is always done on a validation set (or via cross-validation) to avoid overfitting to the test set.
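
As a rough sketch of the evaluation setup, the data can be split three ways: a training set to fit the model, a validation set to compare hyperparameter configurations, and a test set that is touched only once at the end. The helper `train_val_test_split` and the 60/20/20 proportions are illustrative assumptions.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in dataset: 100 example indices.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Each candidate configuration would be trained on `train` and scored on `val`; `test` is held back for the final model only.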

Common strategies include:

1. Manual Search (Trial and Error):
   - **Description:** An experienced data scientist manually tweaks hyperparameters based on intuition, past experience, and observed model performance. 
   - **Pros:** Can be effective for simple models or when starting out, leverages human expertise.
   - **Cons:** Time-consuming, does not scale, subjective, and prone to missing optimal combinations.
2. Grid Search:
   - **Description:** Defines a discrete set of values for each hyperparameter.  The algorithm then trains a model for every possible combination of these values.
   - **Example:** If `learning_rate = [0.01, 0.001]` and `batch_size = [32, 64]`, Grid Search will test (0.01, 32), (0.01, 64), (0.001, 32), (0.001, 64).
   - **Pros:** Simple to understand and implement, guarantees finding the best combination within the defined grid.
   - **Cons:** Computationally expensive and time-consuming, because the number of combinations grows exponentially with the number of hyperparameters (curse of dimensionality). Not efficient if many hyperparameters are irrelevant or interact in complex ways.
3. Random Search:
   - **Description:** Instead of trying every combination, random search samples a fixed number of combinations from the defined search space. 
   - **Pros:** More efficient than Grid Search, especially for high-dimensional search spaces, because it's more likely to hit promising regions that Grid Search might miss if its grid is too coarse. Often finds good-enough solutions much faster.
   - **Cons:** Doesn't guarantee finding the absolute best combination, might miss the very best if the sampling budget is too small. 
4. Bayesian Optimization:
   - **Description:** A more intelligent and efficient approach.  It builds a probabilistic model (surrogate model) of the objective function (e.g., validation accuracy) based on past evaluation results.  It then uses this model to intelligently choose the next set of hyperparameters to evaluate, aiming to maximize the objective function while minimizing the number of evaluations.  It balances exploration (trying new areas) and exploitation (refining promising areas). 
   - **Pros:** Typically needs far fewer evaluations than Grid or Random Search, which matters most for expensive-to-evaluate models (e.g., deep neural networks); often finds better optima in fewer iterations.
   - **Cons:** More complex to implement, can be slower initially due to overhead of building the probabilistic model, sequential in nature (can't easily parallelize all evaluations).
   - **Popular Libraries:** Hyperopt, Optuna, Scikit-Optimize (skopt), Spearmint.
5. Gradient-Based Optimization:
   - **Description:** Treats hyperparameter tuning as an optimization problem in which the validation loss is minimized with respect to the hyperparameters. This involves computing gradients of the validation loss with respect to hyperparameters (e.g., using implicit differentiation or meta-learning approaches).
   - **Pros:** Potentially very efficient, especially for models with many hyperparameters.
   - **Cons:** Complex to implement, not applicable to all hyperparameters (e.g., discrete choices). More of a research area.
6. Evolutionary Algorithms:
   - **Description:** Inspired by natural selection. A population of hyperparameter sets is randomly initialized. The best-performing sets are selected, mutated, and combined to create a new generation, iteratively searching for optimal solutions.
   - **Pros:** Can explore complex, non-convex search spaces effectively, highly parallelizable.
   - **Cons:** Can be computationally intensive, might take many generations to converge.
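
The first two automated strategies above, Grid Search and Random Search, can be sketched with the standard library alone. The grid reuses the `learning_rate` and `batch_size` values from the Grid Search example. The function `validation_score` is a hypothetical stand-in for "train a model and return its validation score", and the sampling ranges and trial budget are assumptions made for illustration; in practice a library such as scikit-learn or Optuna would usually handle this loop.

```python
import itertools
import math
import random


def validation_score(learning_rate, batch_size):
    """Hypothetical stand-in for training a model and scoring it on validation data.

    Peaks at learning_rate=0.003 and batch_size=32; a real objective would
    train and evaluate an actual model.
    """
    lr_term = (math.log10(learning_rate) - math.log10(0.003)) ** 2
    bs_term = ((batch_size - 32) / 64) ** 2
    return 1.0 - lr_term - bs_term


# Grid search: evaluate every combination of the listed values.
grid = {"learning_rate": [0.01, 0.001], "batch_size": [32, 64]}
grid_trials = [
    dict(zip(grid, values)) for values in itertools.product(*grid.values())
]
best_grid = max(grid_trials, key=lambda p: validation_score(**p))

# Random search: sample a fixed budget of combinations from the search space.
rng = random.Random(0)
random_trials = [
    {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
        "batch_size": rng.choice([16, 32, 48, 64, 128]),
    }
    for _ in range(20)
]
best_random = max(random_trials, key=lambda p: validation_score(**p))

print("grid combinations:", [(p["learning_rate"], p["batch_size"]) for p in grid_trials])
print("best from grid:   ", best_grid)
print("best from random: ", best_random)
```

Grid search can only ever return one of its four listed points, whereas random search draws learning rates from a continuous range, so it can land between grid values. Sampling the learning rate on a log scale is a common choice because useful values often span several orders of magnitude.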

### **Tools for Hyperparameter Tuning:**

Many ML frameworks and dedicated libraries offer built-in or integrated hyperparameter tuning capabilities:
- **scikit-learn:** `GridSearchCV`, `RandomizedSearchCV`
- **Keras/TensorFlow:** `tf.keras.callbacks.TensorBoard` (for visualization during manual tuning) 
- **PyTorch Lightning:** Integrates with various loggers and tuners (e.g., Optuna, Weights & Biases)
- Dedicated Libraries:
   - **Hyperopt:** Bayesian Optimization
   - **Optuna:** Flexible framework for hyperparameter optimization, supports various samplers.
   - **Weights & Biases (W&B):** Powerful tool for hyperparameter optimization with excellent visualization and collaboration features.
   - **Comet ML:** Similar to W&B, offers hyperparameter optimization alongside experiment tracking.
   - **Ray Tune:** Scalable hyperparameter tuning library for various ML frameworks.
   - **Ax/BoTorch (Meta):** Flexible platform for adaptive experimentation and Bayesian optimization.
- **Cloud ML Platforms:** Amazon SageMaker, Google Cloud Vertex AI, and Azure Machine Learning all offer managed hyperparameter tuning services that abstract away much of the infrastructure complexity.

### **Best Practices for Hyperparameter Tuning:**

- **Define a Search Space:** Clearly define the range or set of possible values for each hyperparameter.
- **Start Broad, Then Narrow:** Begin with a wide search space using Random Search or a coarse Grid Search. Once promising regions are identified, narrow down the search space and use a finer-grained search (or Bayesian Optimization).
- **Use a Validation Set/Cross-Validation:** Never tune hyperparameters on the test set. Always use a separate validation set or cross-validation to estimate performance during tuning, so the test set stays untouched for the final evaluation.
- **Track Experiments:** Use an experiment tracking tool (like MLflow, W&B, Comet ML) to log every hyperparameter configuration and its corresponding metrics. This is crucial for reproducibility and comparison.
- **Early Stopping:** Implement early stopping to halt training runs that are not performing well, saving computational resources.
- **Parallelization:** Utilize parallel computing where possible to run multiple hyperparameter configurations simultaneously.
- **Understand Your Model:** A good understanding of your model's architecture and the role of different hyperparameters can guide your search and make it more efficient.
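
Two of the practices above, "Start Broad, Then Narrow" and "Track Experiments", can be combined in a short sketch. The objective `validation_loss` is a hypothetical stand-in for a real training run, the in-memory `trials` list stands in for an experiment tracker, and the ranges and trial counts are illustrative assumptions.

```python
import math
import random


def validation_loss(learning_rate):
    """Hypothetical objective with its minimum at learning_rate = 0.003."""
    return (math.log10(learning_rate) - math.log10(0.003)) ** 2


def random_search(low_exp, high_exp, n_trials, rng, log):
    """Sample learning rates log-uniformly, record every trial in `log`,
    and return the best trial logged so far."""
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(low_exp, high_exp)
        log.append({"learning_rate": lr, "val_loss": validation_loss(lr)})
    return min(log, key=lambda t: t["val_loss"])


rng = random.Random(42)
trials = []  # simple in-memory experiment log

# Stage 1: broad search over six orders of magnitude.
coarse_best = random_search(-6, 0, n_trials=15, rng=rng, log=trials)

# Stage 2: narrow to one order of magnitude around the coarse winner.
center = math.log10(coarse_best["learning_rate"])
fine_best = random_search(center - 0.5, center + 0.5, n_trials=15, rng=rng, log=trials)

print(f"trials logged: {len(trials)}")
print(f"coarse best:   {coarse_best}")
print(f"fine best:     {fine_best}")
```

Because every trial is appended to the same log, the second stage can never report a worse result than the first, and the full history remains available for later comparison.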

Hyperparameter tuning is often more art than science in its initial stages, but with systematic approaches and the right tools it becomes a scientific process that contributes significantly to building high-performing, robust machine learning models.

----