Hyperparameter Optimization: A Short Review of Algorithms

# INTRODUCTION 
Optimization is one of the core components of machine learning. The essence of most machine learning algorithms is to build an optimization model and learn the parameters in the objective function from the given data.

Almost all machine learning algorithms can be formulated as an optimization problem: finding the extremum of an objective function. Building a model and constructing a reasonable objective function are the first steps in a machine learning method. Once the objective function is determined, an appropriate numerical or analytical optimization method is used to solve the optimization problem.

For supervised learning, the goal is to find an optimal mapping function $f(x)$ that minimizes the loss function over the training samples,

$$
\min_{\theta}\frac{1}{N}\sum_{i=1}^{N}L(y^{i},f(x^{i},\theta))
$$

where $N$ is the number of training samples, $\theta$ is the parameter of the mapping function, $x^{i}$ is the feature vector of the $i$th sample, $y^{i}$ is the corresponding label, and $L$ is the loss function.
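As an illustration of the objective above, the following minimal sketch computes the empirical risk and minimizes it by gradient descent. The linear model, the squared loss, the toy data, and the names `empirical_risk` and `fit_gradient_descent` are assumptions chosen for this example; they are not prescribed by the text.

```python
from typing import List, Sequence


def squared_loss(y: float, y_hat: float) -> float:
    """L(y, f(x, theta)) for the squared-error case (an assumed choice of L)."""
    return (y - y_hat) ** 2


def linear_model(x: float, theta: Sequence[float]) -> float:
    """f(x, theta) = theta[0] + theta[1] * x (an assumed, one-feature model)."""
    return theta[0] + theta[1] * x


def empirical_risk(theta: Sequence[float], xs: List[float], ys: List[float]) -> float:
    """(1/N) * sum_i L(y^i, f(x^i, theta))."""
    n = len(xs)
    return sum(squared_loss(y, linear_model(x, theta)) for x, y in zip(xs, ys)) / n


def fit_gradient_descent(xs: List[float], ys: List[float],
                         lr: float = 0.05, steps: int = 2000) -> List[float]:
    """Minimize the empirical risk over theta with plain gradient descent."""
    theta = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        residuals = [linear_model(x, theta) - y for x, y in zip(xs, ys)]
        grad0 = 2.0 * sum(residuals) / n
        grad1 = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        theta = [theta[0] - lr * grad0, theta[1] - lr * grad1]
    return theta


# Toy data generated from y = 1 + 2x, so the minimizer is theta = (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta_hat = fit_gradient_descent(xs, ys)
print(theta_hat, empirical_risk(theta_hat, xs, ys))
```

Note that `lr` and `steps` are fixed before training starts and are not learned from the data, whereas `theta` is.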

Besides the parameters $\theta$ learned from the data, most algorithms have settings that affect the model's quality in predicting results from new input data. These settings are intrinsic to the models as they represent their characteristics, such as the number of interactions, the regularization term, the tree size, etc. Because they need to be chosen before the algorithm's learning process begins, they are referred to as hyperparameters. The process of selecting them to improve the model's performance is known as hyperparameter tuning (GOODFELLOW; BENGIO; COURVILLE, 2016; KUHN; JOHNSON, 2013; PROBST; BISCHL; BOULESTEIX, 2018; YANG; SHAMI, 2020; HUTTER; KOTTHOFF; VANSCHOREN, 2019; ZHENG, 2022).

The process of tuning hyperparameters is generally treated as a black-box optimization problem. Formally, let $a \in \mathcal{A}$ be an algorithm, where $\mathcal{A}$ is a set of machine learning algorithms, and let the hyperparameter space of $a$ be
$$
    \mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2 \times \dots \times \mathcal{H}_k
$$
where each $\mathcal{H}_i$ represents the set of possible values for the $i^{th}$ hyperparameter of $a$ ($i \in \{1, \dots, k\}$). Let $\mathcal{D}$ be a set of datasets, where $D \in \mathcal{D}$ is a specific dataset in $\mathcal{D}$. The function $f: \mathcal{A} \times \mathcal{D} \times \mathcal{H} \rightarrow \mathbb{R}$ calculates the model's performance using algorithm $a \in \mathcal{A}$ on dataset $D \in \mathcal{D}$ given the hyperparameter configuration $h = (h_1, h_2, \dots, h_k) \in \mathcal{H}$.

Given $a \in \mathcal{A}$, $\mathcal{H}$, and $D \in \mathcal{D}$, the objective of tuning the hyperparameters is to find $h^* = (h_1^*, h_2^*, \dots, h_k^*)$ such that:

$$
    h^* = \arg\max_{h \in \mathcal{H}} f(a, D, h)
$$
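To make the black-box view concrete, here is a minimal sketch in which the performance function is only ever called, never inspected. It assumes a one-parameter ridge regression as the algorithm $a$, a small train/validation split as the dataset $D$, and negative validation error as the performance measure; the names `fit_ridge_1d` and `performance` and all data values are hypothetical.

```python
from typing import Dict, List, Tuple

# D: (train xs, train ys, validation xs, validation ys); toy values for illustration.
Dataset = Tuple[List[float], List[float], List[float], List[float]]


def fit_ridge_1d(xs: List[float], ys: List[float], lam: float) -> float:
    """Learn the parameter theta of y ~ theta * x with an L2 penalty lam.

    Closed-form minimizer of sum_i (y_i - theta * x_i)^2 + lam * theta^2.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def performance(dataset: Dataset, h: Dict[str, float]) -> float:
    """Stand-in for f(a, D, h) with a fixed: negative validation MSE (higher is better)."""
    x_tr, y_tr, x_va, y_va = dataset
    theta = fit_ridge_1d(x_tr, y_tr, h["lam"])  # learning uses h but does not change it
    mse = sum((y - theta * x) ** 2 for x, y in zip(x_va, y_va)) / len(x_va)
    return -mse


D: Dataset = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2], [4.0, 5.0], [8.0, 10.0])

# A small, finite set of candidate configurations h.
candidates = [{"lam": 0.0}, {"lam": 0.25}, {"lam": 10.0}]

# The tuner only sees (h, f(a, D, h)) pairs and keeps the best-scoring h.
best_h = max(candidates, key=lambda h: performance(D, h))
print(best_h, performance(D, best_h))
```

Each call to `performance` involves a full training run, which is why the number of evaluations is the main cost in hyperparameter tuning.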

Hyperparameter tuning often requires deep knowledge of machine learning algorithms and of appropriate hyperparameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems.

# Grid search

![](img/grid_layout.png)
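Grid search evaluates every combination of a finite set of candidate values per hyperparameter and keeps the best one. The sketch below is a minimal illustration under stated assumptions: `toy_validation_score` is a hypothetical stand-in for a real train-and-validate run, and the hyperparameter names and grid values are invented for the example.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple


def toy_validation_score(h: Dict[str, float]) -> float:
    """Hypothetical black-box score; higher is better, peak at lr=1e-2, depth=4."""
    return -((math.log10(h["learning_rate"]) + 2.0) ** 2 + (h["max_depth"] - 4) ** 2)


def grid_search(score_fn: Callable[[Dict[str, float]], float],
                grid: Dict[str, List[float]]) -> Tuple[Dict[str, float], float, int]:
    """Evaluate every point of the Cartesian product of the per-hyperparameter lists."""
    names = list(grid)
    best_h, best_score, n_evals = None, -math.inf, 0
    for values in itertools.product(*(grid[name] for name in names)):
        h = dict(zip(names, values))
        score = score_fn(h)
        n_evals += 1
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score, n_evals


grid = {"learning_rate": [1e-3, 1e-2, 1e-1], "max_depth": [2, 4, 6]}
best_h, best_score, n_evals = grid_search(toy_validation_score, grid)
print(best_h, best_score, n_evals)
```

The number of evaluations is the product of the list lengths (3 × 3 = 9 here), so the cost grows quickly as hyperparameters are added.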

# Random search

![](img/random_layout.png)
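Random search draws each configuration independently from a distribution over the hyperparameter space and keeps the best one found within a fixed budget of trials. The following is a minimal sketch, not a reference implementation: the score function, the sampling ranges, the log-uniform choice for the learning rate, and the hyperparameter names are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def toy_validation_score(h: Dict[str, float]) -> float:
    """Hypothetical black-box score; higher is better, peak at lr=1e-2, depth=4."""
    return -((math.log10(h["learning_rate"]) + 2.0) ** 2 + (h["max_depth"] - 4) ** 2)


def random_search(score_fn: Callable[[Dict[str, float]], float],
                  n_trials: int,
                  seed: int = 0):
    """Sample n_trials configurations at random and return the best one."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    best_h, best_score = None, -math.inf
    history: List[Tuple[Dict[str, float], float]] = []
    for _ in range(n_trials):
        h = {
            # Log-uniform over [1e-4, 1]: uniform in the exponent.
            "learning_rate": 10 ** rng.uniform(-4.0, 0.0),
            # Uniform over the integers 1..8 (both ends included).
            "max_depth": rng.randint(1, 8),
        }
        score = score_fn(h)
        history.append((h, score))
        if score > best_score:
            best_h, best_score = h, score
    return best_h, best_score, history


best_h, best_score, history = random_search(toy_validation_score, n_trials=50)
print(best_h, best_score)
```

Unlike a grid, the budget `n_trials` is chosen independently of the number of hyperparameters, and continuous ranges do not need to be discretized in advance.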