# What is `Cost function`?

####  A cost function is a way to measure how well a model or algorithm is performing. It calculates the difference between the actual results and the predicted results. The goal is to make this difference as small as possible, so the model can make better predictions. Think of it like a score that tells you how wrong the model is: the lower the score, the better the model.


### <ol> <li> `Mean Squared Error (MSE)`:
<li> <b> Use:</b> Regression problems.
<li> <b> Definitions: </b>The average of the squared differences between actual and predicted values.

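#### $ \text{MSE} = \frac{1}{n} \sum \limits_{i=1}^n (y_i - \hat{y}_i)^2$ <br>
#### For a straight-line model with slope $m$ and intercept $b$, $\hat{y}_i = mx_i + b$, so: <br>
#### $ \text{MSE} = \frac{1}{n} \sum \limits_{i=1}^n (y_i - (mx_i + b))^2$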
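As a minimal sketch of the definition above, MSE can be computed in plain Python. The data points and the slope and intercept values below are made up for illustration; only the formula comes from the text.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical data and a hypothetical straight-line model y = m*x + b
xs = [1.0, 2.0, 3.0, 4.0]
y_true = [3.0, 5.0, 7.0, 9.0]
m, b = 2.0, 0.5                      # assumed slope and intercept
y_pred = [m * x + b for x in xs]     # predictions of the line

mse = mean_squared_error(y_true, y_pred)
print(mse)  # every prediction is off by 0.5, so the result is 0.25
```

Because the errors are squared, one large error raises MSE far more than several small ones.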


### <ol start='2'> <li> `Mean Absolute Error (MAE)`:
<li> <b> Use:</b> Regression problems.
<li> <b> Definitions: </b>The average of the absolute differences between actual and predicted values.

#### $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
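A small sketch of the same idea for MAE, again in plain Python with made-up numbers:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    n = len(y_true)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / n

# Hypothetical actual values and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # absolute errors are 0.5, 0, 1 and 0.5, so the mean is 0.5
```

Unlike MSE, each error contributes in proportion to its size, so a single outlier has less influence on the score.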


### <ol start='3'> <li> `Root Mean Square Error (RMSE)`:
<li> <b> Use:</b> Regression problems.
<li> <b> Definitions: </b>The square root of the average of the squared differences between actual and predicted values.

#### $\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
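RMSE is the square root of MSE, which puts the error back in the same units as the target. A minimal sketch with made-up values:

```python
import math

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    n = len(y_true)
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n
    return math.sqrt(mse)

# Hypothetical actual values and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [1.0, 5.0, 7.0, 11.0]

rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)  # squared errors are 4, 0, 0, 4, so MSE is 2 and RMSE is sqrt(2)
```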

### <ol start='4'> <li> `Huber Loss`:
<li> <b> Use:</b> Regression problems.
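<li> <b> Definitions: </b>A combination of MSE and MAE: quadratic for errors no larger than a threshold $\delta$ and linear beyond it, which makes it less sensitive to outliers than MSE.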

#### 
$\text{Huber}(y, \hat{y}) = 
\begin{cases} 
\frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \leq \delta \\
\delta |y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise}
\end{cases}
$
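
The piecewise definition above can be sketched as follows. The formula gives the loss for a single sample; averaging over samples, the default `delta=1.0`, and the data are assumptions made here for illustration.

```python
def huber(y, y_hat, delta=1.0):
    """Huber loss for one sample: quadratic for small errors, linear for large ones."""
    error = abs(y - y_hat)
    if error <= delta:
        return 0.5 * error ** 2
    return delta * error - 0.5 * delta ** 2

def mean_huber_loss(y_true, y_pred, delta=1.0):
    """Average per-sample Huber loss (averaging is an assumption, not from the formula)."""
    return sum(huber(y, y_hat, delta) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# Hypothetical data in which the last prediction is an outlier (off by 10)
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 13.0]

huber_value = mean_huber_loss(y_true, y_pred)
mse_value = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
print(huber_value, mse_value)  # the outlier inflates MSE far more than the Huber loss
```

With these numbers the outlier contributes 9.5 to the Huber total but 100 to the squared-error total, which is the reduced sensitivity the definition refers to.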

### <ol start='5'> <li> `Logarithmic Loss (Log Loss)`:
<li> <b> Use:</b> Classification problems.
<li> <b> Definitions: </b>Measures the performance of a classification model where the output is a probability value between 0 and 1.

#### $\text{Log Loss} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$
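A minimal sketch of Log Loss using the natural logarithm. The clipping constant `eps` is an assumption added here so that `log(0)` is never evaluated; the labels and probabilities are made up.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; y_true holds 0/1 labels, y_prob the predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # keep p away from exactly 0 or 1 (assumption)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical labels with confident-and-right vs. confident-and-wrong predictions
y_true = [1, 0, 1]
good = log_loss(y_true, [0.9, 0.1, 0.8])
bad = log_loss(y_true, [0.1, 0.9, 0.2])
print(good, bad)  # the confidently wrong predictions score much worse
```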


### <ol start='6'> <li> `Cross-Entropy Loss`:
<li> <b> Use:</b> Classification problems.
<li> <b> Definitions: </b>A generalization of Log Loss for multi-class classification.



#### $\text{Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$


### <ol start='7'> <li> `Binary Cross-Entropy Loss`:
<li> <b> Use:</b> Binary classification problems.
<li> <b> Definitions: </b>Measures the performance of a binary classification model where the output is a probability value between 0 and 1. It is another name for Log Loss.


#### $\text{Binary Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$



### <ol start='8'> <li> `Categorical Cross-Entropy`:
<li> <b> Use:</b> Multi-class classification problems
<li> <b> Definitions: </b>Measures the performance of a multi-class classification model where each class is assigned a probability.

#### $\text{Categorical Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c})$
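The double sum above can be sketched in plain Python with one-hot labels. The three-class data and the `eps` floor that guards against `log(0)` are assumptions for illustration.

```python
import math

def categorical_cross_entropy(y_true, y_prob, eps=1e-15):
    """y_true: one-hot rows; y_prob: rows of predicted class probabilities."""
    total = 0.0
    for true_row, prob_row in zip(y_true, y_prob):
        for y, p in zip(true_row, prob_row):
            total += y * math.log(max(p, eps))   # only the true class has y = 1
    return -total / len(y_true)

# Hypothetical 3-class problem with three samples
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]

cce = categorical_cross_entropy(y_true, y_prob)
print(cce)
```

Since each one-hot row contains a single 1, only the probability assigned to the correct class affects the loss.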

### <ol start='9'> <li> `Sparse Categorical Cross-Entropy`:
<li> <b> Use:</b> Multi-class classification problems
<li> <b> Definitions: </b>Similar to Categorical Cross-Entropy but used when the labels are integers instead of one-hot encoded vectors.

#### $\text{Sparse Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \log(\hat{y}_{i, y_i}) $
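With integer labels, the loss simply looks up the predicted probability of the true class. The sketch below assumes 0-based class indices (as is natural in Python) and uses made-up data; `eps` is again an assumed guard against `log(0)`.

```python
import math

def sparse_categorical_cross_entropy(labels, y_prob, eps=1e-15):
    """labels: integer class indices (0-based); y_prob: rows of class probabilities."""
    total = sum(math.log(max(probs[label], eps)) for label, probs in zip(labels, y_prob))
    return -total / len(labels)

# Same hypothetical predictions as a one-hot example would use,
# but the labels are plain integers instead of one-hot vectors.
labels = [0, 1, 2]
y_prob = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]

scce = sparse_categorical_cross_entropy(labels, y_prob)
print(scce)
```

The value is the same as the one-hot version would give for labels `[1,0,0]`, `[0,1,0]`, `[0,0,1]`; only the label encoding differs.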


### <ol start='10'> <li> `Hinge Loss`:
<li> <b> Use:</b> Used for "maximum-margin" classification.
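<li> <b> Definitions: </b>Penalizes predictions that fall on the wrong side of the decision boundary or inside the margin. The labels $y_i$ are $-1$ or $+1$ and $\hat{y}_i$ is the raw score from the classifier, as in support vector machines.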

#### $ \text{Hinge Loss} = \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) $
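A minimal sketch of the hinge loss as written above (a sum over samples). It assumes labels encoded as -1 or +1 and raw, unthresholded scores; the numbers are made up.

```python
def hinge_loss(y_true, scores):
    """Sum of max(0, 1 - y * score); y_true must contain -1 or +1."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores))

# Hypothetical labels and classifier scores
y_true = [1, -1, 1, -1]
scores = [2.0, -0.5, -1.0, 0.3]

loss = hinge_loss(y_true, scores)
print(loss)
# Per-sample terms: 0 (correct, outside the margin), 0.5 (correct, inside the margin),
# 2.0 and 1.3 (both misclassified)
```

A sample contributes nothing only when it is classified correctly with a margin of at least 1.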

### <ol start='11'> <li> `Kullback-Leibler (KL) Divergence`:
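<li> <b> Use:</b> Comparing probability distributions.
<li> <b> Definitions: </b>Measures how one probability distribution $P$ diverges from a second, expected probability distribution $Q$. It is not symmetric: $\text{KL}(P || Q)$ generally differs from $\text{KL}(Q || P)$.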

#### $ \text{KL}(P || Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} $
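A short sketch for discrete distributions, using the natural logarithm. It assumes `q[i] > 0` wherever `p[i] > 0` and treats terms with `p[i] == 0` as contributing zero; the two distributions are made up.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over two outcomes
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # divergence of P from Q
print(kl_divergence(q, p))  # a different value: KL is not symmetric
print(kl_divergence(p, p))  # identical distributions give 0.0
```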

# What is `Gradient Descent`?

####  Gradient descent is an optimization algorithm that improves a model by repeatedly making small adjustments to its parameters so that the cost function decreases. Imagine you're on a hill and you want to find the lowest point. You find the direction in which the ground slopes down most steeply, take a small step that way, and repeat from your new position. In the same way, gradient descent computes the gradient (slope) of the cost function with respect to each parameter and moves the parameters a small step in the opposite direction. The size of that step is called the learning rate.


 ![image.png](attachment:image.png)
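
As a minimal sketch of the idea, the code below fits a line $y = mx + b$ by gradient descent on the MSE cost. The partial derivatives used are $\frac{\partial}{\partial m} = -\frac{2}{n}\sum x_i (y_i - (mx_i + b))$ and $\frac{\partial}{\partial b} = -\frac{2}{n}\sum (y_i - (mx_i + b))$. The data, the learning rate of 0.05, the number of steps, and the function name `fit_line` are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0                      # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        # Residuals y_i - (m*x_i + b) at the current parameters
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to m and b
        grad_m = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
        grad_b = -2.0 / n * sum(residuals)
        # Move a small step against the gradient (downhill)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

# Hypothetical data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

m, b = fit_line(xs, ys)
print(m, b)  # should be close to the slope 2 and intercept 1 used to build the data
```

If the learning rate is too large the updates overshoot and diverge; if it is too small, many more steps are needed. The value used here was picked to suit this particular data.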

### Minima:

<li>Minima (plural of minimum) are the lowest points of something, like a function, graph, or terrain.
<li>Think of it like the bottom of a valley or the lowest point on a roller coaster.
<li>In math, we look for minima to find the smallest value of something, like the least expensive item in a store.<br>

### Maxima:

<li>Maxima (plural of maximum) are the highest points of something.
<li>Imagine the top of a mountain or the highest point on a roller coaster.
<li>In math, we search for maxima to find the largest value of something, like the tallest building in a city.<br>

Minima and maxima are important in many areas of life and science, helping us find the best or worst of something, whether it's cost, height, or even how good a computer program is at a task.


Local and global maxima and minima are concepts used in mathematics and optimization to describe the highest (maxima) and lowest (minima) points in a function or a dataset. Here's a simple explanation:

##### <u>Local Minima and Maxima:</u>

<li><b>Local Minima:</b> A local minimum is the lowest point in a particular region of a graph or a function. It's where the function is lower than nearby points but not necessarily the absolute lowest point overall.

<li><b>Local Maxima: </b> A local maximum is the highest point in a particular region of a graph or a function. It's where the function is higher than nearby points but not necessarily the absolute highest point overall.

##### <u>Global Minima and Maxima:</u>

<li> <b>Global Minima:</b>  The global minimum is the absolute lowest point of a function: the smallest value it takes anywhere in its domain.

<li> <b>Global Maxima: </b> The global maximum is the absolute highest point of a function: the largest value it takes anywhere in its domain.
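
The difference between local and global minima matters for gradient descent, because the algorithm only follows the slope near its current position. The sketch below uses an illustrative function chosen here, $f(x) = x^4 - 3x^2 + x$, which has a global minimum near $x \approx -1.30$ and a shallower local minimum near $x \approx 1.13$. The function, learning rate, and starting points are assumptions, not taken from the text.

```python
def f(x):
    """Example function with one global and one local minimum."""
    return x**4 - 3 * x**2 + x

def f_prime(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent on f, starting from x."""
    for _ in range(steps):
        x -= learning_rate * f_prime(x)
    return x

x_left = descend(-2.0)    # starts on the left slope
x_right = descend(2.0)    # starts on the right slope

print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
print(f"start  2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
```

Both runs stop where the slope is zero, but the run started at 2.0 settles in the local minimum, whose value is higher than the global minimum found by the run started at -2.0.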



# <center> References
<ol> 
    <li> https://youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&si=AyNmAxSMC-FFYBfU </li>
    <li> https://www.mathsisfun.com/calculus/derivatives-introduction.html </li>