# <center>Hyperparameter Optimization or Tuning</center>

> * Hyperparameter optimization (or tuning) is the process of finding the best hyperparameters for a given dataset.
> * The best hyperparameters are those that minimize the generalization error (not necessarily the training loss).
> * The critical step is choosing how many hyperparameter combinations to test, because:
> * the more hyperparameter combinations we test, the greater the chance of finding a better model, but also the greater the computational cost.
>> **No. of Hyperparameter Combinations ∝ Chance of a Better Model ∝ Computing Cost**

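To make the trade-off above concrete, here is a minimal sketch that counts how many models a full grid would require. The search space and the number of folds are hypothetical values chosen only for illustration; `ParameterGrid` is scikit-learn's helper for enumerating grid combinations.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical search space, invented for illustration only.
param_grid = {
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
}

# Every value of every hyperparameter is combined with every other one,
# so the count is the product of the list lengths: 4 * 3 * 3.
n_combinations = len(ParameterGrid(param_grid))

# With cross-validation, each combination is fitted once per fold.
n_folds = 5  # assumed number of folds
n_model_fits = n_combinations * n_folds

print(n_combinations, n_model_fits)
```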
---

## Aim

1. Improve performance of the ML Model.
2. Understand advanced optimization techniques.
3. Learn how to use various open source packages.
4. Participate in and lead data science competitions.

## Packages Used

1. [scikit-learn](https://scikit-learn.org/stable/)
2. [scikit-optimize](https://scikit-optimize.github.io/stable/)
3. [Hyperopt](http://hyperopt.github.io/hyperopt/)
4. [OPTUNA](https://optuna.org)
5. [KerasTuner](https://keras.io/keras_tuner/)
6. [SMAC3](https://automl.github.io/SMAC3/main/)

## More Resources

**1) How can I learn more about relevant Machine Learning and Data Science skills?**
* [Resources to learn more about Python programming](https://trainindata.medium.com/discover-the-best-resources-to-learn-python-for-data-science-35b87d38fadf)
* [Resources to learn more about Machine Learning](https://trainindata.medium.com/find-out-the-best-resources-to-learn-machine-learning-cd560beec2b7)
* [Resources to learn more about Data Science](https://trainindata.medium.com/discover-the-best-resources-to-learn-data-science-2978499fc804)


Keep in mind that machine learning is a very extensive field, and therefore you will likely need to visit multiple courses and resources to get a broad understanding of the different algorithms.


**2) I would like to know more about variable pre-processing and data cleaning for machine learning. What can I do?**


In the final section of this course you will find a link to a comprehensive course on feature engineering. Meanwhile, have a look at these articles:


* [Feature Engineering for Machine Learning: A Comprehensive Overview](https://trainindata.medium.com/feature-engineering-for-machine-learning-a-comprehensive-overview-a7ad04c896f8)
* [Best Resources to Learn about Feature Engineering for Machine Learning](https://trainindata.medium.com/best-resources-to-learn-feature-engineering-for-machine-learning-6b4af690bae7)
* [Practical Code Implementation of Feature Engineering Techniques with Python](https://towardsdatascience.com/practical-code-implementations-of-feature-engineering-for-machine-learning-with-python-f13b953d4bcd)


**3) I would like to learn more about feature selection for machine learning. Do you know of good resources?**


In the final section of this course you will find a link to a comprehensive course on feature selection. Meanwhile, have a look at these articles:
* [Feature Selection for Machine Learning: A Comprehensive Overview](https://trainindata.medium.com/feature-selection-for-machine-learning-a-comprehensive-overview-bd571db5dd2d)


[You can also find resources to improve your skills further.](https://trainindata.medium.com/)

# 1. Performance Metrics

[scikit-learn metrics](https://scikit-learn.org/stable/modules/model_evaluation.html)

## a. Classification Performance Metrics

Classification metrics fall into two types:

1. dependent on the probability threshold
    * Accuracy
    * Precision, Recall, f-score
    * False Positive Rate (FPR) and False Negative Rate (FNR)
2. independent of the probability threshold
    * ROC-AUC
    
**Accuracy:** is the percentage of correct predictions.<br>
90% accuracy means that out of every 10 predictions made by the model, 9 were correct (equivalently 90 out of 100, 900 out of 1000, and so on).
> accuracy = no. of correct predictions / total no. of predictions 

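As a minimal sketch of the accuracy formula, assuming scikit-learn is installed, the value can be computed by hand and checked against `accuracy_score`. The labels `y_true` and `y_pred` are made up for illustration.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

# accuracy = no. of correct predictions / total no. of predictions
n_correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = n_correct / len(y_true)

# The library call should agree with the manual computation.
sklearn_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, sklearn_accuracy)
```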
<img src="images/confusion_matrix.png"/>

**Precision or Positive Predictive Value = TP / (TP + FP)**
> * is the number of True Positives divided by the total number of samples the model predicted as positive (whether or not they were actually positive), i.e. True Positives + False Positives.

**Recall or Sensitivity or True Positive Rate = TP / (TP + FN)**
> * is the number of True Positives divided by the total number of actual positive samples, i.e. True Positives + False Negatives (samples that were actually positive but were predicted negative).
> * E.g.: Suppose your partner asks you to recall every date the two of you went on. The dates you recall correctly are your True Positives (TP), and the dates that really happened but that you failed to recall are your False Negatives (FN), i.e. they were positive but you missed them (negative).

**f-score = 2 x precision x recall / (precision + recall)**
> is the harmonic mean of precision and recall (this is the F1 score; the more general F-beta score is a weighted harmonic mean).

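The precision, recall and f-score formulas above can be sketched from the confusion-matrix counts and compared with scikit-learn's own functions. The labels are hypothetical and exist only to make the example runnable.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical binary labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() yields the counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # correct positives / predicted positives
recall = tp / (tp + fn)     # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)

# The library functions should agree with the manual formulas.
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```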
**False Positive Rate (FPR) = FP / (FP + TN)**
> is the fraction of all actual negative samples (TN + FP) that were incorrectly predicted as positive, i.e. FP.

**False Negative Rate (FNR) = FN / (FN + TP)**
> is the fraction of all actual positive samples (TP + FN) that were incorrectly predicted as negative, i.e. FN.

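scikit-learn has no dedicated FPR or FNR function that I would rely on here, so this sketch derives both rates from the confusion matrix. The labels are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels {0, 1}, ravel() yields the counts as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

fpr = fp / (fp + tn)  # share of actual negatives predicted as positive
fnr = fn / (fn + tp)  # share of actual positives predicted as negative

# FNR is the complement of recall (the True Positive Rate).
recall = tp / (tp + fn)
print(fpr, fnr, 1 - recall)
```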
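**ROC-AUC:**
> is the area under the ROC curve, which plots the True Positive Rate against the False Positive Rate as the probability threshold varies. A value of 1 corresponds to ranking every positive above every negative, and 0.5 corresponds to random guessing.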

<img src="images/roc_auc_curve.png"/>

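ROC-AUC is computed from predicted scores rather than from thresholded labels, which is why it is listed as threshold-independent. A minimal sketch with made-up scores: when there are no tied scores, the AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and predicted scores, invented for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)

# Pairwise interpretation: how often does a positive outrank a negative?
pos_scores = [s for s, t in zip(y_score, y_true) if t == 1]
neg_scores = [s for s, t in zip(y_score, y_true) if t == 0]
n_pairs = len(pos_scores) * len(neg_scores)
pairwise_auc = sum(p > n for p in pos_scores for n in neg_scores) / n_pairs

print(auc, pairwise_auc)
```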
**Log Loss Function**

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\log (h(z(\theta)^{(i)})) + (1-y^{(i)})\log (1-h(z(\theta)^{(i)})) \right]$$

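The log loss can be sketched directly from its definition: for each sample, take the log of the probability assigned to the true class, then average and negate. Here `p_pred` plays the role of the model output h(z) and is a hypothetical list of predicted probabilities for class 1.

```python
import math

from sklearn.metrics import log_loss

# Hypothetical labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4]

# J = -(1/m) * sum( y*log(p) + (1-y)*log(1-p) )
m = len(y_true)
manual_log_loss = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p)
    for y, p in zip(y_true, p_pred)
) / m

# The library call should agree with the manual computation.
sklearn_log_loss = log_loss(y_true, p_pred)
print(manual_log_loss, sklearn_log_loss)
```

Confident wrong predictions are penalized heavily: a true positive predicted with probability 0.01 contributes -log(0.01), roughly 4.6, to the sum.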

**Note:**
> * Accuracy, Precision, Recall, f1-score and ROC-AUC **increase with** model performance, i.e. the greater the value, the better the model.
> * FPR, FNR and Log Loss **decrease with** model performance, i.e. the smaller the value, the better the model.

## b. Regression Performance Metrics

<img src="images/regression.png"/>

**Note:**
* **MSE, RMSE and MAE decrease as model performance improves.** We want to minimize MSE, RMSE and MAE: the closer the value is to 0, the better the model, since these errors are at their minimum at zero.
> **MSE, RMSE, MAE:** measure the distance between the true label and the predicted label; the lower the distance (value), the better the model.
* **R2 score increases as model performance improves.** We want to maximize R2: the closer the value is to 1, the better the model.
> **R2 score:** is the proportion of the variability (variance) in the target that the model explains. If the R2 score of the model is 0.4, the model explains 40% of the variability present in the target and cannot explain the rest. An R2 score of 1 means a perfect model, because it explains 100% of the variability.

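A minimal sketch of the four regression metrics discussed above, using scikit-learn and made-up values. RMSE is taken as the square root of MSE rather than through a library flag, to avoid depending on a particular scikit-learn version.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical targets and predictions, invented for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = math.sqrt(mse)                      # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # 1 - SS_residual / SS_total

print(mse, rmse, mae, r2)
```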
## c. Create your own Performance Metrics with scikit-learn

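One way to do this, sketched here under stated assumptions, is to wrap a plain Python function with scikit-learn's `make_scorer`. The metric (`false_negative_rate`), the synthetic dataset and the logistic regression model are illustrative choices, not part of the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score


def false_negative_rate(y_true, y_pred):
    """FNR = FN / (FN + TP), as defined in the classification metrics above."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn / (fn + tp)


# greater_is_better=False marks this as a metric to minimize; the scorer
# returns the negated value so that scikit-learn can always maximize scores.
fnr_scorer = make_scorer(false_negative_rate, greater_is_better=False)

# Synthetic data and model, chosen only to make the sketch runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# The custom scorer can be passed wherever a `scoring` argument is accepted.
scores = cross_val_score(model, X, y, cv=5, scoring=fnr_scorer)
print(scores)  # negated FNR per fold, so each value lies between -1 and 0
```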
# 2. Cross Validation Schemes

## a. K-Fold CV

## b. Leave One Out (LOOCV)

## c. Leave P Out (LPOCV)

## d. Repeated K-Fold CV

## e. Stratified Cross-Validation

## f. Nested Cross-Validation

# 3. Basic Search Algorithms

## a. Manual Search CV 

## b. Grid Search CV

## c. Random Search CV

# 4. Bayesian Optimization

# 5. Advanced Optimization Techniques

# 6. Open Source Packages