## Supervised Machine Learning Models - Part 2

### Table of Contents

* [PART 1](#chapter1)  

    * [Ensemble Learning algorithms](#section_1_1)
        1. [Bagging algorithms](#Section_1_1_1)
        2. [Random Forest](#section_2_1_1)
        3. [Boosting algorithms](#section_3_1_1)
             * [Gradient Boosting](#section_3_2_1)
             * [XGBoost & AdaBoost](#section_3_2_2)
        4. [Stacking](#section_4_4_1)
<br>

* [PART 2](#chapter2)

    * [Cross Validation](#section_4_1)
    * [HyperParameter Tuning](#section_5_1)
        * [GridSearchCV](#section_5_1_1)
        * [RandomizedSearchCV](#section_5_1_2)
    * [Model Comparison](#section_6_1)

In [8]:
'''
Important Notebook tips

<font color='Brown'>**Binary Classifier:**</font>

<br>

<img src="https://editor.analyticsvidhya.com/uploads/85598tomek.png" width="300"/> <br>

<p float="left">
<img src="Images/ML_3.png" width="300"/>
<img src="Images/ML2.png" width="450"/>
<p>
'''


## PART 1 <a class="anchor" id="chapter1"></a>

## Ensemble Learning algorithms <a class="anchor" id="section_1_1"></a>






### 1. Bagging algorithms <a class="anchor" id="Section_1_1_1"></a>

### 2. Random Forest <a class="anchor" id="section_2_1_1"> </a>

### 3. Boosting algorithms <a class="anchor" id="section_3_1_1"></a>

#### 3.A Gradient Boosting <a class="anchor" id="section_3_2_1"></a>

#### 3.B XGBoost & AdaBoost <a class="anchor" id="section_3_2_2"></a>

### 4. Stacking <a class="anchor" id="section_4_4_1"></a>

## PART 2 <a class="anchor" id="chapter2"></a>

### Cross Validation <a class="anchor" id="section_4_1"></a>






**Cross-validation** is a statistical method for estimating how well a machine learning model will perform on unseen data. It helps detect overfitting in a predictive model, particularly when the amount of data is limited. In cross-validation, you split the data into a fixed number of folds (or partitions), hold out each fold in turn for evaluation while training on the rest, and then average the error estimates across folds.

<font color='Brown'>**Why CV is needed:**</font>

1. To avoid overfitting:
A model scored on the same data it was trained on tends to look better than it really is, because it can overfit that data. Regularisation reduces overfitting, but held-out data is still needed to measure it. When only a few training instances are available, setting a large share of them aside for testing is costly. Cross-validation lets every sample be used for both training and testing, in different iterations.

2. To support model tuning:
Finding the best combination of hyperparameters is a common step in tuning an algorithm to learn the dataset's hidden patterns. Doing this on a single train/test split is not recommended: model performance is often very sensitive to these settings, and choosing them against one fixed split can overfit the model to that split and reduce its ability to generalize.

<font color='Brown'>**Types of CV:**</font>

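1. Train/Test Split (Holdout Method): the data is split once into a training set and a test set, and the model is evaluated a single time.
2. K-Fold Cross Validation
3. Stratified K-Fold Cross-Validation: K-fold in which each fold keeps roughly the same class proportions as the full dataset.
4. Leave-One-Out Cross-Validation: K-fold taken to the extreme, with k equal to the number of samples.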
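
The splitting strategies listed above can be compared side by side. The following is a minimal sketch using scikit-learn's splitters; the toy arrays `X` and `y` (12 samples, 9 negatives and 3 positives) are made up for illustration and are not part of the notebook's data.

```python
import numpy as np
from sklearn.model_selection import (
    KFold, LeaveOneOut, StratifiedKFold, train_test_split,
)

# Toy data (hypothetical): 12 samples with imbalanced binary labels (9 vs 3).
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)

# Holdout: one single train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# K-fold: 3 folds, every sample lands in a test fold exactly once.
kfold_splits = list(KFold(n_splits=3, shuffle=True, random_state=0).split(X))

# Stratified K-fold: every test fold keeps the class ratio of the full data.
strat_splits = list(
    StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y)
)

# Leave-one-out: one split per sample, each test fold holds a single sample.
loo_splits = list(LeaveOneOut().split(X))

print("holdout test size:", len(X_test))
print("k-fold splits:", len(kfold_splits))
print("leave-one-out splits:", len(loo_splits))
for _, test_idx in strat_splits:
    print("positives in stratified test fold:", int(y[test_idx].sum()))
```

With plain `KFold`, a test fold may end up with no positive samples at all; stratification is usually the safer choice for imbalanced classification data.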

### K-Fold Cross Validation

<font color='Brown'>**How it works:**</font>


1. Pick a number of folds, k. Usually k is 5 or 10, but any value from 2 up to the number of samples is valid (k equal to the number of samples gives leave-one-out).
2. Split the dataset into k equal (if possible) parts, called folds.
3. Choose k – 1 folds as the training set. The remaining fold will be the test set.
4. Train the model on the training set. On each iteration of cross-validation, you must train a new model independently of the model trained on the previous iteration.
5. Validate on the test set and save the result.
6. Repeat steps 3 – 5 *k* times, each time using a different fold as the test set. In the end, you should have validated the model on every fold that you have.
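
The steps above can be sketched as an explicit loop. This is an illustrative example rather than the notebook's own pipeline: the iris dataset and `LogisticRegression` are stand-ins chosen only because they ship with scikit-learn.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Stand-in data and model (assumptions for this sketch).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Steps 1-2: pick k and split the data into k folds.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):
    # Step 3: k-1 folds for training, the remaining fold for testing.
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    # Step 4: train a fresh, unfitted copy of the model on every iteration.
    fold_model = clone(model).fit(X_tr, y_tr)
    # Step 5: validate on the held-out fold and save the result.
    fold_scores.append(fold_model.score(X_te, y_te))

# The loop is the "repeat" step; afterwards, average the per-fold scores.
print("per-fold accuracy:", np.round(fold_scores, 3))
print("mean accuracy:", round(float(np.mean(fold_scores)), 3))

# cross_val_score runs the same procedure when given the same splitter.
auto_scores = cross_val_score(model, X, y, cv=kf)
print("cross_val_score:  ", np.round(auto_scores, 3))
```

In practice `cross_val_score` is the shorter route; the manual loop is mainly useful when you need per-fold access to the fitted models or predictions.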


<img src="https://editor.analyticsvidhya.com/uploads/16042grid_search_cross_validation.png" width="500"/> <br>


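<font color='Brown'>**Advantages of Cross-Validation:**</font>

1. Use all your data: every sample is used for training in some iterations and for validation in exactly one.
2. Parameter fine-tuning: candidate settings are compared on several validation folds instead of a single split.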

In [19]:
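# Uncomment once `model`, `X_train`, and `y_train` are defined in earlier cells.
# from sklearn.model_selection import cross_val_score
# print(cross_val_score(model, X_train, y_train, cv=5))  # one score per fold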

### HyperParameter Tuning <a class="anchor" id="section_5_1"></a>

Hyperparameter tuning is the search for the hyperparameter values that give the model the lowest loss (or the best score) on validation data.

**Model parameters:** These are the values the model estimates from the given data during training, for example the coefficients of a linear model. <br>
**Model hyperparameters:** These are the values that cannot be estimated from the data. They are set before training and control how the model parameters are learned, for example the regularisation strength or the maximum depth of a tree.
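
The distinction can be seen directly on an estimator. The sketch below uses `LogisticRegression` on the iris dataset purely as an illustration: `C` and `max_iter` are hyperparameters we choose, while `coef_` and `intercept_` are parameters that exist only after fitting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in dataset (an assumption for this sketch): 3 classes, 4 features.
X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training.
clf = LogisticRegression(C=0.5, max_iter=1000)
print("hyperparameter C:", clf.get_params()["C"])

# Model parameters: they do not exist until the model has been fitted.
print("has coef_ before fit:", hasattr(clf, "coef_"))

clf.fit(X, y)
# One row of coefficients per class, one column per feature.
print("coef_ shape after fit:", clf.coef_.shape)
print("intercept_ shape after fit:", clf.intercept_.shape)
```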

<font color='Brown'>**How it works:**</font>

Hyperparameter tuning relies on cross-validation to score each candidate setting. Cross-validation has two main steps: splitting the data into subsets (called folds) and rotating the training and validation among them. The splitting technique commonly has the following properties:

- Each fold has approximately the same size.
- Data can be assigned to folds randomly or with stratification.
- All folds are used to train the model except one, which is used for validation. The validation fold is rotated until every fold has served as the validation fold exactly once.
- Each example should belong to exactly one fold.

"K-fold" and "CV" are often used interchangeably; the k simply states how many folds the dataset is split into. A common choice is k=10, which sends 90% of the data to training and 10% to validation in each iteration (scikit-learn's own default is 5 folds). The next figure describes the process of iterating over the ten folds of the dataset.

<font color='Brown'>**Types of Hyperparameter tuning:**</font>  

1. **Manual:** select hyperparameters based on intuition/experience/guessing, train the model with the hyperparameters, and score on the validation data. Repeat process until you run out of patience or are satisfied with the results.
2. **Grid Search:** set up a grid of hyperparameter values and, for each combination, train a model and score it on the validation data. Every single combination is tried, so the cost grows multiplicatively with each added hyperparameter and can become very inefficient.
3. **Random search:** set up a grid (or distributions) of hyperparameter values and train and score the model on randomly selected combinations. The number of search iterations is set based on the available time and resources.

<font color='Brown'>**Important parameters, methods, and attributes:**</font>  

- **get_params** -->  Method that returns the parameters of this estimator as a dictionary.

- **cv** -->  Parameter that determines the cross-validation splitting strategy. *None* uses the default 5-fold cross validation; an integer sets the number of folds.

- **best_estimator_** -->  Estimator which gave the highest score (or smallest loss if specified) on the left out data. Not available if `refit=False`.

- **best_score_** -->  Mean cross-validated score of the best_estimator_.

- **best_params_** -->  Parameter setting that gave the best results on the hold out data.
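
A short sketch of where each of these names shows up in practice. The random forest, the iris dataset, and the tiny `max_depth` grid are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=20, random_state=0)

# get_params lists every hyperparameter name that can be used as a search key.
param_names = sorted(rf.get_params().keys())
print(param_names)

# cv=None falls back to the default 5-fold cross-validation.
search = GridSearchCV(rf, param_grid={"max_depth": [2, 4]}, cv=None)
search.fit(X, y)

print("folds used:", search.n_splits_)
print("best_params_:", search.best_params_)
print("best_score_:", round(search.best_score_, 3))
# best_estimator_ is the winning configuration, refitted on all of X and y.
print("best_estimator_:", type(search.best_estimator_).__name__)
```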

#### 1. GridSearchCV <a class="anchor" id="section_5_1_1"></a>


In the grid search method, we create a grid of possible values for hyperparameters. Each iteration tries one combination of hyperparameters, in a fixed order. The model is fitted on every possible combination and its cross-validated performance is recorded. Finally, the search returns the best model with the best hyperparameters.

- **param_grid** -->  Dictionary (or list of dictionaries) with parameter names (str) as keys and lists of parameter settings to try as values.
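
A minimal grid search sketch follows. The `SVC` estimator, the iris dataset, and the specific values in `param_grid` are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each key is a hyperparameter name; each value is the list of settings to try.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

grid = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

# 3 values of C x 2 kernels = 6 combinations, each evaluated with 5-fold CV.
n_candidates = len(grid.cv_results_["params"])
print("combinations tried:", n_candidates)
print("best params:", grid.best_params_)
print("best mean CV accuracy:", round(grid.best_score_, 3))
```

Adding a third hyperparameter with four values would raise the count to 24 combinations (120 model fits with 5 folds), which is why exhaustive grids become expensive quickly.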

#### 2. RandomizedSearchCV <a class="anchor" id="section_5_1_2"></a>

In the random search method, we define possible values for the hyperparameters, either as lists or as probability distributions. Each iteration samples a random combination of hyperparameters from this search space and records its performance. After a fixed number of iterations, the search returns the combination of hyperparameters that provided the best performance.


- **param_distributions** -->  Dictionary with parameter names (str) as keys and distributions or lists of parameters to try.
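
A minimal random search sketch, under the same illustrative assumptions as before (an `SVC` on the iris dataset). The `loguniform` range for `C` and the budget of 8 iterations are arbitrary choices for the example.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Values can be lists (sampled uniformly) or scipy.stats distributions.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "kernel": ["linear", "rbf"],
}

# n_iter fixes the budget: only 8 random combinations are evaluated.
search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_distributions,
    n_iter=8,
    cv=5,
    random_state=0,
)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])
print("combinations tried:", n_candidates)
print("best params:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

Unlike grid search, the cost here is controlled by `n_iter` and does not grow when more hyperparameters are added to the search space.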

### Model Comparison <a class="anchor" id="section_6_1"></a>

1. Time complexity

2. Space complexity

3. Sample complexity

4. Bias-variance tradeoff

5. Methodology, Assumptions and Objectives