## Decision Trees

- A decision tree is a supervised learning algorithm used for both classification and regression tasks. It models decisions and their possible consequences in a tree-like structure, making it intuitive and easy to interpret.

- Intuition
Decision trees operate by recursively partitioning the data based on feature values, forming a tree structure where each internal node is a decision point on one feature and each leaf holds a prediction.
This structure enables efficient inference: a new instance is classified (or, for regression, assigned a value) by traversing the tree from the root down to a leaf.
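
The traversal described above can be sketched in a few lines. This is a minimal illustration, not a full tree learner: the `Node` structure, the `predict` helper, and the hand-built `tree` are hypothetical names introduced here, and the sketch assumes binary splits of the form "feature value <= threshold".

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure for illustration)."""
    feature: Optional[int] = None          # index of the feature tested at this node
    threshold: Optional[float] = None      # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Union[str, float, None] = None  # prediction stored at a leaf, None otherwise


def predict(node: Node, x: list) -> Union[str, float]:
    """Walk from the root to a leaf, following one test per level."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# A tiny hand-built tree: test feature 0 first, then feature 1.
tree = Node(
    feature=0, threshold=2.5,
    left=Node(value="A"),
    right=Node(feature=1, threshold=1.0, left=Node(value="B"), right=Node(value="C")),
)

print(predict(tree, [3.0, 0.5]))
```

Inference cost is proportional to the depth of the tree, not to the size of the training set, which is why prediction is cheap.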

- Information gain is a key concept in decision trees, used to decide which feature to split a node on. It measures the reduction in entropy, or uncertainty, when a dataset is split on a particular attribute.
 - Entropy: a measure of impurity or disorder within a dataset. It quantifies the uncertainty involved in predicting the class of a randomly selected instance.
 - Information Gain: the reduction in entropy achieved by partitioning the dataset according to an attribute. It quantifies how much information a feature provides about the class.
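
As a rough sketch of these two definitions, the snippet below computes Shannon entropy in bits and the information gain of a candidate split. The function names and the toy labels are assumptions made for illustration; the gain is computed as the parent entropy minus the size-weighted entropy of the child groups.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Toy data: four instances, two per class.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children look like the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A split that yields pure children recovers all of the parent's entropy as gain, while a split whose children have the same class mix as the parent gains nothing. A tree learner would evaluate each candidate feature this way and split on the one with the highest gain.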


## Bias
- Definition: Bias is the error introduced by approximating a real-world problem, which may be complex, with a simplified model.

- High Bias: Models with high bias are too simple and make strong assumptions about the data. They fail to capture the underlying patterns, leading to systematic errors and underfitting.

## Variance
- Definition: Variance measures how much the model's predictions vary for different training datasets. It reflects the model's sensitivity to small fluctuations in the training data.
- High Variance: Models with high variance are highly sensitive to the training data and may capture noise as if it were a true pattern, leading to overfitting.


## Bias-Variance decomposition

The bias-variance decomposition splits the expected loss into three terms:

1. bias: how wrong the expected prediction is (corresponds to underfitting)
2. variance: the amount of variability in the predictions (corresponds to overfitting)
3. Bayes error: the inherent unpredictability of the targets or noise in data
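
For concreteness, here is how the three terms look under squared-error loss, which is an assumption made here since the notes do not fix a loss. The symbols are also introduced for illustration: $t$ is the target, $\hat{y}(x)$ is the prediction of a model trained on a randomly drawn training set, and $y_*(x) = \mathbb{E}[t \mid x]$ is the best possible prediction. The expectation is taken over both training sets and targets.

$$
\mathbb{E}\big[(\hat{y}(x) - t)^2\big]
= \underbrace{\big(\mathbb{E}[\hat{y}(x)] - y_*(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\operatorname{Var}\big[\hat{y}(x)\big]}_{\text{variance}}
+ \underbrace{\operatorname{Var}[t \mid x]}_{\text{Bayes error}}
$$

Note that under this loss the bias enters squared, and the Bayes error term does not depend on the model at all.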

![](https://www.cs.cornell.edu/courses/cs4780/2023fa/lectures/images/bias_variance/bullseye.png)



Ensemble methods target different terms of this decomposition:

- Bagging:
 - Reduces variance by averaging the predictions of multiple models.
 - Does not increase bias.
- Boosting:
 - Reduces bias by fitting multiple models in sequence, each focusing on the shortcomings of the previous ones.
 - Can increase variance, though usually not by much.


## Bagging
Bagging, short for Bootstrap Aggregating, is an ensemble learning technique that improves the accuracy and stability of models by reducing variance, which mitigates overfitting. It trains multiple base models independently on random subsets of the training data created by bootstrap sampling: each subset is generated by sampling the original dataset with replacement, so some data points may appear multiple times in a subset while others are omitted. The predictions of the base models are then averaged (for regression) or combined by majority vote (for classification).
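
The bootstrap-then-aggregate procedure can be sketched with the standard library alone. Everything named here is hypothetical: `bootstrap_sample`, `fit_mean_model`, and `bagged_predict` are illustrative helpers, and the base model is deliberately trivial (it predicts the mean of its training targets) so that the sampling and averaging steps stay visible.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean_model(sample):
    """Hypothetical base model: always predicts the mean of its training targets."""
    mean = statistics.fmean(sample)
    return lambda: mean


def bagged_predict(data, n_models, rng):
    """Train one base model per bootstrap sample and average their predictions."""
    models = [fit_mean_model(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return statistics.fmean(m() for m in models)


rng = random.Random(0)                  # fixed seed so the run is repeatable
targets = [2.0, 4.0, 4.0, 5.0, 10.0]    # made-up regression targets
sample = bootstrap_sample(targets, rng)
prediction = bagged_predict(targets, n_models=50, rng=rng)
```

In practice the base model would be something with high variance, such as a deep decision tree, since averaging helps most when the individual models disagree.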

 ![](https://media.geeksforgeeks.org/wp-content/uploads/20230731175958/Bagging-classifier.png)


## Random Forest
A random forest is a set of bagged decision trees, with one extra trick to de-correlate their predictions:
- At each node, choose a random set of k input features, and only consider splits on those features.

Random forests are often regarded as one of the best black-box ML algorithms:
- They often work well with little or no tuning.
- They are among the most widely used algorithms in Kaggle competitions and industry.
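
The per-node feature sampling can be sketched as follows. This is only an illustration of the "random set of k features" step, not a forest implementation: `candidate_features` and `best_split_feature` are hypothetical helpers, and `gains` is a made-up list of split-quality scores (for example, information gain) for each feature at one node.

```python
import random


def candidate_features(n_features, k, rng):
    """Pick k distinct feature indices to consider at one node."""
    return rng.sample(range(n_features), k)


def best_split_feature(scores, k, rng):
    """Choose the best-scoring feature among a random subset of size k.

    `scores` is a hypothetical list giving the split quality of each
    feature at the current node.
    """
    subset = candidate_features(len(scores), k, rng)
    return max(subset, key=lambda f: scores[f])


rng = random.Random(42)                   # fixed seed so the run is repeatable
gains = [0.10, 0.45, 0.05, 0.30, 0.20]    # made-up gains for 5 features
chosen = best_split_feature(gains, k=2, rng=rng)
```

Because the globally best feature is often missing from the random subset, different trees end up splitting on different features, which is what makes their errors less correlated.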

 ![](https://serokell.io/files/vz/vz1f8191.Ensemble-of-decision-trees.png)



## Boosting

Boosting is an ensemble learning technique that aims to improve predictive performance by combining weak learners into a strong learner. Models are trained sequentially, each one focusing on the errors of its predecessors, which makes boosting particularly effective at reducing bias and leads to more accurate predictions.

 ![](https://media.geeksforgeeks.org/wp-content/uploads/20210707140911/Boosting.png)

- AdaBoost (Adaptive Boosting): One of the first and most popular boosting algorithms, AdaBoost adjusts the weights of the training data based on the errors of previous classifiers, effectively focusing on the difficult cases.
- Gradient Boosting: This algorithm builds models sequentially by optimizing a differentiable loss function. Unlike AdaBoost, which adjusts weights, Gradient Boosting fits new models to the residual errors of previous models, thereby reducing bias.
- XGBoost (Extreme Gradient Boosting): An optimized version of Gradient Boosting, XGBoost is designed for efficiency and scalability. It includes features like parallel processing and regularization to prevent overfitting.
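
To make the "fit new models to the residual errors" idea concrete, here is a minimal sketch of gradient boosting for squared-error loss on one-dimensional data. It is an illustration under stated assumptions, not how any particular library implements it: the weak learner is a one-split regression stump, and `fit_stump`, `gradient_boost`, `mse`, the `learning_rate` of 0.5, and the toy data are all choices made here.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds it to the ensemble."""
    base = sum(ys) / len(ys)                 # initial prediction: the mean target
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # made-up inputs
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]          # made-up targets
baseline = gradient_boost(xs, ys, n_rounds=0)   # predicts the mean only
boosted = gradient_boost(xs, ys, n_rounds=20)
```

For squared-error loss the residuals are the negative gradient of the loss with respect to the current predictions, which is where the name comes from. AdaBoost differs in that it reweights the training examples between rounds instead of fitting residuals.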