# Ensembles

At a high level, ensemble methods bring together multiple models (called weak learners) so that the result is a more powerful and more accurate model (called a strong learner).

There are several strategies for doing this, and this lesson focuses on bagging and boosting.

![image.png](attachment:image.png)


Commonly, the "weak" learners you use are decision trees. In fact, a decision tree is the default weak learner for most ensemble methods in sklearn. However, you can replace this default with any of the models seen so far.

![image-2.png](attachment:image-2.png)
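As a rough illustration of swapping the weak learner, the sketch below fits sklearn's `BaggingClassifier` twice: once with its default decision tree and once with a logistic regression passed as the first argument. The synthetic dataset, the number of estimators, and the variable names are assumptions chosen for the example, not part of the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, used here only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With no estimator given, BaggingClassifier falls back to a decision tree.
tree_bagging = BaggingClassifier(n_estimators=25, random_state=0)
tree_bagging.fit(X_train, y_train)

# The weak learner is the first argument, so another model can be swapped in.
logreg_bagging = BaggingClassifier(
    LogisticRegression(max_iter=1000), n_estimators=25, random_state=0
)
logreg_bagging.fit(X_train, y_train)

tree_score = tree_bagging.score(X_test, y_test)
logreg_score = logreg_bagging.score(X_test, y_test)
print("weak learner:", type(tree_bagging.estimators_[0]).__name__, tree_score)
print("weak learner:", type(logreg_bagging.estimators_[0]).__name__, logreg_score)
```

Which weak learner scores better depends on the data, so the printed accuracies should be read as one example rather than a general result.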

# Why Would We Want to Ensemble Learners Together?

There are two competing sources of error to balance when finding a well-fitting machine learning model:

- Bias
- Variance


### Bias

When a model has high bias, this means that it doesn't do a good job of bending to the data.

An example of an algorithm that usually has a high bias is linear regression. 

Even with completely different datasets, we still end up fitting a straight line to the data.

High bias is bad because the model underfits: it misses real patterns in the data.

![image.png](attachment:image.png)

### Variance

When a model has high variance, this means that it changes drastically to meet the needs of every point in our dataset.

Linear models like the one above have low variance but high bias.

An example of an algorithm that tends to have high variance and low bias is a decision tree (especially a decision tree with no early stopping parameters). A decision tree, as a high-variance algorithm, will attempt to split every point into its own leaf if possible.


This is a trait of high-variance, low-bias algorithms: they are flexible enough to fit exactly whatever data they see.
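The contrast between the two kinds of model can be seen in a small experiment. The sketch below is an illustration under assumed settings (a noisy sine curve, 200 points, and the helper name `make_data` are all made up for the example): it fits a linear regression and an unrestricted decision tree, then compares R² on the training data with R² on fresh data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)


def make_data(n):
    """Draw n noisy samples from a sine curve (hypothetical example data)."""
    X = rng.uniform(0, 6, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=n)
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(200)

# High bias: a straight line cannot bend to follow the sine curve.
linear = LinearRegression().fit(X_train, y_train)

# High variance: with no stopping parameters, the tree memorizes the noise.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

for name, model in [("linear", linear), ("tree", tree)]:
    train_r2 = model.score(X_train, y_train)
    test_r2 = model.score(X_test, y_test)
    print(f"{name}: train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

The tree reproduces its training data almost perfectly but loses accuracy on new points, while the line scores about the same on both sets because it never fit the curve closely in the first place.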

By combining algorithms, we can often build models that perform better by meeting in the middle in terms of bias and variance. 

Other tactics for combining algorithms can also help them perform better.

These ideas draw on mathematical results, such as the central limit theorem, to minimize bias and variance. For example, the average of many roughly independent predictions varies less than any single prediction.
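One way to see why combining models can reduce variance is a small simulation. The sketch below assumes each model's prediction is the true value plus independent noise, which is an idealization (real ensemble members are usually correlated), and all constants are made up for the example. Under that assumption the variance of the averaged prediction shrinks roughly in proportion to the number of models.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 5.0   # quantity every model is trying to predict (assumed)
noise_std = 2.0    # spread of a single model's error (assumed)
n_trials = 10_000  # number of simulated ensembles per setting

variances = {}
for n_models in (1, 10, 100):
    # Each row is one ensemble; each column is one model's noisy prediction.
    predictions = true_value + rng.normal(
        scale=noise_std, size=(n_trials, n_models)
    )
    # The ensemble prediction is the average over its models.
    ensemble_prediction = predictions.mean(axis=1)
    variances[n_models] = ensemble_prediction.var()
    print(f"{n_models:>3} models: variance = {variances[n_models]:.3f}")
```

Averaging in this way only reduces variance. If every model shares the same bias, the average inherits that bias unchanged.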


# Introducing Randomness Into Ensembles

Another way to improve ensembles is to introduce randomness into high-variance algorithms before they are combined. This randomness combats the tendency of these algorithms to overfit (that is, to fit too closely to the data available). There are two main ways that randomness is introduced:

- Bootstrap the data - that is, sample the data with replacement and fit your algorithm to the sampled data.
- Subset the features - in each split of a decision tree, or with each algorithm used in an ensemble, only a subset of the total possible features is used.
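The two sources of randomness above can be sketched by hand. The example below is a minimal illustration, not a reference implementation: the synthetic dataset, the choice of 15 trees, and the majority vote at the end are assumptions for the example. Each tree is fit on a bootstrap sample of the rows, and `max_features="sqrt"` makes each split consider only a random subset of the features. This is roughly the recipe that random forests follow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic binary classification data (labels are 0 and 1).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
n_samples, n_features = X.shape

trees = []
for i in range(15):
    # Bootstrap: draw n_samples row indices with replacement.
    rows = rng.integers(0, n_samples, size=n_samples)
    # Feature subsetting: each split looks at about sqrt(n_features) features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Combine the trees with a majority vote (15 is odd, so there are no ties).
votes = np.stack([tree.predict(X) for tree in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
accuracy = (ensemble_pred == y).mean()
print(f"unique rows in last bootstrap sample: {len(np.unique(rows))} of {n_samples}")
print(f"training accuracy of the vote: {accuracy:.2f}")
```

Because sampling is done with replacement, each bootstrap sample repeats some rows and leaves others out, so every tree sees a slightly different dataset.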