# Flaws of linear regression
- overfitting with many input features
- difficult to express non-linear relationships

# Algorithms:
- Lasso Regression
- Ridge Regression
- Elastic-net
- Random forest
- Boosted trees

## Overfitting:
Overfitting means the model memorizes the training data (including its noise) instead of learning the underlying pattern, so it performs well on the training data but poorly on new data.

### How overfitting occurs:
- model is too complex relative to the amount of training data

### Techniques to fight overfitting:
- regularization
- ensembling
- train/test splitting
- cross-validation
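
A minimal sketch of train/test splitting and cross-validation, assuming synthetic data and a plain `LinearRegression` (both chosen only for illustration). A large gap between the training score and the held-out score is the usual sign of overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data (an assumption for illustration): 100 rows, 5 features,
# a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# 5-fold cross-validation on the training portion: each fold takes a turn
# as the validation set while the other four are used for fitting.
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5)

print(f"train R^2: {train_r2:.3f}, test R^2: {test_r2:.3f}")
print(f"CV R^2 per fold: {np.round(cv_scores, 3)}")
```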


# Regularization

## Cost functions
- measure the error of predictions
- quantify how inaccurate a model is
- compare performance across models
- aka loss functions / error functions
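
As a small illustration, two common cost functions (mean squared error and mean absolute error) can be used to compare two sets of predictions. The numbers and the names `model_a` / `model_b` below are made up for the example.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical targets and predictions from two hypothetical models.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
model_a = np.array([2.5, 5.0, 2.0, 8.0])  # errors: 0.5, 0, 0, -1
model_b = np.array([3.0, 4.0, 4.0, 7.0])  # errors: 0, 1, -2, 0

mse_a = mean_squared_error(y_true, model_a)   # (0.25 + 0 + 0 + 1) / 4
mse_b = mean_squared_error(y_true, model_b)   # (0 + 1 + 4 + 0) / 4
mae_a = mean_absolute_error(y_true, model_a)  # (0.5 + 0 + 0 + 1) / 4
mae_b = mean_absolute_error(y_true, model_b)  # (0 + 1 + 2 + 0) / 4

print(f"model A: MSE={mse_a}, MAE={mae_a}")  # MSE=0.3125, MAE=0.375
print(f"model B: MSE={mse_b}, MAE={mae_b}")  # MSE=1.25, MAE=0.75
```

Lower is better for both metrics, so model A wins on this toy data.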

# Lasso
- penalizes the absolute size of the coefficients (L1 penalty)
- can shrink coefficients to exactly 0 (automatic feature selection)
- strength of penalty should be tuned
- the stronger the penalty, the more coefficients are pushed to 0

```
from sklearn.linear_model import Lasso
```
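
A minimal sketch of the penalty's effect, assuming synthetic data in which only the first 3 of 10 features matter. In scikit-learn the penalty strength is the `alpha` parameter; the specific `alpha` values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 10 features, only the first 3 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Count coefficients that are exactly 0 for increasing penalty strength.
n_zero_by_alpha = {}
for alpha in (0.01, 0.1, 1.0):
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero_by_alpha[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero_by_alpha[alpha]} of 10 coefficients are exactly 0")
```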

# Ridge
- penalizes the squared size of the coefficients (L2 penalty)
- leads to smaller coefficients but doesn't force them to 0 (feature shrinkage)
- penalty strength should be tuned
- the stronger the penalty, the closer the coefficients are pushed toward 0

```
from sklearn.linear_model import Ridge
```
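
A minimal sketch of shrinkage, again assuming synthetic data and arbitrary `alpha` values. The coefficient vector gets smaller as `alpha` grows, but individual coefficients do not become exactly 0.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 10 features, only the first 3 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

coef_norms = {}
any_exact_zero = {}
for alpha in (0.1, 10.0, 1000.0):
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))  # overall size
    any_exact_zero[alpha] = bool(np.any(model.coef_ == 0))  # feature dropped?
    print(f"alpha={alpha}: ||coef||={coef_norms[alpha]:.3f}, "
          f"any exact zero: {any_exact_zero[alpha]}")
```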

# Elastic-net
- compromise between Lasso and Ridge
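- combines both L1 (absolute) and L2 (squared) penalties
- ratio of L1 to L2 should be tuned
- strength of penalty should be tuned
- setting the ratio to 0 or 1 recovers a pure Ridge or pure Lasso penalty, respectively (with the ratio defined as the L1 share, as in scikit-learn's `l1_ratio`)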

```
from sklearn.linear_model import ElasticNet
```
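
One possible way to tune both hyperparameters together is a cross-validated grid search. This is a sketch under assumptions: the data is synthetic and the grid values for `alpha` (strength) and `l1_ratio` (L1 share) are arbitrary starting points, not recommendations.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption): 10 features, only the first 3 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Try every combination of penalty strength and L1/L2 mix with 5-fold CV.
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "l1_ratio": [0.1, 0.5, 0.9],
}
search = GridSearchCV(ElasticNet(max_iter=10_000), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print(f"best CV R^2: {search.best_score_:.3f}")
```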

## Notes:
- in some cases, Lasso and Ridge are easier to tune (one hyperparameter instead of two)

# Overcoming the difficulty of expressing non-linear relationships

## Ensembling
- combining multiple individual models into a single model

## Ensembling methods
- Bagging
    * reduces the chance of overfitting complex models
    * trains a large number of 'strong' learners in **parallel**
        - a strong learner is a model that's allowed to have high complexity
    * combines all strong learners to smooth out their predictions
- Boosting
    * improves the predictive flexibility of simple models
    * trains a large number of 'weak' learners in **sequence**
        - a weak learner is a model that has limited complexity
    * each one in the sequence focuses on learning from the **mistakes** of the one before it
    * combines all weak learners into a single strong learner

- The two methods approach the problem from opposite directions: bagging starts with complex models and reduces overfitting, boosting starts with simple models and adds flexibility
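
The two ideas can be sketched by hand with decision trees. This is an illustration under assumptions, not a reference implementation: the data is a synthetic noisy sine curve, the helper names `make_data` and `mse` are hypothetical, and the boosting loop fits each new tree to the current residuals, which is one specific form of boosting (gradient boosting for squared error).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy sine curve: a non-linear target (illustrative assumption)."""
    X = rng.uniform(-3, 3, size=(n, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=n)
    return X, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

X_train, y_train = make_data(300)
X_test, y_test = make_data(200)

# --- Bagging: many unconstrained ("strong") trees, trained independently ---
deep_trees = []
for _ in range(50):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap rows
    tree = DecisionTreeRegressor(random_state=0)            # no depth limit
    deep_trees.append(tree.fit(X_train[idx], y_train[idx]))
tree_preds = np.array([t.predict(X_test) for t in deep_trees])
bagging_mse = mse(y_test, tree_preds.mean(axis=0))  # average the predictions
avg_single_tree_mse = float(np.mean([mse(y_test, p) for p in tree_preds]))

# --- Boosting: many depth-1 ("weak") trees, trained one after another ---
learning_rate = 0.1
train_pred = np.full(len(y_train), y_train.mean())
test_pred = np.full(len(y_test), y_train.mean())
for _ in range(200):
    residual = y_train - train_pred  # mistakes of the ensemble so far
    stump = DecisionTreeRegressor(max_depth=1).fit(X_train, residual)
    train_pred += learning_rate * stump.predict(X_train)
    test_pred += learning_rate * stump.predict(X_test)
boosting_mse = mse(y_test, test_pred)
single_stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)
single_stump_mse = mse(y_test, single_stump.predict(X_test))

print(f"bagged deep trees: {bagging_mse:.3f} "
      f"(average single deep tree: {avg_single_tree_mse:.3f})")
print(f"boosted stumps:    {boosting_mse:.3f} "
      f"(single stump: {single_stump_mse:.3f})")
```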


# Tree ensembling algorithms
## Random forests
- train a large number of **strong decision trees**
- combine predictions through bagging
- Sources of **randomness**:
    * each decision tree is only allowed to choose from a **random subset of features** at each split
    * each decision tree is only trained on a **random subset of observations** (bootstrap resampling)
    
```
from sklearn.ensemble import RandomForestRegressor
```
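
A minimal sketch, assuming a synthetic non-linear target and arbitrary settings. The two sources of randomness map onto the `max_features` and `bootstrap` parameters; a plain linear regression is fitted alongside purely as a baseline for comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic non-linear data (assumption): 4 features, 2 of them informative.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(
    n_estimators=200,   # number of trees
    max_features=0.5,   # each split considers a random half of the features
    bootstrap=True,     # each tree is trained on a bootstrap resample of rows
    random_state=0,
)
forest.fit(X_train, y_train)
forest_r2 = forest.score(X_test, y_test)

# Baseline: a linear model cannot represent sin(x) or x^2 directly.
linear_r2 = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

print(f"random forest test R^2: {forest_r2:.3f}")
print(f"linear model test R^2:  {linear_r2:.3f}")
```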

## Boosted trees
- train a sequence of weak, constrained decision trees and combine their predictions through boosting
- each decision tree is limited to a **maximum depth** (should be tuned)
- each decision tree tries to correct the prediction errors of the one before it
- often the highest performance ceiling of the algorithms listed here
- more complicated to tune than random forests

```
from sklearn.ensemble import GradientBoostingRegressor
```
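
A minimal sketch of tuning the maximum depth with a cross-validated grid search, assuming synthetic data. The grid values, `n_estimators=100`, and 3 folds are arbitrary choices for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic non-linear data (assumption): 4 features, 2 of them informative.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each tree in the sequence is constrained to at most `max_depth` levels.
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=100, random_state=0),
    param_grid={"max_depth": [1, 2, 3]},
    cv=3,
)
search.fit(X_train, y_train)
boosted_r2 = search.score(X_test, y_test)  # uses the refitted best model

print("best max_depth:", search.best_params_["max_depth"])
print(f"test R^2: {boosted_r2:.3f}")
```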

