#Bagging
 - Bagging, also known as bootstrap aggregating, trains multiple models independently, each on its own bootstrap sample of the data, so they can run in parallel (the models don't use each other's results in order to predict). Each model gets a vote on the final prediction.

 - For classification problems (predicting a categorical value), we choose the label with the most votes.

 - For regression problems (predicting a continuous value), we average the values given by all the models.

 - You can bag with any collection of algorithms, giving each of them a vote in the final prediction.
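
A minimal sketch of the two voting rules above. The function names `majority_vote` and `average_prediction` are made up for this example and are not part of any library.

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Classification: return the label predicted by the most models."""
    # most_common(1) gives [(label, count)]; ties go to the label seen first
    return Counter(labels).most_common(1)[0][0]

def average_prediction(values):
    """Regression: return the mean of the models' predictions."""
    return mean(values)

# one prediction per model for a single data point (made-up values)
class_votes = ["cat", "dog", "cat"]
value_votes = [2.0, 3.0, 4.0]
print(majority_vote(class_votes))
print(average_prediction(value_votes))
```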

#Random Forest
The idea is to repeatedly randomly select data from the dataset (with replacement) and build a Decision Tree with each new sample. The default is to have the randomly selected data be the same size as the initial dataset. Note that since we are sampling with replacement, many data points will be repeated in the sample and many won't be included.
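
A small sketch of sampling with replacement, using only the standard library. `bootstrap_sample` and `fraction_left_out` are hypothetical helper names; the second one simply counts how many points were never drawn.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def fraction_left_out(n, rng):
    """Fraction of n points that never appear in one bootstrap sample."""
    sample = bootstrap_sample(list(range(n)), rng)
    return 1 - len(set(sample)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
print(bootstrap_sample(["a", "b", "c", "d", "e"], rng))
print(fraction_left_out(10_000, rng))
```

Each tree in the forest would be built from its own call to `bootstrap_sample`.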

Random Forests also limit each node of the Decision Tree to only consider splitting on a random subset of the features.
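
A rough sketch of the per-node feature restriction. The helper name `candidate_features` is made up, and the subset size (square root of the number of features) is an assumption chosen for illustration; other sizes are possible.

```python
import math
import random

def candidate_features(n_features, rng):
    """Pick the random subset of feature indices one node may split on."""
    k = max(1, int(math.sqrt(n_features)))  # assumed subset size: sqrt rule
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)
# each call stands for a different node, so the subsets generally differ
for node in range(3):
    print(node, candidate_features(13, rng))
```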

##Out of Bag Error 
We can analyze a Random Forest using the standard cross validation method of splitting the dataset into a training set and a testing set. However, if we're clever, we notice that each tree doesn't see all of the training data, so we can use the skipped data to cross validate each tree individually.

When selecting from the dataset, about one third of the data is left out of each sample (the chance that a given point is never picked in n draws is (1 - 1/n)^n, which approaches 1/e, or about 0.37). So every data point can be tested with about 1/3 of the trees. We calculate the percent of these predictions that we get wrong, and this is the out-of-bag error.

Breiman argued that the out-of-bag estimate is about as accurate as using a separate test set, so cross validation is not strictly necessary for a random forest, but we often still use it because that makes it easier to compare with other models.
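
A sketch of how the out-of-bag error could be computed for a classifier, assuming we already know which points each tree trained on and what each tree predicts for every point. The function name and the data layout are made up for this example.

```python
from collections import Counter

def oob_error(y, in_bag, predictions):
    """Fraction of points misclassified by the trees that never saw them.

    y           -- true label of each data point
    in_bag      -- in_bag[t] is the set of point indices tree t trained on
    predictions -- predictions[t][i] is tree t's prediction for point i
    """
    wrong = 0
    scored = 0
    for i, truth in enumerate(y):
        # only trees that did NOT train on point i get a vote
        votes = [predictions[t][i] for t in range(len(in_bag)) if i not in in_bag[t]]
        if not votes:
            continue  # point was in every tree's sample, so it cannot be scored
        scored += 1
        if Counter(votes).most_common(1)[0][0] != truth:
            wrong += 1
    if scored == 0:
        raise ValueError("no point was out of bag for any tree")
    return wrong / scored

# toy example: 3 trees, 4 points (all values are made up)
y = [0, 1, 1, 0]
in_bag = [{0, 1}, {1, 2}, {2, 3}]
predictions = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 0]]
print(oob_error(y, in_bag, predictions))
```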

##Feature Importance:

Breiman, the originator of random forests, uses out-of-bag error to determine feature importance. The idea is to compare the out-of-bag error of the trees with the out-of-bag error of the trees after scrambling the feature's values (basically, if we screw with the value of the feature, how much does that impact the total error?).
```
For every tree:
    Take the data that is not covered by the tree (its out-of-bag data).
    Calculate the OOB error on that data.
    Randomly permute the values of the feature (i.e. keep the same values, but shuffle them around the data points).
    Calculate the OOB error on the data with the feature values permuted.
    Subtract the original OOB error from the permuted OOB error to get the feature importance on this tree.
Average all the individual feature importances to get the feature importance.
```
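
The steps above can be sketched in plain Python. This is an illustration under simple assumptions, not Breiman's code: each "tree" is any function that maps a row to a label, rows are lists, and all names (`error_rate`, `permutation_importance`, `forest_importance`, `toy_tree`) are hypothetical.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows that predict() gets wrong."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, feature, rng):
    """Increase in error after shuffling one feature's column.

    rows and labels are assumed to be one tree's out-of-bag data.
    """
    baseline = error_rate(predict, rows, labels)
    column = [row[feature] for row in rows]
    rng.shuffle(column)  # same values, assigned to different rows
    permuted = [row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(rows, column)]
    return error_rate(predict, permuted, labels) - baseline

def forest_importance(trees, feature, rng):
    """Average the per-tree importances; trees holds (predict, rows, labels)."""
    scores = [permutation_importance(p, r, l, feature, rng) for p, r, l in trees]
    return sum(scores) / len(scores)

# stand-in "tree" that only ever looks at feature 0 (toy model)
def toy_tree(row):
    return row[0]

rows = [[0, 5], [1, 5], [0, 7], [1, 7], [0, 9], [1, 9]]
labels = [0, 1, 0, 1, 0, 1]
rng = random.Random(0)
trees = [(toy_tree, rows, labels)]
print(forest_importance(trees, 0, rng))  # shuffling the feature the tree uses
print(forest_importance(trees, 1, rng))  # shuffling a feature the tree ignores
```

Because `toy_tree` never reads feature 1, shuffling that column cannot change any prediction, so its importance comes out as zero.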

Sklearn:

Their default method doesn't involve using the out-of-bag score. Basically, the higher in the tree the feature is, the more data points pass through its split, so the more important it is in determining the result of a data point. The fraction of data points that reach a node, combined with how much the split at that node decreases impurity, is used as an estimate of that feature's importance for that tree. Then average those values across all trees to get the feature's importance.
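
A rough sketch of that impurity-weighted idea for a single tree. This is not scikit-learn's actual code; the tuple layout for a split and the function name are assumptions made for the example.

```python
def feature_importances(splits, n_total, n_features):
    """Impurity-based importances for one tree, normalized to sum to 1.

    splits -- one tuple per decision node:
              (feature, n_node, impurity, n_left, impurity_left, n_right, impurity_right)
    """
    totals = [0.0] * n_features
    for feature, n_node, imp, n_left, imp_left, n_right, imp_right in splits:
        # impurity removed by the split, with children weighted by their size
        decrease = imp - (n_left / n_node) * imp_left - (n_right / n_node) * imp_right
        # weight by the fraction of all samples that reach this node
        totals[feature] += (n_node / n_total) * decrease
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# made-up tree on 10 samples: feature 0 splits the root, feature 1 splits its left child
splits = [
    (0, 10, 0.42, 6, 0.5, 4, 0.0),
    (1, 6, 0.5, 3, 0.0, 3, 0.0),
]
print(feature_importances(splits, n_total=10, n_features=2))
```

In this toy tree the lower split still ends up with the larger share, because it removes more impurity even though fewer samples reach it. For a forest, the per-tree lists would be averaged.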

####example instantiations of algorithms  

In [1]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import AdaBoostRegressor
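#load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
#cross_validation and grid_search were merged into model_selection
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
#older releases: sklearn.ensemble.partial_dependence
from sklearn.inspection import plot_partial_dependence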
import matplotlib.pyplot as plt
import numpy as np



get a test dataset for this example

In [3]:
boston = load_boston() 
y = boston.target #house prices
x = boston.data #13 features

#split it into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

###Instantiating Estimators:

####RandomForest

In [1]:
rf = RandomForestRegressor(n_estimators=100,
                           n_jobs=-1,
                           random_state=1)

#from class notes:
random_forest_grid = {'max_depth': [3, None],
                      'max_features': ['sqrt', 'log2', None],
                      'min_samples_split': [2, 4],  #must be at least 2
                      'min_samples_leaf': [1, 2, 4],
                      'bootstrap': [True, False],
                      'n_estimators': [10, 20, 40],
                      'random_state': [1]}



####Boosting 

In [None]:
gbrt = GradientBoostingRegressor(learning_rate=0.1,
                                 loss='ls',  #least squares; named 'squared_error' from scikit-learn 1.0
                                 n_estimators=100,
                                 random_state=1)

#####max_depth:
1. controls the depth of feature interactions
2. i.e. how many levels of splits each tree has. May be a stump (depth of 1)
3. normally no more than 4 to 6.

#####min_samples_leaf
1. don't want too few samples in a leaf (like only 1), because then the tree overfits to outliers

#####n_estimators
1. number of trees grown

#####learning_rate
1. slow is good. Lower rate needs more estimators 
2. really important parameter to tune!
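
To see why a lower learning rate needs more estimators, here is a deliberately simplified sketch. It assumes every weak learner predicts the current residual exactly, so each round removes a `learning_rate` fraction of the remaining error. Real trees only approximate the residual, and `rounds_needed` is a made-up helper.

```python
def rounds_needed(learning_rate, tolerance, start_residual=1.0):
    """Count boosting rounds until the residual shrinks to the tolerance.

    Simplifying assumption: each new learner fits the residual exactly,
    and only learning_rate of its prediction is added to the model.
    """
    residual = start_residual
    rounds = 0
    while abs(residual) > tolerance:
        residual -= learning_rate * residual  # shrunken update
        rounds += 1
    return rounds

for rate in [0.5, 0.1, 0.05, 0.01]:
    print(rate, rounds_needed(rate, tolerance=0.01))
```

The count grows quickly as the rate drops, which is why `learning_rate` and `n_estimators` are usually tuned together.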

###Stochastic Gradient Boosting
Both of the parameters below can improve accuracy and reduce runtime.

#####max_features
1. good when there are lots of features: randomly sample them at each split

#####subsample
1. fit each tree on a random subset of the training set
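
A small sketch of the row subsampling idea, assuming rows are drawn without replacement for each boosting round. `subsample_rows` is a hypothetical helper, not a library function.

```python
import random

def subsample_rows(n_rows, subsample, rng):
    """Pick the row indices one boosting round trains on (no replacement)."""
    k = max(1, int(subsample * n_rows))
    return sorted(rng.sample(range(n_rows), k))

rng = random.Random(0)
# each round draws a fresh subset, here half of 10 rows
for boosting_round in range(3):
    print(boosting_round, subsample_rows(10, 0.5, rng))
```

Fitting each tree on fewer rows is where the runtime saving comes from.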


In [None]:
abr = AdaBoostRegressor(DecisionTreeRegressor(),
                        learning_rate=0.1,
                        loss='linear',
                        n_estimators=100,
                        random_state=1)

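At every tree iteration,
the data points the latest tree predicted badly are given more weight for the next iteration, so the next tree concentrates on them. (Gradient boosting instead fits each new tree to the residuals of the current model.)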
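
A sketch of one reweighting step, written in the style of the AdaBoost.R2 update with a linear loss. Treat the details as an assumption for illustration rather than a copy of any library's code; `reweight` is a made-up name.

```python
def reweight(weights, abs_errors, learning_rate=1.0):
    """One AdaBoost.R2-style weight update using the linear loss."""
    max_error = max(abs_errors)
    if max_error == 0:
        return list(weights)  # perfect fit, nothing to reweight
    # scale each point's loss into [0, 1]
    losses = [e / max_error for e in abs_errors]
    avg_loss = sum(w * l for w, l in zip(weights, losses))
    if avg_loss >= 0.5:
        raise ValueError("learner too weak: weighted average loss >= 0.5")
    beta = avg_loss / (1 - avg_loss)  # below 1 when avg_loss < 0.5
    # well-predicted points (loss near 0) are multiplied by about beta and shrink;
    # badly predicted points (loss near 1) keep their weight
    updated = [w * beta ** ((1 - l) * learning_rate) for w, l in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]

weights = [0.25, 0.25, 0.25, 0.25]
abs_errors = [0.0, 0.1, 0.1, 1.0]  # made-up absolute errors of the latest tree
print(reweight(weights, abs_errors))
```

After normalizing, the point with the largest error holds the largest share of the weight.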

###Run GradientBoosting in SKLEARN

In [None]:
from sklearn.model_selection import GridSearchCV

#make a dictionary of the parameters you want to alter in the gridsearch
param_grid = {'learning_rate': [0.1, 0.05, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 4, 5, 9, 17],
              'max_features': [1, 3, 8]}

#instantiate your estimator: AdaBoostRegressor, GradientBoostingRegressor
#or RandomForestRegressor (each needs a param_grid with its own parameter names)
#if lowering the learning rate, maybe make n_estimators even higher...
est = GradientBoostingRegressor(n_estimators=3000)

#try every combination of the parameters, fitting the model for each one
gs_cv = GridSearchCV(est, param_grid).fit(x_train, y_train)
#this will tell you what the best parameters were
gs_cv.best_params_

#if you want to call the chosen model, can also use this:
best_gb_model = gs_cv.best_estimator_
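
To make "every combination" concrete, this sketch lists the combinations a grid like the one above expands to, using only the standard library. `expand_grid` is a made-up helper for illustration, not part of scikit-learn.

```python
from itertools import product

def expand_grid(param_grid):
    """List every combination of values in a parameter grid."""
    names = list(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]

param_grid = {'learning_rate': [0.1, 0.05, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 4, 5, 9, 17],
              'max_features': [1, 3, 8]}
combos = expand_grid(param_grid)
print(len(combos))  # 3 * 2 * 5 * 3 combinations
print(combos[0])
```

Every combination is fitted once per cross validation fold, so a large grid with thousands of trees per model gets expensive fast.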

In [None]:
#another example
rf_gridsearch = GridSearchCV(RandomForestRegressor(),
                             random_forest_grid,
                             n_jobs=-1,
                             verbose=True,
                             scoring='neg_mean_squared_error')

rf_gridsearch.fit(x_train, y_train)

print("best parameters:", rf_gridsearch.best_params_)

best_rf_model = rf_gridsearch.best_estimator_