# Improving performance with ensembles
* Ensembles can give us a boost in accuracy on our dataset.
## Combining Models Into Ensemble Predictions
* The three most popular methods for combining the predictions from different models are:
### Bagging
* Building multiple models from different subsamples of the training dataset.
### Boosting
* Building multiple models, each of which learns to fix the prediction errors of the prior model in the sequence.
### Voting
* Building multiple models and combining their predictions with simple statistics (like calculating the mean or taking a majority vote).

## 1. Bagging Algorithms
* **Bootstrap Aggregation (Bagging)** involves taking multiple samples (with replacement) from our training dataset and training a model for each sample.
* **The final output prediction is averaged across the predictions of all sub-models**.
* Three common bagging models are:
    * **Bagged Decision Trees**
    * **Random Forest**
    * **Extra Trees**
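To make the bootstrap-and-aggregate idea concrete, here is a minimal from-scratch sketch, not how scikit-learn implements it. It assumes synthetic data from `make_classification` as a stand-in for the Pima dataset, binary 0/1 labels, and hypothetical helper names `fit_bagged_trees` and `predict_bagged`. It aggregates with a majority vote, which is the same as averaging the 0/1 votes and thresholding at 0.5; scikit-learn's `BaggingClassifier` instead averages predicted class probabilities when the base estimator provides `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_estimators=25, seed=0):
    """Hypothetical helper: fit one unpruned tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for i in range(n_estimators):
        # Draw n_samples row indices *with replacement* (a bootstrap sample).
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_bagged(trees, X):
    """Hypothetical helper: majority vote over the trees' 0/1 predictions."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


# Synthetic stand-in for the Pima data (assumption): 8 features, 0/1 labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)
trees = fit_bagged_trees(X_demo, y_demo)
y_pred = predict_bagged(trees, X_demo)
print(y_pred.shape, (y_pred == y_demo).mean())
```

Each bootstrap sample contains only about 63% (roughly 1 - 1/e) of the distinct rows, so the trees differ from one another, and the vote smooths out their individual errors.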


### 1.1 Bagged Decision Trees
* Bagging performs best with algorithms that have high variance.
* A popular example is decision trees, which are often constructed without pruning.

```python
# Bagged Decision Trees for Classification
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Pima Indians Diabetes dataset: 8 numeric input columns and a 0/1 class column
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = read_csv(filename, names=names)
array = dataframe.values
X = array[:, 0:8]
Y = array[:, 8]

# 10-fold cross-validation (no shuffling)
kfold = KFold(n_splits=10, random_state=None)
cart = DecisionTreeClassifier()
num_trees = 100
# `base_estimator` was renamed to `estimator` in scikit-learn 1.2
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=None)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7577238550922762
```


### 1.2 Random Forest
* Random Forest is an extension of bagged decision trees.
* Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between individual classifiers.
* Rather than choosing the best split point from all features, only a random subset of features is considered at each split.


```python
# Random Forest Classification
# (reuses X, Y, kfold and num_trees from the bagging example)
from sklearn.ensemble import RandomForestClassifier
# each split point is chosen from a random selection of 3 features
max_features = 3
model = RandomForestClassifier(n_estimators=num_trees, max_features=max_features)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7499829118250172
```


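### 1.3 Extra Trees
* Extra Trees (Extremely Randomized Trees) are another modification of bagged decision trees that adds extra randomness to how the trees are grown.
* At each split, a threshold is drawn at random for each candidate feature, and the best of these randomly generated splits is used.
* In scikit-learn, `ExtraTreesClassifier` grows each tree on the whole training set by default (`bootstrap=False`).
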
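Extra Trees differ from Random Forest mainly in how a split is chosen: instead of searching for the best threshold of each candidate feature, they draw one threshold at random per candidate feature and keep the best of those random splits. The sketch below shows a single split decision under stated assumptions: Gini impurity as the criterion, non-negative integer labels, and a hypothetical helper `extra_trees_split` whose `n_candidate_features` plays the role of `max_features` (7 in the Extra Trees example). scikit-learn's actual implementation is compiled code and differs in details.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a non-negative integer label array."""
    if labels.size == 0:
        return 0.0
    p = np.bincount(labels) / labels.size
    return 1.0 - np.sum(p ** 2)


def extra_trees_split(X, y, n_candidate_features, rng):
    """Hypothetical helper: choose a split with random features and random thresholds.

    For each randomly chosen feature, one threshold is drawn uniformly between
    that feature's min and max; the candidate with the lowest weighted Gini
    impurity wins. No search over thresholds is performed.
    """
    n_samples, n_features = X.shape
    features = rng.choice(n_features, size=n_candidate_features, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue  # constant feature in this node: cannot split on it
        threshold = rng.uniform(lo, hi)
        left = X[:, f] <= threshold
        score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n_samples
        if best is None or score < best[2]:
            best = (int(f), float(threshold), float(score))
    return best  # (feature index, threshold, weighted Gini) or None


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 2] > 0).astype(int)  # label depends only on feature 2
feature, threshold, score = extra_trees_split(X_demo, y_demo, n_candidate_features=7, rng=rng)
print(feature, round(threshold, 3), round(score, 3))
```

Skipping the threshold search makes each split cheaper and the trees more varied, which is the extra randomness the method's name refers to.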
```python
# Extra Trees Classification
from sklearn.ensemble import ExtraTreesClassifier
# number of randomly chosen features considered at each split
max_features = 7
model = ExtraTreesClassifier(n_estimators=num_trees, max_features=max_features)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7512645249487355
```


## 2. Boosting Algorithms
* Boosting creates a sequence of models, each of which attempts to correct the mistakes of the models before it in the sequence.
* The two most common boosting ensemble machine learning algorithms are:
    * **AdaBoost**
    * **Stochastic Gradient Boosting**

### 2.1 AdaBoost
* AdaBoost generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them when constructing subsequent models.
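The reweighting loop can be sketched from scratch as textbook discrete AdaBoost with decision stumps (depth-1 trees, which are also the default base estimator of `AdaBoostClassifier`). Assumptions: labels are mapped to {-1, +1}, synthetic `make_classification` data stands in for Pima, and `adaboost_fit` / `adaboost_predict` are hypothetical helpers. scikit-learn's own implementation differs in details, so treat this as an illustration of the idea only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=30):
    """Hypothetical helper: discrete AdaBoost with stumps; y must be in {-1, +1}."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)  # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])  # weighted error (weights sum to 1)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this stump
        # Misclassified instances (y * pred == -1) get heavier, correct ones lighter.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas), w


def adaboost_predict(stumps, alphas, X):
    """Sign of the alpha-weighted sum of stump predictions."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


X_demo, y01 = make_classification(n_samples=300, n_features=8, random_state=1)
y_demo = 2 * y01 - 1  # map {0, 1} -> {-1, +1}
stumps, alphas, final_w = adaboost_fit(X_demo, y_demo)
acc = (adaboost_predict(stumps, alphas, X_demo) == y_demo).mean()
print(f"training accuracy: {acc:.3f}")
```

Inspecting `final_w` after fitting shows which instances the later stumps concentrated on: the ones that were hardest to classify.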

```python
# AdaBoost Classification
from sklearn.ensemble import AdaBoostClassifier
num_trees = 30
model = AdaBoostClassifier(n_estimators=num_trees, random_state=None)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.760457963089542
```


### 2.2 Stochastic Gradient Boosting
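* Also called Gradient Boosting Machines (GBM).
* Each boosting stage fits a regression tree to the negative gradient of the loss of the ensemble built so far. In scikit-learn's `GradientBoostingClassifier`, setting `subsample` below 1.0 fits each tree on a random fraction of the training rows, which is what makes the method *stochastic*; the example in this section keeps the default `subsample=1.0`.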
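A minimal sketch of the idea in its simplest case, least-squares regression, where the negative gradient is just the residual `y - prediction`. This swaps in regression on a synthetic sine curve (an assumption; for classification, scikit-learn fits trees to the negative gradient of the log-loss instead) and uses hypothetical helpers `fit_gradient_boosting` / `predict_gradient_boosting`. The `subsample` fraction plays the same role as `GradientBoostingClassifier(subsample=...)`: below 1.0, each tree sees a random subset of rows.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Hypothetical helper: least-squares boosting; subsample < 1.0 makes it stochastic."""
    rng = np.random.default_rng(seed)
    base = y.mean()  # initial constant model
    pred = np.full_like(y, base, dtype=float)
    trees = []
    n = X.shape[0]
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of 0.5 * (y - F)^2
        # Stochastic step: each tree is fit on a random fraction of rows (no replacement).
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X[idx], residuals[idx])
        pred += learning_rate * tree.predict(X)  # shrink each correction
        trees.append(tree)
    return base, trees


def predict_gradient_boosting(base, trees, X, learning_rate=0.1):
    """learning_rate must match the value used when fitting."""
    return base + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=400)

base, trees = fit_gradient_boosting(X_demo, y_demo)
mse_const = np.mean((y_demo - base) ** 2)
mse_boosted = np.mean((y_demo - predict_gradient_boosting(base, trees, X_demo)) ** 2)
print(f"constant model MSE: {mse_const:.3f}, boosted MSE: {mse_boosted:.3f}")
```

The small `learning_rate` means each tree corrects only part of the remaining error, so many shallow trees are needed; subsampling adds randomness between rounds.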

```python
# Stochastic Gradient Boosting Classification
from sklearn.ensemble import GradientBoostingClassifier
num_trees = 100
model = GradientBoostingClassifier(n_estimators=num_trees, random_state=None)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

```
0.7642857142857143
```


## 3. Voting Ensemble
* Voting is one of the simplest ways of combining the predictions from multiple machine learning algorithms.
* It works by first creating two or more standalone models from our training dataset.
* A voting classifier can then be used to wrap our models and combine the predictions of the sub-models when asked to make predictions for new data. In scikit-learn, `VotingClassifier` defaults to `voting='hard'` (a majority vote over the predicted class labels); `voting='soft'` averages the predicted class probabilities instead.
* More advanced methods can learn how best to weight the predictions from the sub-models; this is called stacking (stacked aggregation).
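The two voting modes can be checked by hand. This sketch fits a hard and a soft `VotingClassifier`, then recomputes their outputs from the fitted sub-models in `estimators_`: a majority vote over labels for hard voting, and the mean of `predict_proba` for soft voting. Assumptions: synthetic `make_classification` data stands in for Pima, `demo_estimators` is a hypothetical name, and `SVC` is given `probability=True` because soft voting needs `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)
demo_estimators = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("cart", DecisionTreeClassifier(random_state=1)),
    ("svm", SVC(probability=True, random_state=1)),  # probabilities for soft voting
]

hard = VotingClassifier(demo_estimators, voting="hard").fit(X_demo, y_demo)
soft = VotingClassifier(demo_estimators, voting="soft").fit(X_demo, y_demo)

# Hard voting: each fitted sub-model casts one vote; with 3 voters and 0/1 labels,
# label 1 wins when at least 2 of them predict it.
votes = np.stack([est.predict(X_demo) for est in hard.estimators_])  # (3, n_samples)
manual_hard = (votes.sum(axis=0) >= 2).astype(int)

# Soft voting: average the predicted class probabilities, then take the argmax.
avg_proba = np.mean([est.predict_proba(X_demo) for est in soft.estimators_], axis=0)
manual_soft = avg_proba.argmax(axis=1)

print((manual_hard == hard.predict(X_demo)).all(), np.allclose(avg_proba, soft.predict_proba(X_demo)))
```

Soft voting lets a confident sub-model outweigh two uncertain ones, which hard voting cannot do.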


```python
# Voting Ensemble for Classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

# create the sub-models
estimators = []
model1 = LogisticRegression()
estimators.append(('logistic', model1))
model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))
model3 = SVC()
estimators.append(('svm', model3))

# create the ensemble model (hard voting by default)
ensemble = VotingClassifier(estimators)
results = cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())
```

```
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression

0.7669002050580997
```

This convergence warning from `LogisticRegression` was printed several times during cross-validation. As the message suggests, passing a larger `max_iter` to `LogisticRegression` or standardizing the inputs should avoid it.
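One way to address the convergence warning and to try stacking is sketched below. Assumptions: synthetic `make_classification` data stands in for the Pima `X`, `Y` (swap them in to reproduce the notebook setting), and the scaler sits inside `make_pipeline` so it is refit on each training fold rather than on the full dataset. `StackingClassifier` (scikit-learn 0.22 and later) trains a final `LogisticRegression` on cross-validated predictions of the sub-models. No scores are claimed here because the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Pima X, Y arrays (assumption: 8 numeric features).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)
kfold_demo = KFold(n_splits=10, shuffle=True, random_state=7)

# Scale-sensitive models get a StandardScaler in front of them.
sub_models = [
    ("logistic", make_pipeline(StandardScaler(), LogisticRegression())),
    ("cart", DecisionTreeClassifier(random_state=1)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
]

voting = VotingClassifier(sub_models)
# Stacking: a logistic-regression meta-model learns how to weight the sub-models.
stacking = StackingClassifier(estimators=sub_models, final_estimator=LogisticRegression())

scores_demo = {}
for name, model in [("voting", voting), ("stacking", stacking)]:
    scores_demo[name] = cross_val_score(model, X_demo, y_demo, cv=kfold_demo).mean()
    print(f"{name}: {scores_demo[name]:.3f}")
```

Both ensembles clone the same `sub_models` list when fitting, so sharing it between them is safe.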


# Summary
* We learned about:
    * Bagging ensembles, including Bagged Decision Trees, Random Forest and Extra Trees
    * Boo