# Ensemble Averaging
- Build several different models and let each one contribute to the final prediction.
- Every model is trained on the same set of training data.
- The final prediction is the average of the individual models' predictions.

## How to Apply
1. Develop multiple predictive models that are each capable of making their own predictions
2. Train each of the predictive models using the same set of training data
3. Predict with every model and average their outputs to get the final result
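
The three steps above can be sketched with plain NumPy. This is a minimal illustration, not the notebook's method: `average_predictions` is a hypothetical helper, and the three prediction lists are made-up numbers standing in for the outputs of already trained models.

```python
import numpy as np

def average_predictions(predictions):
    """Average a list of equally shaped prediction arrays element-wise."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0)

# Hypothetical predictions from three models for the same four samples
model_a = [2.0, 4.0, 6.0, 8.0]
model_b = [3.0, 5.0, 5.0, 9.0]
model_c = [1.0, 3.0, 7.0, 7.0]

ensemble_prediction = average_predictions([model_a, model_b, model_c])
print(ensemble_prediction)
```

Each model is off in a different direction on some samples, and the average lands between them.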

## Why Does It Work

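- Bias and variance: averaging several models tends to cancel out part of their individual errors, which mainly lowers variance.
- Bias: error from overly rigid assumptions. A high-bias model has low flexibility and underfits.
- Variance: measures how much the model's performance changes across different training data. High variance leads to overfitting.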
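
The variance argument can be checked with a small simulation. The sketch below assumes five hypothetical models whose errors are unbiased and independent with standard deviation 2.0. That is an idealization: models trained on the same data usually make correlated errors, so the reduction seen in practice is smaller.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0
n_models = 5
n_trials = 100_000

# Each column is one hypothetical model: unbiased but noisy (std = 2.0)
single_model_preds = true_value + rng.normal(0.0, 2.0, size=(n_trials, n_models))

# Ensemble averaging: mean of the five predictions in every trial
ensemble_preds = single_model_preds.mean(axis=1)

var_single = single_model_preds[:, 0].var()
var_ensemble = ensemble_preds.var()
print(f"single model variance: {var_single:.3f}")
print(f"ensemble variance:     {var_ensemble:.3f}")
```

Under these assumptions the ensemble variance is close to the single-model variance divided by the number of models. Averaging does not remove a bias that all models share.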

### Import Modules

In [7]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier 
from sklearn.neighbors import KNeighborsClassifier

### Make a dataset with 50 features and 10,000 samples

In [4]:
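# Synthetic binary classification problem; 15 of the 50 features are redundant
X, y = make_classification(n_samples=10000, n_features=50, n_redundant=15, random_state=42)
# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=43)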

# Classification

### Train and Test 3 separate models

In [5]:
# Each entry pairs a display name with an unfitted classifier
models = [('Decision Tree', DecisionTreeClassifier()),
          ('KNN', KNeighborsClassifier()),
          ('Naive Bayes', GaussianNB())]

# Fit every model on the same training data and score it on the test set
for name, model in models:
    model.fit(X_train, y_train)

    prediction = model.predict(X_test)
    score = accuracy_score(y_test, prediction)
    print('{} Model Accuracy: {}'.format(name, score))

Decision Tree Model Accuracy: 0.9066666666666666
KNN Model Accuracy: 0.9003333333333333
Naive Bayes Model Accuracy: 0.884


### Voting classifier accuracy

In [6]:
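# VotingClassifier defaults to hard voting: each model casts one vote per sample
# and the majority class wins. It fits its own clones of the estimators.
ensemble = VotingClassifier(estimators=models)
ensemble.fit(X_train, y_train)
prediction = ensemble.predict(X_test)
score = accuracy_score(y_test, prediction)
print('Ensemble Model Accuracy: {}'.format(score))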

Ensemble Model Accuracy: 0.9153333333333333
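
For class labels, the combination step is a majority vote instead of a numeric average. The sketch below shows that vote by hand. `majority_vote` is a hypothetical helper and the label arrays are made-up values, not outputs of the models trained above.

```python
import numpy as np

def majority_vote(predictions):
    """Return the most frequent label per sample.

    predictions: array-like of shape (n_models, n_samples) holding
    non-negative integer class labels.
    """
    votes = np.asarray(predictions)
    n_classes = int(votes.max()) + 1
    # Count the votes for every class, one column (sample) at a time
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    # argmax picks the class with the most votes; ties go to the lowest label
    return counts.argmax(axis=0)

# Hypothetical label predictions from three models for five samples
tree_labels = [0, 1, 1, 0, 1]
knn_labels = [0, 1, 0, 0, 1]
nb_labels = [1, 1, 0, 0, 0]

ensemble_labels = majority_vote([tree_labels, knn_labels, nb_labels])
print(ensemble_labels)
```

`VotingClassifier` also accepts `voting='soft'`, which averages the predicted class probabilities instead of counting votes. That is closer to averaging in the literal sense, but it requires every estimator to implement `predict_proba`.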


# Regression

In [11]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import VotingRegressor

In [12]:
X, y = make_regression(n_samples=10000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=43)

In [17]:
models = [('Support Vector', SVR()),
          ('Decision Tree', DecisionTreeRegressor()),
          ('KNN', KNeighborsRegressor())]
scores = []
for name, model in models:
    model.fit(X_train, y_train)

    prediction = model.predict(X_test)
    # RMSE as the square root of MSE; this avoids the `squared` argument,
    # which is deprecated in newer scikit-learn releases
    score = np.sqrt(mean_squared_error(y_test, prediction))
    scores.append(score)
    print('{} Model RMSE: {}'.format(name, score))

Support Vector Model RMSE: 63.22316958162251
Decision Tree Model RMSE: 82.15828410504501
KNN Model RMSE: 52.76639058475637


In [18]:
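# VotingRegressor averages the predictions of its fitted estimators
ensemble = VotingRegressor(estimators=models)
ensemble.fit(X_train, y_train)
prediction = ensemble.predict(X_test)
# RMSE as the square root of MSE; this avoids the `squared` argument,
# which is deprecated in newer scikit-learn releases
score = np.sqrt(mean_squared_error(y_test, prediction))
print('Ensemble Model RMSE: {}'.format(score))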

Ensemble Model RMSE: 50.988429787045675
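
To make the averaging explicit, the sketch below checks that an unweighted `VotingRegressor` prediction matches the plain mean of its fitted estimators' predictions. It uses a smaller synthetic dataset and separate variable names (the `_demo` suffix is an assumption of this example) so that it runs quickly and does not overwrite the data above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Small synthetic regression problem, only for this demonstration
X_demo, y_demo = make_regression(n_samples=500, n_features=10, random_state=42)
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=43)

ensemble_demo = VotingRegressor(estimators=[
    ('Decision Tree', DecisionTreeRegressor(random_state=0)),
    ('KNN', KNeighborsRegressor()),
])
ensemble_demo.fit(X_train_demo, y_train_demo)

# estimators_ holds the fitted clones; average their predictions by hand
manual_average = np.mean(
    [est.predict(X_test_demo) for est in ensemble_demo.estimators_], axis=0)

matches = np.allclose(manual_average, ensemble_demo.predict(X_test_demo))
print(matches)
```

`VotingRegressor` also takes a `weights` argument for a weighted average, which may help when one model is clearly stronger than the others.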

