**Model aggregation**

* Bagging algorithm
* Boosting algorithm
* Voting algorithm



1. Model aggregation can improve model accuracy.
2. It can lower the prediction error.
3. It gives more consistent predictions, which helps avoid overfitting.
4. It can reduce bias error (mainly through boosting) and variance error (mainly through bagging).
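To make the accuracy claim concrete, here is a minimal sketch under a strong assumption: a binary task where every base classifier is correct independently with the same probability `p`. Real models trained on the same data make correlated errors, so the actual gain is usually smaller. `majority_vote_accuracy` is a hypothetical helper name, not part of any library.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent voters,
    each correct with probability p, is correct on a binary task."""
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )


# Accuracy of the vote for a growing (odd) number of voters, each 70% accurate
for n in (1, 3, 11, 21):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

With three voters at 70% accuracy the vote is right about 78.4% of the time, and the figure keeps rising as more independent voters are added.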

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

**Loading Iris Dataset**

In [None]:
data = pd.read_csv('../input/iris-flower-dataset/IRIS.csv', header=0)


In [None]:
data.head(5)

In [None]:
X = data.drop(columns=['species'])  # `columns=` already selects the axis, so `axis=1` is redundant
y = data['species']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Decision Tree Classifier


In [None]:
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

In [None]:
model.score(X_test, y_test)

In [None]:
model.score(X_train, y_train)

# The model is overfitting: the training score is higher than the test score
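A single 80/20 split on 150 rows gives a noisy test score, so it can help to check the overfitting claim with cross-validation. The sketch below is one way to do that. It assumes scikit-learn's built-in copy of iris (via `load_iris`) instead of the Kaggle CSV, and the names `X_iris`, `y_iris`, `tree`, `cv_scores` and `train_score` are introduced here only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in iris data stands in for the CSV used in this notebook
X_iris, y_iris = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)

# Accuracy on 5 held-out folds (each fold is scored by a tree that never saw it)
cv_scores = cross_val_score(tree, X_iris, y_iris, cv=5)

# Accuracy on the same data the tree was trained on
train_score = tree.fit(X_iris, y_iris).score(X_iris, y_iris)

print("cross-validated accuracy:", cv_scores.mean())
print("training accuracy:", train_score)
```

A gap between the two numbers is the usual sign of overfitting; the cross-validated figure is the one to compare across models.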

# Random Forest Classifier


In [None]:
modelRF = RandomForestClassifier(n_estimators=10)
modelRF.fit(X_train,y_train)

In [None]:
modelRF.score(X_test, y_test)

In [None]:
modelRF.score(X_train, y_train)

# The random forest is still overfitting

# Now let's start with bagging classifier from sklearn


In [None]:
model_bagging = BaggingClassifier(DecisionTreeClassifier(),max_samples=0.5, max_features=1, n_estimators=20)

In [None]:
model_bagging.fit(X_train, y_train)

In [None]:
model_bagging.score(X_test, y_test)
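To show what bagging does under the hood, here is a hand-rolled sketch: train each tree on a bootstrap sample (drawn with replacement) and combine the trees by majority vote. It is a simplification, not scikit-learn's implementation: `BaggingClassifier` averages predicted class probabilities when the base estimator provides them, and this sketch ignores the `max_features` setting. It assumes the built-in iris data, and all variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in iris data (integer labels 0..2) stands in for the CSV
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=4
)

rng = np.random.default_rng(0)
n_estimators = 20
sample_size = len(X_tr) // 2  # mirrors max_samples=0.5

trees = []
for _ in range(n_estimators):
    # Bootstrap: draw row indices with replacement
    idx = rng.integers(0, len(X_tr), size=sample_size)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# votes has shape (n_estimators, n_test_samples)
votes = np.stack([t.predict(X_te) for t in trees])

# Majority vote per test sample (ties go to the lowest class index)
y_pred = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

accuracy = (y_pred == y_te).mean()
print("hand-rolled bagging accuracy:", accuracy)
```

Because every tree sees a different resample, their individual errors differ, and the vote smooths them out. That is the variance reduction bagging is used for.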

# Let's try AdaBoost (adaptive boosting)


In [None]:
# Use a separate name so the bagging model above is not overwritten
model_boosting = AdaBoostClassifier(DecisionTreeClassifier(), n_estimators=10, learning_rate=1)

In [None]:
model_boosting.fit(X_train, y_train)

In [None]:
model_boosting.score(X_test, y_test)
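Boosting is usually run with weak learners. A fully grown tree tends to fit the training data perfectly, which leaves AdaBoost no misclassified samples to reweight. The sketch below is one possible variant, not the configuration used above: it boosts depth-1 trees ("stumps") and uses `staged_score` to show the test accuracy after each boosting round. It assumes the built-in iris data, and the names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in iris data stands in for the CSV used in this notebook
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=4
)

# A decision stump: a tree with a single split, deliberately weak
stump = DecisionTreeClassifier(max_depth=1)

boosted = AdaBoostClassifier(stump, n_estimators=10, learning_rate=1.0, random_state=0)
boosted.fit(X_tr, y_tr)

# Test accuracy after each boosting round
# (there can be fewer than n_estimators rounds if boosting stops early)
staged = list(boosted.staged_score(X_te, y_te))
for round_number, score in enumerate(staged, start=1):
    print(round_number, score)
```

Watching the staged scores shows whether additional rounds are still helping or whether the ensemble has already plateaued.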

# Let's check out voting ensemble classifier


In [None]:
lr = LogisticRegression()
svm = SVC(kernel='poly', degree=2)
dt = DecisionTreeClassifier()

In [None]:
final_model = VotingClassifier(estimators=[('lr', lr),('dt', dt), ('svm', svm)], voting='hard')

In [None]:
final_model.fit(X_train, y_train)

In [None]:
final_model.score(X_test, y_test)
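Hard voting counts predicted labels only. An alternative is soft voting, which averages the class probabilities of the estimators, so a confident model can outweigh two hesitant ones. The sketch below is a variation on the cell above rather than a replacement. It assumes the built-in iris data, sets `probability=True` on `SVC` (required for `predict_proba`), and raises `max_iter` for `LogisticRegression` to avoid convergence warnings. All names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in iris data stands in for the CSV used in this notebook
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=4
)

soft_model = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        # probability=True makes SVC expose predict_proba, which soft voting needs
        ("svm", SVC(kernel="poly", degree=2, probability=True, random_state=0)),
    ],
    voting="soft",
)
soft_model.fit(X_tr, y_tr)

# Averaged class probabilities, one row per test sample
proba = soft_model.predict_proba(X_te)
soft_score = soft_model.score(X_te, y_te)
print("soft voting accuracy:", soft_score)
```

Comparing `soft_score` with the hard-voting score above shows whether the probability information changes any predictions on this dataset.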

The lesson of this article is that model aggregation can improve model accuracy. In this case, however, accuracy did not improve, so we can conclude that this approach was not well suited to this particular dataset. Try the same steps on another dataset, such as MNIST or scikit-learn's built-in iris or digits datasets, and observe how the model accuracy changes.