**Chapter 7 - Ensembles** <br>

My solutions for the exercises

<td>
    <a href="https://colab.research.google.com/github/nikitaosovskiy/hadnson_ml/blob/main/08-ensembles/ensembles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
</td>

In [14]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from scipy.stats import mode

%matplotlib inline
import matplotlib.pyplot as plt

# Ex 8

*Task:* Compare individual models with an ensemble.

## Load data

In [4]:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

In [5]:
X = mnist['data']
y = mnist['target']

In [6]:
X_train, y_train = X[:50000], y[:50000]
X_val, y_val = X[50000:60000], y[50000:60000]
X_test, y_test = X[60000:], y[60000:]

## Training

In [11]:
rnd_forest = RandomForestClassifier()
rnd_forest.fit(X_train, y_train)

RandomForestClassifier()

In [16]:
y_pred = rnd_forest.predict(X_test)

accuracy_score(y_test, y_pred)

0.9681

In [12]:
extra_clf = ExtraTreesClassifier()
extra_clf.fit(X_train, y_train)

ExtraTreesClassifier()

In [17]:
y_pred = extra_clf.predict(X_test)

accuracy_score(y_test, y_pred)

0.9703

In [13]:
svm_clf = SVC()
svm_clf.fit(X_train, y_train)

SVC()

In [18]:
y_pred = svm_clf.predict(X_test)

accuracy_score(y_test, y_pred)

0.9785

## Making ensemble

### #1 Hard voting by hand

In [19]:
fin_pred = []

In [20]:
for clf in (rnd_forest, extra_clf, svm_clf):
    predictions = clf.predict(X_val)
    fin_pred.append(predictions)

In [34]:
pred_results = mode(fin_pred, axis=0).mode[0]

In [35]:
accuracy_score(y_val, pred_results)

0.9768
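
The vote above relies on `scipy.stats.mode`, whose return shape and handling of string labels may differ between SciPy releases (an assumption worth checking against the installed version). As a hedged alternative, the same majority vote can be sketched with NumPy alone. `majority_vote` is a hypothetical helper that is not part of the original notebook; ties go to the smallest label in sorted order.

```python
import numpy as np

def majority_vote(predictions):
    """Return the most frequent label per sample.

    predictions: array-like of shape (n_models, n_samples).
    """
    predictions = np.asarray(predictions)
    classes = np.unique(predictions)  # sorted unique labels
    # votes[k, i] = number of models that predicted classes[k] for sample i
    votes = np.stack([(predictions == c).sum(axis=0) for c in classes])
    # argmax returns the first maximum, so ties resolve to the smallest label
    return classes[votes.argmax(axis=0)]

# Toy example: three "models", three samples, string labels like MNIST targets
toy = [["1", "2", "3"],
       ["1", "2", "4"],
       ["5", "2", "3"]]
print(majority_vote(toy))
```

In the notebook this would be called as `majority_vote(fin_pred)` in place of the `mode(...)` line.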

### #2 Soft voting with VotingClassifier

In [45]:
rnd_forest = RandomForestClassifier()
extra_clf = ExtraTreesClassifier()
svm_clf = SVC(probability=True)

In [46]:
voting_clf = VotingClassifier(
    estimators=[('rnd_for', rnd_forest), ('extra_clf', extra_clf), ('svc', svm_clf)],
    voting='soft')

In [47]:
voting_clf.fit(X_train, y_train)

VotingClassifier(estimators=[('rnd_for', RandomForestClassifier()),
                             ('extra_clf', ExtraTreesClassifier()),
                             ('svc', SVC(probability=True))],
                 voting='soft')

In [48]:
y_pred = voting_clf.predict(X_val)

accuracy_score(y_val, y_pred)

0.9813

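The soft-voting VotingClassifier scores 0.9813 on the validation set, which is higher than the accuracy of any solo model (the best being SVC at 0.9785). Note that the solo models above were scored on the test set, so the comparison is across different splits.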
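
To make clear what `voting='soft'` computes, here is a minimal sketch that averages the class probabilities of the fitted sub-models by hand and checks that the result matches `VotingClassifier.predict`. It uses the small `load_digits` dataset instead of MNIST so it runs quickly; the dataset choice, the two-model ensemble, and the names `X_tr`, `X_va`, `voting`, and `manual_pred` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=42)

voting = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("et", ExtraTreesClassifier(random_state=42))],
    voting="soft",
)
voting.fit(X_tr, y_tr)

# Soft voting: average predict_proba over the fitted sub-models,
# then pick the class with the highest mean probability.
avg_proba = np.mean([est.predict_proba(X_va) for est in voting.estimators_],
                    axis=0)
manual_pred = voting.classes_[avg_proba.argmax(axis=1)]

same = np.array_equal(manual_pred, voting.predict(X_va))
acc = accuracy_score(y_va, manual_pred)
print("manual average matches VotingClassifier:", same)
print("validation accuracy:", acc)
```

Because soft voting needs `predict_proba`, an `SVC` has to be created with `probability=True` to take part, as done in the cell above.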

# Ex 9

*Task:* Build an ensemble with stacking.

In [7]:
rnd_forest = RandomForestClassifier()
extra_clf = ExtraTreesClassifier()
svm_clf = SVC(probability=True)

In [8]:
rnd_forest.fit(X_train, y_train)
extra_clf.fit(X_train, y_train)
svm_clf.fit(X_train, y_train)

SVC(probability=True)

In [10]:
predictions = []
for model in (rnd_forest, extra_clf, svm_clf):
    predictions.append(model.predict(X_val))

In [11]:
predictions = np.array(predictions).T

In [15]:
blender = DecisionTreeClassifier()

blender.fit(predictions, y_val)

DecisionTreeClassifier()

In [16]:
test_predictions = []
for model in (rnd_forest, extra_clf, svm_clf):
    test_predictions.append(model.predict(X_test))

In [17]:
# Transpose so each row is one sample and each column one model's prediction
test_predictions = np.array(test_predictions).T

In [18]:
y_pred = blender.predict(test_predictions)

In [19]:
accuracy_score(y_test, y_pred)

0.9756

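Stacking scores 0.9756 on the test set, which is below the single SVC (0.9785 on the same test set) and below the 0.9813 that the soft-voting classifier reached on the validation set. On these results it is preferable to use *VotingClassifier with soft voting*, although the last two scores come from different splits, so that comparison is only indicative.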
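
For comparison, scikit-learn also ships `StackingClassifier`, which trains the final estimator on cross-validated predictions of the base models instead of a single hold-out set. This is a different technique from the hold-out blender built above, shown only as a hedged sketch: the `load_digits` dataset, the `LogisticRegression` final estimator, `cv=5`, and the variable names are assumptions chosen to keep the example small and fast.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("et", ExtraTreesClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    # The final estimator is fit on out-of-fold predictions of the base
    # models (class probabilities here, since all three expose predict_proba).
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_tr, y_tr)

stack_acc = stack.score(X_te, y_te)
print("stacking test accuracy:", stack_acc)
```

Feeding class probabilities rather than hard labels to the blender gives it more information per sample, which is one possible reason a label-only blender like the one above can underperform.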