#### 1. If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results? If so, how? If not, why?

Yes. You could combine them into a voting ensemble, which often performs better than the best individual model. The improvement depends on how independent the models' errors are: it works best when the models are very different from each other (e.g., an SVM, a Decision Tree, and a Logistic Regression classifier). Training them on different random samples of the training set (bagging or pasting) increases that independence further, but it is not required. Since they are all strong predictors already, the improvement will most likely be subtle unless many more sufficiently independent models are added. Furthermore, if the models output class probabilities, soft voting (averaging the predicted probabilities) may give an additional improvement.
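As a rough illustration of why independence matters, the sketch below computes the accuracy of a majority vote under idealized assumptions that are not in the original answer: a binary task, models whose errors are perfectly independent, and accuracy used in place of precision. The function name `majority_vote_accuracy` is hypothetical. Models trained on the same data are correlated, so the real gain would be smaller.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes a binary task and fully independent errors (an idealization).
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


acc_5 = majority_vote_accuracy(5, 0.95)
print(f"{acc_5:.4f}")  # 0.9988
```

Under these assumptions, five 95% models vote their way to roughly 99.9%, which is an upper bound rather than an expected result.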

#### 2. What is the difference between hard and soft voting classifiers?

In an ensemble classifier, hard voting picks the class that receives the most votes (the statistical mode) as the ensemble prediction. Soft voting averages the class probabilities predicted by each classifier and picks the class with the highest average probability, so highly confident votes carry more weight. Therefore, for soft voting to be possible, each predictor in the ensemble must be able to output class probabilities.
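The difference can be seen on a single instance. The sketch below uses made-up probabilities for three hypothetical classifiers and two classes; the helper names `hard_vote` and `soft_vote` are illustrative and not part of any library.

```python
# Hypothetical predicted probabilities for one instance, classes [0, 1].
# Two classifiers lean slightly towards class 1, one is very sure of class 0.
probas = [
    [0.45, 0.55],
    [0.45, 0.55],
    [0.95, 0.05],
]


def hard_vote(probas):
    """Each classifier votes for its most probable class; the mode wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return max(set(votes), key=votes.count)


def soft_vote(probas):
    """Average the probabilities per class; the highest average wins."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


print(hard_vote(probas))  # 1 (two votes against one)
print(soft_vote(probas))  # 0 (average probability of class 0 is about 0.62)
```

Here the two weak votes win under hard voting, while the one confident prediction dominates under soft voting.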

#### 3. Is it possible to speed up training of a bagging ensemble by distributing it across multiple servers? What about pasting ensembles, boosting ensembles, Random Forests, or stacking ensembles?

Some form of parallel computing is possible in all of the listed options except boosting ensembles. Boosting ensembles add each new predictor based on the results of the previous one, so the predictors are trained sequentially, and distributing training across servers gains little. For bagging ensembles, pasting ensembles, and Random Forests, every predictor is independent of the others, so all predictors can be trained simultaneously and distributed across multiple servers. In stacking ensembles, each layer is trained on the results of the previous one, so a layer can only start once the previous layer has finished, but the predictors within a given layer are independent of each other and can be distributed across multiple servers.

#### 4. What is the benefit of out-of-bag evaluation?

For a reasonably large training set, and regardless of its exact size, bagging (sampling with replacement, with as many draws as there are training instances) leaves on average about 37% of the training instances unsampled for a given predictor (in an ensemble, this 37% is of course different for each predictor). This out-of-bag data can therefore be used as a validation set to measure the model's performance before testing on the test set itself.

Book answer: With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on (they were held out). This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
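The 37% figure follows from the probability that a given instance is missed by all m draws of a bootstrap sample of size m, which is (1 - 1/m)^m and tends to 1/e as m grows. A minimal check, with the hypothetical helper name `oob_fraction`:

```python
import math


def oob_fraction(m: int) -> float:
    """Expected fraction of instances never drawn in m draws with replacement."""
    return (1 - 1 / m) ** m


for m in (10, 100, 10_000):
    print(m, round(oob_fraction(m), 4))

print(round(math.exp(-1), 4))  # limit for large m: 0.3679
```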

#### 5. What makes Extra-Trees more random than regular Random Forests? How can this extra randomness help? Are Extra-Trees slower or faster than regular Random Forests?

In both Extra-Trees and Random Forests, only a random subset of the features is considered when splitting a node. In Extra-Trees, the splitting threshold for each candidate feature is also drawn at random instead of searching for the best possible threshold. Skipping that search makes Extra-Trees faster to train than regular Random Forests (prediction speed is about the same). The extra randomness acts as a form of regularization: it trades a bit more bias for lower variance, which can help when a Random Forest overfits.
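To make the contrast concrete, the sketch below compares an exhaustive threshold search with a single random draw on one toy feature. It is a simplified illustration with hypothetical helper names, not scikit-learn's implementation; in scikit-learn's Extra-Trees, one random threshold is drawn per candidate feature and the best of those random splits is kept.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity of the two children produced by a threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Random-Forest style: evaluate every midpoint and keep the purest split."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


def random_threshold(values, rng):
    """Extra-Trees style: one uniform draw in the feature's range, no search."""
    return rng.uniform(min(values), max(values))


values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(42)

t_best = best_threshold(values, labels)   # scans all candidate thresholds
t_rand = random_threshold(values, rng)    # constant-time draw
print(t_best, split_impurity(values, labels, t_best))
print(t_rand, split_impurity(values, labels, t_rand))
```

The random draw costs a single impurity evaluation per feature instead of one per candidate threshold, which is where the training speed-up comes from.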

#### 6. If your AdaBoost ensemble underfits the training data, which hyperparameters should you tweak and how?

You could increase the number of estimators or reduce the regularization of the base estimator. You could also try slightly increasing the learning rate.

#### 7. If your Gradient Boosting ensemble overfits the training set, should you increase or decrease the learning rate?

Book answer: If your Gradient Boosting ensemble overfits the training set, you should try decreasing the learning rate. You could also use early stopping to find the right number of predictors (you probably have too many).

#### 8. Load the MNIST data (introduced in Chapter 3), and split it into a training set, a validation set, and a test set (e.g., use 50,000 instances for training, 10,000 for validation, and 10,000 for testing). Then train various classifiers, such as a Random Forest classifier, an Extra-Trees classifier, and an SVM classifier. Next, try to combine them into an ensemble that outperforms each individual classifier on the validation set, using soft or hard voting. Once you have found one, try it on the test set. How much better does it perform compared to the individual classifiers?
#### 9. Run the individual classifiers from the previous exercise to make predictions on the validation set, and create a new training set with the resulting predictions: each training instance is a vector containing the set of predictions from all your classifiers for an image, and the target is the image’s class. Train a classifier on this new training set. Congratulations, you have just trained a blender, and together with the classifiers it forms a stacking ensemble! Now evaluate the ensemble on the test set. For each image in the test set, make predictions with all your classifiers, then feed the predictions to the blender to get the ensemble’s predictions. How does it compare to the voting classifier you trained earlier?

In [1]:
#Standard libs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#Sklearn
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score




## Exercise 8

In [11]:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

In [12]:
mnist

{'data': array([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]]),
 'target': array(['5', '0', '4', ..., '4', '5', '6'], dtype=object),
 'frame': None,
 'categories': {},
 'feature_names': ['pixel1', 'pixel2', 'pixel3', ...

In [18]:
X, y = mnist["data"], mnist["target"]

In [23]:
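# Note: fitting the scaler on all 70,000 images lets test-set statistics leak
# into preprocessing; fitting on the training portion only and calling
# transform() on the validation and test sets would be stricter.
scaler = StandardScaler()
X_sc = scaler.fit_transform(X)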

In [30]:
X_zero, X_test, y_zero, y_test = X_sc[:60000], X_sc[60000:], y[:60000], y[60000:]

In [31]:
X_train, X_val, y_train, y_val = train_test_split(X_zero, y_zero, test_size=10000, random_state=42)

In [32]:
len(X_train)

50000

In [33]:
len(X_val)

10000

In [34]:
# Training a KNN

knn = KNeighborsClassifier(n_neighbors=4, weights='distance')

knn.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=4, weights='distance')

In [35]:
y_pred = knn.predict(X_val)

In [37]:
accuracy_score(y_val, y_pred)

0.9515

In [39]:
#Training a Logistic Regressor

log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train,y_train)

LogisticRegression(max_iter=10000)

In [40]:
y_pred = log_reg.predict(X_val)
accuracy_score(y_val, y_pred)

0.915

In [42]:
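# Training an extra-tree classifier
# Note: sklearn.tree.ExtraTreeClassifier is a single randomized tree; the
# Extra-Trees ensemble the exercise asks for is sklearn.ensemble.ExtraTreesClassifier.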

extra_trees = ExtraTreeClassifier()
extra_trees.fit(X_train, y_train)

ExtraTreeClassifier()

In [43]:
y_pred = extra_trees.predict(X_val)

In [44]:
accuracy_score(y_val, y_pred)

0.8102

In [45]:
X_train.shape[0]

50000

In [46]:
#Training Random Forest

forest = RandomForestClassifier(verbose=3, random_state=42)

In [47]:
forest.fit(X_train, y_train)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
building tree 1 of 100
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.3s remaining:    0.0s
building tree 2 of 100
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.6s remaining:    0.0s
building tree 3 of 100
...
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   29.6s finished


RandomForestClassifier(random_state=42, verbose=3)

In [48]:
y_pred = forest.predict(X_val)
accuracy_score(y_val, y_pred)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.5s finished


0.9682

In [50]:
#Training SVC

svc = SVC(verbose=True)
svc.fit(X_train, y_train)

[LibSVM]

SVC(verbose=True)

In [51]:
y_pred = svc.predict(X_val)
accuracy_score(y_val, y_pred)

0.9655

In [53]:
#Combining and Voting

voting_clf = VotingClassifier(estimators=[('knn', knn), ('log_reg', log_reg), ('extra_trees', extra_trees),
                                          ('forest', forest), ('svc', svc)], voting='hard', verbose=True)


In [54]:
voting_clf.fit(X_train, y_train)

[Voting] ...................... (1 of 5) Processing knn, total=   0.0s
[Voting] .................. (2 of 5) Processing log_reg, total= 5.0min
[Voting] .............. (3 of 5) Processing extra_trees, total=   0.5s
building tree 1 of 100
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.4s remaining:    0.0s
building tree 2 of 100
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.8s remaining:    0.0s
building tree 3 of 100
...
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:   32.1s finished


[Voting] ................... (4 of 5) Processing forest, total=  32.2s
[LibSVM][Voting] ...................... (5 of 5) Processing svc, total= 5.7min


VotingClassifier(estimators=[('knn',
                              KNeighborsClassifier(n_neighbors=4,
                                                   weights='distance')),
                             ('log_reg', LogisticRegression(max_iter=10000)),
                             ('extra_trees', ExtraTreeClassifier()),
                             ('forest',
                              RandomForestClassifier(random_state=42,
                                                     verbose=3)),
                             ('svc', SVC(verbose=True))],
                 verbose=True)

In [55]:
y_pred = voting_clf.predict(X_val)
accuracy_score(y_val, y_pred)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.4s finished


0.9709
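To answer the exercise's question on the validation set, the sketch below compares the hard-voting ensemble with the individual classifiers. The accuracies are copied from the cell outputs above; the variable names are only for illustration.

```python
# Validation accuracies copied from the outputs above
val_accuracy = {
    "knn": 0.9515,
    "log_reg": 0.915,
    "extra_trees": 0.8102,
    "forest": 0.9682,
    "svc": 0.9655,
}
voting_val_accuracy = 0.9709

best_name = max(val_accuracy, key=val_accuracy.get)
gain = voting_val_accuracy - val_accuracy[best_name]
print(best_name, round(gain, 4))  # forest 0.0027
```

On the validation set the ensemble beats the best single model (the Random Forest) by about 0.27 percentage points, that is, roughly 27 more images out of 10,000.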

In [56]:
y_pred = voting_clf.predict(X_test)
accuracy_score(y_test, y_pred)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.6s finished


0.9663

## Exercise 9

In [57]:
y_pred

array(['7', '2', '1', ..., '4', '5', '6'], dtype=object)

In [58]:
len(y_pred)

10000

In [59]:
y_pred.ndim

1

In [61]:
type(y_pred[0])

str

In [63]:
for i, clf in enumerate(['knn', 'log_reg', 'extra_trees', 'forest', 'svc']):
    print(f'Index:{i}, clf:{clf}')

Index:0, clf:knn
Index:1, clf:log_reg
Index:2, clf:extra_trees
Index:3, clf:forest
Index:4, clf:svc


In [67]:
y_pred.shape = (len(y_pred),1)

In [68]:
y_pred

array([['7'],
       ['2'],
       ['1'],
       ...,
       ['4'],
       ['5'],
       ['6']], dtype=object)

In [69]:
len(y_pred)

10000

In [70]:
test = np.hstack((y_pred,y_pred))

In [77]:
test = np.hstack((test,test))
test

array([['7', '7', '7', '7'],
       ['2', '2', '2', '2'],
       ['1', '1', '1', '1'],
       ...,
       ['4', '4', '4', '4'],
       ['5', '5', '5', '5'],
       ['6', '6', '6', '6']], dtype=object)

In [72]:
test[0]

array(['7', '7'], dtype=object)

In [82]:
##### Ignore the above (testing) ###########

classifiers = [knn, log_reg, extra_trees, forest, svc]

# One column per classifier: its predictions on the validation set
X_stack = np.column_stack([clf.predict(X_val) for clf in classifiers])

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.4s finished


In [83]:
X_stack

array([['7', '7', '7', '7', '7'],
       ['3', '3', '3', '3', '3'],
       ['8', '8', '8', '8', '8'],
       ...,
       ['9', '9', '9', '9', '9'],
       ['8', '8', '8', '8', '8'],
       ['1', '2', '2', '2', '1']], dtype=object)

In [88]:
forest_stack = RandomForestClassifier(verbose=3, random_state=42)
forest_stack.fit(X_stack, y_val)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


building tree 1 of 100
building tree 2 of 100
...
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.4s finished


RandomForestClassifier(random_state=42, verbose=3)

In [92]:
#### Computing predictions using the test set

# One column per classifier: its predictions on the test set
X_stack_test = np.column_stack([clf.predict(X_test) for clf in classifiers])

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.4s finished


In [93]:
X_stack_test

array([['7', '7', '7', '7', '7'],
       ['2', '2', '2', '2', '2'],
       ['1', '1', '1', '1', '1'],
       ...,
       ['4', '4', '4', '4', '4'],
       ['5', '5', '5', '5', '5'],
       ['6', '6', '6', '6', '6']], dtype=object)

In [95]:
y_stack_test_pred = forest_stack.predict(X_stack_test)
accuracy_score(y_test, y_stack_test_pred)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    0.1s finished


0.967
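For the comparison the exercise asks for, the sketch below puts the two test-set accuracies recorded in this notebook side by side. The numbers are copied from the outputs above, and the test-set size of 10,000 comes from the split.

```python
# Test-set accuracies copied from the outputs above
voting_test_accuracy = 0.9663    # hard-voting ensemble (Exercise 8)
stacking_test_accuracy = 0.967   # Random Forest blender (Exercise 9)
n_test = 10_000

extra_correct = round((stacking_test_accuracy - voting_test_accuracy) * n_test)
print(extra_correct)  # 7
```

The stacking ensemble classifies only about 7 more test images correctly than the voting classifier, so on this run the two are essentially on par.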