1. If the models are all different, they can be combined in a voting ensemble, which often yields a better model. This works best when the models are very different from one another, or when their training involves some randomness, because their errors are then less correlated.

2. A hard voting classifier uses a majority vote: each classifier votes for a class, and the class with the most votes is predicted. A soft voting classifier averages the class probabilities estimated by all the classifiers and predicts the class with the highest average probability. Soft voting only works if every classifier can output class probabilities.
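
To make the difference concrete, here is a minimal NumPy sketch using made-up probabilities from three hypothetical classifiers for a single instance. The numbers are illustrative assumptions, not outputs of a trained model, and were chosen so that the two schemes disagree.

```python
import numpy as np

# Hypothetical class probabilities for ONE instance
# (rows: three classifiers, columns: class 0 and class 1).
probas = np.array([
    [0.45, 0.55],
    [0.40, 0.60],
    [0.95, 0.05],
])

# Hard voting: each classifier votes for its most likely class, majority wins.
votes = probas.argmax(axis=1)
hard_vote = np.bincount(votes).argmax()

# Soft voting: average the probabilities per class, then pick the largest.
soft_vote = probas.mean(axis=0).argmax()

print(hard_vote, soft_vote)
```

Two of the three classifiers lean slightly toward class 1, so hard voting picks class 1. The third classifier is very confident in class 0, which pulls the averaged probabilities to about 0.6 versus 0.4, so soft voting picks class 0.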

3. Bagging can be sped up by distributing it across multiple servers, because the predictors are trained independently of one another. Each server can hold its own random subset of the data and train one classifier on it.
    - The same applies to pasting, which differs only in that it samples without replacement
    - Boosting ensembles do not benefit, because their predictors are trained sequentially: each one depends on the previous one, so no two can be trained at the same time
    - Different servers can train different trees of a random forest
    - Stacking ensembles can be sped up by training the predictors of a given layer on different servers, with the blender (meta learner) on its own server. However, the predictors in one layer can only be trained after the predictors in the previous layer have been trained
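
On a single machine, the same independence is what the `n_jobs` parameter of Scikit-Learn's `BaggingClassifier` exploits: predictors are trained in parallel worker processes instead of on separate servers. The sketch below is only an illustration; the synthetic dataset and the hyperparameter values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Bagging: each tree is trained on a sample drawn WITH replacement.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20, max_samples=0.5,
    bootstrap=True, n_jobs=2, random_state=42,
)

# Pasting: identical, except sampling is done WITHOUT replacement.
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=20, max_samples=0.5,
    bootstrap=False, n_jobs=2, random_state=42,
)

bag_clf.fit(X, y)
paste_clf.fit(X, y)
```

`RandomForestClassifier` accepts `n_jobs` in the same way. The boosting classifiers have no equivalent for training predictors in parallel, which matches the sequential dependency described above.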
   
4. Out-of-bag (OOB) instances are the training instances that a given predictor in a bagging ensemble never sees, because each predictor is trained on a random subset sampled with replacement. On average, each predictor sees only about 63% of the training instances; the remaining 37% or so are its OOB instances. With OOB evaluation, each predictor is evaluated on its own OOB instances, so the ensemble can be evaluated without a separate validation set. *See the oob_score param of BaggingClassifier in Scikit-Learn*
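
The 63% figure comes from sampling m instances with replacement from a training set of size m: the probability that a given instance is never drawn is (1 - 1/m)^m, which approaches 1/e ≈ 0.37 as m grows. A quick simulation can check this; the set size and seed below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 10_000  # training set size (arbitrary)

# Draw one bootstrap sample: m indices sampled with replacement.
sample = rng.integers(0, m, size=m)

# Fraction of distinct instances that appear in the sample at least once.
seen_fraction = np.unique(sample).size / m

# Theoretical limit as m grows: 1 - 1/e.
limit = 1 - np.exp(-1)

print(f"seen: {seen_fraction:.3f}, limit: {limit:.3f}")
```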

5. Extra-Trees uses a random threshold for each candidate feature at each split, rather than searching for the best possible threshold as regular decision trees do. This makes it faster to train than a regular random forest, because finding the best possible threshold at every node is one of the most time-consuming tasks in growing a decision tree.
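
The cost difference can be seen in a simplified sketch of the two ways to pick a threshold for a single feature. This is not Scikit-Learn's implementation: the helper names (`gini`, `split_impurity`, `best_threshold`, `random_threshold`) and the toy data are assumptions made for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Weighted Gini impurity of the two children after splitting at threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    return (left.size * gini(left) + right.size * gini(right)) / y.size

def best_threshold(x, y):
    """Regular tree: evaluate every midpoint between sorted distinct values."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    scores = [split_impurity(x, y, t) for t in candidates]
    return candidates[int(np.argmin(scores))]

def random_threshold(x, rng):
    """Extra-Trees style: draw one threshold uniformly within the feature's range."""
    return rng.uniform(x.min(), x.max())

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])  # toy feature values
y = np.array([0, 0, 0, 1, 1, 1])              # toy labels

t_best = best_threshold(x, y)     # needs one impurity evaluation per candidate
t_rand = random_threshold(x, rng) # needs no search at all
```

The exhaustive search evaluates the impurity once per candidate threshold, while the random draw is constant-time per feature. The tree still compares the randomly thresholded features against each other and keeps the best one, so the split is not entirely arbitrary.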

6. If an AdaBoost ensemble is underfitting the training data, increase the number of estimators. Also tune the hyperparameters of the base estimator, in particular by reducing its regularization. Slightly increasing the learning rate may help as well.

7. In Gradient Boosting, *learning_rate* scales the contribution of each tree. A higher learning_rate reduces the number of predictors needed to fit the training data, so training is faster, but the ensemble tends to generalize less well and is more likely to overfit. Conversely, a lower learning_rate needs more trees and takes longer to train, but usually generalizes better.
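
A small experiment can illustrate the first half of this trade-off. The sketch below fits two `GradientBoostingRegressor` models that differ only in `learning_rate` and compares their training error after the same small number of trees. The noisy quadratic dataset, the helper `train_mse`, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Noisy quadratic data, used only for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(200, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.standard_normal(200)

def train_mse(learning_rate, n_estimators=5):
    """Training MSE of a small gradient boosting ensemble."""
    gbrt = GradientBoostingRegressor(
        max_depth=2, n_estimators=n_estimators,
        learning_rate=learning_rate, random_state=42,
    )
    gbrt.fit(X, y)
    return mean_squared_error(y, gbrt.predict(X))

mse_fast = train_mse(learning_rate=1.0)  # each tree contributes fully
mse_slow = train_mse(learning_rate=0.1)  # each tree contributes only 10%

print(f"learning_rate=1.0: {mse_fast:.4f}, learning_rate=0.1: {mse_slow:.4f}")
```

With only five trees, the high learning rate fits the training set much more closely. This says nothing about generalization: to see the other half of the trade-off, the same comparison would have to be made on held-out data with enough trees for the low learning rate to catch up.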

## 8. 

In [91]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target
X_train, X_residue, y_train, y_residue = train_test_split(X, y)
X_validation, X_test, y_validation, y_test = train_test_split(X_residue, y_residue, test_size=0.25)

In [92]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC

rf_clf = RandomForestClassifier(n_estimators=10, random_state=42)
et_clf = ExtraTreesClassifier(n_estimators=10, random_state=42)
svm_clf = SVC(probability=True, random_state=42)

In [93]:
rf_clf.fit(X_train, y_train)
rf_clf.score(X_validation, y_validation)

0.9406528189910979

In [94]:
et_clf.fit(X_train, y_train)
et_clf.score(X_validation, y_validation)

0.9643916913946587

In [95]:
svm_clf.fit(X_train, y_train)
svm_clf.score(X_validation, y_validation)

0.9851632047477745

In [96]:
from sklearn.ensemble import VotingClassifier

estimators = [('random forest', rf_clf), ('extra trees', et_clf), ('SVM classifier', svm_clf)]

hard_voting_clf = VotingClassifier(estimators, voting="hard")
soft_voting_clf = VotingClassifier(estimators, voting="soft")

In [97]:
hard_voting_clf.fit(X_train, y_train)
hard_voting_clf.score(X_validation, y_validation)

0.9851632047477745

In [98]:
soft_voting_clf.fit(X_train, y_train)
soft_voting_clf.score(X_validation, y_validation)

0.9910979228486647

In [99]:
hard_voting_clf.score(X_test, y_test)

0.9911504424778761

In [100]:
svm_clf.score(X_test, y_test)

0.9911504424778761

In [101]:
soft_voting_clf.score(X_test, y_test)

0.9911504424778761

## 9. 

In [118]:
import numpy as np

data = {"data": np.empty((len(y_validation), len(estimators)), dtype=np.float32), "target": y_validation}

for i, (_, estimator) in enumerate(estimators):
    y_validation_pred = estimator.predict(X_validation)
    data["data"][:, i] = y_validation_pred  # each feature column holds the predictions of one estimator
    

In [124]:
rf_clf_blender = RandomForestClassifier(oob_score=True, random_state=42)
rf_clf_blender.fit(data["data"], data["target"])

RandomForestClassifier(oob_score=True, random_state=42)

In [125]:
rf_clf_blender.oob_score_

0.9792284866468842

In [127]:
test_data = {"data": np.empty((len(y_test), len(estimators)), dtype=np.float32), "target": y_test}

for i, (_, estimator) in enumerate(estimators):
    test_data["data"][:, i] = estimator.predict(X_test)
    
y_pred = rf_clf_blender.predict(test_data["data"])

In [129]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)

0.9823008849557522

The blender's test accuracy (0.982) is lower than that of the voting classifiers (0.991).
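
For comparison, Scikit-Learn also provides `StackingClassifier`, which automates a similar procedure. The sketch below is not a reproduction of the cells above: it uses a different split with its own variable names (`X_tr`, `X_te`, and so on, all assumptions), it trains the blender on cross-validated out-of-fold predictions rather than on a separate validation set, and it feeds the blender probabilities or decision scores rather than predicted class labels. Its score is therefore not directly comparable with the numbers above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_all, y_all = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=42)),
        ("et", ExtraTreesClassifier(n_estimators=10, random_state=42)),
        ("svm", SVC(random_state=42)),
    ],
    # The blender, as in the manual version above.
    final_estimator=RandomForestClassifier(random_state=42),
    cv=5,  # out-of-fold predictions are used to train the blender
)

stacking_clf.fit(X_tr, y_tr)
stacking_accuracy = stacking_clf.score(X_te, y_te)
print(stacking_accuracy)
```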