# Ensemble Learning
Load the `mnist` dataset. Split it into training and test sets. Train and test an ensemble method, such as random forest or a gradient boosting model, using scikit-learn. Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.
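The task asks for the model's most important hyperparameters, attributes and methods. The snippet below is a minimal sketch of those for `RandomForestClassifier`. It uses scikit-learn's bundled 8x8 `load_digits` data as an offline stand-in for the notebook's `mnist.csv`, which is an assumption made so the snippet runs on its own. It sets the hyperparameters that are usually tuned, then reads back the fitted attributes (names ending in `_`) and calls the main methods. The hyperparameter values are illustrative, not tuned.

```python
# Sketch: key hyperparameters, fitted attributes and methods of a random forest.
# load_digits (8x8 images) stands in for mnist.csv so the example runs offline.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Commonly tuned hyperparameters (illustrative values, not the result of a search).
forest = RandomForestClassifier(
    n_estimators=200,      # number of trees in the ensemble
    max_features="sqrt",   # number of features considered at each split
    max_depth=None,        # expand nodes until leaves are pure or below min_samples_split
    min_samples_leaf=1,    # minimum number of samples in a leaf
    oob_score=True,        # estimate accuracy on out-of-bag samples (needs bootstrap=True, the default)
    n_jobs=-1,             # train trees in parallel on all cores
    random_state=0,
)
forest.fit(X_train, y_train)  # method: train the ensemble

# Fitted attributes: the trailing underscore means they are set by fit().
print("classes_:", forest.classes_)
print("len(estimators_):", len(forest.estimators_))
print("oob_score_: {:.3f}".format(forest.oob_score_))
print("feature_importances_ shape:", forest.feature_importances_.shape)

# Methods used after fitting.
proba = forest.predict_proba(X_test[:1])  # class probabilities averaged over the trees
print("predict:", forest.predict(X_test[:1]), "max probability: {:.2f}".format(proba.max()))
print("score (test accuracy): {:.3f}".format(forest.score(X_test, y_test)))
```

With `oob_score=True`, each tree is evaluated on the training rows left out of its bootstrap sample, which gives a rough accuracy estimate without a separate validation split.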

## Importing Modules

In [1]:
import pandas as pd
import sklearn.metrics
import sklearn.ensemble
import sklearn.model_selection
import plotly.express as px

## Loading the Dataset

In [2]:
df = pd.read_csv("../../datasets/mnist.csv")
df = df.set_index("id")
df.head(3)

| id | class | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31953 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 34452 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 60897 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
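Before training, a few sanity checks on the loaded frame are cheap: its shape, missing values, pixel range and class balance. The sketch below assumes `load_digits(as_frame=True)` as an offline stand-in for `mnist.csv`, and renames its label column `target` to `class` to mirror the notebook. On the real data, the same calls apply to `df`. MNIST pixels are typically 0 to 255 rather than the 0 to 16 of this stand-in.

```python
# Sketch: basic sanity checks on a digits DataFrame (load_digits stands in for mnist.csv).
from sklearn.datasets import load_digits

digits = load_digits(as_frame=True)
frame = digits.frame.rename(columns={"target": "class"})  # mirror the notebook's label column

features = frame.drop(columns=["class"])
print("shape:", frame.shape)                                # rows x (pixel columns + label)
print("missing values:", int(frame.isna().sum().sum()))
print("pixel range:", features.values.min(), "to", features.values.max())
print(frame["class"].value_counts().sort_index())           # samples per digit (class balance)
```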


## Splitting the Data into Training and Test Sets

In [3]:
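# Features: the 784 pixel columns; target: the digit label in "class"
x = df.drop(["class"], axis=1)
y = df["class"]
# With no test_size given, train_test_split holds out 25% of the rows for testing
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)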
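`train_test_split` is called above without `random_state` or `stratify`, so each run draws a different split, and the class proportions can differ slightly between the training and test sets. Here is a sketch of a reproducible, stratified split on the `load_digits` stand-in. The seed `42` is arbitrary.

```python
# Sketch: reproducible, stratified train/test split (load_digits stands in for mnist.csv).
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True, as_frame=True)

# stratify=y keeps each digit's share (nearly) equal in both splits;
# random_state fixes the shuffle so the split is the same on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

print(len(X_train), len(X_test))
share = pd.DataFrame({
    "train": y_train.value_counts(normalize=True),
    "test": y_test.value_counts(normalize=True),
}).sort_index()
print(share.round(3))  # per-digit proportions in each split
```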

## Model Selection and Hyperparameter Tuning

In [4]:
# Random Forest --------------------
parameters_grid = {
    "criterion": ["gini", "entropy"],
    "n_estimators": range(50, 260, 50),  # [50, 100, 150, 200, 250]
}
model_1 = sklearn.model_selection.GridSearchCV(sklearn.ensemble.RandomForestClassifier(),
                                               parameters_grid, scoring="accuracy", cv=5, n_jobs=-1)
model_1.fit(x_train, y_train)
# best_score_ is the mean 5-fold cross-validated accuracy on the training set, not the test accuracy
print("Accuracy of best Random Forest classifier = {:.2f}".format(model_1.best_score_))
print("Best found hyperparameters of Random Forest classifier = {}".format(model_1.best_params_))
# -----------------------------------

# Gradient Boosting -------------------------------
parameters_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.15, 0.2],
    "n_estimators": range(50, 260, 50),  # [50, 100, 150, 200, 250]
}
model_2 = sklearn.model_selection.GridSearchCV(sklearn.ensemble.GradientBoostingClassifier(),
                                               parameters_grid, scoring="accuracy", cv=5, n_jobs=-1)
model_2.fit(x_train, y_train)
print("Accuracy of best Gradient Boosting classifier = {:.2f}".format(model_2.best_score_))
print("Best found hyperparameters of Gradient Boosting classifier = {}".format(model_2.best_params_))
# -----------------------------------

Accuracy of best Random Forest classifier = 0.93
Best found hyperparameters of Random Forest classifier = {'criterion': 'gini', 'n_estimators': 250}
Accuracy of best Gradient Boosting classifier = 0.92
Best found hyperparameters of Gradient Boosting classifier = {'learning_rate': 0.2, 'n_estimators': 250}
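`best_score_` reports only the winning configuration's mean cross-validated accuracy. `cv_results_` holds the scores of every grid point, which shows how close the runner-up configurations are. It is worth inspecting here because both best configurations above sit on the edge of their grids (`n_estimators=250`, and `learning_rate=0.2` for gradient boosting), which suggests the grids might be extended. The following is a reduced sketch on the `load_digits` stand-in, with a smaller hypothetical grid and `cv=3` to keep it fast.

```python
# Sketch: inspecting every grid point of a GridSearchCV, not just the best one.
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)  # stand-in for the notebook's MNIST data

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"criterion": ["gini", "entropy"], "n_estimators": [25, 50, 100]},  # hypothetical small grid
    scoring="accuracy", cv=3, n_jobs=-1,
)
search.fit(X, y)

# One row per hyperparameter combination, with mean/std of the fold scores and the rank.
results = pd.DataFrame(search.cv_results_)
table = results[["param_criterion", "param_n_estimators",
                 "mean_test_score", "std_test_score", "rank_test_score"]]
print(table.sort_values("rank_test_score").to_string(index=False))

# With refit=True (the default), best_estimator_ is retrained on all of X with best_params_.
print("best:", search.best_params_, "refit model:", type(search.best_estimator_).__name__)
```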


## Testing the Best Model

In [5]:
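# The random forest (model_1) had the higher cross-validated accuracy, so it is the one evaluated.
# GridSearchCV refits the best configuration on the whole training set (refit=True by default),
# so predict() uses that refitted model.
y_predicted = model_1.predict(x_test)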
accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
cm = sklearn.metrics.confusion_matrix(y_test, y_predicted)
precision, recall, f1, support = sklearn.metrics.precision_recall_fscore_support(y_test, y_predicted)

print("Accuracy =", accuracy)
print("Precision =", precision)
print("Recall =", recall)
print("F1-Score =", f1)
print("Confusion Matrix:\n", cm)

Accuracy = 0.932
Precision = [0.98019802 0.95833333 0.94565217 0.89690722 0.94791667 0.89583333
 0.9375     0.91891892 0.93617021 0.89690722]
Recall = [0.99       0.97457627 0.92553191 0.89690722 0.91919192 0.86
 0.92783505 0.94444444 0.91666667 0.95604396]
F1-Score = [0.98507463 0.96638655 0.93548387 0.89690722 0.93333333 0.87755102
 0.93264249 0.93150685 0.92631579 0.92553191]
Confusion Matrix:
 [[ 99   0   0   0   0   0   0   0   1   0]
 [  0 115   0   0   0   2   0   0   1   0]
 [  0   0  87   1   1   0   1   4   0   0]
 [  1   0   2  87   0   3   1   1   1   1]
 [  0   0   0   0  91   0   0   1   1   6]
 [  1   1   0   6   1  86   3   1   0   1]
 [  0   1   1   0   0   4  90   0   1   0]
 [  0   1   1   0   2   0   0 102   1   1]
 [  0   2   1   2   0   1   1   0  88   1]
 [  0   0   0   1   1   0   0   2   0  87]]
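The per-class arrays printed above are easier to read as a single table. `classification_report` combines precision, recall, F1 and support for each class. Wrapping the confusion matrix in a labeled DataFrame makes the convention explicit: rows are true labels, columns are predictions. In the matrix above, the largest off-diagonal entries are true 4 predicted as 9 and true 5 predicted as 3 (6 each). The sketch below uses the `load_digits` stand-in and an untuned forest, both assumptions, rather than the notebook's `model_1`.

```python
# Sketch: readable per-class metrics and a labeled confusion matrix.
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data and model (not the notebook's tuned model_1).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_predicted = model.predict(X_test)

# Precision, recall, F1 and support per class, plus accuracy and macro/weighted averages.
print(classification_report(y_test, y_predicted, digits=3))

# Confusion matrix with labeled axes: rows are true digits, columns are predicted digits.
labels = model.classes_
cm = pd.DataFrame(
    confusion_matrix(y_test, y_predicted, labels=labels),
    index=pd.Index(labels, name="true"),
    columns=pd.Index(labels, name="predicted"),
)
print(cm)

# The largest off-diagonal entry is the most frequent kind of mistake.
errors = cm.to_numpy().copy()
np.fill_diagonal(errors, 0)
true_idx, pred_idx = np.unravel_index(errors.argmax(), errors.shape)
print(f"most frequent confusion: true {labels[true_idx]} predicted as {labels[pred_idx]} "
      f"({errors[true_idx, pred_idx]} times)")
```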
