<font size="6">**Project definition**</font>

This project compares the performance of individual classifiers with that of their ensemble. I use the MNIST dataset (70,000 instances), split into a training set of 60,000 instances and a test set of 10,000 instances. Three classifiers, Random Forest, Extra Trees, and a Support Vector Machine (SVM), are trained on the training set and then combined into a hard-voting ensemble. Each classifier and the ensemble are evaluated on the test set by accuracy.
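
To make the idea of hard voting concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: the function `hard_vote` and the toy predictions are hypothetical, and the tie rule (smallest label wins) is an assumption chosen to resemble the ascending-order tie-breaking that scikit-learn documents for `VotingClassifier`.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Return the majority label for each sample.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties go to the smallest label (an assumption of this sketch).
    """
    voted = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_model):
        counts = Counter(sample_votes)
        # Highest count wins; ties are broken by the smallest label.
        winner = min(counts, key=lambda label: (-counts[label], label))
        voted.append(winner)
    return voted

# Toy predictions from three hypothetical models on four samples.
rf_pred = ['3', '5', '8', '1']
svm_pred = ['3', '6', '8', '7']
et_pred = ['2', '5', '8', '4']

ensemble_pred = hard_vote([rf_pred, svm_pred, et_pred])
print(ensemble_pred)  # ['3', '5', '8', '1']
```

On the last sample all three models disagree, so the vote is decided purely by the tie rule rather than by any majority.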

<font size="5">**Import required libraries**</font>

In [45]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
import numpy as np

<font size="5">**Define score function**</font>

In [4]:
def Score(classifier, X_data, y_data):
    """Return the accuracy of a fitted classifier on (X_data, y_data)."""
    prediction = classifier.predict(X_data)
    return accuracy_score(y_data, prediction)

<font size="5">**Load the dataset and split it into training and test sets**</font>

In [7]:
# Step 1: Load the MNIST dataset (70,000 images of 784 pixels each)
mnist = fetch_openml('mnist_784', version=1)
X = mnist['data']
y = mnist['target']  # labels are strings, '0' through '9'

# Step 2: Randomly split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=60000, test_size=10000, random_state=42)

<font size="5">**Fitting classifiers and predicting on test set**</font>

In [10]:
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(random_state=42)
et_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
voting_clf = VotingClassifier(
    estimators=[('rf', rf_clf), ('svm', svm_clf), ('et', et_clf)],
    voting='hard',
)

classifiers = [rf_clf, svm_clf, et_clf, voting_clf]
scores = []
for clf in classifiers:
    print("Training the", clf)
    clf.fit(X_train, y_train)
    scores.append(Score(clf, X_test, y_test))

# print the results
print(f'Test score for RF is {scores[0]}, for SVM is {scores[1]}, and for ET is {scores[2]}')
print(f'Test score for the ensemble of all three is {scores[3]}')

Training the RandomForestClassifier(random_state=42)
Training the SVC(random_state=42)
Training the ExtraTreesClassifier(random_state=42)
Training the VotingClassifier(estimators=[('rf', RandomForestClassifier(random_state=42)),
                             ('svm', SVC(random_state=42)),
                             ('et', ExtraTreesClassifier(random_state=42))])
Test score for RF is 0.9674, for SVM is 0.9773, and for ET is 0.9682
Test score for the ensemble of all three is 0.9723
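
Because the test set has 10,000 instances, each accuracy above maps directly to a number of misclassified digits. The short sketch below does that arithmetic on the reported scores; the names `reported_scores` and `errors` are introduced here for illustration and are not part of the notebook.

```python
# Reported test accuracies, copied from the output above.
reported_scores = {'RF': 0.9674, 'SVM': 0.9773, 'ET': 0.9682, 'Ensemble': 0.9723}
test_size = 10000

# Misclassified instances = (1 - accuracy) * number of test instances.
errors = {name: round((1 - acc) * test_size) for name, acc in reported_scores.items()}
print(errors)  # {'RF': 326, 'SVM': 227, 'ET': 318, 'Ensemble': 277}
```

Read this way, the ensemble makes 50 more errors than the SVM and roughly 40 to 50 fewer than either tree-based model.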


**The SVM achieved the highest test accuracy (0.9773), ahead of Extra Trees (0.9682) and Random Forest (0.9674). The hard-voting ensemble scored 0.9723, better than both tree-based models but 0.5 percentage points below the SVM.
A plausible explanation is that Random Forest and Extra Trees are similar models that tend to make the same mistakes, so together they can outvote the SVM when it alone is correct. The ensemble still has the advantage of not depending on any single model.**
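
One way to examine why the ensemble trails the SVM is to measure how often pairs of models predict the same label. The sketch below uses a hypothetical helper, `agreement_rate`, on made-up toy predictions; in the notebook it could be applied to the outputs of `rf_clf.predict(X_test)`, `svm_clf.predict(X_test)`, and `et_clf.predict(X_test)`, which were not recorded here.

```python
def agreement_rate(pred_a, pred_b):
    """Fraction of samples on which two prediction sequences agree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction sequences must have the same length")
    matches = sum(a == b for a, b in zip(pred_a, pred_b))
    return matches / len(pred_a)

# Toy predictions (illustrative only, not results from the notebook).
rf_pred = ['3', '5', '8', '1', '0']
et_pred = ['3', '5', '8', '1', '9']
svm_pred = ['3', '6', '8', '7', '0']

print(agreement_rate(rf_pred, et_pred))   # 0.8
print(agreement_rate(rf_pred, svm_pred))  # 0.6
print(agreement_rate(et_pred, svm_pred))  # 0.4
```

If two models agree with each other far more often than with the third, a hard vote will mostly follow that pair, which limits how much the ensemble can gain from the third model.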