**In this project, we will build a voting classifier on the MNIST dataset**

In [1]:
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target


In [2]:
print("Shape of X:", X.shape)  # (70000, 784)
print("Shape of y:", y.shape)  # (70000,)

Shape of X: (70000, 784)
Shape of y: (70000,)


In [5]:
y = y.astype(int)

In [6]:
X_train, y_train = X[:50_000], y[:50_000]
X_valid, y_valid = X[50_000:60_000], y[50_000:60_000]
X_test, y_test = X[60_000:], y[60_000:]

In [9]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier


In [10]:
rnd_forest_clf = RandomForestClassifier(n_estimators=100, random_state=42)
extra_trees_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
svm_clf = LinearSVC(max_iter=100, tol=20, dual=True, random_state=42)
mlp_clf = MLPClassifier(random_state=42)

In [11]:
estimators = [rnd_forest_clf, extra_trees_clf, svm_clf, mlp_clf]
for estimator in estimators:
    print("Training the", estimator)
    estimator.fit(X_train, y_train)

Training the RandomForestClassifier(random_state=42)
Training the ExtraTreesClassifier(random_state=42)
Training the LinearSVC(dual=True, max_iter=100, random_state=42, tol=20)
Training the MLPClassifier(random_state=42)


Let's see each estimator's score on the validation set

In [12]:
[estimator.score(X_valid, y_valid) for estimator in estimators]

[0.9736, 0.9743, 0.8662, 0.9613]

The linear SVM is clearly outperformed by the other classifiers

Now let's combine them in a voting classifier and see whether it outperforms all of the individual estimators

In [13]:
from sklearn.ensemble import VotingClassifier

In [14]:
voting_clf = VotingClassifier(
    estimators=[
        ('rnd_forest_clf', rnd_forest_clf),
        ('extra_trees_clf', extra_trees_clf),
        ('svm_clf', svm_clf),
        ('mlp_clf', mlp_clf)
    ]
)
voting_clf.fit(X_train, y_train)

In [15]:
voting_clf.score(X_valid, y_valid)

0.975
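
`VotingClassifier` defaults to `voting='hard'`, which means each estimator casts one vote per sample and the most frequent class wins. The following is a minimal NumPy sketch of that idea, not the library's actual code; the function name `hard_vote` and the sample predictions are made up for illustration.

```python
import numpy as np

def hard_vote(predictions):
    """Majority vote over an int array of shape (n_estimators, n_samples)."""
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count the votes per class for each sample (one column per sample)
    counts = np.apply_along_axis(np.bincount, 0, predictions,
                                 minlength=n_classes)
    # Pick the class with the most votes; ties go to the lowest label
    return counts.argmax(axis=0)

# Hypothetical predictions from three estimators on three samples
example_predictions = [
    [3, 5, 1],
    [3, 2, 1],
    [8, 2, 7],
]
majority = hard_vote(example_predictions)
print(majority)
```

A single estimator that disagrees with the other two is simply outvoted, which is why the ensemble can tolerate a weaker member to some extent.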

Now let's remove the SVM classifier, since it appears to be hurting performance

In [16]:
voting_clf.named_estimators_

{'rnd_forest_clf': RandomForestClassifier(random_state=42),
 'extra_trees_clf': ExtraTreesClassifier(random_state=42),
 'svm_clf': LinearSVC(dual=True, max_iter=100, random_state=42, tol=20),
 'mlp_clf': MLPClassifier(random_state=42)}

In [17]:
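# Remove the trained SVM from the fitted attributes so that the
# ensemble does not have to be retrained
svm_clf_trained = voting_clf.named_estimators_.pop('svm_clf')
voting_clf.estimators_.remove(svm_clf_trained)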

In [18]:
voting_clf.named_estimators_

{'rnd_forest_clf': RandomForestClassifier(random_state=42),
 'extra_trees_clf': ExtraTreesClassifier(random_state=42),
 'mlp_clf': MLPClassifier(random_state=42)}

In [19]:
voting_clf.score(X_valid, y_valid)

0.9761

Yes: with the linear SVM the validation score was 0.975, and without it the score rises to 0.9761
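
The cells above edit the fitted attributes directly, which avoids retraining. An alternative, at the cost of a refit, is to mark the estimator as dropped with `set_params` and call `fit` again. The sketch below assumes the small bundled `load_digits` dataset and reduced forest sizes purely to keep it fast; the names `demo_clf`, `X_tr`, and `X_va` are illustrative and not part of this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Small 8x8 digits dataset, used only so the demo runs quickly
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, random_state=42)

demo_clf = VotingClassifier(
    estimators=[
        ('rnd_forest_clf',
         RandomForestClassifier(n_estimators=20, random_state=42)),
        ('extra_trees_clf',
         ExtraTreesClassifier(n_estimators=20, random_state=42)),
        ('svm_clf',
         LinearSVC(max_iter=100, tol=20, dual=True, random_state=42)),
    ]
)
demo_clf.fit(X_tr, y_tr)
n_before = len(demo_clf.estimators_)

# Mark the SVM as dropped, then refit so the fitted attributes are rebuilt
demo_clf.set_params(svm_clf='drop')
demo_clf.fit(X_tr, y_tr)
n_after = len(demo_clf.estimators_)

demo_score = demo_clf.score(X_va, y_va)
print(n_before, n_after, demo_score)
```

On the full MNIST training set the refit is expensive, which is presumably why this notebook edits the trained ensemble in place instead.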

Let's try soft voting instead of hard voting

In [20]:
# Soft voting averages predict_proba outputs, which the three
# remaining estimators all provide
voting_clf.voting = 'soft'
voting_clf.score(X_valid, y_valid)

0.9703

***No, hard voting is better***
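
Hard and soft voting can disagree on the same sample, because soft voting averages class probabilities while hard voting only counts each estimator's top choice. The toy example below uses made-up probabilities for a single sample and two classes to show one such case; it is an illustration of the two rules, not a reproduction of the scores above.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# (rows: classifiers, columns: classes 0 and 1)
probas = np.array([
    [0.55, 0.45],
    [0.55, 0.45],
    [0.05, 0.95],
])

# Hard voting: each classifier votes for its most likely class
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average the probabilities, then pick the most likely class
soft_prediction = probas.mean(axis=0).argmax()

print(hard_prediction, soft_prediction)
```

Here two lukewarm votes for class 0 win under hard voting, while one very confident estimator tips soft voting to class 1. Which rule works better in practice depends on how reliable the estimators' probabilities are, so it is worth checking both on the validation set, as done here.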

In [21]:
voting_clf.voting = 'hard'
voting_clf.score(X_test, y_test)

0.9733

In [24]:
[estimator.score(X_test, y_test)
 for estimator in voting_clf.estimators_]

[0.968, 0.9703, 0.9618]

On the test set, the voting classifier (0.9733) scores slightly higher than each of its individual estimators