In [11]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [12]:
# Step 1: Splitting the Dataset
digits = load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

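First, I imported the pieces I needed from scikit-learn: the digits data set loader, the `train_test_split` function (`from sklearn.model_selection import train_test_split`), the two classifiers, and the metric functions.

Then I loaded the digits data set and split it into training and test sets. I held out 20% of the data for testing (`test_size=0.2`), which is a common convention, leaving the remaining 80% as the training set. Setting `random_state=42` makes the split reproducible.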
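As a quick sanity check on the split described above, the sketch below prints the resulting set sizes and the number of test samples per digit. It only uses the same loader and split call as the notebook; the `np.bincount` summary is an addition for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Same data and split settings as in the notebook cell above.
digits = load_digits()
X, y = digits.data, digits.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Total samples:", X.shape[0])
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)

# Number of test samples for each digit 0-9 (illustrative check of class balance).
test_class_counts = np.bincount(y_test, minlength=10)
print("Test samples per digit:", test_class_counts)
```

If the per-digit counts look uneven, passing `stratify=y` to `train_test_split` is one option for keeping class proportions equal in both sets.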

In [13]:
# Step 2: kNN Classifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

The code first creates a k-nearest neighbors classifier object called knn. The n_neighbors parameter is set to 10 (k = 10), which means that the model takes a majority vote among the 10 nearest training samples when classifying a new sample. Then, the classifier is fitted to the training data. For kNN, fitting mainly stores the training samples (possibly in a search structure) rather than learning explicit model parameters; the distance computations happen at prediction time. Finally, the classifier makes predictions on the test data. The predictions are stored in the variable y_pred_knn.
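To make the voting idea concrete, here is a minimal from-scratch sketch of a single kNN prediction using Euclidean distance. The function `knn_predict_one` and the toy data are hypothetical and only for illustration; scikit-learn's `KNeighborsClassifier` is more elaborate and may break ties differently.

```python
from collections import Counter

import numpy as np


def knn_predict_one(X_train, y_train, x, k=10):
    """Predict the label of one sample x by majority vote of its k nearest neighbors.

    Hypothetical helper for illustration only; not scikit-learn's implementation.
    """
    # Euclidean distance from x to every training sample.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters with labels 0 and 1.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict_one(X_toy, y_toy, np.array([0.2, 0.2]), k=3)
pred_b = knn_predict_one(X_toy, y_toy, np.array([4.8, 4.9]), k=3)
print(pred_a, pred_b)
```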

In [14]:
# Evaluation metrics for kNN
accuracy_knn = accuracy_score(y_test, y_pred_knn)
precision_knn = precision_score(y_test, y_pred_knn, average='macro')
recall_knn = recall_score(y_test, y_pred_knn, average='macro')
f1_knn = f1_score(y_test, y_pred_knn, average='macro')

print("kNN Classifier Metrics:")
print("Accuracy:", accuracy_knn)
print("Precision:", precision_knn)
print("Recall:", recall_knn)
print("F1 Score:", f1_knn)

kNN Classifier Metrics:
Accuracy: 0.9833333333333333
Precision: 0.9836898803735068
Recall: 0.9840780141843972
F1 Score: 0.9835964539483891


I used accuracy, together with macro-averaged precision, recall, and F1 score, to evaluate the model's performance on the classification task. The model achieved an accuracy of 98.3%, meaning it misclassified only about 1.7% of the test samples. The macro-averaged precision, recall, and F1 score are all about 0.98 as well, which indicates that performance is similarly high when each of the ten digit classes is weighted equally. This suggests that the model picked up meaningful patterns from the training data and performs well on unseen samples from this data set.
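Since `average='macro'` is used for precision, recall, and F1, it may help to see what that averaging does. The sketch below recomputes macro precision and recall by hand from the confusion matrix and compares them with scikit-learn's values. It assumes every digit is predicted at least once in the test set, which keeps the per-class divisions well defined.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
y_pred = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
true_positives = np.diag(cm)

# Per-class precision: correct predictions / all predictions of that class.
per_class_precision = true_positives / cm.sum(axis=0)
# Per-class recall: correct predictions / all true samples of that class.
per_class_recall = true_positives / cm.sum(axis=1)

# Macro averaging is the unweighted mean over classes.
macro_precision = per_class_precision.mean()
macro_recall = per_class_recall.mean()

print("Manual macro precision:", macro_precision)
print("sklearn macro precision:", precision_score(y_test, y_pred, average='macro'))
print("Manual macro recall:", macro_recall)
print("sklearn macro recall:", recall_score(y_test, y_pred, average='macro'))
```

Because each class counts equally, a weak class would pull the macro average down even if it had few samples.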

In [15]:
# Step 3: SVM Classifier
svm = SVC(kernel='rbf', C=1.0, gamma='scale')
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)

# Evaluation metrics for SVM
accuracy_svm = accuracy_score(y_test, y_pred_svm)
precision_svm = precision_score(y_test, y_pred_svm, average='macro')
recall_svm = recall_score(y_test, y_pred_svm, average='macro')
f1_svm = f1_score(y_test, y_pred_svm, average='macro')

print("SVM Classifier Metrics:")
print("Accuracy:", accuracy_svm)
print("Precision:", precision_svm)
print("Recall:", recall_svm)
print("F1 Score:", f1_svm)

SVM Classifier Metrics:
Accuracy: 0.9861111111111112
Precision: 0.9871533861771657
Recall: 0.9865978306216103
F1 Score: 0.9868277979964809


The SVM classifier is initialized with the parameters kernel='rbf', C=1.0, and gamma='scale'. These parameters define the kernel function (here the radial basis function), the regularization parameter, and the kernel coefficient, respectively. The classifier is trained using the training data (X_train and y_train).

The trained SVM classifier is then used to predict the labels for the test data (X_test), and the predicted labels are stored in the variable y_pred_svm. To evaluate the SVM classifier, the same four metrics as for kNN are computed: accuracy and macro-averaged precision, recall, and F1 score. They are calculated by comparing the predicted labels (y_pred_svm) with the ground truth labels (y_test) and are then printed.
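The setting gamma='scale' is documented in scikit-learn as using 1 / (n_features * X.var()) of the training data as the kernel coefficient. As a small sketch of what that means here, the code below computes that value by hand and checks that an SVC given the explicit number makes the same test predictions. The variable names `gamma_scale`, `svm_scale`, and `svm_manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Documented meaning of gamma='scale': 1 / (n_features * variance of the training data).
gamma_scale = 1.0 / (X_train.shape[1] * X_train.var())
print("gamma implied by 'scale':", gamma_scale)

# One model uses the string option, the other the explicit number.
svm_scale = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)
svm_manual = SVC(kernel='rbf', C=1.0, gamma=gamma_scale).fit(X_train, y_train)

same_predictions = np.array_equal(
    svm_scale.predict(X_test), svm_manual.predict(X_test)
)
print("Same predictions:", same_predictions)
```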

In [16]:
print("Performance Comparison:")
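if accuracy_svm > accuracy_knn:
    print("SVM classifier outperforms kNN classifier.")
elif accuracy_svm < accuracy_knn:
    print("kNN classifier outperforms SVM classifier.")
else:
    # Equal accuracy: neither classifier outperforms the other.
    print("SVM and kNN classifiers have the same accuracy.")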

Performance Comparison:
SVM classifier outperforms kNN classifier.


The kNN classifier and the SVM classifier have been evaluated and compared using various performance metrics. Based on the evaluation results, the SVM classifier outperforms the kNN classifier in terms of accuracy, precision, recall, and F1 score.

The kNN classifier achieved an accuracy of 0.9833, while the SVM classifier achieved a slightly higher accuracy of 0.9861. Similarly, the precision, recall, and F1 score of the SVM classifier (0.9872, 0.9866, and 0.9868, respectively) are slightly higher than those of the kNN classifier (0.9837, 0.9841, and 0.9836, respectively). Because the test set holds 360 samples, the accuracy gap of about 0.003 corresponds to a single additional correctly classified image (355 versus 354).
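To put the accuracy gap in absolute terms, the sketch below counts how many test samples each classifier gets right, using the same settings as the cells above. The exact counts may vary slightly between scikit-learn versions, so no expected output is shown.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
svm = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)

n_test = len(y_test)
# Number of correctly classified test samples for each model.
correct_knn = int((knn.predict(X_test) == y_test).sum())
correct_svm = int((svm.predict(X_test) == y_test).sum())

# Each test sample moves accuracy by 1 / n_test.
accuracy_step = 1.0 / n_test

print(f"kNN: {correct_knn}/{n_test} correct")
print(f"SVM: {correct_svm}/{n_test} correct")
print(f"One test sample changes accuracy by {accuracy_step:.4f}")
```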

These results suggest that the SVM classifier performs slightly better than the kNN classifier on this dataset, although a gap this small on a single train/test split is not strong evidence that one model is better in general.

In terms of recommendations, the choice between the kNN classifier and the SVM classifier depends on the specific requirements and characteristics of the problem at hand:

kNN Classifier: The kNN classifier is a simple and intuitive algorithm that can perform well in scenarios where the data has clear decision boundaries and instances of the same class are clustered together. It is also suitable when there is a need for interpretability or when the dataset is relatively small. However, kNN has to compare each new sample against the stored training data at prediction time, so its prediction cost and memory use grow with the size of the training set, and it may not scale well to large datasets.

SVM Classifier: The SVM classifier is a powerful algorithm that can handle both linearly separable and non-linearly separable datasets. With a non-linear kernel such as RBF, it can capture complex relationships between features and find maximum-margin decision boundaries. SVMs are particularly effective in high-dimensional spaces. However, training a kernel SVM becomes computationally expensive as the number of samples grows, and SVMs usually require tuning of hyperparameters such as C and gamma.

In future situations, if the dataset is small, interpretable results are desired, or the data exhibits clear clusters, the kNN classifier may be a suitable choice. On the other hand, if the data contains complex relationships or requires non-linear decision boundaries, and the training set is small enough for kernel SVM training to remain practical, the SVM classifier may be more appropriate. It is important to consider the specific characteristics and requirements of the problem when selecting the appropriate model.

In [17]:
# 4 = Video

In [18]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for grid search
param_grid = {'n_neighbors': [5, 10, 15, 20]}  # Example values, you can adjust them

# Create a kNN classifier
knn = KNeighborsClassifier()

# Perform grid search
grid_search = GridSearchCV(knn, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# best parameter
best_k = grid_search.best_params_['n_neighbors']

# Fit the classifier with the best parameter to the training data
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(X_train, y_train)

# Predict the labels for the test data
y_pred_best = knn_best.predict(X_test)

# accuracy
accuracy_best = accuracy_score(y_test, y_pred_best)

print(accuracy_best)

0.9861111111111112


We defined a parameter grid for n_neighbors (5, 10, 15, and 20), created a kNN classifier, and performed a 5-fold cross-validated grid search on the training data using GridSearchCV. We then took the best value of n_neighbors, created a new kNN classifier with it, fitted it to the training data, made predictions on the test data, and calculated the accuracy. The tuned classifier reaches a test accuracy of 0.9861, slightly above the 0.9833 obtained with k = 10 and equal to the accuracy of the SVM classifier.
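The grid search steps above can also be written a little more compactly. By default GridSearchCV refits the best configuration on the whole training set (`refit=True`), so the fitted search object can predict directly. The sketch below prints the selected parameter and cross-validated score, and checks that this shortcut agrees with the manually refitted classifier; the name `y_pred_grid` is illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same grid and fold count as in the notebook cell above.
param_grid = {'n_neighbors': [5, 10, 15, 20]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

best_k = grid_search.best_params_['n_neighbors']
print("Best n_neighbors:", best_k)
print("Mean cross-validated accuracy for best k:", grid_search.best_score_)

# With the default refit=True, the search object predicts with the refitted best model.
y_pred_grid = grid_search.predict(X_test)

# Manual refit, as done in the notebook, for comparison.
knn_best = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred_best = knn_best.predict(X_test)

print("Predictions agree:", np.array_equal(y_pred_grid, y_pred_best))
```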

In [19]:
from sklearn.model_selection import cross_val_score
cross_val_scores = cross_val_score(knn, X_train, y_train, cv=5)

# Calculate the mean accuracy from cross-validation scores
mean_accuracy = cross_val_scores.mean()

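We used cross_val_score to run 5-fold cross-validation (cv=5) on the training data and calculated the mean accuracy across the folds. Note that `knn` at this point is the untuned classifier created for the grid search, which uses scikit-learn's default of n_neighbors=5, not `knn_best`. The mean accuracy is stored in mean_accuracy but is not printed.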
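As a minimal sketch of how the cross-validation result could be inspected, the code below prints the per-fold scores together with their mean and standard deviation. It uses a default `KNeighborsClassifier()`, matching what `knn` refers to in the cell above; reporting the standard deviation is an addition for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Default classifier, as in the notebook at this point (n_neighbors defaults to 5).
knn = KNeighborsClassifier()
print("n_neighbors used:", knn.get_params()['n_neighbors'])

# 5-fold cross-validation on the training data only.
cross_val_scores = cross_val_score(knn, X_train, y_train, cv=5)
mean_accuracy = cross_val_scores.mean()
std_accuracy = cross_val_scores.std()

print("Per-fold accuracy:", cross_val_scores)
print(f"Mean accuracy: {mean_accuracy:.4f} (std {std_accuracy:.4f})")
```

The spread across folds gives a rough sense of how much a difference of a few thousandths in accuracy can be attributed to the particular split.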