Implement k-fold cross-validation on the digits dataset from the sklearn library,
restricted to the digits 1 to 9. Evaluate the following learning models, compare their
performance, and determine which one performs best.
1. Logistic Regression
2. Support Vector Machine (SVM)
3. Random Forest

In [2]:
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import numpy as np

In [30]:
# Load digits dataset
data = load_digits()
X, y = data.data, data.target

In [35]:
# Keep only the samples labelled 1 to 9 (this drops the digit 0)
mask = (y >= 1) & (y <= 9)
X, y = X[mask], y[mask]
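
A quick sanity check on the filtering step can be sketched as follows. It reloads the data under different variable names (`X_all`, `y_all`, `X_sub`, `y_sub` are illustrative names, not part of the notebook) and counts the remaining samples per digit, assuming the same mask as above.

```python
import numpy as np
from sklearn.datasets import load_digits

# Reload the data so the check is self-contained
data = load_digits()
X_all, y_all = data.data, data.target

# Same boolean mask as in the cell above
mask = (y_all >= 1) & (y_all <= 9)
X_sub, y_sub = X_all[mask], y_all[mask]

# Count how many samples remain for each digit
classes, counts = np.unique(y_sub, return_counts=True)
print("Samples kept:", X_sub.shape[0], "of", X_all.shape[0])
for digit, n in zip(classes, counts):
    print(f"digit {digit}: {n} samples")
```

If the class counts are roughly equal, plain `KFold` with shuffling should give folds with a similar class mix; `StratifiedKFold` would be the option to consider if they were not.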

In [32]:
# Define the models to evaluate
models = {
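    # lbfgs already fits a multinomial model for multiclass targets; the
    # multi_class argument is deprecated since scikit-learn 1.5, so it is omitted
    "Logistic Regression": LogisticRegression(max_iter=10000, solver='lbfgs'),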
    "Support Vector Machine": SVC(kernel='linear', C=1),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42)
}

In [33]:
# Set up k-fold cross-validation
k = 5  # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Evaluate each model on the same folds
results = {}
for model_name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
    results[model_name] = {
        "Mean Accuracy": np.mean(scores),
        "Standard Deviation": np.std(scores)
    }
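
To make explicit what `cross_val_score` does for one model, the loop below is a minimal sketch of the same procedure written by hand: split, fit on the training folds, score on the held-out fold. It uses only the linear SVM to keep the run short, and the names `manual_scores`, `library_scores`, and `fold_model` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Same data preparation as in the notebook: digits 1 to 9 only
X, y = load_digits(return_X_y=True)
mask = (y >= 1) & (y <= 9)
X, y = X[mask], y[mask]

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = SVC(kernel='linear', C=1)

# Manual k-fold loop: one fresh copy of the model per fold
manual_scores = []
for train_idx, test_idx in kf.split(X):
    fold_model = clone(model)                    # unfitted copy
    fold_model.fit(X[train_idx], y[train_idx])   # train on k-1 folds
    y_pred = fold_model.predict(X[test_idx])     # predict the held-out fold
    manual_scores.append(accuracy_score(y[test_idx], y_pred))
manual_scores = np.array(manual_scores)

# The library helper, run on the same splitter for comparison
library_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

print("Manual fold accuracies: ", np.round(manual_scores, 4))
print("Library fold accuracies:", np.round(library_scores, 4))
```

Because `kf` has a fixed `random_state`, both runs see the same train/test splits, so the two arrays are expected to agree. Note also that `np.std` reports the population standard deviation (`ddof=0`) by default.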



In [28]:
# Print the results
for model_name, metrics in results.items():
    print(f"{model_name}:")
    print(f"Mean Accuracy: {metrics['Mean Accuracy']:.4f}")
    print(f"Standard Deviation: {metrics['Standard Deviation']:.4f}\n")

# Determine the best-performing model (highest mean accuracy across folds)
best_model = max(results, key=lambda name: results[name]['Mean Accuracy'])
print(f"Best-performing model: {best_model} with mean accuracy {results[best_model]['Mean Accuracy']:.4f}")

Logistic Regression:
Mean Accuracy: 0.9636
Standard Deviation: 0.0041

Support Vector Machine:
Mean Accuracy: 0.9778
Standard Deviation: 0.0050

Random Forest:
Mean Accuracy: 0.9790
Standard Deviation: 0.0094

Best-performing model: Random Forest with mean accuracy 0.9790
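
The gap between Random Forest and the SVM in the output above is smaller than either model's standard deviation, so picking a winner by mean accuracy alone is a close call. One way to look closer, sketched here under the same settings as the notebook, is to compare the two models fold by fold on identical splits. The names `svm_scores`, `rf_scores`, and `diff` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Same data preparation as in the notebook: digits 1 to 9 only
X, y = load_digits(return_X_y=True)
mask = (y >= 1) & (y <= 9)
X, y = X[mask], y[mask]

# Fixed random_state means both models are scored on identical folds
kf = KFold(n_splits=5, shuffle=True, random_state=42)

svm_scores = cross_val_score(SVC(kernel='linear', C=1), X, y,
                             cv=kf, scoring='accuracy')
rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X, y, cv=kf, scoring='accuracy')

# Per-fold difference: positive values favour Random Forest
diff = rf_scores - svm_scores
for i, d in enumerate(diff, start=1):
    print(f"Fold {i}: RF - SVM = {d:+.4f}")
print(f"Mean difference: {diff.mean():+.4f} (std {diff.std():.4f})")
```

If the sign of the difference changes from fold to fold, the two models are better described as performing comparably on this data than as one clearly beating the other.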
