Aim: Implement k-fold cross-validation on the iris flower dataset from the sklearn
library. Evaluate the following learning models with it and determine which one
performs best.
1. Logistic Regression
2. Support Vector Machine (SVM)
3. Random Forest
4. Decision Tree

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier #, DecisionTreeRegressor
import numpy as np

In [2]:
# Load the iris dataset
data = load_iris()
X, y = data.data, data.target

In [3]:
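# Keep only samples whose label is between 1 and 9. Iris labels are 0, 1, 2,
# so this drops class 0 (setosa) and keeps the 100 versicolor/virginica samples.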
mask = (y >= 1) & (y <= 9)
X, y = X[mask], y[mask]
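
The effect of the mask above can be checked by counting samples per class before and after filtering. This is a minimal sketch that reloads the data on its own; the names counts_before, counts_after, X_sub and y_sub are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# Samples per class in the full dataset (labels 0, 1, 2)
counts_before = np.bincount(y)

# Same mask as in the cell above
mask = (y >= 1) & (y <= 9)
X_sub, y_sub = X[mask], y[mask]

# minlength=3 keeps a slot for class 0 even though it is now empty
counts_after = np.bincount(y_sub, minlength=3)

print("Classes:    ", list(data.target_names))
print("Before mask:", counts_before)
print("After mask: ", counts_after)
```

The mask removes all 50 setosa samples, so the models further down are trained and scored on a two-class problem with 100 samples.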

In [5]:
# Define the models to evaluate
models = {
    "Logistic Regression": LogisticRegression(max_iter=10000, solver='lbfgs', multi_class='multinomial'),
    "Support Vector Machine": SVC(kernel='linear', C=1),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree Classifier": DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42),
    # "Decision Tree Regressor": DecisionTreeRegressor(criterion="squared_error",max_depth=3 ,random_state=42)
}

In [6]:
# Set up k-fold cross-validation
k = 5 # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)
# Evaluate each model
results = {}
for model_name, model in models.items():
    scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
    results[model_name] = {
        "Mean Accuracy": np.mean(scores),
        "Standard Deviation": np.std(scores)
    }
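
To make explicit what cross_val_score does with a KFold splitter, the loop below performs the same steps by hand for one model: split, fit a fresh copy on the training indices, score on the held-out indices. It is a sketch under stated assumptions: it uses the full iris data (without the mask from cell 3) and only the linear SVC, and manual_scores and auto_scores are names introduced here.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Assumption: full iris dataset, no class filtering
X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = SVC(kernel='linear', C=1)

manual_scores = []
for train_idx, test_idx in kf.split(X):
    # Fit an unfitted copy so no state leaks between folds
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    # For classifiers, score() returns accuracy on the held-out fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

# The library helper, using the same splitter
auto_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

print("Manual loop:    ", manual_scores)
print("cross_val_score:", auto_scores)
```

Because the splitter has a fixed random_state, both approaches see identical folds and should report the same per-fold accuracies.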



In [8]:
# Print the results
for model_name, metrics in results.items():
    print(f"{model_name}:")
    print(f"Mean Accuracy: {metrics['Mean Accuracy']:.4f}")
    print(f"Standard Deviation: {metrics['Standard Deviation']:.4f}\n")
# Determine the best-performing model
best_model = max(results, key=lambda x: results[x]['Mean Accuracy'])
print(f"Best-performing model: {best_model} with mean accuracy {results[best_model]['Mean Accuracy']:.4f}")

Logistic Regression:
Mean Accuracy: 0.9600
Standard Deviation: 0.0200

Support Vector Machine:
Mean Accuracy: 0.9800
Standard Deviation: 0.0400

Random Forest:
Mean Accuracy: 0.9300
Standard Deviation: 0.0245

Decision Tree Classifier:
Mean Accuracy: 0.9400
Standard Deviation: 0.0200

Best-performing model: Support Vector Machine with mean accuracy 0.9800
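
When reading the standard deviations above, note that np.std defaults to the population formula (ddof=0). With only five folds, the sample formula (ddof=1) gives a noticeably larger spread. The sketch below illustrates this on hypothetical per-fold accuracies; the values in example_scores are made up for illustration and are not the fold scores from this run.

```python
import numpy as np

# Hypothetical per-fold accuracies (illustrative only, not from the run above)
example_scores = np.array([0.95, 1.00, 0.95, 0.95, 0.95])

mean_acc = np.mean(example_scores)
std_population = np.std(example_scores)        # ddof=0, as used in the notebook
std_sample = np.std(example_scores, ddof=1)    # divides by n - 1 instead of n

print(f"Mean accuracy:        {mean_acc:.4f}")
print(f"Std (ddof=0):         {std_population:.4f}")
print(f"Std (ddof=1):         {std_sample:.4f}")
```

Either convention is defensible; what matters is stating which one is reported, since gaps between models that are smaller than the fold-to-fold spread may not be meaningful.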
