In [1]:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Load dataset
digits = load_digits()
X = digits.data
y = digits.target

# Define model
model = LogisticRegression(max_iter=1000)

# Apply 5-fold cross-validation (an integer cv with a classifier uses stratified folds)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

# Print results
print("Cross-Validation Accuracy Scores:", scores)
print("Mean Accuracy:", np.mean(scores))

# Optional: Display results as DataFrame (fold numbers follow the number of scores)
df = pd.DataFrame({'Fold': range(1, len(scores) + 1), 'Accuracy': scores})
print(df)


Cross-Validation Accuracy Scores: [0.92222222 0.87222222 0.94150418 0.94150418 0.89693593]
Mean Accuracy: 0.9148777468276075
   Fold  Accuracy
0     1  0.922222
1     2  0.872222
2     3  0.941504
3     4  0.941504
4     5  0.896936


### Topic 20 – Cross-Validation

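In this notebook, we apply **K-Fold Cross-Validation** (with `k=5`) to estimate the classification accuracy of a logistic regression model on the `digits` dataset. Because the estimator is a classifier and `cv` is an integer, `cross_val_score()` uses stratified folds, so each fold roughly preserves the class proportions.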

Steps:
1. Load the `digits` dataset from `sklearn.datasets` (1,797 samples with 64 features each).
2. Define a `LogisticRegression` model with `max_iter=1000`.
3. Call `cross_val_score()` with `cv=5` and `scoring='accuracy'` to get one accuracy score per fold.
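
To make the steps above more concrete, here is a minimal sketch of what `cross_val_score()` does internally for this setup, written as an explicit loop. It assumes the default splitter for a classifier with an integer `cv`, which is `StratifiedKFold` without shuffling; the names `skf`, `fold_model`, and `manual_scores` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Assumed splitter: stratified, unshuffled 5-fold split
skf = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh copy of the model on the 4 training folds
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    # Accuracy on the held-out fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

manual_scores = np.array(manual_scores)
print("Manual fold accuracies:", manual_scores)
```

Each sample lands in the held-out fold exactly once, so the five scores together cover the whole dataset.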

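Cross-validation helps by:
- Providing a more reliable performance estimate than a single hold-out split, because the score is averaged over 5 different test sets,
- Reducing the risk of judging (or tuning) the model on one unusually easy or hard train/test split,
- Making better use of available data, because every sample is used for testing exactly once and for training in the other folds.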
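
As a rough illustration of the second point, the sketch below scores the same model on several single hold-out splits that differ only in their random seed. The 80/20 split, the five seeds, and the variable names are assumptions chosen for the example; the exact numbers will depend on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

split_scores = []
for seed in range(5):
    # One hold-out split per seed (assumed 80/20, stratified by class)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    split_scores.append(clf.score(X_test, y_test))

split_scores = np.array(split_scores)
print("Single-split accuracies:", split_scores)
print("Spread (max - min):", split_scores.max() - split_scores.min())
```

Any one of these scores could have been reported as "the" accuracy, which is why averaging over folds gives a steadier estimate.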

We reported:
- The accuracy for each fold, which ranges from about 0.872 to 0.942,
- The mean accuracy across all folds, about 0.915.
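
A common companion to the mean is the standard deviation across folds, which shows how much the estimate varies from fold to fold. The sketch below reuses the fold scores printed above as a hard-coded array; using the sample standard deviation (`ddof=1`) is an assumption made here, not something the notebook specifies.

```python
import numpy as np

# Fold accuracies copied from the notebook output above
scores = np.array([0.92222222, 0.87222222, 0.94150418, 0.94150418, 0.89693593])

mean_acc = scores.mean()
# Sample standard deviation across the 5 folds (assumed choice: ddof=1)
std_acc = scores.std(ddof=1)

print(f"Accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```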
