# ***Performing K-fold Cross Validation on IRIS Dataset***

### *Importing the Model and libraries for the project*

```python
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import numpy as np
```

### *Loading the IRIS Dataset and splitting into Features (X) and Target Labels (y)*

```python
data = load_iris()
X, y = data.data, data.target
```
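
As a quick sanity check on what `load_iris()` returns, the short sketch below prints the array shapes and the number of samples per class. It is an illustrative addition, not a cell from the original run.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# One row per flower, one column per measurement
print("X shape:", X.shape)                # (150, 4)
print("y shape:", y.shape)                # (150,)
print("feature names:", data.feature_names)
print("class names:", data.target_names)
# Count the samples carrying each class label (0, 1, 2)
print("samples per class:", np.bincount(y))  # [50 50 50]
```

The dataset is balanced, with 50 samples for each of the three species, which is why plain accuracy is a reasonable metric here.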

### *Defining K-Fold Cross Validation (K=5)*

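*   n_splits=5: The dataset is split into 5 parts (folds); each fold is used once as the test set while the remaining 4 folds are used for training
*   shuffle=True: Randomly shuffles the data before splitting (important here, because the IRIS samples are stored sorted by class)
*   random_state=42: Fixes the shuffling seed so the same folds are produced on every run, ensuring reproducibility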
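
To see what these parameters produce, this small sketch (an illustrative addition; the names `train_sizes`, `test_sizes` and `seen_in_test` are our own) prints the size of each split and checks that every sample ends up in exactly one test fold. With 150 samples and 5 folds, each fold holds out 150 / 5 = 30 samples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

train_sizes, test_sizes, seen_in_test = [], [], []
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    # Each fold trains on 4/5 of the data and tests on the remaining 1/5
    print(f"Fold {fold}: train={len(train_index)}, test={len(test_index)}")  # train=120, test=30
    train_sizes.append(len(train_index))
    test_sizes.append(len(test_index))
    seen_in_test.extend(test_index.tolist())

# Every sample index appears in exactly one test fold
print(sorted(seen_in_test) == list(range(len(X))))  # True
```

Because `random_state` is fixed, calling `kf.split(X)` again yields the same folds, so the results below can be reproduced exactly.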

```python
kf = KFold(n_splits=5, shuffle=True, random_state=42)
```

### *Logistic Regression Model*

```python
model = LogisticRegression()
```

### *Perform K-Fold Cross Validation*

```python
accuracies = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]  # Splitting data into training and testing
    y_train, y_test = y[train_index], y[test_index]  # Splitting target labels

    model.fit(X_train, y_train)  # Train the model on training data
    predictions = model.predict(X_test)  # Make predictions on test data

    acc = accuracy_score(y_test, predictions)  # Calculate accuracy for the current fold
    accuracies.append(acc)  # Store the accuracy
```
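
scikit-learn also offers `cross_val_score`, which runs the same fit, predict and score loop for us. The sketch below assumes that passing the same `kf` object as `cv` yields the same folds, so the two approaches should agree. For a classifier, the default scoring is the estimator's `score` method, which computes mean accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Manual loop, as in the cell above
model = LogisticRegression()
manual = []
for train_index, test_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    manual.append(accuracy_score(y[test_index], model.predict(X[test_index])))

# Same folds, handled by scikit-learn (a fresh clone of the model is fitted per fold)
scores = cross_val_score(LogisticRegression(), X, y, cv=kf)

print("manual:         ", np.round(manual, 4))
print("cross_val_score:", np.round(scores, 4))
print("identical:", np.allclose(manual, scores))
```

The manual loop makes each step explicit. `cross_val_score` is shorter and less error-prone once the procedure is understood.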

### *Computing Accuracy & Standard Deviation*

```python
average_accuracy = np.mean(accuracies)
std_deviation = np.std(accuracies)  # population standard deviation (ddof=0, NumPy's default)
```
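
The two statistics can be worked out by hand from the per-fold accuracies printed in the Results section. `np.std` divides by the number of folds k (`ddof=0`, the population standard deviation). With only five folds, some practitioners prefer the sample standard deviation, which divides by k - 1 (`ddof=1`) and gives a slightly larger spread.

```python
import numpy as np

# Per-fold accuracies as reported in the Results section of this notebook
accuracies = [1.0, 1.0, 0.9333333333333333, 0.9666666666666667, 0.9666666666666667]

k = len(accuracies)
mean = sum(accuracies) / k
squared_dev = sum((a - mean) ** 2 for a in accuracies)

pop_std = (squared_dev / k) ** 0.5           # same as np.std(accuracies)
sample_std = (squared_dev / (k - 1)) ** 0.5  # same as np.std(accuracies, ddof=1)

print(f"mean         = {mean:.4f}")        # 0.9733
print(f"std (ddof=0) = {pop_std:.4f}")     # 0.0249
print(f"std (ddof=1) = {sample_std:.4f}")  # 0.0279
print(np.isclose(pop_std, np.std(accuracies)),
      np.isclose(sample_std, np.std(accuracies, ddof=1)))  # True True
```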

### *Results*

```python
print(f"Accuracy scores for each fold: {accuracies}\n")
print(f"Average Accuracy: {average_accuracy:.4f}\n")
print(f"Standard Deviation: {std_deviation:.4f}\n")
```

```
Accuracy scores for each fold: [1.0, 1.0, 0.9333333333333333, 0.9666666666666667, 0.9666666666666667]

Average Accuracy: 0.9733

Standard Deviation: 0.0249
```



<div style="text-align: justify;">
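<i><b>Using 5-fold cross validation on the IRIS dataset, logistic regression reaches an average accuracy of about 97.3% across the folds. The standard deviation of the fold accuracies is about 0.025 (roughly 2.5 percentage points), so performance stays consistent from one train/test split to the next. High accuracy combined with this low spread suggests that the model has both low bias and low variance on this dataset, which is the desired state for our model.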
</b></i></div>
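
The bias/variance reading above can be checked a little more directly by comparing training and test accuracy within each fold. High accuracy on both suggests low bias, and a small train-test gap suggests the model is not overfitting. The sketch below uses scikit-learn's `cross_validate` with `return_train_score=True`. It is an addition to the original analysis, and because training accuracies were not recorded in this notebook, no output is shown here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# return_train_score=True also scores each fold's model on its own training data
results = cross_validate(LogisticRegression(), X, y, cv=kf, return_train_score=True)

train_scores = results["train_score"]
test_scores = results["test_score"]
gap = train_scores - test_scores  # a large positive gap would point to overfitting (high variance)

print("train accuracy per fold:", np.round(train_scores, 4))
print("test accuracy per fold: ", np.round(test_scores, 4))
print(f"mean train = {train_scores.mean():.4f}, "
      f"mean test = {test_scores.mean():.4f}, mean gap = {gap.mean():.4f}")
```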