Implement K-Fold Cross-Validation
Implement a function that generates train and test splits for K-Fold Cross-Validation. The function should divide the dataset's indices into k folds and return a list of (train indices, test indices) pairs, one per fold.

Example:
Input:
k_fold_cross_validation(np.array([0,1,2,3,4,5,6,7,8,9]), np.array([0,1,2,3,4,5,6,7,8,9]), k=5, shuffle=False)
Output:
[([2, 3, 4, 5, 6, 7, 8, 9], [0, 1]), ([0, 1, 4, 5, 6, 7, 8, 9], [2, 3]), ([0, 1, 2, 3, 6, 7, 8, 9], [4, 5]), ([0, 1, 2, 3, 4, 5, 8, 9], [6, 7]), ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])]
Reasoning:
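With shuffle=False, the 10 indices are split in order into 5 contiguous folds of 2 indices each. Each fold serves as the test set exactly once, and the remaining 8 indices form the training set for that iteration.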

	•	n_samples // k: the base size of each fold.
	•	n_samples % k: the number of leftover samples when the dataset size is not evenly divisible by k. The first n_samples % k folds each receive one extra sample.
	•	Example (n_samples = 12, k = 5):
		•	Base fold size: 12 // 5 = 2
		•	Extra samples: 12 % 5 = 2
		•	Fold sizes: [3, 3, 2, 2, 2]
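The fold-size rule above can be sketched in a few lines of plain Python. The helper name compute_fold_sizes is hypothetical and used only for illustration; it mirrors the arithmetic in the bullets rather than any library function.

```python
def compute_fold_sizes(n_samples: int, k: int) -> list[int]:
    """Hypothetical helper: sizes of k folds that together cover n_samples items."""
    base, extra = divmod(n_samples, k)  # base = n_samples // k, extra = n_samples % k
    # The first `extra` folds receive one additional sample each.
    return [base + 1 if i < extra else base for i in range(k)]


print(compute_fold_sizes(12, 5))  # [3, 3, 2, 2, 2]
print(compute_fold_sizes(10, 5))  # [2, 2, 2, 2, 2]
```

Whatever the inputs, the sizes always sum to n_samples, so every sample lands in exactly one fold.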

The function np.full() creates a NumPy array of a given shape filled with a specified value. The implementation below uses it to initialize every fold size to n_samples // k.
np.full(shape, fill_value, dtype=None)

arr = np.full((3, 4), 99)  # 3 rows, 4 columns, all values 99
print(arr)

In [None]:
import numpy as np

def k_fold_cross_validation(X: np.ndarray, y: np.ndarray, k=5, shuffle=True, random_seed=None):
    """
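    Return train and test indices for k-fold cross-validation.

    Only len(X) is used; y is accepted for interface symmetry and is not read.
    Returns a list of (train_indices, test_indices) tuples, one per fold.
    """
    n_samples = len(X)
    # k = 1 would leave no training folds to concatenate, and k > n_samples
    # would produce empty test folds, so reject both up front.
    if not 2 <= k <= n_samples:
        raise ValueError(f"k must be between 2 and n_samples={n_samples}, got {k}")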
    indices = np.arange(n_samples)

    if shuffle:
        if random_seed is not None:
            np.random.seed(random_seed)
        np.random.shuffle(indices)

    fold_sizes = np.full(k, n_samples // k, dtype=int)
    fold_sizes[:n_samples % k] += 1

    # Split Indices into Folds
    current = 0
    folds = []
    for fold_size in fold_sizes:
        folds.append(indices[current:current + fold_size])
        current += fold_size

    # Generate train-test splits.
    # For each fold i:
    #   - test_idx: the i-th fold is used as the test set.
    #   - train_idx: all remaining folds are concatenated to form the training set.
    # Each result is stored as a tuple (train indices, test indices).
    result = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i+1:])
        result.append((train_idx.tolist(), test_idx.tolist()))

    return result

In [2]:
k_fold_cross_validation(np.array([0,1,2,3,4,5,6,7,8,9]), np.array([0,1,2,3,4,5,6,7,8,9]), k=5, shuffle=False)

[([2, 3, 4, 5, 6, 7, 8, 9], [0, 1]),
 ([0, 1, 4, 5, 6, 7, 8, 9], [2, 3]),
 ([0, 1, 2, 3, 6, 7, 8, 9], [4, 5]),
 ([0, 1, 2, 3, 4, 5, 8, 9], [6, 7]),
 ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9])]
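One way to sanity-check output like the above is to confirm that train and test never overlap and that every index is tested exactly once. The sketch below assumes the list-of-tuples format shown above; check_splits is a hypothetical helper name, not part of the original solution.

```python
def check_splits(splits, n_samples):
    """Hypothetical helper: return True if splits form a valid k-fold partition."""
    all_indices = list(range(n_samples))
    seen_in_test = []
    for train_idx, test_idx in splits:
        # Train and test must not share any index.
        if set(train_idx) & set(test_idx):
            return False
        # Together they must cover every sample exactly once.
        if sorted(train_idx + test_idx) != all_indices:
            return False
        seen_in_test.extend(test_idx)
    # Across all folds, each index appears in a test set exactly once.
    return sorted(seen_in_test) == all_indices


# Splits copied from the k=5, shuffle=False output above.
splits = [
    ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1]),
    ([0, 1, 4, 5, 6, 7, 8, 9], [2, 3]),
    ([0, 1, 2, 3, 6, 7, 8, 9], [4, 5]),
    ([0, 1, 2, 3, 4, 5, 8, 9], [6, 7]),
    ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9]),
]
print(check_splits(splits, 10))  # True
```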

In [3]:
import random

def k_fold_cross_validation(X: list, y: list, k=5, shuffle=True, random_seed=None):
    """
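    Generate k-fold cross-validation splits without NumPy.
    Only len(X) is used; y is accepted for interface symmetry and is not read.
    Returns a list of (train_indices, test_indices) tuples, one per fold.
    """
    n_samples = len(X)
    # Reject k values that would yield an empty training set or empty test folds.
    if not 2 <= k <= n_samples:
        raise ValueError(f"k must be between 2 and n_samples={n_samples}, got {k}")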
    indices = list(range(n_samples))  # Create a list of indices

    if shuffle:
        if random_seed is not None:
            random.seed(random_seed)  # Set seed for reproducibility
        random.shuffle(indices)  # Shuffle the indices

    # Determine the fold sizes
    fold_sizes = [n_samples // k] * k
    for i in range(n_samples % k):  # Distribute remaining samples
        fold_sizes[i] += 1

    # Split indices into folds
    folds = []
    current = 0
    for fold_size in fold_sizes:
        folds.append(indices[current:current + fold_size])
        current += fold_size

    # Generate train-test splits
    result = []
    for i in range(k):
        test_idx = folds[i]
        # Flatten every fold except the i-th into one training list
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        result.append((train_idx, test_idx))

    return result
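Calling random.seed() as above resets Python's global generator, which can affect other code that relies on it. A possible alternative, sketched here under the assumption that only the shuffle step needs randomness, is a private random.Random instance. The function name shuffled_indices is hypothetical.

```python
import random


def shuffled_indices(n_samples, random_seed=None):
    """Hypothetical helper: shuffle indices without touching the global RNG."""
    rng = random.Random(random_seed)  # private generator; not seeded explicitly when None
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices


a = shuffled_indices(6, random_seed=42)
b = shuffled_indices(6, random_seed=42)
print(a == b)                       # True: same seed, same order
print(sorted(a) == list(range(6)))  # True: still a permutation of 0..5
```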

In [4]:
X = ["sample1", "sample2", "sample3", "sample4", "sample5", "sample6"]
y = [0, 1, 0, 1, 0, 1]

folds = k_fold_cross_validation(X, y, k=3, shuffle=True, random_seed=42)

for i, (train_idx, test_idx) in enumerate(folds):
    print(f"Fold {i+1}:")
    print(f"  Train indices: {train_idx}")
    print(f"  Test indices: {test_idx}")

Fold 1:
  Train indices: [2, 4, 0, 5]
  Test indices: [3, 1]
Fold 2:
  Train indices: [3, 1, 0, 5]
  Test indices: [2, 4]
Fold 3:
  Train indices: [3, 1, 2, 4]
  Test indices: [0, 5]
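The returned indices still have to be mapped back to the data before a model can be trained and evaluated. A minimal sketch, using the Fold 1 indices printed above with the same X and y; the variable names X_train, y_train, X_test, and y_test are illustrative only.

```python
X = ["sample1", "sample2", "sample3", "sample4", "sample5", "sample6"]
y = [0, 1, 0, 1, 0, 1]

# Fold 1 from the output above.
train_idx, test_idx = [2, 4, 0, 5], [3, 1]

# Select the rows and labels that belong to each side of the split.
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

print(X_train, y_train)  # ['sample3', 'sample5', 'sample1', 'sample6'] [0, 0, 0, 1]
print(X_test, y_test)    # ['sample4', 'sample2'] [1, 1]
```

With NumPy arrays, the same selection can be written directly as X[train_idx] and y[train_idx].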
