# **Problem Statement**  
## **9. Implement k-fold cross-validation using only NumPy.**

Write a function that produces k-fold cross-validation splits from scratch, with NumPy as the only dependency.

Given a dataset (X, y):
- Split it into k folds of approximately equal size
- For each iteration:
    - Use one fold as validation data
    - Use remaining folds as training data
- Return train/validation splits for each fold

Do not use `sklearn.model_selection`.

### Constraints & Example Inputs/Outputs

### Constraints
- Use NumPy only
- Dataset size n ≥ k
- Folds should be approximately equal in size
- Random shuffling should be optional
- Should work for any model (model-agnostic)
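
To make "approximately equal" concrete, here is a minimal sketch of one common convention (an assumption, not something the problem statement mandates): every fold gets `n // k` samples and the `n % k` leftover samples go to the first folds, so sizes differ by at most one. The helper name `fold_sizes` is hypothetical.

```python
import numpy as np

def fold_sizes(n, k):
    """Return k fold sizes that sum to n and differ by at most one."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    base, extra = divmod(n, k)
    sizes = np.full(k, base, dtype=int)
    sizes[:extra] += 1  # the first n % k folds take one extra sample
    return sizes

print(fold_sizes(11, 4))  # [3 3 3 2]
print(fold_sizes(5, 5))   # [1 1 1 1 1]
```

This is the same sizing rule that `np.array_split` applies when the length is not divisible by `k`.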

### Example Input:
```python
X = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
k = 5
```

### Expected Output:

Each fold returns:
- X_train, y_train
- X_val, y_val

Example (Fold 1, with shuffling disabled):
```python
Train: [2,3,4,5]
Val:   [1]
```

### Solution Approach

### k-Fold Cross-Validation Logic

1. Shuffle dataset (optional)
2. Split indices into k parts of approximately equal size
3. For each fold:
    - Select one part as validation
    - Remaining parts as training
4. Yield or store train/validation splits
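
The four steps above can be sketched as a generator that works on indices only. This is a minimal illustration, not the reference solution: the name `k_fold_indices` and its `seed` parameter are hypothetical, and it assumes `2 <= k <= n`.

```python
import numpy as np

def k_fold_indices(n, k, shuffle=False, seed=None):
    """Yield (train_idx, val_idx) index arrays for each of the k folds.

    Assumes 2 <= k <= n.
    """
    indices = np.arange(n)
    if shuffle:
        # Step 1: optional shuffle with a local, seedable generator
        np.random.default_rng(seed).shuffle(indices)
    # Step 2: split the indices into k nearly equal parts
    parts = np.array_split(indices, k)
    for i in range(k):
        # Step 3: part i validates, the remaining parts train
        val_idx = parts[i]
        train_idx = np.concatenate(parts[:i] + parts[i + 1:])
        # Step 4: yield each split instead of storing all of them
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(5, 5):
    print(train_idx, val_idx)  # first line printed: [1 2 3 4] [0]
```

Yielding index arrays leaves it to the caller to slice `X` and `y`, so only one split needs to be in memory at a time.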


### Why k-Fold?
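- Helps detect overfitting: each sample is scored by a model that did not see it during training
- Uses the full dataset efficiently: every sample serves as validation data exactly once and as training data k − 1 times
- Gives a more reliable performance estimate: averaging over k folds is less sensitive to one lucky or unlucky split than a single hold-out set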
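
To illustrate the averaging idea, the sketch below scores two toy models with the mean validation error over k folds. Everything named here (`cross_val_mse`, `fit_line`, `predict_line`, `fit_mean`, `predict_mean`) is hypothetical and chosen for illustration. The only assumption about a model is that it can be expressed as a `fit(X, y)` callable returning some state and a `predict(state, X)` callable, which keeps the splitter model-agnostic.

```python
import numpy as np

def cross_val_mse(fit, predict, X, y, k):
    """Average validation MSE over k unshuffled folds for any fit/predict pair."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    parts = np.array_split(np.arange(len(X)), k)
    scores = []
    for i in range(k):
        val_idx = parts[i]
        train_idx = np.concatenate(parts[:i] + parts[i + 1:])
        state = fit(X[train_idx], y[train_idx])
        pred = predict(state, X[val_idx])
        scores.append(float(np.mean((pred - y[val_idx]) ** 2)))
    return float(np.mean(scores)), scores

# Toy model 1: least-squares line y = a * x + b (1-D X only)
def fit_line(X, y):
    A = np.column_stack([X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_line(coef, X):
    return coef[0] * X + coef[1]

# Toy model 2: always predict the training mean
def fit_mean(X, y):
    return y.mean()

def predict_mean(mean, X):
    return np.full(len(X), mean)

X = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]
line_mse, _ = cross_val_mse(fit_line, predict_line, X, y, k=5)
mean_mse, _ = cross_val_mse(fit_mean, predict_mean, X, y, k=5)
print(line_mse)  # close to 0: the data lie exactly on y = 10 * x
print(mean_mse)  # 312.5
```

The same splitting loop ranks both models without knowing anything about how they work.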

### Solution Code

In [1]:
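# Approach 1: Brute Force k-Fold (Explicit Loops)
import numpy as np

def k_fold_bruteforce(X, y, k, shuffle=True, random_state=None):
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(X)

    if len(y) != n:
        raise ValueError("X and y must have the same number of samples")
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")

    indices = np.arange(n)

    if shuffle:
        # A local generator keeps the shuffle reproducible without
        # reseeding NumPy's global random state.
        rng = np.random.RandomState(random_state)
        rng.shuffle(indices)

    # Spread the n % k leftover samples over the first folds so that
    # fold sizes differ by at most one.
    fold_size, remainder = divmod(n, k)
    folds = []
    start = 0

    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)

        val_idx = indices[start:end]
        train_idx = np.concatenate((indices[:start], indices[end:]))

        folds.append((
            X[train_idx], y[train_idx],
            X[val_idx], y[val_idx]
        ))
        start = end

    return folds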


### Alternative Solution

In [4]:
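# Approach 2: Optimized k-Fold (Using NumPy Split)
def k_fold_optimized(X, y, k, shuffle=True, random_state=None):
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(X)

    if len(y) != n:
        raise ValueError("X and y must have the same number of samples")
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")

    indices = np.arange(n)

    if shuffle:
        # A local generator keeps the shuffle reproducible without
        # reseeding NumPy's global random state.
        rng = np.random.RandomState(random_state)
        rng.shuffle(indices)

    # array_split handles n % k != 0: the first n % k folds get one extra sample.
    folds = np.array_split(indices, k)
    results = []

    for i in range(k):
        val_idx = folds[i]
        # Join every fold except the i-th into the training indices.
        train_idx = np.hstack(folds[:i] + folds[i+1:])

        results.append((
            X[train_idx], y[train_idx],
            X[val_idx], y[val_idx]
        ))

    return results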


### Alternative Approaches

- Leave-One-Out Cross-Validation (LOOCV)
- Stratified k-Fold (class-balanced)
- Time-Series Cross-Validation
- sklearn KFold (not allowed here)
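
Of these variants, LOOCV needs no new code: it is plain k-fold with k = n. Stratified k-fold does need a different assignment rule, and the following is a rough sketch of one way to do it with NumPy only. The function name `stratified_k_fold_indices` and the round-robin dealing strategy are assumptions made for illustration, not the scheme used by any particular library.

```python
import numpy as np

def stratified_k_fold_indices(y, k, seed=None):
    """Return a list of (train_idx, val_idx) that keeps class ratios per fold."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        rng.shuffle(members)
        # Deal this class's samples round-robin across the k folds
        fold_of[members] = np.arange(len(members)) % k
    all_idx = np.arange(len(y))
    return [(all_idx[fold_of != i], all_idx[fold_of == i]) for i in range(k)]

y = np.array([0] * 8 + [1] * 4)
for train_idx, val_idx in stratified_k_fold_indices(y, k=4, seed=0):
    print(np.bincount(y[val_idx], minlength=2))  # [2 1] for every fold
```

Because the dealing always starts at fold 0, datasets with many small classes would give the first folds slightly more samples; a fuller implementation would rotate the starting fold per class.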

### Test Case

In [2]:
# Test Case 1: Simple Sequential Data

X = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]

folds = k_fold_bruteforce(X, y, k=5, shuffle=False)

for i, (X_tr, y_tr, X_val, y_val) in enumerate(folds):
    print(f"Fold {i+1}")
    print("Train X:", X_tr, "Train y:", y_tr)
    print("Val X:", X_val, "Val y:", y_val)
    print("-" * 30)


Fold 1
Train X: [2 3 4 5] Train y: [20 30 40 50]
Val X: [1] Val y: [10]
------------------------------
Fold 2
Train X: [1 3 4 5] Train y: [10 30 40 50]
Val X: [2] Val y: [20]
------------------------------
Fold 3
Train X: [1 2 4 5] Train y: [10 20 40 50]
Val X: [3] Val y: [30]
------------------------------
Fold 4
Train X: [1 2 3 5] Train y: [10 20 30 50]
Val X: [4] Val y: [40]
------------------------------
Fold 5
Train X: [1 2 3 4] Train y: [10 20 30 40]
Val X: [5] Val y: [50]
------------------------------


In [5]:
# Test Case 2: Optimized Version with Shuffle

folds = k_fold_optimized(X, y, k=3, shuffle=True, random_state=42)

for i, (X_tr, y_tr, X_val, y_val) in enumerate(folds):
    print(f"Fold {i+1}")
    print("Train:", X_tr, y_tr)
    print("Val:", X_val, y_val)
    print("-" * 30)



Fold 1
Train: [3 1 4] [30 10 40]
Val: [2 5] [20 50]
------------------------------
Fold 2
Train: [2 5 4] [20 50 40]
Val: [3 1] [30 10]
------------------------------
Fold 3
Train: [2 5 3 1] [20 50 30 10]
Val: [4] [40]
------------------------------


In [6]:
# Test Case 3: Larger Dataset

X = np.arange(20)
y = X * 2

folds = k_fold_optimized(X, y, k=4)

for i, (_, _, X_val, _) in enumerate(folds):
    print(f"Fold {i+1} validation size:", len(X_val))


Fold 1 validation size: 5
Fold 2 validation size: 5
Fold 3 validation size: 5
Fold 4 validation size: 5
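
Printed fold sizes are easy to misread, so a property check can complement the test cases above. The sketch below uses a hypothetical helper, `check_folds`, that accepts any list of `(X_train, y_train, X_val, y_val)` tuples. To stay self-contained it is run against a stand-in splitter, `simple_splits`, rather than the functions defined earlier, and it assumes the values in `X` are unique so that overlap can be detected by value.

```python
import numpy as np

def check_folds(folds, n):
    """Raise AssertionError if the folds are unbalanced, incomplete, or overlapping."""
    val_sizes = [len(X_val) for _, _, X_val, _ in folds]
    assert sum(val_sizes) == n                       # validation sizes add up to n
    assert max(val_sizes) - min(val_sizes) <= 1      # fold sizes differ by at most one
    for X_tr, y_tr, X_val, y_val in folds:
        assert len(X_tr) + len(X_val) == n           # train + val cover the dataset
        assert len(X_tr) == len(y_tr)
        assert len(X_val) == len(y_val)
        assert np.intersect1d(X_tr, X_val).size == 0  # no sample in both sets

# Stand-in splitter (unshuffled), used only to exercise the checker
def simple_splits(X, y, k):
    X, y = np.asarray(X), np.asarray(y)
    parts = np.array_split(np.arange(len(X)), k)
    out = []
    for i in range(k):
        tr = np.concatenate(parts[:i] + parts[i + 1:])
        out.append((X[tr], y[tr], X[parts[i]], y[parts[i]]))
    return out

X_check = np.arange(11)  # 11 is not divisible by 4, so folds cannot all be equal
check_folds(simple_splits(X_check, X_check * 2, k=4), n=len(X_check))
print("all checks passed")
```

Choosing an `n` that is not a multiple of `k` exercises the remainder handling, which the evenly divisible test cases do not.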


## Complexity Analysis

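### Time Complexity
- Shuffling and splitting the indices: O(n)
- Building all k splits: O(k × n), because each fold copies roughly n samples into its train/validation arrays (multiply by the number of features for 2-D X)

### Space Complexity

- O(n) for the index array
- O(k × n) for the returned list, since both functions keep all k splits in memory at once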
