In [None]:
# Aim: To perform different types of cross-validation

In [None]:
#Theory
'''
Analogy:
Think of cross-validation as checking your answers on practice tests before taking the real exam. Instead of relying on a
single set of practice questions, you split the practice material into several smaller tests and average your scores,
which gives a more reliable estimate of how well you will do on the real exam.

Question: What is cross-validation, and why is it important in machine learning?
Answer: Cross-validation is a technique for assessing how well a machine learning model generalizes. The dataset is
divided into multiple subsets; the model is trained on some of them and evaluated on the rest, and the process is
repeated so that different subsets take turns as the evaluation data.
It is important because averaging over several splits gives a more reliable estimate of performance on unseen data than
a single train/test split, and a large gap between training and validation scores helps reveal overfitting.
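
A minimal sketch of the "train on some subsets, evaluate on the rest" loop, written out by hand with scikit-learn's KFold.
The Iris dataset, LogisticRegression, and the 5-fold setting are illustrative assumptions, not part of the original exercise.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Illustrative data and model (assumptions, not from the original notebook)
X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffle because Iris rows are sorted by class

fold_scores = []
tested = []  # every validation index, to confirm each sample is evaluated exactly once
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)  # fresh model for every fold
    model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold
    tested.extend(test_idx)

print("Per-fold accuracy:", np.round(fold_scores, 3))
print("Mean accuracy:", round(float(np.mean(fold_scores)), 3))
```

The mean of the per-fold scores is the cross-validated estimate; the spread between folds hints at how stable it is.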

Question: What are the different types of cross-validation?
Answer: The most common types of cross-validation are:

K-Fold Cross-Validation: The dataset is divided into k subsets, or folds. The model is trained on k-1 folds and evaluated
on the remaining fold. This is repeated k times, so that each fold is used exactly once as the validation data.

Leave-One-Out Cross-Validation (LOOCV): Each data point in turn is used as the validation data, and the model is trained
on all other data points. This is k-fold cross-validation with k equal to the number of samples, so it requires one
model fit per data point.

Stratified K-Fold Cross-Validation: Similar to k-fold cross-validation, but each fold keeps approximately the same
proportion of each target class as the entire dataset, which is useful for classification on imbalanced datasets.

Time Series Cross-Validation: Designed for time-ordered data. The data is split into consecutive blocks; each block is
used as a validation set while only the blocks that precede it are used for training, so the model never trains on
the future.
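
The code cell of this notebook covers k-fold, stratified k-fold, and leave-one-out, but not time series cross-validation.
A small sketch using scikit-learn's TimeSeriesSplit follows; the ten-step toy series is an assumption for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series: row i is the observation at time step i (illustrative assumption)
X = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))
for train_idx, test_idx in splits:
    # Every training index comes before the test block in time
    print("Train indices:", train_idx, "Test indices:", test_idx)
```

With 10 samples and n_splits=3, each test block holds 2 samples and the training window grows: indices 0-3 are used
before test block [4 5], 0-5 before [6 7], and 0-7 before [8 9].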

Question: How do you choose the appropriate type of cross-validation for a given dataset?
Answer: The choice depends on factors such as dataset size, class distribution, and the presence of temporal dependencies.
For small to moderate-sized datasets, k-fold cross-validation (commonly with k = 5 or 10) is a standard choice, and LOOCV
can be used when the dataset is very small. For time series data, time series cross-validation is more appropriate,
because ordinary k-fold would let the model train on observations from after the validation period. Stratified k-fold
is preferred for classification with imbalanced classes.
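
A quick check of why stratification matters for imbalanced classes: the sketch counts how many positive samples land in
each validation fold under plain KFold and under StratifiedKFold. The 90/10 label vector is a made-up assumption.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced toy labels (assumption): 90 negatives followed by 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # the splitters only use the row count (and y, for stratification)

positives_per_fold = {}
for name, cv in [("KFold", KFold(n_splits=5)), ("StratifiedKFold", StratifiedKFold(n_splits=5))]:
    # Count the positive samples in each validation fold
    positives_per_fold[name] = [int(y[test_idx].sum()) for _, test_idx in cv.split(X, y)]
    print(name, "positives per validation fold:", positives_per_fold[name])
```

Without shuffling, plain KFold puts all 10 positives in the last fold ([0, 0, 0, 0, 10]), so four folds cannot measure
performance on the minority class at all; StratifiedKFold places 2 positives in every fold.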

Question: What are the advantages and disadvantages of cross-validation?
Answer:

Advantages: It provides a more reliable estimate of a model's performance than a single train/test split, makes
overfitting easier to detect, and gives a sound basis for comparing models and selecting hyperparameters.
Disadvantages: It can be computationally expensive, because the model is trained once per fold (k times for k-fold,
once per sample for LOOCV), which matters for large datasets and complex models. The estimate can also vary with how
the data is randomly split, for example with a different shuffle seed.
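
The split-to-split variability, and one common way to reduce it (repeating k-fold with several shuffles), can be seen in
this sketch. The breast cancer dataset bundled with scikit-learn and the DecisionTreeClassifier are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset and model (assumptions, not from the original notebook)
X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Same model and data, different shuffle seeds: the mean score can shift
seed_means = []
for seed in range(3):
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    seed_means.append(cross_val_score(model, X, y, cv=cv).mean())
    print(f"seed={seed}: mean accuracy {seed_means[-1]:.3f}")

# 5-fold CV repeated with 10 different shuffles: 50 fits, but the split noise averages out
rkf = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=rkf)
print(f"Repeated 5-fold: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} fits")
```

The repeated run also illustrates the cost side: 50 model fits instead of 5.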

Question: How do you implement cross-validation in practice?
Answer: Cross-validation can be implemented with libraries such as scikit-learn in Python. You typically use functions
like cross_val_score or cross_validate to run cross-validation and evaluate the model with appropriate metrics
(chosen via the scoring parameter). You can customize the strategy by passing a splitter object, such as KFold,
StratifiedKFold, LeaveOneOut, or TimeSeriesSplit, as the cv argument.
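
A sketch of both functions with a custom splitter passed through cv. The Iris dataset and the StandardScaler + SVC
pipeline are assumptions chosen for illustration; putting the scaler inside the pipeline means it is refit on each
training fold, so the validation fold does not leak into preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC())  # scaler is refit inside every training fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # custom strategy passed via cv

# cross_val_score: a single metric, returned as an array of per-fold scores
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("cross_val_score accuracy per fold:", scores.round(3))

# cross_validate: several metrics, fit/score times, and optionally training scores
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"], return_train_score=True)
print("mean test accuracy:", round(results["test_accuracy"].mean(), 3))
print("mean test macro-F1:", round(results["test_f1_macro"].mean(), 3))
print("mean train accuracy:", round(results["train_accuracy"].mean(), 3))
```

Comparing the mean train and test accuracy from cross_validate is a simple way to spot overfitting.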



'''

In [None]:

# 1) Python code demonstrating k-fold, stratified k-fold, and leave-one-out splits
# Code:
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut

# Sample data: 4 samples with 2 features each, and 2 balanced classes
x = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 1, 2])

# Function to print cross-validation scores and their mean
def print_scores(scores):
    print("Cross-validation scores:", scores)
    print("Mean Accuracy:", np.mean(scores))

# Shared loop for any scikit-learn splitter; KFold and LeaveOneOut ignore y,
# while StratifiedKFold uses it to keep class proportions equal across folds
def run_cross_validation(splitter, x, y):
    scores = []
    for train_index, test_index in splitter.split(x, y):
        x_train, x_test = x[train_index], x[test_index]
        y_train, y_test = y[train_index], y[test_index]
        # A real run would fit a model on (x_train, y_train) and append its score
        # on (x_test, y_test); this demo only prints the indices of each split
        print("Train indices:", train_index)
        print("Test indices:", test_index)
        scores.append(0.8)  # Placeholder score, not a measured accuracy
    print_scores(scores)

# Perform k-fold cross-validation
def k_fold_cross_validation(x, y, k):
    run_cross_validation(KFold(n_splits=k), x, y)

# Perform stratified k-fold cross-validation
def stratified_k_fold_cross_validation(x, y, k):
    run_cross_validation(StratifiedKFold(n_splits=k), x, y)

# Perform leave-one-out cross-validation
def leave_one_out_cross_validation(x, y):
    run_cross_validation(LeaveOneOut(), x, y)

# Example usage
print("K-fold cross-validation:")
k_fold_cross_validation(x, y, k=2)
print("\nStratified K-Fold Cross-Validation:")
stratified_k_fold_cross_validation(x, y, k=2)
print("\nLeave-One-Out Cross-Validation:")
leave_one_out_cross_validation(x, y)


K-fold cross-validation:
Train indices: [2 3]
Test indices: [0 1]
Train indices: [0 1]
Test indices: [2 3]
Cross-validation scores: [0.8, 0.8]
Mean Accuracy: 0.8

Stratified K-Fold Cross-Validation:
Train indices: [2 3]
Test indices: [0 1]
Train indices: [0 1]
Test indices: [2 3]
Cross-validation scores: [0.8, 0.8]
Mean Accuracy: 0.8

Leave-One-Out Cross-Validation:
Train indices: [1 2 3]
Test indices: [0]
Train indices: [0 2 3]
Test indices: [1]
Train indices: [0 1 3]
Test indices: [2]
Train indices: [0 1 2]
Test indices: [3]
Cross-validation scores: [0.8, 0.8, 0.8, 0.8]
Mean Accuracy: 0.8
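
The code cell above only prints the split indices and records a placeholder score of 0.8 per fold. A minimal sketch of
scoring a real model with the same three splitters through cross_val_score follows. The Iris dataset and
KNeighborsClassifier are assumptions, used because the four-sample toy data is too small to train a meaningful classifier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data and model (assumptions, not from the original notebook)
X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

splitters = {
    # shuffle=True matters here: Iris rows are sorted by class
    "K-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "Stratified k-fold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "Leave-one-out": LeaveOneOut(),  # one fit per sample: 150 fits
}

results = {}
for name, cv in splitters.items():
    scores = cross_val_score(model, X, y, cv=cv)  # accuracy per fold
    results[name] = scores
    print(f"{name}: {len(scores)} folds, mean accuracy {scores.mean():.3f}")
```

Each leave-one-out "fold" scores a single sample, so its per-fold scores are all 0 or 1 and only their mean is informative.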
