# 5-Fold Cross Validation Tutorial with PyTorch Example
# 5-Fold Cross Validation Tutorial with PyTorch Example

## What is Dropout?

Dropout is a regularization technique used in neural networks to prevent overfitting. It was introduced in a research paper by Hinton et al. in 2012. The idea is to randomly drop out (i.e., set to zero) a fraction of the neurons during training, which helps to prevent the model from relying too heavily on any single neuron.

## How Does Dropout Work?

During training, each neuron has a probability `p` of being dropped out. This means that the neuron's output is set to zero with probability `p`, and the neuron is not updated during backpropagation. The remaining neurons have their outputs scaled up by a factor of `1/(1-p)` to maintain the same expected value.

## Why Does Dropout Work?

Dropout has several benefits:

1. **Prevents overfitting**: By randomly dropping out neurons, the model is forced to learn multiple representations of the data, which helps to prevent overfitting.
2. **Improves generalization**: Dropout helps the model to generalize better to new, unseen data.
3. **Reduces co-adaptation**: Dropout breaks the co-adaptation between neurons, which helps to reduce overfitting.

## What is 5-Fold Cross Validation?

5-Fold Cross Validation is a technique used to evaluate the performance of a machine learning model. It involves splitting the dataset into 5 folds, training the model on 4 folds, and evaluating its performance on the remaining fold. This process is repeated 5 times, with each fold serving as the validation set once.

## Why Do We Need 5-Fold Cross Validation?

5-Fold Cross Validation helps to prevent overfitting and underfitting by providing a more accurate estimate of the model's performance. It also helps to reduce the variance of the model's performance by averaging the results over multiple folds.

## Comparison with Dropout

5-Fold Cross Validation and Dropout are both techniques used to prevent overfitting in machine learning models. However, they serve different purposes:

* Dropout is a regularization technique that randomly drops out neurons during training to prevent overfitting.
* 5-Fold Cross Validation is a technique used to evaluate the performance of a model by splitting the dataset into multiple folds and training the model on each fold.


In [78]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import train_test_split, KFold
from sklearn.datasets import load_iris
import numpy as np

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Convert to tensors
X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).long()
y_test = torch.from_numpy(y_test).long()

In [79]:
# print some y values
print(y_train[:5])

tensor([1, 2, 2, 1, 2])


## Define the Neural Network Models

In [80]:
class Net(nn.Module):
    def __init__(self, dropout=False):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(4, 28)
        self.dropout = nn.Dropout(0.5) if dropout else nn.Identity()
        self.fc2 = nn.Linear(28, 3)

    def forward(self, x):
        # x = F.relu(self.fc1(x))
        # use sigmoid
        x = torch.sigmoid(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Initialize two models: one with dropout and one without
model_with_dropout = Net(dropout=True)
model_without_dropout = Net(dropout=False)

## Training and Evaluation Functions

In [81]:
def train(model, optimizer, criterion, X_train, y_train):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    return loss.item()

def evaluate(model, X_val, y_val):
    model.eval()
    with torch.no_grad():
        outputs = model(X_val)
        _, predicted = torch.max(outputs, 1)

        correct = (predicted == y_val).sum().item()
        accuracy = correct / X_val.size(0)
    return accuracy

## Perform 5-Fold Cross Validation

In [82]:
# Set up KFold with 5 splits
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Train and evaluate the models
def perform_cross_validation(model, X_train, y_train):
    results = []
    for fold, (train_idx, val_idx) in enumerate(kfold.split(X_train)):
        print(f'Starting fold {fold+1}')
        # Create data subsets for the fold
        X_train_fold = X_train[train_idx]
        y_train_fold = y_train[train_idx]

        # Initialize optimizer and criterion
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
        criterion = nn.CrossEntropyLoss()

        # Training and evaluation for one fold
        for epoch in range(50):
            loss = train(model, optimizer, criterion, X_train_fold, y_train_fold)
            if (epoch+1) % 10 == 0:
                accuracy = evaluate(model, X_test, y_test)
                print(f'Epoch {epoch+1}, Loss: {loss}, Accuracy: {accuracy}')

        results.append({'fold': fold+1, 'final_loss': loss, 'accuracy': accuracy})
    return results

results_with_dropout = perform_cross_validation(model_with_dropout, X_train, y_train)
results_without_dropout = perform_cross_validation(model_without_dropout, X_train, y_train)

Starting fold 1
Epoch 10, Loss: 1.133414387702942, Accuracy: 0.24444444444444444
Epoch 20, Loss: 1.1098768711090088, Accuracy: 0.24444444444444444
Epoch 30, Loss: 1.0391340255737305, Accuracy: 0.28888888888888886
Epoch 40, Loss: 1.0465450286865234, Accuracy: 0.28888888888888886
Epoch 50, Loss: 1.0011996030807495, Accuracy: 0.7111111111111111
Starting fold 2
Epoch 10, Loss: 0.9753791093826294, Accuracy: 0.7111111111111111
Epoch 20, Loss: 1.005685567855835, Accuracy: 0.7111111111111111
Epoch 30, Loss: 1.030776023864746, Accuracy: 0.7111111111111111
Epoch 40, Loss: 0.9689159393310547, Accuracy: 0.7111111111111111
Epoch 50, Loss: 0.9489995241165161, Accuracy: 0.7111111111111111
Starting fold 3
Epoch 10, Loss: 0.9274333715438843, Accuracy: 0.8222222222222222
Epoch 20, Loss: 0.9316403269767761, Accuracy: 0.7111111111111111
Epoch 30, Loss: 0.8732455968856812, Accuracy: 0.7111111111111111
Epoch 40, Loss: 0.8535342216491699, Accuracy: 0.7111111111111111
Epoch 50, Loss: 0.8499693274497986, Accur

In [83]:
len(X_test)

45

## Review Results

In [84]:
print('Results with Dropout:')
for result in results_with_dropout:
    print(f"Fold {result['fold']}: Loss: {result['final_loss']}, Accuracy: {result['accuracy']}")

print('\nResults without Dropout:')
for result in results_without_dropout:
    print(f"Fold {result['fold']}: Loss: {result['final_loss']}, Accuracy: {result['accuracy']}")


Results with Dropout:
Fold 1: Loss: 1.0011996030807495, Accuracy: 0.7111111111111111
Fold 2: Loss: 0.9489995241165161, Accuracy: 0.7111111111111111
Fold 3: Loss: 0.8499693274497986, Accuracy: 0.7111111111111111
Fold 4: Loss: 0.724637508392334, Accuracy: 0.7111111111111111
Fold 5: Loss: 0.7054985761642456, Accuracy: 0.9111111111111111

Results without Dropout:
Fold 1: Loss: 1.0126844644546509, Accuracy: 0.7111111111111111
Fold 2: Loss: 0.9069873094558716, Accuracy: 0.7111111111111111
Fold 3: Loss: 0.7699190974235535, Accuracy: 0.7111111111111111
Fold 4: Loss: 0.6590569615364075, Accuracy: 0.7555555555555555
Fold 5: Loss: 0.586053729057312, Accuracy: 0.8444444444444444


In [85]:
# calculate average accuracy
avg_accuracy_with_dropout = np.mean([result['accuracy'] for result in results_with_dropout])
avg_accuracy_without_dropout = np.mean([result['accuracy'] for result in results_without_dropout])
print(f'\nAverage accuracy with dropout: {avg_accuracy_with_dropout}')
print(f'Average accuracy without dropout: {avg_accuracy_without_dropout}')


Average accuracy with dropout: 0.7511111111111111
Average accuracy without dropout: 0.7466666666666667



**Understanding Dropout and Overfitting**

* Dropout is a technique to fight overfitting and improve neural network generalization
* Focus on training performance first, and deal with overfitting once it's clear
* Overfitting can manifest in different ways, such as:
	+ Training accuracy increases, but validation accuracy plateaus or decreases
	+ Model performance on training data is good, but poor on new, unseen data

**Choosing the Right Dropout Rate**

* The default dropout rate of 0.5 may be too severe, especially for convolutional layers
* Research suggests that lower dropout rates (0.1, 0.2) may be more effective for convolutional layers
* Gradually increasing the dropout rate from the first convolutional layer down the network can be effective
* Example architecture:
	+ CONV-1: filter=3x3, size=32, dropout between 0.0-0.1
	+ CONV-2: filter=3x3, size=64, dropout between 0.1-0.25
	+ ...
* Cross-validate and optimize hyper-parameters for your specific problem using techniques like random search or Bayesian optimization
# its markdown cell so show the image
[ ![image](https://i.stack.imgur.com/xvqJI.jpg) ](https://i.stack.imgur.com/xvqJI.jpg)



Hinton dropout paper:
https://arxiv.org/pdf/1207.0580.pdf

Analysis on the Dropout Effect in Convolutional Neural Networks:
http://mipal.snu.ac.kr/images/1/16/Dropout_ACCV2016.pdf

