# Best Model Description

** Accuracy Achieved on Validation Dataset = 85%**
* Model Name: CIFAR10CNNX
* Number of CNN Layer = 3
* Hyperparameters:
    - epochs = 10,
    - output_dim = 10, 
    - **batch_size = 64,**
It started with a batch size of 128 and tuned on the different batch sizes 128,1024,512 and 64.
    - **learning_rate = 0.005,**
Earlier taken 0.03 then after tunning reduced it to 0.005
    - early_stopping = True,

* Wandb Project Link of all Conducted Experiments: 

https://wandb.ai/neetika/CNN_Experiment_Neetika?workspace=user-neetika


# Importing packages

In [1]:
pip install torch-lr-finder

Collecting torch-lr-finder
  Downloading torch_lr_finder-0.2.1-py3-none-any.whl (11 kB)
Installing collected packages: torch-lr-finder
Successfully installed torch-lr-finder-0.2.1


In [2]:
# Install wandb and update it to the latest version
%%capture
!pip install wandb --upgrade

In [3]:
# Importing the necessary libraries
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
from torchsummary import summary

from torch.optim.lr_scheduler import ReduceLROnPlateau, ExponentialLR, CyclicLR, OneCycleLR, StepLR
from torch_lr_finder import LRFinder

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
import random

from datetime import datetime
from pathlib import Path
import plotly.io as pio
pio.renderers.default = 'colab'

In [4]:
# Import random function
import random

# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
data_folder = Path('/content/drive/MyDrive/DL_Assignment5')
  

In [7]:
# Import wandb
import wandb

# Login to W&B
wandb.login()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.



We will first convert the images in the dataset to pytorch tensors using torchvision.transforms and then normalize them.

Next, we will use torchvision.datasets for downloading the CIFAR - 10  datasets and apply transform that we defines earlier.

trainset conains the training data
testset contains the testing data

## Data Transformation

In [None]:
# Transform to convert images to pytorch tensors and normalize the data
train_trans= transforms.Compose([ 
                                #  transforms.RandomCrop(size = (32,32), padding = 2),
                                #  transforms.RandomAffine(degrees=10, translate =(0.05, 0.05), scale=(0.9, 1.1)),
                                #  transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(), 
                                 transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
train_full = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=True, 
                                              transform=train_trans,
                                              download=True)
trainset, validset = torch.utils.data.random_split(train_full, [40000, 10000], generator=torch.Generator().manual_seed(42))
testset  = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=False, 
                                              transform=trans,
                                              download=True)

Files already downloaded and verified
Files already downloaded and verified


In [None]:
len(trainset), len(validset )

(40000, 10000)

In [None]:
# check the min value of inputs
train_full.data.mean()/255

0.4733630004850899

In [None]:
testset.data.shape

(10000, 32, 32, 3)

## Creating Smaller Dataset

In [None]:
# n sample points
train_sample_size = int(len(trainset)/700)
valid_sample_size = int(len(validset)/500)

# Getting n random indices
train_subset_indices = random.sample(range(0, len(trainset)), train_sample_size)
valid_subset_indices = random.sample(range(0, len(testset)), valid_sample_size)

# Getting subset of dataset
train_subset = torch.utils.data.Subset(trainset, train_subset_indices)
valid_subset = torch.utils.data.Subset(validset, valid_subset_indices)

In [None]:
train_sample_size

57

## Predictor Labels

In [None]:
def get_CIAFR10_labels(labels):  
    """ 
    Function to generate labels.
    Input: numerical labels
    Output: actual string labels
    """

    # Create a list of labels
    text_labels = ['airplane', 'automobile', 'bird', 'cat',
                   'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

    # Return text_labels according to numerical values
    return [text_labels[int(i)] for i in labels]

# Model 1 - Tried Different Optimizer and SGD working well

**3 Convolution Layer**

**Learning Rate 0.05**

**Epoch 5**

**Batch Size 128**

**Train Accuracy:  72.0725% | Valid Accuracy:  68.2200%**


## CNN Model Class

In [None]:
class CIFAR10CNN1(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN1, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding='same'), #32 *32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16 * 16
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8 * 8
      )
      self.conv3_layer = nn.Sequential(
          nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding='same'), #8*8
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #8*8
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2),# 4 * 4
      )

      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(256*4*4, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))

        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [None]:
summary(CIFAR10CNN1().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 32, 32]             896
              ReLU-2           [-1, 32, 32, 32]               0
            Conv2d-3           [-1, 32, 32, 32]           9,248
              ReLU-4           [-1, 32, 32, 32]               0
         MaxPool2d-5           [-1, 32, 16, 16]               0
            Conv2d-6           [-1, 64, 16, 16]          18,496
              ReLU-7           [-1, 64, 16, 16]               0
            Conv2d-8           [-1, 64, 16, 16]          36,928
              ReLU-9           [-1, 64, 16, 16]               0
        MaxPool2d-10             [-1, 64, 8, 8]               0
          Flatten-11                 [-1, 4096]               0
           Linear-12                 [-1, 1024]       4,195,328
           Linear-13                  [-1, 512]         524,800
           Linear-14                   

## Training Epoch

In [17]:
def train(train_loader, model, optimizer, loss_function, log_batch, log_interval, grad_clipping, max_norm):

  """ 
  Function for training the model in each epoch
  Input: iterator for train dataset, initial weights and bias, epochs, learning rate.
  Output: final weights, bias, train loss, train accuracy
  """
  # initilalize variables as global
  # these counts will be updated every epoch
  global example_ct_train
  global batch_ct_train

  # Training Loop loop
  # Initialize train_loss at the he start of the epoch
  running_train_loss = 0
  running_train_correct = 0
  
  # put the model in training mode
  model.train()

  # Iterate on batches from the dataset using train_loader
  for input, targets in train_loader:
    
    # move inputs and outputs to GPUs
    input = input.to(device)
    targets = targets.to(device)

    # Forward pass
    output = model(input)
    loss = loss_function(output, targets)

    # Correct prediction
    y_pred = torch.argmax(output, dim = 1)
    correct = torch.sum(y_pred == targets)

    example_ct_train +=  len(targets)
    batch_ct_train += 1

    # set gradients to zero 
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Gradient Clipping
    if grad_clipping:
      nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)

    # Update parameters using their gradient
    optimizer.step()

    # scheduler.step()
          
    # Add train loss of a batch 
    running_train_loss += loss.item()

    # Add Corect counts of a batch
    running_train_correct += correct

    # log batch loss and accuracy
    if log_batch:
      if ((batch_ct_train + 1) % log_interval) == 0:
        wandb.log({f"Train Batch Loss  :": loss})
        wandb.log({f"Train Batch Acc :": correct/len(targets)})
        # print(f'Learning rate: {scheduler.get_last_lr()}')

    
    
  # Calculate mean train loss for the whole dataset for a particular epoch
  train_loss = running_train_loss/len(train_loader)



  # Calculate accuracy for the whole dataset for a particular epoch
  train_acc = running_train_correct/len(train_loader.dataset)

  return train_loss, train_acc

## Validation Epoch

In [18]:
def valid(loader, model, optimizer, loss_function, log_batch, log_interval):

  """ 
  Function for training the model and plotting the graph for train & valid loss vs epoch.
  Input: iterator for train dataset, initial weights and bias, epochs, learning rate, batch size.
  Output: final weights, bias and train loss and valid loss for each epoch.
  """

  # initilalize variables as global
  # these counts will be updated every epoch
  global example_ct_valid
  global batch_ct_valid

  # Validation loop
  # Initialize train_loss at the he strat of the epoch
  running_valid_loss = 0
  running_valid_correct = 0
  
  # put the model in evaluation mode
  model.eval()

  with torch.no_grad():
    for input,targets in loader:

      # move inputs and outputs to GPUs
      input = input.to(device)
      targets = targets.to(device)

      # Forward pass
      output = model(input)
      loss = loss_function(output,targets)

      # Correct Predictions
      y_pred = torch.argmax(output, dim = 1)
      correct = torch.sum(y_pred == targets)

      # count of images and batches
      example_ct_valid +=  len(targets)
      batch_ct_valid += 1

      # Add valid loss of a batch 
      running_valid_loss += loss.item()

      # Add correct count for each batch
      running_valid_correct += correct

      # log batch loss and accuracy
      if log_batch:
        if ((batch_ct_valid + 1) % log_interval) == 0:
          wandb.log({f"Valid Batch Loss  :": loss})
          wandb.log({f"Valid Batch Accuracy :": correct/len(targets)})


    # Calculate mean valid loss for the whole dataset for a particular epoch
    valid_loss = running_valid_loss/len(valid_loader)

    # scheduler step
    # scheduler.step(valid_loss)
    # scheduler.step()

    # Calculate accuracy for the whole dataset for a particular epoch
    valid_acc = running_valid_correct/len(valid_loader.dataset)
    
  return valid_loss, valid_acc

## Model Training Loop

In [19]:
def train_loop(train_loader, valid_loader, model, loss_function, optimizer, epochs, device, patience, early_stopping,
               file_model):

  '''
  model: specify your model for training
  criterion: loss function 
  optimizer: optimizer like SGD , ADAM etc.
  train loader: function to carete batches for training data
  loader : function to create batches for valid data set
  file_model : specify file name for saving your model. This way we can upload the model weights from file. We will not to run model again.
  

  '''
  # Create lists to store train and valid loss at each epoch

  train_loss_history = []
  valid_loss_history = []
  train_acc_history = []
  valid_acc_history = []
  delta = 0
  best_score = None
  valid_loss_min = np.Inf
  counter_early_stop=0
  early_stop=False


  # Iterate for the given number of epochs
  for epoch in range(epochs):
    t0 = datetime.now()
    # Get train loss and accuracy for one epoch

    train_loss, train_acc = train(train_loader, model, optimizer, loss_function, 
                                  wandb.config.log_batch, wandb.config.log_interval,
                                  wandb.config.grad_clipping, wandb.config.max_norm)
    valid_loss, valid_acc = valid(valid_loader, model, optimizer, loss_function,
                                    wandb.config.log_batch, wandb.config.log_interval)

    dt = datetime.now() - t0

    # Save history of the Losses and accuracy
    train_loss_history.append(train_loss)
    train_acc_history.append(train_acc)
    valid_loss_history.append(valid_loss)
    valid_acc_history.append(valid_acc)

    if early_stopping:
      score = -valid_loss
      if best_score is None:
        best_score=score
        print(f'Validation loss has decreased ({valid_loss_min:.6f} --> {valid_loss:.6f}). Saving Model...')
        torch.save(model.state_dict(), file_model)
        valid_loss_min = valid_loss

      elif score < best_score + delta:
        counter_early_stop += 1
        print(f'Early stoping counter: {counter_early_stop} out of {patience}')
        if counter_early_stop > patience:
          early_stop = True

      
      else:
        best_score = score
        print(f'Validation loss has decreased ({valid_loss_min:.6f} --> {valid_loss:.6f}). Saving model...')
        torch.save(model.state_dict(), file_model)
        counter_early_stop=0
        valid_loss_min = valid_loss

      if early_stop:
        print('Early Stopping')
        break

    else:

      score = -valid_loss
      if best_score is None:
        best_score=score
        print(f'Validation loss has decreased ({valid_loss_min:.6f} --> {valid_loss:.6f}). Saving Model...')
        torch.save(model.state_dict(), file_model)
        valid_loss_min = valid_loss

      elif score < best_score + delta:
        print(f'Validation loss has not decreased ({valid_loss_min:.6f} --> {valid_loss:.6f}). Not Saving Model...')
      
      else:
        best_score = score
        print(f'Validation loss has decreased ({valid_loss_min:.6f} --> {valid_loss:.6f}). Saving model...')
        torch.save(model.state_dict(), file_model)
        valid_loss_min = valid_loss



    # Log the train and valid loss to W&B
    wandb.log({f"Train epoch Loss :": train_loss, f"Valid epoch Loss :": valid_loss })
    wandb.log({f"Train epoch Acc :": train_acc, f"Valid epoch Acc :": valid_acc})



    # Print the train loss and accuracy for given number of epochs, batch size and number of samples
    print(f'Epoch : {epoch+1} / {epochs}')
    print(f'Time to complete {epoch+1} is {dt}')
    # print(f'Learning rate: {scheduler.get_last_lr()}')
    # print(f'Learning rate: {scheduler._last_lr[0]}')
    print(f'Train Loss: {train_loss : .4f} | Train Accuracy: {train_acc * 100 : .4f}%')
    print(f'Valid Loss: {valid_loss : .4f} | Valid Accuracy: {valid_acc * 100 : .4f}%')
    print()
    torch.cuda.empty_cache()

  return train_loss_history, train_acc_history, valid_loss_history, valid_acc_history


## HyperParameter

In [None]:
hyperparameters = dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.05,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp3.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [None]:
wandb.init(name = 'Exp1_LR(0.05)-Testing', project = 'CNN_Experiment_Neetika', config = hyperparameters)

[34m[1mwandb[0m: Currently logged in as: [33mneetika[0m (use `wandb login --relogin` to force relogin)


In [None]:
wandb.config.device = device
print(wandb.config.device )

cpu


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [None]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN1()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

#scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

In [None]:
# Fix seed value

SEED = 2345 
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

for input, targets in train_loader:
  
  # move inputs and outputs to GPUs
  input = input.to(device)
  targets = targets.to(device)
  model.eval()
  # Forward pass
  output = model(input)
  loss = loss_function(output, targets)
  print(f'Actual loss: {loss}')
  break

print(f'Expected Theoretical loss: {np.log(10)}')



Actual loss: 2.3019375801086426
Expected Theoretical loss: 2.302585092994046


## Training and Saving Model

In [None]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f86b7207a90>]

In [None]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Learning rate: [0.006614526851829006]
Learning rate: [0.014272463088775028]
Learning rate: [0.026123052998520413]
Validation loss has decreased (inf --> 1.564840). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:03:19.337535
Learning rate: [0.02804642345947317]
Train Loss:  2.0434 | Train Accuracy:  23.7525%
Valid Loss:  1.5648 | Valid Accuracy:  43.7800%

Learning rate: [0.040849342881520805]
Learning rate: [0.05681480307219131]
Learning rate: [0.07224519515513114]
Validation loss has decreased (1.564840 --> 1.236263). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:03:16.895490
Learning rate: [0.07609275712154677]
Train Loss:  1.3420 | Train Accuracy:  52.0000%
Valid Loss:  1.2363 | Valid Accuracy:  57.7500%

Learning rate: [0.08542574272732459]
Learning rate: [0.09489169415873297]
Learning rate: [0.0995911001495962]
Validation loss has decreased (1.236263 --> 1.021881). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:03:16.003670
Learning rate: [0.09999973078149742

#Model 2 -  Channel Size Increased


## CNN Model Class

In [None]:
class CIFAR10CNN2(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN2, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*8*8, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [None]:
summary(CIFAR10CNN2().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
            Conv2d-6          [-1, 512, 16, 16]         590,336
              ReLU-7          [-1, 512, 16, 16]               0
            Conv2d-8          [-1, 512, 16, 16]       2,359,808
              ReLU-9          [-1, 512, 16, 16]               0
        MaxPool2d-10            [-1, 512, 8, 8]               0
          Flatten-11                [-1, 32768]               0
           Linear-12                 [-1, 1024]      33,555,456
           Linear-13                  [-1, 512]         524,800
           Linear-14                   

## HyperParameter

In [None]:
hyperparameters = dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp3.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [None]:
wandb.init(name = 'Exp2_LR(0.03)+NewChannelSizeIncr', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▁▄▃▃▅▆▆▆▇▇▆██▇
Train Batch Loss :,█▇▅▅▅▄▃▃▃▂▂▃▁▂▂
Train epoch Acc :,▁▄▆▇█
Train epoch Loss :,█▅▃▂▁
Valid Batch Accuracy :,▁▃█
Valid Batch Loss :,█▅▁
Valid epoch Acc :,▁▄▆▇█
Valid epoch Loss :,█▅▃▂▁

0,1
Train Batch Acc :,0.74219
Train Batch Loss :,0.69995
Train epoch Acc :,0.77587
Train epoch Loss :,0.64493
Valid Batch Accuracy :,0.76562
Valid Batch Loss :,0.7739
Valid epoch Acc :,0.7514
Valid epoch Loss :,0.70155


In [None]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [None]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN2()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

#scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

In [None]:
# Fix seed value

SEED = 2345 
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

for input, targets in train_loader:
  
  # move inputs and outputs to GPUs
  input = input.to(device)
  targets = targets.to(device)
  model.eval()
  # Forward pass
  output = model(input)
  loss = loss_function(output, targets)
  print(f'Actual loss: {loss}')
  break

print(f'Expected Theoretical loss: {np.log(10)}')



Actual loss: 2.30287504196167
Expected Theoretical loss: 2.302585092994046


## Training and Saving Model

In [None]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2dee009e10>]

In [None]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.229793). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:05.161604
Train Loss:  1.6498 | Train Accuracy:  39.1775%
Valid Loss:  1.2298 | Valid Accuracy:  55.6300%

Validation loss has decreased (1.229793 --> 0.880920). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:06.475897
Train Loss:  1.0248 | Train Accuracy:  63.6450%
Valid Loss:  0.8809 | Valid Accuracy:  68.7200%

Validation loss has decreased (0.880920 --> 0.714982). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:06.326808
Train Loss:  0.7239 | Train Accuracy:  74.7900%
Valid Loss:  0.7150 | Valid Accuracy:  75.0200%

Validation loss has not decreased (0.714982 --> 0.748220). Not Saving Model...
Epoch : 4 / 5
Time to complete 4 is 0:01:05.369591
Train Loss:  0.4829 | Train Accuracy:  83.0550%
Valid Loss:  0.7482 | Valid Accuracy:  74.5200%

Validation loss has not decreased (0.714982 --> 0.759010). Not Saving Model...
Epoch : 5 / 5
Time to complete 5 is 0:01:0

# Model 3- Increasing Layer

SGD
OneCycleLR

## CNN Model Class

In [None]:
class CIFAR10CNN3(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN3, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=5, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=5, padding='same'), #32 *32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16 * 16
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8 * 8
      )
      self.conv3_layer = nn.Sequential(
          nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, padding='same'), #8*8
          nn.ReLU(),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=5, padding='same'), #8*8
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 4*4
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*4*4, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)
        out = self.conv3_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))

        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [None]:
summary(CIFAR10CNN3().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           9,728
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         409,728
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
            Conv2d-6          [-1, 256, 16, 16]         295,168
              ReLU-7          [-1, 256, 16, 16]               0
            Conv2d-8          [-1, 256, 16, 16]         590,080
              ReLU-9          [-1, 256, 16, 16]               0
        MaxPool2d-10            [-1, 256, 8, 8]               0
           Conv2d-11            [-1, 512, 8, 8]       3,277,312
             ReLU-12            [-1, 512, 8, 8]               0
           Conv2d-13            [-1, 512, 8, 8]       6,554,112
             ReLU-14            [-1, 51

## HyperParameter

In [None]:
hyperparameters = dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp3.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0.5,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [None]:
wandb.init(name = 'Exp3_IncreasingLayer', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▁▂▄▅▄▅▅▇▅▇███▇█
Train Batch Loss :,██▇▅▅▅▄▄▂▄▂▁▂▁▁▁
Train epoch Acc :,▁▄▆▇█
Train epoch Loss :,█▅▄▂▁
Valid Batch Accuracy :,▁▇█
Valid Batch Loss :,█▂▁
Valid epoch Acc :,▁▆███
Valid epoch Loss :,█▄▁▂▃

0,1
Train Batch Acc :,0.88281
Train Batch Loss :,0.31675
Train epoch Acc :,0.8942
Train epoch Loss :,0.30462
Valid Batch Accuracy :,0.78125
Valid Batch Loss :,0.64444
Valid epoch Acc :,0.747
Valid epoch Loss :,0.81607


In [None]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [None]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN3()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [None]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2ded3f6c50>]

In [None]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.711429). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:12.296384
Train Loss:  2.0301 | Train Accuracy:  22.9925%
Valid Loss:  1.7114 | Valid Accuracy:  36.5000%

Validation loss has decreased (1.711429 --> 1.202313). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:12.830973
Train Loss:  1.4206 | Train Accuracy:  47.5650%
Valid Loss:  1.2023 | Valid Accuracy:  56.9200%

Validation loss has decreased (1.202313 --> 0.914028). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:12.816780
Train Loss:  1.0328 | Train Accuracy:  63.1875%
Valid Loss:  0.9140 | Valid Accuracy:  67.9600%

Validation loss has decreased (0.914028 --> 0.733043). Saving model...
Epoch : 4 / 5
Time to complete 4 is 0:01:12.368443
Train Loss:  0.7795 | Train Accuracy:  72.9625%
Valid Loss:  0.7330 | Valid Accuracy:  74.4600%

Validation loss has decreased (0.733043 --> 0.685352). Saving model...
Epoch : 5 / 5
Time to complete 5 is 0:01:12.874527
Train L

# Model 4- After **Overfitting** - Increasing *Size*

**Taking Larger Ratio of Dataset**

BaseModel - CIFAR10CNN2

## Increasing Sample Size

In [158]:
# n sample points
train_sample_size = int(len(trainset)/10)
valid_sample_size = int(len(validset)/10)

# Getting n random indices
train_subset_indices = random.sample(range(0, len(trainset)), train_sample_size)
valid_subset_indices = random.sample(range(0, len(testset)), valid_sample_size)

# Getting subset of dataset
train_subset = torch.utils.data.Subset(trainset, train_subset_indices)
valid_subset = torch.utils.data.Subset(validset, valid_subset_indices)

In [159]:
valid_sample_size

1000

## CNN Model Class

In [160]:
class CIFAR10CNN4(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN4, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16 
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*8*8, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [161]:
summary(CIFAR10CNN4().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
            Conv2d-6          [-1, 512, 16, 16]         590,336
              ReLU-7          [-1, 512, 16, 16]               0
            Conv2d-8          [-1, 512, 16, 16]       2,359,808
              ReLU-9          [-1, 512, 16, 16]               0
        MaxPool2d-10            [-1, 512, 8, 8]               0
          Flatten-11                [-1, 32768]               0
           Linear-12                 [-1, 1024]      33,555,456
           Linear-13                  [-1, 512]         524,800
           Linear-14                   

## HyperParameter

In [162]:
hyperparameters= dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [163]:
wandb.init(name = 'AfterOverFitting_TrainSize(4000)', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▁▂▄▃▄▅▄▆▆▆▆▇▇▇▇▇▇██▇█████████
Train Batch Loss :,█▇▆▅▆▅▄▄▃▃▃▃▂▂▂▂▁▂▁▁▂▁▁▁▁▁▁▁▁▁
Train epoch Acc :,▁▄▅▆▇█████
Train epoch Loss :,█▅▄▃▂▁▁▁▁▁
Valid Batch Accuracy :,▁▁▆▅█▄
Valid Batch Loss :,▄▂▁▄▅█
Valid epoch Acc :,▁▅▇▇██████
Valid epoch Loss :,█▃▁▁▂▃▄▆▇▇

0,1
Train Batch Acc :,1.0
Train Batch Loss :,0.00823
Train epoch Acc :,0.98167
Train epoch Loss :,0.0549
Valid Batch Accuracy :,0.75781
Valid Batch Loss :,1.18496
Valid epoch Acc :,0.7725
Valid epoch Loss :,1.16807


In [164]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [165]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN4()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [166]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2dee0aa110>]

In [167]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.229793). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:04.992217
Train Loss:  1.6498 | Train Accuracy:  39.1775%
Valid Loss:  1.2298 | Valid Accuracy:  55.6300%

Validation loss has decreased (1.229793 --> 0.880920). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:06.367374
Train Loss:  1.0248 | Train Accuracy:  63.6450%
Valid Loss:  0.8809 | Valid Accuracy:  68.7200%

Validation loss has decreased (0.880920 --> 0.714982). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:07.111791
Train Loss:  0.7239 | Train Accuracy:  74.7900%
Valid Loss:  0.7150 | Valid Accuracy:  75.0200%

Validation loss has not decreased (0.714982 --> 0.748220). Not Saving Model...
Epoch : 4 / 5
Time to complete 4 is 0:01:05.261346
Train Loss:  0.4829 | Train Accuracy:  83.0550%
Valid Loss:  0.7482 | Valid Accuracy:  74.5200%

Validation loss has not decreased (0.714982 --> 0.759010). Not Saving Model...
Epoch : 5 / 5
Time to complete 5 is 0:01:0

# Model 5 - Adding Dropout Layers



## CNN Model Class

In [200]:
class CIFAR10CNN5(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN5, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16
          nn.Dropout(0.05) 
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
          # nn.Dropout(0.05)
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*8*8, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [201]:
summary(CIFAR10CNN5().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
           Dropout-6          [-1, 128, 16, 16]               0
            Conv2d-7          [-1, 512, 16, 16]         590,336
              ReLU-8          [-1, 512, 16, 16]               0
            Conv2d-9          [-1, 512, 16, 16]       2,359,808
             ReLU-10          [-1, 512, 16, 16]               0
        MaxPool2d-11            [-1, 512, 8, 8]               0
          Flatten-12                [-1, 32768]               0
           Linear-13                 [-1, 1024]      33,555,456
           Linear-14                  [

## HyperParameter

In [202]:
hyperparameters= dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.03,
    dataset="CIFAR-10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [203]:
wandb.init(name = 'Dropout_Regularization4_1Layer', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▁▂▄▄▆▄█▆▇▇▇███
Train Batch Loss :,██▆▄▅▄▄▂▃▂▂▂▁▁▁
Train epoch Acc :,▁▄▆▇█
Train epoch Loss :,█▅▃▂▁
Valid Batch Accuracy :,▁▇█
Valid Batch Loss :,█▄▁
Valid epoch Acc :,▁▅▆▇█
Valid epoch Loss :,█▄▃▂▁

0,1
Train Batch Acc :,0.78906
Train Batch Loss :,0.5496
Train epoch Acc :,0.7772
Train epoch Loss :,0.64196
Valid Batch Accuracy :,0.77344
Valid Batch Loss :,0.58564
Valid epoch Acc :,0.7693
Valid epoch Loss :,0.65928


In [204]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [205]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(validset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN5()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

Adjusting learning rate of group 0 to 3.0000e-02.


## Training and Saving Model

In [206]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)

[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2dee063ed0>]

In [207]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.346098). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:13.246768
Train Loss:  1.7195 | Train Accuracy:  36.7900%
Valid Loss:  1.3461 | Valid Accuracy:  50.4000%

Validation loss has decreased (1.346098 --> 1.013818). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:14.430782
Train Loss:  1.1820 | Train Accuracy:  57.3125%
Valid Loss:  1.0138 | Valid Accuracy:  63.6400%

Validation loss has decreased (1.013818 --> 0.829047). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:14.539052
Train Loss:  0.9040 | Train Accuracy:  68.3075%
Valid Loss:  0.8290 | Valid Accuracy:  70.6200%

Validation loss has decreased (0.829047 --> 0.706536). Saving model...
Epoch : 4 / 5
Time to complete 4 is 0:01:13.765510
Train Loss:  0.7417 | Train Accuracy:  74.0350%
Valid Loss:  0.7065 | Valid Accuracy:  75.0600%

Validation loss has decreased (0.706536 --> 0.683502). Saving model...
Epoch : 5 / 5
Time to complete 5 is 0:01:14.449160
Train L

# Data Augmentation

**To increase variation in the Training**

In [217]:
# Transform to convert images to pytorch tensors and normalize the data
train_trans= transforms.Compose([ 
                                #  transforms.RandomCrop(size = (32,32), padding = 2),
                                #  transforms.RandomAffine(degrees=10, translate =(0.05, 0.05), scale=(0.9, 1.1)),
                                 transforms.RandomRotation(2.8),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(), 
                                 transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
train_full = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=True, 
                                              transform=train_trans,
                                              download=True)
trainset, validset = torch.utils.data.random_split(train_full, [40000, 10000], generator=torch.Generator().manual_seed(42))
testset  = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=False, 
                                              transform=trans,
                                              download=True)

Files already downloaded and verified
Files already downloaded and verified


In [218]:
# n sample points
train_sample_size = int(len(trainset)/10)
valid_sample_size = int(len(validset)/10)

# Getting n random indices
train_subset_indices = random.sample(range(0, len(trainset)), train_sample_size)
valid_subset_indices = random.sample(range(0, len(testset)), valid_sample_size)

# Getting subset of dataset
train_subset = torch.utils.data.Subset(trainset, train_subset_indices)
valid_subset = torch.utils.data.Subset(validset, valid_subset_indices)

## HyperParameter

In [219]:
hyperparameters= dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 256,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [220]:
wandb.init(name = 'DataAug1(RandomRotation+Flip)', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▂▁▁▄▃▅▄▇▆▆▆▇▆▇█
Train Batch Loss :,█▇▆▅▅▄▄▂▃▂▂▂▂▂▁
Train epoch Acc :,▁▄▆▇█
Train epoch Loss :,█▅▃▂▁
Valid Batch Accuracy :,▁▅█
Valid Batch Loss :,█▄▁
Valid epoch Acc :,▁▅▆██
Valid epoch Loss :,█▄▃▁▁

0,1
Train Batch Acc :,0.84375
Train Batch Loss :,0.49095
Train epoch Acc :,0.7826
Train epoch Loss :,0.62601
Valid Batch Accuracy :,0.8125
Valid Batch Loss :,0.51383
Valid epoch Acc :,0.7616
Valid epoch Loss :,0.6835


In [221]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [222]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN5()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

# scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [223]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2ded7b2290>]

In [224]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.518196). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:04.018552
Train Loss:  1.8656 | Train Accuracy:  30.9075%
Valid Loss:  1.5182 | Valid Accuracy:  43.5600%

Validation loss has decreased (1.518196 --> 1.062454). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:05.259090
Train Loss:  1.2837 | Train Accuracy:  53.4500%
Valid Loss:  1.0625 | Valid Accuracy:  61.5500%

Validation loss has decreased (1.062454 --> 0.876754). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:05.634388
Train Loss:  0.9861 | Train Accuracy:  65.2425%
Valid Loss:  0.8768 | Valid Accuracy:  69.7700%

Validation loss has decreased (0.876754 --> 0.731770). Saving model...
Epoch : 4 / 5
Time to complete 4 is 0:01:05.148150
Train Loss:  0.7757 | Train Accuracy:  72.6700%
Valid Loss:  0.7318 | Valid Accuracy:  74.0900%

Validation loss has decreased (0.731770 --> 0.670416). Saving model...
Epoch : 5 / 5
Time to complete 5 is 0:01:05.135941
Train L

#Model 6 - Adding Batch Normalization

## CNN Model Class

In [251]:
# Transform to convert images to pytorch tensors and normalize the data
train_trans= transforms.Compose([ 
                                #  transforms.RandomCrop(size = (32,32), padding = 2),
                                #  transforms.RandomAffine(degrees=10, translate =(0.05, 0.05), scale=(0.9, 1.1)),
                                 transforms.RandomRotation(2.8),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(), 
                                 transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
train_full = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=True, 
                                              transform=train_trans,
                                              download=True)
trainset, validset = torch.utils.data.random_split(train_full, [40000, 10000], generator=torch.Generator().manual_seed(42))
testset  = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=False, 
                                              transform=trans,
                                              download=True)

Files already downloaded and verified
Files already downloaded and verified


In [252]:
# n sample points
train_sample_size = int(len(trainset)/10)
valid_sample_size = int(len(validset)/10)

# Getting n random indices
train_subset_indices = random.sample(range(0, len(trainset)), train_sample_size)
valid_subset_indices = random.sample(range(0, len(testset)), valid_sample_size)

# Getting subset of dataset
train_subset = torch.utils.data.Subset(trainset, train_subset_indices)
valid_subset = torch.utils.data.Subset(validset, valid_subset_indices)

In [244]:
class CIFAR10CNN6(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN6, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16
          nn.Dropout(0.05) 
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
          # nn.Dropout(0.05)
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*8*8, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [233]:
summary(CIFAR10CNN6().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
           Dropout-6          [-1, 128, 16, 16]               0
            Conv2d-7          [-1, 512, 16, 16]         590,336
              ReLU-8          [-1, 512, 16, 16]               0
       BatchNorm2d-9          [-1, 512, 16, 16]           1,024
           Conv2d-10          [-1, 512, 16, 16]       2,359,808
             ReLU-11          [-1, 512, 16, 16]               0
      BatchNorm2d-12          [-1, 512, 16, 16]           1,024
        MaxPool2d-13            [-1, 512, 8, 8]               0
          Flatten-14                [-1

## HyperParameter

In [253]:
hyperparameters= dict(
    epochs = 5,
    output_dim = 10, 
    batch_size = 128,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [254]:
wandb.init(name = 'BatchNorm2-WithDataAug', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▃▄▅▆▇█
Train Batch Loss :,█▅▅▄▃▂▁
Train epoch Acc :,▁▄▆▇█
Train epoch Loss :,█▅▃▂▁
Valid Batch Accuracy :,▁█
Valid Batch Loss :,█▁
Valid epoch Acc :,▁▅▆██
Valid epoch Loss :,█▄▂▁▂

0,1
Train Batch Acc :,0.94531
Train Batch Loss :,0.20108
Train epoch Acc :,0.9113
Train epoch Loss :,0.26085
Valid Batch Accuracy :,0.76172
Valid Batch Loss :,0.76283
Valid epoch Acc :,0.7669
Valid epoch Loss :,0.74064


In [255]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [257]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(train_subset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(valid_subset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN6()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [258]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f2ded65b4d0>]

In [259]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.005749). Saving Model...
Epoch : 1 / 5
Time to complete 1 is 0:01:09.767372
Train Loss:  1.2697 | Train Accuracy:  53.9475%
Valid Loss:  1.0057 | Valid Accuracy:  64.3000%

Validation loss has decreased (1.005749 --> 0.792099). Saving model...
Epoch : 2 / 5
Time to complete 2 is 0:01:11.120318
Train Loss:  0.7996 | Train Accuracy:  71.7900%
Valid Loss:  0.7921 | Valid Accuracy:  72.6800%

Validation loss has decreased (0.792099 --> 0.662142). Saving model...
Epoch : 3 / 5
Time to complete 3 is 0:01:11.043428
Train Loss:  0.6102 | Train Accuracy:  78.5525%
Valid Loss:  0.6621 | Valid Accuracy:  77.1400%

Validation loss has decreased (0.662142 --> 0.602200). Saving model...
Epoch : 4 / 5
Time to complete 4 is 0:01:09.851309
Train Loss:  0.4826 | Train Accuracy:  83.1800%
Valid Loss:  0.6022 | Valid Accuracy:  79.2100%

Validation loss has decreased (0.602200 --> 0.573868). Saving model...
Epoch : 5 / 5
Time to complete 5 is 0:01:10.839163
Train L

# Full Dataset

## Data Transformation

In [8]:
# Transform to convert images to pytorch tensors and normalize the data
train_trans= transforms.Compose([ 
                                #  transforms.RandomCrop(size = (32,32), padding = 2),
                                #  transforms.RandomAffine(degrees=10, translate =(0.05, 0.05), scale=(0.9, 1.1)),
                                transforms.RandomRotation(2.8),
                                transforms.RandomHorizontalFlip(),
                                transforms.ToTensor(), 
                                transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
train_full = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=True, 
                                              transform=train_trans,
                                              download=True)
trainset, validset = torch.utils.data.random_split(train_full, [40000, 10000], generator=torch.Generator().manual_seed(42))
testset  = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=False, 
                                              transform=trans,
                                              download=True)

Files already downloaded and verified
Files already downloaded and verified


## Final Model

In [9]:
class CIFAR10CNN(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNN, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          # nn.BatchNorm2d(128),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          # nn.BatchNorm2d(128),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16
          nn.Dropout(0.05) 
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(512*8*8, out_features=1024)
      self.fc2 = nn.Linear(1024, out_features=512)
      self.fc3 = nn.Linear(512, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [10]:
summary(CIFAR10CNN().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
           Dropout-6          [-1, 128, 16, 16]               0
            Conv2d-7          [-1, 512, 16, 16]         590,336
              ReLU-8          [-1, 512, 16, 16]               0
       BatchNorm2d-9          [-1, 512, 16, 16]           1,024
           Conv2d-10          [-1, 512, 16, 16]       2,359,808
             ReLU-11          [-1, 512, 16, 16]               0
      BatchNorm2d-12          [-1, 512, 16, 16]           1,024
        MaxPool2d-13            [-1, 512, 8, 8]               0
          Flatten-14                [-1

## HyperParameter

In [55]:
hyperparameters= dict(
    epochs = 10,
    output_dim = 10, 
    batch_size = 512,
    learning_rate = 0.005,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [49]:
wandb.init(name = 'FullData-Exp3_batch_size(512)+LR(0.005)', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▅▇█
Train Batch Loss :,█▄▂▁
Train epoch Acc :,▁▄▅▆▆▇▇▇██
Train epoch Loss :,█▅▄▃▃▂▂▂▁▁
Valid Batch Accuracy :,▁
Valid Batch Loss :,▁
Valid epoch Acc :,▁▅▅▆▆▇████
Valid epoch Loss :,█▄▃▂▂▂▁▁▁▁

0,1
Train Batch Acc :,0.85254
Train Batch Loss :,0.43934
Train epoch Acc :,0.8485
Train epoch Loss :,0.43506
Valid Batch Accuracy :,0.75391
Valid Batch Loss :,0.67364
Valid epoch Acc :,0.7748
Valid epoch Loss :,0.6525


In [50]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [51]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(validset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [52]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f9f2a7176d0>]

In [53]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.238680). Saving Model...
Epoch : 1 / 10
Time to complete 1 is 0:00:59.051305
Train Loss:  1.5362 | Train Accuracy:  44.4000%
Valid Loss:  1.2387 | Valid Accuracy:  55.4600%

Validation loss has decreased (1.238680 --> 0.953878). Saving model...
Epoch : 2 / 10
Time to complete 2 is 0:01:00.133284
Train Loss:  1.0322 | Train Accuracy:  62.7325%
Valid Loss:  0.9539 | Valid Accuracy:  66.2700%

Validation loss has decreased (0.953878 --> 0.811191). Saving model...
Epoch : 3 / 10
Time to complete 3 is 0:01:00.231129
Train Loss:  0.8051 | Train Accuracy:  71.5775%
Valid Loss:  0.8112 | Valid Accuracy:  71.0400%

Validation loss has decreased (0.811191 --> 0.726982). Saving model...
Epoch : 4 / 10
Time to complete 4 is 0:01:00.201283
Train Loss:  0.6954 | Train Accuracy:  75.3925%
Valid Loss:  0.7270 | Valid Accuracy:  73.7700%

Validation loss has decreased (0.726982 --> 0.686360). Saving model...
Epoch : 5 / 10
Time to complete 5 is 0:01:00.062990
Tr

##To increase more variability in the train dataset. Increasing Data Augmentation

# Checking Different Optimizer

## HyperParameter

In [81]:
hyperparameters= dict(
    epochs = 10,
    output_dim = 10, 
    batch_size = 512,
    learning_rate = 0.01,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [82]:
wandb.init(name = 'FullData-AdamOptimizer+LR(0.01)', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▃▅▆███
Train Batch Loss :,█▆▄▃▁▁▁
Train epoch Acc :,▁▄▅▆▆▇▇▇██
Train epoch Loss :,█▅▄▃▃▂▂▂▁▁
Valid Batch Accuracy :,▁█
Valid Batch Loss :,█▁
Valid epoch Acc :,▁▃▅▆▇▇▇███
Valid epoch Loss :,█▅▄▃▂▂▂▁▁▁

0,1
Train Batch Acc :,0.79297
Train Batch Loss :,0.59889
Train epoch Acc :,0.82125
Train epoch Loss :,0.50459
Valid Batch Accuracy :,0.78711
Valid Batch Loss :,0.6694
Valid epoch Acc :,0.767
Valid epoch Loss :,0.65801


In [83]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [84]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(validset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
# optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [85]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f9f6851e610>]

In [86]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.968872). Saving Model...
Epoch : 1 / 10
Time to complete 1 is 0:01:03.595660
Train Loss:  5.1019 | Train Accuracy:  21.7850%
Valid Loss:  1.9689 | Valid Accuracy:  28.2700%

Validation loss has decreased (1.968872 --> 1.940318). Saving model...
Epoch : 2 / 10
Time to complete 2 is 0:01:04.421564
Train Loss:  1.8797 | Train Accuracy:  31.4050%
Valid Loss:  1.9403 | Valid Accuracy:  30.3000%

Validation loss has decreased (1.940318 --> 1.730662). Saving model...
Epoch : 3 / 10
Time to complete 3 is 0:01:04.829983
Train Loss:  1.7823 | Train Accuracy:  35.2025%
Valid Loss:  1.7307 | Valid Accuracy:  36.4500%

Validation loss has decreased (1.730662 --> 1.585110). Saving model...
Epoch : 4 / 10
Time to complete 4 is 0:01:04.886820
Train Loss:  1.6466 | Train Accuracy:  39.5650%
Valid Loss:  1.5851 | Valid Accuracy:  42.1900%

Validation loss has decreased (1.585110 --> 1.523493). Saving model...
Epoch : 5 / 10
Time to complete 5 is 0:01:05.011802
Tr

# Data Augmentation

In [135]:
# Transform to convert images to pytorch tensors and normalize the data
train_trans= transforms.Compose([ 
                                #  transforms.RandomCrop(size = (32,32), padding = 2),
                                #  transforms.RandomAffine(degrees=10, translate =(0.05, 0.05), scale=(0.9, 1.1)),
                                 transforms.RandomRotation(2.8),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(), 
                                 transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.4914,0.4822,0.4655), (0.2023,0.1994,0.2010))])
train_full = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=True, 
                                              transform=train_trans,
                                              download=True)
trainset, validset = torch.utils.data.random_split(train_full, [40000, 10000], generator=torch.Generator().manual_seed(42))
testset  = torchvision.datasets.CIFAR10(root=data_folder,
                                              train=False, 
                                              transform=trans,
                                              download=True)

Files already downloaded and verified
Files already downloaded and verified


## HyperParameter

In [72]:
hyperparameters= dict(
    epochs = 10,
    output_dim = 10, 
    batch_size = 512,
    learning_rate = 0.03,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = False,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [73]:
wandb.init(name = 'FullData-Exp4_DataAug(Flip+Rotate)+LR(0.03)', project = 'CNN_Experiment_Neetika', config = hyperparameters)

VBox(children=(Label(value=' 0.00MB of 0.00MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▃▅▆███
Train Batch Loss :,█▆▄▃▁▁▁
Train epoch Acc :,▁▄▅▆▆▇▇▇██
Train epoch Loss :,█▅▄▃▃▂▂▂▁▁
Valid Batch Accuracy :,▁█
Valid Batch Loss :,█▁
Valid epoch Acc :,▁▃▅▆▇▇▇███
Valid epoch Loss :,█▅▄▃▂▂▂▁▁▁

0,1
Train Batch Acc :,0.79297
Train Batch Loss :,0.59889
Train epoch Acc :,0.82125
Train epoch Loss :,0.50459
Valid Batch Accuracy :,0.78711
Valid Batch Loss :,0.6694
Valid epoch Acc :,0.767
Valid epoch Loss :,0.65801


In [74]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [78]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(validset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNN()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model

In [79]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f9f6e94bb50>]

In [80]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 1.325209). Saving Model...
Epoch : 1 / 10
Time to complete 1 is 0:01:02.334581
Train Loss:  1.6312 | Train Accuracy:  40.8125%
Valid Loss:  1.3252 | Valid Accuracy:  52.0200%

Validation loss has decreased (1.325209 --> 1.085118). Saving model...
Epoch : 2 / 10
Time to complete 2 is 0:01:03.593011
Train Loss:  1.1957 | Train Accuracy:  57.1825%
Valid Loss:  1.0851 | Valid Accuracy:  60.8100%

Validation loss has decreased (1.085118 --> 0.946641). Saving model...
Epoch : 3 / 10
Time to complete 3 is 0:01:03.811576
Train Loss:  1.0039 | Train Accuracy:  64.0650%
Valid Loss:  0.9466 | Valid Accuracy:  66.1600%

Validation loss has decreased (0.946641 --> 0.888469). Saving model...
Epoch : 4 / 10
Time to complete 4 is 0:01:03.994453
Train Loss:  0.8788 | Train Accuracy:  68.6175%
Valid Loss:  0.8885 | Valid Accuracy:  68.8400%

Validation loss has decreased (0.888469 --> 0.792284). Saving model...
Epoch : 5 / 10
Time to complete 5 is 0:01:03.985124
Tr

# As Train and Valid Score are almost equal. Increasing the complexity in the Model

#**Final Model** 

Full Datatset with Validation Accuracy

In [129]:
class CIFAR10CNNX(nn.Module):
    
    def __init__(self):

      super().__init__()

      super(CIFAR10CNNX, self).__init__()
      
      self.conv1_layer = nn.Sequential(
          nn.Conv2d(in_channels=3, out_channels=128, kernel_size=3, padding='same'), # 32*32
          nn.ReLU(),
          nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, padding='same'), #32*32
          nn.ReLU(),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 16*16
          nn.Dropout(0.05) 
          
      )

      self.conv2_layer = nn.Sequential(
          nn.Conv2d(in_channels=128, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, padding='same'), #16*16
          nn.ReLU(),
          nn.BatchNorm2d(512),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 8*8
      )
      self.conv3_layer = nn.Sequential(
          nn.Conv2d(in_channels=512, out_channels=1024, kernel_size=5, padding='same'), #8*8
          nn.ReLU(),
          nn.BatchNorm2d(1024),
          nn.Conv2d(in_channels=1024, out_channels=1024, kernel_size=5, padding='same'), #8*8
          nn.ReLU(),
          nn.BatchNorm2d(1024),
          nn.MaxPool2d(kernel_size=2, stride = 2), # 4*4
          nn.Dropout(0.05) 
      )


      self.flatten = nn.Flatten()
      
      self.fc1 = nn.Linear(1024*4*4, out_features=2048)
      self.fc2 = nn.Linear(2048, out_features=1024)
      self.fc3 = nn.Linear(1024, out_features=10)
      
      
      
    def forward(self, x):
        # conv layers
        out = self.conv1_layer(x)
        out = self.conv2_layer(out)
        out = self.conv3_layer(out)

        # flatten befrore input to linear layer
        out = self.flatten(out)
        # linear hidden layers
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        # output layer - no softmax as it is applied by nn.CrossEntropyLoss

        out = self.fc3(out)
        
        return out

In [130]:
summary(CIFAR10CNNX().cuda(), (3, 32, 32))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1          [-1, 128, 32, 32]           3,584
              ReLU-2          [-1, 128, 32, 32]               0
            Conv2d-3          [-1, 128, 32, 32]         147,584
              ReLU-4          [-1, 128, 32, 32]               0
         MaxPool2d-5          [-1, 128, 16, 16]               0
           Dropout-6          [-1, 128, 16, 16]               0
            Conv2d-7          [-1, 512, 16, 16]         590,336
              ReLU-8          [-1, 512, 16, 16]               0
       BatchNorm2d-9          [-1, 512, 16, 16]           1,024
           Conv2d-10          [-1, 512, 16, 16]       2,359,808
             ReLU-11          [-1, 512, 16, 16]               0
      BatchNorm2d-12          [-1, 512, 16, 16]           1,024
        MaxPool2d-13            [-1, 512, 8, 8]               0
           Conv2d-14           [-1, 102

## HyperParameter -Batch Size Reduced




In [140]:
# hyperparameters= dict(
#     epochs = 10,
#     output_dim = 10, 
#     batch_size = 64,
#     learning_rate = 0.005,
#     dataset="CIFAR10",
#     architecture="CNN",
#     log_interval = 100,
#     log_batch = True,
#     file_model = data_folder/'exp1.pt',
#     grad_clipping = False,
#     max_norm = 0,
#     patience = 0 ,
#     early_stopping = False,
#     weight_decay = 0,
#     scheduler_factor = 0,
#     scheduler_patience = 0,
#    )
   

# device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

##HyperParameter - Early Stopping

In [146]:
hyperparameters_1= dict(
    epochs = 10,
    output_dim = 10, 
    batch_size = 64,
    learning_rate = 0.005,
    dataset="CIFAR10",
    architecture="CNN",
    log_interval = 100,
    log_batch = True,
    file_model = data_folder/'exp1.pt',
    grad_clipping = False,
    max_norm = 0,
    patience = 0 ,
    early_stopping = True,
    weight_decay = 0,
    scheduler_factor = 0,
    scheduler_patience = 0,
   )
   

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## Intialize wandb

In [147]:
wandb.init(name = 'FullData-EarlStopping', project = 'CNN_Experiment_Neetika', config = hyperparameters_1)

VBox(children=(Label(value=' 0.01MB of 0.01MB uploaded (0.00MB deduped)\r'), FloatProgress(value=1.0, max=1.0)…

0,1
Train Batch Acc :,▁▄▄▃▄▄▄▄▆▆▆▇▆▇▇▇▇▇▆▆▇▇▇▆▆▇▇▇▇▇█▆▇██▇███▇
Train Batch Loss :,█▆▅▆▅▅▅▅▄▄▄▃▃▂▂▂▂▃▂▃▂▂▂▂▂▂▂▂▂▂▁▂▁▂▁▂▁▁▁▂
Train epoch Acc :,▁▄▅▆▇▇▇███
Train epoch Loss :,█▅▄▃▂▂▂▁▁▁
Valid Batch Accuracy :,▁▃▃▃▄▄▇▅▆▄█▃█▆▂
Valid Batch Loss :,█▆▆▅▄▄▃▄▃▅▁▆▁▃█
Valid epoch Acc :,▁▄▅▆▇████▇
Valid epoch Loss :,█▅▄▂▂▁▁▁▁▂

0,1
Train Batch Acc :,0.90625
Train Batch Loss :,0.25918
Train epoch Acc :,0.95792
Train epoch Loss :,0.12502
Valid Batch Accuracy :,0.75
Valid Batch Loss :,1.02043
Valid epoch Acc :,0.8444
Valid epoch Loss :,0.54039


In [149]:
wandb.config.device = device
print(wandb.config.device )

cuda:0


## Specify Dataloader, Loss_function, Model, Optimizer, Weight Initialization

In [153]:
# Fix seed value
SEED = 2345
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

# Data Loader
train_loader = torch.utils.data.DataLoader(trainset, batch_size=wandb.config.batch_size, shuffle = True)
valid_loader = torch.utils.data.DataLoader(validset, batch_size=wandb.config.batch_size, shuffle = False)
test_loader = torch.utils.data.DataLoader(testset, batch_size=wandb.config.batch_size,   shuffle = False)

# cross entropy loss function
loss_function = nn.CrossEntropyLoss()

# device 
model = CIFAR10CNNX()

def init_weights(m):
  if type(m) == nn.Conv2d:
        torch.nn.init.kaiming_normal_(m.weight)
        torch.nn.init.zeros_(m.bias)

        
# apply initialization recursively  to all modules
# model.apply(init_weights)

wandb.config.init_weights = init_weights

# put model to GPUs
model.to(wandb.config.device)

# Intialize stochiastic gradient descent optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay, momentum = 0.9)
# optimizer = torch.optim.Adam(model.parameters(), lr = wandb.config.learning_rate, weight_decay=wandb.config.weight_decay)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate,weight_decay=wandb.config.weight_decay, momentum=0.9)

wandb.config.optimizer = optimizer

# scheduler = ReduceLROnPlateau(optimizer, mode='min', factor= wandb.config.scheduler_factor, 
                              # patience=wandb.config.scheduler_patience, verbose=True)

scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=len(train_loader) * 10 , epochs=10, three_phase=True)

# scheduler = StepLR(optimizer, gamma=0.4,step_size=1, verbose=True)

## Training and Saving Model - 84% Accuracy

In [154]:
wandb.watch(model, log = 'all', log_freq=25, log_graph=True)


[34m[1mwandb[0m: logging graph, to disable use `wandb.watch(log_graph=False)`


[<wandb.wandb_torch.TorchGraph at 0x7f9f29b56790>]

In [145]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 0.908103). Saving Model...
Epoch : 1 / 10
Time to complete 1 is 0:03:52.850086
Train Loss:  1.2315 | Train Accuracy:  55.3900%
Valid Loss:  0.9081 | Valid Accuracy:  68.5400%

Validation loss has decreased (0.908103 --> 0.734171). Saving model...
Epoch : 2 / 10
Time to complete 2 is 0:03:53.064433
Train Loss:  0.7511 | Train Accuracy:  73.6775%
Valid Loss:  0.7342 | Valid Accuracy:  74.9300%

Validation loss has decreased (0.734171 --> 0.631712). Saving model...
Epoch : 3 / 10
Time to complete 3 is 0:03:52.928536
Train Loss:  0.5633 | Train Accuracy:  80.4925%
Valid Loss:  0.6317 | Valid Accuracy:  77.7900%

Validation loss has decreased (0.631712 --> 0.551762). Saving model...
Epoch : 4 / 10
Time to complete 4 is 0:03:54.850750
Train Loss:  0.4493 | Train Accuracy:  84.3725%
Valid Loss:  0.5518 | Valid Accuracy:  81.5400%

Validation loss has decreased (0.551762 --> 0.491793). Saving model...
Epoch : 5 / 10
Time to complete 5 is 0:03:52.109164
Tr

#**Training and Saving Model - 85%**



In [155]:
example_ct_train, batch_ct_train, example_ct_valid, batch_ct_valid = 0, 0, 0, 0
train_loss_history, train_acc_history, valid_loss_history, valid_acc_history = train_loop(train_loader, valid_loader, model, loss_function, optimizer, 
                                                                                          wandb.config.epochs, wandb.config.device,
                                                                                          wandb.config.patience, wandb.config.early_stopping,
                                                                                          wandb.config.file_model)

Validation loss has decreased (inf --> 0.908103). Saving Model...
Epoch : 1 / 10
Time to complete 1 is 0:03:52.130130
Train Loss:  1.2315 | Train Accuracy:  55.3900%
Valid Loss:  0.9081 | Valid Accuracy:  68.5400%

Validation loss has decreased (0.908103 --> 0.734171). Saving model...
Epoch : 2 / 10
Time to complete 2 is 0:03:55.851265
Train Loss:  0.7511 | Train Accuracy:  73.6775%
Valid Loss:  0.7342 | Valid Accuracy:  74.9300%

Validation loss has decreased (0.734171 --> 0.631712). Saving model...
Epoch : 3 / 10
Time to complete 3 is 0:03:52.492906
Train Loss:  0.5633 | Train Accuracy:  80.4925%
Valid Loss:  0.6317 | Valid Accuracy:  77.7900%

Validation loss has decreased (0.631712 --> 0.551762). Saving model...
Epoch : 4 / 10
Time to complete 4 is 0:03:55.306486
Train Loss:  0.4493 | Train Accuracy:  84.3725%
Valid Loss:  0.5518 | Valid Accuracy:  81.5400%

Validation loss has decreased (0.551762 --> 0.491793). Saving model...
Epoch : 5 / 10
Time to complete 5 is 0:03:52.908584
Tr