### Objective:
To classify images in the Caltech-256 dataset, an improved successor to the Caltech-101 dataset, using a Convolutional Neural Network.

### Problem Statement
To build and implement a Convolutional Neural Network model to classify images in the Caltech-256 dataset.

At the end of this competition, you will be able to:

* Load and extract features of images available in the Caltech-256 dataset using ImageDataGenerator

* Build convolutional neural networks using either Keras or PyTorch deep learning libraries

* Use the pre-trained models using either Keras or PyTorch deep learning libraries

### Description:
Caltech-256 is an object recognition dataset containing approximately 30,000 real-world images of different sizes, spanning 257 classes (256 object classes plus an additional clutter class). Each class is represented by at least 80 images. The dataset is a superset of the Caltech-101 dataset.

Here is a handy link to Kaggle's competition documentation (https://www.kaggle.com/docs/competitions), which includes, among other things, instructions on submitting predictions (https://www.kaggle.com/docs/competitions#making-a-submission).

### Instructions for downloading train and test data are as follows:

### 1. Create an API key in Kaggle.

To do this, go to the competition site on Kaggle at https://www.kaggle.com/t/185418aa7ed24db3b98ed851a4db2b41, click on the user icon, and then open your profile as shown below. Then click Account.

![alt text](https://cdn.iisc.talentsprint.com/DLFA/Experiment_related_data/Capture-NLP.PNG)



### 2. Next, scroll down to the API access section and click on **Create New Token** to download an API key (kaggle.json).

![alt text](https://cdn.iisc.talentsprint.com/DLFA/Experiment_related_data/Capture-NLP_1.PNG)

### 3. Upload your kaggle.json file using the following snippet in a code cell:



In [1]:
from google.colab import files
files.upload()

Saving kaggle.json to kaggle (1).json


{'kaggle.json': b'{"username":"abhaykumardnnai","key":"<redacted>"}'}

In [2]:
#If successfully uploaded in the above step, the 'ls' command here should display the kaggle.json file.
%ls

 Caltech_256_Train.zip                     'kaggle (1).json'   kaggle.json
 classification-of-caltech-256-images.zip   Kaggle2Test/        sample_data/


### 4. Install the Kaggle API using the following command


In [3]:
# !pip install -U -q kaggle==1.5.8

### 5. Move the kaggle.json file into ~/.kaggle, which is where the API client expects your token to be located:



In [4]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

In [5]:
#Execute the following command to verify whether the kaggle.json is stored in the appropriate location: ~/.kaggle/kaggle.json
!ls ~/.kaggle

kaggle.json


In [6]:
!chmod 600 /root/.kaggle/kaggle.json #run this command to ensure your Kaggle API token is secure on colab

### 6. Now download the Test Data from Kaggle

**NOTE: If you get a '403 - Forbidden' error after running the cell below, the most likely cause is that the user (whose kaggle.json was uploaded above) has not accepted the rules of the competition and therefore has not joined it.**

If you encounter a **401 - Unauthorized** error, download the latest **kaggle.json** by repeating steps 1 and 2.

In [7]:
#If you get a forbidden link, you have most likely not joined the competition.
!kaggle competitions download -c classification-of-caltech-256-images

classification-of-caltech-256-images.zip: Skipping, found more recently modified local copy (use --force to force download)


In [8]:
!mkdir /content/Kaggle2Test
!unzip classification-of-caltech-256-images -d /content/Kaggle2Test/

Streaming output truncated to the last 5000 lines.
  inflating: /content/Kaggle2Test/test/4759.jpg  
  inflating: /content/Kaggle2Test/test/476.jpg  
  inflating: /content/Kaggle2Test/test/4760.jpg  
  inflating: /content/Kaggle2Test/test/4761.jpg  
  inflating: /content/Kaggle2Test/test/4762.jpg  
  inflating: /content/Kaggle2Test/test/4763.jpg  
  inflating: /content/Kaggle2Test/test/4764.jpg  
  inflating: /content/Kaggle2Test/test/4765.jpg  
  inflating: /content/Kaggle2Test/test/4766.jpg  
  inflating: /content/Kaggle2Test/test/4767.jpg  
  inflating: /content/Kaggle2Test/test/4768.jpg  
  inflating: /content/Kaggle2Test/test/4769.jpg  
  inflating: /content/Kaggle2Test/test/477.jpg  
  inflating: /content/Kaggle2Test/test/4770.jpg  
  inflating: /content/Kaggle2Test/test/4771.jpg  
  inflating: /content/Kaggle2Test/test/4772.jpg  
  inflating: /content/Kaggle2Test/test/4773.jpg  
  inflating: /content/Kaggle2Test/test/4774.jpg  
  inflating: /content/Kaggle2Test/tes

In [9]:
# !mkdir Kaggle2Test/test
# !mv test/* Kaggle2Test/test/

### 7. Download the Train Data

In [17]:
%%capture
!wget https://cdn.iisc.talentsprint.com/DLFA/Experiment_related_data/Caltech_256_Train.zip

!unzip "Caltech_256_Train.zip"

## Grading = 10 Marks

## YOUR CODING STARTS FROM HERE

### Import Required packages

In [1]:
# import the libraries
import numpy as np
import pandas as pd
import os,shutil,glob,PIL
import pathlib
from PIL import Image
import torch
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
from torchvision import transforms, datasets, utils
import matplotlib.pyplot as plt
import torchvision.models as models
import torch.optim as optim
from tqdm import tqdm
from collections import defaultdict
from collections import Counter
import torch.nn.functional as F

### **Stage 1:** Data Loading and preprocessing of Images (3 points)

#### Analyze the shape of images and distribution of classes

**The below two cells were run with transform without normalization to get appropriate mean and standard deviation.**

In [2]:
# mean = 0.0
# for img, _ in train_data:
#     mean += img.mean([1,2])
# mean = mean/len(train_data)
# print(mean)

In [3]:
# sumel = 0.0
# countel = 0
# for img, _ in train_data:
#     img = (img - mean.unsqueeze(1).unsqueeze(1))**2
#     sumel += img.sum([1, 2])
#     countel += torch.numel(img[0])
# std = torch.sqrt(sumel/countel)
# print(std)
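
The two commented-out cells above accumulate the statistics one image at a time. As a rough sketch of the same idea in a single pass over a `DataLoader`, the snippet below sums pixel values and squared pixel values per channel. It assumes every image has already been resized to a common size and converted with `ToTensor()` only; `channel_stats` is a hypothetical helper name, and the random tensors merely stand in for the Caltech images.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_stats(loader):
    """Return per-channel mean and std over every pixel seen by `loader`.

    Assumes each batch is (images, labels) with images shaped [B, C, H, W].
    """
    pixel_count = 0
    channel_sum = None
    channel_sq_sum = None
    for images, _ in loader:
        images = images.double()  # accumulate in float64 to limit rounding error
        batch_sum = images.sum(dim=(0, 2, 3))
        batch_sq_sum = (images ** 2).sum(dim=(0, 2, 3))
        channel_sum = batch_sum if channel_sum is None else channel_sum + batch_sum
        channel_sq_sum = batch_sq_sum if channel_sq_sum is None else channel_sq_sum + batch_sq_sum
        pixel_count += images.shape[0] * images.shape[2] * images.shape[3]
    mean = channel_sum / pixel_count
    # Var[x] = E[x^2] - E[x]^2 (population variance over all pixels)
    std = torch.sqrt(channel_sq_sum / pixel_count - mean ** 2)
    return mean.float(), std.float()


# Synthetic stand-in for the real images (an assumption for illustration only).
torch.manual_seed(0)
fake_images = torch.rand(16, 3, 8, 8)
fake_labels = torch.zeros(16, dtype=torch.long)
demo_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4)

demo_mean, demo_std = channel_stats(demo_loader)
print(demo_mean, demo_std)
```

With the real data, the loader would wrap an `ImageFolder` whose transform contains only `Resize` and `ToTensor`, and the printed values would be the ones passed to `transforms.Normalize`.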

In [4]:
# Normalize with mean and std
# train_transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor(), transforms.Normalize((0.4839, 0.4528, 0.3962), (0.2702, 0.2655, 0.2745))])
# train_transform = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
# transform = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor(), transforms.Normalize((0.5503, 0.5315, 0.5028), (0.3162, 0.3125, 0.3263))])

In [5]:
# Define train and test transforms for data preprocessing and image augmentation
train_transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4839, 0.4528, 0.3962), (0.2702, 0.2655, 0.2745))])

test_transform = transforms.Compose([
    transforms.Resize((224,224)),
    transforms.ToTensor(),
    transforms.Normalize((0.4839, 0.4528, 0.3962), (0.2702, 0.2655, 0.2745))])

**We will adjust these values (224, 224) later on, after calculating the mean and standard deviation.**

In [6]:
# Loading the train set file
train_data_folder = "/content/Caltech_256_Train" # Train directory for loading images
total_dataset = datasets.ImageFolder(root=train_data_folder, transform=train_transform)
# train_data = datasets.ImageFolder(root=train_data_folder)

In [7]:
# splitting between train and Validation set
train_size = int(0.9 * len(total_dataset))  # 90% for training
val_size = len(total_dataset) - train_size  # Remaining 10% for validation
train_data, val_data = torch.utils.data.random_split(total_dataset, [train_size, val_size])
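
A 90/10 split made with `random_split` changes on every run unless the random generator is seeded. The sketch below shows one way to make the split repeatable by passing a seeded `torch.Generator`; the seed value 42 is arbitrary and the small `TensorDataset` is only a stand-in for `total_dataset`.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for `total_dataset` (an assumption for illustration only).
demo_dataset = TensorDataset(
    torch.arange(100).float().unsqueeze(1),
    torch.zeros(100, dtype=torch.long),
)

train_size = int(0.9 * len(demo_dataset))   # 90% for training
val_size = len(demo_dataset) - train_size   # remaining 10% for validation

# Seeding a dedicated generator makes the split repeatable across runs.
split_a = random_split(demo_dataset, [train_size, val_size],
                       generator=torch.Generator().manual_seed(42))
split_b = random_split(demo_dataset, [train_size, val_size],
                       generator=torch.Generator().manual_seed(42))

train_subset, val_subset = split_a
print(len(train_subset), len(val_subset))
print(train_subset.indices == split_b[0].indices)
```

Both subsets returned by `random_split` wrap the same underlying dataset, so they also share its transform. If the validation images should not be augmented, one option is to build a second `ImageFolder` with the test transform and index it with the validation indices.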

In [8]:
# Defining train_batch_size
train_batch_size = 32

In [9]:
# Create data loaders for training and validation
train_loader = torch.utils.data.DataLoader(train_data, batch_size=train_batch_size,shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=train_batch_size,shuffle=False)

In [10]:
# Loading the test set file
test_data_folder = "/content/Kaggle2Test" # Test directory for loading images
test_data = datasets.ImageFolder(root=test_data_folder, transform=test_transform)

In [11]:
test_data.classes

['test']

In [12]:
# # Initializing batch size
# batch_size = 64

# # Loading the train dataset
# train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)

In [13]:
# Fetch one batch of images and labels
train_images, train_labels = next(iter(train_loader))
train_images.shape, train_labels.shape

(torch.Size([32, 3, 224, 224]), torch.Size([32]))

In [14]:
# labels Translator
label_names = {v: k for k, v in total_dataset.class_to_idx.items()}
# label_names

In [15]:
# Create a grid of images along with their corresponding labels
# L = 3
# W = 3

# fig, axes = plt.subplots(L, W, figsize = (10, 10))
# axes = axes.reshape(-1)

# for i in np.arange(0, L*W):
#     axes[i].imshow(train_images[i].permute(1, 2, 0))
#     axes[i].set_title(label_names[train_labels[i].item()])
#     axes[i].axis('on')

# plt.tight_layout()

In [None]:
# sum = 0
# for label in train_data.classes:
#     num_img = len(train_data.targets[train_data.targets == train_data.class_to_idx[label]])
#     print (num_img)
# print (sum)

num_classes = len(total_dataset.classes)
dataset_size = len(total_dataset)
classes = total_dataset.classes
img_dict = {}
for i in range(num_classes):
    img_dict[classes[i]] = 0

# Count samples per class from the stored labels (avoids decoding every image)
for label in total_dataset.targets:
    img_dict[classes[label]] += 1

# img_dict

**Observation: the minority classes need additional samples, which we can generate through augmentation.**
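
Besides augmenting the minority classes, a related option is to draw them more often during training. The sketch below is not part of the original notebook: it weights each sample by the inverse frequency of its class and feeds those weights to `WeightedRandomSampler`. The `demo_targets` list is hypothetical and stands in for `total_dataset.targets`.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical label list standing in for `total_dataset.targets`.
demo_targets = [0] * 80 + [1] * 15 + [2] * 5

class_counts = Counter(demo_targets)
# Each sample is weighted by the inverse frequency of its class,
# so every class carries the same total weight.
sample_weights = torch.tensor(
    [1.0 / class_counts[t] for t in demo_targets], dtype=torch.double
)

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(demo_targets),
    replacement=True,
    generator=torch.Generator().manual_seed(0),
)

# Tally which classes one pass over the sampler actually draws.
drawn = Counter(demo_targets[i] for i in sampler)
print(class_counts, drawn)
```

In a training setup the sampler would be passed as `sampler=sampler` to the `DataLoader` in place of `shuffle=True`. When the training set is a `Subset` produced by `random_split`, the labels have to be gathered through the subset's indices first.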

In [None]:
# No of Categories
print(len(total_dataset.classes))

In [None]:
# Number of training samples
print(len(total_dataset))

In [None]:
# Size of one training image
print(train_data[0][0].size(), val_data[0][0].size())

In [None]:
max(img_dict,key=img_dict.get)

In [None]:
# print(img_dict.values())
# total = sum(img_dict.values())

In [None]:
# Check whether a GPU is available on this system.
use_cuda = torch.cuda.is_available()
print('Using PyTorch version:', torch.__version__, 'CUDA:', use_cuda)

In [None]:
device = torch.device("cuda" if use_cuda else "cpu")
device

In [None]:
model_ft = models.googlenet(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs,num_classes)
#model.softmax = nn.softmax()
model_ft = model_ft.to(device)
print(model_ft)

In [None]:
class CaltechCNN(nn.Module):
    def __init__(self, num_classes):
      super(CaltechCNN, self).__init__()
      self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
      self.bn1 = nn.BatchNorm2d(64)
      self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
      self.bn2 = nn.BatchNorm2d(128)
      self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
      self.bn3 = nn.BatchNorm2d(256)
      self.pool = nn.MaxPool2d(2, 2)
      self.skip_conv1 = nn.Conv2d(64, 128, kernel_size=1, stride=1)
      self.skip_conv2 = nn.Conv2d(128, 256, kernel_size=1, stride=1)
      self.fc1 = nn.Linear(256 * 8 * 8, 512)
      self.fc2 = nn.Linear(512, num_classes)

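    def forward(self, x):
      # Block 1: 3 -> 64 channels, spatial size halved by pooling
      x = self.pool(F.relu(self.bn1(self.conv1(x))))
      skip = self.skip_conv1(x)   # 1x1 projection of the block-1 output to 128 channels
      # Block 2: 64 -> 128 channels
      x = self.pool(F.relu(self.bn2(self.conv2(x))))
      # Add the projected skip path, resized to the pooled resolution (channels now match: 128)
      x = x + F.interpolate(skip, size=x.size()[2:], mode='bilinear', align_corners=False)
      x = F.relu(x)
      skip = self.skip_conv2(x)   # 1x1 projection to 256 channels for the second skip path
      # Block 3: 128 -> 256 channels, with the second skip added before the activation
      x = F.relu(self.bn3(self.conv3(x)) + skip)
      x = self.pool(x)
      # Assumed fix: 224x224 inputs give a 28x28 map here, so pool it to the 8x8 that fc1 expects
      x = F.adaptive_avg_pool2d(x, (8, 8))
      x = torch.flatten(x, 1)
      x = F.relu(self.fc1(x))
      x = self.fc2(x)
      return x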
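
A quick way to check the input size a fully connected layer should expect is to push a dummy tensor through the convolution and pooling layers. The sketch below uses a plain `nn.Sequential` stand-in with the same kernel sizes, padding, and pooling as the class above (batch norm and skip paths are omitted because they do not change the spatial size). A 3x3 convolution with padding 1 keeps the height and width, and each 2x2 max-pool halves them, so a 224x224 input goes 224 -> 112 -> 56 -> 28.

```python
import torch
import torch.nn as nn

# Stand-in for the convolution/pooling stack of a three-block CNN
# (an assumption for illustration; it is not the model defined above).
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
)

with torch.no_grad():
    dummy = torch.zeros(1, 3, 224, 224)   # one image at the notebook's input size
    feature_map = features(dummy)

# Number of values per image after flattening channels and spatial dimensions.
flat_features = feature_map.flatten(1).shape[1]
print(feature_map.shape, flat_features)
```

After three pooling steps the feature map is 28x28, so a linear layer sized for `256 * 8 * 8` needs either a 64x64 input or an extra pooling step down to 8x8.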


In [None]:
torch.cuda.empty_cache()

In [None]:
model_ft = CaltechCNN(num_classes=num_classes)
model_ft = model_ft.to(device)
print(model_ft)

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_ft.parameters(), lr=0.001)

In [None]:
epsilon = 0.005  # Perturbation strength for FGSM
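
The `epsilon` above is the step size of the Fast Gradient Sign Method (FGSM), which the training loop below keeps only as commented-out code. As a minimal sketch of the technique, assuming a classifier and a loss function are available, the helper below shifts each input by `epsilon` in the direction of the sign of the loss gradient. `fgsm_perturb` is a hypothetical name, and the tiny linear model and random batch are stand-ins for the real network and data.

```python
import torch
import torch.nn as nn


def fgsm_perturb(model, criterion, inputs, labels, epsilon):
    """Return inputs shifted by epsilon in the direction that increases the loss."""
    inputs = inputs.clone().detach().requires_grad_(True)
    loss = criterion(model(inputs), labels)
    # Gradient of the loss with respect to the inputs only;
    # parameter .grad fields are left untouched.
    grad, = torch.autograd.grad(loss, inputs)
    return (inputs + epsilon * grad.sign()).detach()


# Tiny stand-in model and batch (assumptions for illustration only).
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4 * 4, 5))
demo_inputs = torch.rand(2, 3, 4, 4)
demo_labels = torch.tensor([1, 3])

adv_inputs = fgsm_perturb(demo_model, nn.CrossEntropyLoss(),
                          demo_inputs, demo_labels, epsilon=0.005)
print((adv_inputs - demo_inputs).abs().max())
```

The sketch does not clamp the result to [0, 1], because the notebook's inputs are normalized and no longer live in that range. For adversarial training, the perturbed batch would be passed through the model a second time and its loss added to the clean loss before `optimizer.step()`.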

In [None]:
train_accu = []     # Empty list for saving train accuracy
train_losses = []   # Empty list for saving train losses
val_accu = []

In [None]:
def train(epoch):
  print('\nEpoch : %d'%epoch)

  model_ft.train()    # Initiate the model in training mode

  running_loss = 0
  correct = 0
  total = 0

  for data in tqdm(train_loader):

    inputs,labels=data[0].to(device),data[1].to(device)   # Loading the input tensors into CUDA GPU
    # Generate adversarial examples using FGSM
    # inputs.requires_grad = True

    optimizer.zero_grad()
    outputs = model_ft(inputs)
    loss = criterion(outputs,labels)  # Calculating the loss
    loss.backward()                   # Backpropagation: compute gradients (weights are updated in optimizer.step())

    # Create perturbations based on the sign of gradients
    # inputs_grad = inputs.grad.data
    # perturbed_inputs = torch.clamp(inputs + epsilon * torch.sign(inputs_grad), 0, 1)
    # perturbed_inputs = perturbed_inputs.detach()

    # # Perform forward pass with perturbed inputs
    # outputs = model_ft(perturbed_inputs)
    # loss = criterion(outputs, labels)
    # loss.backward()

    optimizer.step()

    running_loss += loss.item()

    _, predicted = outputs.max(1)
    total += labels.size(0)
    correct += predicted.eq(labels).sum().item()




  train_loss = running_loss/len(train_loader)     # Calculating the mean of training loss
  accu = 100.*correct/total                       # Calculating the accuracy

  train_accu.append(accu)
  train_losses.append(train_loss)
  print('Train Loss: %.3f | Accuracy: %.3f'%(train_loss,accu))

In [None]:
  # Validation
def eval_val(epoch):
    model_ft.eval()
    wrong_predictions = []
    val_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
      for i, data in enumerate(val_loader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        outputs = model_ft(inputs)
        loss = criterion(outputs, labels)
        # Get predicted labels
        _, predicted = torch.max(outputs.data, 1)

        # Find wrong predictions
        for j in range(len(predicted)):
          if predicted[j] != labels[j]:
              wrong_predictions.append({
                  'input': inputs[j],
                  'predicted_label': predicted[j].item(),
                  'true_label': labels[j].item()
              })
        # print(val_loss)
        val_loss += loss.item()
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()


    val_batch_loss = val_loss/len(val_loader)
    val_batch_accu = 100.*correct/total                       # Calculating the accuracy
    val_accu.append(val_batch_accu)
    print('Val Loss: %.3f| Accuracy: %.3f' %(val_batch_loss,val_batch_accu))

    print('Top 5 Wrong Predictions in Validation Set:')
    counter = Counter([prediction['true_label'] for prediction in wrong_predictions])
    top_wrong_predictions = counter.most_common(5)

    for true_label, count in top_wrong_predictions:
        print(f'True Label: {true_label}, Count: {count}')
        filtered_predictions = [prediction for prediction in wrong_predictions
                                if prediction['true_label'] == true_label][:5]
        for prediction in filtered_predictions:
            input_image = prediction['input']
            predicted_label = prediction['predicted_label']
            true_label = prediction['true_label']
            print(f'Predicted Label: {predicted_label}, True Label: {true_label}')


    # # Print details of wrong predictions
    # print('Wrong Predictions in Validation Set:')
    # for prediction in wrong_predictions:
    #     input_image = prediction['input']
    #     predicted_label = prediction['predicted_label']
    #     true_label = prediction['true_label']
    #     print(f'Predicted Label: {predicted_label}, True Label: {true_label}')
    #     # Additional code to visualize or process the input image if needed


In [None]:
epochs = 20 # 20 run modify it later on
for epoch in range(1, epochs+1):
  train(epoch)
  eval_val(epoch)

In [None]:
epochs = 12 # 20 run modify it later on
for epoch in range(1, epochs+1):
  train(epoch)
  eval_val(epoch)

In [None]:
epochs = 8 # 20 run modify it later on
for epoch in range(1, epochs+1):
  train(epoch)
  eval_val(epoch)

TODO: evaluate on the validation dataset as well.

Hyperparameters to tune: number of epochs, learning rate.

In [None]:
PATH = '/content/CalTech-256-GoogleNet_v30_adv.pth'
torch.save(model_ft.state_dict(), PATH)

In [None]:
plt.plot(train_accu,'-o')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend(['Train'])
plt.show()

In [None]:
plt.plot(train_losses,'-o')
plt.xlabel('epoch')
plt.ylabel('Loss')
plt.legend(['Train'])
plt.title('Train Loss')
plt.show()

In [None]:
PATH='/content/CalTech-256-GoogleNet_v30_adv.pth'
model_ft.load_state_dict(torch.load(PATH))

In [None]:
model_ft.eval()

In [None]:
# Loading the test dataset
test_batch_size = 1
test_loader = torch.utils.data.DataLoader(test_data, batch_size=test_batch_size)

In [None]:
predictions = []
with torch.no_grad():   # inference only, so skip gradient tracking
  for images, _ in test_loader:
    output = model_ft(images.to(device))
    pred_idx = output.argmax(dim=1).item()   # batch size is 1, so this is a single class index
    predictions.append(total_dataset.classes[pred_idx])

In [None]:
# print([x[0].split("/",3)[3] for x in list(test_loader.dataset.imgs)])

In [None]:
results = pd.DataFrame()
results['img_path'] = pd.Series([x[0].split("/",3)[3] for x in list(test_loader.dataset.imgs)])
results['label'] = pd.Series(predictions)

In [None]:
results.to_csv("Group11_Kaggle2_Submission_v30_adv.csv")

In [None]:
results.head(5)

In [None]:
# results.head(5)

In [None]:
from google.colab import files
# files.download('Group4_Pred_Submission_v75_lC_withques.csv')
files.download('Group11_Kaggle2_Submission_v30_adv.csv')
files.download('/content/CalTech-256-GoogleNet_v30_adv.pth')

In [None]:
# submit the file to kaggle
# !kaggle competitions submit classification-of-caltech-256-images -f Group11_Kaggle2_Submission_v4_adv.csv -m "Model"

#### Visualize the sample images of each class


### **Stage 2:** Build and train the CNN model using Keras/Pytorch (5 points)

You can train both the CNN model and the pre-trained model, then compare their performance on the Kaggle test set.


### Transfer learning

Transfer learning consists of taking features learned on one problem, and leveraging them on a new, similar problem.

A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task.

The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, this model will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.



#### Use the pre-trained models

* Load the pre-trained model
* Train and evaluate the images
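
One common way to carry out these two steps is to freeze the pre-trained weights and train only a new classification head. The sketch below illustrates that pattern under stated assumptions: `prepare_for_finetuning` and `TinyBackbone` are hypothetical names, the tiny network only stands in for a real pre-trained model so the example runs without downloading weights, and the class count of 10 is arbitrary.

```python
import torch
import torch.nn as nn


def prepare_for_finetuning(model, num_classes):
    """Freeze all existing weights and attach a new trainable head at `model.fc`."""
    for param in model.parameters():
        param.requires_grad = False
    # A freshly created layer has requires_grad=True by default.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained network that exposes an `fc` head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


demo_model = prepare_for_finetuning(TinyBackbone(), num_classes=10)
trainable = [name for name, p in demo_model.named_parameters() if p.requires_grad]
# Only the parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in demo_model.parameters() if p.requires_grad), lr=1e-3
)
print(trainable)
```

The GoogLeNet model loaded earlier in this notebook exposes its final classifier as `fc`, so the same helper could be applied to it with `num_classes` taken from the dataset. Unfreezing some of the later layers afterwards, with a smaller learning rate, is a frequent second stage.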

In [None]:
# YOUR CODE HERE

### **Stage 3:** Evaluate the Model and get model predictions on the Kaggle testset (2 Points)

In [None]:
# YOUR CODE HERE

### Report Analysis

- Compare the accuracies for the Pre-trained vs CNN models
- What process was followed to tune the hyperparameters?
- Plot the confusion matrix in terms of the misclassifications
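
For the confusion-matrix item, the counts can be built directly from the true and predicted labels collected on the validation set. The sketch below is one possible implementation using only PyTorch; `confusion_matrix` is a hypothetical helper and the label lists are made up for a 3-class toy problem.

```python
import torch


def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows index the true class, columns the predicted class."""
    true_labels = torch.as_tensor(true_labels, dtype=torch.long)
    predicted_labels = torch.as_tensor(predicted_labels, dtype=torch.long)
    # Encode each (true, predicted) pair as one index into a flattened matrix.
    flat_index = true_labels * num_classes + predicted_labels
    counts = torch.bincount(flat_index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Hypothetical labels for a 3-class problem (illustration only).
demo_true = [0, 0, 1, 1, 2, 2, 2]
demo_pred = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(demo_true, demo_pred, num_classes=3)
print(cm)

# Off-diagonal entries are the misclassifications.
errors = cm.clone()
errors.fill_diagonal_(0)
print(errors.sum().item(), "misclassified out of", cm.sum().item())
```

The resulting matrix can be drawn with `plt.imshow(cm)`. With more than 250 classes, showing only the rows with the largest off-diagonal sums tends to be easier to read than the full matrix.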