# 2.2 DL Studio
### Run CIFAR-10 Example
For Net, the overall accuracy of the network on the 10000 test images: 53 %
For Net2, the overall accuracy of the network on the 10000 test images: 67 %

Net2 outperforms Net by 14 percentage points. The primary reason is that Net2 has an additional convolutional layer and more channels in each layer, which gives it more capacity to learn complex patterns.

### Analyze Resolution Changes
CIFAR-10 images are 32x32 pixels with 3 color channels (RGB), so an input batch has shape (batch_size, 3, 32, 32).

Let's trace how the resolution changes through Net:
```self.conv1 = nn.Conv2d(3, 6, 5)``` (input channels, output channels, kernel size)
Output: the 5x5 kernel with the default stride of 1 and no padding reduces each spatial dimension by 4 pixels (32 - 5 + 1 = 28), so the output is (batch_size, 6, 28, 28)
```x = nn.MaxPool2d(2, 2)(F.relu(self.conv1(x)))```
Output: max pooling with a 2x2 kernel and a stride of 2 halves the spatial resolution, giving (batch_size, 6, 14, 14)
```self.conv2 = nn.Conv2d(6, 16, 5)```
Output: (batch_size, 16, 10, 10)
```x = nn.MaxPool2d(2, 2)(F.relu(self.conv2(x)))```
Output: (batch_size, 16, 5, 5)
```x = x.view(x.shape[0], -1)```
Output: (batch_size, 16*5*5 = 400)

The fully connected layers only change the feature dimension. For example, fc1:
```self.fc1 = nn.Linear(16 * 5 * 5, 120)```
Output: (batch_size, 120)
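
The shape arithmetic above follows the standard output-size rule for convolution and pooling layers, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below re-derives the walkthrough in plain Python, without PyTorch. The helper names `out_size` and `trace_net` are made up for illustration and are not part of DLStudio or PyTorch; the layer hyperparameters are copied from the lines above.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_net(size=32):
    """Follow one spatial dimension through Net's two conv + pool stages."""
    shapes = [("input", 3, size)]
    for name, channels in (("conv1", 6), ("conv2", 16)):
        size = out_size(size, kernel=5)            # 5x5 conv, stride 1, no padding
        shapes.append((name, channels, size))
        size = out_size(size, kernel=2, stride=2)  # 2x2 max pooling, stride 2
        shapes.append(("pool", channels, size))
    return shapes


for name, channels, size in trace_net():
    print(f"{name:6s} (batch_size, {channels}, {size}, {size})")

# Number of features fed into fc1 after flattening the last feature map
_, last_channels, last_size = trace_net()[-1]
flattened = last_channels * last_size * last_size
print(flattened)  # 400
```

The printed sizes (32, 28, 14, 10, 5) match the shapes listed in the walkthrough.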

### Consider Kernel Sizes and Padding
Net's 5x5 kernels are relatively large. A larger kernel covers a bigger receptive field per layer, so it can capture larger features with fewer layers. However, it also has more parameters (25 weights per input-output channel pair, versus 9 for a 3x3 kernel), which increases the risk of overfitting. Net also uses no padding, so each 5x5 convolution shrinks the feature map by 4 pixels per dimension; a padding of 2 would preserve the spatial size. Architectures like VGG and ResNet mostly use 3x3 kernels, which are more parameter-efficient in deeper networks. We can also combine different kernel sizes to capture multi-scale information.
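
To make the parameter argument concrete, here is a small sketch that counts convolution weights by hand. `conv_params` and `same_padding` are illustrative helpers, not library functions, and the 16-channel width in the second comparison is an arbitrary choice for the example.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Learnable parameters in a square 2D convolution layer."""
    return out_ch * in_ch * kernel * kernel + (out_ch if bias else 0)


def same_padding(kernel):
    """Padding that keeps the spatial size for an odd kernel at stride 1."""
    return kernel // 2


# Net's first layer, nn.Conv2d(3, 6, 5), versus a 3x3 layer with the same channels
print(conv_params(3, 6, 5))  # 456
print(conv_params(3, 6, 3))  # 168

# One 5x5 layer versus two stacked 3x3 layers (assumed width: 16 channels).
# Both cover a 5x5 receptive field.
c = 16
one_5x5 = conv_params(c, c, 5)
two_3x3 = 2 * conv_params(c, c, 3)
print(one_5x5, two_3x3)  # 6416 4640

# With padding = kernel // 2, a 32-pixel input stays at 32 pixels
p = same_padding(5)
print((32 + 2 * p - 5) // 1 + 1)  # 32
```

At equal width, the stacked 3x3 layers need fewer parameters for the same receptive field and add an extra nonlinearity between them, which is one common argument for small kernels in deep networks.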

# 3.2 Dataset Creation Instructions
To create a personal dataset from a subset of the COCO dataset, we can use the code provided in the manual. However, to store the images separately for the training and validation sets, we introduce an additional list, images_to_process, which temporarily stores the image metadata.

Instead of saving the images directly in the for loop, we append the image metadata to this list. Afterward, we split images_to_process into two lists, one for the training images and one for the validation images. Finally, we use two separate for loops to save the images into their respective directories.
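
The collect-then-split step can be sketched on its own, without COCO. The helper `split_train_val` and the placeholder file names are hypothetical; this version splits by fraction rather than by fixed indices, so it still yields a validation set when fewer than 500 images are collected.

```python
def split_train_val(items, train_fraction=0.8):
    """Split a list into non-overlapping train and validation parts, keeping order."""
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


# Stand-in for the image metadata collected in the loop (made-up file names)
images_to_process = [{"file_name": f"image_{i:03d}.jpg"} for i in range(500)]

train_images, val_images = split_train_val(images_to_process)
print(len(train_images), len(val_images))  # 400 100
```

The split keeps the order in which the images were collected. Shuffling the list with a fixed seed before splitting would be an optional variation, not something the procedure above does.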

### 1. 5x3 Images

##### Single instance
##### Multi instance (single object)
##### Multi instance (diff object)

### 2. 3x5 Tables

##### Train
##### Validation

# 3.3 Image Classification using CNNs - Training and Validation
### CNN for Dataset 1

##### Train loss plot
##### Confusion Matrix plot

### CNN for Dataset 2

##### Train loss plot
##### Confusion Matrix plot

### CNN for Dataset 3

##### Train loss plot
##### Confusion Matrix plot

### Table of Accuracies

### Observations from Confusion Matrices and Table of Accuracies

### Plot misclassified images

# Bonus

### Train Loss and Confusion Matrices
### Observation


# Source Code

from PIL import Image
import os
from pycocotools.coco import COCO

# Set COCO dataset paths
data_dir = os.getcwd()
ann_file = os.path.join(data_dir, "annotations/instances_train2014.json")  # Use 2014 annotations
image_dir = os.path.join(data_dir, "train2014/train2014")  # Ensure correct image folder
output_dir = "output_datasets"

# Load COCO dataset
coco = COCO(ann_file)

# Ensure output directories exist
os.makedirs(output_dir, exist_ok=True)

def save_image(img_info, category_name, dataset_type, split):
    """Resizes one COCO image to 64x64 and saves it under output_dir/dataset_type/split/category_name."""
    img_path = os.path.join(image_dir, img_info['file_name'])
    save_dir = os.path.join(output_dir, dataset_type, split, category_name)
    os.makedirs(save_dir, exist_ok=True)

    # Convert to RGB before resizing: some COCO images are grayscale, and the CNN expects 3 channels
    with Image.open(img_path) as img:
        img.convert("RGB").resize((64, 64)).save(os.path.join(save_dir, img_info['file_name']))

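def extract_images(cat_names, min_instances=1, max_instances=1, multiple_categories=False, dataset_type="single_instance"):
    """
    Extracts images based on object count conditions for single-instance or multi-instance datasets.

    cat_names: COCO category names; cat_names[0] is the class label used for the output folder.
    min_instances / max_instances: allowed number of instances of cat_names[0] in an image.
    multiple_categories: if True, keep images that contain at least two different object categories.
    dataset_type: name of the output subfolder; "multi_instance_same" selects the multi-instance rule.
    """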
    cat_ids = coco.getCatIds(catNms=cat_names)
    img_ids = coco.getImgIds(catIds=cat_ids)
    extracted = 0  # Counter for extracted images

    # List to store valid images
    images_to_process = [] 

    for img_id in img_ids:
        img_info = coco.loadImgs(img_id)[0]
        ann_ids = coco.getAnnIds(imgIds=img_id, iscrowd=False)
        anns = coco.loadAnns(ann_ids)

        # Count objects per category
        obj_counts = {}
        for ann in anns:
            obj_category = coco.loadCats(ann['category_id'])[0]['name']
            obj_counts[obj_category] = obj_counts.get(obj_category, 0) + 1

        if multiple_categories:
            # Ensure multiple object categories are present
            if len(obj_counts) >= 2:
                images_to_process.append(img_info)
                extracted += 1
        elif dataset_type == "multi_instance_same":
            # Ensure multiple instances of the same object category
            if cat_names[0] in obj_counts and obj_counts[cat_names[0]] >= min_instances:
                images_to_process.append(img_info)
                extracted += 1
        else:
            # Ensure the number of instances falls within the desired range for single-instance
            if all(obj in obj_counts for obj in cat_names) and min_instances <= obj_counts[cat_names[0]] <= max_instances:
                images_to_process.append(img_info)
                extracted += 1

        if extracted >= 500:  # Limit to 500 images per dataset for testing (could be adjusted)
            break

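    # Split 80/20 into training and validation sets (400/100 when the 500-image cap is reached).
    # A proportional split keeps the validation set non-empty if fewer than 500 images qualify.
    n_train = len(images_to_process) * 4 // 5
    train_images = images_to_process[:n_train]
    val_images = images_to_process[n_train:]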

    # Save training and validation images
    for img_info in train_images:
        save_image(img_info, cat_names[0], dataset_type, 'train')

    for img_info in val_images:
        save_image(img_info, cat_names[0], dataset_type, 'val')
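
The three selection rules inside `extract_images` can be checked without downloading COCO by separating the counting and filtering logic from the dataset access. The sketch below is one possible way to do that, not the code used for the results. `count_objects`, `keep_image`, and the `mode` argument are hypothetical names, the annotations are hand-written stand-ins, and the single-instance rule is simplified to one target category.

```python
from collections import Counter


def count_objects(anns, id_to_name):
    """Count annotated instances per category name for one image."""
    return Counter(id_to_name[ann["category_id"]] for ann in anns)


def keep_image(obj_counts, target, mode, min_instances=1, max_instances=1):
    """Apply one of the three selection rules to a single image."""
    if mode == "multi_category":
        # At least two different object categories in the image
        return len(obj_counts) >= 2
    if mode == "multi_instance_same":
        # Enough instances of the target category
        return obj_counts.get(target, 0) >= min_instances
    # Single-instance rule: instance count of the target within [min, max]
    return min_instances <= obj_counts.get(target, 0) <= max_instances


# Hand-written stand-ins for COCO annotations: two dogs and one person
id_to_name = {1: "person", 18: "dog"}
anns = [{"category_id": 18}, {"category_id": 18}, {"category_id": 1}]

counts = count_objects(anns, id_to_name)
print(keep_image(counts, "dog", "single_instance"))                       # False
print(keep_image(counts, "dog", "multi_instance_same", min_instances=2))  # True
print(keep_image(counts, "dog", "multi_category"))                        # True
```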
