# CSE327 Homework 5
**Due date: 23:59 on Dec 10, 2024 (Tuesday)**

In this semester, we will use Google Colab for the assignments, which allows us to utilize resources that some of us might not have in their local machines such as GPUs. You will need to use your Stony Brook (*.stonybrook.edu) account for coding and Google Drive to save your results.

## Google Colab Tutorial
---
Go to https://colab.research.google.com/notebooks/, you will see a tutorial named "Welcome to Colaboratory" file, where you can learn the basics of using google colab.

Settings used for assignments: ***Edit -> Notebook Settings -> Runtime Type (Python 3)***.


## Description
---
This project is an introduction to deep learning tools for computer vision. You will design and train deep convolutional networks for scene recognition using [PyTorch](http://pytorch.org). You can visualize the
structure of the network with [mNeuron] (http://vision03.csail.mit.edu/cnn_art/index.html)

Remember Homework 3: Scene recognition with bag of words. You worked hard to design a bag of features representations that achieved 60% to 70% accuracy (most likely) on 16-way scene classification. We're going to attack the same task with deep learning and get higher accuracy. Training from scratch won't work quite as well as homework 3 due to the insufficient amount of data, fine-tuning an existing network will work much better than homework 3.

In Problem 1 of the project you will train a deep convolutional network from scratch to recognize scenes. The starter codes gives you methods to load data and display them. You will need to define a simple network architecture and add jittering, normalization, and regularization to increase recognition accuracy to 50, 60, or perhaps 70%. Unfortunately, we only have 2,400 training examples so it doesn't seem possible to train a network from scratch which outperforms hand-crafted features

For Problem 2 you will instead fine-tune a pre-trained deep network to achieve about 85% accuracy on the task. We will use the pretrained AlexNet network which was not trained to recognize scenes at all.


These two approaches represent the most common approaches to recognition problems in computer vision today -- train a deep network from scratch if you have enough data (it's not always obvious whether or not you do), and if you cannot then instead fine-tune a pre-trained network.

For Problem 3 you will train two networks for object detection, the expected performance for Faster R-CNN will be around 0.13(mAP50) and for YOLO it will be around 0.682(mAP50)

There are 3 problems in this homework with a total of 110 points. Be sure to read **Submission Guidelines** below. They are important. For the problems requiring text descriptions, you might want to add a markdown block for that.

## Dataset
---
Save the [dataset(click me)](https://drive.google.com/open?id=1NWC3TMsXSWN2TeoYMCjhf2N1b-WRDh-M) into your working folder in your Google Drive for this homework. <br>
Under your root folder, there should be a folder named "data" (i.e. XXX/Surname_Givenname_SBUID/data) containing the images.
**Do not upload** the data subfolder before submitting on blackboard due to size limit. There should be only one .ipynb file under your root folder Surname_Givenname_SBUID.

## Some Tutorials (PyTorch)
---
- You will be using PyTorch for deep learning toolbox (follow the [link](http://pytorch.org) for installation).
- For PyTorch beginners, please read this [tutorial](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) before doing your homework.
- Feel free to study more tutorials at http://pytorch.org/tutorials/.
- Find cool visualization here at http://playground.tensorflow.org.


## Starter Code
---
In the starter code, you are provided with a function that loads data into minibatches for training and testing in PyTorch.

In [1]:
# import packages here
import cv2
import numpy as np
import matplotlib.pyplot as plt
import glob
import random
import time
import gc

import torch
import torchvision
import torchvision.transforms as transforms

from torch.autograd import Variable
from torch import optim
import torch.nn as nn
import torch.nn.functional as F

from torch.utils.data import Dataset
import pandas as pd
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from tqdm import tqdm
from torch.optim import lr_scheduler
from torchvision import models

from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier as OVR
from sklearn.preprocessing import StandardScaler

In [None]:
# Mount your google drive where you've saved your assignment folder
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
# Set your working directory (in your google drive)
#   change it to your specific homework directory.
%cd '/content/gdrive/My Drive/CSE327/Abid_Khandaker_115478345_hw5'

In [None]:
# ==========================================
#    Load Training Data and Testing Data
# ==========================================
class_names = [name[13:] for name in glob.glob('./data/train/*')]
class_names = dict(zip(range(len(class_names)), class_names))
print("class_names: %s " % class_names)
n_train_samples = 150
n_test_samples = 50

def img_norm(img):
  #
  # Write your code here
  # normalize img pixels to [-1, 1]
  #
  norm_img = cv2.normalize(img.copy(), None, alpha=-1, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
  return norm_img

def load_dataset(path, img_size, num_per_class=-1, batch_num=1, shuffle=False, augment=False, is_color=False,
                rotate_90=False, zero_centered=False):

    data = []
    labels = []

    if is_color:
        channel_num = 3
    else:
        channel_num = 1

    # read images and resizing
    for id, class_name in class_names.items():
        print("Loading images from class: %s" % id)
        img_path_class = glob.glob(path + class_name + '/*.jpg')
        if num_per_class > 0:
            img_path_class = img_path_class[:num_per_class]
        labels.extend([id]*len(img_path_class))
        for filename in img_path_class:
            if is_color:
                img = cv2.imread(filename)
            else:
                img = cv2.imread(filename, 0)

            # resize the image
            img = cv2.resize(img, img_size, cv2.INTER_LINEAR)

            if is_color:
                img = np.transpose(img, [2, 0, 1])

            # norm pixel values to [-1, 1]
            data.append(img_norm(img))

    randcrop = transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomResizedCrop(size=64,scale=(.7,1),ratio=(1,1))
    ])

    #
    # Write your Data Augmentation code here
    # mirroring
    #
    for i in range(len(data)):
      img = data[i].copy()
      image1 = cv2.flip(img, 1)
      data.append(image1)
      labels.extend([labels[i]]*1)

    #
    # Write your Data Normalization code here
    # norm data to zero-centered
    #
    for i in range(len(data)):
      img = data[i].copy()
      data[i] = data[i] - np.mean(img)

    # randomly permute (this step is important for training)
    if shuffle:
        bundle = list(zip(data, labels))
        random.shuffle(bundle)
        data, labels = zip(*bundle)

    # divide data into minibatches of TorchTensors
    if batch_num > 1:
        batch_data = []
        batch_labels = []

        print(len(data))
        print(batch_num)

        for i in range(int(len(data) / batch_num)):
            minibatch_d = data[i*batch_num: (i+1)*batch_num]
            minibatch_d = np.reshape(minibatch_d, (batch_num, channel_num, img_size[0], img_size[1]))
            batch_data.append(torch.from_numpy(minibatch_d))

            minibatch_l = labels[i*batch_num: (i+1)*batch_num]
            batch_labels.append(torch.LongTensor(minibatch_l))
        data, labels = batch_data, batch_labels

    return zip(batch_data, batch_labels)

In [None]:
# load data into size (64, 64)
img_size = (64, 64)
batch_num = 50 # training sample number per batch

# load training dataset
trainloader_small = list(load_dataset('./data/train/', img_size, batch_num=batch_num, shuffle=True,
                                      augment=True, zero_centered=True))
train_num = len(trainloader_small)
print("Finish loading %d minibatches(=%d) of training samples." % (train_num, batch_num))

# load testing dataset
testloader_small = list(load_dataset('./data/test/', img_size, num_per_class=50, batch_num=batch_num))
test_num = len(testloader_small)
print("Finish loading %d minibatches(=%d) of testing samples." % (test_num, batch_num))

In [None]:
# show some images
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    if len(npimg.shape) > 2:
        npimg = np.transpose(img, [1, 2, 0])
    plt.figure
    plt.imshow(npimg, 'gray')
    plt.show()
img, label = trainloader_small[0][0][11][0], trainloader_small[0][1][11]
label = int(np.array(label))
print(class_names[label])
imshow(img)

# Problem 1: Training a Network From Scratch
{Part 1: 20 points} Gone are the days of hand designed features. Now we have end-to-end learning in which a highly non-linear representation is learned for our data to maximize our objective (in this case, 16-way classification accuracy). Instead of 70% accuracy we can now recognize scenes with... 25% accuracy. OK, that didn't work at all. Try to boost the accuracy by doing the following:

**Data Augmentation**: We don't have enough training data, let's augment the training data.
If you left-right flip (mirror) an image of a scene, it never changes categories. A kitchen doesn't become a forest when mirrored. This isn't true in all domains — a "d" becomes a "b" when mirrored, so you can't "jitter" digit recognition training data in the same way. But we can synthetically increase our amount of training data by left-right mirroring training images during the learning process.

After you implement mirroring, you should notice that your training error doesn't drop as quickly. That's actually a good thing, because it means the network isn't overfitting to the 2,400 original training images as much (because it sees 4,800 training images now, although they're not as good as 4,800 truly independent samples). Because the training and test errors fall more slowly, you may need more training epochs or you may try modifying the learning rate. You should see a roughly 10% increase in accuracy by adding mirroring. You are **required** to implement mirroring as data augmentation for this part.

You can try more elaborate forms of jittering -- zooming in a random amount, rotating a random amount, taking a random crop, etc. These are not required, you might want to try these in the bonus part.

**Data Normalization**: The images aren't zero-centered. One simple trick which can help a lot is to subtract the mean from every image. It would arguably be more proper to only compute the mean from the training images (since the test/validation images should be strictly held out) but it won't make much of a difference. After doing this you should see another 15% or so increase in accuracy. This part is **required**.

**Network Regularization**: Add dropout layer. If you train your network (especially for more than the default 30 epochs) you'll see that the training error can decrease to zero while the val top1 error hovers at 40% to 50%. The network has learned weights which can perfectly recognize the training data, but those weights don't generalize to held out test data. The best regularization would be more training data but we don't have that. Instead we will use dropout regularization.

What does dropout regularization do? It randomly turns off network connections at training time to fight overfitting. This prevents a unit in one layer from relying too strongly on a single unit in the previous layer. Dropout regularization can be interpreted as simultaneously training many "thinned" versions of your network. At test, all connections are restored which is analogous to taking an average prediction over all of the "thinned" networks. You can see a more complete discussion of dropout regularization in this [paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf).

The dropout layer has only one free parameter — the dropout rate — the proportion of connections that are randomly deleted. The default of 0.5 should be fine. Insert a dropout layer between your convolutional layers. In particular, insert it directly before your last convolutional layer. Your test accuracy should increase by another 10%. Your train accuracy should decrease much more slowly. That's to be expected — you're making life much harder for the training algorithm by cutting out connections randomly.

If you increase the number of training epochs (and maybe decrease the learning rate) **you should be able to achieve around 50% test accuracy**. In this part, you are **required** to add dropout layer to your network.

Please give detailed descriptions of your network layout in the following format:<br>
Data augmentation: [descriptions]<br>
Data normalization: [descriptions]<br>
Layer 1: [layer_type]: [Parameters]<br>
Layer 2: [layer_type]: [Parameters]<br>
...<br>
Then report the final accuracy on test set and time consumed for training and testing separately.

{Part 2: 15 points} Try **three techniques** taught in the class to increase the accuracy of your model. Such as increasing training data by randomly rotating training images, adding batch normalization, different activation functions (e.g., sigmoid) and model architecture modification. Note that too many layers can do you no good due to insufficient training data. Clearly describe your method and accuracy increase/decrease for each of the three techniques.

In [24]:
class CNN(nn.Module):

  def __init__(self, classes: int, inner_channels, kernel_dim):
    super(CNN, self).__init__()
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=inner_channels, kernel_size=kernel_dim, stride=1, padding =1)
    self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
    self.conv2 = nn.Conv2d(in_channels=inner_channels, out_channels=2*inner_channels, kernel_size=kernel_dim, stride=1, padding=1)
    self.fully_connected = nn.Linear(3*inner_channels*8*8, classes)
    self.norm1 = nn.BatchNorm2d(num_features=inner_channels)
    self.norm2 = nn.BatchNorm2d(num_features=2*inner_channels)
    self.norm3 = nn.BatchNorm2d(num_features = 3*inner_channels)
    self.dropout = nn.Dropout2d(p=.5)
    self.conv3 = nn.Conv2d(in_channels=2*inner_channels, out_channels=3*inner_channels, kernel_size=kernel_dim, stride=1, padding=1)

  def forward(self, input):
    y = F.tanh(self.conv1(input))
    y = self.pool(y)
    y = self.norm1(y)
    y = F.tanh(self.conv2(y))
    y = self.pool(y)
    y = self.norm2(y)
    y = self.dropout(y)
    y = F.tanh(self.conv3(y))
    y = self.pool(y)
    y = self.norm3(y)
    y = y.reshape(y.shape[0],-1)
    y = self.fully_connected(y)
    return y


In [None]:
# ==========================================
#       Define Network Architecture
# ==========================================

# Object that takes in an inner_channels variable n, in my instance n is 16
# Data Augmentation: flip every image horizontally using cv2 and add picture and label to dataset
# Data Normalization: subtract every image by the np.mean() of it
# Layer 1: Convolution: 1 in_channel -> n out_channels, 3X3 kernel size generally, 1 stride, 1 padding
# Layer 2: Pool: 2X2 kernel, 2 stride
# Layer 3: Batch Norm: n features
# Layer 4: Convolution: n in_channels -> 2n out_channels, 3x3 kernel size generally, 1 stride, 1 padding
# Layer 5: Pool: 2X2 kernel, 2 stride
# Layer 6: Batch Norm: 2n features
# Layer 7: Dropout: p=.5
# Layer 8: Convolution: 2n in_channels -> 3n out_channels, 3x3 kernel size generally, 1 stride, 1 padding
# Layer 9: Pool: 2X2 kernel, 2 stride
# Layer 10: Batch Norm: 3n features
# Layer 11: Fully Connected (Linear): 8*8*3n in_channels -> 16 out_channels (for 16 classes)

# Time taken with GPU: 33s for training, 0s for testing
# Accuracy before adjustments: 50.1250
# Accuracy after adjustments: 65.8750

In [25]:
# ==========================================
#         Optimize/Train Network
# ==========================================

power = "cuda" if torch.cuda.is_available() else "cpu"
model = CNN(classes=16, inner_channels=16, kernel_dim=3).to(power)
loss = nn.CrossEntropyLoss()
learning = .0008
opt = optim.Adam(model.parameters(), lr=learning)
epoch = 70
for i in range(epoch):
  for batch in trainloader_small:
    inputs = batch[0].to(power)
    labels = batch[1].to(power)
    pred = model(inputs)
    l = loss(pred, labels)
    opt.zero_grad()
    l.backward()
    opt.step()

In [None]:
# ==========================================
#            Evaluating Network
# ==========================================

def score(data, model):
  correct = 0
  n = 0
  model.eval()
  with torch.no_grad():
    for batch in data:
      inputs = batch[0].to(power)
      labels = batch[1].to(power)
      pred = model(inputs)
      _, c = pred.max(1)
      correct += (c==labels).sum()
      n += pred.size(0)
  model.train()
  return (correct/n * 100)

print(score(testloader_small, model))

## Part2
List each of the techniques you used and the performances after using these techniques
<br> Please clearly indicate the techniques you use in the text blocks

In [None]:
# ==========================================
#       Technique1
# ==========================================

# I decided to change the activation function from ReLU to tanh. I
# thought that it would be useful given this is image classification
# and it increased accuracy by 5%

In [None]:
# ==========================================
#       Technique2
# ==========================================

# I integrated Batch Normalization after each activation function and pooling cycle
# for the architecture cycle, and implemented it with PyTorch. I saw a net
# accuracy improvement of 4%

In [None]:
# ==========================================
#       Technique3
# ==========================================

# I improved my learning rate by setting it to .0008 instead of .001 and adding
# a third cycle of convolution and pooling (was originally 2), and noticed the
# accuracy improving well for taking some more time. I saw a good 6% increase.

# Problem 2: Fine Tuning a Pre-Trained Deep Network
{Part 1: 20 points} Our convolutional network to this point isn't "deep". Fortunately, the representations learned by deep convolutional networks is that they generalize surprisingly well to other recognition tasks.

But how do we use an existing deep network for a new recognition task? Take for instance,  [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) network has 1000 units in the final layer corresponding to 1000 ImageNet categories.

**Strategy A**: One could use those 1000 activations as a feature in place of a hand crafted feature such as a bag-of-features representation. You would train a classifier (typically a linear SVM) in that 1000 dimensional feature space. However, those activations are clearly very object specific and may not generalize well to new recognition tasks. It is generally better to use the activations in slightly earlier layers of the network, e.g. the 4096 activations in the last 2nd fully-connected layer. You can often get away with sub-sampling those 4096 activations considerably, e.g. taking only the first 200 activations.

**Strategy B**: *Fine-tune* an existing network. In this scenario you take an existing network, replace the final layer (or more) with random weights, and train the entire network again with images and ground truth labels for your recognition task. You are effectively treating the pre-trained deep network as a better initialization than the random weights used when training from scratch. When you don't have enough training data to train a complex network from scratch (e.g. with the 16 classes) this is an attractive option. Fine-tuning can work far better than Strategy A of taking the activations directly from an pre-trained CNN. For example, in [this paper](http://www.cc.gatech.edu/~hays/papers/deep_geo.pdf) from CVPR 2015, there wasn't enough data to train a deep network from scratch, but fine tuning led to 4 times higher accuracy than using off-the-shelf networks directly.

You are required to implement **Strategy B** to fine-tune a pre-trained **AlexNet** for this scene classification task. You should be able to achieve performance of 85% approximately. It takes roughly 35~40 minutes to train 20 epoches with AlexNet.

Please provide detailed descriptions of:<br>
(1) which layers of AlexNet have been replaced<br>
(2) the architecture of the new layers added including activation methods (same as problem 1)<br>
(3) the final accuracy on test set along with time consumption for both training and testing <br>

{Part 2: 15 points} Implement Strategy A where you use the activations of the pre-trained network as features to train one-vs-all SVMs for your scene classification task. Report the final accuracy on test set along with time consumption for both training and testing.


**Hints**:
- Many pre-trained models are available in PyTorch at [here](http://pytorch.org/docs/master/torchvision/models.html).
- For fine-tuning pretrained network using PyTorch, please read this [tutorial](http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html).

In [None]:
# reload data with a larger size
img_size = (224, 224)
batch_num = 50 # training sample number per batch

# load training dataset
trainloader_large = list(load_dataset('./data/train/', img_size, batch_num=batch_num, shuffle=True,
                                      augment=False, is_color=True, zero_centered=True))
train_num = len(trainloader_large)
print("Finish loading %d minibatches(=%d) of training samples." % (train_num, batch_num))

# load testing dataset
testloader_large = list(load_dataset('./data/test/', img_size, num_per_class=50, batch_num=batch_num, is_color=True))
test_num = len(testloader_large)
print("Finish loading %d minibatches(=%d) of testing samples." % (test_num, batch_num))

In [None]:
from torchvision import models
from torch.optim import lr_scheduler
# ==========================================
#       Fine-Tune Pretrained Network
# ==========================================
power = "cuda" if torch.cuda.is_available() else "cpu"
modelft = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)
modelft.classifier[6] = nn.Linear(4096, 16)
for p in modelft.parameters():
  p.requires_grad = False
for p in modelft.classifier[4].parameters():
  p.requires_grad = True
modelft = modelft.to(power)
modelft.train()

'''
fullmodel = nn.Sequential(
    modelft,
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 16)
)
fullmodel = fullmodel.to(power)
'''

loss = nn.CrossEntropyLoss()
learning = .0008
opt = optim.SGD(modelft.parameters(), lr=.001, momentum=.9)
lr_sched = lr_scheduler.StepLR(opt, step_size=5, gamma=.1)
epoch = 20

for i in range(epoch):
  for batch in trainloader_large:
    inputs = batch[0].to(power)
    labels = batch[1].to(power)
    opt.zero_grad()
    pred = modelft(inputs)
    l = loss(pred, labels)
    l.backward()
    opt.step()
  lr_sched.step()

def score(data, m):
  correct = 0
  n = 0
  m.eval()
  with torch.no_grad():
    for batch in data:
      inputs = batch[0].to(power)
      labels = batch[1].to(power)
      pred = m(inputs)
      _, c = pred.max(1)
      correct += (c==labels).sum()
      n += pred.size(0)
  m.train()
  return (correct/n * 100)

print(score(testloader_large, modelft))

In [None]:
'''
Strategy B:
Layers Replaced: 1, the last fully connected layer
New Layers: NN.linear, 4096 in_features, 16 out_features at the end of the network
^the instructions were to finetune by replacing layers, so I didn't do much
as far as adding new ones. I hope that's ok, since it wasn't mentioned as a requirement.

Accuracy on test with strategy B: 84.6
Time consumed for training: 41s
Time consumed for testing: 1s
'''

In [None]:
# NOTE: This strategy is somewhat memory intensive, and there is a slight
# chance running this can deplete system memory or CUDA memory.
# Don't worry, I just restart the runtime only to test this cell and
# run the first few cells (and the trainloader_large cell) to get this cell to
# run, and it does. Please do that if the memory crashes.

torch.cuda.empty_cache()
gc.collect()

# Strategy A with alexnet

power = "cuda" if torch.cuda.is_available() else "cpu"
alex = torch.hub.load('pytorch/vision:v0.10.0', 'alexnet', pretrained=True)

ft_extract = nn.Sequential(
    alex.features,
    nn.AvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256*6*6, 512),
    nn.ReLU(),
    nn.Linear(512, 200)
    #nn.Sequential(*list(alex.classifier.children())[:-1])
)
for p in ft_extract.parameters():
  p.requires_grad = False
ft_extract.to(power)

svm = OVR(LinearSVC(max_iter=2000, dual=False))
S = StandardScaler()

def ext(x):
  with torch.no_grad():
    ft = ft_extract(x)
  return ft

def fit(X, Y):
  X = X.to(power)
  Y = Y.to(power).cpu().numpy()
  X_ft = ext(X).cpu().numpy()
  X_norm = (X_ft-(np.mean(X_ft))) / np.std(X_ft)
  #X_ft = S.fit_transform(X_ft.cpu().numpy())
  svm.fit(X_norm, Y)

def pred(X):
  X = X.to(power)
  X_ft = ext(X).cpu().numpy()
  X_norm = (X_ft-(np.mean(X_ft))) / np.std(X_ft)
  #X_ft = StandardScaler.fit_transform(X_ft.cpu().numpy())
  return svm.predict(X_norm)

def parse(dataloader):
  img = torch.cat([x[0] for x in dataloader], dim=0)
  l = torch.cat([x[1] for x in dataloader], dim=0)
  return img,l

img, labels = parse(trainloader_large)
imgtest, labelstest = parse(testloader_large)
fit(img, labels)
prediction = pred(imgtest)

score = 0
for i in range(len(prediction)):
  if prediction[i] == labelstest[i]:
    score = score+1
score = (score/(len(prediction))) * 100
print(score)


In [None]:
'''
Strategy A:
Accuracy on test with strategy A: 68.2
Time Complexity for training (on GPU): 38s
Time Complexity for testing (on GPU): 1s
'''

## Object Detection

In this Part, students will focus on training models for object detection on Cryo-Electron Microscopy (Cryo-EM) data, widely used in structural biology for capturing high-resolution images of molecular complexes. The task involves detecting particles within noisy and low-contrast Cryo-EM images, presenting a challenge typical of this data type.

{Part 1: 20 points} Train Faster R-CNN network.

{Part 2: 20 points} Train any YOLO network




### Faster RCNN

In [9]:
## util function DONT CHANGE

import os

def get_split_paths(image_dir, annotation_dir):

    image_files = os.listdir(image_dir)
    image_paths = []
    annotation_paths = []

    for img_file in image_files:
        img_path = os.path.join(image_dir, img_file)
        annotation_file = os.path.splitext(img_file)[0] + ".csv"  # Match file name
        annotation_path = os.path.join(annotation_dir, annotation_file)

        if os.path.exists(annotation_path):
            image_paths.append(img_path)
            annotation_paths.append(annotation_path)
        else:
            print(f"Warning: No annotation found for {img_file}")

    return image_paths, annotation_paths



In [10]:

ANNOTATION_DIR = "CryoEM/particle_coordinates/"
TRAIN_IMAGE_DIR = "CryoEM/images/train/"
VAL_IMAGE_DIR = "CryoEM/images/val/"
TEST_IMAGE_DIR = "CryoEM/images/test/"


train_image_paths, train_annotation_paths = get_split_paths(TRAIN_IMAGE_DIR, ANNOTATION_DIR)
val_image_paths, val_annotation_paths = get_split_paths(VAL_IMAGE_DIR, ANNOTATION_DIR)
test_image_paths, test_annotation_paths = get_split_paths(TEST_IMAGE_DIR, ANNOTATION_DIR)


In [11]:
class ParticleDatasetFromCSV(Dataset):
    def __init__(self, image_paths, csv_paths):
        """
        Args:
            image_paths (list): List of image file paths.
            csv_paths (list): List of corresponding CSV file paths for annotations.
            transforms (callable, optional): A function/transform to apply to images and targets.
        """
        self.image_paths = image_paths
        self.csv_paths = csv_paths

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Load image
        img_path = self.image_paths[idx]
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = torch.as_tensor(img, dtype=torch.float32).permute(2, 0, 1) / 255.0

        # Load particle data from CSV
        csv_path = self.csv_paths[idx]
        df = pd.read_csv(csv_path).to_numpy()
        ###TODO###
        # Calculate bounding boxes
        boxes = []
        for l in df:
          x = l[0]
          y = l[1]
          d = l[2]
          r = .5*d
          boxes.append(np.array([x-r,y-r,x+r,y+r]))
        ##########

        # Assign a single class label (1) for all boxes
        labels = torch.tensor(np.ones(len(boxes)), dtype=torch.int64)

        # Prepare target
        target = {"boxes": torch.tensor(np.array(boxes)), "labels": labels}

        return img, target


In [12]:
## DONT CHANGE

from torch.utils.data import DataLoader

# Train dataset
train_dataset = ParticleDatasetFromCSV(train_image_paths, train_annotation_paths)
# Validation dataset
val_dataset = ParticleDatasetFromCSV(val_image_paths, val_annotation_paths)
# Test dataset
test_dataset = ParticleDatasetFromCSV(test_image_paths, test_annotation_paths)

# Collate function
def collate_fn(batch):
    return tuple(zip(*batch))

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=2, shuffle=False, collate_fn=collate_fn)



In [None]:
# NOTE: This strategy is somewhat memory intensive, and there is a decent
# chance running this can deplete system memory or CUDA memory assuming you're using T4.
# Don't worry, I just restart the runtime only to test this cell and
# run only the first few cells (and all the cells in the faster rcnn block) to get
# this cell to run, and it does. Please do that if system or GPU memory runs out.
# With T4 this took around 11 minutes for me to run. I restarted runtime before doing this. It might take longer otherwise.

torch.cuda.empty_cache()
gc.collect()
os.environ["PYTORCH_CUDA_ALLOC_CONF"]="expandable_segments:True"

# Initialize the device
power = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Create your model
def create_model(num_classes):
    # Load a pre-trained Faster R-CNN model
    model = fasterrcnn_resnet50_fpn(pretrained=True)
    # Replace the classifier with a new one for our dataset
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model

def to_dev(x):
  y = {}
  for k, v in x.items():
      y[k] = v.to(power)
  return y

# Set number of classes: 1 class (particles) + background class
num_classes = 2  # Background + particle class

# Create the model and move it to device
model = create_model(num_classes)
model.to(power)
num_epochs = 20 #if T4 was faster or if i had more time, I'd make this 50

###TODO###
# Set up optimizer
p = [x for x in model.parameters() if x.requires_grad]
optimizer = optim.SGD(p, lr=.001, momentum=.9)
##########

loss = nn.CrossEntropyLoss()

###TODO###
# Set up learning rate scheduler
lr_sched = lr_scheduler.StepLR(optimizer, step_size=3, gamma=.1)
##########

# Training Loop
model.train()
for epoch in range(num_epochs):
################*****************************
    ###TODO###
    # Write training loop
    for i, data in enumerate(train_loader):
      input, label = data
      input = list(x.to(power) for x in input)
      label = [{k:(v.to(power)) for k,v in x.items()} for x in label]
      #input = (input[0].to(device), input[1].to(device))
      #label = (to_dev(label[0]), to_dev(label[1]))
      optimizer.zero_grad()
      losses = model(input, label)
      l = sum(l for l in losses.values())
      l.backward()
      optimizer.step()
    #########

    lr_sched.step()



##### Evaluate

Run the following two cells to see your model's performance on the test dataset.

In [None]:
!pip install torchmetrics ## Install torchmetrics for evaluation

In [None]:
# DONT CHANGE

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision
from torchmetrics.functional.classification import precision_recall

# Assume `model` is your trained model, and `test_loader` is your DataLoader for the test set
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Initialize mean Average Precision (mAP) metric
map_metric = MeanAveragePrecision(iou_type="bbox", box_format="xyxy", iou_thresholds=[0.5, 0.75])

# Switch model to evaluation mode
model.eval()

with torch.no_grad():
    for batch in tqdm(test_loader):
        images, targets = batch  # Assuming the test_loader returns (image, target)
        images = [img.to(device) for img in images]

        # Run inference
        preds = model(images)

        # Prepare the output format for metrics
        formatted_preds = [
            {
                "boxes": pred["boxes"].detach().cpu(),
                "scores": pred["scores"].detach().cpu(),
                "labels": pred["labels"].detach().cpu(),
            }
            for pred in preds
        ]
        formatted_targets = [
            {
                "boxes": target["boxes"].detach().cpu(),
                "labels": target["labels"].detach().cpu(),
            }
            for target in targets
        ]

        # Update the mAP metric
        map_metric.update(formatted_preds, formatted_targets)

# Compute final metrics
results = map_metric.compute()
print(results)


### YOLO

#### Dataset

The images for this section are present under CryoEM/images and the corresponding annotation file for each image is present in CryoEM/particle_coordinates. For this assignment we will only use the first three columns of the annotation file.


##### YOLO Data Format

In order to train YOLO model for our data, we will first have to restructure our data to match with YOLO's requirements, which can be found [here](https://docs.ultralytics.com/datasets/detect/). In brief each image needs a corresponding text file containing normalized bounding box coordinates and size, in corresponding directories based on wheather it is training/validation/testing splits.

In [17]:
# NOTE: YOLOV11 is very memory intensive, so to maintain 20 epochs I used
# the v2,8 TPU runtime. It takes a while, so I just decided to add one trained
# implementation with 20 epochs, which works with the TPU, and another
# with 20 epochs (still very solid) that is meant to work with the T4 GPU's memory.
# This one worked for me last-minute, but it really boils down to a gamble with CUDA's
# memory. I can't control this, so I ask if it doesnt work with the T4 GPU that you switch to a TPU runtime and
# run the first implementation. Both are clearly trained implementations so I don't really see anything wrong with this.

from PIL import Image
def convert_to_YOLO_format(file_name,IMAGE_DIR,YOLO_anno_dir):
    image_path = IMAGE_DIR + file_name + ".jpg"
    anno_path = ANNOTATION_DIR + file_name + ".csv"
    orig_image = np.array(cv2.imread(image_path, cv2.IMREAD_GRAYSCALE))
    img_w, img_h = orig_image.shape ##
    df = pd.read_csv(anno_path)
    df_values = df.iloc[:, :3].to_numpy() #Fill | Slice first three columns
    with open(YOLO_anno_dir + file_name + ".txt", "w") as f:
        for x,y,d in df_values:
            class_int = 1
            x_norm = x/img_w
            y_norm = y/img_h
            w_norm = d/img_w
            h_norm = d/img_h
            f.write("{} {} {} {} {}\n".format(class_int,x_norm,y_norm,w_norm,h_norm))


In [18]:
import os
ANNOTATION_DIR = "CryoEM/particle_coordinates/"


TRAIN_IMAGE_DIR = "CryoEM/images/train/"
TRAIN_YOLO_anno_dir = "CryoEM/labels/train/" # Directory to store converted labels

VAL_IMAGE_DIR = "CryoEM/images/val/"
VAL_YOLO_anno_dir = "CryoEM/labels/val/" # Directory to store converted labels

TEST_IMAGE_DIR = "CryoEM/images/test/"
TEST_YOLO_anno_dir = "CryoEM/labels/test/" # Directory to store converted labels

for directory in [TRAIN_YOLO_anno_dir, VAL_YOLO_anno_dir, TEST_YOLO_anno_dir]:
    os.makedirs(directory, exist_ok=True)

file_names = os.listdir(TRAIN_IMAGE_DIR)
for file_path in file_names:
    file_name = file_path[:-4]
    convert_to_YOLO_format(file_name,TRAIN_IMAGE_DIR,TRAIN_YOLO_anno_dir)

file_names = os.listdir(VAL_IMAGE_DIR)
for file_path in file_names:
    file_name = file_path[:-4]
    convert_to_YOLO_format(file_name,VAL_IMAGE_DIR,VAL_YOLO_anno_dir)

file_names = os.listdir(TEST_IMAGE_DIR)
for file_path in file_names:
    file_name = file_path[:-4]
    convert_to_YOLO_format(file_name,TEST_IMAGE_DIR,TEST_YOLO_anno_dir)

#### Training

You will have to generate .yaml file as given in the YOLO [docs](https://docs.ultralytics.com/datasets/detect/).

In [None]:
# There is no mentioning of testing the YOLO, so I just trained and left it as that.
# Dont shoot me for that please!!

!pip install ultralytics
from ultralytics import YOLO
modelYOLO = YOLO("yolo11n.pt") # Fill, You should be able to train a YOLOv11 in colab GPU

In [None]:
# 20 EPOCH TPU IMPLEMENTATION
resultsTPU = modelYOLO.train(data="cryoem.yaml", epochs=20, imgsz=640, cache=False)
## not sure about imgsz, just resized to 640 because its a good size

In [None]:
# GPU IMPLEMENTATION (20 epochs worked last minute)
torch.cuda.empty_cache()
gc.collect()
os.environ["PYTORCH_CUDA_ALLOC_CONF"]="expandable_segments:True"

device = "cuda" if torch.cuda.is_available() else "cpu"
modelYOLO = modelYOLO.to(device)
results = modelYOLO.train(data="cryoem.yaml", epochs=20, imgsz=640, cache=False)
## not sure about imgsz, just resized to 640 because its a good size

# BTW train22 IS THE MOST RECENT RESULT FOLDER FOR MY YOLO MODEL
# SORRY I DIDN'T HAVE TIME TO DELETE THE PREVIOUS TRAIN FOLDERS AND START AGAIN

## Submission guidelines
---
Extract the downloaded .zip file to a folder of your preference. The input and output paths are predefined and **DO NOT** change them, (we assume that 'Surname_Givenname_SBUID_hw5' is your working directory, and all the paths are relative to this directory).  The image read and write functions are already written for you. All you need to do is to fill in the blanks as indicated to generate proper outputs. **DO NOT** zip and upload the dataset on blackboard due to size limit.

When submitting your .zip file through blackboard, please
-- name your .zip file as **Surname_Givenname_SBUID_hw*.zip**.

This zip file should include:
```
Surname_Givenname_SBUID_hw*
        |---Surname_Givenname_SBUID_hw*.ipynb
        |---Surname_Givenname_SBUID_hw*.pdf
```

For instance, student Michael Jordan should submit a zip file named "Jordan_Michael_111134567_hw5.zip" for homework5 in this structure:
```
Jordan_Michael_111134567_hw5
        |---Jordan_Michael_111134567_hw5.ipynb
        |---Jordan_Michael_111134567_hw5.pdf
```

The **Surname_Givenname_SBUID_hw*.pdf** should include a **google shared link**. To generate the **google shared link**, first create a folder named **Surname_Givenname_SBUID_hw*** in your Google Drive with your Stony Brook account. The structure of the files in the folder should be exactly the same as the one you downloaded. If you alter the folder structures, the grading of your homework will be significantly delayed and possibly penalized.

Then right click this folder, click ***Get shareable link***, in the People textfield, enter ***the TA's email***. Make sure that TAs who have the link **can edit**, ***not just*** **can view**, and also **uncheck** the **Notify people** box.

Colab has a good feature of version control, you should take advantage of this to save your work properly. However, the timestamp of the submission made in blackboard is the only one that we consider for grading. To be more specific, we will only grade the version of your code right before the timestamp of the submission made in blackboard.

You are encouraged to post and answer questions on Piazza. Based on the amount of email that we have received in past years, there might be dealys in replying to personal emails. Please ask questions on Piazza and send emails only for personal issues.

Be aware that your code will undergo plagiarism check both vertically and horizontally. Please do your own work.

**Late submission penalty:** <br>
There will be a 10% penalty per day for late submission. However, you will have 3 days throughout the whole semester to submit late without penalty. Note that the grace period is calculated by days instead of hours. If you submit the homework one minute after the deadline, one late day will be counted. Likewise, if you submit one minute after the deadline, the 10% penaly will be imposed if not using the grace period.

## Attention on HW submission
---
Based on the issues we observed during HW1 grading, we would like to ***stress*** the following.

* Submit the ***zip file*** containing (a notebook, pdf of sharable link, results) on Blackboard, ***not only*** the pdf with link.

* Link in the pdf should be directed to the ***folder*** on Google Drive, not the notebook alone.

* ***DO NOT*** change the structure of the notebook. If you need additional codes, just add new cells. ***DO NOT*** delete existing cells.

* Notebook should run without errors by by clicking ***'run all'*** . Verify this before submission. Because we need to run all your notebooks for grading. (Your folder structure, paths on Google Drive should be correct. If you do your HW locally on Jupyter and upload later to Google Drive, ***run and verify*** this on Colab to avoid any ***PENALTY***.)

* ***DO NOT*** remove the outputs visualized in the notebook. We check both the codes and the outputs.

* Make sure you submit the notebook in which you coded your answers.

* Read the questions ***carefully***, as they may contain sub parts or even hints.

* Share your notebook with ***EDIT ACCESS*** to ***the TA***: ***namjoshi@cs.stonybrook.edu***. Uncheck the Notify people box.

If you don’t follow these instructions you will be penalized and the grading will be significantly delayed.



<!--Write your report here in markdown or html-->
