### Q1 What are the objectives of Selective Search in R-CNN?

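1) To generate a manageable set of class-agnostic region proposals (about 2,000 per image in R-CNN) that are likely to contain objects.
2) To avoid the cost of running the classifier on every possible window in the image, as an exhaustive sliding-window search would require.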
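
To make the second objective concrete, the sketch below counts the windows an exhaustive sliding-window search would have to classify. The image size, window sizes, and stride are hypothetical values chosen only for illustration; the comparison point of roughly 2,000 proposals per image is the budget R-CNN uses with Selective Search.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count window positions for an exhaustive sliding-window search."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window size does not fit in the image
        cols = (img_w - win_w) // stride + 1
        rows = (img_h - win_h) // stride + 1
        total += cols * rows
    return total


# Hypothetical setup: a 500x375 image, three window sizes, stride of 4 pixels.
windows = count_sliding_windows(500, 375, [(64, 64), (128, 128), (256, 128)], stride=4)
print(windows, "sliding windows vs. about 2000 Selective Search proposals")
```

Even with only three window shapes, the exhaustive search produces several times more candidates than the proposal budget, and each candidate would need its own CNN forward pass in R-CNN.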

### Q2 Explain the following phases involved in R-CNN

1) Region proposal: generate candidate object regions for the image using Selective Search.
2) Warping and resizing: warp each region proposal to the fixed input size the CNN expects (227×227 for AlexNet), regardless of the region's aspect ratio.
3) Pre-trained CNN architecture: pass each warped region through a pre-trained CNN such as AlexNet to extract a fixed-length feature vector.
4) Pre-trained SVM model: classify each feature vector into object categories with SVM classifiers trained on those CNN features.
5) Clean up: remove duplicate detections with greedy non-maximum suppression.
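
The warping phase can be sketched in plain Python. The helper below (`warp_region`, a hypothetical name) crops a box from an image stored as a nested list and resizes it to a fixed size with nearest-neighbour sampling, ignoring the aspect ratio. This is only an illustration under those assumptions; a real pipeline would use an image library and a larger output size.

```python
def warp_region(image, box, out_h, out_w):
    """Crop `box` = (x1, y1, x2, y2), end-exclusive, from `image` (a list of
    rows) and resize it to out_h x out_w with nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    crop_h, crop_w = y2 - y1, x2 - x1
    warped = []
    for i in range(out_h):
        # Pick the source row that output row i maps back to.
        src_y = y1 + min(crop_h - 1, i * crop_h // out_h)
        row = []
        for j in range(out_w):
            src_x = x1 + min(crop_w - 1, j * crop_w // out_w)
            row.append(image[src_y][src_x])
        warped.append(row)
    return warped


# Toy 4x6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
patch = warp_region(image, box=(1, 0, 5, 2), out_h=3, out_w=3)
```

The 2x4 crop is stretched vertically and squeezed horizontally to fit the 3x3 output, which is the same distortion R-CNN accepts when it warps proposals of arbitrary shape.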

### Q3

Pre-trained CNNs that can be used in the pre-trained CNN architecture phase include AlexNet, VGG, ResNet, and Inception.

### Q4

In R-CNN, a separate binary linear SVM is trained for each object class, using the CNN features extracted from the warped Selective Search region proposals as input.

### Q5

a) Take the highest-scoring detection and keep it.

b) Remove all other detections whose intersection-over-union (IoU) with that detection exceeds a chosen threshold.

c) Repeat on the remaining detections until none are left.
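
The three steps above can be sketched in plain Python as follows. The function names and the IoU threshold of 0.5 are assumptions made for illustration, and boxes are taken to be `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # (a) take the highest-scoring detection
        keep.append(best)
        # (b) drop detections that overlap it too much; (c) loop on the rest
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)  # the second box overlaps the first and is removed
```

In a detector this would typically be run separately for each class, so that overlapping objects of different classes do not suppress each other.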

### Q6

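1) Fast R-CNN runs the CNN once on the whole image and shares the resulting feature map across all region proposals, instead of running the CNN on every proposal separately.
2) It achieves higher detection accuracy.
3) It is much faster at inference time.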

### Q7

1) ROI pooling takes the feature map output by the CNN together with the region proposals projected onto it.
2) It divides each projected region into a fixed grid of bins (for example 7×7), whatever the region's size.
3) It max-pools the values inside each bin, producing a fixed-size feature representation for every region.
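
A minimal sketch of this procedure on a single-channel feature map is shown below. The function name `roi_max_pool`, the end-exclusive ROI convention, and the floor/ceil bin boundaries are assumptions made for illustration; library implementations differ in how they round bin edges.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region `roi` = (x1, y1, x2, y2), given in feature-map
    cells and end-exclusive, into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin rows: floor for the start, ceil for the end, so bins cover the ROI.
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map with values 0..15 (row * 4 + col).
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
full = roi_max_pool(fmap, (0, 0, 4, 4), 2, 2)     # 4x4 region -> 2x2 output
partial = roi_max_pool(fmap, (1, 0, 4, 3), 2, 2)  # 3x3 region -> 2x2 output
```

Both regions produce a 2x2 output even though their sizes differ, which is what lets the following fully connected layers accept proposals of any shape.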

### Q8

a) ROI projection: mapping the proposed bounding boxes from image coordinates to feature-map coordinates by scaling them with the CNN's overall stride.

b) ROI pooling: extracting a fixed-length feature vector from the feature map for each ROI using max pooling.
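
The projection step amounts to dividing the box coordinates by the backbone's total stride. The sketch below assumes a stride of 16 (the value for a VGG-16 style backbone) and rounds outward with floor and ceil; `project_roi` is a hypothetical helper, and other rounding conventions are also used in practice.

```python
import math


def project_roi(box, stride):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cells,
    assuming the backbone downsamples by `stride` in each dimension."""
    x1, y1, x2, y2 = box
    fx1, fy1 = math.floor(x1 / stride), math.floor(y1 / stride)
    fx2, fy2 = math.ceil(x2 / stride), math.ceil(y2 / stride)
    # Keep at least one cell so pooling never sees an empty region.
    return fx1, fy1, max(fx2, fx1 + 1), max(fy2, fy1 + 1)


roi_on_features = project_roi((64, 48, 200, 180), stride=16)
```

The resulting feature-map box is what ROI pooling then divides into bins.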

### Q9

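In Fast R-CNN, a softmax classifier replaces the SVMs. It is applied to the ROI-pooled features after they pass through fully connected layers, and it is trained jointly with the rest of the network and a bounding-box regressor using a multi-task loss. This allows end-to-end training of the detection network.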

### Q10

* Faster R-CNN replaces Selective Search with a Region Proposal Network (RPN).
* The RPN shares convolutional features with the detector and, in a single pass, predicts an objectness score and box coordinates for each anchor.
* This makes region proposal much faster.

### Q11

Anchor boxes are a set of predefined bounding boxes with different scales and aspect ratios, placed at every location of the feature map. They serve as references for the RPN, which predicts an objectness score and coordinate offsets relative to each anchor.
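
As an illustration, the sketch below generates the anchors for a single location from three scales and three aspect ratios, giving nine anchors. The function name `make_anchors` and the convention that the ratio means height divided by width are assumptions for this example.

```python
import math


def make_anchors(center_x, center_y, scales, aspect_ratios):
    """Return anchors (x1, y1, x2, y2) centred on one location.
    Each anchor has area scale**2 and height / width equal to the ratio."""
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2))
    return anchors


anchors = make_anchors(100, 100, scales=(128, 256, 512), aspect_ratios=(0.5, 1.0, 2.0))
print(len(anchors), "anchors at this location")
```

The RPN repeats this set at every feature-map location and learns, for each anchor, whether it contains an object and how to shift and resize it to fit.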

### Q12

import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.datasets import CocoDetection
import torch.utils.data
import torchvision.transforms as T

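# CocoDetection yields (image, annotations); ToTensor converts each PIL image to a float tensor in [0, 1].
dataset = CocoDetection(root='F:/Third_Year/EDI_2/Sanket-master/assets/coco',
                        annFile='F:/Third_Year/EDI_2/Sanket-master/assets/annotations.json',
                        transform=T.ToTensor())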
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [len(dataset)-500, 500])

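# Use ResNet-50 without its average-pool and fully connected layers as the feature extractor.
# (Older torchvision releases use pretrained=True instead of weights="DEFAULT".)
resnet = torchvision.models.resnet50(weights="DEFAULT")
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])
backbone.out_channels = 2048  # FasterRCNN reads this attribute from the backbone

# One feature map, so one tuple of sizes and one tuple of aspect ratios (note the trailing commas).
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# A Sequential backbone returns a single tensor, which torchvision names '0'.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)

# FasterRCNN builds the RPN head and the box predictor itself (91 = COCO category ids plus background).
model = FasterRCNN(backbone, num_classes=91,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)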


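def collate_fn(batch):
    """Keep images as a list and turn COCO annotations into FasterRCNN target dicts."""
    images, targets = [], []
    for image, annotations in batch:
        # Skip degenerate boxes; COCO boxes are [x, y, width, height],
        # while FasterRCNN expects [x1, y1, x2, y2].
        valid = [a for a in annotations if a['bbox'][2] > 0 and a['bbox'][3] > 0]
        boxes = [[a['bbox'][0], a['bbox'][1],
                  a['bbox'][0] + a['bbox'][2], a['bbox'][1] + a['bbox'][3]] for a in valid]
        images.append(image)
        targets.append({
            'boxes': torch.tensor(boxes, dtype=torch.float32).reshape(len(boxes), 4),
            'labels': torch.tensor([a['category_id'] for a in valid], dtype=torch.int64),
        })
    return images, targets

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=2, shuffle=False, collate_fn=collate_fn)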

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

num_epochs = 10
for epoch in range(num_epochs):
    # Train for one epoch
    model.train()
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        losses.backward()
        optimizer.step()
    
    lr_scheduler.step()
    

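    # Validation: torchvision detection models only return losses in training mode,
    # so stay in train mode and disable gradients instead of calling model.eval().
    # Note that layers such as batch norm therefore still behave as in training.
    val_losses = []
    with torch.no_grad():
        for images, targets in val_loader:
            loss_dict = model(images, targets)
            val_losses.append(sum(loss for loss in loss_dict.values()).item())
    val_loss = sum(val_losses) / len(val_losses)
    print(f'Epoch: {epoch+1}, Validation Loss: {val_loss:.4f}')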


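# Inference: in eval mode the model returns a list of detections, one dict per image.
model.eval()
with torch.no_grad():
    for idx, (image, _) in enumerate(val_dataset):
        # The dataset transform has already converted the image to a tensor.
        output = model([image])[0]
        print(f'Found {len(output["boxes"])} objects in image')

        # draw_bounding_boxes expects a uint8 image tensor and returns a tensor.
        img = (image * 255).to(torch.uint8)
        drawn = torchvision.utils.draw_bounding_boxes(img, output["boxes"], colors="red")
        # Save one file per image instead of overwriting result.jpg each time.
        T.ToPILImage()(drawn).save(f"result_{idx}.jpg")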
