# Region-based Convolutional Neural Network (R-CNN)

<br><br>
![](https://lilianweng.github.io/lil-log/assets/images/RCNN.png)
<center>Image taken from <a href="https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html">here</a></center>

Now it's time for you to put together one of the first models for object detection - **R-CNN**! 

This model is far from perfect, but it started the journey of exploring how to improve the ways we get ROIs for the *two-stage* models. Later in the course, we will discuss the most advanced version of this model called *Faster R-CNN* and all the things that took us from the R-CNN to Faster R-CNN.

If you look at the architecture of the R-CNN, you'll notice that it's not much different from the model we had in the *naive approach* lesson. 

The simplified version of the R-CNN model:

We start with some regions of interest. In the case of R-CNN, we use Selective Search to get these regions. Once we have about 2,000 of them, we resize each one and pass it through a pre-trained CNN (the original paper used AlexNet; we use VGG16 in this lesson) to get a prediction for each class plus background.

There is more to this. To learn how the whole (paper-based) implementation works, read this blog: https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html


Now that you know the R-CNN architecture, let's build one! Since we have two classes (Airplane, background), we won't be making separate classifiers for each class, but our CNN will do the whole work!



### Steps:
1. Import dependencies
2. Selective Search data generator and data generation
3. Model definition and training
4. Prediction loop

### Topics covered and learning objectives
- Intersection over Union (IoU)
- Object detection concept
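
Since IoU drives the labeling in this lesson, here is a minimal sketch of how it can be computed. It is not the `IoU` helper from `utils` (that implementation is not shown here); the function name `iou` and the assumption that ground-truth objects use the same `x1`, `y1`, `x2`, `y2` keys as the proposals are mine.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is a dict with x1, y1 (top-left) and x2, y2 (bottom-right).
    Hypothetical stand-in for utils.IoU, for illustration only.
    """
    # Corners of the overlapping rectangle (if the boxes overlap at all)
    ix1 = max(box_a["x1"], box_b["x1"])
    iy1 = max(box_a["y1"], box_b["y1"])
    ix2 = min(box_a["x2"], box_b["x2"])
    iy2 = min(box_a["y2"], box_b["y2"])

    # Width/height are clamped to 0 when there is no overlap
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a["x2"] - box_a["x1"]) * (box_a["y2"] - box_a["y1"])
    area_b = (box_b["x2"] - box_b["x1"]) * (box_b["y2"] - box_b["y1"])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth = {"x1": 0, "y1": 0, "x2": 10, "y2": 10}
proposal = {"x1": 5, "y1": 0, "x2": 15, "y2": 10}
print(iou(ground_truth, proposal))  # intersection 50, union 150
```

An IoU of 1 means the boxes match exactly, and 0 means they do not overlap at all.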

### Time estimates:
- Reading/Watching materials: 15min
- Exercises: 1h
<br><br>
- **Total**: ~1h 15min

**This time does not include execution time!**

## Import dependencies

In [1]:
from pathlib import PurePath, Path
import os
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dense, Dropout, GlobalAvgPool2D

# Importing custom project-based utils
from utils import IoU, data_loader
from tests import selective_search_generator_TEST

## Loading the dataset

In [None]:
train_data, test_data = data_loader()

### Exercise 1: Update the sliding window generator to use selective search

The code below is the sliding window data generator from the **naive approach to object detection** lesson. Your task is to fill in a couple of lines so that it uses the Selective Search algorithm instead.

NOTE: *selective_search* is already given as an argument of the function. Use it inside of the function.

In [2]:
def selective_search_training_data(img_obj,
                                   selective_search,
                                   img_size=(224, 224),
                                   number_of_samples=10,
                                   number_of_regions=2000):

    training_images = []
    training_labels = []

    img = img_obj['img']

    # YOUR CODE HERE: Add selective search here
    selective_search.setBaseImage(img)
    selective_search.switchToSelectiveSearchFast()
    ssresults = selective_search.process()

    positive_samples=0
    negative_samples=0
    if len(ssresults) > 0:
        number_of_regions = np.minimum(len(ssresults), number_of_regions)

        for i in range(number_of_regions):
            x,y,w,h = ssresults[i]
            proposed_region = {"x1":x,
                               "x2":x+w,
                               "y1":y,
                               "y2":y+h}

            for obj in img_obj['objects']:

                iou = IoU(obj, proposed_region)

                # Generating positive samples
                if positive_samples < number_of_samples:
                    if iou > 0.7:
                        proposal = cv2.resize(img[y:y+h,x:x+w],
                                              img_size, interpolation = cv2.INTER_AREA)

                        training_images.append(proposal)
                        training_labels.append(1)
                        positive_samples += 1

                # Generating negative samples
                if negative_samples < number_of_samples:
                    if iou < 0.3:
                        proposal = cv2.resize(img[y:y+h,x:x+w],
                                              img_size, interpolation = cv2.INTER_AREA)

                        training_images.append(proposal)
                        training_labels.append(0)
                        negative_samples += 1

    else:
        print("No regions found")

    return np.array(training_images), np.array(training_labels)

In [3]:
### TEST YOUR TRAINING SET GENERATOR
selective_search_generator_TEST(selective_search_training_data)
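
The generator above labels a proposal by its IoU with a ground-truth object: above 0.7 it becomes a positive sample, below 0.3 a negative one, and anything in between is skipped as too ambiguous. A minimal sketch of just that rule, with a hypothetical helper name `label_from_iou`:

```python
def label_from_iou(iou, pos_threshold=0.7, neg_threshold=0.3):
    """Return 1 (object), 0 (background), or None (ambiguous, skip).

    Hypothetical helper; the thresholds mirror the ones used in the generator.
    """
    if iou > pos_threshold:
        return 1
    if iou < neg_threshold:
        return 0
    return None


for value in (0.85, 0.5, 0.1):
    print(value, label_from_iou(value))
```

Leaving the middle band unlabeled keeps partially overlapping crops out of the training set, so the classifier sees clear examples of each class.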

### Exercise 2: Write the training set generation loop

In the cell below, you have two empty lists, **training_images** and **training_labels**. Your task is to create the Selective Search object and write a for loop that calls **selective_search_training_data** for each item in the **train_data** dictionary.

Append all results to those two lists.


**IMPORTANT NOTE**: My implementation took about **1h 30min** to complete. Be cautious. A longer execution time is expected!

In [None]:
training_images = []
training_labels = []

# YOUR CODE HERE

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()

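for key in tqdm(train_data.keys()):
    obj = train_data[key]
    imgs, labels = selective_search_training_data(obj, ss)
    # Skip images that produced no samples; an empty array would break np.vstack below
    if len(imgs) > 0:
        training_images.append(imgs)
        training_labels.append(labels)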


# DON'T CHANGE CODE BELOW THIS LINE
X = np.vstack(training_images)
y = np.hstack(training_labels)

### Exercise 3: Split the data into training and testing parts using the sklearn library

In [None]:
# YOUR CODE HERE
X_train, X_test, y_train, y_test = train_test_split(X, y)

### Exercise 4: Define VGG16 network 

Define the VGG16 Network trained on the **imagenet** dataset. The network should *not* include the top part.

After you define it, make sure to freeze the network.

In [None]:
base_model = VGG16(weights='imagenet', include_top=False)
base_model.trainable = False

### Exercise 5: Define the custom head part of the network

Define the custom head of the network. The architecture is up to you; the only requirement is that the output layer is defined as:

`Dense(1, activation="sigmoid")`

Example of a custom head that I used and that achieved decent accuracy:

- Base model
- Start with a GlobalAvgPool2D layer
- Output layer

This model achieved about 97% accuracy on the validation set.

In [None]:
# YOUR CODE HERE
flattened_features = GlobalAvgPool2D()(base_model.output)
predictions = Dense(1, activation="sigmoid")(flattened_features)
model = Model(inputs=base_model.inputs, outputs=predictions)

### Exercise 6: Compile and train the model

While compiling the model, make sure to set `metrics=['acc']` and loss to `binary_crossentropy`. The rest of the arguments are up to you.
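
To see why `binary_crossentropy` pairs naturally with a sigmoid output, here is a minimal pure-Python sketch of the loss for a batch of predictions. The function below is an illustration, not the Keras implementation; the clipping value `eps` is an assumption added to avoid `log(0)`.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch (illustrative sketch)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip the probability so log() never receives exactly 0 or 1
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# A confident correct prediction is penalized lightly,
# a confident wrong prediction heavily.
print(binary_crossentropy([1], [0.95]))
print(binary_crossentropy([1], [0.05]))
```

The loss grows quickly as the predicted probability moves away from the true label, which is what pushes the sigmoid output toward 0 for background and 1 for an airplane.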

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])

In [None]:
model.fit(X_train, y_train,
          batch_size=32,
          epochs=10,
          validation_data=(X_test, y_test),
          callbacks=[EarlyStopping(patience=3)])

In [None]:
plt.imshow(train_data['img1.png']['img'])

# Prediction time!

### Exercise 7: Write custom code to make predictions with the trained model

In this exercise, you will write the prediction code that runs your model on an image AND visualizes its outputs.

Here are the pointers to keep in mind:
- Take images from the **test_data**
- Run Selective Search on top of an image
- Selective Search can generate anywhere from a few hundred to thousands of proposals! You don't want to check all of them, so cap the number of proposals at a constant.
- Resize each proposal to 224x224, the input size the model was trained on
- Run the model on the proposal
- The model will produce a confidence score ranging from 0 to 1. Don't visualize all the rectangles, only the most confident ones! The threshold that worked for me was *0.95*
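
The capping and thresholding pointers above can be sketched without the model. The helper name `keep_confident` and the example scores are made up for illustration; boxes use the `(x, y, w, h)` layout that Selective Search returns.

```python
def keep_confident(boxes, scores, threshold=0.95, max_regions=2000):
    """Keep (box, score) pairs whose score exceeds the threshold.

    Only the first max_regions proposals are considered, mirroring the
    constant cap on the number of proposals.
    """
    kept = []
    for box, score in list(zip(boxes, scores))[:max_regions]:
        if score > threshold:
            kept.append((box, score))
    return kept


# Made-up proposals and confidence scores
boxes = [(10, 10, 50, 40), (12, 14, 48, 42), (200, 30, 20, 20)]
scores = [0.99, 0.96, 0.40]
print(keep_confident(boxes, scores))
```

In the real loop, the scores come from `model.predict` on the resized proposals.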

In [None]:
number_of_regions = 2000
img_name = np.random.choice(list(test_data.keys()))
print(img_name)
random_img = test_data[img_name]['img']
# Draw on a copy so the rectangles don't end up inside later proposal crops
output_img = random_img.copy()

ss.setBaseImage(random_img)
ss.switchToSelectiveSearchFast()
ssresults = ss.process()

# Cap the number of proposals we evaluate
number_of_regions = min(len(ssresults), number_of_regions)
for i in tqdm(range(number_of_regions)):
    x, y, w, h = ssresults[i]
    proposal = cv2.resize(random_img[y:y+h, x:x+w], (224, 224), interpolation=cv2.INTER_AREA)

    # Add the batch dimension expected by the model
    img = np.expand_dims(proposal, axis=0)
    out = model.predict(img, verbose=0)
    # Keep only the most confident detections
    if out[0][0] > 0.95:
        cv2.rectangle(output_img, (x, y), (x+w, y+h), (0, 255, 0), 1, cv2.LINE_AA)

plt.figure()
plt.imshow(output_img)

## What's next

Fantastic work so far! We got a pretty decent model by changing only the ROI generation, but you've probably noticed that some predictions overlap and that some boxes are smaller or larger than the object we are looking for.

There are a couple of things that we can do about this!
- We can apply a technique called Hard-Negative mining to find difficult negative samples and retrain the model
- We can apply a technique called Non-Maximum suppression to discard overlapping predictions and keep only the most confident one

Let's talk about these two techniques in the following two lessons.
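
As a small preview, here is a minimal sketch of greedy Non-Maximum suppression in plain Python. It is not the implementation from the upcoming lesson; the function names, the `(x, y, w, h)` box layout, and the 0.5 IoU threshold are assumptions made for illustration.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, most confident first."""
    # Visit boxes from the highest score to the lowest
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much
        order = [i for i in order
                 if iou_xywh(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50)]
scores = [0.99, 0.96, 0.97]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box almost coincides with the first, so it is suppressed, while the third box is far away and survives.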