# Faster R-CNN


![Faster R-CNN](https://lilianweng.github.io/lil-log/assets/images/faster-RCNN.png)
<center>Image taken from <a href="https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html">here</a></center>
<br><br>

So far, we have talked about R-CNN. The logical next speedup is in the way we select ROIs for our network. Until now, the ROI selection process has been completed before the proposals ever reach the CNN part of the architecture. To speed things up, we would like to make it a part of the network itself! That is exactly the main change that Faster R-CNN introduces in its architecture.

The network looks pretty similar to what we have had so far. We have a pre-trained network (e.g., VGG16), which we use as the CNN part, and the RPN (Region Proposal Network), which generates ROIs for training the full model. The step-by-step model flow is explained pretty well here:

https://lilianweng.github.io/lil-log/2017/12/31/object-recognition-for-dummies-part-3.html

Overall, we have 3 new things introduced for the Faster R-CNN model:

- ### Anchor boxes

- ### Region Proposal Network (RPN)

- ### ROI pooling 

In this section of the course, we will cover the core techniques that make up a Faster R-CNN model. Since the architecture differs only slightly from the one we already trained in the previous lessons, we will implement only the differences. The full implementation of the model (including pre-processing and post-processing) is very long and might take several days or weeks to complete, so here are some great resources for implementing the entire model on your own time, for your own data:

- https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/
- https://www.analyticsvidhya.com/blog/2018/11/implementation-faster-r-cnn-python-object-detection/
- https://towardsdatascience.com/faster-r-cnn-object-detection-implemented-by-keras-for-custom-data-from-googles-open-images-125f62b9141a


### Steps:
1. Import dependencies
2. Define the make_anchors function

### Topics covered and learning objectives
- Anchor boxes for object detection

### Time estimates:
- Reading/Watching materials: 20min
- Exercises: 10min
<br><br>
- **Total**: ~30min



# Anchor boxes

![](https://ww2.mathworks.cn/help/vision/ug/ssd_detection.png)
<center>Image taken from <a href="https://ww2.mathworks.cn/help/vision/ug/getting-started-with-ssd.html">here</a></center>
<br><br>
In our previous attempts at building an object detection algorithm, we used randomly generated ROIs (with random positions and sizes). This provided decent results, but since everything was random, it was out of our control, and we relied heavily on luck. To handle this, people introduced **anchor boxes**.

    Anchor boxes are a set of predefined bounding boxes of a certain height and width. These boxes are defined to capture the scale and aspect ratio of specific object classes you want to detect, and are typically chosen based on object sizes in your training datasets. During detection, the predefined anchor boxes are tiled across the image. The network predicts the probability and other attributes, such as background, intersection over union (IoU), and offsets for every tiled anchor box. The predictions are used to refine each anchor box. You can define several anchor boxes, each for different object size. Anchor boxes are fixed initial-boundary box guesses.
Taken from [here](https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html)


When we tile anchor boxes across an image, they might look something like this:

![](https://dongjk.github.io/assets/article_images/2018-05-21-Faster_R-CNN_step_by_step/all_inside_anchors.jpg)
<center>Image taken from <a href="https://dongjk.github.io/code/object+detection/keras/2018/05/21/Faster_R-CNN_step_by_step,_Part_I.html">here</a></center>
<br><br>
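The definition quoted above mentions intersection over union (IoU), which is commonly used to measure how well an anchor box overlaps a ground-truth box. Below is a minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max). The function name `box_iou` and the example boxes are illustrative and not part of the course code.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an intersection area of 0
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


anchor = (0, 0, 100, 100)          # hypothetical anchor box
ground_truth = (50, 50, 150, 150)  # hypothetical labelled object
print(box_iou(anchor, ground_truth))  # 2500 / 17500, roughly 0.143
```

An anchor with a high IoU against a ground-truth box is a good starting guess that only needs a small offset, while an anchor with an IoU near zero is typically treated as background.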

### Anchor box resources:

- Quality Object detection: https://towardsdatascience.com/anchor-boxes-the-key-to-quality-object-detection-ddf9d612d4f9
- Anchors explained: https://www.mathworks.com/help/vision/ug/anchor-boxes-for-object-detection.html

In [None]:
from IPython.display import IFrame

In [None]:
IFrame("https://www.youtube.com/embed/RTlwl2bv0Tg", 1000, 500)

**In some cases, IPython widgets do not work!**

If this is the case, here is the link to the YouTube video from the cell above: https://www.youtube.com/watch?v=RTlwl2bv0Tg

## Import dependencies

In [None]:
import numpy as np
from tests import test_anchors

### Exercise 1: Complete the function that generates anchors

This function is based on the one found [here](https://d2l.ai/chapter_computer-vision/anchor.html). 
It overlays a grid on an image and generates anchor boxes for each grid cell.

NOTE: If you define the grid_cell_size to be 1, this function will perform pixel-wise generation of anchor boxes.
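The referenced approach does not pair every size with every ratio. It keeps only the combinations that contain the first size or the first ratio, which gives `num_sizes + num_ratios - 1` boxes per grid cell. The sketch below reproduces just that width/height calculation from the exercise code; the `sizes` and `ratios` values are illustrative assumptions.

```python
import numpy as np

sizes = np.array([0.75, 0.5, 0.25])  # illustrative scales
ratios = np.array([1.0, 2.0, 0.5])   # illustrative width/height ratios

# Every size paired with the first ratio,
# followed by the first size paired with each remaining ratio
w = np.concatenate((sizes * np.sqrt(ratios[0]),
                    sizes[0] * np.sqrt(ratios[1:])))
h = np.concatenate((sizes / np.sqrt(ratios[0]),
                    sizes[0] / np.sqrt(ratios[1:])))

shapes = np.stack((w, h), axis=1)  # one (width, height) row per anchor shape
print(shapes.shape)  # (5, 2), i.e. 3 + 3 - 1 shapes
print(w / h)         # recovers the ratio used for each shape
```

Multiplying by the square root of the ratio for the width and dividing by it for the height keeps the box area tied to the size alone, while `w / h` equals the requested ratio.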

The first part of the task is to fill in a few lines of code in the **make_anchors** function. Here is how:

- Step 1: Calculate the number of grid cells along the X-axis and the Y-axis. **grid_cell_size**, **height**, and **width** are already defined.

- Step 2: Using the np.arange function, find the center points of the grid cells along the X-axis and the Y-axis.
    - HINT: you can use grid_cell_size/2 as a starting point
    - Use the documentation of np.arange to find which arguments to provide
   
- Step 3: Use the np.stack function to stack together this list: *[center_x, center_y, center_x, center_y]* along axis one, then repeat that block along axis zero, once for each box per grid cell.
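If the NumPy calls named in the steps above are unfamiliar, the toy example below shows how they behave. The values and variable names (`cell`, `extent`, `cx`, `cy`) are made up for illustration and deliberately differ from the exercise, so treat it as a sketch of the building blocks rather than the solution.

```python
import numpy as np

cell = 4     # hypothetical cell size
extent = 12  # hypothetical image extent along one axis

# np.arange(start, stop, step): start half a cell in, then step one cell at a time
centers = np.arange(cell / 2, extent, cell)
print(centers)  # [ 2.  6. 10.]

# Two example center points, stored as separate x and y arrays
cx = np.array([2.0, 6.0])
cy = np.array([2.0, 2.0])

# Stacking along axis 1 puts one point per row: [x, y, x, y]
points = np.stack([cx, cy, cx, cy], axis=1)
print(points.shape)  # (2, 4)

# Repeating along axis 0 duplicates each row consecutively
repeated = np.repeat(points, 3, axis=0)
print(repeated.shape)  # (6, 4)
```

Because np.repeat keeps the copies of each row next to each other, every center ends up with one row per box, which lines up with a per-cell block of box offsets.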


After completing the three steps and passing the test, go to the source code found [here](https://d2l.ai/chapter_computer-vision/anchor.html) and walk through the implementation. This will take 10-15 minutes on average. The goal is to understand different approaches to implementing anchor boxes.

In [None]:
def make_anchors(img_size, 
                 sizes, 
                 ratios, 
                 grid_cell_size=100):
    """
    Calculate anchor box proposals per part of the image.
    
    Args:
        :param img_size (tuple): (height, width) of images in a dataset, e.g. (512, 512)
        :param sizes (np.array): array of sizes of bboxes
        :param ratios (np.array): array of ratios of bboxes
        :param grid_cell_size (int): size of a grid cell
    """
    height = img_size[0]
    width = img_size[1]
    
    num_sizes = len(sizes)
    num_ratios = len(ratios)
    # Note: NOT num_sizes * num_ratios, see eq 14.4.1 in the link
    boxes_per_grid_cell = num_sizes + num_ratios - 1
    
    # Step 1: YOUR CODE HERE
    grid_x=None
    grid_y=None
    
    # Step 2: YOUR CODE HERE
    center_h = None
    center_w = None
    
    # Put center coordinates of each grid cell into a matrix 
    center_x, center_y = np.meshgrid(center_w, center_h)
    center_x, center_y = center_x.reshape(-1), center_y.reshape(-1)
    
    
    # Taken from: https://d2l.ai/chapter_computer-vision/anchor.html
    # This part of the code calculates ratios and sizes of anchor boxes
    w = np.concatenate((sizes * np.sqrt(ratios[0]), 
                        sizes[0] * np.sqrt(ratios[1:]))) * grid_cell_size  
    h = np.concatenate((sizes / np.sqrt(ratios[0]),
                        sizes[0] / np.sqrt(ratios[1:]))) * grid_cell_size
    
    # Half-size corner offsets (-w/2, -h/2, w/2, h/2) for every box, tiled once per grid cell
    anchor_manipulations = np.tile(np.stack((-w, -h, w, h)).T, ((grid_x * grid_y, 1))) / 2
    
    # Step 3: YOUR CODE HERE:
    # Stack center points together and make a final grid
    final_grid = None
    
    # Add the corner offsets to the center points to get the box corner coordinates
    output = final_grid + anchor_manipulations
    
    return np.expand_dims(output, axis=0).reshape(grid_x, grid_y, boxes_per_grid_cell, 4)

In [None]:
### RUN THIS CELL TO TEST YOUR IMPLEMENTATION
test_anchors(make_anchors)