RCNN blog #27

sezan92 · 2023-02-27T09:45:55Z

Objective

This issue tracks the work on the RCNN blog post.

Tasks

TBD

sezan92 · 2023-02-27T09:51:53Z

sezan92 · 2023-02-28T04:40:55Z

Why I started working on RCNN

I have been working as an AI engineer for 5 years. Most of my projects are related to computer vision, and almost all of them involve object detection. I have trained and fine-tuned object detection models, collected data for them, and deployed them many times. But I never got the chance to implement an object detection model from scratch, because doing so is not efficient when open-source implementations are already available.

Is it necessary to implement?

I do not know about others. But for me, implementing a model or algorithm from scratch helps me understand it far better than just reading about it. For production, we do not need to implement it from scratch; we can reuse available implementations and libraries. But for myself, I find that implementing an architecture is the best way to understand it.

Why an old algorithm like RCNN?

For several reasons:

It is one of the least complex CNN-based object detection models.
CNN-based object detection kickstarted with it.
I plan to refactor the implementation and scale it up toward more modern, SOTA models.

What kind of skills might it show?

If I had implemented a SOTA algorithm, it might have shown my skills in the latest algorithms. I get that. But with little time on hand alongside full-time work and other things (family, learning other AI topics, sports, etc.), I needed to start with an old and easy one. All in all, this implementation has taught me several things:

Chunking big projects into smaller pieces
Using the GitHub Projects feature and agile methodology for big projects (more on that later)
Gritting my teeth and pushing on when a problem appears

TODO

Revise and check whether this is okay or more is needed

sezan92 · 2023-03-01T08:26:57Z

sezan92 · 2023-03-06T04:02:23Z

Update 2023/03/06

Initial flow chart

TODO

check and revise
if correct then make in pc

sezan92 · 2023-03-07T04:42:07Z

Update 2023/03/07

Actual paper system
We had to make some changes for the initial implementation:
We didn't use an SVM; we used softmax directly.
We trained only on dogs and cats, for simplicity and because a lot of data is available for them.
No evaluation yet.

sezan92 · 2023-03-07T04:42:40Z

TODO 2023/03/07

make the flowchart in PC

sezan92 · 2023-03-08T14:00:05Z

Update 2023/03/08

initial version of flowchart https://drive.google.com/file/d/1-ZNG2CTiiWa9GyJMNew1A5S5WBrA1ohH/view?usp=sharing

TODO 2023/03/08

revise and check if something can be updated

sezan92 · 2023-03-09T02:56:42Z

Update 2023/03/09

RCNN description

The RCNN model is not an end-to-end model, i.e. we cannot feed the annotated dataset in at one end and expect the model to figure out the rest. Instead, training is a multi-step process. The steps are described below.

Dataset Preparation

For the dataset preparation, we extract regions using selective search, measure each region's IoU with the ground-truth boxes, and keep the regions whose IoU is greater than a certain threshold (here $0.6$) as positive images. Here, positive means the region image belongs to a certain class.
[add flowchart of region extraction]

Region extraction -> measure IoU -> if IoU greater than upper threshold -> positive for a class

One region may well overlap with boxes of multiple classes. For example, if a picture contains both a dog and a cat, there is a chance that a region overlaps both the dog box and the cat box. In that case, we assign the region the label of the box with the maximum IoU.
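A minimal sketch of the IoU measurement and the maximum-IoU rule described above. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates; the names `get_iou` and `best_label` are illustrative and may not match the project code.

```python
def get_iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_label(region, bboxes, labels):
    """Return (label, iou) of the ground-truth box that overlaps the region most."""
    best = (None, 0.0)
    for bbox, label in zip(bboxes, labels):
        iou = get_iou(region, bbox)
        if iou > best[1]:
            best = (label, iou)
    return best


# A region covering most of the dog box and only a corner of the cat box.
region = (10, 10, 110, 110)
bboxes = [(0, 0, 100, 100), (90, 90, 200, 200)]
labels = ["dog", "cat"]
label, iou = best_label(region, bboxes, labels)
print(label, round(iou, 3))
```

Here the region overlaps both boxes, but the dog box wins because its IoU is far larger.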

[ Add flowchart for data preparation]

pseudocode

for image, bboxes, labels in dataset
    regions <- selective_search(image)
    for region in regions
        max_region_iou <- 0
        max_region_label <- none
        for bbox, label in zip(bboxes, labels)
            region_iou <- get_iou(region, bbox)
            if region_iou > max_region_iou
                max_region_iou <- region_iou
                max_region_label <- label
        if max_region_iou > upper_iou_threshold
            save_the_region_in_the_respective_dir(region, max_region_label)
        elif max_region_iou < lower_iou_threshold
            save_the_region_as_background(region)

The code is inefficient. I hope to optimize it later.
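One possible optimization, sketched with NumPy: compute the IoU of every region against every ground-truth box at once, then take the per-region maximum. This is not the project's code. It assumes `(x1, y1, x2, y2)` boxes; the function names are illustrative, and the default `lower=0.3` is an assumed value (only the `0.6` upper threshold appears in the text).

```python
import numpy as np


def iou_matrix(regions, bboxes):
    """IoU of every region (N, 4) against every box (M, 4); returns shape (N, M)."""
    regions = np.asarray(regions, dtype=float)[:, None, :]  # (N, 1, 4)
    bboxes = np.asarray(bboxes, dtype=float)[None, :, :]    # (1, M, 4)
    # Broadcasting yields the intersection rectangle for each pair.
    ix1 = np.maximum(regions[..., 0], bboxes[..., 0])
    iy1 = np.maximum(regions[..., 1], bboxes[..., 1])
    ix2 = np.minimum(regions[..., 2], bboxes[..., 2])
    iy2 = np.minimum(regions[..., 3], bboxes[..., 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_r = (regions[..., 2] - regions[..., 0]) * (regions[..., 3] - regions[..., 1])
    area_b = (bboxes[..., 2] - bboxes[..., 0]) * (bboxes[..., 3] - bboxes[..., 1])
    union = area_r + area_b - inter
    return inter / np.maximum(union, 1e-12)


def assign_regions(regions, bboxes, labels, upper=0.6, lower=0.3):
    """Per region: a class label, "background", or None (in between, skipped)."""
    ious = iou_matrix(regions, bboxes)
    best_idx = ious.argmax(axis=1)
    best_iou = ious.max(axis=1)
    result = []
    for idx, iou in zip(best_idx, best_iou):
        if iou > upper:
            result.append(labels[idx])
        elif iou < lower:
            result.append("background")
        else:
            result.append(None)
    return result
```

The sketch expects at least one ground-truth box per image; images without annotations would need separate handling.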

command

python3 /src/prepare_data.py {voc2007,voc2012}  --ss_method SS_METHOD --num_rects NUM_RECTS --output OUTPUTDIRECTORY --data_batch_size DATABATCHSIZE --split {train,test,validation} --upper_iou_thresh UPPER_IOU --lower_iou_thresh LOWER_IOU --minimum_bg MINIMUM_SIZE_OF_BACKGROUND_IMAGE

TODO

describe the selective search , iou threshold process
describe the different scenarios , if an image belongs to 2 classes based on iou, what would happen
share the codeblock
share the command

sezan92 · 2023-03-10T03:44:06Z

Update 2023/03/10

updated comment RCNN blog #27 (comment)

TODO

check and revise

if revision done

write up about model. chosen model in the paper and your model
last layer.

sezan92 · 2023-03-13T07:41:43Z

Update 2023/03/13

Model

In the original RCNN paper, they used AlexNet as the CNN model, because it was a state-of-the-art model at the time. I used the VGG16 model; the only reason is that it was very easy to use in TensorFlow. Also, in the original paper they extracted features from the CNN and fed them into an SVM classifier, a choice that was made empirically. I used only softmax because, again, it seemed easier. In short, we can summarize the difference as follows:

original paper

image -> AlexNet -> features -> SVM -> result

my implementation

image -> VGG16 -> features -> softmax -> result

Dataset

The original paper trained the model on VOC2007 and VOC2012 [confirm it]. I started by training on VOC2012, but the evaluation metrics were very poor at the beginning, for several reasons (I will explain the challenges later). After some time I realized from the confusion matrix (add confusion matrix) that the model was working well only on pictures of dogs and cats. So, for simplicity, I decided to work on only those two of all the classes; maybe I will increase the complexity later.

steps

First, I extracted all regions.
Then I separated the images of backgrounds, dogs, and cats.
Finally, I relabeled them: 0 for dog, 1 for cat, and 2 for background.
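The relabelling step can be sketched as a simple mapping. The directory layout (one sub-directory per class under a root folder), the directory names, and the helper `collect_samples` are assumptions for illustration; only the label ids come from the steps above.

```python
from pathlib import Path

# Label ids as listed in the steps above.
CLASS_IDS = {"dog": 0, "cat": 1, "background": 2}


def collect_samples(root):
    """Return (image_path, label_id) pairs from root/<class_name>/*."""
    samples = []
    for class_name, class_id in CLASS_IDS.items():
        class_dir = Path(root) / class_name
        if not class_dir.is_dir():
            continue  # Skip classes that were not extracted.
        for path in sorted(class_dir.iterdir()):
            if path.is_file():
                samples.append((path, class_id))
    return samples
```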

TODO

check and revise

sezan92 · 2023-03-15T02:03:09Z

Update 2023/03/15

command for preparing data

python3 /src/rcnn/prepare_data.py DATA --ss_method {fast,quality} --num_rects NUM_OF_RECTS --output OUTPUT_DIRECTORY --data_batch_size DATA_BATCH_SIZE --upper_iou_thresh UPPER_IOU_THRESHOLD --lower_iou_thresh LOWER_IOU_THRESHOLD --minimum_bg_size MINIMUM_BACKGROUND_SIZE --split {train,test,validation}
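The thread does not say which selective search implementation the script uses. As one possibility, options like `--ss_method {fast,quality}` and `--num_rects` could map onto OpenCV's selective search as sketched below. This assumes the `opencv-contrib-python` package; the function names and the default of 2000 proposals are illustrative.

```python
def xywh_to_xyxy(rect):
    """Convert an (x, y, w, h) rectangle to (x1, y1, x2, y2) corners."""
    x, y, w, h = rect
    return (int(x), int(y), int(x + w), int(y + h))


def selective_search(image, ss_method="fast", num_rects=2000):
    """Propose up to num_rects regions for a BGR image given as a NumPy array."""
    # Imported here so the helper above works without OpenCV installed.
    import cv2  # needs opencv-contrib-python for the ximgproc module

    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    if ss_method == "fast":
        ss.switchToSelectiveSearchFast()
    elif ss_method == "quality":
        ss.switchToSelectiveSearchQuality()
    else:
        raise ValueError(f"unknown ss_method: {ss_method!r}")
    rects = ss.process()  # rows of (x, y, w, h)
    return [xywh_to_xyxy(rect) for rect in rects[:num_rects]]
```

OpenCV returns rectangles as `(x, y, w, h)`, so converting to corner coordinates keeps the proposals compatible with a corner-based IoU function.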

TODO

command for training models
pseudocode for training models
think about introducing the commands in the relevant sections

sezan92 · 2023-03-27T05:41:07Z

Update 2023/03/27

Training model

command for training model

python3 /src/train.py --train_dir TRAIN_DIR_PATH --valid_dir VALID_DIR_PATH --batch_size BATCH_SIZE --learning_rate LEARNING_RATE --output MODEL_TARGET_DIR --num_classes NUMBER_OF_CLASSES --bg_class BACKGROUND_CLASS_ID

Training the model is as simple as training any CNN classifier. We feed in the images, organized per class.

TODO

describe training model
describe datagen from directory

sezan92 · 2023-04-04T07:30:55Z

Update 2023/04/04

Training model

In the paper, they selected the AlexNet model; these days, there are far better models. I chose VGG16 because it is fairly easy to use. In addition to the VGG16 model, I added some augmentation layers: random flip, random translation, random rotation, and random contrast. At the end of the model, I used a 4096-dimensional linear layer with a `relu` activation function and used `softmax` for classification.

Image -> Augmentation layers -> VGG16 model without classification layer -> flattening + dropout -> relu layer -> classification layer
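A rough Keras sketch of the pipeline above, assuming TensorFlow 2.x. The augmentation factors, the dropout rate, the input size, and the name `build_model` are assumptions for illustration, not values from the project.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_model(num_classes=3, input_shape=(224, 224, 3), weights="imagenet"):
    """Augmentation -> VGG16 without top -> flatten + dropout -> relu -> softmax."""
    inputs = tf.keras.Input(shape=input_shape)
    # Augmentation layers are only active during training.
    x = layers.RandomFlip("horizontal")(inputs)
    x = layers.RandomTranslation(height_factor=0.1, width_factor=0.1)(x)
    x = layers.RandomRotation(factor=0.1)(x)
    x = layers.RandomContrast(factor=0.1)(x)
    # VGG16 convolutional base without its original classification layers.
    backbone = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    x = backbone(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(4096, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

With three classes (dog, cat, background), `build_model(num_classes=3)` ends in a 3-way softmax. Input preprocessing (for example `tf.keras.applications.vgg16.preprocess_input`) is left out of the sketch.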

[Add block diagram]

TODO

Draw the block diagram for model

sezan92 · 2023-04-08T14:07:08Z

Update 2023/04/08

Block diagram

https://drive.google.com/file/d/1N-mDVBdz9paAQ--t2linqGRecyhesfkk/view?usp=sharing

TODO

Need to recheck

sezan92 · 2023-04-12T13:30:48Z

Update 2023/04/12

Block diagram

sezan92 · 2023-04-12T13:37:37Z

Update 2023/04/12

Initial Result, challenges faced

After training on VOC2012, I got the results like https://github.com/sezan92/ComputerVision/issues/85#issuecomment-1328206603

If you check the confusion matrix carefully, most objects were not classified correctly, and there were biases toward certain classes. This seemed problematic, so I tried to debug the issue. To make things easier, I chose only two classes, cats and dogs, plus the background. From visual inspection (I cannot provide the stats yet; I should collect them), many BG class images seemed to have weird sizes and shapes, for example 10 x 100 or 1 x 10. But at test time, that might not be the case.

So I introduced a minimum image size (128 x 128), which helped me get realistic images.
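A small sketch of such a size filter. It assumes `(x1, y1, x2, y2)` boxes and that the minimum applies to both width and height; the helper name is illustrative.

```python
MIN_BG_SIZE = 128  # minimum side length in pixels, from the text above


def is_large_enough(box, min_size=MIN_BG_SIZE):
    """True if an (x1, y1, x2, y2) box is at least min_size wide and tall."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    return width >= min_size and height >= min_size


# Two thin slivers like the ones mentioned above, and one reasonably sized region.
candidates = [(0, 0, 10, 100), (0, 0, 1, 10), (50, 50, 200, 220)]
kept = [box for box in candidates if is_large_enough(box)]
print(kept)
```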

Another problem seemed to be that, with a single IoU threshold, many regions with an IoU fairly close to it (like 0.45) were selected as background. To make sure that background images really were background,

I introduced a lower IoU threshold and an upper IoU threshold. If the ba

TODO

revise and update the reason for two classes, introduction of lower iou threshold and upper iou threshold.

sezan92 · 2023-04-17T00:08:24Z

Update 2023/04/17

revised and updated RCNN blog #27 (comment)

TODO

check the comment and write the conclusion

sezan92 · 2023-04-19T06:33:44Z

Update 2023/04/19

revised the comments, looks good. need to make script to generate results and evaluation

TODO

start working on https://github.com/sezan92/ComputerVision/issues/124

sezan92 · 2023-04-25T04:42:32Z

Update 2023/04/25

https://github.com/sezan92/ComputerVision/issues/124#issuecomment-1521144776

sezan92 · 2023-04-27T06:03:20Z

Update 2023/04/27

https://github.com/sezan92/ComputerVision/issues/124#issuecomment-1524784417

sezan92 self-assigned this Feb 27, 2023

sezan92 added the documentation Improvements or additions to documentation label Feb 27, 2023
