RCNN blog #27

Open
sezan92 opened this issue Feb 27, 2023 · 20 comments
Labels
documentation Improvements or additions to documentation


@sezan92
Owner

sezan92 commented Feb 27, 2023

Objective

This issue tracks work on the RCNN blog post.

Tasks

  • TBD
@sezan92 sezan92 self-assigned this Feb 27, 2023
@sezan92 sezan92 added the documentation Improvements or additions to documentation label Feb 27, 2023
@sezan92
Owner Author

sezan92 commented Feb 27, 2023

Topics

  • Why I started working on RCNN, and why now?

  • What was my expected outcome?

  • What I have learnt

  • Start working on code from the flowchart

    • code for the data preparation script
    • code for training the model
  • Describe the flowcharts

  • Describe the data

  • What problems/challenges I faced

  • How I solved the challenges

  • Why I later chose only 2 classes

  • Result

@sezan92
Owner Author

sezan92 commented Feb 28, 2023

Why I started working on RCNN

I have been working as an AI engineer for 5 years. Most of my projects are related to computer vision, and almost all of them are on object detection. I have collected data for, trained, fine-tuned, and deployed object detection models many times. But I never had a chance to implement an object detection model from scratch, because implementing one from scratch is not efficient when open-source implementations are already available!

Is it necessary to implement it?

I do not know about others, but for me, implementing a model or algorithm from scratch helps me understand it far better than just reading about it. For production, we do not need to implement it from scratch; we can reuse available implementations and libraries. But for my own learning, I find implementing it myself the best way to understand an architecture.

Why an old algorithm like RCNN?

For several reasons:

  • It is conceptually one of the simplest CNN-based object detection models.
  • CNN-based object detection kickstarted with it.
  • I plan to refactor the implementation and scale it up toward modern / state-of-the-art (SOTA) models.

What kind of skills might it show?

If I had implemented a SOTA algorithm, it might have shown my skills with the latest algorithms. I get that. But because I have little time alongside full-time work and other things (family, learning other topics in AI, sports, etc.), I needed to start with an old and easy one. All in all, this implementation has taught me several things:

  • Chunking big projects
  • Using the GitHub Projects feature and agile methodology for big projects (more on that later)
  • Gritting my teeth and persisting when problems appear

TODO

  • Revise and check whether this is enough or more is needed

@sezan92
Owner Author

sezan92 commented Mar 1, 2023

What was my expected outcome?

My expected outcome was to implement an object detection model from scratch, nothing else. If someone reuses it (which is highly unlikely), fine! But I got more than that.

What outcomes I earned

  • Structuring a big project into small subtasks

  • Updating the plan along the way

  • When implementing a model, trying it first on toy/simple data

  • Implementing a model from a paper

  • Structuring a project

  • Specifically, the Kanban board from GitHub helped me a lot. For example:
    [Image: Kanban board]

  • Replanning the project when the results are not good

  • Debugging a project based on results

What I have learnt

  • The same as the outcomes above

TODO

  • Check and revise this section
  • Add details about structuring of this project.

@sezan92
Owner Author

sezan92 commented Mar 6, 2023

Update 2023/03/06

  • Initial flow chart

[Image: initial flowchart]

TODO

  • check and revise
  • If correct, redraw it on the PC

@sezan92
Owner Author

sezan92 commented Mar 7, 2023

Update 2023/03/07

  • The actual system from the paper
    [Screenshot: system overview from the paper]

  • We had to make some changes for the initial implementation:

  • We did not use an SVM; we used softmax directly.

  • We only trained on dogs and cats, for simplicity and because there is a lot of data for them.

  • No evaluation yet.

@sezan92
Owner Author

sezan92 commented Mar 7, 2023

TODO 2023/03/07

  • Make the flowchart on the PC

@sezan92
Owner Author

sezan92 commented Mar 8, 2023

Update 2023/03/08

TODO 2023/03/08

  • revise and check if something can be updated

@sezan92
Owner Author

sezan92 commented Mar 9, 2023

Update 2023/03/09

RCNN description

The RCNN model is not an end-to-end model, i.e., we cannot feed in the annotated dataset at one end and expect the model to figure out the rest. Instead, training is a multi-step process. The steps are described below.

Dataset Preparation

For dataset preparation, we extract candidate regions with selective search and keep the regions whose IoU with a ground-truth box is greater than a certain threshold (here $0.6$) as positive images. Here, positive means the image belongs to a certain class.
[add flowchart of region extraction]

region extraction -> measure IoU -> if IoU is greater than the upper threshold -> positive for that class

A region may well overlap with boxes of multiple classes. For example, if a picture contains both a dog and a cat, a region may overlap both the dog box and the cat box. In that case, we take the box with the maximum IoU and assign its class to the region.
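For reference, the IoU criterion and the max-IoU label assignment described above can be written as follows, treating a region $r$ and each ground-truth box $b$ in the image's box set $B$ as axis-aligned rectangles (the symbols $r$, $b$, $B$, $b^{*}$ are introduced here only for illustration):

```math
\begin{aligned}
\mathrm{IoU}(r, b) &= \frac{|r \cap b|}{|r \cup b|} = \frac{|r \cap b|}{|r| + |b| - |r \cap b|} \\
b^{*}(r) &= \arg\max_{b \in B} \mathrm{IoU}(r, b) \\
\text{label}(r) &= \text{class of } b^{*}(r) \quad \text{if } \mathrm{IoU}\big(r, b^{*}(r)\big) > 0.6
\end{aligned}
```

For example, a region with corners (5, 5) and (15, 15) and a dog box with corners (0, 0) and (10, 10) overlap in a 5 x 5 square, so IoU = 25 / (100 + 100 - 25) ≈ 0.14, far below the 0.6 threshold.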

[ Add flowchart for data preparation]

Pseudocode

for image, bboxes, labels in dataset
    regions <- selective_search(image)
    for region in regions
        max_region_iou <- 0
        max_region_label <- none
        for bbox, label in zip(bboxes, labels)
            region_iou <- get_iou(region, bbox)
            if region_iou > max_region_iou
                max_region_iou <- region_iou
                max_region_label <- label
        if max_region_iou > upper_iou_threshold
            save_the_region_in_the_respective_dir(region, max_region_label)
        elif max_region_iou < lower_iou_threshold
            save_the_region_as_background(region)

The code is inefficient. I hope to optimize it later.
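One possible way to speed up the inner loops in the pseudocode is to compute the IoU of all regions against all ground-truth boxes at once with NumPy broadcasting. This is a minimal sketch, not the code in this repository: the function names are hypothetical, and it assumes boxes are given as (x1, y1, x2, y2) corners. Selective search implementations such as OpenCV's `ximgproc` module return (x, y, w, h), so a conversion would be needed first.

```python
import numpy as np


def iou_matrix(regions, boxes):
    """IoU between every region and every ground-truth box.

    regions: (R, 4) array, boxes: (B, 4) array, both as (x1, y1, x2, y2).
    Returns an (R, B) array. Assumes every box has a positive area.
    """
    regions = np.asarray(regions, dtype=np.float64)
    boxes = np.asarray(boxes, dtype=np.float64)
    # Broadcast (R, 1) against (1, B) to get all pairwise intersections.
    ix1 = np.maximum(regions[:, None, 0], boxes[None, :, 0])
    iy1 = np.maximum(regions[:, None, 1], boxes[None, :, 1])
    ix2 = np.minimum(regions[:, None, 2], boxes[None, :, 2])
    iy2 = np.minimum(regions[:, None, 3], boxes[None, :, 3])
    # Non-overlapping pairs give negative widths/heights, clipped to zero.
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_r = (regions[:, 2] - regions[:, 0]) * (regions[:, 3] - regions[:, 1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area_r[:, None] + area_b[None, :] - inter
    return inter / union


def best_labels(regions, boxes, labels):
    """For each region, return (max IoU, label of the best-matching box)."""
    ious = iou_matrix(regions, boxes)
    best = ious.argmax(axis=1)  # index of the max-IoU box per region
    return ious.max(axis=1), np.asarray(labels)[best]
```

With the per-region maximum IoU in one array, the upper and lower thresholds become two boolean masks instead of nested Python loops.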

command

python3 /src/prepare_data.py {voc2007,voc2012}  --ss_method SS_METHOD --num_rects NUM_RECTS --output OUTPUTDIRECTORY --data_batch_size DATABATCHSIZE --split {train,test,validation} --upper_iou_thresh UPPER_IOU --lower_iou_thresh LOWER_IOU --minimum_bg MINIMUM_SIZE_OF_BACKGROUND_IMAGE

TODO

  • Describe the selective search and IoU threshold process
  • Describe the different scenarios, e.g., what happens if a region overlaps 2 classes based on IoU
  • Share the code block
  • Share the command

@sezan92
Owner Author

sezan92 commented Mar 10, 2023

Update 2023/03/10

TODO

  • check and revise

if revision done

  • Write up about the model: the model chosen in the paper and my model
  • The last layer

@sezan92
Owner Author

sezan92 commented Mar 13, 2023

Update 2023/03/13

Model

In the original RCNN paper, they used AlexNet as the CNN model, because it was a state-of-the-art model at the time. I used VGG16, simply because it was very easy to use in TensorFlow. Also, in the original paper, they extracted features from the CNN and fed them into class-specific linear SVMs, a choice made empirically. I used softmax instead because, again, it seemed easier. In short, the difference can be summarized as follows.

original paper

image -> AlexNet -> features -> SVM -> result

my implementation

image -> VGG16 -> features -> softmax -> result

Dataset

The original paper trained the model on VOC2007 and VOC2012 [confirm it]. I started training on VOC2012, but the evaluation metrics did not look good at first. They were very poor for several reasons (I will explain the challenges I faced later). After some time, I realized from the confusion matrix (add confusion matrix) that the model worked well only on pictures of dogs and cats. So, for simplicity, I decided to work only on those two classes out of all the classes. Later, maybe I will increase the complexity.

Steps

  • First, I extracted all regions.
  • I separated the images of backgrounds, dogs, and cats.
  • Then I relabeled them: 0 for dog, 1 for cat, and 2 for background.
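The relabeling step can be sketched in plain Python. This assumes each prepared region is available as an (image path, class name) pair, for example collected from the per-class output directories; the helper name and sample paths are hypothetical.

```python
# Mapping from the relabeling described above: 0 = dog, 1 = cat, 2 = background.
NEW_LABELS = {"dog": 0, "cat": 1, "background": 2}


def relabel(samples):
    """Keep only dog/cat/background samples and map class names to 0/1/2.

    samples: iterable of (image_path, class_name) pairs (hypothetical format).
    Samples of any other VOC class are dropped.
    """
    return [(path, NEW_LABELS[name]) for path, name in samples if name in NEW_LABELS]
```

For example, `relabel([("a.jpg", "dog"), ("b.jpg", "person"), ("c.jpg", "background")])` keeps `("a.jpg", 0)` and `("c.jpg", 2)` and drops the `person` region.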

TODO

  • check and revise

@sezan92
Owner Author

sezan92 commented Mar 15, 2023

Update 2023/03/15

  • Command for preparing data

    python3 /src/rcnn/prepare_data.py DATA --ss_method {fast,quality} --num_rects NUM_OF_RECTS --output OUTPUT_DIRECTORY --data_batch_size DATA_BATCH_SIZE --upper_iou_thresh UPPER_IOU_THRESHOLD --lower_iou_thresh LOWER_IOU_THRESHOLD --minimum_bg_size MINIMUM_BACKGROUND_SIZE --split {train/test/validation}

TODO

  • Command for training models
  • Pseudocode for training models
  • Think about introducing the commands in the relevant sections

@sezan92
Owner Author

sezan92 commented Mar 27, 2023

Update 2023/03/27

Training model

  • Command for training the model
    python3 /src/train.py --train_dir TRAIN_DIR_PATH --valid_dir VALID_DIR_PATH --batch_size BATCH_SIZE --learning_rate LEARNING_RATE --output MODEL_TARGET_DIR --num_classes NUMBER_OF_CLASSES --bg_class BACKGROUND_CLASS_ID

Training the model is as simple as training any CNN classifier: we feed in the images, organized by class.
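Since the prepared regions are stored in one directory per class, one way to feed them to Keras is `tf.keras.utils.image_dataset_from_directory`. This is a sketch under assumptions: the sub-directory names `dog`, `cat`, and `background` and the image size are hypothetical, and the repository's `train.py` may load data differently. Passing `class_names` explicitly matters because Keras otherwise orders classes alphanumerically, which would give background = 0, cat = 1, dog = 2 instead of the 0/1/2 relabeling described earlier.

```python
import tensorflow as tf

# Assumed directory names, listed in label order: 0 = dog, 1 = cat, 2 = background.
CLASS_NAMES = ["dog", "cat", "background"]


def make_dataset(directory, image_size=(224, 224), batch_size=32, shuffle=True):
    """Build a labelled tf.data.Dataset from one sub-directory per class."""
    return tf.keras.utils.image_dataset_from_directory(
        directory,
        labels="inferred",         # label = position of the sub-directory name
        label_mode="categorical",  # one-hot labels for a softmax output
        class_names=CLASS_NAMES,   # fixes the index of each class
        image_size=image_size,     # every region crop is resized to this size
        batch_size=batch_size,
        shuffle=shuffle,
    )
```

The same function would be called once for the training directory and once for the validation directory.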

TODO

  • describe training model
  • describe datagen from directory

@sezan92
Owner Author

sezan92 commented Apr 4, 2023

Update 2023/04/04

Training model

In the paper, they used the AlexNet model; these days, there are far better models. I chose VGG16 as it is fairly easy to use. In addition to the VGG16 model, I added some augmentation layers: random flip, random translation, random rotation, and random contrast. At the end of the model, I used a 4096-dimensional linear layer with a `relu` activation function, followed by `softmax` for classification.

Image -> Augmentation layers -> VGG16 model without classification layer -> flattening + dropout -> relu layer -> classification layer 

[Add block diagram]
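The pipeline above can be sketched with `tf.keras` as follows. The augmentation factors, dropout rate, input size, optimizer (Adam), and learning rate are assumptions for illustration, not values taken from this project, and the function name is hypothetical.

```python
import tensorflow as tf


def build_rcnn_classifier(num_classes=3, input_shape=(224, 224, 3),
                          weights="imagenet", learning_rate=1e-4):
    """Augmentation -> VGG16 (no top) -> flatten + dropout -> 4096 ReLU -> softmax."""
    inputs = tf.keras.Input(shape=input_shape)
    # Augmentation layers; all factors here are assumed values.
    x = tf.keras.layers.RandomFlip("horizontal")(inputs)
    x = tf.keras.layers.RandomTranslation(0.1, 0.1)(x)
    x = tf.keras.layers.RandomRotation(0.1)(x)
    x = tf.keras.layers.RandomContrast(0.1)(x)
    # VGG16 convolutional base without its classification layers.
    backbone = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape)
    x = backbone(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    # 4096-dimensional linear layer with ReLU, then the softmax classifier.
    x = tf.keras.layers.Dense(4096, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # expects one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Inputs are assumed to be preprocessed the way VGG16 expects (for example with `tf.keras.applications.vgg16.preprocess_input` in the input pipeline). The random augmentation layers are only active during training, so they do not change predictions at inference time.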

TODO

  • Draw the block diagram for model

@sezan92
Owner Author

sezan92 commented Apr 8, 2023

Update 2023/04/08

Block diagram

TODO

  • Need to recheck

@sezan92
Owner Author

sezan92 commented Apr 12, 2023

Update 2023/04/12

Block diagram

[Image: VGG16RCNN block diagram (draw.io)]

@sezan92
Owner Author

sezan92 commented Apr 12, 2023

Update 2023/04/12

Initial Result, challenges faced

After training on VOC2012, I got results like those in https://github.com/sezan92/ComputerVision/issues/85#issuecomment-1328206603

If you look at the confusion matrix carefully, most objects were not classified correctly! The model was biased toward certain classes, which seemed problematic, so I tried to debug the issue. To make things easier, I chose only two classes, cats and dogs, plus background. From visual inspection (I cannot provide statistics yet; I should collect them), many background images had weird sizes and shapes, for example 10 x 100 or 1 x 10. At test time, however, regions like these may not occur.

  • So I introduced a minimum image size (128 x 128), which helped me get realistic images.

Another problem was that, with a single IoU threshold, many regions with fairly high IoU (e.g., 0.45) were selected as background. To make sure the background images were really background:

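  • I introduced separate lower and upper IoU thresholds. A region is saved as a positive sample only if its maximum IoU is above the upper threshold, saved as background only if it is below the lower threshold, and discarded otherwise.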
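The minimum-size filter and the two IoU thresholds can be combined into one decision per region. A minimal sketch, assuming the region's maximum IoU over all ground-truth boxes is already known; the function name is hypothetical, the lower threshold of 0.3 is an assumed value (only 0.45 is mentioned here as being too high for background), the upper threshold follows the 0.6 used for positives, and the 128-pixel minimum is applied to background regions only.

```python
def categorize_region(max_iou, width, height,
                      upper_iou=0.6, lower_iou=0.3, min_bg_size=128):
    """Decide what to do with one selective-search region.

    Returns "positive", "background", or "discard".
    lower_iou=0.3 is a hypothetical default, not a value from this project.
    """
    if max_iou > upper_iou:
        # Clearly overlaps a ground-truth box: save it under that box's class.
        return "positive"
    if max_iou < lower_iou:
        # Clearly not an object; keep it only if it is large enough to look
        # like a realistic background crop (rejects 1 x 10 slivers).
        if width >= min_bg_size and height >= min_bg_size:
            return "background"
        return "discard"
    # Ambiguous IoU between the two thresholds (e.g. 0.45): use it for neither.
    return "discard"
```

Discarding the ambiguous band keeps partially overlapping regions out of the background class, which was the problem described above.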

TODO

  • revise and update the reason for two classes, introduction of lower iou threshold and upper iou threshold.

@sezan92
Copy link
Owner Author

sezan92 commented Apr 17, 2023

Update 2023/04/17

TODO

  • check the comment and write the conclusion

@sezan92
Owner Author

sezan92 commented Apr 19, 2023

Update 2023/04/19

  • Revised the comments; they look good. Need to write a script to generate results and evaluation.

TODO

@sezan92
Owner Author

sezan92 commented Apr 25, 2023

@sezan92
Owner Author

sezan92 commented Apr 27, 2023
