Questions about anchors #112
Comments
Anchors are sampled local windows with different scales / aspect ratios. The Region Proposal Network (RPN) classifies each local anchor (window) as either foreground or background, and regresses the foreground windows to fit the ground-truth object bounding boxes. The implementation details of how the anchors are defined can be found in ./lib/rpn/generate_anchors.py.
Hello @happyharrycn: I have looked at the source code in ./lib/rpn/generate_anchors.py. It shows how to generate the anchors, and I have some questions about that. Do the 16x16 base anchors correspond to the feature maps of the last convolutional layer?
The 16x16 base size comes from the down-sampling factor of conv5. This number also controls the scale. anchor_target_layer is used to match ground-truth object boxes to proposals and to generate training labels for the proposals, which are further needed by the loss function. It is thus not required during testing.
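To make the base-size discussion concrete, here is a minimal sketch of the anchor enumeration in the spirit of ./lib/rpn/generate_anchors.py (a simplified re-implementation for illustration, not the repo's exact code): a 16x16 base window is reshaped to each aspect ratio while roughly preserving its area, then multiplied by each scale.

```python
import numpy as np

def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Sketch of anchor generation: one base_size x base_size window,
    reshaped to each aspect ratio (area preserved), then multiplied by
    each scale. Returns (len(ratios) * len(scales), 4) boxes as
    (x1, y1, x2, y2), all centered on the base window."""
    ctr = (base_size - 1) / 2.0          # center of the 16x16 base window
    anchors = []
    for ratio in ratios:
        # keep area ~ base_size**2 while changing the h/w ratio
        w = np.round(np.sqrt(base_size ** 2 / ratio))
        h = np.round(w * ratio)
        for scale in scales:
            ws, hs = w * scale, h * scale
            anchors.append([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                            ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)])
    return np.array(anchors)

print(make_anchors())   # 9 anchors; e.g. ratio 1, scale 8 gives a 128x128 box
```

With the default scales (8, 16, 32) on the 16-pixel base, the square anchors cover 128x128, 256x256, and 512x512 pixels in the input image, which is where the numbers in the paper come from.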
Yes, as you said, anchor_target_layer is not required during testing. |
I am confused about why an anchor of 128x128 produces a proposal of 188x111.
The anchor is only an initial reference box: the RPN's regression layer predicts offsets that shift and reshape each anchor, so a 128x128 anchor can become a 188x111 proposal. The sizes reported in Table 1 of the paper are the average proposal sizes after this regression. Although you do not need to use anchor_target_layer at test time, lib/rpn/proposal_layer.py shows where the regression outputs are applied to the anchors.
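The refinement works in the style of bbox_transform_inv in lib/fast_rcnn/bbox_transform.py; the sketch below is a simplified stand-alone version (an illustration, not the repo's exact code) showing how predicted deltas (dx, dy, dw, dh) turn a square anchor into a non-square proposal.

```python
import numpy as np

def apply_deltas(anchor, deltas):
    """Turn one (x1, y1, x2, y2) anchor into a proposal using RPN
    regression outputs (dx, dy, dw, dh), as in Faster R-CNN."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1 + 1.0, y2 - y1 + 1.0
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h          # shift the center
    w, h = w * np.exp(dw), h * np.exp(dh)      # rescale width / height
    return np.array([cx - 0.5 * w, cy - 0.5 * h,
                     cx + 0.5 * w, cy + 0.5 * h])

# a 128x128 anchor plus suitable deltas yields a ~188x111 proposal
anchor = np.array([0.0, 0.0, 127.0, 127.0])
deltas = np.array([0.0, 0.0, np.log(188 / 128.0), np.log(111 / 128.0)])
print(apply_deltas(anchor, deltas))
```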
@shamanez For example, see how it works in the code referenced above.
So is it like this: the RPN uses the final conv layer to propose bounding boxes. More specifically, we slide a small window (3x3) over the last conv layer, and for the center of each window position we take K (9) anchors. Now we have to map the conv feature map's center pixel position into the real image; that is what we assign with the parameter first_stage_features_stride, and it means that if we move one position in the conv feature map, we move 16 pixels (the stride) in the real image. Am I correct? Another question: what should be the input to the network? What is meant by this parameter?
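The stride mapping described above can be checked with a few lines. This is a rough sketch of my own (not code from either repo): each (x, y) position on the conv feature map corresponds to a point 16 pixels apart in the input image, and K anchors are centered at each such point.

```python
import numpy as np

feat_w, feat_h, stride = 4, 3, 16   # tiny feature map, for illustration only
shift_x = np.arange(feat_w) * stride
shift_y = np.arange(feat_h) * stride
xs, ys = np.meshgrid(shift_x, shift_y)
centers = np.stack([xs.ravel(), ys.ravel()], axis=1)
print(centers)   # moving one feature-map cell moves 16 pixels in the image
```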
@shamanez I guess it works like this: you resize the input image keeping its original aspect ratio, making sure it has a minimum dimension (height/width) of 600 pixels and a maximum dimension (width/height) of 1024 pixels. (Although I have not checked the code in the TensorFlow version, so I am not completely sure.) For example, if you have an image of size 400x512, it will be resized to 600x768; if the original image size is 100x512, the code will resize it to 200x1024. Something like that...
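That resizing rule can be written down directly. Here is a small sketch (my own illustration, assuming the min/max-dimension behavior described above) that reproduces both examples:

```python
def keep_aspect_resize(h, w, min_dim=600, max_dim=1024):
    """Scale so the short side reaches min_dim, unless that would push
    the long side past max_dim, in which case the long side is capped."""
    scale = min_dim / min(h, w)
    if scale * max(h, w) > max_dim:
        scale = max_dim / max(h, w)
    return round(h * scale), round(w * scale)

print(keep_aspect_resize(400, 512))   # -> (600, 768)
print(keep_aspect_resize(100, 512))   # -> (200, 1024)
```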
@zhenni In the paper they say they have squared scales of 128, 256, and 512. So at first I thought the scales were fractions of 256. But here they have four, so can you please elaborate on this?
@shamanez Please check the code in the repo. You can use anchor sizes different from the ones the paper describes. Using smaller anchors you can detect smaller objects in the pictures.
@zhenni I actually went through the whole repo. From what I understood, it takes a variable-size (in spatial dimensions) input image bounded by the given aspect ratio, then performs the convolutional part to get the feature maps, from which the RPN scores are computed. I also went through the image resize function, and you are correct: it keeps the aspect ratio of any image while keeping it bound to that range, and it uses bilinear interpolation in order to reduce distortion.
@zhenni This is what the TF repo says about the resize function 💯
Hi, my input image is 1920x1080. What do I change to adapt the anchors to it?
@gentlebreeze1 You can modify the function generate_anchors in lib/rpn/generate_anchors.py |
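For example (a hedged sketch; the exact defaults in your copy may differ, and the import path assumes lib/ is on your PYTHONPATH), you can pass different scales or ratios to generate_anchors so that smaller objects in a 1920x1080 frame get matching anchors:

```python
import numpy as np
from rpn.generate_anchors import generate_anchors  # lib/rpn/generate_anchors.py

# Default: scales 2**(3..5) = (8, 16, 32) on a 16-pixel base -> 128/256/512 boxes.
# Adding a smaller scale (4 -> 64x64 boxes) can help with small objects;
# the values here are an illustration, not a recommendation.
anchors = generate_anchors(base_size=16,
                           ratios=[0.5, 1, 2],
                           scales=np.array([4, 8, 16, 32]))
print(anchors.shape)   # (12, 4): 3 ratios x 4 scales
```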
@zhenni Hi zhenni, I was looking at this function, specifically at base_size. Originally it was 16 in py-faster-rcnn, but in the TF version it is 256 by default. I was wondering: say we don't filter small objects (by setting RPN_MIN_SIZE: 0), do we also change base_size to 0? But then I see that base_size is used to create an array...
@loackerc |
@shamanez Anchor scales and aspect ratios are explained in the Faster R-CNN paper. Check it out first, especially the experiments on the MS COCO dataset. Here is the key quote:
So they use 4 scales in the case of the MS COCO dataset, since it contains many small objects.
I'm confused about something; maybe someone could help me. I'm currently working through https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10, but the tutorial says the bounding boxes should be at least 33x33 pixels. I don't know why, and I would like to, because my bounding boxes are smaller than that. I tried checking some code related to anchor generation and found the same 16x16 anchor stride discussed here. I think it is somehow related to the 33x33 minimum bbox size, but I don't know how. Thanks.
In "Faster R-CNN" there is a picture as follows. Causer I don't understand the "Anchors" much. How can the 256-D features foward propagate from the "intermediate layer" to "cls layer" and "reg layer"? According to the network describtion that I saw in caffe, The propcess from sliding window to intermediate layer for generating the (2+4) x k output achieved by two kinds of convolution layers. So what do the anchors work for?
![20160311102155](https://cloud.githubusercontent.com/assets/12611573/13691356/1d593ecc-e773-11e5-8fdd-028036e48de6.png)
Because, according to the text description, 9 anchors are obtained from each sliding window. Our conjecture is that the parameters between the intermediate layer and the cls layer, and between the intermediate layer and the reg layer, are fixed so that a specific portion of the outputs handles the classification and regression for each window shape, but this is only our guess.
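For what it's worth, the cls and reg layers are 1x1 convolutions over the 256-d intermediate feature, so the anchors never appear in the forward pass; they only fix the interpretation of the 2k and 4k output channels. A shape-only sketch of my own (an illustration, not the actual Caffe prototxt):

```python
import numpy as np

k = 9                                  # anchors per position
H, W = 38, 50                          # example conv5 feature-map size
feat = np.random.randn(256, H, W)      # intermediate layer output

w_cls = np.random.randn(2 * k, 256)    # a 1x1 conv == per-position matmul
w_reg = np.random.randn(4 * k, 256)

cls_scores = np.einsum('oc,chw->ohw', w_cls, feat)   # (18, H, W)
bbox_deltas = np.einsum('oc,chw->ohw', w_reg, feat)  # (36, H, W)

# channels 4*i..4*i+3 of bbox_deltas are (dx, dy, dw, dh) for anchor i;
# the anchor boxes themselves enter only when these deltas are decoded.
print(cls_scores.shape, bbox_deltas.shape)
```

In that sense the conjecture is roughly right: the weights are shared across positions, and each anchor shape gets its own dedicated group of output channels.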
As for Table 1, we don't really understand its meaning either. How do the scales and aspect ratios of the anchors given in the first row correspond to the generated proposal sizes given in the second row?
![20160311102304](https://cloud.githubusercontent.com/assets/12611573/13691380/5462429c-e773-11e5-97e1-c15e47b1527f.png)