The problems about anchors ? #112

Open
JohnnyY8 opened this issue Mar 11, 2016 · 20 comments

Comments
@JohnnyY8

In "Faster R-CNN" there is a picture as follows. Causer I don't understand the "Anchors" much. How can the 256-D features foward propagate from the "intermediate layer" to "cls layer" and "reg layer"? According to the network describtion that I saw in caffe, The propcess from sliding window to intermediate layer for generating the (2+4) x k output achieved by two kinds of convolution layers. So what do the anchors work for?
Because, according to the text description, 9 anchors are got from each sliding window. Our conjecture is that the parameters between intermediate layer and cls layer or the parameters between intermediate layer and reg layer is fixed in order to achieve a specific portion of the extraction window to do the classification and regression, but this is only our guess.
[screenshot: RPN figure from the Faster R-CNN paper]

For Table 1, we don't really understand the meaning either. How do the scales and aspect ratios of the anchors given in the first row correspond to the generated proposal sizes given in the second row?
[screenshot: Table 1 from the Faster R-CNN paper]

@happyharrycn

Anchors are sampled local windows with different scales / aspect ratios. The Region Proposal Network (RPN) classifies each local anchor (window) as either foreground or background, and regresses ("stretches") the foreground windows to fit the ground-truth object bounding boxes. The implementation details of how the anchors are defined can be found in ./lib/rpn/generate_anchors.py.
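To make the "stretches the foreground windows" part concrete, here is a minimal sketch of how the reg layer's four outputs (dx, dy, dw, dh) are applied to an anchor. This is my own simplification, not the repo's code (see ./lib/fast_rcnn/bbox_transform.py for the real thing); the off-by-one conventions may differ slightly.

    import numpy as np

    def apply_deltas(anchor, deltas):
        """Apply predicted (dx, dy, dw, dh) to one anchor given as [x1, y1, x2, y2]."""
        w = anchor[2] - anchor[0] + 1.0
        h = anchor[3] - anchor[1] + 1.0
        ctr_x = anchor[0] + 0.5 * (w - 1.0)
        ctr_y = anchor[1] + 0.5 * (h - 1.0)

        dx, dy, dw, dh = deltas
        pred_ctr_x = dx * w + ctr_x      # shift the center by a fraction of the anchor size
        pred_ctr_y = dy * h + ctr_y
        pred_w = np.exp(dw) * w          # scale the width/height exponentially
        pred_h = np.exp(dh) * h

        return np.array([pred_ctr_x - 0.5 * (pred_w - 1.0),
                         pred_ctr_y - 0.5 * (pred_h - 1.0),
                         pred_ctr_x + 0.5 * (pred_w - 1.0),
                         pred_ctr_y + 0.5 * (pred_h - 1.0)])

    # With all-zero deltas the anchor comes back unchanged:
    print(apply_deltas(np.array([0., 0., 15., 15.]), np.array([0., 0., 0., 0.])))
    # [  0.   0.  15.  15.]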

@JohnnyY8
Author

Hello @happyharrycn: I have looked at the source code in ./lib/rpn/generate_anchors.py. It shows how the anchors are generated, and I have some questions about that. Does the 16 x 16 base anchor correspond to the feature maps of the last convolutional layer?
In addition, I noticed that anchor_target_layer.py is used during training and proposal_target_layer.py is used during testing. I do not understand these two layers. My guess is that training makes the weights fit the different scales and aspect ratios, so anchor_target_layer.py is not needed at test time?

@happyharrycn

The 16 x 16 case comes from the down-sampling factor at conv5. This number also controls the scale. anchor_target_layer is used to match ground-truth object boxes to proposals and to generate training labels for the proposals, which are then needed by the loss function. It is therefore not required during testing.
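For anyone else reading along, here is a much-simplified sketch of the labeling that anchor_target_layer performs. The real layer (./lib/rpn/anchor_target_layer.py) also drops anchors that fall outside the image, subsamples a fixed-size batch, and computes the regression targets; the 0.7 / 0.3 thresholds below are the paper's defaults.

    import numpy as np

    def iou(anchor, gt):
        """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
        ix1, iy1 = max(anchor[0], gt[0]), max(anchor[1], gt[1])
        ix2, iy2 = min(anchor[2], gt[2]), min(anchor[3], gt[3])
        iw, ih = max(ix2 - ix1 + 1, 0), max(iy2 - iy1 + 1, 0)
        inter = iw * ih
        area_a = (anchor[2] - anchor[0] + 1) * (anchor[3] - anchor[1] + 1)
        area_g = (gt[2] - gt[0] + 1) * (gt[3] - gt[1] + 1)
        return inter / float(area_a + area_g - inter)

    def assign_labels(anchors, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
        """1 = foreground, 0 = background, -1 = ignored (excluded from the loss)."""
        overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
        max_overlap = overlaps.max(axis=1)
        labels = -np.ones(len(anchors), dtype=int)
        labels[max_overlap < bg_thresh] = 0       # clearly background
        labels[max_overlap >= fg_thresh] = 1      # clearly foreground
        labels[overlaps.argmax(axis=0)] = 1       # best anchor for each gt box is always positive
        return labels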

@JohnnyY8
Author

Yes, as you said, anchor_target_layer is not required during testing.
But I still do not understand what the anchors do. In Table 1, does 128^2 mean 128 x 128 pixels in an anchor? I think the step from sliding windows to proposals is achieved by two convolutional layers, so how do the anchors affect the proposals? Could you please give me some details about this part of the process?
Thank you! @happyharrycn

@wangfeng1981

wangfeng1981 commented Jul 23, 2017

I am confused about why an anchor of 128x128 produces a proposal of 188x111.

@zhenni

zhenni commented Jul 24, 2017

@wangfeng1981

In ./lib/rpn/generate_anchors.py, you can see that the 128x128 anchor with ratio 2:1 is [-83 -39 100 56], whose size is 184x96. Also, the number JohnnyY8 posted is an average size over the proposals, so it will differ somewhat from the actual anchor sizes. You can also change the anchor sizes if your objects have unusual sizes, which may give you more accurate results. (A quick check of these sizes is sketched after the anchor list below.)

@JohnnyY8

Although anchor_target_layer is not needed at test time, proposal_layer still does its part during testing: it applies the classification scores and regression targets from the prediction layers to the anchors to produce the proposals.

In ./lib/rpn/generate_anchors.py, they show the anchors as follows.

#    anchors =
#
#       -83   -39   100    56
#      -175   -87   192   104
#      -359  -183   376   200
#       -55   -55    72    72
#      -119  -119   136   136
#      -247  -247   264   264
#       -35   -79    52    96
#       -79  -167    96   184
#      -167  -343   184   360
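If you want to check these sizes yourself, here is a quick sketch (assuming ./lib/rpn is on your PYTHONPATH; the Python output's corner coordinates may be shifted by one pixel relative to the MATLAB table above because of 0- vs 1-based indexing, but the widths and heights are the same):

    import numpy as np
    from generate_anchors import generate_anchors   # ./lib/rpn/generate_anchors.py

    anchors = generate_anchors()   # defaults: base_size=16, ratios=[0.5, 1, 2], scales=[8, 16, 32]
    widths  = anchors[:, 2] - anchors[:, 0] + 1
    heights = anchors[:, 3] - anchors[:, 1] + 1
    for w, h in zip(widths, heights):
        # the "128^2, 2:1" anchor mentioned above shows up as 184 x 96
        print('%4d x %4d  (area ~ %d^2, w:h ~ %.2f)' % (w, h, round(np.sqrt(w * h)), w / float(h)))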

@shamanez

shamanez commented Jul 24, 2017

@zhenni

What are the **height_stride** and **width_stride** parameters in the anchor generator?

Here

@zhenni

zhenni commented Jul 24, 2017

@shamanez
The anchors behave like sliding windows. The strides are the distances between the two adjacent anchors/sliding windows vertically and horizontally.

For example, see how it works in lib/rpn/proposal_layer.py (where _feat_stride does not distinguish between the horizontal and vertical strides):

        anchor_scales = layer_params.get('scales', cfg.ANCHOR_SCALES)
        self._anchors = generate_anchors(scales=np.array(anchor_scales))
        # 1. Generate proposals from bbox deltas and shifted anchors
        height, width = scores.shape[-2:]

        # Enumerate all shifts
        shift_x = np.arange(0, width) * self._feat_stride
        shift_y = np.arange(0, height) * self._feat_stride
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                            shift_x.ravel(), shift_y.ravel())).transpose()

        # Enumerate all shifted anchors:
        #
        # add A anchors (1, A, 4) to
        # cell K shifts (K, 1, 4) to get
        # shift anchors (K, A, 4)
        # reshape to (K*A, 4) shifted anchors
        A = self._num_anchors
        K = shifts.shape[0]
        anchors = self._anchors.reshape((1, A, 4)) + \
                  shifts.reshape((1, K, 4)).transpose((1, 0, 2))
        anchors = anchors.reshape((K * A, 4))
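To make the stride concrete, here is the same shift enumeration run standalone on a toy 3x2 (height x width) feature map with _feat_stride = 16; the printed values are what I expect numpy to produce.

    import numpy as np

    feat_stride, height, width = 16, 3, 2
    shift_x = np.arange(0, width) * feat_stride       # [0, 16]
    shift_y = np.arange(0, height) * feat_stride      # [0, 16, 32]
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                        shift_x.ravel(), shift_y.ravel())).transpose()
    print(shifts)
    # [[ 0  0  0  0]
    #  [16  0 16  0]
    #  [ 0 16  0 16]
    #  [16 16 16 16]
    #  [ 0 32  0 32]
    #  [16 32 16 32]]
    # Each row is one feature-map cell mapped to image coordinates; with A = 9
    # anchors per cell this toy map yields K * A = 6 * 9 = 54 anchors.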

@shamanez

shamanez commented Jul 24, 2017

@zhenni

So is it like this? The RPN uses the final conv layer to propose bounding boxes. More specifically, we slide a small window (3x3) over the last conv layer, and for each window position we take K (9) anchors centered at that position.

Now we have to map the center pixel position on the conv feature map back to the real image; that is what the parameter first_stage_features_stride is for.

That means if we move one position in the conv feature map, it corresponds to 16 pixels (the stride) in the real image. Am I correct?

Another question: what should be the input to the network? What is meant by this parameter?

@zhenni

zhenni commented Jul 24, 2017

@shamanez
Yeah, I think you are right.

I believe it means that the input image is resized while keeping its original aspect ratio, so that it has a minimum dimension (height/width) of 600 pixels and a maximum dimension (width/height) of 1024 pixels. (I have not checked the code in the TensorFlow version, so I am not completely sure.) For example, if you have an image of size 400x512, it will be resized to 600x768; if the original image size is 100x512, the code will resize it to 200x1024. Something like this...
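That resizing rule, written out as a small standalone function (my own sketch of the behavior described above, not the TensorFlow implementation), reproduces both examples:

    def keep_aspect_resize(height, width, min_dim=600, max_dim=1024):
        """Scale so the shorter side becomes min_dim, unless that would push the
        longer side past max_dim, in which case scale so the longer side is max_dim."""
        scale = float(min_dim) / min(height, width)
        if round(scale * max(height, width)) > max_dim:
            scale = float(max_dim) / max(height, width)
        return int(round(height * scale)), int(round(width * scale))

    print(keep_aspect_resize(400, 512))   # (600, 768)
    print(keep_aspect_resize(100, 512))   # (200, 1024)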

@shamanez

shamanez commented Jul 29, 2017

@zhenni
What are these scale parameters in faster_rcnn_resnet101_pets.config?

In the paper they say they use anchor areas of 128^2, 256^2 and 512^2, so at first I thought the scales were expressed as fractions of 256. But here there are four of them, so could you please elaborate on this?

@zhenni

zhenni commented Jul 31, 2017

@shamanez please check the code in grid_anchor_generator.py link

You can use anchor sizes different from what the paper describes. With smaller anchors you can detect smaller objects in the images.

    Args:
      scales: a list of (float) scales, default=(0.5, 1.0, 2.0)
      aspect_ratios: a list of (float) aspect ratios, default=(0.5, 1.0, 2.0)
      base_anchor_size: base anchor size as height, width (
                        (length-2 float32 list, default=[256, 256])
      anchor_stride: difference in centers between base anchors for adjacent
                     grid positions (length-2 float32 list, default=[16, 16])
      anchor_offset: center of the anchor with scale and aspect ratio 1 for the
                     upper left element of the grid, this should be zero for
                     feature networks with only VALID padding and even receptive
                     field size, but may need additional calculation if other
                     padding is used (length-2 float32 tensor, default=[0, 0])
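For intuition on how scales and aspect_ratios combine: with the usual convention that aspect_ratio = width / height and that an anchor with scale s keeps an area of (s * base)^2, the height and width come out as in the sketch below. This is only the general convention, not the exact grid_anchor_generator.py code, so check that file for the precise formula it uses.

    import math

    def anchor_hw(base=256.0, scale=1.0, aspect_ratio=1.0):
        """Height and width of one anchor, assuming aspect_ratio = width / height
        and an area of (scale * base)**2."""
        height = scale * base / math.sqrt(aspect_ratio)
        width = scale * base * math.sqrt(aspect_ratio)
        return height, width

    for s in (0.5, 1.0, 2.0):
        for ar in (0.5, 1.0, 2.0):
            h, w = anchor_hw(scale=s, aspect_ratio=ar)
            print('scale=%.2f  ratio=%.1f  ->  %4.0f x %4.0f' % (s, ar, h, w))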

@shamanez

shamanez commented Aug 1, 2017

@zhenni I actually went through the whole repo. From what I understood, it takes a variable-size (in spatial dimensions) input image, bounded by the given aspect-ratio-preserving range. Then it performs the convolutional part to get the feature maps. From the feature maps, in order to get scores from the RPN or ...

I also went through the image resize function, and you are correct. It keeps the aspect ratio of any image while keeping it bounded to that range, and it uses bilinear interpolation in order to reduce distortion.

@shamanez

shamanez commented Aug 1, 2017

@zhenni This is what the TF repo says about the resize function 💯

  1. If the image can be rescaled so its minimum dimension is equal to the
    provided value without the other dimension exceeding max_dimension,
    then do so.
  2. Otherwise, resize so the largest dimension is equal to max_dimension.

@gentlebreeze1

Hi, my input image is 1920x1080. I changed
__C.TEST.SCALES = (1080,)
__C.TEST.MAX_SIZE = 1920
How do I change the anchor size in grid_anchor_generator.py? @zhenni

@zhenni

zhenni commented Jan 9, 2018

@gentlebreeze1 You can modify the function generate_anchors in lib/rpn/generate_anchors.py
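For example, in py-faster-rcnn you could pass a different set of scales; this is a hypothetical sketch (the defaults are scales = [8, 16, 32] on a base_size of 16, i.e. 128-, 256- and 512-pixel anchors). If you change the number of anchors A, remember that the RPN cls/reg layers' output channels (2*A and 4*A) and the layer's 'scales' parameter have to be changed to match.

    import numpy as np
    from generate_anchors import generate_anchors   # ./lib/rpn must be on PYTHONPATH

    # Hypothetical: add a smaller 64-pixel scale (4 * 16) on top of the defaults.
    anchors = generate_anchors(base_size=16,
                               ratios=[0.5, 1, 2],
                               scales=np.array([4, 8, 16, 32]))
    print(anchors.shape)   # (12, 4): 3 ratios x 4 scales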

@ghost

ghost commented Mar 11, 2018

@zhenni Hi zhenni, I was looking at this function, specifically base_size. Originally it was 16 in py-faster-rcnn, but in the TF version it defaults to 256. I was wondering: if we don't filter out small objects (by setting RPN_MIN_SIZE: 0), should we change base_size to 0? But then I see base_size is used to create an array:
base_anchor = np.array([1, 1, base_size, base_size]) - 1

@zhenni

zhenni commented Mar 27, 2018

@loackerc
Hi loackerc, I am not sure I understand the question.
I think the anchor sizes might need to be changed according to your data.
As for the array that base_size creates: base_anchor holds the top-left and bottom-right corners of the box, i.e. [left-top-x, left-top-y, right-bottom-x, right-bottom-y]. For example, base_size = 16 gives base_anchor = [0, 0, 15, 15].

@myagmur01

myagmur01 commented Sep 6, 2018

@shamanez Anchor scales and aspect ratios are explained in the Faster R-CNN paper. Check it out first, especially the experiments on the MS COCO dataset. Here is the key quote:

For the anchors, we use 3 aspect ratios and 4 scales
(adding 64), mainly motivated by handling small objects on this dataset.

So it is clear that they use 4 scales in the case of MS COCO because it contains many small objects.
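As a quick sanity check (assuming the pets config lists scales 0.25, 0.5, 1.0 and 2.0 and keeps the default base_anchor_size of 256), the four scales map directly onto the paper's four anchor areas:

    base = 256
    for scale in (0.25, 0.5, 1.0, 2.0):
        print('%.2f * %d = %g' % (scale, base, scale * base))
    # 64, 128, 256, 512  ->  the 64^2, 128^2, 256^2 and 512^2 anchors from the paper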

@BussLightYear

BussLightYear commented Jan 30, 2019

I'm confused about something; maybe someone can help me. I'm currently working through https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10, and the tutorial says the bounding boxes should be at least 33x33 pixels. I don't know why, and I would like to, because my bounding boxes are smaller than that. I checked some of the code related to anchor generation and found the same 16x16 anchor stride discussed here. I think it is somehow related to the 33x33 minimum bounding-box size, but I don't know how. Thanks.
