[Question] About the bounding box of the VG data #606
Comments
It seems the range is [0, 1].
OK, thanks, I'll try visualizing it.
That seems wrong; can you provide your example?
Is it possible the normalization is based on the image shape 336x336, since the CLI inference returns a 336x336 image tensor? Does that sound more reasonable? I have tried several samples and it seems to work.
I did some serious digging and found out it is indeed based on the image shape 336x336 (or any square), but it is NOT as simple as resizing: the shorter edge is padded up to the longer one, with `processor.image_mean` filled into the padding area. You can refer to the code here in train. As for the four floats, you can use this case to test:
Simply resizing would make the numbers in this region incorrect; after padding they are correct.
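To make the pad-then-normalize behavior concrete, here is a minimal sketch. It is not the repository's actual preprocessing code: the function names are hypothetical, the fill value stands in for `processor.image_mean`, and the padding is placed at the top-left corner, whereas the real implementation may center the image.

```python
import numpy as np

def pad_to_square(image: np.ndarray, fill_value) -> np.ndarray:
    """Pad an HxWxC image to a square whose side is the longer edge,
    filling the new area with fill_value (e.g. the processor's image_mean)."""
    h, w, c = image.shape
    side = max(h, w)
    padded = np.empty((side, side, c), dtype=image.dtype)
    padded[:] = fill_value          # mean-colored padding everywhere
    padded[:h, :w] = image          # original pixels in the top-left corner
    return padded

def normalize_box(box, orig_h, orig_w):
    """Convert an [x, y, w, h] pixel box on the original image into
    [0, 1] fractions of the padded square's side length."""
    side = max(orig_h, orig_w)
    return [v / side for v in box]
```

Note that dividing by the padded side (not the original width/height) is exactly why a plain resize gives wrong boxes: resizing stretches the two axes by different factors, while padding keeps a single scale for both.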
Thank you @Maxlinn, your understanding is correct. We will clarify this in our revised paper.
Hello, I also encountered the same problem. Does this mean the coordinate descriptions in the fine-tuning dataset are based on a resolution of 336x336?
The answer is yes if you use a 336px CLIP; if you use a 224px CLIP, it is 224px. The coordinates in the instructions are percentages (between 0 and 1) on the padded square image.
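For visualization, the percentages need to be mapped back to pixels on the original image. A self-contained sketch, assuming (as discussed above) that the padding extends the shorter edge and the original image sits at the top-left; the function name is hypothetical, and if the actual implementation centers the image, an offset of half the padding must also be subtracted.

```python
def denormalize_box(norm_box, orig_h, orig_w):
    """Map a [0, 1] [x, y, w, h] box defined on the padded square
    back to pixel coordinates on the original (unpadded) image.
    Assumes top-left placement, so only a uniform scale is needed."""
    side = max(orig_h, orig_w)  # the padded square's side in pixels
    return [v * side for v in norm_box]
```

For example, on a 480x640 image the square side is 640, so a normalized box [0.25, 0.25, 0.5, 0.5] maps to [160, 160, 320, 320] in original pixels.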
Thank you!! |
Question
Hi there. I'm wondering whether the bounding box [x, y, w, h] values in the VG data have been modified because of the resizing in the training process? Could you please elaborate on this detail? Thanks a lot!