Objects Coordinate input #114
Comments
@chenxwh Hi,
Hi @logicwong, thanks for the reply. I see in the colab, the checkpoint is Although regarding the bounding box, in the paper it says It sounds like, for each image, a list of objects with the corresponding bounding boxes is taken as input? I wonder how the bounding boxes are used as input, and whether you could point me to the relevant code, since I see most tasks only take patched images as input. Thank you in advance!
@chenxwh
@logicwong Hi,
Hi @logicwong, thank you for the clarification, although I think in VQA the Another question about the code, not really related to coordinate input: I see there is Thank you!
@chenxwh Oh... You are right, in VQA the
Hi,
Congratulations on the ICML acceptance!
I would like to feed the model several sets of coordinates along with the input image and ask a question about the objects those coordinates specify, for example:
what are person1 (corresponding to coord1) and person2 (corresponding to coord2) doing?
Is it possible for OFA to attend to objects using this coordinate information? If so, what would be the best input format?
Having read the paper, I think the grounded captioning pre-training task is the most relevant, but I don't see such examples in pretrain_data_examples, so it is still not clear what the best practice is for feeding the model multiple sets of coordinates in one example.
I also failed to reproduce the results shown in Figure 10 of the Appendix (grounded question answering). Which model was used for these? And is the input exactly in the format shown under the images, e.g.
what color is the car in the region? region: <loc301> <loc495> <loc501> <loc596>
I assume 301, 495, 501, 596 are the x1, y1, x2, y2 coordinates? I tried asking questions about regions this way on custom images, but the model does not seem to focus on the region provided.
Thanks!
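For reference, here is a minimal sketch of how a prompt in the format quoted from Figure 10 could be built from a pixel-space box. It is not taken from the OFA code. The helper names `box_to_loc_tokens` and `build_region_question`, the default of 1000 quantization bins, and the `<locN>` token spelling are all assumptions for illustration. The token format is a parameter because the released vocabulary may not spell location tokens the way the paper prints them, and the image size passed in may need to be the size the model actually sees after preprocessing.

```python
from typing import Sequence


def box_to_loc_tokens(
    box: Sequence[float],
    image_width: float,
    image_height: float,
    num_bins: int = 1000,          # assumption: number of quantization bins
    token_fmt: str = "<loc{}>",    # assumption: token spelling as printed in the paper
) -> str:
    """Quantize a pixel-space box (x1, y1, x2, y2) into four location tokens."""
    x1, y1, x2, y2 = box
    # Normalise each coordinate by the matching image dimension.
    normalized = (
        x1 / image_width,
        y1 / image_height,
        x2 / image_width,
        y2 / image_height,
    )
    tokens = []
    for value in normalized:
        value = min(max(value, 0.0), 1.0)          # clamp boxes that spill outside the image
        index = int(round(value * (num_bins - 1)))  # map [0, 1] to a bin index in [0, num_bins - 1]
        tokens.append(token_fmt.format(index))
    return " ".join(tokens)


def build_region_question(
    question: str,
    box: Sequence[float],
    image_width: float,
    image_height: float,
) -> str:
    """Append a 'region: ...' suffix in the layout quoted from Figure 10."""
    return f"{question} region: {box_to_loc_tokens(box, image_width, image_height)}"


if __name__ == "__main__":
    # Hypothetical 640x480 image with a box around the object of interest.
    print(build_region_question(
        "what color is the car in the region?", (120, 200, 400, 430), 640, 480
    ))
```

Whether the model attends to the region will still depend on which checkpoint is loaded and on whether that checkpoint was trained with prompts of this shape. The sketch only covers the string formatting.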