You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi!
You said that "In the encoder-decoder attention module, the target query can attend to all positions on the template and the search region features, thus learning robust representations for the final bounding box prediction." in your paper. How to understand that? It's really abstract for me.
Thanks for your reply!
The text was updated successfully, but these errors were encountered:
@ANdong-star Hi, this process is quite similar to that in the DETR decoder. In DETR, 100 object queries interact with the image features output by the encoder. In STARK, one target query interacts with the joint template-search features to extract the target information. Finally the box prediction head integrate the output of the encoder and the decoder to predict the final box results.
@ANdong-star Hi, this process is quite similar to that in the DETR decoder. In DETR, 100 object queries interact with the image features output by the encoder. In STARK, one target query interacts with the joint template-search features to extract the target information. Finally the box prediction head integrate the output of the encoder and the decoder to predict the final box results.
Hi!
You said that "In the encoder-decoder attention module, the target query can attend to all positions on the template and the search region features, thus learning robust representations for the final bounding box prediction." in your paper. How to understand that? It's really abstract for me.
Thanks for your reply!
The text was updated successfully, but these errors were encountered: