Question on the definition of visually "ungrounded" categories #14

SCZwangxiao · 2022-11-22T09:36:46Z

I agree that some categories may not provide enough aligned vision-language information for multi-modal learning. However, in the paper, you mentioned "video game commentaries" as an example.

I wonder why this category is considered visually ungrounded. The commentators' remarks are usually related to the game being shown. In my opinion, the only reason to filter this category would be that it is not real-world footage, which means it may not benefit downstream tasks.
