@Manojbhat09 it's a common practice in NLP where you have one token that pools information from the rest of the tokens through rounds of attention, usually to classify the sentence at the end. Whether it is strictly necessary for ViT to work is up for debate. My take is that it isn't that important: you could mean-pool all the embeddings from the last layer instead and probably still get great results at scale.
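To make the two options concrete, here is a minimal sketch (not the repo's actual code) contrasting classification from a [CLS] token against mean-pooling the patch tokens, assuming a last-layer output of shape `(batch, num_tokens, dim)` with the [CLS] token at index 0:

```python
import torch
import torch.nn as nn

# Hypothetical last-layer ViT output: (batch, num_tokens, dim),
# where token 0 is the [CLS] token and the rest are patch tokens.
batch, num_tokens, dim, num_classes = 2, 65, 128, 10
tokens = torch.randn(batch, num_tokens, dim)

head = nn.Linear(dim, num_classes)

# Option 1: classify from the [CLS] token alone.
cls_logits = head(tokens[:, 0])

# Option 2: mean-pool all patch tokens, then classify the pooled vector.
pooled = tokens[:, 1:].mean(dim=1)
pool_logits = head(pooled)

print(cls_logits.shape, pool_logits.shape)  # both: torch.Size([2, 10])
```

Both heads produce logits of the same shape; the difference is only in how the sequence of token embeddings is reduced to a single vector before the linear classifier.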
Really appreciate your work.
Question: as in the issue title.