This repository has been archived by the owner on Jul 22, 2024. It is now read-only.
Hi Oscar Team,
I have read your excellent paper several times and am interested in the 'contrastive loss' it describes, but I can't find it in the source code.
(1) Specifically, I noticed that the model used in run_oscarplus_pretrained.py is BertImgForPreTraining, so I assume this is the model class used for pretraining. However, its code looks very similar to BERT: it gets sequence_output and pool_output from the encoder, then passes them to BertPreTrainingHeads to get prediction_scores and seq_relationship_score. The only difference seems to be that BertImgForPreTraining supports image input while BERT doesn't.
Since BERT only has a masked token loss and the two classes are so similar, I can't find where the contrastive loss is.
(2) If the output of BertImgForPreTraining is just like BERT's, it seems it could only handle language tasks. But it is a VLP model class, and since its training optimizes the contrastive loss by judging whether the object tags have been replaced, I think its output should reflect image-text alignment to some degree. I would like to know which output or model class I should use to measure this.
In the paper, you mention 'apply a fully-connected (FC) layer on the top of [CLS] as a binary classifier to predict whether the pair is polluted'. I can only find a binary classifier in ImageBertForSequenceClassification, but that class is used for Image-Text Retrieval and NLVR, not for pretraining, which puzzles me a lot.
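To make clear what I am looking for, here is a minimal plain-Python sketch of the loss as I understand the quoted sentence: a two-way FC layer on the [CLS] vector, trained with cross-entropy. The function name, the parameter names and the label convention (0 = matched, 1 = polluted) are my own assumptions for illustration and are not taken from the Oscar code. Since seq_relationship_score also has two logits computed from the pooled output, I wonder whether that head plays this role.

```python
import math

def binary_head_loss(cls_vector, weight, bias, label):
    """Cross-entropy loss of a 2-way FC classifier on the [CLS] vector.

    cls_vector: pooled [CLS] representation (list of floats)
    weight:     2 x hidden matrix (hypothetical FC layer parameters)
    bias:       list of 2 biases
    label:      0 = matched pair, 1 = polluted pair (assumed convention)
    """
    # FC layer: one logit per class.
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weight, bias)]
    # Numerically stable log-sum-exp over the two logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability of the true class.
    return -(logits[label] - log_z)
```

With this toy head, a [CLS] vector that scores class 0 highly gives a small loss for a matched pair and a large loss for a polluted one, which is the behaviour I expected to find somewhere in the pretraining code.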