In the loss computation process in [ALBEF](https://github.com/salesforce/ALBEF/blob/b9727e43c3040491774d1b22cc27718aa7772fac/models/model_pretrain.py#L103C3-L103C3), the computation is a little different from the original paper. Let's take `loss_i2t` as an example. This loss should be

$$
\mathcal{L}_{\text{i2t}} \;=\; (1-\alpha)\,H\!\left(y,\; p^{\text{i2t}}(I)\right) \;+\; \alpha\,\mathrm{KL}\!\left(q^{\text{i2t}}(I)\,\big\|\, p^{\text{i2t}}(I)\right).
$$

In the linked code, `image_feat_m` is $I_m \in \mathbb{R}^{n\times d}$, `text_feat_all` is $T_a \in \mathbb{R}^{d\times(n+n_q)}$, `sim_targets` is denoted $y \in \mathbb{R}^{n\times(n+n_q)}$, $p^{\text{i2t}}(I)=\operatorname{softmax}(S(I,T_a))$, and $q^{\text{i2t}}(I)=\operatorname{softmax}(S(I_m,T_a))$, where the subscript $m$ denotes the momentum encoder. Suppose that $n =$ `batch_size` $= 2$ and `queue_size` $= 2$, so $n_q = 2 \times 2 = 4$.
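For concreteness, here is a minimal, self-contained sketch of the gap being described. The feature values, sizes, `temp`, and `alpha` below are made up for illustration and are not taken from the repo; only the variable names mirror the linked code. It contrasts what the code computes for `loss_i2t` with the formula above, and checks that the two differ exactly by $\alpha$ times the entropy of $q$.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, n_q, alpha, temp = 2, 256, 4, 0.4, 0.07    # hypothetical sizes / hyper-parameters

image_feat    = F.normalize(torch.randn(n, d), dim=-1)        # online image features, I
image_feat_m  = F.normalize(torch.randn(n, d), dim=-1)        # momentum image features, I_m
text_feat_all = F.normalize(torch.randn(d, n + n_q), dim=0)   # stand-in for momentum text feats + queue, T_a

sim_i2t   = image_feat   @ text_feat_all / temp   # S(I,   T_a)
sim_i2t_m = image_feat_m @ text_feat_all / temp   # S(I_m, T_a)

sim_targets = torch.zeros(n, n + n_q)
sim_targets.fill_diagonal_(1)                     # one-hot targets y

log_p = F.log_softmax(sim_i2t,   dim=1)           # log p^{i2t}(I)
log_q = F.log_softmax(sim_i2t_m, dim=1)           # log q^{i2t}(I)
q = log_q.exp()

# What the code computes: cross-entropy against the mixed soft targets
soft_targets = alpha * q + (1 - alpha) * sim_targets
loss_code = -torch.sum(soft_targets * log_p, dim=1).mean()

# What the paper writes: (1 - alpha) * H(y, p) + alpha * KL(q || p)
loss_paper = (-(1 - alpha) * torch.sum(sim_targets * log_p, dim=1)
              + alpha * torch.sum(q * (log_q - log_p), dim=1)).mean()

# The gap is exactly alpha * H(q), the "self-entropy" term discussed below.
entropy_q = -torch.sum(q * log_q, dim=1).mean()
print(torch.allclose(loss_paper + alpha * entropy_q, loss_code, atol=1e-6))  # True
```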
The $\alpha$-weighted term in the code is therefore a cross-entropy between $q$ and $p$ rather than a KL divergence, i.e., a self-entropy term is lost. So, does this affect the performance of ALBEF? I think it should be a good regularization term.
Add the negative entropy of `sim_i2t_m` to the total loss, multiplied by the coefficient $\alpha$, and you will get the KL divergence of $q$ and $p$. Some equations for reference:
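Presumably the equations being referred to are the standard decomposition of cross-entropy into KL divergence plus entropy, restated here in the notation above:

$$
\mathrm{KL}\!\left(q \,\|\, p\right) \;=\; \sum_j q_j \log \frac{q_j}{p_j} \;=\; \underbrace{-\sum_j q_j \log p_j}_{\text{the }\alpha\text{-weighted term in the code}} \;+\; \underbrace{\sum_j q_j \log q_j}_{\text{negative entropy of } q}
$$

Under that reading, the change is roughly a one-liner. This is only a sketch: `loss_i2t`, `sim_i2t_m`, and `alpha` are names from the linked code, and the symmetric `loss_t2i` term would need the same treatment.

```python
import torch
import torch.nn.functional as F  # already imported in model_pretrain.py

# Negative entropy of q = softmax(sim_i2t_m); adding alpha * this turns the
# alpha-weighted cross-entropy H(q, p) into a proper KL(q || p).
neg_entropy_i2t = torch.sum(
    F.softmax(sim_i2t_m, dim=1) * F.log_softmax(sim_i2t_m, dim=1), dim=1
).mean()
loss_i2t = loss_i2t + alpha * neg_entropy_i2t
```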