Batch formation potentially causing false negatives #6

Closed
wpeebles opened this issue Jan 28, 2018 · 1 comment

wpeebles commented Jan 28, 2018

Since each image in MS-COCO has 5 captions, I believe that in data.py, when a batch is formed, it is possible for two or more of the images in that batch to be identical (they will just be paired with different captions). Since the ContrastiveLoss implementation assumes that only the diagonal of the scores matrix represents scores for aligned images and captions, doesn't this mean it is possible for images and captions that are aligned in the dataset to be treated as unaligned when computing and backpropagating the loss? Here is an example to illustrate this idea:

Consider a batch size of 128. Suppose the 5th and 19th images selected for the batch are identical (the 5th and 19th captions are different, but describe the same image). In the scores matrix in the forward method of ContrastiveLoss, the (5, 5) and (19, 19) entries will be correctly treated as scores for aligned embeddings. However, the (5, 19) and (19, 5) entries will be incorrectly treated as scores for unaligned embeddings.
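For reference, here is a minimal sketch of the kind of hinge loss I mean (my paraphrase of the idea, not the repository's exact code); note that only the diagonal is treated as aligned:

```python
import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """Sum-of-hinges contrastive loss over a batch of embeddings.

    Sketch for discussion, assuming `im` and `cap` are L2-normalized
    (batch, dim) tensors so that scores[i, j] is the cosine similarity
    of the i-th image and the j-th caption.
    """

    def __init__(self, margin=0.2):
        super().__init__()
        self.margin = margin

    def forward(self, im, cap):
        scores = im @ cap.t()             # (B, B) similarity matrix
        diag = scores.diag().view(-1, 1)  # assumed aligned pairs

        # cost_s[i, j]: caption c_j should score at least `margin` below
        # image i_i's own caption c_i.
        cost_s = (self.margin + scores - diag.expand_as(scores)).clamp(min=0)
        # cost_im[i, j]: image i_i should score at least `margin` below
        # caption c_j's own image i_j.
        cost_im = (self.margin + scores - diag.t().expand_as(scores)).clamp(min=0)

        # Remove the diagonal itself; every remaining entry is penalized
        # as a negative -- including (5, 19) and (19, 5) when rows 5 and
        # 19 happen to hold the same image.
        eye = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        return cost_s.masked_fill(eye, 0).sum() + cost_im.masked_fill(eye, 0).sum()
```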

Have I misunderstood anything in the code? If not, I believe this would affect the cost_s portion of ContrastiveLoss but not the cost_im portion.

fartashf (Owner) commented

This is a simplification that, in practice, does not hurt training at the scale of MS-COCO.

In particular, if you look at the loss over the whole mini-batch, the incorrect terms cancel out. In your example, the term for (i_5, c_19) says that image i_5 should be closer to c_5 than to c_19, while the term for (i_19, c_5), which in fact involves (i_5, c_5) since i_19 = i_5, says that i_5 should be closer to c_19 than to c_5. The gradients from these two terms are theoretically exact opposites and cancel out. This is true for both portions of the loss.
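A quick numerical check of this cancellation (a hypothetical test script, assuming both hinge terms are active, i.e. inside the margin, so neither is clamped to zero):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
margin = 0.2

i5 = torch.randn(8, requires_grad=True)   # i_19 is the same image as i_5
c5 = torch.randn(8, requires_grad=True)
c19 = torch.randn(8, requires_grad=True)

# The two opposing hinge terms from the example, with the clamp dropped
# (i.e. assuming both terms are active):
term_5_19 = margin + F.cosine_similarity(i5, c19, dim=0) \
                   - F.cosine_similarity(i5, c5, dim=0)
term_19_5 = margin + F.cosine_similarity(i5, c5, dim=0) \
                   - F.cosine_similarity(i5, c19, dim=0)

(term_5_19 + term_19_5).backward()

# The similarity terms cancel exactly, so every gradient is zero.
for g in (i5.grad, c5.grad, c19.grad):
    print(torch.allclose(g, torch.zeros_like(g)))  # True
```

When only one of the two hinges is active, the cancellation is no longer exact; hence "theoretically".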

Such a simplification would hurt, though, if the probability of sampling such opposing pairs were high. In that case, the gradient from one mini-batch would be accumulated over only a few effective examples, and hence the variance of the gradient estimate would be high.
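For a rough sense of how often this happens at MS-COCO scale (my own back-of-the-envelope estimate; the split sizes below are assumptions, not taken from this thread):

```python
import math

# Assumed: roughly the Karpathy training split of MS-COCO.
n_images = 113287
caps_per_image = 5
n_captions = n_images * caps_per_image
batch = 128

# Probability that two distinct, uniformly sampled captions share an image:
p_pair = (caps_per_image - 1) / (n_captions - 1)
# Expected number of colliding caption pairs in one batch:
expected = batch * (batch - 1) / 2 * p_pair
# Poisson approximation of P(at least one duplicated image in the batch):
p_dup = 1 - math.exp(-expected)
print(f"expected colliding pairs per batch: {expected:.3f}")  # ~0.057
print(f"P(batch contains a duplicated image): {p_dup:.3f}")   # ~0.056
```

So at this scale only a few percent of batches contain any duplicate at all.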

As a simple test, I tried using a mask to filter out such opposing terms, but it did not help.
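For reference, one way such a mask could look (a hypothetical sketch; the thread does not show the masking code actually used) is to carry the underlying image ids through the batch and zero out hinge terms whose row and column share an image:

```python
import torch

def mask_same_image_terms(cost, img_ids):
    """Zero out hinge terms whose row and column refer to the same image.

    Hypothetical helper: `cost` is a (B, B) matrix of clamped hinge terms
    and `img_ids` is a (B,) tensor giving the dataset image id behind each
    row/column. Matching ids (including the diagonal) are not valid
    negatives, so their terms are dropped from the loss.
    """
    same_image = img_ids.unsqueeze(0) == img_ids.unsqueeze(1)  # (B, B) bool
    return cost.masked_fill(same_image, 0)

# Usage inside the loss, replacing the plain diagonal mask:
#   cost_s = mask_same_image_terms(cost_s, img_ids)
#   cost_im = mask_same_image_terms(cost_im, img_ids)
```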

Moreover, this simplification has typically been made in previous work. See these for example:
https://github.com/ryankiros/visual-semantic-embedding/blob/master/homogeneous_data.py
https://github.com/ryankiros/visual-semantic-embedding/blob/master/datasets.py
