Problems with Loss Function #16
Comments
@cem-sirin I partly agree with your opinion. Regarding the label term for calculating the cross entropy, both the CLIP paper and the OpenCLIP code (https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/loss.py#L92) employ hard targets. Moreover, recent work (https://openreview.net/forum?id=U_2kuqoTcB) elucidates the theoretical underpinnings of CLIP's effectiveness. Although the owner of this repo provides an explanation for the loss in the README, I remain unconvinced by the rationale for modifying it.
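For reference, the hard-target symmetric InfoNCE used by the CLIP paper and OpenCLIP can be sketched as follows (a minimal PyTorch sketch; the function name and signature are mine, not OpenCLIP's exact API):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, logit_scale=1.0):
    # Cosine-similarity logits between every image and every text in the batch
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()

    # Hard targets: the i-th image matches the i-th text, nothing else
    labels = torch.arange(logits_per_image.shape[0], device=logits_per_image.device)

    # Symmetric InfoNCE: average of image->text and text->image cross entropy
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
```

Note that `labels` is an integer index tensor with no gradient, so the targets play no role in backpropagation — which is the property being discussed in this thread.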
Hi @YichaoCai1, I suppose you are referring to the paragraph that starts with "Here's why I didn't use a simpler approach..." as the explanation for the loss function. In his explanation he gives his reason for not using hard targets and creating (let's say) soft targets, so that he does not penalize repeated images in the same batch. The problem is that these targets are now part of the backward propagation, while that was not the case for the hard targets. I can't imagine such a thing working — have you run a long enough training session, @YichaoCai1? I am curious whether it works. Also (I only skimmed the article, so I may be wrong), I don't think the loss function they present is meant to replicate CLIP's loss function; they only refer to CLIP in the introduction. Anyway, I can share the loss I wrote while implementing a CLIP-like model for something else, after translating it to PyTorch, so that people can point out which exact line they disagree with.
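The backpropagation concern can be seen directly: targets computed from the embeddings stay in the autograd graph unless they are explicitly detached. A minimal demo (variable names are mine, for illustration only):

```python
import torch
import torch.nn.functional as F

# Targets built from the embeddings receive gradients unless detached.
emb = torch.randn(4, 8, requires_grad=True)
sim = emb @ emb.t()

soft_targets = F.softmax(sim, dim=-1)                # still in the graph
print(soft_targets.requires_grad)                    # True

detached_targets = F.softmax(sim, dim=-1).detach()   # removed from the graph
print(detached_targets.requires_grad)                # False
```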
Hey @cem-sirin, thanks for posting this — I was confused and wondering whether this repo's implementation of the loss function would really work as well. About your comment on symmetric cross entropy: doesn't this repo implement symmetric cross entropy, since it's calculating the loss in both directions?
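For context, the soft-target loss under discussion looks roughly like this (a sketch reconstructed from the repo's README description, not its exact code; the function name and `temperature` parameter are my labels). It is symmetric in the sense of averaging both directions, but the targets are soft similarity distributions rather than the paper's hard `arange` labels:

```python
import torch
import torch.nn.functional as F

def soft_target_clip_loss(image_embeddings, text_embeddings, temperature=1.0):
    # Cross-modal logits: text rows vs. image columns
    logits = (text_embeddings @ image_embeddings.t()) / temperature

    # Soft targets built from intra-modality similarities, so duplicate
    # images in a batch are not penalized as hard negatives
    images_similarity = image_embeddings @ image_embeddings.t()
    texts_similarity = text_embeddings @ text_embeddings.t()
    targets = F.softmax(
        (images_similarity + texts_similarity) / (2 * temperature), dim=-1
    )

    # NOTE: `targets` is computed from the embeddings, so without .detach()
    # it participates in backpropagation -- the point raised in this thread.
    texts_loss = (-targets * F.log_softmax(logits, dim=-1)).sum(dim=1)
    images_loss = (-targets.t() * F.log_softmax(logits.t(), dim=-1)).sum(dim=1)

    # "Symmetric" only in that both directions are averaged
    return ((images_loss + texts_loss) / 2.0).mean()
```

So yes, the loss is computed in both directions, but that alone does not make it equivalent to the paper's symmetric cross entropy over hard targets.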
Hi @cem-sirin, Apologies for my delayed response. I haven't executed the code in this repository, as the modifications presented have not convinced me of their efficacy. Additionally, the paper I mentioned serves as further evidence to support my opinion that adhering to the original implementation of the symmetric InfoNCE loss is essential, a principle that, IMHO, is not followed in this repository.
The loss function implemented in this repo really does have some problems; the model does not work.
Hi there,
I was searching for how people implemented CLIP and found this repo. Here are the problems/differences with the loss function, based on the CLIP paper:
Aaand on a final note: the goal of CLIP's contrastive learning is not only to align the latent spaces. The assumption is that, via contrastive learning, you will get encodings that are richer in context, which is open to debate. You mention that you have trained your model for 24 minutes on the GPU, and I think it's probable that your model achieves latent space alignment and nothing further. Moreover, if you had trained longer, you would likely have faced dimensional collapse due to your loss configuration.
Again, thanks to OpenAI for not writing the loss function out clearly and only providing cryptic pseudo-code. I am quite surprised that this repo is used in conference proceedings and has ~400 stars. It kinda shows that a sizable portion of the ML community is just copy-pasting code and hoping that stuff will eventually work.