Negative Loss, Transfer Learning/Fine-Tuning Question #6
Hi! Thanks for sharing this repo -- really clean and easy to use.

When training with the PyTorch Lightning script from the repo, my loss is negative (and gets more negative over time). Is this expected?

I'm also curious to know whether you've fine-tuned a pretrained model using BYOL, as the README example suggests. If yes, how were the results? Any intuition regarding how many epochs to fine-tune for?

Thanks!
Comments
Check this function: byol-pytorch/byol_pytorch/byol_pytorch.py, lines 36 to 39 at e25b588.

I believe it is not the same as in the paper: https://arxiv.org/pdf/2006.07733.pdf (see page 28, section G.3, "Loss function"). The paper's version is in JAX and not very clear to me, so I am not sure whether the two match. I could be wrong, though, and if someone could shed some light on this I would really appreciate it! Thanks.
@Nihal94 it should be the same; the loss function is nothing more than the cosine similarity, negated and scaled by 2 (negated because we are trying to maximize the similarity):

```python
import torch
import torch.nn.functional as F
from torch import nn

def loss_fn(x, y):
    # L2-normalize, then take the dot product: this is the cosine similarity.
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return -2 * (x * y).sum(dim=-1)

def loss_fn_2(x, y):
    # The same quantity, without normalizing first.
    return -2 * ((x * y).sum(dim=-1) / (x.norm(dim=-1) * y.norm(dim=-1)))

x = torch.randn(2, 4)
y = torch.randn(2, 4)

l1 = loss_fn(x, y)
l2 = loss_fn_2(x, y)
l3 = -2 * nn.CosineSimilarity(dim=-1)(x, y)
print(l1, l2, l3)  # all three agree
```
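For what it's worth, the paper's equation (2) (the same loss section G.3 implements in JAX) is written as the squared Euclidean distance between the two L2-normalized vectors, which expands algebraically to 2 - 2 * cos(x, y). A minimal sketch checking that the repo's -2 * cos(x, y) matches it up to an additive constant:

```python
import torch
import torch.nn.functional as F

def paper_loss(x, y):
    # Squared distance between L2-normalized vectors, as in eq. (2) of
    # the paper: ||x_hat - y_hat||^2 = 2 - 2 * cos(x, y).
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return (x - y).pow(2).sum(dim=-1)

x, y = torch.randn(2, 4), torch.randn(2, 4)
cos = F.cosine_similarity(x, y, dim=-1)

# The paper's loss and the repo's -2 * cos differ only by the constant 2.
assert torch.allclose(paper_loss(x, y), 2 - 2 * cos, atol=1e-6)
```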
@rsomani95 I haven't yet; I was hoping someone would! Currently doing some transformers work, but I will get back to this later this week!
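For anyone who wants to try in the meantime, the flow the README suggests is: self-supervise the backbone with BYOL, save its weights, then fine-tune on the downstream task. A minimal sketch of that second stage, assuming a checkpoint saved as ./improved-net.pt per the README; the 10-class head, batch shapes, and learning rate are hypothetical placeholders:

```python
import torch
from torch import nn
from torchvision import models

# Load the BYOL-improved backbone (saved as in the README).
resnet = models.resnet50()
resnet.load_state_dict(torch.load('./improved-net.pt'))

# Swap in a fresh head for a hypothetical 10-class downstream task.
resnet.fc = nn.Linear(resnet.fc.in_features, 10)

# A small learning rate is a common starting point for fine-tuning.
opt = torch.optim.Adam(resnet.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 256, 256)   # stand-in for a real batch
labels = torch.randint(0, 10, (8,))

logits = resnet(images)
loss = criterion(logits, labels)
opt.zero_grad()
loss.backward()
opt.step()
```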
@lucidrains thanks for clarifying; now it makes sense. Also, check the screenshot below: fourth epoch, and that's the loss currently.
@tiredrandomuser yea, maybe I should normalize the loss so it lies between 0 and -1, just so it doesn't scare people into thinking it is broken
@tiredrandomuser it shouldn't make a difference to training anyhow
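That checks out: adding a constant to a loss shifts the number you see but leaves the gradients, and hence the optimization, untouched. A minimal sketch with random tensors:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, requires_grad=True)
y = torch.randn(4, 8)

cos = F.cosine_similarity(x, y, dim=-1)

# The constant 2 has zero derivative, so both losses give the same gradient.
g1, = torch.autograd.grad((-2 * cos).sum(), x, retain_graph=True)
g2, = torch.autograd.grad((2 - 2 * cos).sum(), x)
assert torch.allclose(g1, g2, atol=1e-6)
```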
OK, done! 1cce49d. Losses should fall around [-4, 0] now.
@lucidrains yes. I have yet to check the model; I will update here soon if possible. And thanks for the quick and clean implementation!
Awesome. I should have some more time on my hands in a couple of weeks to play around with this too. The biggest hurdle is compute time: a dataset of 120,000 images takes about 1.5 hours per epoch on a K40 on Colab.
@rsomani95 let us know! Finally, it seems self-supervised learning could be accessible to us mere peasants lol
Hahahaha, that's precisely how I feel
Solved with #7!
good