Negative Loss, Transfer Learning/Fine-Tuning Question #6

Closed

rsomani95 opened this issue Jun 22, 2020 · 13 comments

@rsomani95

Hi! Thanks for sharing this repo -- really clean and easy to use.

When training with the PyTorch Lightning script from the repo, my loss is negative (and gets more negative over time). Is this expected?
(screenshot: training loss curve going increasingly negative, Jun 22, 2020)


I'm curious to know whether you've fine-tuned a pretrained model with this BYOL implementation, as the README example suggests. If yes, how were the results? Any intuition regarding how many epochs to fine-tune for?
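
For reference, my training setup roughly follows the README example (I'm writing this from memory, so treat the exact parameter names as approximate):

import torch
from torchvision import models
from byol_pytorch import BYOL

resnet = models.resnet50(pretrained=True)

learner = BYOL(
    resnet,
    image_size = 256,
    hidden_layer = 'avgpool'   # layer whose output is used as the representation
)

opt = torch.optim.Adam(learner.parameters(), lr=3e-4)

for _ in range(100):
    images = torch.randn(20, 3, 256, 256)   # stand-in for a batch of unlabelled images
    loss = learner(images)                   # forward pass returns the BYOL loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    learner.update_moving_average()          # update the EMA weights of the target encoder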

Thanks!

@nihal1294

nihal1294 commented Jun 22, 2020

Check this function:

import torch.nn.functional as F

def loss_fn(x, y):
    # L2-normalize, then take the negative (scaled) cosine similarity
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return -2 * (x * y).sum(dim=-1)

I believe it is not the same as in the paper: https://arxiv.org/pdf/2006.07733.pdf

Check page 28, section G.3 (Loss function). It's in JAX and not very clear to me, so I'm not sure whether it's the same. I could be wrong; if someone could shed some light on this, I'd really appreciate it!

Thanks.

@lucidrains
Owner

@Nihal94 it should be the same; the loss function is nothing more than the negative of the cosine similarity (negative because we are trying to maximize the similarity):

import torch.nn.functional as F
import torch
from torch import nn

def loss_fn(x, y):
    x = F.normalize(x, dim=-1, p=2)
    y = F.normalize(y, dim=-1, p=2)
    return -2 * (x * y).sum(dim=-1)

def loss_fn_2(x, y):
    # same value, computed without normalizing first
    return -2 * ((x * y).sum(dim=-1) / (x.norm(dim=-1) * y.norm(dim=-1)))

x = torch.randn(2, 4)
y = torch.randn(2, 4)

l1 = loss_fn(x, y)
l2 = loss_fn_2(x, y)
l3 = -2 * nn.CosineSimilarity(dim=-1)(x, y)

print(l1, l2, l3)
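
All three print the same values. And to connect it to section G.3 of the paper: the paper's regression loss is the squared L2 distance between the L2-normalized vectors, which is just the expression above shifted by a constant of 2 per sample, so the gradients are identical. A quick sanity check (my own sketch, not code from the repo):

import torch
import torch.nn.functional as F

x = torch.randn(2, 4)
y = torch.randn(2, 4)

x_hat = F.normalize(x, dim=-1, p=2)
y_hat = F.normalize(y, dim=-1, p=2)

paper_loss = (x_hat - y_hat).pow(2).sum(dim=-1)   # ||x_hat - y_hat||^2 = 2 - 2 * <x_hat, y_hat>
repo_loss = -2 * (x_hat * y_hat).sum(dim=-1)      # loss_fn above

print(torch.allclose(paper_loss, repo_loss + 2))  # True (up to floating point error)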

@lucidrains
Owner

@rsomani95 I haven't yet, I was hoping someone would! Currently doing some transformers work, but I'll get back to this later this week!

@nihal1294

@lucidrains thanks for clarifying, it makes sense now. Also, see the screenshot below: that's the loss at the 4th epoch.

(screenshot: loss value at epoch 4)

@lucidrains
Owner

lucidrains commented Jun 22, 2020

@tiredrandomuser yea, maybe I should normalize the loss so it lies between 0 and -1, just so it doesn't scare people into thinking it is broken

@lucidrains
Owner

@tiredrandomuser it shouldn't make a difference to training anyhow

@lucidrains
Owner

ok done! As of 1cce49d, losses should fall around [-4, 0] now
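
For anyone wondering where that range comes from: each per-view term is -2 times a cosine similarity, so it lies in [-2, 2], and BYOL symmetrizes the loss over the two augmented views, so the summed loss starts near 0 (random features are roughly orthogonal) and is driven toward -4 as the online predictions align with the target projections. A small illustration of the bound (my own sketch, not the commit itself):

import torch
import torch.nn.functional as F

def per_view_loss(pred, target):
    pred = F.normalize(pred, dim=-1, p=2)
    target = F.normalize(target, dim=-1, p=2)
    return -2 * (pred * target).sum(dim=-1)   # in [-2, 2]

# perfectly aligned predictions hit the per-view minimum of -2,
# so the symmetrized (summed) loss bottoms out at -4
v = torch.randn(2, 4)
print(per_view_loss(v, v) + per_view_loss(v, v))  # approximately tensor([-4., -4.])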

@nihal1294

@lucidrains yes. I haven't checked the model yet; will update here soon if possible. And thanks for the quick and clean implementation!

@rsomani95
Author

> @rsomani95 I haven't yet, I was hoping someone would! Currently doing some transformers work, but I'll get back to this later this week!

Awesome. I should have some more time on my hands in a couple of weeks to play around with this too. The biggest hurdle is compute time: a dataset of 120,000 images takes about 1.5 hrs per epoch on a K40 on Colab.

@lucidrains
Owner

lucidrains commented Jun 22, 2020

@rsomani95 let us know! finally it seems self-supervised learning could be accessible to us mere peasants lol

@rsomani95
Author

Hahahaha that's precisely how I feel

lucidrains added a commit that referenced this issue Jun 24, 2020
@lucidrains
Owner

Solved with #7!

@AderonHuang

good
