Possible bug in latent vector loss calculation? #34
Digging around further in the loss calculation code, I'm also curious about two other points:
@walmsley Hi Will! You caught a bug regarding your first comment! I've fixed it in 0.5.1 🙏 For your second comment, I double-checked the original colab from Ryan, and that's what he has there, so you may want to redirect your question to him (he originally devised this technique). As for #2, topk actually returns a tuple, the first element being the actual topk values and the second being the topk indices. The [0] is there to capture the values, not to reference the first latent.
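For illustration, a minimal standalone sketch of the topk point (not the library code in question):

```python
import torch

scores = torch.tensor([0.1, 0.9, 0.4, 0.7])

# torch.topk returns a (values, indices) tuple
values, indices = torch.topk(scores, k=2)

# so indexing the result with [0] selects the values tensor,
# not the "first latent"
top_values = torch.topk(scores, k=2)[0]   # tensor([0.9000, 0.7000])
```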
That fully addresses the above points, thanks! But perhaps most significantly, there is a deeper possible bug that I'm curious about. I was trying to understand why 32 latent vectors are created in big-sleep/big_sleep/big_sleep.py, lines 89 to 92 (at 31fa846).
Diving deeper, it seems as though these 32 vectors are only actually used here: line 510 in 31fa846,
and here: line 520 in 31fa846.
I debugged the loop surrounding line 520 above (using the current 512px BigGAN model) and found that the model only contains 15 layers; of those 15, only 14 are GenBlock layers, which trigger line 520. The result is that, of the 32 latent vectors we create, only indices {0,1,2,3,4,5,6,7,8,10,11,12,13,14,15} are ever actually used. This wouldn't be a problem, except that the remaining 17 unused latent vectors may still be influencing the loss calculation. I'm still trying to work out whether their influence on the loss calculation is significant enough to merit fixing this, because the fix would be slightly nontrivial, as it varies depending on the size of the BigGAN model chosen.
@walmsley it's not a big deal, because there are only a handful of BigGAN models, and we can just store a dictionary mapping GAN size -> num latents somewhere
Okay so I currently believe that a solution could look like this:
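As a rough sketch of that idea (the generator.layers attribute and the GenBlock class name are assumptions based on the layer walk described above, not verified here), it could look something like this:

```python
# Sketch: derive how many latent slots the chosen BigGAN actually consumes,
# instead of assuming all 32 are used.
def count_used_latents(biggan):
    # one slot for the initial z, plus one per GenBlock
    # (the self-attention layer does not consume a conditioning slice)
    gen_blocks = sum(
        1 for layer in biggan.generator.layers
        if type(layer).__name__ == 'GenBlock'
    )
    return 1 + gen_blocks

# or keep a simple lookup keyed by image size; 512 -> 15 follows from the
# 14 GenBlocks reported above, while the other sizes would need to be checked
NUM_LATENTS_BY_IMAGE_SIZE = {512: 15}
```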
I wonder how much it helps to calculate the skews and kurtoses for the relevant latents only, when the mean and std of all 32 latents are used for the same loss anyway, here: big-sleep/big_sleep/big_sleep.py, lines 198 to 200 (at 31fa846).

I came to think about this while experimenting with image search on BigGAN, using the same code as a basis. Perhaps we should instead simply dimension the Latents object with the correct size to begin with.

Also, we are still missing a proper way to accumulate the skews and kurtoses from each latent. Simply add up their absolute values (that's what I am doing right now)? The inner part of the loop is anyway the same as here: https://discuss.pytorch.org/t/statistics-for-whole-dataset/74511

Or, alternatively, skew and kurtosis might not be so important here either, given how well things have worked as they are. Anyway, I like to experiment. Changing the loss function a bit might not make it objectively better, but it can still give visually different results (which is what matters to me: visual diversity to be explored).
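A minimal sketch of that absolute-value accumulation, with the inner statistics following the same pattern as the linked forum post (latents and num_latents are assumed to be defined as in the surrounding code):

```python
import torch

skew_kurt_loss = 0.0

for array in latents[:num_latents]:      # only the latents the model actually uses
    mean = torch.mean(array)
    diffs = array - mean
    std = torch.sqrt(torch.mean(diffs ** 2))
    zscores = diffs / std
    skew = torch.mean(zscores ** 3)
    kurtosis = torch.mean(zscores ** 4) - 3.0

    # add up the absolute values so every latent contributes to the loss
    skew_kurt_loss = skew_kurt_loss + torch.abs(skew) + torch.abs(kurtosis)
```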
Yes, that was the intention, thanks for clarifying.
Note that this was fixed with release 0.5.1 as mentioned by @lucidrains |
To be precise, that must be done when the Latents object is instantiated, either via this default: big-sleep/big_sleep/big_sleep.py, line 92 (at 31fa846),
or here: big-sleep/big_sleep/big_sleep.py, lines 129 to 132 (at 31fa846).
Or just use latents[:num_latents] when calculating the mean and the std.
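A short sketch of that slicing option, assuming the latent loss penalizes each latent's mean for drifting from 0 and its std from 1 (the exact terms in big_sleep.py may differ):

```python
used = latents[:num_latents]      # drop the latent vectors the model never reads

lat_loss = torch.abs(1 - torch.std(used, dim=1)).mean() \
         + torch.abs(torch.mean(used, dim=1)).mean()
```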
I don't think anything was done about this. As far as I know, @lucidrains stated the code comes directly from Ryan (so I sent him a link to this discussion). I have tried it in my own code, but I can't see any major effect... it might be that the whole skewness etc. factor is not so critical here? My application is not big_sleep but image search, which explains the small difference (latents vs. lats.normu).
@htoyryla the skew/kurtosis fix was subtle; it's just the change in indentation of line 211 here: 226b973#diff-a32d425a1d65b549cda9588699a004a9d283f46d0623256309606cc74f8d3dd8R211
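Schematically, the change means the accumulation now happens inside the loop, so every latent's statistics reach the loss (sketch only, not the literal diff; compute_skew_kurtosis is a hypothetical stand-in for the statistics code):

```python
for array in latents:
    skew, kurtosis = compute_skew_kurtosis(array)   # hypothetical helper
    # indented into the loop: accumulates over all latents,
    # instead of only keeping the values from the final iteration
    lat_loss = lat_loss + torch.abs(skew) + torch.abs(kurtosis)
```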
I see. I was only looking at the newest commit, with your name mentioned, and did not realise this commit was also so recent. I therefore thought that the error Phil had fixed was the missing dim=1 :) It looks like the code with the indentation is more or less equivalent to my solution.
BTW, my interest in this loss calculation comes from my ongoing experiment to skip the one-hot coded class label in BigGAN, here: lines 574 to 575 in 31fa846.
Created a new PR with the proposed final fix at #35. Overall status of the 4 possible bugs mentioned in this issue:
I'm confused by this, and wondering if it could be a bug? It seems as though `latents` is of size (32, 128), which means that `for array in latents:` iterates 32 times. However, the results from these iterations aren't stored anywhere, so they are at best a waste of time and at worst causing a miscalculation. Perhaps the intention was to accumulate the kurtoses and skews for each array in latents, and then compute `lat_loss` using all the accumulated values?

Occurs at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/big_sleep.py#L211