Possible bug in latent vector loss calculation? #34
Digging around further in the loss calculation code, I'm also curious about two other points:
@walmsley Hi Will! You caught a bug regarding your first comment! I've fixed it in 0.5.1 🙏 For your second comment, I double-checked the original colab from Ryan, and that's what he has there, so you may want to redirect your question to him (he originally devised this technique). As for #2, topk actually returns a tuple, the first element being the actual topk values and the second being the topk indices. The [0] is there to capture the values, not to reference the first latent.
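For illustration, a minimal standalone sketch of the topk point (not the library code in question):

```python
import torch

scores = torch.tensor([0.1, 0.9, 0.4, 0.7])

# torch.topk returns a (values, indices) tuple
values, indices = torch.topk(scores, k=2)

# so indexing the result with [0] selects the values tensor,
# not the "first latent"
top_values = torch.topk(scores, k=2)[0]   # tensor([0.9000, 0.7000])
```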
That fully addresses the above points, thanks! But perhaps most significantly, there is a deeper possible bug that I'm curious about. I was trying to understand why 32 latent vectors are created in big-sleep/big_sleep/big_sleep.py, lines 89 to 92 (at 31fa846).
Diving deeper, it seems as though these 32 vectors are only actually used here: line 510 in 31fa846,
and here: line 520 in 31fa846.
I debugged the loop surrounding line 520 above (using the current 512px BigGAN model) and found that the model only contains 15 layers; of those 15, only 14 are GenBlock layers, which trigger line 520. The result is that, of the 32 latent vectors we create, only indices {0,1,2,3,4,5,6,7,8,10,11,12,13,14,15} are ever actually used. This wouldn't be a problem, except that the remaining 17 unused latent vectors may still be influencing the loss calculation. I'm still trying to work out whether their influence on the loss calculation is significant enough to merit fixing this, because the fix would be slightly nontrivial, as it varies depending on the size of the BigGAN model chosen.
@walmsley it's not a big deal, because there are only a handful of BigGAN models, and we can just store a dictionary mapping GAN size -> num latents somewhere
Okay so I currently believe that a solution could look like this:
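As a rough sketch of that idea (the generator.layers attribute and the GenBlock class name are assumptions based on the layer walk described above, not verified here), it could look something like this:

```python
# Sketch: derive how many latent slots the chosen BigGAN actually consumes,
# instead of assuming all 32 are used.
def count_used_latents(biggan):
    # one slot for the initial z, plus one per GenBlock
    # (the self-attention layer does not consume a conditioning slice)
    gen_blocks = sum(
        1 for layer in biggan.generator.layers
        if type(layer).__name__ == 'GenBlock'
    )
    return 1 + gen_blocks

# or keep a simple lookup keyed by image size; 512 -> 15 follows from the
# 14 GenBlocks reported above, while the other sizes would need to be checked
NUM_LATENTS_BY_IMAGE_SIZE = {512: 15}
```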
I wonder how much it helps to calculate the skews and kurtoses for the relevant latents only, when the mean and std of all 32 latents are used for the same loss anyway, here: big-sleep/big_sleep/big_sleep.py, lines 198 to 200 (at 31fa846).

I came to think about this while experimenting with image search on BigGAN, using the same code as a basis. Perhaps we should instead simply dimension the Latents object with the correct size to begin with.

Also, we are still missing a proper way to accumulate the skews and kurtoses from each latent. Simply add up their absolute values (that's what I am doing right now)? The inner part of the loop is anyway the same as here: https://discuss.pytorch.org/t/statistics-for-whole-dataset/74511

Or, alternatively, skew and kurtosis might not be so important here either, given how well things have worked as they are. Anyway, I like to experiment. Changing the loss function a bit might not make it objectively better, but it can still give visually different results (which is what matters to me: visual diversity to be explored).
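A minimal sketch of that absolute-value accumulation, with the inner statistics following the same pattern as the linked forum post (latents and num_latents are assumed to be defined as in the surrounding code):

```python
import torch

skew_kurt_loss = 0.0

for array in latents[:num_latents]:      # only the latents the model actually uses
    mean = torch.mean(array)
    diffs = array - mean
    std = torch.sqrt(torch.mean(diffs ** 2))
    zscores = diffs / std
    skew = torch.mean(zscores ** 3)
    kurtosis = torch.mean(zscores ** 4) - 3.0

    # add up the absolute values so every latent contributes to the loss
    skew_kurt_loss = skew_kurt_loss + torch.abs(skew) + torch.abs(kurtosis)
```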
Yes, that was the intention, thanks for clarifying.
Note that this was fixed with release 0.5.1 as mentioned by @lucidrains |
To be precise, that must be done when the Latents object is instantiated, either via this default: big-sleep/big_sleep/big_sleep.py, line 92 (at 31fa846),
or here: big-sleep/big_sleep/big_sleep.py, lines 129 to 132 (at 31fa846).
Or just use latents[:num_latents] when calculating the mean and the std.
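A short sketch of that slicing option, assuming the latent loss penalizes each latent's mean for drifting from 0 and its std from 1 (the exact terms in big_sleep.py may differ):

```python
used = latents[:num_latents]      # drop the latent vectors the model never reads

lat_loss = torch.abs(1 - torch.std(used, dim=1)).mean() \
         + torch.abs(torch.mean(used, dim=1)).mean()
```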
I don't think anything was done about this. As far as I know, @lucidrains stated the code comes directly from Ryan (so I sent him a link to this discussion). I have tried it in my own code, but I can't see any major effect... it might be that the whole skewness etc. factor is not so critical here? My application is not big_sleep but image search, which explains the small difference (latents vs. lats.normu).
@htoyryla the skew/kurtosis fix was subtle; it's just the change in indentation of line 211 here: 226b973#diff-a32d425a1d65b549cda9588699a004a9d283f46d0623256309606cc74f8d3dd8R211
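Schematically, the change means the accumulation now happens inside the loop, so every latent's statistics reach the loss (sketch only, not the literal diff; compute_skew_kurtosis is a hypothetical stand-in for the statistics code):

```python
for array in latents:
    skew, kurtosis = compute_skew_kurtosis(array)   # hypothetical helper
    # indented into the loop: accumulates over all latents,
    # instead of only keeping the values from the final iteration
    lat_loss = lat_loss + torch.abs(skew) + torch.abs(kurtosis)
```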
I see. I was only looking at the newest commit, with your name mentioned, and did not realise this commit was also so recent. I therefore thought that the error Phil had fixed was the missing dim=1 :) It looks like the code with the indentation is more or less equivalent to my solution.
BTW, my interest in this loss calculation comes from my ongoing experiment to skip the one-hot coded class label in BigGAN, here: lines 574 to 575 in 31fa846.
Created a new PR with the proposed final fix at #35. Overall status of the 4 possible bugs mentioned in this issue:
I'm confused by this, and wondering if it could be a bug? It seems as though `latents` is of size (32, 128), which means that `for array in latents:` iterates 32 times. However, the results from these iterations aren't stored anywhere, so they are at best a waste of time and at worst causing a miscalculation. Perhaps the intention was to accumulate the kurtoses and skews for each array in latents, and then compute `lat_loss` using all the accumulated values?

Occurs at https://github.com/lucidrains/big-sleep/blob/main/big_sleep/big_sleep.py#L211