
use_linear_attn = True produce noise and unstable loss #55

Closed
ken012git opened this issue Jun 13, 2022 · 5 comments
@ken012git

After moving from v0.0.60 to v0.1.10, I found that the Imagen loss is unstable in the early training steps and the results are noisy from the early stage.

[Screenshot: unstable loss curve]
[Image: noisy early-stage samples]

The problem goes away when I set use_linear_attn = False.

@lucidrains
Owner

@ken012git 🙏 do you want to try 0.2.4? i think i found the issue 🤦‍♂️

@ken012git
Author

Sure! Thanks for your immediate response!

I would also like to know what causes the issue. =)

@lucidrains
Owner

@ken012git forgot the residual 🤦 and also needed a feedforward after it anyways
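For illustration, a minimal NumPy sketch of the pattern described above: a residual connection around the attention output, followed by a feedforward block with its own residual. All names and the ReLU feedforward here are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def feedforward(x, w1, w2):
    # position-wise feedforward: linear -> ReLU -> linear (illustrative)
    return np.maximum(x @ w1, 0.0) @ w2

def attn_block(x, attn_fn, w1, w2):
    x = attn_fn(x) + x               # the previously-missing residual
    x = feedforward(x, w1, w2) + x   # feedforward added after attention
    return x
```

Without the first residual, the attention output replaces the features instead of refining them, which matches the unstable loss seen above.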

@ken012git
Author

Hi @lucidrains ,

I have tested v0.2.4 and the issue seems to be gone. Thanks!

# test model, resolution 64
from imagen_pytorch import Unet

unet1 = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = (2, 2, 2, 2),    # small
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True),
    # use_linear_attn = False,
    use_linear_attn = True,
)

Loss curve (blue: use_linear_attn = False, red: use_linear_attn = True):
[Screenshot: loss curve comparison]

Early-stage results (left: use_linear_attn = False, right: use_linear_attn = True):
[Screenshot: sample comparison]

I am wondering whether we should use transformer blocks or linear attention layers at this line, as configured by use_linear_attn.

Would you point me to relevant papers? Thanks!

@lucidrains
Owner

lucidrains commented Jun 13, 2022

@ken012git thank you for the experiments! basically, in a lot of papers, researchers remove attention past a certain token length (1024 or 2048) since it is prohibitively expensive due to the quadratic compute. but i like to substitute them with linear attention, even if it is a bit weaker. my favorite linear attention remains https://arxiv.org/abs/1812.01243 , and here i am also giving it a depthwise conv recommended by the primer paper
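For reference, a minimal NumPy sketch of the efficient attention from that paper (Shen et al., arXiv:1812.01243): queries are softmaxed over the feature dimension and keys over the sequence dimension, so the context matrix is dim × dim rather than seq × seq, making compute linear in sequence length. This is an illustration under those assumptions, not the repo's implementation, and it omits the depthwise conv from the Primer paper:

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linear_attention(q, k, v):
    # q, k: (seq, dim), v: (seq, dim_v)
    q = softmax(q, axis=-1)                   # normalize queries over features
    k = softmax(k, axis=-2)                   # normalize keys over sequence
    context = np.einsum('nd,ne->de', k, v)    # (dim, dim_v) global context
    return np.einsum('nd,de->ne', q, context) # (seq, dim_v), linear in seq
```

Since the (dim × dim_v) context is formed first, cost scales as O(n·d·e) instead of the O(n²) of standard attention, which is why it remains affordable at long token lengths.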
