
When should I use --attn-res-layers, and by what principle should I set the values of this parameter? #31

Open · Dok11 opened this issue Dec 23, 2020 · 8 comments

Dok11 (Contributor) commented Dec 23, 2020

Why is the default [32], and when is it necessary to increase the number of items (for example [32,64,128]) or the values themselves?
I see that it uses more memory, so I think it must increase quality, but where is the tradeoff?

Mut1nyJD commented Dec 23, 2020

Yes, using attention does improve quality, at least as reflected in the FID scores, which tend to go lower.
The tradeoff is higher memory usage and longer training time.

Dok11 (Contributor, Author) commented Dec 23, 2020

What is better: changing [32] to [96], or to [32,64]? What is the difference?

woctezuma commented Dec 23, 2020
> What is better: changing [32] to [96], or to [32,64]? What is the difference?

I think it should be a power of 2, so 96 would not be valid.

```python
# From the generator construction: attention is added only at layers whose
# feature-map width (always a power of two) appears in `attn_res_layers`.
for (res, (chan_in, chan_out)) in zip(self.res_layers, in_out_features):
    image_width = 2 ** res
    attn = None
    if image_width in attn_res_layers:
        attn = Rezero(GSA(dim = chan_in, norm_queries = True))
```
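So with `attn_res_layers = [32,64]`, attention would be inserted at the layers whose feature maps are 32x32 and 64x64; since `image_width = 2 ** res` is always a power of two, a value like 96 would simply never match anything.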

Dok11 (Contributor, Author) commented Dec 23, 2020

Of course, but my question is about the difference between one large value vs. two smaller values.

Mut1nyJD commented Dec 23, 2020

@Dok11 I think you are misunderstanding the value. It puts an attention layer into the neural network graph at each resolution you specify, so the more resolutions, the better, of course, as you'll get attention at different levels. It's the same as with convolutions. If you can only afford one, it depends on your training data: if it has a lot of global structure, a lower-resolution layer is beneficial; if it has a lot of local structure, a higher-resolution layer is more beneficial.
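As a rough illustration of that (a simplified sketch, not the library's actual code), each entry in attn_res_layers just marks the feature-map width at which an attention layer is inserted; all other layers stay as plain conv blocks:

```python
# Illustrative sketch only (not the library's actual code).
attn_res_layers = [32, 64]                          # resolutions that get attention
feature_map_widths = [4, 8, 16, 32, 64, 128, 256]   # widths in a 256px generator

for width in feature_map_widths:
    if width in attn_res_layers:
        print(f"{width}x{width}: conv block + attention")
    else:
        print(f"{width}x{width}: conv block only")
```

So [32,64] gives you two attention layers at two different scales, while [96] would give you none, because no layer ever has a 96-pixel-wide feature map.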

Dok11 (Contributor, Author) commented Dec 23, 2020

I thought the same, but I hoped someone could help me with some examples of these values.
For example, for some purposes we would use [32], and for other purposes/images we would use [8,16,32,64].
Maybe there is a reason to create a synthetic dataset to test these parameters in practice? Like this:
[image]

Dok11 (Contributor, Author) commented Jan 23, 2021

@Mut1nyJD I still don't understand attention layers, but I think I have a reasonable question. When changing attn-res-layers from [32] to [32,64,128,256], the model file size does not increase by more than two megabytes. So must it really improve quality?
Yes, model training requires more memory and time. So I am confused: training is slower, but the model stays (almost) the same size. I think that means the model does not increase its own capacity. How will the model make more detailed images with the same size?
If you know some sources with a simple description of this technique, please let me know.

woctezuma commented Jan 23, 2021

The implementation of GSA in the code is from lucidrains' repository.

Based on that repository, one could refer to this prior work on efficient attention:

> Efficient attention is an attention mechanism that substantially optimizes the memory and computational efficiency while retaining exactly the same expressive power as the conventional dot-product attention.

Apparently, it is a cheaper way to have attention.
It brings an attention mechanism into the model, but it does not increase the model's size by much, because it does not add new features, etc.
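As a back-of-the-envelope sketch (with made-up channel sizes, not the repository's exact modules), the projections an efficient-attention block needs amount to a few 1x1 convolutions, which carry far fewer weights than a single 3x3 convolution at the same resolution:

```python
import torch.nn as nn

# Hypothetical channel count at one resolution; real values depend on the model.
chan = 256

# Attention-style projections: three 1x1 convs producing queries, keys and values.
attn_projections = nn.Conv2d(chan, chan * 3, kernel_size=1, bias=False)

# A single ordinary 3x3 conv at the same resolution, for comparison.
conv3x3 = nn.Conv2d(chan, chan * 2, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"attention projections: {count(attn_projections):,} parameters")  # 196,608
print(f"single 3x3 conv:       {count(conv3x3):,} parameters")           # 1,180,160
```

A few hundred thousand float32 parameters come to well under a megabyte on disk, which is consistent with the checkpoint growing by only a couple of megabytes even when attention is enabled at several resolutions.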
