If I want to generate an image of size 512*512, the attention module will take int(64 / 8) * (512 ** 2) = 2097152 values, and these will take a lot of memory. Is self-attention not suitable for large image generation? How can this be solved?
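For a rough back-of-the-envelope check (assuming the module forms a full N x N attention map over all N = H*W pixel positions, which is how I read this implementation), the numbers look like this:

```python
H = W = 512
C = 64                          # input channels of the attention layer
N = H * W                       # number of pixel positions = 262144

query_activations = (C // 8) * N     # 8 * 262144 = 2,097,152 per image
attention_map = N * N                # 68,719,476,736 entries per image
print(attention_map * 4 / 1024 ** 3) # ~256 GB in float32 for a single attention map
```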
In my view, there are two solutions. In this implementation, the number of features N is equal to the number of pixels, but the paper does not strictly require that one feature be one pixel. So you can either:

1) follow EntonyTang's interpretation in #1 (the pitfall is that you then have only one feature; I am not sure this is correct, and I am not sure the attention maps shown in the paper are still possible this way), or
2) group a set of pixels into one feature, e.g. one 10x10 block of pixels per feature, so that each layer forms far fewer attention maps.
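A minimal sketch of option 2), assuming a SAGAN-style module (1x1 convolutions for query/key/value and a learned gamma residual weight); the module name and the `patch` size of 10 are just placeholders, not part of this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSelfAttention(nn.Module):
    """Self-attention over groups of pixels (patches) instead of single pixels."""

    def __init__(self, in_dim, patch=10):
        super().__init__()
        self.patch = patch
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()
        # Group pixels: average-pool each patch x patch block into one feature.
        pooled = F.avg_pool2d(x, kernel_size=self.patch)
        hp, wp = pooled.size(2), pooled.size(3)
        n = hp * wp                                   # number of grouped features

        q = self.query_conv(pooled).view(B, -1, n)    # B x C/8 x n
        k = self.key_conv(pooled).view(B, -1, n)      # B x C/8 x n
        v = self.value_conv(pooled).view(B, -1, n)    # B x C   x n

        # Attention map over patch features: n x n instead of (H*W) x (H*W).
        attn = torch.softmax(torch.bmm(q.permute(0, 2, 1), k), dim=-1)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(B, C, hp, wp)

        # Upsample back to the input resolution and add as a residual,
        # as in the original pixel-wise module.
        out = F.interpolate(out, size=(H, W), mode='nearest')
        return self.gamma * out + x


# usage: y = PatchSelfAttention(64)(torch.randn(1, 64, 512, 512))
```

With 512x512 input and 10x10 blocks, the attention map is roughly 2601 x 2601 instead of 262144 x 262144, which is what makes the larger resolution feasible.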