Positional embeddings for different image sizes #7

PhilippMarquardt · 2021-03-02T12:51:43Z

Hi,
once again, thanks for your great work! I want to use axial attention with positional embeddings on images whose size is not known in advance (though I do know the maximum size). I was wondering whether changing https://github.com/lucidrains/axial-attention/blob/master/axial_attention/axial_attention.py#L104 to

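for cnt, param in enumerate(self.params):
    # Crop the embedding for axis `cnt` to the input's size on that axis (assumes dim_index = 1)
    index = (slice(None),) * (cnt + 2) + (slice(x.shape[cnt + 2]),)
    x = x + param[index]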

does the right thing. With this change, I can now run:

import torch
from axial_attention import AxialImageTransformer

v = AxialImageTransformer(64, depth = 1, axial_pos_emb_shape = (64, 64), dim_index = 1)
t1 = torch.randn(2, 64, 17, 16)
t2 = torch.randn(2, 64, 13, 18)
t3 = torch.randn(2, 64, 64, 64)
print(v(t1).shape)
print(v(t2).shape)
print(v(t3).shape)
Output:
torch.Size([2, 64, 17, 16])
torch.Size([2, 64, 13, 18])
torch.Size([2, 64, 64, 64])

I think this makes it easier to integrate axial attention into fully convolutional networks for multi-scale training.
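The cropping idea above can also be sketched as a standalone module, independent of the library. This is only an illustration: the class name `CroppedAxialPosEmb` and its `max_shape` argument are hypothetical, and the embedding dimension is assumed to sit at index 1. It adds an explicit check that the input does not exceed the maximum size.

```python
import torch
import torch.nn as nn


class CroppedAxialPosEmb(nn.Module):
    """Hypothetical sketch: axial positional embedding cropped to the input size.

    Assumes inputs of shape (batch, dim, *spatial), i.e. dim_index = 1.
    """

    def __init__(self, dim, max_shape):
        super().__init__()
        self.max_shape = tuple(max_shape)
        params = []
        for axis, length in enumerate(self.max_shape):
            # One parameter per axis, size 1 everywhere except dim and that axis.
            shape = [1, dim] + [1] * len(self.max_shape)
            shape[axis + 2] = length
            params.append(nn.Parameter(torch.randn(*shape)))
        self.params = nn.ParameterList(params)

    def forward(self, x):
        sizes = tuple(x.shape[2:])
        too_large = any(s > m for s, m in zip(sizes, self.max_shape))
        if len(sizes) != len(self.max_shape) or too_large:
            raise ValueError(f"input size {sizes} does not fit maximum {self.max_shape}")
        for axis, param in enumerate(self.params):
            # Keep only the first x.shape[axis + 2] positions along this axis.
            x = x + param.narrow(axis + 2, 0, x.shape[axis + 2])
        return x


emb = CroppedAxialPosEmb(64, (64, 64))
print(emb(torch.randn(2, 64, 17, 16)).shape)
```

With cropping, position i always receives the same embedding regardless of the image size, so smaller images only ever train the leading part of each table.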

lucidrains · 2021-03-04T19:11:48Z

@PhilippMarquardt Hey! In the Vision Transformer paper, the authors get the positional embeddings to generalize to different image sizes by interpolating them. Which approach is right depends on what your goal is.
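For comparison, here is a minimal sketch of what interpolation could look like for axial embeddings. It is not the library's implementation or the paper's exact procedure: the class name `InterpolatedAxialPosEmb` is hypothetical, the input is assumed to be shaped (batch, dim, height, width), and linear interpolation with `align_corners=True` is just one possible choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterpolatedAxialPosEmb(nn.Module):
    """Hypothetical sketch: axial positional embeddings resized by interpolation."""

    def __init__(self, dim, shape):
        super().__init__()
        height, width = shape
        # One learned table per axis, stored as (1, dim, length) for F.interpolate.
        self.row_emb = nn.Parameter(torch.randn(1, dim, height))
        self.col_emb = nn.Parameter(torch.randn(1, dim, width))

    def forward(self, x):
        # x: (batch, dim, height, width)
        _, _, h, w = x.shape
        # Stretch or shrink each table to the input's size along its axis.
        rows = F.interpolate(self.row_emb, size=h, mode="linear", align_corners=True)
        cols = F.interpolate(self.col_emb, size=w, mode="linear", align_corners=True)
        # Broadcast rows over the width axis and columns over the height axis.
        return x + rows.unsqueeze(-1) + cols.unsqueeze(-2)


emb = InterpolatedAxialPosEmb(64, (64, 64))
print(emb(torch.randn(2, 64, 17, 16)).shape)
```

Unlike cropping, interpolation maps the whole table onto every image, so an embedding encodes relative position within the image rather than an absolute pixel index. It also places no hard limit on the input size.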

PhilippMarquardt closed this as completed Jul 16, 2021
