Hi all - I'm currently looking into fine-tuning this model and have run into an issue with images of varying sizes.
For this example:
max_height = 192, max_width = 672, patch_size=16
The line causing the error is:
x+=self.pos_embed[:, pos_emb_ind]
(pix2tex.models.hybrid line 25 in CustomVisionTransformer forward_features)
If I have an image of size 522 x 41, this line throws an error.
x consists of 99 patches (plus the cls token), making it size [100, 256].
However, the positional embedding indices are only 66 in length. I am still investigating, but I don't quite understand the formula used to compute how many positional embedding indices we need: right now it computes 66 indices when we should be getting 100. I think the issue arises when convolutions from the ResNet embedder overlap and the formula doesn't account for this (it seems to require each image dimension to be divisible by patch_size for the formula to work).
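One way the 99-vs-66 mismatch could arise is a ceil/floor disagreement between the patch grid the backbone actually produces and the grid the index formula assumes. This is a hypothetical reconstruction, not the actual pix2tex code: the function names and the choice of which dimension gets floored are my guesses, but the arithmetic reproduces the counts from this example (patch_size=16, image 522 x 41).

```python
import math

def num_patches(h, w, patch_size=16):
    # Patch grid actually produced for a non-divisible image: each
    # dimension is effectively rounded UP (hypothetical assumption).
    rows = math.ceil(h / patch_size)
    cols = math.ceil(w / patch_size)
    return rows * cols

def num_pos_indices(h, w, patch_size=16):
    # If the positional-embedding index formula instead rounds the
    # height DOWN, it undercounts whenever h is not a multiple of
    # patch_size (again, a guess at where the discrepancy comes from).
    rows = h // patch_size
    cols = math.ceil(w / patch_size)
    return rows * cols

# For a 522 x 41 image: 3 * 33 = 99 patches (+1 cls token = 100 tokens),
# but only 2 * 33 = 66 positional indices -> shape mismatch on x += pos_embed.
print(num_patches(41, 522), num_pos_indices(41, 522))
```

If this reading is right, the fix would be to make both computations round the same way (or to pad the image so the question never arises).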
If anyone has any thoughts on how to fix this, let me know! I'm definitely no computer vision expert, but I believe a simple change to account for overlapping convolutions in the embedding may be enough to fix this.
For an explanation of the positional embeddings see the discussion here: #130
The error shows up when the image dimensions are not divisible by the patch size.
It is more efficient to pad the images beforehand, but you can also set pad to true in the settings file.
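Padding beforehand just means rounding each dimension up to the next multiple of the patch size. A minimal sketch of the target size computation (the helper name is mine, not part of pix2tex):

```python
def padded_size(h, w, patch_size=16):
    # Round each dimension up to the next multiple of patch_size,
    # so the image splits into whole patches with no remainder.
    pad_h = (-h) % patch_size
    pad_w = (-w) % patch_size
    return h + pad_h, w + pad_w

# The 522 x 41 image from above would be padded to 528 x 48,
# i.e. an exact 33 x 3 patch grid.
print(padded_size(41, 522))
```

The actual padding (with white pixels, on the right and bottom edges, say) can then be done with any image library before the image reaches the model.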