Custom masking #20

harsmac · 2023-11-30T14:22:35Z

Hi, thanks for the code.
You answered that we can modify the PatchShuffle class to create custom masks. However, the patch shuffle class takes the output of a Conv2d layer, making it hard to know precisely what part of the image we are masking. Is there any reason for this?

Originally posted by @wenhaowang1995 in #14 (comment)

IcarusWizard · 2023-12-01T10:54:28Z

Hi,

The PatchShuffle class is doing two things in sequence:

create the mask, in which the cnn output here is only help to specify the dimensions.
use the mask to mask out the input.

You can of course implement these two things separately with two classes or functions. I implemented in this way only for convinent. And it is different with the official implementation since when I wrote the code, the official one was not yet released.

And it is also very straightforward to understand which patch comes from which region of the image. Say your input is 224x224 image, and patch size is 14, then you will get a 16x16 grid of patches from the conv and each patch on this grid is from a 14x14 region from the original image without overlapping.

amirrezadolatpour2000 · 2024-02-17T12:24:36Z

Hi, thank you for sharing the code.
why did not you use sine-cosine positional embedding as it is mentioned in the paper?

IcarusWizard · 2024-02-17T21:40:25Z

I don't find where they mention of using sin-cos positional embedding in the paper. Actually, the original ViT paper clearly mentioned that a "learned" positional encoding is added after patchfication. Also for images, it is not necessary to use the sin-cos positional encoding since there is no extrapolation beyond the trained length. Could you point out where you read it?

amirrezadolatpour2000 · 2024-02-17T21:44:53Z

Sure, in the paper https://arxiv.org/abs/2111.06377, on page 11, first paragraph.

IcarusWizard · 2024-02-17T21:49:14Z

ah, I see. Thanks for the reference. I didn't pay much attention to this detail. But, as I said, I don't think it will make a large difference to the result. Feel free to experiment with that.

IcarusWizard · 2024-02-17T21:55:53Z

Also, I just checked their official code and they don't even follow this detail. The code uses the ViT model from timm which follows the details in the ViT paper with learned positional encoding.

amirrezadolatpour2000 · 2024-02-17T21:59:10Z

https://github.com/facebookresearch/mae/blob/main/models_mae.py
You can see that they utilized the frozen positional embedding using the sine-cosine approach.

IcarusWizard · 2024-02-17T22:06:19Z

Ah, thanks for the correction. I had looked at a wrong file. Then I don't know why they don't like to follow the ViT architecture precisely.

IcarusWizard · 2024-02-17T22:26:56Z

Oh, I don't think I followed all the details from the paper precisely. As in the readme, the purpose of this code is only to verify the idea of mae, not a replicate. For example, I think I didn't implement the normalization for reconstruction loss. There could be more details that I missed.

hugoWR · 2024-06-13T20:25:31Z

I'm my own experiments, it appears that using frozen sine-cosine positional embedding speed-up learning quite significantly. I guess it makes sense because that's one thing that the network doesn't have to learn and it can focus on reconstructing the right texture.

Anyway, I just wanted to let you know. Great repo otherwise !

This comment was marked as duplicate.

Sign in to view

Repository owner deleted a comment from jk8898 Feb 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Custom masking #20

Custom masking #20

harsmac commented Nov 30, 2023 •

edited

Loading

IcarusWizard commented Dec 1, 2023

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

This comment was marked as duplicate.

IcarusWizard commented Feb 17, 2024

hugoWR commented Jun 13, 2024

Custom masking #20

Custom masking #20

Comments

harsmac commented Nov 30, 2023 • edited Loading

IcarusWizard commented Dec 1, 2023

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

amirrezadolatpour2000 commented Feb 17, 2024

IcarusWizard commented Feb 17, 2024

This comment was marked as duplicate.

IcarusWizard commented Feb 17, 2024

hugoWR commented Jun 13, 2024

harsmac commented Nov 30, 2023 •

edited

Loading