
Masked vision transformer #1482

Merged

Conversation

ersi-lightly (Contributor) commented:

Creates a common backbone for any architecture that uses a masked variant of a Vision Transformer.
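For readers skimming the diff, the core idea is that the backbone tokenizes an image into patch tokens and only a random subset of them (plus the class token) is passed through the transformer blocks; the downstream method decides how the masked positions are used. Below is a minimal, illustrative sketch of that kind of random token masking. The helper name, shapes, and masking scheme are assumptions for illustration and do not reproduce this PR's actual implementation.

```python
import torch


def random_token_mask(batch_size: int, seq_len: int, mask_ratio: float = 0.75):
    """Hypothetical helper: pick which token indices to keep and which to mask.

    Index 0 (the class token) is always kept; the remaining patch tokens are
    shuffled and split according to mask_ratio.
    """
    num_keep = int((seq_len - 1) * (1 - mask_ratio)) + 1  # +1 keeps the class token
    noise = torch.rand(batch_size, seq_len - 1)
    shuffled = torch.argsort(noise, dim=1) + 1  # patch token indices 1..seq_len-1
    idx_keep = torch.cat(
        [torch.zeros(batch_size, 1, dtype=torch.long), shuffled[:, : num_keep - 1]],
        dim=1,
    )
    idx_mask = shuffled[:, num_keep - 1 :]
    return idx_keep, idx_mask


# Gather only the kept tokens before running the transformer blocks on them.
tokens = torch.randn(4, 197, 768)  # (batch, class token + 14*14 patches, embed dim)
idx_keep, idx_mask = random_token_mask(batch_size=4, seq_len=197)
kept = torch.gather(tokens, 1, idx_keep.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
print(kept.shape)  # torch.Size([4, 50, 768])
```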


codecov bot commented Jan 22, 2024

Codecov Report

Attention: 191 lines in your changes are missing coverage. Please review.

Comparison is base (a583747) 84.37% compared to head (e712bb5) 81.62%.

❗ Current head e712bb5 differs from the pull request's most recent head df6732f. Consider uploading reports for the commit df6732f to get more accurate results.

Files Patch % Lines
...y/models/modules/masked_vision_transformer_timm.py 0.00% 99 Missing ⚠️
...s/modules/masked_vision_transformer_torchvision.py 0.00% 69 Missing ⚠️
...ightly/models/modules/masked_vision_transformer.py 0.00% 16 Missing ⚠️
lightly/models/utils.py 16.66% 5 Missing ⚠️
lightly/models/modules/masked_autoencoder_timm.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@                             Coverage Diff                             @@
##           ersi-lig-3910-update-mae-benchmark-code    #1482      +/-   ##
===========================================================================
- Coverage                                    84.37%   81.62%   -2.75%     
===========================================================================
  Files                                          134      137       +3     
  Lines                                         5690     5862     +172     
===========================================================================
- Hits                                          4801     4785      -16     
- Misses                                         889     1077     +188     


@guarin (Contributor) left a comment:

Left some comments :)

Resolved review threads on:
- lightly/models/modules/masked_vision_transformer_timm.py (7 threads)
- lightly/models/modules/masked_vision_transformer.py (1 thread)
@ersi-lightly (Contributor, Author) replied:

> Left some comments :)

Many thanks Guarin! Although the PR was not ready for review yet :)

@ersi-lightly ersi-lightly marked this pull request as ready for review January 29, 2024 15:33
@ersi-lightly ersi-lightly changed the base branch from master to ersi-lig-3910-update-mae-benchmark-code January 30, 2024 08:24
@guarin (Contributor) left a comment:

Thanks for the update!

Resolved review threads on:
- lightly/models/modules/masked_autoencoder_timm.py
- lightly/models/modules/masked_vision_transformer.py
- benchmarks/imagenet/vitb16/mae.py (2 threads)
- lightly/models/modules/masked_vision_transformer_timm.py (2 threads)
- benchmarks/imagenet/vitb16/finetune_eval.py
@guarin (Contributor) left a comment:

🎉

@ersi-lightly ersi-lightly merged commit 034a6bd into ersi-lig-3910-update-mae-benchmark-code Feb 6, 2024
6 of 8 checks passed
@ersi-lightly ersi-lightly deleted the masked_vision_transformer branch February 6, 2024 15:56
ersi-lightly added a commit that referenced this pull request Feb 23, 2024
* modified imagenette benchmark

* formatted

* edited vitb16 benchmark

* added the possibility to handle images of different sizes

* formatted

* removed comments

* revert

* changed import

* initialize class token

* specified that class token should be used

* changed architecture

* addressed comments

* formatted

* Masked vision transformer (#1482)

* added hackathon

* changed comments

* formatted

* addressed comments

* fixed typing

* addressed comments

* added pre-norm and fixed arguments

* added masked vision transformer with Torchvision

* weight initialization

* cleanup

* modifies imagenette benchmark

* made mask token optional and adapted benchmarks

* removed unused import

* adapted to dynamic image size

* moved positional embed init to utils

* updated benchmark

* adapted benchmark

* moved mask token to decoder

* revert example

* removed example

* removed file

* inheriting from Module

* reverted dataset paths

* use timm's drop_path_rate

* removed unused import

* removed private method

* changed slicing

* formatted

* path dropout only for fine tune

* formatted

* account for mask token in backbone

* mask token of decoder

* removed appending of mask token in params
ersi-lightly added a commit that referenced this pull request Mar 6, 2024
* Add MAE evaluation

* Add stochastic depth dropout

* Add MAE

* Drop assertion

* Fix smooth cross entropy loss and mixup

* Update comments

* Add layer lr decay and weight decay

* Update comment

* Add test for MAE images_to_tokens

* Disable BN update

* Add BN before classification head

* Format

* Fix BN freezing

* Cleanup

* Use torch.no_grad instead of deactivating gradients manually

* This is required because torch.no_grad doesn't change the model configuration,
  while manual gradient deactivation/activation can have unintended consequences.
  For example, MAE ViT positional embeddings are parameters with
  requires_grad=False that should never receive an update; if we use
  activate_requires_grad for finetuning, we break those parameters
  (see the sketch after this commit list).

* Create new stochastic depth instances

* Add mask token to learnable params

* Add sine-cosine positional embedding

* Initialize parameters as in paper

* Fix types

* Format

* adjusted to existing interface

* draft

* remove

* added modifications

* added mae implementation with timm and example

* formatted

* fixed import

* removed

* fixed typing

* addressed comments

* fixed typing and formatted

* addressed comments

* added docstring and formatted

* removed images to tokens method

* Ersi lig 3910 update mae benchmark code (#1468)

* modified imagenette benchmark

* formatted

* edited vitb16 benchmark

* added the possibility to handle images of different sizes

* formatted

* removed comments

* revert

* changed import

* initialize class token

* specified that class token should be used

* changed architecture

* addressed comments

* formatted

* Masked vision transformer (#1482)

* added hackathon

* changed comments

* formatted

* addressed comments

* fixed typing

* addressed comments

* added pre-norm and fixed arguments

* added masked vision transformer with Torchvision

* weight initialization

* cleanup

* modifies imagenette benchmark

* made mask token optional and adapted benchmarks

* removed unused import

* adapted to dynamic image size

* moved positional embed init to utils

* updated benchmark

* adapted benchmark

* moved mask token to decoder

* revert example

* removed example

* removed file

* inheriting from Module

* reverted dataset paths

* use timm's drop_path_rate

* removed unused import

* removed private method

* changed slicing

* formatted

* path dropout only for fine tune

* formatted

* account for mask token in backbone

* mask token of decoder

* removed appending of mask token in params

* resolved conflicts

* formatted

* adjusted examples

* removed comment

* added test

* added message in case of ImportError

* fixed skipping of test

* removed example

* handling the TIMM dependency

* added note to docs for MAE installation

* added unit tests for MAE with torchvision

* removed unnecessary mask token definition

* addressed comments

* moved test to separate file

* added typing

* fixed import

* fixes typing

* fixed typing

* fixed typing

* Ersi lig 4471 cleanup and merge mae branch (#1510)

* renamed test class

* fixed imports

* fixed imports

* fixed import

* fixed imports and decreased batch size

* format

* removed comments

* use function defined in utils

* added docstrings

* added docstrings

* added docstring

* formatted

* formatted

* import Tensor

---------

Co-authored-by: guarin <guarin@lightly.ai>
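Regarding the torch.no_grad vs. requires_grad commit above: here is a minimal toy sketch of why blanket requires_grad toggling is risky for parameters that are intentionally frozen. The model and shapes below are hypothetical and are not the library's actual classes.

```python
import torch
from torch import nn


# Hypothetical toy model: a frozen positional embedding plus a trainable head.
class ToyViT(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Parameter that must never be trained (like MAE's fixed sin-cos pos. embedding).
        self.pos_embed = nn.Parameter(torch.zeros(1, 16, 8), requires_grad=False)
        self.head = nn.Linear(8, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x + self.pos_embed)


model = ToyViT()

# Safe: evaluate without gradients; the model configuration stays untouched.
with torch.no_grad():
    _ = model(torch.randn(1, 16, 8))
assert model.pos_embed.requires_grad is False  # still frozen

# Risky: blanket requires_grad toggling also "unfreezes" pos_embed, so a later
# fine-tuning step could update a parameter that was meant to stay fixed.
for p in model.parameters():
    p.requires_grad = True
assert model.pos_embed.requires_grad is True  # frozen state was lost
```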