Add BEiT #12994
Conversation
Awesome addition! No big remarks on my side; this looks ready to be merged soon (as long as the tests are fixed ;-) ). Left a few comments.
tests/test_modeling_beit.py
@require_torch
class BEiTModelTest(ModelTesterMixin, unittest.TestCase):
Quick question: the tests are different for a bunch of vision models now; maybe we should have a special tester class for them and refactor the common tests of vision models there? I'm not familiar enough with how similar those tests are to be sure it's worth it, so tell me if it makes no sense.
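(For concreteness, a rough sketch of what such a shared vision tester could look like; the mixin name and its contents are hypothetical, not code from this PR. The concrete test class would mix this in alongside `unittest.TestCase`, which provides the assertion methods, and define `all_model_classes` as the existing testers do:)

```python
import inspect


class VisionModelTesterMixin:
    """Hypothetical mixin for tests shared by vision models (ViT, BEiT, ...).

    Vision models take pixel_values rather than input_ids, so common NLP
    tests (attention masks, embedding resizing) don't apply, while checks
    like the forward signature below are the same for every vision model.
    """

    def test_forward_signature(self):
        # Every vision model should accept pixel_values as its first argument.
        for model_class in self.all_model_classes:
            signature = inspect.signature(model_class.forward)
            arg_names = list(signature.parameters.keys())
            # model_class.forward is unbound here, so "self" comes first.
            self.assertListEqual(arg_names[:2], ["self", "pixel_values"])
```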
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
I've uploaded all checkpoints to the hub: https://huggingface.co/models?search=microsoft/beit

I've renamed the checkpoints which are fine-tuned on ImageNet-1k (after being intermediately fine-tuned on ImageNet-22k) to be just […].

@donglixp if you're interested, could you write model cards for these models? Model cards are READMEs that describe the models in detail. You can take inspiration from ViT's model card. Also, I do have a notebook for […]
Overall very clean! I think you can safely ignore the error linked to model templates: it's running `make fixup`, which is looking for a file that was deleted in this PR. Left just a nit regarding the naming convention.
@NielsRogge great work! Any news on the future PR to add the semantic segmentation model and the pretrained ADE20K checkpoint? Thanks!
@JStumpp say no more, it's added ;)
What does this PR do?
It adds BEiT: BERT Pre-Training of Image Transformers to the library. It's the first paper that enables self-supervised pre-trained Vision Transformers (ViTs) to outperform their supervised pre-training counterparts. As a picture says more than a thousand (or 16x16?) words, this is a good summary of the approach:

[figure: overview of the BEiT pre-training approach]
The authors used the encoder of OpenAI's DALL-E to map images to discrete visual tokens, which the model then needs to predict based on the masked patches (a rough sketch of this objective is shown below).
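(As an illustration of this objective only, not the PR's actual code: the codebook size, masking ratio and tensor shapes below are assumptions based on the paper, and random tensors stand in for the frozen DALL-E tokenizer and the ViT encoder, which also uses blockwise rather than uniform masking:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size = 8192   # size of the DALL-E visual-token codebook
num_patches = 196   # 14 x 14 patches for a 224x224 image with 16x16 patches
hidden_size = 768   # base-sized ViT

# Stand-ins for the real pipeline: `visual_tokens` would come from the frozen
# DALL-E encoder, `hidden_states` from the ViT run on the partially masked patches.
visual_tokens = torch.randint(0, vocab_size, (1, num_patches))
hidden_states = torch.randn(1, num_patches, hidden_size)

# Mask roughly 40% of the patch positions.
mask = torch.rand(1, num_patches) < 0.4

# A linear head predicts a visual token for every patch position...
lm_head = nn.Linear(hidden_size, vocab_size)
logits = lm_head(hidden_states)

# ...but the cross-entropy loss is computed on the masked positions only.
loss = F.cross_entropy(logits[mask], visual_tokens[mask])
print(loss.item())
```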
There are 3 models defined: `BEiTModel`, `BEiTForMaskedImageModeling` and `BEiTForImageClassification`.
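(For reference, inference could look roughly like this once merged, assuming the API mirrors the existing `ViTFeatureExtractor`/`ViTForImageClassification` pattern. The class names follow this PR's naming, which may still change per the naming-convention nit above, and the checkpoint id is illustrative:)

```python
import requests
import torch
from PIL import Image
from transformers import BEiTFeatureExtractor, BEiTForImageClassification

# Illustrative checkpoint id; see the hub search link above for the real list.
checkpoint = "microsoft/beit-base-patch16-224"

feature_extractor = BEiTFeatureExtractor.from_pretrained(checkpoint)
model = BEiTForImageClassification.from_pretrained(checkpoint)

# Standard COCO test image used throughout the transformers docs.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# ImageNet-1k fine-tuned checkpoints map logits to human-readable labels.
predicted_class_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_class_idx])
```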
This PR also cleans up some scripts from the library, namely those that defined id2label dicts for several datasets. I have removed `imagenet_classes.py` and `coco_classes.py` from the utils directory. Instead, id2label mappings are now defined on the hub in their own repository. These can then be used in conversion scripts using the `huggingface_hub` library.
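(For example, a conversion script could fetch an id2label mapping along these lines; the repo id and filename below are assumptions for illustration, not necessarily the ones this PR uses:)

```python
import json

from huggingface_hub import hf_hub_download

# Assumed repo/file names for illustration; the PR may use different ones.
repo_id = "huggingface/label-files"
filename = "imagenet-1k-id2label.json"

# Download the mapping from the hub (cached locally after the first call).
path = hf_hub_download(repo_id, filename, repo_type="dataset")
with open(path, "r") as f:
    id2label = json.load(f)

# JSON keys are strings, so convert them back to integer class ids.
id2label = {int(k): v for k, v in id2label.items()}
print(id2label[0])
```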
To do:

- Decide on the checkpoint naming: `microsoft/beit_base_patch16_224_pt22k_ft22k_to_1k` is getting out of hand.
- Verify the `BEiTForMaskedImageModeling` model. For this, tagging one of the original authors: @donglixp

In a future PR, I also plan to add the semantic segmentation model, which obtains SOTA on ADE20K.