MSN (Masked Siamese Networks) for ViT #18815
Conversation
Co-authored-by: Niels <niels.rogge1@gmail.com>
@NielsRogge, after studying the pretraining script of MSN thoroughly, I am still unsure of how to put together a
Both the EMA and sharpening components operate on their own schedules. Given this, I think it's best to keep a separate pre-training script and use this model for feature extraction and fine-tuning. There's an ongoing discussion about releasing the weights of the linear classification layers and the fine-tuned models. So when those are available, we could directly support them via What do you think?
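For context on the EMA component discussed above: siamese pre-training setups like MSN typically maintain a target encoder as an exponential moving average of the online encoder, updated outside the optimizer step. A minimal sketch, assuming a simple momentum update (the momentum value and modules below are illustrative, not taken from the actual MSN pre-training script):

```python
import torch
from torch import nn

# Hypothetical sketch: the target network tracks an exponential moving
# average of the online network's parameters. Momentum and shapes are
# illustrative only.
@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, momentum: float = 0.996):
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.mul_(momentum).add_(o_param, alpha=1.0 - momentum)

online = nn.Linear(4, 4)
target = nn.Linear(4, 4)
with torch.no_grad():
    target.weight.fill_(1.0)
    online.weight.fill_(0.0)
ema_update(target, online)
# target.weight is now 0.996 * 1.0 + (1 - 0.996) * 0.0 = 0.996 everywhere
```

In the real script the momentum itself follows a schedule, which is part of why a dedicated pre-training script makes sense here.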
Thanks for your PR! It would be great to have the For pretraining, if multiple new pieces are needed, maybe it could go in a research project at first, where you can add more modules?
Sounds good to me.
Sure, I will continue the work from here on then. Thank you!
    @slow
    def test_inference_image_classification_head(self):
        torch.manual_seed(2)
This is here to ensure the classification head params are always initialized with the same init. Since we don't have classification head params from the authors yet, this is needed to keep the tests passing.
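To illustrate the point above: fixing the seed before constructing the head makes the random init identical on every run, so the test's expected values stay stable. A minimal sketch (the hidden size, label count, and helper name below are illustrative, not from the PR):

```python
import torch
from torch import nn

# Hypothetical sketch of the rationale behind torch.manual_seed(2) in the
# test: with no released classification-head weights, the head is randomly
# initialized, and a fixed seed makes that init reproducible.
def build_head(hidden_size=768, num_labels=1000):
    torch.manual_seed(2)  # same seed as in the test
    return nn.Linear(hidden_size, num_labels)

head_a = build_head()
head_b = build_head()
# identical seed -> identical random init, so expected logits don't drift
```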
@sgugger @NielsRogge @amyeroberts ready for review.
@sgugger @NielsRogge @amyeroberts a friendly nudge on the PR.
A couple more comments, then the weights should be transferred to the right org before we merge this PR.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@sgugger addressed your comments. After the weights are transferred to the right org, I will open a PR there adding a README.
Very nice PR ❤️ All LGTM.
Thanks for your work adding this model 💪
# Caution: We don't have the weights for the classification head yet. This class |
Nice comment :)
Thanks! I am contacting the model author Mido to check if they can release the classification params before we merge. Will keep y'all posted.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hi @sayakpaul. First, thank you for this PR 🤗. The doctest for this model is currently failing, as
this outputs the predicted label, but there is no expected value provided. The config has Could you take a look at this config, as well as the missing expected outputs for the doctest? Thank you! Here is the failing doctest job: https://github.com/huggingface/transformers/actions/runs/3109562462/jobs/5039877349
The model was trained on ImageNet-1k. I will add the expected outputs. Thanks for flagging it.
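For context, the expected value the doctest needs is the label string produced by an argmax over the logits. A minimal sketch of that pattern with stand-in values (the logits and label map below are hypothetical, not real model outputs or the actual ImageNet-1k id2label mapping):

```python
# Hypothetical stand-ins: real values would come from the model's logits
# and config.id2label for ImageNet-1k.
logits = [0.1, 2.5, 0.3]
id2label = {0: "tabby cat", 1: "golden retriever", 2: "red fox"}

# argmax over the logits picks the predicted class index
predicted_class_idx = max(range(len(logits)), key=lambda i: logits[i])
predicted_label = id2label[predicted_class_idx]
print(predicted_label)  # prints: golden retriever
```

Pinning the printed label as the doctest's expected output is what lets the doctest runner verify the example.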
What does this PR do?
Adds the MSN checkpoints for ViT. MSN shines in few-shot regimes, which benefits real-world use cases. Later, we could add a pre-training script so that people can perform pre-training with MSN on their own datasets.
Closes #18758
Who can review?
@sgugger @NielsRogge @amyeroberts
TODO