-
Notifications
You must be signed in to change notification settings - Fork 25.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CvT] Tensorflow implementation #18597
Conversation
The documentation is not available anymore as the PR was closed or merged. |
6ed28b5
to
8b46457
Compare
a7e2a0e
to
5509d04
Compare
Thanks for your PR @mathieujouffroy! Let me ping @amyeroberts for review :) |
You're welcome. Cool thanks, should I create an Issue ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the model and for such a nice PR ❤️ Just a few nits and one comment regarding a permutation.
We recently had the first community contributor open a PR to add weights on the hub (see PR comment here on steps). Once we've had two other 👍 we should be good to add the model weights, run the slow tests, and then merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the addition of CvT 🔥
I've added a few comments in the code, the most concerning one being the comment about the output shapes.
After the issues raised by me and @amyeroberts, we can finalize the review process by:
- tagging a core maintainer for a third review;
- opening PRs for the TF weights (I will share the instructions on how to do it :) )
850c8eb
to
6954bfa
Compare
Thanks a lot for both of your reviews 🙏 ! |
@mathieujouffroy awesome, seems like we are ready to move on to the next stage. I'm adding @sgugger as the last reviewer. Meanwhile, you can open the PR to the TF model weights on the hub as follows:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your PR! Good to merge once the weights have been added!
I am getting an error when using
It seems that both the I am trying to figure out how to correct this error (WIP). |
@mathieujouffroy ~1e-2 is quite large -- does this happen exclusively on |
Yess, unfortunately it happens across all CvT models.
I can't seem to correct this issue. I was wondering if this was due to floating points operations. |
@mattchurgin in these cases, a deep dive has to be done -- place a pair of |
Hello @gante, sorry for the late response. Is this divergence due to the momentum definition of Batch Normalization being different in tensorflow than in pytorch ? When removing the Batch Normalization layers from both frameworks, the difference in the output tensors and the hidden states is greatly reduced. I get a I'm a bit confused. Therefore I've inspected the ViT model ( Is this behaviour normal/acceptable ? |
@mathieujouffroy That's a great in-depth exploration! Previously we didn't have these checks in place, so it is possible that issues like the one you're seeing slipped through the cracks. It's not positive at all to have such large mismatches (it implies that TF users will have a poorer experience). I've had in my plans to go back and double-check the most popular models with the recently introduced checks, and you've just raised the priority of the task with your message :) I think @amyeroberts has seen similar TF/PT mismatches due to the Batch Normalization layer. @amyeroberts do you mind pitching in? |
@mathieujouffroy Thanks for all the work digging into this 🕵️ As @gante @mathieujouffroy Yes, I had similar issues with the TF ResNet port (a weights PR for reference). Like this model, the batch norm layer introduced differences which then got amplified through the forward pass. @ydshieh did some excellent detective work, and found that matching all of the parameters and inputs to produce an equivalent TF and PyTorch layer would still produce outputs with a difference on the order of Ultimately, we decided to add the weights as the difference between the logits was small ~1e-5. I think the ~1e-4 absolute differences in this case are acceptable for adding the weights. @sgugger Is this OK? |
Yes, as long as it stays in the range of 1e-4, we can accept the difference between frameworks. |
Thank you for pitching in @amyeroberts :D @mathieujouffroy feel free to use |
Hi @amyeroberts, thanks for your mention !! I've added the PR regarding the pytorch model. @gante following your recommendation I've added the weights on the hub 😊 |
@mathieujouffroy weights merged 🙌 |
Cool thanks @gante 😊 ! |
@mathieujouffroy off-topic: are you working with |
…elevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model
…assed formatting test
6d2770a
to
4b84c80
Compare
@gante Yess I was working with |
All seems ready, merging as soon as CI turns green. @mathieujouffroy on behalf of TF users, thank you for making the ecosystem richer 🧡 |
@gante @amyeroberts thanks a lot for your help and feedbacks !! 💛 |
* implemented TFCvtModel and TFCvtForImageClassification and modified relevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model * added docstring + testing file in transformers testing suite * added test in testing file, modified docs to pass repo-consistency, passed formatting test * refactoring + passing all test * small refacto, removing unwanted comments * improved testing config * corrected import error * modified acces to pretrained model archive list, to pass tf_test * corrected import structure in init files * modified testing for keras_fit with cpu * correcting PR issues + Refactoring * Refactoring : improving readability and reducing the number of permutations * corrected momentum value + cls_token initialization * removed from_pt as weights were added to the hub * Update tests/models/cvt/test_modeling_tf_cvt.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* implemented TFCvtModel and TFCvtForImageClassification and modified relevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model * added docstring + testing file in transformers testing suite * added test in testing file, modified docs to pass repo-consistency, passed formatting test * refactoring + passing all test * small refacto, removing unwanted comments * improved testing config * corrected import error * modified acces to pretrained model archive list, to pass tf_test * corrected import structure in init files * modified testing for keras_fit with cpu * correcting PR issues + Refactoring * Refactoring : improving readability and reducing the number of permutations * corrected momentum value + cls_token initialization * removed from_pt as weights were added to the hub * Update tests/models/cvt/test_modeling_tf_cvt.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* implemented TFCvtModel and TFCvtForImageClassification and modified relevant files, added an exception in convert_tf_weight_name_to_pt_weight_name, added quick testing file to compare with pytorch model * added docstring + testing file in transformers testing suite * added test in testing file, modified docs to pass repo-consistency, passed formatting test * refactoring + passing all test * small refacto, removing unwanted comments * improved testing config * corrected import error * modified acces to pretrained model archive list, to pass tf_test * corrected import structure in init files * modified testing for keras_fit with cpu * correcting PR issues + Refactoring * Refactoring : improving readability and reducing the number of permutations * corrected momentum value + cls_token initialization * removed from_pt as weights were added to the hub * Update tests/models/cvt/test_modeling_tf_cvt.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
What does this PR do?
This PR adds the Cvt model implementation in Tensorflow.
This includes the base model and the model with an image classification head on top.
TODO
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Questions
layer_norm_eps
is initialized at1e-12
.However, it seems that in the original implementation, the authors use
epsilon=1e-5
.Moreover, the Cvt model in pytorch (HuggingFace), does not seem to use the configuration
layer_norm_eps=1e-12
for layer normalization throughout the model, instead using the defaultepsilon=1e-5
.What is the use of layer_norm_eps in the configuration file (of the Cvt model) ?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.