Correct order of overflowing tokens for LayoutLmV2 tokenizer #13495
Thank you very much for the PR. Since we didn't see any tests failing for LayoutLMv2 when your previous PR was merged, could you confirm that this behaviour is now tested for LayoutLMv2? I see that you did not change any tests in this PR, and I'd like to make sure the tests catch the problem next time 🙂.
@SaulLu, The test |
@SaulLu, I have added a function |
Thank you very much for all this work @Apoorvgarg-creator. 🎊
I left you mostly comments about the testing part. Especially as LayoutLmV2 has extra boxes, I think we should also add checks on the corresponding sequences.
Finally, if I'm not mistaken, there is no `test_maximum_encoding_length_pair_input` test for LayoutLMv2; it would be nice to have it too. 😄
Let me know if you need some help!
@SaulLu, I have gone through the reviews. I will make the changes.
@SaulLu @NielsRogge @LysandreJik, Sorry for the delay in the PR. I am done with most of the work, but I would like to ask for help in Comparing |
@SaulLu, Working on |
@SaulLu @LysandreJik @NielsRogge, could you please review the added tests and changes?
@patrickvonplaten @SaulLu @LysandreJik @NielsRogge, could you please review the PR?
Thank you for the very extensive tests, great work!
Regarding the format:
- Could you add some comments explaining what is done? There is a lot of code, so I fear it will become unmaintainable without clear comments.
- Please replace all `assert` statements with unittest's specific methods.
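As a rough illustration of the second point, the sketch below shows bare `assert` statements rewritten with `unittest` assertion methods. The test class, its name, and the toy data are hypothetical and are not taken from the PR; only `assertListEqual`, `assertEqual`, and `assertNotIn` are real `unittest.TestCase` methods.

```python
import unittest


class OverflowOrderTest(unittest.TestCase):  # hypothetical test case, not from the PR
    def test_overflow_keeps_original_order(self):
        # Toy stand-in for tokenizer output: plain slicing, no tokenizer involved.
        ids = [11, 12, 13, 14, 15]
        max_length = 3
        kept, overflow = ids[:max_length], ids[max_length:]

        # Before: assert overflow == [14, 15]
        self.assertListEqual(overflow, [14, 15])
        # Before: assert len(kept) == max_length
        self.assertEqual(len(kept), max_length)
        # Before: assert 14 not in kept
        self.assertNotIn(14, kept)


# Run the single test case programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OverflowOrderTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The practical benefit is in the failure message: `assertListEqual` reports the first differing element and a diff of both lists, whereas a bare `assert` only raises `AssertionError`.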
I'll let @NielsRogge review the logic.
Sure, thank you for reviewing.
…hub.com/Apoorvgarg-creator/transformers into LayoutLmV2-overflowing-tokens-order-fix
This PR looks good to me.
Pinging @SaulLu for a final review.
Thank you very much for all your work @Apoorvgarg-creator !
Thanks especially for adding all those tests! I have one last little request which is about the overflowing boxes check, but to save you time I have opened a PR on your branch which includes the changes I am referring to. If you like these proposals, don't hesitate to merge this PR in your branch. 🙂
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
add overflowing bbox test
Thank you for these latest changes! It's all good for me!
@SaulLu @LysandreJik, I have been getting these mails. I tried to find out why they are failing but couldn't understand the issue.
Thanks again for all your work on this!
What does this PR do?
This PR has the same objective as #13179.
Fixes #13148
The issue was resolved for every tokenizer except the LayoutLmV2 tokenizer.
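The property at stake can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the tokenizer's actual implementation: `truncate_right` is a hypothetical helper, the token ids and boxes are made up, and the sketch covers only a single sequence truncated on the right. It shows that the overflowing tokens should come back in their original order, with each one still paired with its bounding box.

```python
from typing import List, Tuple

Box = List[int]  # [x0, y0, x1, y1]; one box per token


def truncate_right(
    ids: List[int], boxes: List[Box], max_length: int
) -> Tuple[List[int], List[Box], List[int], List[Box]]:
    """Hypothetical helper: keep the first max_length tokens, return the rest as overflow."""
    if len(ids) != len(boxes):
        raise ValueError("each token id needs exactly one bounding box")
    # Slicing preserves order, so overflow ids and boxes stay aligned and in sequence.
    kept_ids, overflow_ids = ids[:max_length], ids[max_length:]
    kept_boxes, overflow_boxes = boxes[:max_length], boxes[max_length:]
    return kept_ids, kept_boxes, overflow_ids, overflow_boxes


# Made-up example data, for illustration only.
ids = [101, 7, 8, 9, 10, 102]
boxes = [[0, 0, 0, 0], [1, 1, 2, 2], [3, 1, 4, 2], [5, 1, 6, 2], [7, 1, 8, 2], [9, 9, 9, 9]]

kept_ids, kept_boxes, overflow_ids, overflow_boxes = truncate_right(ids, boxes, max_length=4)
```

Concatenating the kept part and the overflow part should reproduce the original sequence, both for the ids and for the boxes.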
Before submitting
Pull Request section? Yes 👍🏻.
The following tests have been added; they check the sequence of `overflowing tokens`, the `bbox` sequence, and `input_ids`.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@LysandreJik @sgugger @SaulLu @NielsRogge