Fix list index out of range when padding nested empty lists #13876
What is the use case for passing nested empty lists like this?
I'm using labels like the following. They should be padded to (batch_size, max_answers), and sometimes the entire batch has no answers at all, e.g. `[[], [], []]`, which is why I proposed this PR.
# labels may look like this; they should be padded to (batch_size, max_answers)
start_position = [
    [3, 5, 10],  # 3 answers
    [4],  # 1 answer
    [],  # no answer
]
end_position = [
    [4, 7, 14],
    [8],
    [],
]
Also, could you check out issue #13879 and see if it makes sense to you? (It is a similar use case to this PR.) Thanks a lot!
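To make the use case concrete, here is a minimal pure-Python sketch of padding ragged label lists like `start_position` to (batch_size, max_answers), including the case where every example in the batch is empty. The helper name `pad_labels` and the `-1` padding sentinel are hypothetical choices for illustration; this is not the tokenizer's own `pad` implementation.

```python
from typing import List


def pad_labels(labels: List[List[int]], pad_value: int = -1) -> List[List[int]]:
    """Pad ragged per-example label lists to (batch_size, max_answers).

    Hypothetical helper for illustration; pad_value=-1 is an assumed sentinel.
    """
    # default=0 covers an empty batch; an all-empty batch also gives max_answers == 0
    max_answers = max((len(row) for row in labels), default=0)
    # Right-pad each row with pad_value up to max_answers
    return [row + [pad_value] * (max_answers - len(row)) for row in labels]


start_position = [[3, 5, 10], [4], []]
print(pad_labels(start_position))
# [[3, 5, 10], [4, -1, -1], [-1, -1, -1]]

# The edge case from this PR: no example in the batch has an answer
print(pad_labels([[], [], []]))
# [[], [], []]
```

Computing the target length with `max(..., default=0)` means an all-empty batch simply pads to width zero instead of triggering an index lookup into an empty list.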
I would greatly appreciate it if you could take a look at this when you're back, @SaulLu!
Thank you very much for your contribution, @qqaatw!
What you propose makes sense to me. I just have a small request: could you add a test (probably in `test_padding` in the `test_tokenization_common.py` file)?
input_p = tokenizer_p.encode_plus("This is a input 1")
input_p = tokenizer_p.pad(input_p)
I think this was an oversight when the test cases were copied.
Oh, great catch! Thanks! 👍
That is perfect! Thanks for the addition! 🙌
…ace#13876)
* Fix index out of range when padding
* Apply suggestions from code review
* Style
What does this PR do?
This PR fixes the `list index out of range` error raised when nested empty lists are supplied.
@LysandreJik
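The failure mode can be illustrated with a simplified sketch, assuming the padding code looks for the first non-empty nested list in order to inspect an element's type. Both function names below are hypothetical, and the code is a reconstruction for illustration, not taken from the transformers source. An unbounded `while` loop that skips empty lists runs past the end of the batch when every list is empty; a bounded `for` loop with `break` does not.

```python
from typing import Any, List


def first_element_buggy(required_input: List[List[Any]]) -> Any:
    """Hypothetical lookup that skips empty lists with an unbounded while loop."""
    first_element = required_input[0]
    if isinstance(first_element, (list, tuple)):
        index = 0
        # Raises IndexError once index == len(required_input), i.e. when all lists are empty
        while len(required_input[index]) == 0:
            index += 1
        if index < len(required_input):
            first_element = required_input[index][0]
    return first_element


def first_element_fixed(required_input: List[List[Any]]) -> Any:
    """Same lookup, bounded by the batch: falls back to the (empty) first list."""
    first_element = required_input[0]
    if isinstance(first_element, (list, tuple)):
        for item in required_input:
            if len(item) != 0:
                first_element = item[0]
                break
    return first_element


print(first_element_fixed([[], [4], [3, 5, 10]]))  # 4
print(first_element_fixed([[], [], []]))  # []

try:
    first_element_buggy([[], [], []])
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```

Iterating over the batch directly bounds the search by construction, so an all-empty batch leaves `first_element` as the empty first list rather than indexing past the end.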