
Fix list index out of range when padding nested empty lists #13876

Merged
merged 4 commits into huggingface:master on Nov 10, 2021

Conversation

@qqaatw qqaatw (Contributor) commented Oct 5, 2021

What does this PR do?

This PR fixes a "list index out of range" error that is raised when nested empty lists are supplied to tokenizer.pad:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
padded = tokenizer.pad({"input_ids": [[], [], []] })
print(padded)
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    padded = tokenizer.pad({"input_ids": [[], [], []] })
  File "/src/transformers/tokenization_utils_base.py", line 2730, in pad
    while len(required_input[index]) == 0:
IndexError: list index out of range
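
For context, a minimal sketch of the kind of bounds-safe lookup that avoids this error when scanning a batch for its first non-empty example (illustrative only; the function name and structure below are simplified assumptions, not the exact diff in this PR):

from typing import List, Optional

def first_non_empty_element(required_input: List[List[int]]) -> Optional[int]:
    # Walk the batch and return the first element of the first non-empty
    # example. The old loop assumed such an example always exists, so an
    # all-empty batch like [[], [], []] indexed past the end of the list.
    for example in required_input:
        if len(example) != 0:
            return example[0]
    return None  # every example in the batch is empty

print(first_non_empty_element([[], [], []]))      # None, no IndexError
print(first_non_empty_element([[], [101, 102]]))  # 101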

@LysandreJik

@LysandreJik (Member):

What is the use-case of passing nested empty lists like this?

@qqaatw qqaatw (Contributor, Author) commented Oct 9, 2021

What is the use-case of passing nested empty lists like this?

I'm using tokenizer.pad to pad question-answering labels, as in the snippet below (an illustrative padding call follows the lists).

# Labels like these should be padded to (batch_size, max_answers).
# Sometimes the entire batch has no answers at all, i.e. [[], [], []], which is why I proposed this PR.
start_position = [
    [3, 5, 10], # 3 answers
    [4], # 1 answer
    [], # no answer
]
end_position = [
    [4, 7, 14],
    [8],
    [],
]
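
As an illustration of this use-case (not code from the PR; the padding argument and expected output below are assumptions), the labels above could be padded by passing them to tokenizer.pad under the input_ids key:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

start_position = [
    [3, 5, 10],  # 3 answers
    [4],         # 1 answer
    [],          # no answer
]

# Pads every example to the longest one in the batch (max_answers = 3),
# using the tokenizer's pad token id (0 for bert-base-uncased).
padded = tokenizer.pad({"input_ids": start_position}, padding="longest")
print(padded["input_ids"])  # [[3, 5, 10], [4, 0, 0], [0, 0, 0]]

# With this fix, an all-empty batch also goes through without an IndexError:
tokenizer.pad({"input_ids": [[], [], []]})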

Also, could you check out issue #13879 and see if it makes sense to you? (It covers a similar use-case to this PR.)

Thanks a lot!

@LysandreJik (Member):

I would appreciate it greatly if you could take a look at this when you're back @SaulLu!

@huggingface huggingface deleted a comment from github-actions bot Nov 6, 2021
@SaulLu SaulLu self-requested a review November 9, 2021 17:51
@SaulLu SaulLu (Contributor) left a comment

Thank you very much for your contribution @qqaatw !

What you propose makes sense to me! I just have a small request: could you add a test (probably in test_padding in the test_tokenization_common.py file)?
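
A rough sketch of what such a test could check (illustrative only, written as a standalone unittest rather than the actual addition to test_tokenization_common.py; the class name, test name, and assertions are assumptions):

import unittest

from transformers import BertTokenizer

class PadNestedEmptyListsTest(unittest.TestCase):
    def test_pad_nested_empty_lists(self):
        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        # Before this fix, this call raised "IndexError: list index out of range".
        padded = tokenizer.pad({"input_ids": [[], [], []]})
        self.assertIn("input_ids", padded)
        self.assertEqual(len(padded["input_ids"]), 3)

if __name__ == "__main__":
    unittest.main()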

Comment on lines +2980 to +2981
input_p = tokenizer_p.encode_plus("This is a input 1")
input_p = tokenizer_p.pad(input_p)
@qqaatw (Contributor, Author):

I think this was an oversight when these test cases were copied.

@SaulLu (Contributor):

oh! Great catch! Thanks! 👍

@qqaatw qqaatw requested a review from SaulLu November 10, 2021 06:57
@SaulLu SaulLu (Contributor) left a comment

That is perfect! Thanks for the addition! 🙌


@SaulLu SaulLu merged commit 9e37c5c into huggingface:master Nov 10, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
Fix list index out of range when padding nested empty lists (huggingface#13876)

* Fix index out of range when padding

* Apply suggestions from code review

* Style