
Error while calling dataset.py: ValueError: 50264 is not in list #29

Closed
YashBit opened this issue Dec 8, 2021 · 2 comments

Comments

@YashBit

YashBit commented Dec 8, 2021

Taken from: https://github.com/shi-kejian
Hi,
Thanks again for the great work.

Today I encountered the same error as issue #7 when testing a model prompt-tuned on SST-2 directly on the IMDB movie-review dataset, by replacing the dev.tsv in /original with the IMDB dataset, as mentioned in issue #14.

What I did:

1. Prompt-tune a model checkpoint on SST-2 and save the model.
2. Replace data/original/SST-2/dev.tsv with my own IMDB dataset, formatted correctly.
3. Run tools/generate_k_shot.py again, so that data/k-shot/SST-2/test.tsv becomes the IMDB data.
4. Load the model from step 1 and pass --no_train, --do_predict, --overwrite_cache, and the other necessary flags to zero-shot on the IMDB dataset. I also cleared the cache before running.

Then the error occurs:
```
Traceback (most recent call last):
  File "run.py", line 628, in <module>
    main()
  File "run.py", line 466, in main
    if training_args.do_predict
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 465, in __init__
    verbose=True if _ == 0 else False,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 585, in convert_fn
    other_sent_limit=self.args.other_sent_limit,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 243, in tokenize_multipart_input
    mask_pos = [input_ids.index(tokenizer.mask_token_id)]
ValueError: 50264 is not in list
```
This "50264" is the same error as in issue #7 (Index not in list error when evaluating models zero-shot).
Sorry for the inconvenience, but do you happen to know what might have gone wrong?
Many thanks.

@gaotianyu1350
Member

Hi Yash,

This is caused by the mask token being truncated. In most templates, the mask token is placed at the end of the sentence, so if the input is too long and exceeds the maximum length, the mask token can be cut off. You can solve this by increasing the model's maximum length.

Best,
Tianyu
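A minimal sketch of the failure mode Tianyu describes, using plain Python lists instead of a real tokenizer (the `build_input` helper and the lengths are hypothetical; 50264 is RoBERTa's `<mask>` token id from the traceback above):

```python
MASK_ID = 50264  # RoBERTa's <mask> token id, as seen in the traceback

def build_input(sent_ids, max_len):
    """Mimic a template that appends the mask token, then truncates."""
    input_ids = sent_ids + [MASK_ID]
    return input_ids[:max_len]  # truncation can silently drop the mask token

long_review = list(range(600))  # stand-in for a long IMDB review's token ids

ok = build_input(long_review, max_len=1024)   # mask survives truncation
print(ok.index(MASK_ID))                      # -> 600

clipped = build_input(long_review, max_len=512)  # mask truncated away
try:
    clipped.index(MASK_ID)  # same lookup as tokenize_multipart_input
except ValueError as e:
    print(e)  # -> 50264 is not in list
```

When the mask token is dropped, `list.index` raises the exact `ValueError: 50264 is not in list` from the traceback; raising the model's maximum length (or capping the sentence length so the mask always fits) avoids it.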

@YashBit
Author

YashBit commented Dec 11, 2021

Dear Tianyu,
Thank you for your help. It works. I also limited first_sent_len, so there is no error.

Regards,
Yash.

@YashBit closed this as completed Dec 11, 2021