
Error while calling dataset.py: ValueError: 50264 is not in list #29

Closed
YashBit opened this issue Dec 8, 2021 · 2 comments

Comments

@YashBit

YashBit commented Dec 8, 2021

Taken from: https://github.com/shi-kejian
Hi,
Thanks again for the great work.

Today I encountered the same error as issue #7 when testing a model prompt-tuned on SST-2 directly on the IMDB movie-review dataset, by replacing the dev.tsv in /original with the IMDB dataset, as mentioned in issue #14.

What I did:

1. Prompt-tune a model checkpoint on SST-2 and save the model.
2. Replace data/original/SST-2/dev.tsv with my own IMDB dataset, formatted correctly.
3. Run tools/generate_k_shot.py again, so that data/k-shot/SST-2/test.tsv becomes the IMDB data.
4. Load the model from step 1 and pass --no_train, --do_predict, --overwrite_cache, and the other necessary flags to zero-shot on the IMDB dataset. I also cleared the cache before running.

Then the error occurs:
```
Traceback (most recent call last):
  File "run.py", line 628, in <module>
    main()
  File "run.py", line 466, in main
    if training_args.do_predict
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 465, in __init__
    verbose=True if _ == 0 else False,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 585, in convert_fn
    other_sent_limit=self.args.other_sent_limit,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 243, in tokenize_multipart_input
    mask_pos = [input_ids.index(tokenizer.mask_token_id)]
ValueError: 50264 is not in list
```
This "50264" is the same error as in issue #7 (Index not in list error when evaluating models zero-shot).
Sorry for the inconvenience, but do you happen to know what might have gone wrong?
Many thanks.

@gaotianyu1350
Member

Hi Yash,

This is caused by the mask token being truncated. In most templates, the mask token is placed at the end of the sentence, so if the input is too long and exceeds the maximum length, the mask token can be cut off. You can solve this by increasing the model's maximum length.

Best,
Tianyu
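A minimal sketch of the failure mode Tianyu describes, using plain Python lists instead of a real tokenizer (the `build_input` helper and the lengths are hypothetical; 50264 is RoBERTa's `<mask>` token id from the traceback above):

```python
MASK_ID = 50264  # RoBERTa's <mask> token id, as seen in the traceback

def build_input(sent_ids, max_len):
    """Mimic a template that appends the mask token, then truncates."""
    input_ids = sent_ids + [MASK_ID]
    return input_ids[:max_len]  # truncation can silently drop the mask token

long_review = list(range(600))  # stand-in for a long IMDB review's token ids

ok = build_input(long_review, max_len=1024)   # mask survives truncation
print(ok.index(MASK_ID))                      # -> 600

clipped = build_input(long_review, max_len=512)  # mask truncated away
try:
    clipped.index(MASK_ID)  # same lookup as tokenize_multipart_input
except ValueError as e:
    print(e)  # -> 50264 is not in list
```

When the mask token is dropped, `list.index` raises the exact `ValueError: 50264 is not in list` from the traceback; raising the model's maximum length (or capping the sentence length so the mask always fits) avoids it.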

@YashBit
Author

YashBit commented Dec 11, 2021

Dear Tianyu,
Thank you for your help. It works. I also limited first_sent_len, so there is no error.

Regards,
Yash.

@YashBit closed this as completed Dec 11, 2021