
Testing: New Data for GLUE Tasks #28

Closed
YashBit opened this issue Dec 6, 2021 · 6 comments


YashBit commented Dec 6, 2021

I can see that there was another issue similar to this one, but I am still not clear on how to deal with out-of-distribution (OOD) test data.

I want to train and validate on the original train.tsv and dev.tsv in the ORIGINAL folder, but test on an out-of-distribution dataset.

So, say I want to test an SST-2 roberta-base model on IMDB. How should I go about it? Currently, I replace test.tsv in the ORIGINAL folder, regenerate the K-shot data, and then run the commands given in the README on the repo page. However, the test eval accuracy is identical to the accuracy on the original SST-2 test set. I don't know what is happening here. To reiterate:

My objective:

  1. Test on IMDB with a roberta-base model trained on SST-2 with seed 42, but train and validate on the original data provided with the repo.

Action:

  1. Replace the test.tsv in the ORIGINAL SST-2 folder with the IMDB data (sketched below).

Observed Behaviour:

  1. The test eval accuracy is the same as with the original data, as if test.tsv had never been replaced.

Expected Behaviour:

  1. The same train and dev accuracy, but a different test accuracy.

Request:

  1. Please help :) We replaced the original test.tsv and regenerated the K-shot data, but the test accuracy did not change.
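
For concreteness, the swap described above amounts to something like the following; the paths and file names are illustrative only, and the IMDB file is assumed to have been reformatted to match the original split beforehand:

    # Rough sketch of the swap described above; paths are examples only.
    import shutil

    # Overwrite the SST-2 test split with an IMDB file that has already been
    # reformatted to match the layout of the original test.tsv.
    shutil.copy("imdb_formatted_like_sst2.tsv", "data/original/SST-2/test.tsv")

    # After this, regenerate the k-shot data (tools/generate_k_shot.py, with the
    # flags from the README) and rerun run.py. As discussed in the replies below,
    # the stale feature cache must also be cleared or --overwrite_cache passed,
    # otherwise the old SST-2 test features are silently reused.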

ajfisch commented Dec 6, 2021

Hi,

Make sure your cache files are either deleted, or that you use a completely separate data directory/file naming from the original, or that you pass the cache-overwrite flag. The data loader will load existing cached torch files whenever --overwrite_cache is not set, which is the default.

Reference:

# Cache name distinguishes mode, task name, tokenizer, and length. So if you change anything beyond these elements, make sure to clear your cache.
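
To illustrate why this bites when only the TSV contents change: the cache name encodes the mode, task name, tokenizer, and max length, but nothing about the file contents, so a swapped test.tsv still resolves to the old cache file. A hypothetical sketch (not the repo's actual code; the real logic lives in src/dataset.py):

    # Hypothetical illustration of the cache-name behaviour described above.
    import os

    def cache_path(data_dir, mode, task, tokenizer_name, max_seq_length):
        # Only these elements go into the name; the TSV contents do not.
        fname = "cached_{}_{}_{}_{}".format(mode, tokenizer_name, max_seq_length, task)
        return os.path.join(data_dir, fname)

    path = cache_path("data/k-shot/SST-2/16-42", "test", "sst-2", "RobertaTokenizer", 128)
    overwrite_cache = False  # mirrors the default for --overwrite_cache
    if os.path.exists(path) and not overwrite_cache:
        print("Reusing cached features from", path, "so the new test.tsv is never read")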


YashBit commented Dec 6, 2021

OK, I will delete the cache directories, @ajfisch. So for the other tasks, should I replace the test.tsv files in the ORIGINAL folder in the same way?


ajfisch commented Dec 6, 2021

Yes. Either deleting the existing cache files (the code will then regenerate the missing files) or saving the alternate data to a new data directory (so that the cache files are saved to and loaded from new_data_dir/<cache_file_name>) should work.
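
For example, something along these lines clears the stale features before re-running. This assumes the cached files follow the usual cached_* naming inside the data directories; check src/dataset.py for the exact pattern:

    # Assumption: cached feature files are named "cached_*" and sit next to
    # the .tsv files they were built from.
    from pathlib import Path

    for cached in Path("data/k-shot/SST-2").rglob("cached_*"):
        print("removing", cached)
        cached.unlink()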


shi-kejian commented Dec 7, 2021

Hi,
Thanks again for the great work.

Today I actually encountered the same error as in issue #7 when testing a model prompt-tuned on SST-2 directly on the IMDB movie review dataset, by replacing the dev.tsv in /original with the IMDB data, as mentioned in issue #14.

What I did:

  1. Prompt-tune a model checkpoint on SST-2 and save the model.
  2. Replace data/original/SST-2/dev.tsv with my own IMDB dataset, formatted correctly.
  3. Run tools/generate_k_shot.py again; data/k-shot/SST-2/test.tsv now contains the IMDB data.
  4. Load the model from step 1 and set --no_train, --do_predict, --overwrite_cache, and the other necessary flags to run zero-shot on the IMDB data. I also cleared the cache before running it.
    The following error occurs:
    Traceback (most recent call last):
      File "run.py", line 628, in <module>
        main()
      File "run.py", line 466, in main
        if training_args.do_predict
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 465, in __init__
        verbose=True if _ == 0 else False,
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 585, in convert_fn
        other_sent_limit=self.args.other_sent_limit,
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 243, in tokenize_multipart_input
        mask_pos = [input_ids.index(tokenizer.mask_token_id)]
    ValueError: 50264 is not in list
    This "50264" is the same error as in issue Index not in list error when evaluating models zero-shot #7
    Sorry for the inconvenience but do you happen to know what might went wrong?

Many thanks.
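
For later readers: the mechanics of that ValueError are easy to reproduce. 50264 is roberta-base's mask token id, and the line shown in the traceback locates the mask position with list.index(), which raises exactly this error whenever the mask token never makes it into input_ids, for instance if the template's mask placeholder is missing or the token is cut off when a very long review is truncated. The sketch below only illustrates the mechanics; it is not a diagnosis of this particular run:

    # Illustration of the error mechanics only, not a diagnosis of this run.
    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
    print(tokenizer.mask_token_id)  # 50264 for roberta-base

    # If the encoded input contains no mask token (e.g. the template's mask
    # placeholder was lost, or truncation dropped it from a long review),
    # .index() raises the same error as in the traceback above.
    input_ids = tokenizer.encode("a long movie review with no mask token ...")
    mask_pos = [input_ids.index(tokenizer.mask_token_id)]  # ValueError: 50264 is not in list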

YashBit closed this as completed on Dec 8, 2021

YashBit commented Dec 8, 2021

I am opening another issue, since the new error is different.


hujian233 commented Dec 8, 2021 via email
