'AssertionError: 28531 sample gt_str != true_gt' on Py150 #103

@wannita901

Hi! I downloaded the repo two weeks ago and ran token-level code completion. The Java pipeline runs end to end. For Python, fine-tuning completed, but evaluation fails with an error similar to #98.

02/23/2022 15:15:50 - INFO - main - 3200 are done!
02/23/2022 15:15:50 - INFO - main - 33962002, 0.7616674364485344
02/23/2022 15:16:39 - INFO - main - 3300 are done!
02/23/2022 15:16:39 - INFO - main - 35024275, 0.7621001719521675
02/23/2022 15:17:28 - INFO - main - 3400 are done!
02/23/2022 15:17:28 - INFO - main - 36095119, 0.7619678162025175
02/23/2022 15:18:18 - INFO - main - 3500 are done!
02/23/2022 15:18:18 - INFO - main - 37150743, 0.7621929122655771
Traceback (most recent call last):
  File "run_lm.py", line 715, in <module>
    main()
  File "run_lm.py", line 710, in main
    test_total, test_cr = eval_acc(args, model, tokenizer, 'test')
  File "run_lm.py", line 459, in eval_acc
    total_samples = post_process(args, total_pred, total_gt, open(os.path.join(args.data_dir, f"{file_type}.txt")).readlines(), saved_file)
  File "run_lm.py", line 478, in post_process
    assert gt_str == true_gts[cnt].strip(), f"{cnt} sample gt_str != true_gt"
AssertionError: 28531 sample gt_str != true_gt

I tried deleting line 28530 of test.txt, and then further lines, but the error persists. My evaluation command was:

export CUDA_VISIBLE_DEVICES=0
LANG=python                       # set python for py150
DATADIR=../dataset/py150/token_completion
LITFILE=../dataset/py150/literals.json
OUTPUTDIR=../save/py150
PRETRAINDIR=../save/py150/checkpoint-last       # directory of your saved model
LOGFILE=completion_py150_eval.log

python -u run_lm.py \
        --data_dir=$DATADIR \
        --lit_file=$LITFILE \
        --langs=$LANG \
        --output_dir=$OUTPUTDIR \
        --pretrain_dir=$PRETRAINDIR \
        --log_file=$LOGFILE \
        --model_type=gpt2 \
        --block_size=1024 \
        --do_eval \
        --per_gpu_eval_batch_size=16 \
        --logging_steps=100 \
        --seed=42 
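
To narrow down how the two strings in the failing assertion differ, a small diagnostic along these lines might help. This is only a sketch and not part of run_lm.py: `describe_mismatch` is a hypothetical helper, and it assumes tokens are separated by single spaces, so an extra space shows up as an empty token.

```python
import difflib


def describe_mismatch(gt_str, true_gt):
    """Hypothetical helper: list token-level differences between two strings.

    Mirrors the failing assertion by stripping the reference line first.
    Splitting on a single space keeps empty tokens, so whitespace-only
    differences are reported instead of being hidden.
    """
    true_gt = true_gt.strip()
    if gt_str == true_gt:
        return []
    a, b = gt_str.split(" "), true_gt.split(" ")
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    # Keep only the opcodes where the two token sequences disagree.
    return [
        (tag, i1, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling it on `gt_str` and `true_gts[cnt]` just before the assert in `post_process` (for example inside a temporary `if gt_str != true_gts[cnt].strip():` branch) would print the position and the differing tokens for sample 28531.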

Any suggestions would be much appreciated. Thanks!
