Hi! I downloaded the repo two weeks ago and ran the token-level code completion task. The Java one runs end to end. For Python, fine-tuning completed, but the evaluation fails with an error similar to #98.
02/23/2022 15:15:50 - INFO - main - 3200 are done!
02/23/2022 15:15:50 - INFO - main - 33962002, 0.7616674364485344
02/23/2022 15:16:39 - INFO - main - 3300 are done!
02/23/2022 15:16:39 - INFO - main - 35024275, 0.7621001719521675
02/23/2022 15:17:28 - INFO - main - 3400 are done!
02/23/2022 15:17:28 - INFO - main - 36095119, 0.7619678162025175
02/23/2022 15:18:18 - INFO - main - 3500 are done!
02/23/2022 15:18:18 - INFO - main - 37150743, 0.7621929122655771
Traceback (most recent call last):
  File "run_lm.py", line 715, in <module>
    main()
  File "run_lm.py", line 710, in main
    test_total, test_cr = eval_acc(args, model, tokenizer, 'test')
  File "run_lm.py", line 459, in eval_acc
    total_samples = post_process(args, total_pred, total_gt, open(os.path.join(args.data_dir, f"{file_type}.txt")).readlines(), saved_file)
  File "run_lm.py", line 478, in post_process
    assert gt_str == true_gts[cnt].strip(), f"{cnt} sample gt_str != true_gt"
AssertionError: 28531 sample gt_str != true_gt
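
To see what is sitting at that position, I looked at the lines around the failing index with a small script (just a sketch, assuming the index printed by the assertion maps directly onto the lines of test.txt, the same way true_gts is built in post_process):

# Sketch: inspect test.txt around the failing index (28531).
# Assumption: the assertion index corresponds directly to line indices of test.txt.
data_dir = "../dataset/py150/token_completion"  # same as $DATADIR below

with open(f"{data_dir}/test.txt", encoding="utf-8") as f:
    true_gts = f.readlines()

print("total lines:", len(true_gts))
for i in range(28530, 28533):
    # repr() makes empty lines, stray whitespace, and odd characters visible
    print(i, repr(true_gts[i][:120]))
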
I tried deleting the 28530th line of test.txt (and nearby lines), but the error persists. My evaluation command was:
export CUDA_VISIBLE_DEVICES=0
LANG=python # set python for py150
DATADIR=../dataset/py150/token_completion
LITFILE=../dataset/py150/literals.json
OUTPUTDIR=../save/py150
PRETRAINDIR=../save/py150/checkpoint-last # directory of your saved model
LOGFILE=completion_py150_eval.log
python -u run_lm.py \
--data_dir=$DATADIR \
--lit_file=$LITFILE \
--langs=$LANG \
--output_dir=$OUTPUTDIR \
--pretrain_dir=$PRETRAINDIR \
--log_file=$LOGFILE \
--model_type=gpt2 \
--block_size=1024 \
--do_eval \
--per_gpu_eval_batch_size=16 \
--logging_steps=100 \
--seed=42
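
For debugging, one could also replace the hard assert in post_process with a diagnostic that reports the first mismatching pair instead of raising. This is only a hedged sketch of the comparison loop, since the reconstruction of total_gt is not visible in the traceback:

# Hedged sketch (not the repo's exact code): report the first sample where the
# reconstructed ground truth differs from the corresponding line of test.txt.
def report_first_mismatch(gt_strs, true_gts):
    for cnt, (gt_str, true_gt) in enumerate(zip(gt_strs, true_gts)):
        if gt_str != true_gt.strip():
            print(f"mismatch at sample {cnt}")
            print("reconstructed :", repr(gt_str[:120]))
            print("from test.txt :", repr(true_gt.strip()[:120]))
            return cnt
    print("no mismatch found")
    return -1
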
Any suggestions would be very much appreciated. Thanks!