
Fail to reproduce the codellama baseline with the code base #16

Closed
xufangzhi opened this issue Dec 13, 2023 · 1 comment

Comments

@xufangzhi

xufangzhi commented Dec 13, 2023

Hello, thanks for sharing the great work!

We tried to reproduce the CodeLLaMA-13B PAL results on the GSM-Hard dataset with the following script:

set -ex

MODEL_NAME_OR_PATH="LOCAL CODELLAMA-13B MODEL WEIGHTS"

# DATA_LIST = ['math', 'gsm8k', 'gsm-hard', 'svamp', 'tabmwp', 'asdiv', 'mawps']
DATA_NAME="gsm-hard"
SPLIT="test"
PROMPT_TYPE="pal"
NUM_TEST_SAMPLE=-1

CUDA_VISIBLE_DEVICES=0 TOKENIZERS_PARALLELISM=false \
python -um infer.inference \
    --model_name_or_path ${MODEL_NAME_OR_PATH} \
    --data_name ${DATA_NAME} \
    --split ${SPLIT} \
    --prompt_type ${PROMPT_TYPE} \
    --num_test_sample ${NUM_TEST_SAMPLE} \
    --seed 0 \
    --temperature 0 \
    --n_sampling 1 \
    --top_p 1 \
    --start 0 \
    --end -1

The output results are as follows:

Num samples: 1319
Num scores: 1319
Timeout samples: 10
Empty samples: 293
Mean score: [0.1]

Time use: 7540.60s
Time use: 125:40

This gives a mean score of 0.1, which is far below the reported result. Would you please share the baseline script for CodeLLaMA with the PAL strategy? Thanks a lot.

@ZubinGou
Contributor

ZubinGou commented Jan 3, 2024

Maybe check whether your environment is consistent with requirements.txt.
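A quick way to follow this advice is to compare the installed package versions against the pins in requirements.txt. The helper below is a minimal sketch (not part of the repo): it assumes simple `name==version` pins and uses the standard-library `importlib.metadata` to look up what is actually installed.

```python
# Hypothetical helper: report packages whose installed version differs from
# the "name==version" pins in a requirements.txt file.
from importlib import metadata


def check_requirements(path="requirements.txt"):
    """Return a list of (name, pinned_version, installed_version_or_None)."""
    mismatches = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and anything that is not an exact pin.
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, want = line.split("==", 1)
            try:
                have = metadata.version(name)
            except metadata.PackageNotFoundError:
                have = None  # package not installed at all
            if have != want:
                mismatches.append((name, want, have))
    return mismatches


if __name__ == "__main__":
    import os
    if os.path.exists("requirements.txt"):
        for name, want, have in check_requirements():
            print(f"{name}: required {want}, installed {have}")
```

Running it in the repo root prints every mismatched or missing dependency; version drift in packages such as transformers or the decoding backend is a common cause of degraded scores like the one above.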

@ZubinGou ZubinGou closed this as completed Jan 3, 2024