
Can you release the raw gpt-4 responses as well as the prompts that triggered #1

Closed · manyoso opened this issue Apr 1, 2023 · 8 comments



manyoso commented Apr 1, 2023

Hi, can you release the actual GPT-4 responses, as well as the prompts that triggered them, in a JSON or CSV file?

@piresramon (Owner)

Hi. Thank you for your interest in our work. We have made available all the responses we got for the ENEM 2022 dataset. You can access them as 9 individual JSON files in the reports folder; each file corresponds to one experiment. For example, engine=gpt-4-0314_numfewshot=3_task=enem_cot_2022.json used the GPT-4 model with 3-shot prompting and CoT.

Getting the prompts is easy: you can save them to a txt file using the script scripts/write_out.py and inspect them there. If you need to save the prompts along with the question IDs, just update the script (line 78) to save the ctx (few-shot context) and doc['id'] in a JSON file; a sketch of that change follows. You can then join this with the reports.
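A minimal sketch of that change (the names ctx and doc come from the script; the loop variable docs_with_contexts and the output filename are illustrative assumptions, not the script's own code):

import json

# Assumed shape: inside write_out.py's loop, `doc` is the current example and
# `ctx` is the few-shot context string built for it.
records = []
for doc, ctx in docs_with_contexts:  # hypothetical stand-in for the script's own loop
    records.append({"id": doc["id"], "prompt": ctx})

# ensure_ascii=False keeps the accented Portuguese text readable in the JSON.
with open("prompts.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)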

After that, just run:

python scripts/write_out.py --output_base_path output_dir --task enem_cot_2022 --num_fewshot 3 --num_examples -1 --description_dict_path description.json --set test

If you prefer another experiment, change the task to enem_2022 to drop CoT, and num_fewshot to 0 for zero-shot.

@viniciusarruda

Hi. I would also like to have all the prompts that produced the results in the paper, for reproducibility purposes.

Running the script you mentioned, I got:

python write_out.py --output_base_path output_dir --task enem_2022 --num_fewshot 0 --num_examples -1 --description_dict_path description.json --set test
Traceback (most recent call last):
  File "write_out.py", line 81, in <module>
    main()
  File "write_out.py", line 77, in main
    f.write(ctx + "\n")
  File "cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03a9' in position 84: character maps to <undefined>

I don't know if it is related, but the data shared here contains several Unicode characters that seem unintended.

@viniciusarruda

Never mind. Running on Ubuntu (WSL 2) worked; I was running on Windows when the error occurred.
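The traceback shows Python falling back to the Windows cp1252 codec when writing the file, so a likely fix for anyone who wants to stay on Windows (an untested assumption on my part) is to open the output file in write_out.py with an explicit UTF-8 encoding:

with open(output_path, "w", encoding="utf-8") as f:  # output_path stands in for the script's target file
    f.write(ctx + "\n")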

@viniciusarruda

Hi again. I ran the script you mentioned, and when checking the three-shot prompts, three of them have only two shots instead of three. Is this intentional or a bug?
Their IDs are ENEM_2022_21, ENEM_2022_88, and ENEM_2022_143.

Also, FYI, I've placed the processed dataset in this repo. You can check this issue there.

@piresramon (Owner)

Hi Vinicius. It is intentional: those three questions are the ones selected as few-shot examples. If one of them is the current test example, it is removed from the few-shot examples.

You can see this removal in this part of the code: https://github.com/piresramon/gpt-4-enem/blob/702c51f16b971fafbb8783927a5ed1df3a2d9650/lm_eval/tasks/enem.py#L418C20-L418C84

This is also explained in the paper: "For experiments on the ENEM 2022 dataset, when one of the three selected few-shot examples is evaluated, we exclude that example from the few-shot context and only use the remaining two examples as few-shot examples."
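As a standalone illustration of that filtering (my own sketch, not the repo's code):

# Drop the current test question from the few-shot pool before building the prompt.
FEWSHOT_IDS = ["ENEM_2022_21", "ENEM_2022_88", "ENEM_2022_143"]

def fewshot_ids_for(current_id):
    # A question never appears as its own few-shot example, so these three
    # questions get a 2-shot context while all others get the full 3-shot one.
    return [qid for qid in FEWSHOT_IDS if qid != current_id]

print(fewshot_ids_for("ENEM_2022_88"))  # ['ENEM_2022_21', 'ENEM_2022_143']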

@piresramon (Owner)

To be clear, you can use this repo to evaluate any model on the HuggingFace Hub, not only OpenAI models. ;-)

python main.py \
    --model hf \
    --model_args pretrained=unicamp-dl/ptt5-base-portuguese-vocab \
    --task enem_cot_2022 \
    --num_fewshot 3 \
    --description_dict_path description.json


rafaxy commented Feb 26, 2024

Hi, I've been trying to replicate your project. I used the following command:

python main.py \
    --model chatgpt \
    --model_args engine=gpt-3.5-turbo-0125 \
    --tasks enem_cot_2022_captions \
    --description_dict_path description.json \
    --num_fewshot 3 \
    --conversation_template chatgpt

The script runs successfully and I can see the aggregate results in the terminal, but no .json reports like the ones in your repository are generated in the ./reports folder. How can I generate them?

@piresramon (Owner)

Hi @rafaxy. This framework does not save the reports, but you can make some minor adjustments to the code to save a report in the format you prefer.
I suggest adding a piece of code to this function: https://github.com/piresramon/gpt-4-enem/blob/main/lm_eval/tasks/enem_multimodal.py#L55
Just create a dict containing the necessary data, as below, add whatever else you need, and save it to a JSON file:

{
    "id": doc["id"],
    "response": response,
    "gold": doc["gold"],
    "area": area,
}
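A minimal sketch of how that could look as a helper (assuming doc, response, and area are in scope at that point; the file path and append-on-each-call behavior are my assumptions, not the repo's):

import json
import os

REPORT_PATH = "reports/my_run.json"  # hypothetical output file

def save_record(doc, response, area, path=REPORT_PATH):
    # Append one evaluated question to a running JSON list on disk.
    records = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
    records.append({
        "id": doc["id"],
        "response": response,
        "gold": doc["gold"],
        "area": area,
    })
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)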
