
Can you release the raw gpt-4 responses as well as the prompts that triggered #1

Closed · manyoso opened this issue Apr 1, 2023 · 8 comments



manyoso commented Apr 1, 2023

Hi, can you release the actual GPT-4 responses, as well as the prompts that triggered them, in a JSON or CSV file?

@piresramon (Owner)

Hi. Thank you for your interest in our work. We have made available all the responses we got for the ENEM 2022 dataset. You can access them as 9 individual JSON files in the reports folder; each file corresponds to one experiment. For example, engine=gpt-4-0314_numfewshot=3_task=enem_cot_2022.json used the GPT-4 model with 3-shot prompting and CoT.

Getting the prompts is easy: you can save them to a txt file using the script scripts/write_out.py and inspect them there. If you need to save the prompts along with the question IDs, just update the script (line 78) to save the ctx (few-shot context) and doc['id'] in a JSON file; a sketch of that change follows. You can then join this with the reports.
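A minimal sketch of that change (the names ctx and doc come from the script; the loop variable docs_with_contexts and the output filename are illustrative assumptions, not the script's own code):

import json

# Assumed shape: inside write_out.py's loop, `doc` is the current example and
# `ctx` is the few-shot context string built for it.
records = []
for doc, ctx in docs_with_contexts:  # hypothetical stand-in for the script's own loop
    records.append({"id": doc["id"], "prompt": ctx})

# ensure_ascii=False keeps the accented Portuguese text readable in the JSON.
with open("prompts.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)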

After that, just run:

python scripts/write_out.py --output_base_path output_dir --task enem_cot_2022 --num_fewshot 3 --num_examples -1 --description_dict_path description.json --set test

If you prefer another experiment, change the task to enem_2022 to drop CoT, and num_fewshot to 0 for zero-shot.

@viniciusarruda

Hi. I would also like to have all the prompts that produced the results in the paper, for reproducibility purposes.

Running the script you mentioned, I got:

python write_out.py --output_base_path output_dir --task enem_2022 --num_fewshot 0 --num_examples -1 --description_dict_path description.json --set test
Traceback (most recent call last):
  File "write_out.py", line 81, in <module>
    main()
  File "write_out.py", line 77, in main
    f.write(ctx + "\n")
  File "cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03a9' in position 84: character maps to <undefined>

I don't know if it is related, but the data shared here contains several Unicode characters that seem unintended.

@viniciusarruda

Never mind. Running on Ubuntu (WSL 2) worked; I was running on Windows when the error occurred.
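The traceback shows Python falling back to the Windows cp1252 codec when writing the file, so a likely fix for anyone who wants to stay on Windows (an untested assumption on my part) is to open the output file in write_out.py with an explicit UTF-8 encoding:

with open(output_path, "w", encoding="utf-8") as f:  # output_path stands in for the script's target file
    f.write(ctx + "\n")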

@viniciusarruda

Hi again. I ran the script you mentioned, and when checking the three-shot prompts, three of them have only two shots instead of three. Is this intentional or a bug?
Their IDs are ENEM_2022_21, ENEM_2022_88, and ENEM_2022_143.

Also, FYI, I've placed the processed dataset in this repo. You can check this issue there.

@piresramon (Owner)

Hi Vinicius. It is intentional: those three questions are the ones selected as few-shot examples. If one of them is the current test example, it is removed from the few-shot examples.

You can see this removal in this part of the code: https://github.com/piresramon/gpt-4-enem/blob/702c51f16b971fafbb8783927a5ed1df3a2d9650/lm_eval/tasks/enem.py#L418C20-L418C84

This is also explained in the paper: "For experiments on the ENEM 2022 dataset, when one of the three selected few-shot examples is evaluated, we exclude that example from the few-shot context and only use the remaining two examples as few-shot examples."
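As a standalone illustration of that filtering (my own sketch, not the repo's code):

# Drop the current test question from the few-shot pool before building the prompt.
FEWSHOT_IDS = ["ENEM_2022_21", "ENEM_2022_88", "ENEM_2022_143"]

def fewshot_ids_for(current_id):
    # A question never appears as its own few-shot example, so these three
    # questions get a 2-shot context while all others get the full 3-shot one.
    return [qid for qid in FEWSHOT_IDS if qid != current_id]

print(fewshot_ids_for("ENEM_2022_88"))  # ['ENEM_2022_21', 'ENEM_2022_143']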

@piresramon (Owner)

To be clear, you can use this repo to evaluate any model on the HuggingFace Hub, not only OpenAI models. ;-)

python main.py \
    --model hf \
    --model_args pretrained=unicamp-dl/ptt5-base-portuguese-vocab \
    --task enem_cot_2022 \
    --num_fewshot 3 \
    --description_dict_path description.json


rafaxy commented Feb 26, 2024

Hi, I've been trying to replicate your project. I used the following command:

python main.py \
    --model chatgpt \
    --model_args engine=gpt-3.5-turbo-0125 \
    --tasks enem_cot_2022_captions \
    --description_dict_path description.json \
    --num_fewshot 3 \
    --conversation_template chatgpt

The script runs successfully and I can see the aggregate results in the terminal, but no .json reports like the ones in your repository are generated in the ./reports folder. How can I generate them?

@piresramon (Owner)

Hi @rafaxy. This framework does not save the reports, but you can make some minor adjustments to the code to save a report in the format you prefer.
I suggest adding a piece of code to this function: https://github.com/piresramon/gpt-4-enem/blob/main/lm_eval/tasks/enem_multimodal.py#L55
Just create a dict containing the necessary data, as below, add whatever else you need, and save it to a JSON file:

{
    "id": doc["id"],
    "response": response,
    "gold": doc["gold"],
    "area": area,
}
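A minimal sketch of how that could look as a helper (assuming doc, response, and area are in scope at that point; the file path and append-on-each-call behavior are my assumptions, not the repo's):

import json
import os

REPORT_PATH = "reports/my_run.json"  # hypothetical output file

def save_record(doc, response, area, path=REPORT_PATH):
    # Append one evaluated question to a running JSON list on disk.
    records = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            records = json.load(f)
    records.append({
        "id": doc["id"],
        "response": response,
        "gold": doc["gold"],
        "area": area,
    })
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)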
