Can you release the raw gpt-4 responses as well as the prompts that triggered #1
Hi. Thank you for your interest in our work. We made available all the responses we got for the ENEM 2022 dataset. You can access them in 9 individual json files in the folder reports; each file corresponds to one experiment. Getting the prompts is easy: you can save them in a txt file using the write_out.py script.
If you prefer another experiment, change the task to enem_2022 to ignore CoT, and num_fewshot to 0 to get zero-shot results.
Hi. I also would like to have all the prompts that produced the results in the paper, for reproducibility purposes. Running the script you mentioned:

python write_out.py --output_base_path output_dir --task enem_2022 --num_fewshot 0 --num_examples -1 --description_dict_path description.json --set test

I got:
Traceback (most recent call last):
File "write_out.py", line 81, in <module>
main()
File "write_out.py", line 77, in main
f.write(ctx + "\n")
File "cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03a9' in position 84: character maps to <undefined>

I don't know if it is related, but the data shared here contains several unicode characters which seem unintended.
Never mind. Running on Ubuntu (WSL 2) worked. I was running on Windows when the error occurred.
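For anyone who hits the same error on Windows: the traceback suggests the output file is opened with the platform default cp1252 codec, which cannot represent characters like Ω (U+03A9). A minimal sketch of the workaround (the file name here is hypothetical, not the script's actual output path) is to request UTF-8 explicitly when opening the file:

```python
# Sketch: writing prompt text containing a non-cp1252 character (the Greek
# letter Omega, U+03A9) succeeds when the encoding is forced to UTF-8,
# instead of relying on the Windows default codec.
text = "Resistance: 5 \u03a9"  # sample prompt text

with open("prompts.txt", "w", encoding="utf-8") as f:  # explicit encoding
    f.write(text + "\n")

with open("prompts.txt", encoding="utf-8") as f:
    print(f.read().strip())
```

Setting the environment variable PYTHONUTF8=1 before running the script achieves the same effect without editing the code.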
Hi again, I ran the script you mentioned. When checking the three-shot prompts, three of them have only two shots instead of three. Is it intentional or a bug? Also, FYI, I've placed the processed dataset in this repo. You can check this issue there.
Hi Vinicius. It is intentional. Those three questions are the ones selected as few-shot examples. If one of them is the current test example, it is removed from the few-shot examples. You can see this removal in this part of the code: https://github.com/piresramon/gpt-4-enem/blob/702c51f16b971fafbb8783927a5ed1df3a2d9650/lm_eval/tasks/enem.py#L418C20-L418C84 This is also explained in the paper.
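The idea behind that removal can be illustrated with a small sketch (the names here are hypothetical, not the repo's actual identifiers): the current test document is filtered out of the fixed few-shot pool, so a prompt whose question is itself one of the three few-shot questions ends up with only two shots.

```python
# Hypothetical sketch of the few-shot selection described above.
fewshot_pool = ["q_A", "q_B", "q_C"]  # the three fixed few-shot questions

def select_fewshot(test_doc, pool):
    """Drop the current test example from the pool so its answer never leaks."""
    return [doc for doc in pool if doc != test_doc]

print(select_fewshot("q_X", fewshot_pool))  # → ['q_A', 'q_B', 'q_C'] (3 shots)
print(select_fewshot("q_B", fewshot_pool))  # → ['q_A', 'q_C'] (only 2 shots)
```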
To be clear, you can use this repo to evaluate any model in HuggingFace, not only OpenAI models. ;-)
Hi, I've been trying to replicate your project. I used the following command:
The script runs successfully and I can see the aggregate results in the terminal, but no .json reports are generated in the reports folder.
Hi @rafaxy. This framework does not save the reports. However, you can make some minor adjustments in the code to save the report in the format you prefer.
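One such minor adjustment could look like the sketch below (this is not the repo's actual code; the results dict and file name are assumed for illustration): dump the aggregated results to a JSON file right after evaluation finishes.

```python
import json
import os

# Hypothetical aggregate results, mirroring what the harness prints to the terminal.
results = {
    "results": {"enem_2022": {"acc": 0.87}},
    "config": {"model": "some-hf-model", "num_fewshot": 3},
}

# Save them as a JSON report alongside the existing files in the reports folder.
os.makedirs("reports", exist_ok=True)
with open(os.path.join("reports", "my_experiment.json"), "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```

ensure_ascii=False keeps accented Portuguese characters readable in the saved file instead of escaping them.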
Hi, can you release the actual gpt-4 responses as well as the prompts that triggered them in a json or csv file?