From 568a1f42f2ed347f7c5a66287d455da2e93dc9ff Mon Sep 17 00:00:00 2001
From: Aman Madaan
Date: Sun, 15 Oct 2023 15:23:59 -0400
Subject: [PATCH] [pie] add evaluation details for pie

---
 docs/pie_eval.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/pie_eval.md b/docs/pie_eval.md
index f9c2833..0e4d037 100644
--- a/docs/pie_eval.md
+++ b/docs/pie_eval.md
@@ -1,5 +1,7 @@
 # Instructions for evaluating runtime for PIE experiments
 
+TL;DR: From the self-refine outputs, create a flattened version, then use the PIE repo to evaluate the runtime and get a report. Parse the report using `src/pie/pie_eval.py`.
+
 1. **Step 1** (construct yaml): For evaluating runtime for PIE experiments, we need a yaml file that contains information about the dataset, the model outputs, and the reference file. Note that Self-Refine generates outputs in a slightly different format: it produces an array of programs (one version per refinement step), whereas the evaluation requires each program to be present in a single column as a script. You can optionally use [prep_for_pie_eval.py](https://github.com/madaan/self-refine/tree/main/src/pie) for this. `prep_for_pie_eval.py` creates a single file where the output from the i-th step is present in the `attempt_i_code` column. The following is an example for evaluating the initial output (`y0`).
 
    - See `data/tasks/pie/gpt4_outputs_self_refine.jsonl` and `data/tasks/pie/gpt4_outputs_flattened.jsonl` for examples of the outputs from self-refine and the flattened version, respectively.
@@ -32,7 +34,7 @@ output_report_file_path: "Where should the report file be generated?"
 
 Using the yaml file generated in the above step, please use the [evaluating your method](https://github.com/madaan/pie-perf/blob/main/README.md#evaluating-your-method) section to evaluate the outputs. If you run self-refine for 4 timesteps, you would create 4 yaml files and run this evaluation four times, once for each timestep. See `data/tasks/pie/gpt4_outputs.zip` for the 4 yaml files and the reports from these steps.
 
-3. **Step 3** (parse reports and aggregate results) After the evaluation, the report is saved in `output_report_file_path.` Then, you can use src/pie/pie_eval.py to aggregate the results.
+3. **Step 3** (parse reports and aggregate results): After the evaluation, the report is saved in `output_report_file_path`. Then, you can use `src/pie/pie_eval.py` to aggregate the results.
 
 ### Sample outputs
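For readers reproducing Step 1, a minimal sketch of the flattening step is shown below: it spreads Self-Refine's per-step array of programs into `attempt_i_code` columns, one per refinement step. The input field name (`refined_codes`) and the file names are illustrative assumptions; the actual `prep_for_pie_eval.py` in the repo may use a different schema.

```python
# Illustrative sketch only: flatten self-refine outputs so that the program from
# refinement step i lands in an `attempt_{i}_code` column, as the PIE evaluation expects.
# The input field name `refined_codes` is an assumed name, not the real schema.
import json


def flatten_outputs(in_path: str, out_path: str, code_key: str = "refined_codes") -> None:
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            record = json.loads(line)
            # One program per refinement step; step 0 is the initial output (y0).
            for i, code in enumerate(record.pop(code_key, [])):
                record[f"attempt_{i}_code"] = code
            fout.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    flatten_outputs("gpt4_outputs_self_refine.jsonl", "gpt4_outputs_flattened.jsonl")
```

With a flattened file like this, evaluating the initial output `y0` corresponds to pointing the yaml at the `attempt_0_code` column, and later refinement steps at `attempt_1_code`, `attempt_2_code`, and so on.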