
Commit 568a1f4

[pie] add evaluation details for pie

madaan committed Oct 15, 2023
1 parent b9ed187 commit 568a1f4
Showing 1 changed file with 3 additions and 1 deletion: docs/pie_eval.md
# Instructions for evaluating runtime for PIE experiments

TL;DR: From the Self-Refine outputs, create a flattened version of the outputs, then use the PIE repo to evaluate the runtime and generate a report. Parse the report using `src/pie/pie_eval.py`.

1. **Step 1** (construct yaml): To evaluate runtime for PIE experiments, we need a yaml file that describes the dataset, the model outputs, and the reference file. Note that Self-Refine generates outputs in a slightly different format: it stores the programs in an array (one version per refinement step), whereas the evaluation expects each program to be present in a single column as a script. You can optionally use [prep_for_pie_eval.py](https://github.com/madaan/self-refine/tree/main/src/pie) for this. `prep_for_pie_eval.py` creates a single file where the output from the i-th step is placed in the `attempt_i_code` column; a minimal sketch of this flattening is shown at the end of this step. The following is an example for evaluating the initial output (`y0`).

- See `data/tasks/pie/gpt4_outputs_self_refine.jsonl` and `data/tasks/pie/gpt4_outputs_flattened.jsonl` for examples of the outputs from self-refine and the flattened version, respectively.
output_report_file_path: "Where should the report file be generated?"
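
The following is a minimal sketch of the flattening step, assuming the Self-Refine outputs are stored as JSONL with an `attempts` list holding one program per refinement step; the field name `attempts` is illustrative, not the exact schema used by `prep_for_pie_eval.py`.

```python
import json

def flatten_self_refine_outputs(in_path: str, out_path: str, max_steps: int = 4) -> None:
    """Flatten per-step programs into attempt_i_code columns (illustrative schema)."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            row = json.loads(line)
            # Assumed field: a list with one refined program per step (y0, y1, ...).
            attempts = row.get("attempts", [])
            for i in range(max_steps):
                # The output from the i-th step goes into the attempt_i_code column.
                row[f"attempt_{i}_code"] = attempts[i] if i < len(attempts) else None
            fout.write(json.dumps(row) + "\n")

flatten_self_refine_outputs(
    "data/tasks/pie/gpt4_outputs_self_refine.jsonl",
    "data/tasks/pie/gpt4_outputs_flattened.jsonl",
)
```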

Using the yaml file generated in the above step, follow the [evaluating your method](https://github.com/madaan/pie-perf/blob/main/README.md#evaluating-your-method) section of the pie-perf README to evaluate the outputs. If you run Self-Refine for 4 timesteps, you would create 4 yaml files and run this evaluation four times, once for each timestep; a sketch of this loop is shown below. See `data/tasks/pie/gpt4_outputs.zip` for the 4 yaml files and the reports from these steps.
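
A rough sketch of the per-timestep loop, assuming one yaml config per refinement step (the config paths are hypothetical, and the run command is based on the pie-perf README, so check the linked section for the exact entry point):

```python
import subprocess

# One config per refinement step (attempt_0 ... attempt_3); see the yaml files
# bundled in data/tasks/pie/gpt4_outputs.zip for concrete examples.
for step in range(4):
    config = f"configs/pie_eval_attempt_{step}.yaml"  # hypothetical path
    # Run command as described in the pie-perf README ("evaluating your method");
    # the exact script path and flag may differ, so verify against the README.
    subprocess.run(
        ["python", "src/codenet_eval/run_eval.py", "--eval_config", config],
        check=True,
        cwd="pie-perf",  # assumes the pie-perf repo is cloned locally
    )
```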

3. **Step 3** (parse reports and aggregate results): After the evaluation, the report is saved in `output_report_file_path`. Then, you can use `src/pie/pie_eval.py` to aggregate the results.
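
The actual aggregation lives in `src/pie/pie_eval.py`; as a rough illustration of what this step computes, here is a sketch that assumes the report is a JSONL file with per-problem mean runtimes (the column names `input_time_mean` and `generated_time_mean`, and the report path, are placeholders, not the real schema):

```python
import pandas as pd

def summarize_report(report_path: str) -> None:
    """Aggregate a PIE runtime report (column names here are assumptions)."""
    df = pd.read_json(report_path, lines=True)
    # Assumed columns: mean runtime of the slow input program and of the
    # generated program, both measured by the pie-perf harness.
    speedup = df["input_time_mean"] / df["generated_time_mean"]
    print(f"problems evaluated : {len(df)}")
    print(f"mean speedup       : {speedup.mean():.2f}x")
    print(f"% faster than input: {100 * (speedup > 1.0).mean():.1f}%")

summarize_report("reports/attempt_0_report.jsonl")  # hypothetical report path
```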

### Sample outputs

