From 568a1f42f2ed347f7c5a66287d455da2e93dc9ff Mon Sep 17 00:00:00 2001
From: Aman Madaan
Date: Sun, 15 Oct 2023 15:23:59 -0400
Subject: [PATCH] [pie] add evaluation details for pie

---
 docs/pie_eval.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/pie_eval.md b/docs/pie_eval.md
index f9c2833..0e4d037 100644
--- a/docs/pie_eval.md
+++ b/docs/pie_eval.md
@@ -1,5 +1,7 @@
 # Instructions for evaluating runtime for PIE experiments
 
+TL;DR: From the self-refine outputs, create a flattened version, then use the PIE repo to evaluate the runtime and get a report. Parse the report using `src/pie/pie_eval.py`.
+
 1. **Step 1** (construct yaml): For evaluating runtime for PIE experiments, we need a yaml file that contains information about the dataset, the model outputs, and the reference file. Note that Self-Refine generates outputs in a slightly different format: it produces an array of programs (one version per refinement step), whereas the evaluation requires each program to be present in a single column as a script. You can optionally use [prep_for_pie_eval.py](https://github.com/madaan/self-refine/tree/main/src/pie) for this. `prep_for_pie_eval.py` creates a single file where the output from the i-th step is present in the `attempt_i_code` column. The following is an example for evaluating the initial output (`y0`).
 
    - See `data/tasks/pie/gpt4_outputs_self_refine.jsonl` and `data/tasks/pie/gpt4_outputs_flattened.jsonl` for examples of the outputs from self-refine and the flattened version, respectively.
@@ -32,7 +34,7 @@ output_report_file_path: "Where should the report file be generated?"
 
 Using the yaml file generated in the above step, please use the [evaluating your method](https://github.com/madaan/pie-perf/blob/main/README.md#evaluating-your-method) section to evaluate the outputs. If you run self-refine for 4 timesteps, you would create 4 yaml files and run this evaluation four times, once for each timestep. See `data/tasks/pie/gpt4_outputs.zip` for the 4 yaml files and the reports from these steps.
 
-3. **Step 3** (parse reports and aggregate results) After the evaluation, the report is saved in `output_report_file_path.` Then, you can use src/pie/pie_eval.py to aggregate the results.
+3. **Step 3** (parse reports and aggregate results): After the evaluation, the report is saved in `output_report_file_path`. Then, you can use `src/pie/pie_eval.py` to aggregate the results.
 
 ### Sample outputs
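For readers reproducing Step 1, a minimal sketch of the flattening step is shown below: it spreads Self-Refine's per-step array of programs into `attempt_i_code` columns, one per refinement step. The input field name (`refined_codes`) and the file names are illustrative assumptions; the actual `prep_for_pie_eval.py` in the repo may use a different schema.

```python
# Illustrative sketch only: flatten self-refine outputs so that the program from
# refinement step i lands in an `attempt_{i}_code` column, as the PIE evaluation expects.
# The input field name `refined_codes` is an assumed name, not the real schema.
import json


def flatten_outputs(in_path: str, out_path: str, code_key: str = "refined_codes") -> None:
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            record = json.loads(line)
            # One program per refinement step; step 0 is the initial output (y0).
            for i, code in enumerate(record.pop(code_key, [])):
                record[f"attempt_{i}_code"] = code
            fout.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    flatten_outputs("gpt4_outputs_self_refine.jsonl", "gpt4_outputs_flattened.jsonl")
```

With a flattened file like this, evaluating the initial output `y0` corresponds to pointing the yaml at the `attempt_0_code` column, and later refinement steps at `attempt_1_code`, `attempt_2_code`, and so on.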