# Experiment: TextualVerifier Using the Best Sample

In [1]:
import pandas as pd
import textgrad as tg
from textgrad.engine import get_engine
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.verifier import TextualVerifierExperiment
from textgrad.loss import TextLoss

## Load Dataset

In [2]:
sample = pd.read_csv("dataset/sample/prm800k-03-algo3-clean.csv")
sample

| Unnamed: 0 | id | labeler | timestamp | problem | ground_truth_answer | total_steps | steps | neg_1 | zero | pos_1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-30T14:37:13.296218 | There are an infinite number of vectors $\math... | \begin{pmatrix} -7 \\ 16 \\ 5 \end{pmatrix} | 34 | [{'text': "Let's set $\\mathbf{v} = \\begin{pm... | 19 | 6 | 9 |
| 1 | 2 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-30T13:26:58.414691 | When rolling a certain unfair six-sided die wi... | 29 | 35 | [{'text': "Well, let's think about this for a ... | 18 | 1 | 16 |
| 2 | 3 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-31T14:39:30.588403 | Find all solutions to\n\[\sin \left( \tan^{-1}... | 3 \pm 2 \sqrt{2} | 34 | [{'text': "Let's set $y = \\tan^{-1} x$.", 'ra... | 11 | 1 | 22 |
| 3 | 4 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-29T07:48:01.714041 | The solutions of the equation $z^4+4z^3i-6z^2-... | 11 | 40 | [{'text': 'There is a formula for the area of ... | 16 | 2 | 21 |
| 4 | 5 | e90a38f3-3135-4465-87af-3e6322e3d772 | 2022-07-22T20:02:50.866783 | A sequence $(a_n)$ is defined as follows:\n\[a... | -1 | 36 | [{'text': "So we're given that $a_{i + 1} = \\... | 7 | 3 | 26 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 66 | 440 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-28T08:12:20.344377 | Find the product $CD$ of the integers $C$ and ... | -5 | 17 | [{'text': 'I think the first step here is to f... | 3 | 0 | 14 |
| 67 | 442 | d8aa7923-b970-45e1-9734-e4a7f6c4a7db | 2022-07-31T22:47:06.498122 | What real values of $x$ are not in the domain ... | -4 | 31 | [{'text': 'To find values of $x$ that are not ... | 1 | 0 | 30 |
| 68 | 444 | d8aa7923-b970-45e1-9734-e4a7f6c4a7db | 2022-07-24T10:40:50.685197 | How many license plates can be formed if every... | 58500 | 14 | [{'text': 'So we need to count the number of p... | 2 | 2 | 10 |
| 69 | 445 | debabc6d-f79c-4ee5-a9db-5e284390254c | 2022-07-30T11:25:46.657657 | If $f(x)=5x^2+3x+4$, what is the value of $f(-... | 18 | 7 | [{'text': 'To find f(-2), we just need to plug... | 1 | 0 | 6 |
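The `neg_1`, `zero` and `pos_1` columns appear to count the steps rated -1, 0 and +1; in most rows shown they sum to `total_steps`. The sketch below recomputes such counts from the serialized `steps` string. It assumes each step dict carries a `rating` key, which the truncated `'ra...` in the preview suggests but does not confirm; `count_ratings` is a hypothetical helper and the demo string is synthetic, not taken from the dataset.

```python
import ast
from collections import Counter

def count_ratings(steps_str):
    """Count step ratings in a serialized list of step dicts.

    Assumes (hypothetically) that each dict has a 'rating' key holding -1, 0 or 1;
    steps without a rating are counted as 'unrated'.
    """
    steps = ast.literal_eval(steps_str)  # safely parse the Python-literal list
    counts = Counter(step.get("rating") for step in steps)
    return {"neg_1": counts[-1], "zero": counts[0], "pos_1": counts[1], "unrated": counts[None]}

# Synthetic example in the same format as the `steps` column
demo = "[{'text': 'Step A', 'rating': 1}, {'text': 'Step B', 'rating': -1}, {'text': 'Step C', 'rating': 1}]"
print(count_ratings(demo))  # {'neg_1': 1, 'zero': 0, 'pos_1': 2, 'unrated': 0}
```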


## Experiment

In [3]:
engine = get_engine("gemini-1.5-pro")
tg.set_backward_engine("gemini-1.5-pro", override=True)



In [4]:
def format_steps(steps):
    """Wrap each step's text in <Step>...</Step> tags, one step per line."""
    formatted_steps = ""
    for step in steps:
        formatted_steps += f"<Step>{step['text']}</Step>\n"
    return formatted_steps
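The verifier's output later in this notebook uses the same `<Step>...</Step>` layout, so it can be split back into individual steps. A minimal sketch of that round trip follows; `parse_steps` is a hypothetical helper (not part of textgrad), and `format_steps` is repeated so the block runs on its own.

```python
import re

def format_steps(steps):
    # Same formatting as the notebook cell: one <Step>...</Step> per line
    return "".join(f"<Step>{step['text']}</Step>\n" for step in steps)

def parse_steps(formatted):
    # Hypothetical inverse: extract the text between each <Step> and </Step>.
    # The non-greedy .*? stops at the first closing tag; re.DOTALL lets a
    # step's text span several lines.
    return re.findall(r"<Step>(.*?)</Step>", formatted, flags=re.DOTALL)

steps = [{"text": "Let $x = 2$."}, {"text": "Then $x^2 = 4$."}]
text = format_steps(steps)
print(parse_steps(text))  # ['Let $x = 2$.', 'Then $x^2 = 4$.']
```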

In [5]:
import ast

def evaluate_sample_with_textgrad_textual_verifier(row_data):
    problem = row_data['problem'] 
    steps_list = ast.literal_eval(row_data['steps'])
    solution_steps = format_steps(steps_list)
    print(problem)
    print(solution_steps)

    solution = Variable(solution_steps,
                        requires_grad=True,
                        role_description=f"Solution to the math question: {problem}")
    verification_prompt = Variable("You will evaluate the solution to a math question.",
                                    requires_grad=False,
                                    role_description="system prompt")

    # TextualVerifierV3
    verifier = TextualVerifierExperiment(verifier_engine=engine, step_eval_iterations=3, logger=True)
    verified_result = verifier.verify(instance=solution, 
                                    prompt=verification_prompt,
                                    calculation=solution)
    verified_result_value = verified_result.value

    print(verified_result_value)

    # Keep only columns that exist in this dataset, plus the verifier output
    result = {
        "id": row_data["id"],
        "problem": problem,
        "ground_truth_answer": row_data["ground_truth_answer"],
        "verified_solution": verified_result_value,
    }

    return result
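The verifier's log in the next cell's output reports step k being verified "with context from k-1 previous steps", and the verifier is created with `step_eval_iterations=3`. The internals of `TextualVerifierExperiment` are not shown in this notebook, so the following is only a sketch of one way such a loop could work, assuming each step is judged together with all earlier steps and a majority vote over repeated judgments decides. `verify_steps` and `toy_judge` are hypothetical; `toy_judge` is a rule-based stand-in for an LLM call (it ignores the context) so the example runs offline.

```python
from collections import Counter
from typing import Callable, List

def verify_steps(steps: List[str],
                 judge: Callable[[List[str], str], bool],
                 iterations: int = 3) -> List[bool]:
    """Hypothetical step-wise verifier: majority vote over repeated judgments."""
    verdicts = []
    for i, step in enumerate(steps):
        context = steps[:i]  # every earlier step is passed as context
        votes = Counter(judge(context, step) for _ in range(iterations))
        # Strict majority; with an odd number of iterations a tie cannot occur
        verdicts.append(votes[True] > votes[False])
    return verdicts

def toy_judge(context, step):
    # Stand-in for an LLM call: rejects any step that mentions "7!"
    return "7!" not in step

steps = ["There are 7 people.", "So there are 7! seatings.", "Fix one person to remove rotations."]
print(verify_steps(steps, toy_judge))  # [True, False, True]
```

With a real, non-deterministic judge the repeated calls can disagree, which is where the vote matters; with the deterministic toy judge every vote is unanimous.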

In [6]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import time

results = []
start_time = time.time()

with ThreadPoolExecutor(max_workers=128) as executor:
    # Submit all tasks
    futures = [
        executor.submit(evaluate_sample_with_textgrad_textual_verifier, row.to_dict()) 
        for _, row in sample[6:7].iterrows()
    ]
    
    # Use tqdm for progress tracking
    for future in tqdm(as_completed(futures), total=len(futures), desc="Processing"):
        result = future.result()
        if result is not None:
            results.append(result)

# experiment_df = pd.DataFrame(results)

# print(f"Completed in {time.time() - start_time:.1f} seconds")
# experiment_df.to_csv('results/prm800k-03-algo3-clean-result.csv', index=False)

In how many ways can $7$ people sit around a round table if no two of the $3$ people Pierre, Rosa, and Thomas can sit next to each other? (Seating arrangements which are rotations of each other are treated as the same.)
<Step>There are a total of $7$ people, so there are $7!$ ways to seat all of them with no restrictions.</Step>
<Step>Yes, and let's make sure that they are not sitting next to each other.</Step>
<Step>That's a good idea. There are $7$ seats, so there are $7$ ways for the first person to be seated.</Step>
<Step>This leaves us with $7 - (3+2) = 2$ people left to seat.</Step>
<Step>First, let's take the case where Rosa is two seats from Pierre.</Step>
<Step>That means that she is either one seat to the left or one seat to the right of Pierre.</Step>
<Step>And there are two ways to seat Thomas: he can be in the seat next to Pierre, or in the seat next to Rosa.</Step>
<Step>Yes, and there is only one way for Pierre, Rosa, and Thomas to sit in these seats.</Step>
<Step>From t


Verifying step 2/15 with context from 1 previous steps
Verifying step 3/15 with context from 2 previous steps
Verifying step 4/15 with context from 3 previous steps
Verifying step 5/15 with context from 4 previous steps
Verifying step 6/15 with context from 5 previous steps
Verifying step 7/15 with context from 6 previous steps
Verifying step 8/15 with context from 7 previous steps
Verifying step 9/15 with context from 8 previous steps
Verifying step 10/15 with context from 9 previous steps
Verifying step 11/15 with context from 10 previous steps
Verifying step 12/15 with context from 11 previous steps
Verifying step 13/15 with context from 12 previous steps
Verifying step 14/15 with context from 13 previous steps
Verifying step 15/15 with context from 14 previous steps
INFO:textgrad:TextualVerifier:merge_verified_steps Merging verified steps...
INFO:textgrad:TextualVerifier:make_decision Making final decision...


Processing: 100%|██████████| 1/1 [01:41<00:00, 101.50s/it]

[X] Original is insufficient - using verified version
[V] Verification complete!
<Step>There are a total of $7$ people, so there are $7! = 5040$ ways to seat all of them with no restrictions.</Step>
<Step>We need to find the number of ways to seat 7 people such that Pierre, Rosa, and Thomas are not seated next to each other.</Step>
<Step>We should consider the complementary problem: find the number of ways where Pierre, Rosa, and Thomas ARE seated together, and subtract this from the total number of ways to seat 7 people ($7!$).  Let's treat Pierre, Rosa, and Thomas as a single block.</Step>
<Step>There are $7 - 3 = 4$ other people besides Pierre, Rosa, and Thomas.  Treating Pierre, Rosa, and Thomas as a single block, we have $4+1 = 5$ entities to arrange.</Step>
<Step>There are $5! = 120$ ways to arrange the 5 entities (4 people + the block of Pierre, Rosa, and Thomas).  Within the block, there are $3! = 6$ ways to arrange Pierre, Rosa, and Thomas.</Step>
<Step>There are $5! \times 3!
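Both the original and the verified solutions start from $7!$, which counts seatings in a row; around a round table with rotations identified there are $(7-1)! = 720$ arrangements. As an independent check on this sample, the sketch below counts by brute force: it pins one person to seat 0 to remove rotations, enumerates the remaining $6!$ orders, and rejects any arrangement in which two of Pierre, Rosa and Thomas sit next to each other. The labels `"P"`, `"R"`, `"T"` and `"X0"` to `"X3"` are placeholders, and `count_round_table` is a hypothetical helper.

```python
from itertools import permutations

def count_round_table(n_people=7, separated=("P", "R", "T")):
    """Count circular seatings (rotations identified) in which no two
    people from `separated` sit next to each other."""
    others = [f"X{i}" for i in range(n_people - len(separated))]
    people = list(separated) + others
    anchor, rest = people[0], people[1:]  # pin people[0] to seat 0 to remove rotations
    total = valid = 0
    for perm in permutations(rest):
        seating = (anchor,) + perm
        total += 1
        # Seat i neighbors seat i+1; the last seat wraps around to seat 0
        if all(not (seating[i] in separated and seating[(i + 1) % n_people] in separated)
               for i in range(n_people)):
            valid += 1
    return total, valid

print(count_round_table())  # (720, 144)
```

The result agrees with a direct count: arrange the other 4 people in $(4-1)! = 6$ ways, then place Pierre, Rosa and Thomas in 3 of the 4 gaps between them in $4 \cdot 3 \cdot 2 = 24$ ways, giving $6 \cdot 24 = 144$.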


