# Solution Optimization Evaluation TV TextGrad

In [1]:
import pandas as pd
import textgrad as tg
from textgrad.engine import get_engine
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.verifier import TextualVerifier
from textgrad.loss import TextLoss

## Load Datasets

In [2]:
initial_solution = pd.read_csv("csv/initial_solution.csv")
initial_solution

Unnamed: 0,id,formatted_question,raw_solution,correct_answer,source,subject
0,4,Answer the following multiple choice question....,Maxwell's equations in our universe are:\n\n1....,A,GPQA-Diamond,-
1,2,Answer the following multiple choice question....,Here's how we can determine the number of carb...,B,GPQA-Diamond,-
2,10,Answer the following multiple choice question....,We need to determine which planet has the high...,C,GPQA-Diamond,-
3,1,Answer the following multiple choice question....,The energy-time uncertainty principle states t...,C,GPQA-Diamond,-
4,9,Answer the following multiple choice question....,Let's analyze the symmetry of each molecule:\n...,D,GPQA-Diamond,-
5,6,Answer the following multiple choice question....,"The potential is given by:\nV(r, θ) = (1/2)kr^...",B,GPQA-Diamond,-
6,8,Answer the following multiple choice question....,Here's how we can analyze the results and dete...,B,GPQA-Diamond,-
7,3,Answer the following multiple choice question....,The given state is $|\psi\rangle = 0.5|\uparro...,B,GPQA-Diamond,-
8,7,Answer the following multiple choice question....,The process described is $\gamma\gamma\rightar...,C,GPQA-Diamond,-
9,5,Answer the following multiple choice question....,Here's how we can find the eigenvector:\n\n1. ...,A,GPQA-Diamond,-


In [3]:
# Test set: the first 50 rows from each dataset (up to 150 rows in total)

df_gpqa = initial_solution[initial_solution['source'] == 'GPQA-Diamond'].head(50)
df_mmlu_ml = initial_solution[initial_solution['source'] == 'MMLU-ML'].head(50)
df_mmlu_cp = initial_solution[initial_solution['source'] == 'MMLU-CP'].head(50)
df_test = pd.concat([df_gpqa, df_mmlu_ml, df_mmlu_cp], ignore_index=True)

df_test

Unnamed: 0,id,formatted_question,raw_solution,correct_answer,source,subject
0,4,Answer the following multiple choice question....,Maxwell's equations in our universe are:\n\n1....,A,GPQA-Diamond,-
1,2,Answer the following multiple choice question....,Here's how we can determine the number of carb...,B,GPQA-Diamond,-
2,10,Answer the following multiple choice question....,We need to determine which planet has the high...,C,GPQA-Diamond,-
3,1,Answer the following multiple choice question....,The energy-time uncertainty principle states t...,C,GPQA-Diamond,-
4,9,Answer the following multiple choice question....,Let's analyze the symmetry of each molecule:\n...,D,GPQA-Diamond,-
5,6,Answer the following multiple choice question....,"The potential is given by:\nV(r, θ) = (1/2)kr^...",B,GPQA-Diamond,-
6,8,Answer the following multiple choice question....,Here's how we can analyze the results and dete...,B,GPQA-Diamond,-
7,3,Answer the following multiple choice question....,The given state is $|\psi\rangle = 0.5|\uparro...,B,GPQA-Diamond,-
8,7,Answer the following multiple choice question....,The process described is $\gamma\gamma\rightar...,C,GPQA-Diamond,-
9,5,Answer the following multiple choice question....,Here's how we can find the eigenvector:\n\n1. ...,A,GPQA-Diamond,-
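
The rows shown above are all GPQA-Diamond. Because `head(50)` silently returns fewer rows when a source has fewer than 50 (or none at all), it may be worth checking the per-source counts before running the experiment. The following is a minimal sketch on a toy frame; `check_source_counts` is a hypothetical helper introduced here for illustration, not part of the notebook, and the toy data is made up.

```python
import pandas as pd

def check_source_counts(df, sources, expected=50):
    """Return {source: row_count} and warn when a source has fewer rows than expected."""
    counts = df["source"].value_counts()
    report = {}
    for name in sources:
        n = int(counts.get(name, 0))  # 0 when the source is absent entirely
        report[name] = n
        if n < expected:
            print(f"warning: {name} has {n} rows, expected {expected}")
    return report

# Toy data standing in for initial_solution (illustrative values only)
toy = pd.DataFrame({"source": ["GPQA-Diamond"] * 3 + ["MMLU-ML"] * 1})
report = check_source_counts(toy, ["GPQA-Diamond", "MMLU-ML", "MMLU-CP"], expected=2)
```

In the notebook, the same helper could be called as `check_source_counts(initial_solution, ['GPQA-Diamond', 'MMLU-ML', 'MMLU-CP'])` before building `df_test`.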


## Experiment

In [4]:
engine = get_engine("gemini-1.5-pro")
tg.set_backward_engine("gemini-1.5-pro", override=True)



In [5]:
def evaluate_with_raw_textgrad(row_data):
    match = initial_solution[initial_solution["id"] == row_data["id"]]
    if match.empty:
        return None  # skip rows whose id is not found in initial_solution
    formatted_question = match.iloc[0]["formatted_question"]
    result = {
        "id": row_data["id"],
        "raw_solution": row_data["raw_solution"],
        "correct_answer": row_data["correct_answer"],
        "source": row_data["source"],
        "subject": row_data["subject"]
    }
    
    solution = Variable(row_data["raw_solution"],
                    requires_grad=True,
                    role_description=f"Solution to the math question: {formatted_question}")
    loss_system_prompt = Variable("""You will evaluate a solution to a math question. 
                                    Do not attempt to solve it yourself, do not give a solution, 
                                    only identify errors. Be super concise.""",
                                    requires_grad=False,
                                    role_description="system prompt")
    optimizer = TextualGradientDescent([solution])
    loss = TextLoss(loss_system_prompt, engine=engine)

    # TextualVerifier
    verifier = TextualVerifier(verifier_engine=engine, step_eval_iterations=3, logger=False)
    
    # Iterate 5 times
    for i in range(1, 6):
        optimizer.zero_grad()  # Clean gradients
        loss_result = loss(solution)

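        # TextualVerifier: verify the loss output, then overwrite the loss text with the verified result before backpropagating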
        verified_result = verifier.verify(instance=solution, 
                                    prompt=loss_system_prompt,
                                    calculation=loss_result)
        loss_result.set_value(verified_result.value) 
        
        loss_result.backward()
        optimizer.step()
        result[f"solution_{i}"] = solution.value

    return result
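
Each record returned by the function holds `solution_1` through `solution_5` alongside `correct_answer`. One way to score them is to extract the final answer letter from each solution and compare it with the key. The sketch below assumes each solution ends with a line of the form `Answer: X` (X in A to D). The question prompts are truncated in the output above, so that format is an assumption, and `extract_answer` and `score_record` are hypothetical helpers, not part of TextGrad.

```python
import re

# Assumption: the final answer appears as "Answer: X" with X in A-D.
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\)?", re.IGNORECASE)

def extract_answer(solution_text):
    """Return the last 'Answer: X' letter found in the text, or None if absent."""
    matches = ANSWER_RE.findall(solution_text or "")
    return matches[-1].upper() if matches else None

def score_record(record, n_iterations=5):
    """Map each solution_i in a result dict to True/False against correct_answer."""
    return {
        f"solution_{i}": extract_answer(record.get(f"solution_{i}")) == record["correct_answer"]
        for i in range(1, n_iterations + 1)
    }

# Made-up record for illustration only
example = {
    "correct_answer": "B",
    "solution_1": "Reasoning...\nAnswer: A",
    "solution_2": "Revised reasoning...\nAnswer: B",
}
scores = score_record(example, n_iterations=2)
```

Taking the last match rather than the first is a deliberate choice: an optimized solution may quote an earlier answer before stating its final one.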

## Running Evaluation

### TV TextGrad

In [6]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import time

results = []
start_time = time.time()

with ThreadPoolExecutor(max_workers=32) as executor:
    # Submit all tasks
    futures = [
        executor.submit(evaluate_with_raw_textgrad, row.to_dict()) 
        for _, row in df_test.iterrows()
    ]
    
    # Use tqdm for progress tracking
    for future in tqdm(as_completed(futures), total=len(futures), desc="Processing"):
        result = future.result()
        if result is not None:
            results.append(result)

raw_textgrad = pd.DataFrame(results)

print(f"Completed in {time.time() - start_time:.1f} seconds")
raw_textgrad.to_csv('results/tv_textgrad.csv', index=False)

Processing:   0%|          | 0/10 [00:00<?, ?it/s]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here\'s how we can determine the number of carbon atoms in product 3:\n\n1. **Reaction 1:** trans-Cinnamaldehyde (C9H8O) reacts with methylmagnesium bromide (CH3MgBr), a Grignard reagent.  Grignard reagents add to the carbonyl carbon, forming an alcohol.  This adds one carbon atom (from the methyl group) to the molecule.  So, product 1 has 10 carbons (9 + 1 = 10).\n\n2. **Reaction 2:** Product 1 (the secondary alcohol) is treated with pyridinium chlorochromate (PCC). PCC is an oxidizing agent that converts secondary alcohols to ketones.  The number of carbon atoms remains the same. So, product 2 still has 10 carbons.\n\n3. **Reaction 3:** Product 2 (the ketone) is treat

Processing:  10%|█         | 1/10 [07:17<1:05:41, 437.92s/it]

["Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> **1. Converting to Cartesian Coordinates:**\nGiven the potential in polar coordinates:  V(r, θ) = (1/2)kr^2 + (3/2)kr^2 cos^2(θ)\nWe use the following transformations to convert to Cartesian coordinates: x = rcos(θ) and y = rsin(θ).  This also implies r^2 = x^2 + y^2.\n\nSubstituting for r^2 and cos(θ):\nV(x, y) = (1/2)k(x^2 + y^2) + (3/2)k(x^2 + y^2)(x^2/(x^2 + y^2))\nV(x, y) = (1/2)kx^2 + (1/2)ky^2 + (3/2)kx^2\nV(x, y) = 2kx^2 + (1/2)ky^2\n\n**2. Separating the Potential:**\nThe potential V(x, y) can be expressed as the sum of two independent functions, one depending only on x and the other only on y: V(x, y) = V_x(x) + V_y(y) where V_x(x) = 2kx^2 and V_y(y) = (1/2)ky

Processing:  40%|████      | 4/10 [08:59<08:04, 80.80s/it]   

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Let\'s analyze the data, explicitly defining a baseline resistance of 100% for the wild-type.  We\'ll express the "effect" of a knockout mutation as the proportional reduction in resistance.  For example, Effect(g1) = 0.25 (reducing resistance from 100% to 75%).\n\n**1. G2\'s Essential Role:**  Any double mutant involving g2 results in 0% resistance. This strongly suggests G2 is essential for resistance and likely acts upstream of G1 and G3.\n\n**2. Evaluating G1 and G3 Interaction:**\n\n* **Additive Model:** Resistance = 1 - Effect(g1) - Effect(g3)\n    * Predicted g1g3 resistance: 1 - 0.25 - 0.50 = 0.25 (25%)\n    * Observed g1g3 resistance: 0.10 (10%)\n    * The addi

Processing:  50%|█████     | 5/10 [09:42<05:35, 67.15s/it]

["Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Let's analyze the density of each exoplanet using the formula density = mass/volume, where volume = (4/3) * pi * radius^3:\n\na) This planet has the same mass and radius as Earth. Therefore, its density is approximately the same as Earth's, which is about 5.5 g/cm³.\n\nb) This planet has a given density of 5.5 g/cm³.\n\nc) This planet has the same composition as Earth but 5 times the mass.  While the composition is the same, we do *not* know the radius.  Increased mass leads to gravitational compression, which would tend to increase density. However, a larger radius would decrease density.  Since we don't know the radius, we cannot calculate the density.\n\nd) This plan

Processing:  60%|██████    | 6/10 [09:56<03:16, 49.07s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here\'s a step-by-step analysis of how Maxwell\'s equations change with magnetic monopoles, followed by an evaluation of the multiple-choice options:\n\n**Original Maxwell\'s Equations:**\n\n1. ∇ ⋅ **E** = ρ/ε₀ (Gauss\'s Law for Electricity)\n2. ∇ ⋅ **B** = 0 (Gauss\'s Law for Magnetism)\n3. ∇ × **E** = - ∂**B**/∂t (Faraday\'s Law of Induction)\n4. ∇ × **B** = μ₀**J** + μ₀ε₀ ∂**E**/∂t (Ampère-Maxwell\'s Law)\n\n**Maxwell\'s Equations with Magnetic Monopoles:**\n\n1. ∇ ⋅ **E** = ρ/ε₀  (No change)\n2. ∇ ⋅ **B** = ρₘ/μ₀ (Modified:  ρₘ represents magnetic charge density)\n3. ∇ × **E** = - ∂**B**/∂t - μ₀**J**ₘ (Modified: **J**ₘ represents magnetic current density)\n4. ∇ × **

Processing:  70%|███████   | 7/10 [10:50<02:31, 50.63s/it]

["Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here's how we find the eigenvector:\n\n1. **Define the operator:** The operator $\\vec{P}$ along an arbitrary direction $\\vec{n}$ in the x-z plane can be written as $\\vec{P} = P_x \\sin\\theta + P_z \\cos\\theta$, where $\\theta$ is the angle between $\\vec{n}$ and the z-axis.  We are given:\n\n   $P_x = \\frac{\\hbar}{2} \\begin{pmatrix} 0 & 1 \\\\ 1 & 0 \\end{pmatrix}$\n   $P_z = \\frac{\\hbar}{2} \\begin{pmatrix} 1 & 0 \\\\ 0 & -1 \\end{pmatrix}$\n\n   Therefore, $\\vec{P} = \\frac{\\hbar}{2} \\begin{pmatrix} \\cos\\theta & \\sin\\theta \\\\ \\sin\\theta & -\\cos\\theta \\end{pmatrix}$.\n\n2. **Eigenvalue equation:** We are looking for the eigenvector $v = \\begin{

Processing:  90%|█████████ | 9/10 [11:17<00:30, 30.63s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Let\'s analyze the symmetry of each molecule systematically:\n\n**A) Triisopropyl borate (B(OCH(CH3)2)3):**  (Assuming time-averaged structure due to free rotation of isopropyl groups)\n\n1. **Cn:** C3 rotation axis exists perpendicular to the BO3 plane.\n2. **C2 perpendicular to C3:** No C2 axes perpendicular to the C3 axis.  Rotating the molecule by 120° about the C3 axis does not reveal any perpendicular C2 axes.\n3. **σh:** No horizontal mirror plane.\n4. **i:** No inversion center. Inversion through the boron atom does not yield an equivalent structure.\n5. **σv:** No vertical mirror planes.\n6. **Sn:** No Sn axes.\n\nTherefore, the point group is *C3*.\n\n**B) Qui

Processing: 100%|██████████| 10/10 [11:44<00:00, 70.46s/it]

Completed in 704.7 seconds
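
In the cell above, `future.result()` re-raises any exception thrown inside a worker, so a single failed API call aborts the loop before the results are written to CSV. With a run that takes over eleven minutes, it may be preferable to record failures and keep going. The following is a minimal sketch of that pattern using only the standard library; `run_all` and `fake_worker` are hypothetical names, and the stand-in worker replaces the real TextGrad call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_all(worker, rows, max_workers=8):
    """Run worker over rows in parallel; collect results and failures separately."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Keep a handle on the originating row so failures can be traced back
        future_to_row = {executor.submit(worker, row): row for row in rows}
        for future in as_completed(future_to_row):
            row = future_to_row[future]
            try:
                result = future.result()
            except Exception as exc:  # e.g. a rate limit or network error
                failures.append({"id": row.get("id"), "error": repr(exc)})
                continue
            if result is not None:
                results.append(result)
    return results, failures

# Stand-in worker (hypothetical) that fails on one row
def fake_worker(row):
    if row["id"] == 2:
        raise RuntimeError("simulated API failure")
    return {"id": row["id"], "solution_1": "ok"}

results, failures = run_all(fake_worker, [{"id": 1}, {"id": 2}, {"id": 3}])
```

In the notebook, `worker` would be `evaluate_with_raw_textgrad` and `rows` would be `[row.to_dict() for _, row in df_test.iterrows()]`; the `failures` list could then be retried or saved next to the results file.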



