# Solution Optimization Evaluation TV3 TextGrad

In [1]:
import pandas as pd
import textgrad as tg
from textgrad.engine import get_engine
from textgrad.variable import Variable
from textgrad.optimizer import TextualGradientDescent
from textgrad.verifier import TextualVerifierV4
from textgrad.loss import TextLoss

## Load Datasets

In [2]:
initial_solution = pd.read_csv("csv/initial_solution.csv")
initial_solution

Unnamed: 0,id,formatted_question,raw_solution,correct_answer,source,subject
0,2,Answer the following multiple choice question....,Here's how we can determine the number of carb...,A,GPQA-Diamond,-
1,4,Answer the following multiple choice question....,Maxwell's equations in our universe are:\n\n1....,A,GPQA-Diamond,-
2,8,Answer the following multiple choice question....,Here's how we can analyze the results and dete...,B,GPQA-Diamond,-
3,1,Answer the following multiple choice question....,The energy-time uncertainty principle states t...,A,GPQA-Diamond,-
4,22,Answer the following multiple choice question....,The question asks about the oxidizing power of...,D,GPQA-Diamond,-
...,...,...,...,...,...,...
407,394,Answer the following multiple choice question....,The police car is moving towards the wall. Le...,B,MMLU-CP,college_physics
408,384,Answer the following multiple choice question....,Here's how we can solve this problem:\n\n1. **...,A,MMLU-CP,college_physics
409,404,Answer the following multiple choice question....,The diffraction of electrons by a crystal latt...,A,MMLU-CP,college_physics
410,390,Answer the following multiple choice question....,Here's how we can solve this problem:\n\n1. **...,D,MMLU-CP,college_physics


In [3]:
# Test set: the first 50 rows from each dataset (up to 150 rows in total)

df_gpqa = initial_solution[initial_solution['source'] == 'GPQA-Diamond'].head(50)
df_mmlu_ml = initial_solution[initial_solution['source'] == 'MMLU-ML'].head(50)
df_mmlu_cp = initial_solution[initial_solution['source'] == 'MMLU-CP'].head(50)
df_test = pd.concat([df_gpqa, df_mmlu_ml, df_mmlu_cp], ignore_index=True)

df_test

Unnamed: 0,id,formatted_question,raw_solution,correct_answer,source,subject
0,2,Answer the following multiple choice question....,Here's how we can determine the number of carb...,A,GPQA-Diamond,-
1,4,Answer the following multiple choice question....,Maxwell's equations in our universe are:\n\n1....,A,GPQA-Diamond,-
2,8,Answer the following multiple choice question....,Here's how we can analyze the results and dete...,B,GPQA-Diamond,-
3,1,Answer the following multiple choice question....,The energy-time uncertainty principle states t...,A,GPQA-Diamond,-
4,22,Answer the following multiple choice question....,The question asks about the oxidizing power of...,D,GPQA-Diamond,-
...,...,...,...,...,...,...
145,339,Answer the following multiple choice question....,The proton is initially accelerated through a ...,D,MMLU-CP,college_physics
146,388,Answer the following multiple choice question....,Einstein's theory of the photoelectric effect ...,D,MMLU-CP,college_physics
147,364,Answer the following multiple choice question....,We are given that the mass of object B is twic...,C,MMLU-CP,college_physics
148,380,Answer the following multiple choice question....,"The electric displacement current, denoted by ...",A,MMLU-CP,college_physics
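
The displayed index above ends at 148, which suggests that one source contributed fewer than 50 rows. A minimal sketch of the same per-source sampling with `groupby(...).head(n)`, plus a count check, is shown below. The `toy` frame and its values are made up for illustration and are not the real `initial_solution` data.

```python
import pandas as pd

# Toy stand-in for initial_solution; the values are illustrative only.
toy = pd.DataFrame({
    "id": range(8),
    "source": ["GPQA-Diamond"] * 3 + ["MMLU-ML"] * 2 + ["MMLU-CP"] * 3,
})

# Take at most n rows per source, keeping the original row order.
n = 2
subset = toy.groupby("source", sort=False).head(n).reset_index(drop=True)

# A source with fewer than n rows contributes fewer rows, so check the counts.
counts = subset["source"].value_counts().to_dict()
print(counts)
```

Checking `counts` before running an experiment makes it obvious when a source is under-represented in the test set.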


## Experiment

In [4]:
engine = get_engine("gemini-1.5-pro")
tg.set_backward_engine("gemini-1.5-pro", override=True)



In [5]:
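def evaluate_with_raw_textgrad(row_data):
    """Optimize one solution for 5 iterations, verifying each loss output.

    Returns a dict with the row metadata plus solution_1..solution_5,
    or None if the row id is not found in initial_solution.
    """
    # Look up the full question text by id
    match = initial_solution[initial_solution["id"] == row_data["id"]]
    if match.empty:
        return None  # or raise error
    formatted_question = match.iloc[0]["formatted_question"]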
    result = {
        "id": row_data["id"],
        "raw_solution": row_data["raw_solution"],
        "correct_answer": row_data["correct_answer"],
        "source": row_data["source"],
        "subject": row_data["subject"]
    }
    
    solution = Variable(row_data["raw_solution"],
                    requires_grad=True,
                    role_description=f"Solution to the math question: {formatted_question}")
    loss_system_prompt = Variable("""You will evaluate a solution to a math question. 
                                    Do not attempt to solve it yourself, do not give a solution, 
                                    only identify errors. Be super concise.""",
                                    requires_grad=False,
                                    role_description="system prompt")
    optimizer = TextualGradientDescent([solution])
    loss = TextLoss(loss_system_prompt, engine=engine)

    # TextualVerifierV4
    verifier = TextualVerifierV4(verifier_engine=engine, step_eval_iterations=3, logger=False)
    
    # Iterate 5 times
    for i in range(1, 6):
        optimizer.zero_grad()  # Clean gradients
        loss_result = loss(solution)

        # TextualVerifierV4
        verified_result = verifier.verify(instance=solution, 
                                    prompt=loss_system_prompt,
                                    calculation=loss_result)
        loss_result.set_value(verified_result.value) 
        
        loss_result.backward()
        optimizer.step()
        result[f"solution_{i}"] = solution.value

    return result
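
The control flow of the function above (critique, verify the critique, then update the solution, repeated a fixed number of times) can be sketched without any LLM calls. Everything in this sketch is a hypothetical stand-in: `ToySolution`, `toy_loss`, `toy_verify`, and `toy_step` are not part of TextGrad and only mimic the order of operations.

```python
from dataclasses import dataclass, field


@dataclass
class ToySolution:
    """Hypothetical stand-in for a trainable text variable."""
    value: str
    feedback: list = field(default_factory=list)


def toy_loss(solution):
    # Stand-in for the loss call: produce a textual critique.
    return f"critique of: {solution.value}"


def toy_verify(critique):
    # Stand-in for the verifier: return a (possibly revised) critique.
    return critique.replace("critique", "verified critique")


def toy_step(solution, critique, i):
    # Stand-in for backward() followed by optimizer.step().
    solution.feedback.append(critique)
    solution.value = f"draft v{i}"


def optimize(initial_text, iterations=5):
    solution = ToySolution(initial_text)
    history = {}
    for i in range(1, iterations + 1):
        critique = toy_verify(toy_loss(solution))  # the verified critique replaces the raw one
        toy_step(solution, critique, i)
        history[f"solution_{i}"] = solution.value  # snapshot after each update
    return solution, history


solution, history = optimize("draft v0", iterations=3)
print(history)
```

As in the notebook function, the snapshot is taken after the update, so `solution_i` holds the text produced by iteration `i`.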

## Running Evaluation

### TV TextGrad

In [6]:
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import time

results = []
start_time = time.time()

with ThreadPoolExecutor(max_workers=128) as executor:
    # Submit all tasks
    futures = [
        executor.submit(evaluate_with_raw_textgrad, row.to_dict()) 
        for _, row in initial_solution[:10].iterrows()
    ]
    
    # Use tqdm for progress tracking
    for future in tqdm(as_completed(futures), total=len(futures), desc="Processing"):
        result = future.result()
        if result is not None:
            results.append(result)

raw_textgrad = pd.DataFrame(results)

print(f"Completed in {time.time() - start_time:.1f} seconds")
raw_textgrad.to_csv('results/tv4_textgrad.csv', index=False)

Processing:   0%|          | 0/10 [00:00<?, ?it/s]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here\'s how to determine the correct answer:\n\n**Step 1: Analyze tautomerism**\n\nTautomerism is a type of isomerism where a rapid equilibrium exists between two constitutional isomers.  It usually involves the movement of a proton and a shift of a double bond.  Cyclohexane-1,3,5-trione can exist in equilibrium with its enol forms.  Benzoquinone, however, does not have any hydrogens alpha to the carbonyl groups that can readily participate in tautomerism.\n\n**Step 2: Analyze optical isomerism**\n\nOptical isomerism arises when a molecule has a chiral center – a carbon atom bonded to four different groups.  Methyl 2-hydroxypropanoate has a chiral center (the carbon bon

Processing:  10%|█         | 1/10 [01:02<09:20, 62.27s/it]

["Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> * The reaction is a Diels-Alder reaction, a [4+2] cycloaddition.\n* 2,5-dimethylthiophene acts as the diene.\n* Furan-2,5-dione (maleic anhydride) acts as the dienophile.\n* The reaction proceeds with heat.\n* The product will be a bicyclic structure.\n* The two methyl groups on the thiophene will end up on the bridgehead carbons of the bicyclic product.\n* Since the methyl groups are at positions 2 and 5 on the thiophene, they will be on the same side of the newly formed six-membered ring in the product. This means they will have a *cis* relationship.\n* The oxygen bridge from the maleic anhydride will be *syn* to the sulfur in the thiophene ring.  This leads to the *e

Processing:  20%|██        | 2/10 [01:40<06:25, 48.20s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> The energy-time uncertainty principle states that ΔE * Δt >= ħ/2. To distinguish two energy levels, their difference must be greater than the uncertainty in each energy level.\n\nGiven lifetimes (Δt) of 10⁻⁹ s and 10⁻⁸ s, and ħ ≈ 6.58 * 10⁻¹⁶ eV⋅s:\n\nΔE₁ = ħ / (2 * Δt₁) = (6.58 * 10⁻¹⁶ eV⋅s) / (2 * 10⁻⁹ s) ≈ 3.29 * 10⁻⁷ eV\nΔE₂ = ħ / (2 * Δt₂) = (6.58 * 10⁻¹⁶ eV⋅s) / (2 * 10⁻⁸ s) ≈ 3.29 * 10⁻⁸ eV\n\nThe energy difference must be greater than *each* of these uncertainties to clearly resolve the two levels. If the energy difference is smaller than either uncertainty, the energy levels will essentially overlap and be indistinguishable.\n\nA) 10⁻⁴ eV > 3.29 * 10⁻⁷ eV and 1

Processing:  30%|███       | 3/10 [01:47<03:25, 29.40s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here\'s how we can determine the number of carbon atoms in product 3:\n\n1. **Reaction 1:** trans-Cinnamaldehyde (C9H8O) reacts with methylmagnesium bromide (CH3MgBr), a Grignard reagent. Grignard reagents add to the carbonyl carbon, forming an alcohol. This adds one carbon atom (from the methyl group) to the molecule. So, product 1 has 9 + 1 = 10 carbon atoms.\n\n2. **Reaction 2:** Product 1 (the secondary alcohol) is treated with pyridinium chlorochromate (PCC). PCC is an oxidizing agent that converts secondary alcohols to ketones. The number of carbon atoms remains the same. So, product 2 still has 10 carbon atoms.\n\n3. **Reaction 3:** There appears to be a typo in 

Processing:  40%|████      | 4/10 [02:22<03:08, 31.49s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> The energy of the emitted light is given as 2.3393 eV.  The question asks for the color of light *absorbed* by the compound.  Since the dye *emits* light at 2.3393 eV, this is the energy *lost* by the molecule when it transitions from an excited state to a lower energy state.  Therefore, the molecule must *absorb* light of the same energy to reach the excited state.\n\n1. Convert eV to Joules:\nE = 2.3393 eV * (1.602 x 10^-19 J/eV) = 3.748 x 10^-19 J\n\n2. Calculate the wavelength:\nλ = h * c / E\nλ = (6.626 x 10^-34 J s) * (3 x 10^8 m/s) / (3.748 x 10^-19 J)\nλ ≈ 5.30 x 10^-7 m\nλ ≈ 530 nm\n\nThis wavelength (530 nm) corresponds to green light.\n\nConsidering the given

Processing:  50%|█████     | 5/10 [02:38<02:09, 26.00s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Here\'s how to determine the correct answer:\n\n**Step 1: Analyze tautomerism**\n\nTautomerism is a type of isomerism where a rapid equilibrium exists between two constitutional isomers. It usually involves the movement of a proton and a shift of a double bond. Cyclohexane-1,3,5-trione can exist in equilibrium with its enol forms because it has alpha hydrogens (hydrogens on a carbon adjacent to a carbonyl group) that can participate in tautomerism.  Benzoquinone, however, does not have any such alpha hydrogens. Therefore, benzoquinone does *not* exhibit tautomerism.\n\n**Step 2: Analyze optical isomerism**\n\nOptical isomerism arises when a molecule has a chiral center 

Processing:  60%|██████    | 6/10 [02:43<01:15, 18.95s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> 1. **G2\'s Importance:** The g2 mutant (and any double mutant including g2) shows 0% resistance. This strongly suggests G2 is essential for resistance and thus the most likely candidate for the transcription factor.\n\n2. **G1 and G3\'s Roles:** Individually, g1 and g3 mutants show some resistance (75% and 50%, respectively).  However, the g1g3 double mutant shows drastically reduced resistance (10%). This synergistic interaction, where the combined effect is much less than the sum of the individual effects, suggests G1 and G3 work together in the same pathway or through complementary mechanisms, contributing to resistance. This is not redundancy, where one gene could c

Processing:  70%|███████   | 7/10 [02:49<00:44, 14.75s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> The energy of the emitted light is given as 2.3393 eV. The question asks for the color of light *absorbed* by the compound. Since the dye *emits* light at 2.3393 eV, this is the energy *lost* by the molecule when it transitions from an excited state to a lower energy state. Therefore, the molecule must *absorb* light of the same energy to reach the excited state.\n\n1. Convert eV to Joules:\nE = 2.3393 eV * (1.602 x 10^-19 J/eV) = 3.748 x 10^-19 J\n\n2. Calculate the wavelength:\nλ = h * c / E\nλ = (6.626 x 10^-34 J s) * (3 x 10^8 m/s) / (3.748 x 10^-19 J)\nλ ≈ 5.30 x 10^-7 m\nλ ≈ 530 nm\n\nThis wavelength (530 nm) corresponds to green light.\n\nThe provided options are

Processing:  90%|█████████ | 9/10 [03:05<00:10, 10.81s/it]

['Here is a conversation:\n\n<CONVERSATION><LM_SYSTEM_PROMPT> You will evaluate a solution to a math question. \n                                    Do not attempt to solve it yourself, do not give a solution, \n                                    only identify errors. Be super concise. </LM_SYSTEM_PROMPT>\n\n<LM_INPUT> Let\'s analyze the electrochemical behavior of oxygen in acidic and basic solutions, addressing both thermodynamic and kinetic aspects.\n\n**Thermodynamically (Oxidizing Power):**\n\nThe oxidizing power of a species is reflected in its standard reduction potential (E°). A higher E° value indicates a stronger oxidant.\n\n* **In acidic solutions:** The half-reaction is O₂ + 4H⁺ + 4e⁻ → 2H₂O, with E° = +1.23 V.\n* **In basic solutions:** The half-reaction is O₂ + 2H₂O + 4e⁻ → 4OH⁻, with E° = +0.40 V.\n\nSince the E° value is higher in acidic conditions (+1.23 V) compared to basic conditions (+0.40 V), oxygen is a *stronger* oxidant in acidic solutions and consequently a *w

Processing: 100%|██████████| 10/10 [03:09<00:00, 18.93s/it]

Completed in 189.4 seconds
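
In the loop above, `future.result()` re-raises any exception from a worker, so a single failed row would abort the whole run before the CSV is written. The sketch below shows one way to collect failures separately, assuming that skipping failed rows is acceptable. `flaky_task` and its simulated error are hypothetical and stand in for the real per-row function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def flaky_task(row):
    # Hypothetical worker: fails for one id to simulate an API error.
    if row["id"] == 2:
        raise RuntimeError("simulated API failure")
    return {"id": row["id"], "solution_1": f"revised {row['id']}"}


rows = [{"id": i} for i in range(4)]
results, failures = [], []

with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future back to its row id so failures can be reported.
    future_to_id = {executor.submit(flaky_task, row): row["id"] for row in rows}
    for future in as_completed(future_to_id):
        row_id = future_to_id[future]
        try:
            result = future.result()
        except Exception as exc:
            failures.append((row_id, str(exc)))
            continue
        if result is not None:
            results.append(result)

# as_completed yields in completion order, so sort to restore input order.
results.sort(key=lambda r: r["id"])
print(len(results), failures)
```

Because `as_completed` yields futures as they finish, the rows in the saved CSV are in completion order unless they are sorted first.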



