When the AI considers whether to step back, does its one-word answer differ from its chain-of-thought (CoT) answer?

In [1]:
from dataclasses import dataclass
import sample

@dataclass
class Result:
    context: sample.Context
    one_token_cpc_result: str
    cot_cpc_result: str

from dotenv import load_dotenv
load_dotenv()

from llm import LLM
from openai import OpenAI
llm = LLM(OpenAI(), "gpt-3.5-turbo")

Each 'passage' is a lengthy text that reasons through a problem.
We consider progressively larger portions of each passage as context, iterating through the 'checkpoints' in the passage:

In [2]:
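import json

# Load the passages, closing the file once it has been read.
with open("data/passages1.json") as f:
    passages = json.load(f)

# Lazily yield every checkpoint context of every passage.
checkpoints = (context for document in passages for context in sample.checkpoints(document, 1000))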
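
To make the idea of "progressively larger context parts" concrete, here is a minimal sketch of one way checkpoints could be produced. It is an assumption, not the actual `sample.checkpoints` implementation: the name `prefix_checkpoints` is hypothetical, the `step` argument is assumed to be a character count, and the sketch yields plain strings, whereas the notebook's contexts are objects with a `.text` attribute.

```python
from typing import Iterator


def prefix_checkpoints(text: str, step: int) -> Iterator[str]:
    """Yield growing prefixes of `text`, each `step` characters longer.

    Hypothetical stand-in for sample.checkpoints; the real function may
    split on different boundaries and returns Context objects.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Every intermediate prefix that is strictly shorter than the full text.
    for end in range(step, len(text), step):
        yield text[:end]
    # Always finish with the complete text (skipped only for empty input).
    if text:
        yield text


demo_prefixes = list(prefix_checkpoints("abcdefgh", 3))
```

Under these assumptions, each later checkpoint contains all of the earlier ones, so the model sees the reasoning at successively later stages.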

For each context part, ask the LLM whether the current approach is working.

In [3]:
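# Import the solvers here so this cell does not depend on a later cell.
from solver import perform_one_token_cpc, perform_cot_cpc

# Lazy: no LLM call is made until the generator is iterated.
results = (
    Result(
        context=context,
        one_token_cpc_result=perform_one_token_cpc(llm, context),
        cot_cpc_result=perform_cot_cpc(llm, context)
    )
    for context in checkpoints
)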
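
Because `results` is a generator expression, nothing is computed until something iterates over it, and it can be consumed only once. The sketch below illustrates that behavior with a stand-in function; `fake_cpc` is hypothetical and simply records its calls instead of querying an LLM.

```python
calls = []


def fake_cpc(context: str) -> str:
    """Hypothetical stand-in for an LLM call; records each invocation."""
    calls.append(context)
    return "Yes"


# Building the generator does not call fake_cpc yet.
lazy_results = (fake_cpc(context) for context in ["a", "b"])
calls_before_iteration = list(calls)

# The first pass triggers the calls; the second pass finds it exhausted.
first_pass = list(lazy_results)
second_pass = list(lazy_results)
```

If the results need to be inspected more than once, materializing them with `list(...)` avoids repeating the LLM calls.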

For each result, judge whether it is good (the two CPC methods agreed) or bad (they disagreed). The comparison ignores case.

In [4]:
from solver import perform_one_token_cpc, perform_cot_cpc
from judge import JudgeResult

evaluations = (
    JudgeResult(
        result=result,
        score=1.0 if result.one_token_cpc_result.lower() == result.cot_cpc_result.lower() else 0.0
    )
    for result in results
)
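
The per-result scores can be summarized as a single agreement rate. The following is a minimal sketch under stated assumptions: `normalize`, `agreement_score`, and `agreement_rate` are hypothetical helpers, and stripping whitespace and punctuation goes beyond the notebook's own comparison, which only lowercases both answers.

```python
from typing import Iterable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and trim whitespace and surrounding punctuation (an assumption)."""
    return answer.strip().strip(".!?,;:[]").lower()


def agreement_score(one_token: str, cot: str) -> float:
    """Return 1.0 when the two answers match after normalization, else 0.0."""
    return 1.0 if normalize(one_token) == normalize(cot) else 0.0


def agreement_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the agreement scores; an empty input yields 0.0."""
    scores = [agreement_score(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


demo_rate = agreement_rate([("Yes", "yes"), ("Possibly", "Yes")])
```

A three-way answer such as "Possibly" counts as a disagreement with "Yes" here, which matches how the notebook scores it.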


In [5]:
for evaluation in evaluations:
    print(f"For context ending in '{evaluation.result.context.text[-90:]}'...")
    print(f"one_token_cpc_result={evaluation.result.one_token_cpc_result}, "
          f"cot_cpc_result={evaluation.result.cot_cpc_result}, score={evaluation.score}")

[Yes]
For context ending in 'sonal and individualized concept that each person must discover and define for themselves.'...
one_token_cpc_result=Possibly, cot_cpc_result=Yes, score=0.0
