# Experiment 1

The goal of this experiment is to test whether a model can recognize a legal action given a state. 

Input: A state $s$ and a legal action $x_t$.

Output: The model's judgment of whether the action is legal.
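
To make the task concrete, legality can also be checked symbolically. The sketch below is not part of the experiment. It assumes the standard four-operator Blocksworld preconditions (this notebook never lists them explicitly), and `parse_atoms` and `is_legal` are hypothetical helper names. A checker like this could serve as a ground-truth reference when auditing the model's answers.

```python
import re


def parse_atoms(text):
    """Split a string like '(on b2 b1) (arm-empty)' into a set of tuples."""
    return {tuple(atom.split()) for atom in re.findall(r"\(([^()]*)\)", text)}


def is_legal(state, action):
    """Return True if all (assumed) preconditions of `action` hold in `state`."""
    facts = parse_atoms(state)
    atoms = parse_atoms(action)
    if len(atoms) != 1:
        return False
    atom = next(iter(atoms))
    if not atom:
        return False
    name, *args = atom

    # Assumed preconditions, following the usual STRIPS Blocksworld domain.
    if name == "pick-up" and len(args) == 1:
        required = {("clear", args[0]), ("on-table", args[0]), ("arm-empty",)}
    elif name == "put-down" and len(args) == 1:
        required = {("holding", args[0])}
    elif name == "stack" and len(args) == 2:
        required = {("holding", args[0]), ("clear", args[1])}
    elif name == "unstack" and len(args) == 2:
        required = {("on", args[0], args[1]), ("clear", args[0]), ("arm-empty",)}
    else:
        return False  # unknown command or wrong number of arguments

    # The action is legal when every required fact is present in the state.
    return required <= facts
```

Under these assumptions, the checker agrees with the labels of the few-shot examples used in the prompt further down.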

In [12]:
import json
import os

import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel
from tqdm import tqdm

# Load environment variables from .env file
load_dotenv()

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

In [13]:
prompt = """
You are a blockworld planner. You are given a state, and an action. You must determine if the action is feasible given the current state. If it is, you must return True. If it is not, you must return False, and explain your reasoning.
The possible commands in the game are:
- (pick-up b1): Pick up block b1.
- (put-down b1): Put down block b1.
- (stack b1 b2): Stack block b1 on top of block b2.
- (unstack b1 b2): Unstack block b1 from block b2.
Here are a few examples of states and actions:
state: (on b2 b1) (arm-empty) (on b1 b3) (on-table b3) (clear b2); action: (unstack b2 b1). Return True. Explanation: The action is legal because the arm is empty, the block b2 is on b1, and the block b2 is clear.
state: (on b1 b3) (holding b2) (clear b1) (on-table b3); action: (put-down b2). Return True. Explanation: The action is legal because the arm is holding b2.
state: (on-table b2) (arm-empty) (on b1 b3) (clear b1) (on-table b3) (clear b2); action: (unstack b1 b3). Return True. Explanation: The action is legal because the arm is empty, the block b1 is on b3, and the block b1 is clear.
state: (on-table b1) (on-table b2) (clear b2) (holding b3) (clear b1); action: (stack b1 b3). Return False. Explanation: The action is not legal because the arm is holding b3 and b1 is on the table, so b1 cannot be stacked on b3.
state: (on-table b1) (on-table b2) (clear b2) (on b3 b1) (arm-empty); action: (pick-up b1). Return False. Explanation: The action is not legal because b3 is on b1, so b1 cannot be picked up.
state: (on b2 b1) (arm-empty) (clear b3) (on-table b3) (on-table b1) (clear b2); action: (stack b2 b1). Return False. Explanation: The action is not legal because b2 is already on b1, so b2 cannot be stacked on b1.
"""

class Response(BaseModel):
    result: bool
    explanation: str

def call_model(state, action):
    """Ask the model whether `action` is legal in `state`; return (result, explanation)."""
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        response_format=Response,  # parse the reply into the Response schema
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": f"state: {state}; action: {action}"}
        ],
        temperature=0.0,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    parsed = response.choices[0].message.parsed
    return parsed.result, parsed.explanation


Parse the dataset into (state, action) pairs and query the model on each pair.

In [15]:
with open("blockworld_dataset.json", "r") as f:
    dataset = json.load(f)[1:1000]  # entries 1-999; the slice skips entry 0
df = pd.DataFrame(columns=["state", "action", "result", "explanation", "correct"])
for group in tqdm(dataset):
    for state, action in zip(group["states"], group["actions"]):
        result, explanation = call_model(state, action)
        # Every action in the dataset is legal, so the model is correct
        # exactly when it answers True.
        correct = int(result)
        df.loc[len(df)] = {"state": state, "action": action, "result": result, "explanation": explanation, "correct": correct}

df.to_csv("experiment1.csv", index=False)


 88%|████████▊ | 879/999 [1:54:15<15:35,  7.80s/it]    


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
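
The run above stopped at 879 of 999 groups when the API quota ran out. One way to make such a run resumable is to append each row to disk as soon as it is judged and to skip already-saved rows on restart. This is a minimal sketch under assumptions, not the code used here: `run_with_checkpoint` and its `judge` parameter are hypothetical, with `judge` standing in for `call_model`, and `pairs` is assumed to be a list in the same order on every run.

```python
import csv
import os

FIELDS = ["state", "action", "result", "explanation", "correct"]


def run_with_checkpoint(pairs, judge, path):
    """Judge each (state, action) pair, appending one CSV row per pair.

    Rows already stored in `path` are skipped, so calling this again after
    a failure resumes where the previous run stopped.
    """
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    done = 0
    if not new_file:
        with open(path, newline="") as f:
            done = sum(1 for _ in csv.DictReader(f))

    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        for state, action in pairs[done:]:
            # If judge raises (e.g. on a quota error), rows written so far stay on disk.
            result, explanation = judge(state, action)
            writer.writerow({
                "state": state,
                "action": action,
                "result": result,
                "explanation": explanation,
                "correct": int(result),
            })
            f.flush()
```

Writing after every call trades a little I/O for not having to repeat paid API calls after an interruption.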

In [16]:
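# Save the partial results collected before the quota error, then score them.
df.to_csv("experiment1.csv", index=False)
# All actions are legal, so accuracy is the fraction the model judged legal.
accuracy = df["correct"].sum() / len(df)
print(f"Accuracy: {accuracy}")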

Accuracy: 0.950497287522604
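
A single accuracy figure does not show which commands the model misjudges, so a per-command breakdown may help. This is a minimal sketch, assuming action strings of the form `(name arg ...)` as in the prompt; `accuracy_by_action` is a hypothetical helper and was not run as part of this experiment.

```python
from collections import defaultdict


def accuracy_by_action(rows):
    """Return {command name: fraction correct} from (action, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for action, correct in rows:
        # "(unstack b2 b1)" -> ["unstack", "b2", "b1"]
        parts = action.strip().strip("()").split()
        name = parts[0] if parts else "<empty>"
        totals[name] += 1
        hits[name] += int(correct)
    return {name: hits[name] / totals[name] for name in totals}
```

With the DataFrame above, it could be called as `accuracy_by_action(zip(df["action"], df["correct"]))`.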
