# Automating Validation of LLM Responses for Sokoban Plans
Our goal is to automate the validation of LLM responses to questions about Sokoban plans.
<br>One approach is to have a second LLM convert each free-text response into a structured JSON object. Before we can do that, we need a JSON schema that defines the structure of that object.
<br>We can then use a Python script to validate the JSON objects against the schema and to run additional semantic checks.
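LLMs often wrap JSON in a Markdown code fence or add a sentence before or after it, so the converted response usually has to be extracted before it can be validated. Below is a minimal sketch. It assumes the converter LLM returns a single JSON object somewhere in its reply. `extract_json` is a hypothetical helper, not part of `jsonschema`:

```python
import json
import re

FENCE = "`" * 3  # a literal Markdown code fence, built here to keep this block readable

def extract_json(reply: str) -> dict:
    """Return the JSON object contained in an LLM reply (hypothetical helper)."""
    # Prefer the contents of a fenced block (optionally tagged 'json') if the model used one
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", reply, re.DOTALL)
    if fenced:
        candidate = fenced.group(1)
    else:
        # Otherwise take everything from the first '{' to the last '}'
        start, end = reply.find("{"), reply.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("No JSON object found in reply")
        candidate = reply[start:end + 1]
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON
    return json.loads(candidate)

reply = "Here is the JSON:\n" + FENCE + 'json\n{"questionType": "planValidity"}\n' + FENCE
print(extract_json(reply))
```

Raising `ValueError` for every failure mode keeps the caller simple: a single `except ValueError` catches both "no JSON at all" and "malformed JSON".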

In [2]:
# pip install jsonschema
from jsonschema import validate  # for validating JSON objects against a schema

In [3]:
# JSON schema for Sokoban plan explanations
# The schema defines the structure of the JSON object that represents a Sokoban plan explanation.
# The schema also specifies the constraints that the JSON object should adhere to.
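# The constraints include the required fields, the allowed values for certain fields, and the valid range of confidenceScore.
# Relationships between fields cannot be expressed here; they are checked separately in Python.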
# Note: There are 4 types of questions: planValidity, actionTaken, actionNotTaken, and actionComparison.
schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "questionType": {
      "type": "string",
      "enum": ["planValidity", "actionTaken", "actionNotTaken", "actionComparison"]
    },
    "conclusion": {
      "type": "string",
      "enum": ["valid", "invalid", "partially valid", "inconclusive"]
    },
    "reasoning": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "step": { "type": "integer" },
          "explanation": { "type": "string" },
          "relevantAction": { "type": "string" },
          "preconditionsSatisfied": { "type": "boolean" },
          "effectsConsistent": { "type": "boolean" }
        },
        "required": ["step", "explanation"]
      }
    },
    "referencedGameElements": {
      "type": "array",
      "items": { "type": "string" }
    },
    "confidenceScore": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "relevantActions": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["questionType", "conclusion", "reasoning", "relevantActions"]
}
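The schema checks that each `step` is an integer. However, JSON Schema draft-07 has no keyword that requires the steps to run 1, 2, 3 and so on, in order. The following is a small post-schema check, sketched here as an assumption about what a well-formed explanation should look like. `check_step_numbering` is a hypothetical name:

```python
def check_step_numbering(explanation: dict) -> None:
    """Raise ValueError unless the reasoning steps are numbered 1..n in order (hypothetical check)."""
    steps = [item["step"] for item in explanation.get("reasoning", [])]
    expected = list(range(1, len(steps) + 1))
    if steps != expected:
        raise ValueError(f"Reasoning steps should be numbered {expected}, got {steps}")

# Passes: steps 1 and 2 in order
check_step_numbering({"reasoning": [{"step": 1, "explanation": "a"},
                                    {"step": 2, "explanation": "b"}]})
```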

In [4]:
# Sample LLM answer to an actionComparison question (why was action A chosen over action B?):
# The action "pushleft sokoban l19 l18 l17 crate2" is used in the solution instead of "pushright sokoban l19 l18 l17 crate2" because they have the same effects and their preconditions are almost identical. The only difference is that the "pushleft" action requires "leftOf l18 l19" whereas the "pushright" action requires "leftOf l19 l18", but the initial state establishes "leftOf l18 l19", making the "pushleft" action applicable and not the "pushright" one.

# The same answer translated into JSON by another LLM, adhering to the schema above:
answer = {
  "questionType": "actionComparison",
  "conclusion": "valid",
  "reasoning": [
    {
      "step": 1,
      "explanation": "The actions 'pushleft sokoban l19 l18 l17 crate2' and 'pushright sokoban l19 l18 l17 crate2' have the same effects.",
      "relevantAction": "pushleft sokoban l19 l18 l17 crate2"
    },
    {
      "step": 2,
      "explanation": "The preconditions of both actions are almost identical, with one key difference.",
      "relevantAction": "pushright sokoban l19 l18 l17 crate2"
    },
    {
      "step": 3,
      "explanation": "The 'pushleft' action requires 'leftOf l18 l19', while the 'pushright' action requires 'leftOf l19 l18'.",
      "preconditionsSatisfied": True,
      "effectsConsistent": True
    },
    {
      "step": 4,
      "explanation": "The initial state establishes 'leftOf l18 l19', making the 'pushleft' action applicable and not the 'pushright' one.",
      "preconditionsSatisfied": True,
      "effectsConsistent": True
    }
  ],
  "referencedGameElements": ["sokoban", "l19", "l18", "l17", "crate2"],
  "confidenceScore": 0.95,
  "relevantActions": [
    "pushleft sokoban l19 l18 l17 crate2",
    "pushright sokoban l19 l18 l17 crate2"
  ]
}
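The schema also cannot express consistency between fields. For example, every `relevantAction` cited in a reasoning step should presumably also appear in the top-level `relevantActions` list. Below is a minimal sketch of that cross-check, run on a trimmed copy of the answer above. `cross_check_actions` is a hypothetical name, and the rule itself is an assumption:

```python
def cross_check_actions(explanation: dict) -> list[str]:
    """Return actions cited in the reasoning that are missing from relevantActions (hypothetical check)."""
    declared = set(explanation.get("relevantActions", []))
    cited = [item["relevantAction"] for item in explanation.get("reasoning", [])
             if "relevantAction" in item]
    return [action for action in cited if action not in declared]

# Trimmed copy of the sample answer: only the fields this check reads
sample = {
    "reasoning": [
        {"step": 1, "explanation": "same effects",
         "relevantAction": "pushleft sokoban l19 l18 l17 crate2"},
        {"step": 2, "explanation": "almost identical preconditions",
         "relevantAction": "pushright sokoban l19 l18 l17 crate2"},
        {"step": 3, "explanation": "differing precondition"},
    ],
    "relevantActions": ["pushleft sokoban l19 l18 l17 crate2",
                        "pushright sokoban l19 l18 l17 crate2"],
}
missing = cross_check_actions(sample)
print(missing)  # []
```

An empty list means every cited action is declared. A non-empty list could be turned into a `ValueError` inside the validation function.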

In [None]:
def validate_sokoban_explanation(explanation, initial_state, question_info, plan=None):
    # Structural check: raises jsonschema.exceptions.ValidationError if the JSON violates the schema
    validate(instance=explanation, schema=schema)

    # The explanation must answer the type of question that was asked
    if explanation['questionType'] != question_info['type']:
        raise ValueError(f"Expected questionType '{question_info['type']}', got '{explanation['questionType']}'")

    # Semantic checks for plan validity questions
    if explanation['questionType'] == 'planValidity':
        if plan is None:
            raise ValueError("Plan must be provided for planValidity questions")

        # Check that the conclusion matches the reasoning
        invalid_steps = [step for step in explanation['reasoning']
                         if step.get('preconditionsSatisfied') is False]
        if invalid_steps and explanation['conclusion'] != 'invalid':
            raise ValueError("Conclusion should be 'invalid' if any steps have unsatisfied preconditions")

        # Check that every action mentioned in the reasoning is in the plan
        mentioned_actions = [step['relevantAction'] for step in explanation['reasoning']
                             if 'relevantAction' in step]
        if not all(action in plan for action in mentioned_actions):
            raise ValueError("All mentioned actions should be in the provided plan")

        # Check that each invalid step names its own distinct action
        # (steps without a relevantAction do not count as explained)
        explained_invalid_actions = {step['relevantAction'] for step in invalid_steps
                                     if 'relevantAction' in step}
        if len(explained_invalid_actions) < len(invalid_steps):
            raise ValueError("Each invalid step should name a distinct relevantAction")

        # Check that the referenced game elements appear in the plan
        # (referencedGameElements is optional in the schema, hence .get)
        plan_elements = set()
        for action in plan:
            plan_elements.update(action.split())
        if not set(explanation.get('referencedGameElements', [])).issubset(plan_elements):
            raise ValueError("All referenced game elements should appear in the plan")

    # initial_state is not used yet; checks for other question types
    # (e.g. verifying cited facts against the initial state) can be added here...

    print("Validation successful!")
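`validate_sokoban_explanation` stops at the first failed check. When grading a batch of LLM responses, it may be more useful to record every failure for each response. Below is one possible restructuring. It is a sketch that assumes each check is a function returning an error message or `None`. The names are hypothetical, and only two of the planValidity rules above are covered:

```python
from typing import Callable, Optional

# A check takes (explanation, plan) and returns an error message, or None if it passes
Check = Callable[[dict, list], Optional[str]]

def conclusion_matches_reasoning(explanation: dict, plan: list) -> Optional[str]:
    # Any step with preconditionsSatisfied == False should force an 'invalid' conclusion
    has_invalid_step = any(step.get("preconditionsSatisfied") is False
                           for step in explanation["reasoning"])
    if has_invalid_step and explanation["conclusion"] != "invalid":
        return "Conclusion should be 'invalid' if any step has unsatisfied preconditions"
    return None

def actions_in_plan(explanation: dict, plan: list) -> Optional[str]:
    # Every relevantAction cited in the reasoning should be an action of the plan
    unknown = [step["relevantAction"] for step in explanation["reasoning"]
               if "relevantAction" in step and step["relevantAction"] not in plan]
    return f"Actions not in plan: {unknown}" if unknown else None

def collect_errors(explanation: dict, plan: list, checks: list[Check]) -> list[str]:
    """Run every check and return all error messages instead of stopping at the first."""
    errors = []
    for check in checks:
        message = check(explanation, plan)
        if message is not None:
            errors.append(message)
    return errors

# Usage: an answer that contradicts itself and cites an action outside the plan
plan = ["pushleft sokoban l19 l18 l17 crate2"]
bad_answer = {
    "conclusion": "valid",
    "reasoning": [{"step": 1, "explanation": "l19 is not left of l18",
                   "relevantAction": "pushright sokoban l19 l18 l17 crate2",
                   "preconditionsSatisfied": False}],
}
for error in collect_errors(bad_answer, plan, [conclusion_matches_reasoning, actions_in_plan]):
    print(error)
```

Keeping each rule in its own function also makes it easier to add rules for the other question types without growing one long `if` chain.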

In [9]:
# The initial state: the objects (bare names) plus the facts that hold at the start
initial_state = {
    "sokoban", "crate1", "crate2", "l1", "l10", "l11", "l12", "l17", "l18", "l19", "l22", 
    "l23", "l24", "l29", "l30", "l31", "l32", "l33", "l36", "l37", "l38", "l39", "l40"}

initial_state.update(
    {"crate crate1", "crate crate2", "leftOf l10 l11", "leftOf l11 l12", "leftOf l17 l18", 
    "leftOf l18 l19", "leftOf l22 l23", "leftOf l23 l24", "leftOf l29 l30", "leftOf l30 l31", 
    "leftOf l31 l32", "leftOf l32 l33", "leftOf l36 l37", "leftOf l37 l38", "leftOf l38 l39", 
    "leftOf l39 l40", "below l17 l10", "below l18 l11", "below l19 l12", "below l24 l17", 
    "below l29 l22", "below l30 l23", "below l31 l24", "below l36 l29", "below l37 l30", 
    "below l38 l31", "below l39 l32", "below l40 l33", "at sokoban l19", "at crate1 l17", 
    "at crate2 l18", "clear l1", "clear l10", "clear l11", "clear l12", "clear l22", 
    "clear l23", "clear l24", "clear l29", "clear l30", "clear l31", "clear l32", 
    "clear l33", "clear l36", "clear l37", "clear l38", "clear l39", "clear l40"}
)
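Because the state is stored as a set of strings, checking a precondition reduces to set membership. The sketch below checks the one precondition that the sample answer says distinguishes the two push actions. It uses only a few facts copied from `initial_state`. The `DIFFERING_PRECONDITION` table is an illustrative assumption taken from the answer's wording, not the full PDDL precondition lists:

```python
# A few facts copied from initial_state above (subset, for a self-contained example)
state = {"leftOf l17 l18", "leftOf l18 l19", "at sokoban l19", "at crate2 l18"}

# Hypothetical table: the precondition each action does NOT share with the other,
# as stated in the sample LLM answer
DIFFERING_PRECONDITION = {
    "pushleft sokoban l19 l18 l17 crate2": "leftOf l18 l19",
    "pushright sokoban l19 l18 l17 crate2": "leftOf l19 l18",
}

def unsatisfied(preconditions: set[str], state: set[str]) -> set[str]:
    """Return the preconditions that do not hold in the given state."""
    return preconditions - state

for action, fact in DIFFERING_PRECONDITION.items():
    missing = unsatisfied({fact}, state)
    print(f"{action}: {fact!r} {'holds' if not missing else 'does not hold'}")
```

This confirms only the specific claim in the answer. Checking full applicability would require every precondition of each action.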

Call the validation function with the answer, initial state, and question information to validate the explanation.
<br>For "actionComparison", it includes both actions being compared.
<br>For "actionTaken" and "actionNotTaken", it includes the single action in question.
<br>For "planValidity", it's an empty array since no specific actions are relevant to the question.
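If the cases above are read as rules for the explanation's `relevantActions` field, they can be checked mechanically. The sketch below assumes that `question_info` uses the keys from the examples in this notebook: `actions` for comparisons, `action` for actionTaken/actionNotTaken, and no action key for planValidity. `expected_actions` and `check_relevant_actions` are hypothetical helpers:

```python
def expected_actions(question_info: dict) -> list[str]:
    """Map a question to the actions its explanation should list (hypothetical helper)."""
    qtype = question_info["type"]
    if qtype == "actionComparison":
        return list(question_info["actions"])   # both actions being compared
    if qtype in ("actionTaken", "actionNotTaken"):
        return [question_info["action"]]        # the single action in question
    if qtype == "planValidity":
        return []                               # no specific action
    raise ValueError(f"Unknown question type: {qtype}")

def check_relevant_actions(explanation: dict, question_info: dict) -> None:
    """Raise ValueError if relevantActions does not match the question (order ignored)."""
    expected = sorted(expected_actions(question_info))
    actual = sorted(explanation.get("relevantActions", []))
    if actual != expected:
        raise ValueError(f"relevantActions should be {expected}, got {actual}")

question = {"type": "actionComparison",
            "actions": ["pushleft sokoban l19 l18 l17 crate2",
                        "pushright sokoban l19 l18 l17 crate2"]}
explanation = {"relevantActions": ["pushright sokoban l19 l18 l17 crate2",
                                   "pushleft sokoban l19 l18 l17 crate2"]}
check_relevant_actions(explanation, question)  # passes: same actions, different order
```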

In [10]:
# Example for action comparison question
question_info_comparison = {
    "type": "actionComparison",
    "actions": ["pushleft sokoban l19 l18 l17 crate2", "pushright sokoban l19 l18 l17 crate2"]
}
validate_sokoban_explanation(answer, initial_state, question_info_comparison)

Validation successful!


In [49]:
# # Example for action taken question
# question_info_taken = {
#     "type": "actionTaken",
#     "action": "pushleft sokoban l19 l18 l17 crate2"
# }
# validate_sokoban_explanation(answer, initial_state, question_info_taken)

In [48]:
# # Example for plan validity question
# question_info_validity = {
#     "type": "planValidity"
# }
# validate_sokoban_explanation(answer, initial_state, question_info_validity)