# Automating Validation of LLM Responses for Sokoban Plans
Our goal is to automate the validation of LLM responses.
<br>One approach is to have a second LLM convert each free-text response into structured JSON. This requires defining a JSON schema first.
<br>A Python script can then validate the JSON against the schema and apply additional consistency checks.
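
Before schema validation can run, the second LLM's reply has to be turned into a Python dictionary. The sketch below shows one way this might be done with the standard library only. The helper name `parse_llm_json` and the assumption that the reply may arrive wrapped in a Markdown code fence are illustrative and not part of the original notebook.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built here to keep the source readable


def parse_llm_json(raw: str) -> dict:
    """Extract a JSON object from raw LLM output (hypothetical helper)."""
    text = raw.strip()
    # Drop a surrounding Markdown code fence (optionally tagged "json") if present.
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    parsed = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


# Assumed example reply: a fenced JSON object, as chat models often produce.
raw_reply = FENCE + 'json\n{"questionType": "planValidity", "conclusion": "valid"}\n' + FENCE
print(parse_llm_json(raw_reply))
```

The returned dictionary can then be passed to the schema validation step. A reply that is not valid JSON fails here, before any schema check is attempted.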

In [2]:
# pip install jsonschema
from jsonschema import validate  # for validating JSON objects against a schema

In [3]:
# JSON schema for Sokoban plan explanations
# The schema defines the structure of the JSON object that represents a Sokoban plan explanation.
# The schema also specifies the constraints that the JSON object should adhere to.
# The constraints include the allowed values for certain fields and the relationships between different fields.
# Note: There are 4 types of questions: planValidity, actionTaken, actionNotTaken, and actionComparison.
schema = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "questionType": {
      "type": "string",
      "enum": ["planValidity", "actionTaken", "actionNotTaken", "actionComparison"]
    },
    "conclusion": {
      "type": "string",
      "enum": ["valid", "invalid", "partially valid", "inconclusive"]
    },
    "reasoning": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "step": { "type": "integer" },
          "explanation": { "type": "string" },
          "relevantAction": { "type": "string" },
          "preconditionsSatisfied": { "type": "boolean" },
          "effectsConsistent": { "type": "boolean" }
        },
        "required": ["step", "explanation"]
      }
    },
    "referencedGameElements": {
      "type": "array",
      "items": { "type": "string" }
    },
    "confidenceScore": {
      "type": "number",
      "minimum": 0,
      "maximum": 1
    },
    "relevantActions": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["questionType", "conclusion", "reasoning", "relevantActions"]
}

In [4]:
# Sample LLM answer to an actionComparison question (why action A was chosen over action B):
# The action "pushleft sokoban l19 l18 l17 crate2" is used in the solution instead of "pushright sokoban l19 l18 l17 crate2" because they have the same effects and their preconditions are almost identical. The only difference is that the "pushleft" action requires "leftOf l18 l19" whereas the "pushright" action requires "leftOf l19 l18", but the initial state establishes "leftOf l18 l19", making the "pushleft" action applicable and not the "pushright" one.

# Translation into JSON, adhering to the above schema, by another LLM:
answer = {
  "questionType": "actionComparison",
  "conclusion": "valid",
  "reasoning": [
    {
      "step": 1,
      "explanation": "The actions 'pushleft sokoban l19 l18 l17 crate2' and 'pushright sokoban l19 l18 l17 crate2' have the same effects.",
      "relevantAction": "pushleft sokoban l19 l18 l17 crate2"
    },
    {
      "step": 2,
      "explanation": "The preconditions of both actions are almost identical, with one key difference.",
      "relevantAction": "pushright sokoban l19 l18 l17 crate2"
    },
    {
      "step": 3,
      "explanation": "The 'pushleft' action requires 'leftOf l18 l19', while the 'pushright' action requires 'leftOf l19 l18'.",
      "preconditionsSatisfied": True,
      "effectsConsistent": True
    },
    {
      "step": 4,
      "explanation": "The initial state establishes 'leftOf l18 l19', making the 'pushleft' action applicable and not the 'pushright' one.",
      "preconditionsSatisfied": True,
      "effectsConsistent": True
    }
  ],
  "referencedGameElements": ["sokoban", "l19", "l18", "l17", "crate2"],
  "confidenceScore": 0.95,
  "relevantActions": [
    "pushleft sokoban l19 l18 l17 crate2",
    "pushright sokoban l19 l18 l17 crate2"
  ]
}

In [7]:
def validate_sokoban_explanation(explanation: dict, initial_state: set, question_info: dict):
    """
    Validate an explanation for a Sokoban plan against the JSON schema and additional constraints.
    @param explanation: The explanation to validate. Should be a dictionary representing a Sokoban plan explanation.
    @param initial_state: The initial state of the Sokoban plan. Should be a set of strings representing game elements.
        Note that the initial state consists of the objects and their initial positions.
    @param question_info: Information about the question that was asked. Should be a dictionary with keys 'type', 'action', and/or 'actions'.
    """
    # Validate against JSON schema
    validate(instance=explanation, schema=schema)  # Raises jsonschema.ValidationError if the explanation does not adhere to the schema
    
    # Check question type
    assert explanation['questionType'] in ['planValidity', 'actionTaken', 'actionNotTaken', 'actionComparison'], "Invalid question type"
    assert explanation['questionType'] == question_info['type'], "Question type doesn't match the provided question info"
    
    # Check conclusion
    assert explanation['conclusion'] in ['valid', 'invalid', 'partially valid', 'inconclusive'], "Invalid conclusion"
    
    # Check reasoning steps
    assert len(explanation['reasoning']) > 0, "Reasoning should have at least one step"
    
    # Check referenced game elements (optional in the schema, so default to an empty list)
    game_elements = set(explanation.get('referencedGameElements', []))
    assert all(element in initial_state for element in game_elements), "All referenced game elements should be in the initial state"
    
    # Check actions based on question type
    if explanation['questionType'] == 'planValidity':
        assert len(explanation['relevantActions']) == 0, "Plan validity questions should not have relevant actions"
    
    elif explanation['questionType'] in ['actionTaken', 'actionNotTaken']:
        assert len(explanation['relevantActions']) == 1, f"{explanation['questionType']} questions should have exactly one relevant action"
        assert explanation['relevantActions'][0] == question_info['action'], "Relevant action doesn't match the provided question info"
    
    elif explanation['questionType'] == 'actionComparison':
        assert len(explanation['relevantActions']) == 2, "Action comparison questions should have exactly two relevant actions"
        assert set(explanation['relevantActions']) == set(question_info['actions']), "Relevant actions don't match the provided question info"
    
    # Check if relevant actions are mentioned in reasoning
    for action in explanation['relevantActions']:
        assert any(step.get('relevantAction') == action for step in explanation['reasoning']), f"Action {action} should be mentioned in reasoning"
    
    # Check preconditions and effects (if applicable)
    if explanation['questionType'] != 'planValidity':
        precondition_steps = [step for step in explanation['reasoning'] if 'preconditionsSatisfied' in step]
        assert len(precondition_steps) > 0, "At least one step should mention preconditions"
        assert all(step['preconditionsSatisfied'] for step in precondition_steps), "All mentioned preconditions should be satisfied"
        
        effect_steps = [step for step in explanation['reasoning'] if 'effectsConsistent' in step]
        assert len(effect_steps) > 0, "At least one step should mention effects"
        assert all(step['effectsConsistent'] for step in effect_steps), "All mentioned effects should be consistent"
    
    # Check for mention of initial state (if applicable)
    if explanation['questionType'] != 'planValidity':
        assert any("initial state" in step['explanation'].lower() for step in explanation['reasoning']), "Explanation should mention the initial state"
    
    # Check confidence score (optional in the schema, so only check it when present)
    if 'confidenceScore' in explanation:
        assert 0 <= explanation['confidenceScore'] <= 1, "Confidence score should be between 0 and 1"

    print("Validation successful!")  # If all checks pass, the explanation is valid
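
Because the validation function signals failure by raising an exception, automating it over many responses needs a wrapper that records the outcome instead of stopping at the first failure. The following is a minimal sketch under that assumption. The names `run_check` and `demo_check` are hypothetical, and `demo_check` is a stand-in for the real validator so that the block runs on its own.

```python
from typing import Callable, Tuple


def run_check(check: Callable[..., None], *args) -> Tuple[bool, str]:
    """Run a validator that signals failure by raising; return (ok, message)."""
    try:
        check(*args)
    except AssertionError as err:
        # A failed assert in the validator: report its message.
        return False, f"constraint failed: {err}"
    except Exception as err:
        # Any other error, e.g. a schema violation or a missing key.
        return False, f"{type(err).__name__}: {err}"
    return True, "ok"


def demo_check(explanation: dict) -> None:
    """Stand-in validator (hypothetical) with a single assert-based constraint."""
    assert explanation.get("reasoning"), "Reasoning should have at least one step"


print(run_check(demo_check, {"reasoning": []}))
print(run_check(demo_check, {"reasoning": [{"step": 1, "explanation": "..."}]}))
```

Note that `assert` statements are skipped when Python runs with the `-O` flag, so assert-based checks of this kind only take effect in a normal (non-optimized) interpreter run.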

In [9]:
# The initial state: all objects, plus the facts (predicates) that hold at the start
initial_state = {
    "sokoban", "crate1", "crate2", "l1", "l10", "l11", "l12", "l17", "l18", "l19", "l22", 
    "l23", "l24", "l29", "l30", "l31", "l32", "l33", "l36", "l37", "l38", "l39", "l40"}

initial_state.update(
    {"crate crate1", "crate crate2", "leftOf l10 l11", "leftOf l11 l12", "leftOf l17 l18", 
    "leftOf l18 l19", "leftOf l22 l23", "leftOf l23 l24", "leftOf l29 l30", "leftOf l30 l31", 
    "leftOf l31 l32", "leftOf l32 l33", "leftOf l36 l37", "leftOf l37 l38", "leftOf l38 l39", 
    "leftOf l39 l40", "below l17 l10", "below l18 l11", "below l19 l12", "below l24 l17", 
    "below l29 l22", "below l30 l23", "below l31 l24", "below l36 l29", "below l37 l30", 
    "below l38 l31", "below l39 l32", "below l40 l33", "at sokoban l19", "at crate1 l17", 
    "at crate2 l18", "clear l1", "clear l10", "clear l11", "clear l12", "clear l22", 
    "clear l23", "clear l24", "clear l29", "clear l30", "clear l31", "clear l32", 
    "clear l33", "clear l36", "clear l37", "clear l38", "clear l39", "clear l40"}
)
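
The referenced-elements check is a plain set-membership test against this initial state. When it fails, it can help to know which names were not found. The sketch below uses a set difference for that. The helper `missing_elements` and the small `sample_state` are illustrative assumptions, not part of the original notebook.

```python
def missing_elements(referenced, known_state):
    """Return the referenced names that are absent from the state, sorted."""
    return sorted(set(referenced) - set(known_state))


# A small excerpt of the state, for illustration only.
sample_state = {"sokoban", "crate2", "l17", "l18", "l19", "at sokoban l19"}

print(missing_elements(["sokoban", "l19", "l99"], sample_state))  # ['l99']
print(missing_elements(["sokoban", "crate2"], sample_state))      # []
```

An empty result corresponds to the case in which the membership assertion in the validation function passes.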

Call the validation function with the answer, the initial state, and the question information to validate the explanation. The expected content of `relevantActions` depends on the question type:
<br>For "actionComparison", it contains both actions being compared.
<br>For "actionTaken" and "actionNotTaken", it contains the single action in question.
<br>For "planValidity", it is an empty array, since no specific actions are relevant to the question.
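
The mapping from question type to expected actions can be sketched as a small function. The name `expected_relevant_actions` is hypothetical. The dictionary keys `type`, `action`, and `actions` follow the question information format used by the validation function.

```python
def expected_relevant_actions(question_info: dict) -> list:
    """Return the actions an explanation is expected to list for this question."""
    qtype = question_info["type"]
    if qtype == "planValidity":
        return []  # no specific action is relevant
    if qtype in ("actionTaken", "actionNotTaken"):
        return [question_info["action"]]  # the single action in question
    if qtype == "actionComparison":
        return list(question_info["actions"])  # both actions being compared
    raise ValueError(f"Unknown question type: {qtype}")


print(expected_relevant_actions({"type": "planValidity"}))
print(expected_relevant_actions({
    "type": "actionTaken",
    "action": "pushleft sokoban l19 l18 l17 crate2",
}))
```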

In [10]:
# Example for action comparison question
question_info_comparison = {
    "type": "actionComparison",
    "actions": ["pushleft sokoban l19 l18 l17 crate2", "pushright sokoban l19 l18 l17 crate2"]
}
validate_sokoban_explanation(answer, initial_state, question_info_comparison)

Validation successful!


In [49]:
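# Example for an action taken question. Left commented out: `answer` above is an
# actionComparison explanation, so this call would fail the question-type check.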
# question_info_taken = {
#     "type": "actionTaken",
#     "action": "pushleft sokoban l19 l18 l17 crate2"
# }
# validate_sokoban_explanation(answer, initial_state, question_info_taken)

In [48]:
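# Example for a plan validity question. Left commented out: `answer` above is an
# actionComparison explanation, so this call would fail the question-type check.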
# question_info_validity = {
#     "type": "planValidity"
# }
# validate_sokoban_explanation(answer, initial_state, question_info_validity)