In this notebook, we test two capabilities of LLMs:

1. **Recognition**: Can an LLM recognize that a state-action trajectory differs from the expected one?
2. **Rectification**: Can an LLM update simulation code to match the expected state-action trajectory?

In [2]:
# Import standard packages
import os
import sys
import traceback
import multiprocessing

os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

## Recognition

We will test whether an LLM can recognize that a state-action trajectory differs from the expected one and identify the source of the difference. We will use the Mountain Car environment from Gymnasium. The "external" environment will have a different gravity level than the "internal" environment. The LLM will see both state-action trajectories and must determine what causes them to differ. This is a hard inference problem: the source of the difference is not directly observable, and many different changes to the environment could produce a discrepancy.

We will start with the easiest possible test: matching the action sequence of the external environment to that of the internal environment.
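To make the setup concrete, here is a minimal pure-Python sketch of the comparison. It does not use Gymnasium itself; it mirrors the `MountainCar-v0` update rule (force 0.001, default gravity 0.0025) so that it runs without extra dependencies. The fixed start state, the altered gravity value of 0.0035, and the helper names `rollout` and `first_divergence` are assumptions made for illustration only.

```python
import math

def rollout(actions, gravity, start=(-0.5, 0.0), force=0.001):
    """Apply a fixed action sequence and return [(position, velocity, action), ...].

    Actions follow the MountainCar convention: 0 = push left, 1 = no push, 2 = push right.
    The fixed start state is an assumption; Gymnasium samples the start position randomly.
    """
    position, velocity = start
    trajectory = []
    for action in actions:
        # Record the state in which the action is taken.
        trajectory.append((position, velocity, action))
        velocity += (action - 1) * force - math.cos(3 * position) * gravity
        velocity = max(-0.07, min(0.07, velocity))
        position += velocity
        position = max(-1.2, min(0.6, position))
        # The car stops when it hits the left wall.
        if position == -1.2 and velocity < 0:
            velocity = 0.0
    return trajectory

def first_divergence(traj_a, traj_b, tol=1e-9):
    """Return the first step index where the two state sequences differ, or None."""
    for step, ((pos_a, vel_a, _), (pos_b, vel_b, _)) in enumerate(zip(traj_a, traj_b)):
        if abs(pos_a - pos_b) > tol or abs(vel_a - vel_b) > tol:
            return step
    return None

# Same action sequence in both environments; only gravity differs.
actions = [2] * 20 + [0] * 20
internal = rollout(actions, gravity=0.0025)
external = rollout(actions, gravity=0.0035)  # hypothetical altered gravity

print("First diverging step:", first_divergence(internal, external))
```

Because both rollouts share the same actions and start state, any difference between the trajectories can only come from the dynamics, which is what makes this the easiest version of the recognition task.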

### Set up the LLM

In [None]:
from langchain_ollama import ChatOllama

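# Note: `n_gpu_layers` and `n_threads` are llama.cpp-style names; the
# corresponding ChatOllama fields are assumed here to be `num_gpu` and `num_thread`.
llm = ChatOllama(
    model="qwen2.5:3B",
    temperature=0.5,
    num_gpu=-1,  # let Ollama decide how many layers to offload to the GPU
    num_thread=max(1, multiprocessing.cpu_count() - 1),  # leave one core free
    verbose=False
)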

# resp = llm.invoke("Write a Python program that generates the first 10 numbers in the Fibonacci sequence. Include only the code. Any comments or explanations should be removed.")
# print(resp.content)

LLMs tend to code better when asked to output an entire code snippet rather than individual lines, likely because the full snippet gives the model more context. One option is to use chain-of-thought prompting with the entire code snippet as one step in the chain, and then use standard git tools (such as `git diff`) to determine what changed.
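As a rough sketch of the change-detection step, the snippet below compares an original snippet with a rewritten one. It uses Python's standard-library `difflib` in place of git, since it produces the same unified-diff format without needing a repository. The helper name `code_diff`, the file name `simulation.py`, and the two example snippets are hypothetical; in practice `revised` would be the full snippet returned by the LLM.

```python
import difflib

def code_diff(original: str, revised: str, name: str = "simulation.py") -> str:
    """Return a unified diff between two versions of a code snippet."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        revised.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)

# Hypothetical snippets: the "revised" version stands in for LLM output.
original = "gravity = 0.0025\nforce = 0.001\n"
revised = "gravity = 0.0035\nforce = 0.001\n"

print(code_diff(original, revised))
```

Asking the model for the whole snippet and diffing afterwards keeps the prompt simple, and the diff isolates exactly which lines the model changed.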