## Trajectory evaluation
Trajectory evaluation compares a model's reasoning steps (the prediction) with a reference sequence and assesses the consistency, order, and completeness of those steps. It flags not only an incorrect final answer but also the point where the reasoning chain deviated (for example, a missed or distorted step), which makes it particularly useful for agents and multi-step tasks. In the example in this section, an LLM acts as the judge and returns a score together with its reasoning.
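
To make the idea of "where a deviation occurred" concrete, here is a minimal deterministic sketch that walks two step lists in order and reports the first position at which they differ. It is not how the LLM-based evaluator used later in this section works; `normalize` and `first_divergence` are hypothetical helper names, and exact matching after lowercasing is an assumption that only suits toy data.

```python
from typing import List, Optional


def normalize(step: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(step.lower().split())


def first_divergence(prediction: List[str], reference: List[str]) -> Optional[int]:
    """Return the 0-based index of the first differing step, or None if both lists match."""
    for i, (pred, ref) in enumerate(zip(prediction, reference)):
        if normalize(pred) != normalize(ref):
            return i
    # One list is a prefix of the other: the first missing or extra step is the divergence.
    if len(prediction) != len(reference):
        return min(len(prediction), len(reference))
    return None


reference = [
    "Fibonacci starts at 0, 1",
    "The next number is the sum of the previous two",
    "The 5th number is 5",
]
prediction = [
    "Fibonacci starts at 0, 1",
    "The next number is the sum of the previous two",
    "The 5th number is 4",
]

print(first_divergence(prediction, reference))  # 2
```

Exact string matching is brittle for paraphrased steps, which is one reason an LLM judge is often used for this comparison in practice.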

### Install libraries

In [1]:
!pip install langchain-classic langchain-core langchain-openai python-dotenv



### Trajectory evaluation

In [2]:
from langchain_classic.evaluation import load_evaluator
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
import json

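# Load environment variables (such as OPENAI_API_KEY) from a local .env file
load_dotenv()

# LLM that acts as the judge for the evaluator
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Build the trajectory evaluator backed by that LLM
evaluator = load_evaluator("trajectory", llm=llm)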

# Expected reasoning steps (ground truth)
reference_steps = [
    "Fibonacci starts at 0, 1",
    "The next number is the sum of the previous two",
    "The 5th number is 5"
]

# Steps produced by the model; the last step states the wrong value
prediction_steps = [
    "Fibonacci starts at 0, 1",
    "Subsequent numbers are the sum of the previous ones",
    "The 5th number is 4"
]

result = evaluator.invoke({
    "question": "What is the 5th number of the Fibonacci sequence?",
    "answer": "4",
    "agent_trajectory": prediction_steps,
    "reference": {
        "answer": "5",
        "agent_trajectory": reference_steps
    }
})

print(json.dumps(result, indent=4))

{
    "question": "What is the 5th number of the Fibonacci sequence?",
    "answer": "4",
    "agent_trajectory": [
        "Fibonacci starts at 0, 1",
        "Subsequent numbers are the sum of the previous ones",
        "The 5th number is 4"
    ],
    "reference": "\n\nThe following is the expected answer. Use this to measure correctness:\n[GROUND_TRUTH]\n{'answer': '5', 'agent_trajectory': ['Fibonacci starts at 0, 1', 'The next number is the sum of the previous two', 'The 5th number is 5']}\n[END_GROUND_TRUTH]\n",
    "score": 0.0,
    "reasoning": "Let's evaluate the AI language model's answer step by step based on the provided criteria:\n\ni. **Is the final answer helpful?**\n   - The final answer given by the AI model is \"4,\" which is incorrect. The 5th number in the Fibonacci sequence is actually \"5.\" Therefore, the answer is not helpful.\n\nii. **Does the AI language model use a logical sequence of tools to answer the question?**\n   - The AI model provides a logical sequ

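A result dictionary like the one printed above carries a numeric `score` and a free-text `reasoning`. When many trajectories are evaluated, it can help to reduce each result to a pass/fail flag and a short preview. The sketch below assumes only those two keys; `summarize`, the `0.5` threshold, and the stand-in dictionaries are hypothetical and are not part of LangChain.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any], threshold: float = 0.5, max_chars: int = 80) -> Dict[str, Any]:
    """Reduce an evaluator result to a pass/fail flag, the score, and a reasoning preview."""
    score = float(result.get("score", 0.0))
    reasoning = str(result.get("reasoning", ""))
    return {
        "passed": score >= threshold,  # threshold is an assumed cut-off, tune it per task
        "score": score,
        "preview": reasoning[:max_chars],
    }


# Hand-written stand-ins with the same keys as a real result (reasoning text is a placeholder).
failing = {"answer": "4", "score": 0.0, "reasoning": "Placeholder reasoning for a wrong answer."}
passing = {"answer": "5", "score": 1.0, "reasoning": "Placeholder reasoning for a correct answer."}

for name, res in [("failing", failing), ("passing", passing)]:
    print(name, summarize(res))
```

Passing the `result` returned by `evaluator.invoke(...)` to `summarize` should work the same way, provided the result contains the `score` and `reasoning` keys shown in the output.
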
In [3]:
# A prediction whose steps match the reference exactly
correct_prediction_steps = [
    "Fibonacci starts at 0, 1",
    "The next number is the sum of the previous two",
    "The 5th number is 5"
]

result = evaluator.invoke({
    "question": "What is the 5th number of the Fibonacci sequence?",
    "answer": "5",
    "agent_trajectory": correct_prediction_steps,
    "reference": {
        "answer": "5",
        "agent_trajectory": reference_steps
    }
})

print(json.dumps(result, indent=4))

{
    "question": "What is the 5th number of the Fibonacci sequence?",
    "answer": "5",
    "agent_trajectory": [
        "Fibonacci starts at 0, 1",
        "The next number is the sum of the previous two",
        "The 5th number is 5"
    ],
    "reference": "\n\nThe following is the expected answer. Use this to measure correctness:\n[GROUND_TRUTH]\n{'answer': '5', 'agent_trajectory': ['Fibonacci starts at 0, 1', 'The next number is the sum of the previous two', 'The 5th number is 5']}\n[END_GROUND_TRUTH]\n",
    "score": 1.0,
    "reasoning": "Let's evaluate the AI language model's answer step by step based on the provided criteria:\n\ni. **Is the final answer helpful?**\n   - Yes, the final answer \"5\" is correct and directly answers the question about the 5th number in the Fibonacci sequence.\n\nii. **Does the AI language model use a logical sequence of tools to answer the question?**\n   - The model's reasoning is logical. It starts with the definition of the Fibonacci sequen