# Chat Bot Evaluation as Multi-agent Simulation

When building a chat bot, such as a customer support assistant, it can be hard to properly evaluate your bot's performance. Manually interacting with it at length after every code change is time-consuming.

One way to make the evaluation process easier and more reproducible is to simulate a user interaction.

With LangGraph, it's easy to set this up. Below is an example of how to create a "virtual user" to simulate a conversation.

The overall simulation looks something like this:

![diagram](./img/virtual_user_diagram.png)

First, we'll set up our environment.

In [4]:
// process.env.OPENAI_API_KEY = "sk_...";
// Optional tracing in LangSmith
// process.env.LANGCHAIN_API_KEY = "sk_...";
// process.env.LANGCHAIN_TRACING_V2 = "true";
// process.env.LANGCHAIN_PROJECT = "Agent Simulation Evaluation: LangGraphJS";

In [5]:
import {
  AIMessage,
  AIMessageChunk,
  BaseMessage,
  HumanMessage,
} from '@langchain/core/messages'
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from '@langchain/core/prompts'
import { Runnable } from '@langchain/core/runnables'
import { Annotation, END, START, StateGraph } from '@langchain/langgraph'
import { ChatOpenAI } from '@langchain/openai'

const llm = new ChatOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

## 1. Define Chat Bot

Next, we'll define our chat bot. This implementation uses the OpenAI API to generate responses.

In [6]:
interface Message {
  role: string;
  content: string;
}

async function myChatBot(messages: Message[]): Promise<AIMessageChunk> {
  const systemMessage: Message = {
    role: 'system',
    content: 'You are a customer support agent for an airline.',
  };
  const allMessages = [systemMessage, ...messages];
  
  const response = await llm.invoke(
    allMessages.map((m) => [m.role, m.content]),
  )
  return response
}

// Test the chat bot
const response = await myChatBot([{role: 'user', content: 'hi!'}]);
console.log(response);

## 2. Define Simulated User

Now we'll define the simulated user using LangChain.

In [None]:
async function createSimulatedUser(): Promise<Runnable> {
    const systemPromptTemplate = `You are a customer of an airline company. You are interacting with a user who is a customer support person.
    
    {instructions}
    
    When you are finished with the conversation, respond with a single word "FINISHED"`;
    
    // The system prompt is followed by the running conversation.
    const prompt = ChatPromptTemplate.fromMessages([
      ['system', systemPromptTemplate],
      new MessagesPlaceholder('messages'),
    ]);
    
    const instructions = "Your name is Harrison. You are trying to get a refund for the trip you took to Alaska. \
    You want them to give you ALL the money back. This trip happened 5 years ago.";
    
    // Fill in {instructions} now; only `messages` is supplied at invoke time.
    const partialPrompt = await prompt.partial({ instructions });
    
    // Pipe the prompt into the `llm` instance created during setup.
    const simulatedUser = partialPrompt.pipe(llm);
    return simulatedUser
}

// Test the simulated user
const messages = [new HumanMessage('Hi! How can I help you?')];
const simulatedUser = await createSimulatedUser();
const simulatedUserResponse = await simulatedUser.invoke({ messages });
console.log(simulatedUserResponse);

## 3. Define the Agent Simulation

The code below creates a LangGraph workflow to run the simulation. The main components are:

1. The two nodes: one for the simulated user, the other for the chat bot.
2. The graph itself, with a conditional stopping criterion.

Read the comments in the code below for more information.


**Nodes**

First, we define the nodes in the graph. Each node takes the current list of messages and returns a list of messages to add to the state.
These will be thin wrappers around the chat bot and simulated user we have above.

**Note:** one tricky thing here is keeping track of which messages are which. Because the chat bot and our simulated user are both LLMs, both of them will respond with AI messages. Our state will be a list of alternating Human and AI messages, so one of the nodes needs logic that flips the AI and Human roles. In this example, we will assume that HumanMessages are messages from the simulated user, which means the simulated user node must swap AI and Human messages before calling its model.

First, let's define the chat bot node:

In [None]:
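// The node awaits the chat bot, so it must be async and return a Promise.
async function chatBotNode(state: { messages: any[] }): Promise<{
  messages: AIMessage[]
}> {
  const messages = state.messages
  // Convert LangChain messages into the plain { role, content } shape myChatBot expects.
  const chatBotResponse = await myChatBot(
    messages.map((m) => ({ role: m._getType(), content: m.content })),
  )
  return { messages: [new AIMessage({ content: chatBotResponse.content })] }
}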

Next, let's define the node for our simulated user. This will involve a little logic to swap the roles of the messages.

In [None]:
function swapRoles(messages: any[]): any[] {
  return messages.map((m) =>
    m instanceof AIMessage
      ? new HumanMessage({ content: m.content })
      : new AIMessage({ content: m.content }),
  )
}

async function simulatedUserNode(state: { messages: any[] }): Promise<{
  messages: HumanMessage[]
}> {
  const messages = state.messages
  const newMessages = swapRoles(messages)
  // Build the runnable with the factory defined earlier (createSimulatedUser).
  const simulateUser = await createSimulatedUser()
  const response = await simulateUser.invoke({ messages: newMessages })

  return { messages: [new HumanMessage({ content: response.content })] }
}
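
To make the role flip concrete without calling a model, here is a minimal, dependency-free sketch of the same idea. The `PlainMessage` type and `swapPlainRoles` helper are hypothetical stand-ins for the LangChain message classes used above, not part of any library.

```typescript
// Hypothetical plain-object message type; the notebook itself uses
// LangChain's HumanMessage and AIMessage classes.
type PlainRole = 'human' | 'ai';

interface PlainMessage {
  role: PlainRole;
  content: string;
}

// Flip each role so the simulated user's model sees the chat bot's replies
// as human input and its own earlier replies as AI output.
function swapPlainRoles(messages: PlainMessage[]): PlainMessage[] {
  return messages.map((m): PlainMessage => ({
    role: m.role === 'ai' ? 'human' : 'ai',
    content: m.content,
  }));
}

const conversation: PlainMessage[] = [
  { role: 'ai', content: 'Hi! How can I help you?' },
  { role: 'human', content: 'I would like a refund.' },
];

const swapped = swapPlainRoles(conversation);
console.log(swapped.map((m) => `${m.role}: ${m.content}`));
```

The helper returns new objects rather than mutating its input, so the shared graph state keeps the original roles.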

**Edges**

We now need to define the logic for the edges. The main decision happens after the simulated user's turn, and it should lead to one of two outcomes:

- Either we continue and call the customer support bot
- Or we finish and the conversation is over

So when is the conversation over? We will say it ends when either the simulated user responds with `FINISHED` (see the system prompt) OR the conversation is more than 6 messages long (this is an arbitrary number just to keep this example short).

In [None]:
function shouldContinue(state: { messages: any[] }): string {
  const messages = state.messages;
  if (messages.length > 6) {
    return 'end';
  } else if (messages[messages.length - 1].content === 'FINISHED') {
    return 'end';
  } else {
    return 'continue';
  }
}
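
The exact-match check above only ends the conversation if the model replies with precisely `FINISHED`. As a hedged variation, the sketch below tolerates surrounding whitespace, different casing, and trailing punctuation. The names `isFinished` and `shouldContinueTolerant` are hypothetical, and the sketch assumes message content is a plain string.

```typescript
// Minimal message shape for this sketch; assumes content is a plain string.
interface TurnMessage {
  content: string;
}

// Same arbitrary cap as in the text above.
const MAX_MESSAGES = 6;

// Treat "FINISHED", " finished ", and "Finished." as the stop signal.
function isFinished(content: string): boolean {
  return content.trim().replace(/[.!]+$/, '').toUpperCase() === 'FINISHED';
}

function shouldContinueTolerant(state: {
  messages: TurnMessage[];
}): 'end' | 'continue' {
  const { messages } = state;
  const last = messages[messages.length - 1];
  if (messages.length > MAX_MESSAGES) return 'end';
  if (last !== undefined && isFinished(last.content)) return 'end';
  return 'continue';
}

console.log(shouldContinueTolerant({ messages: [{ content: ' finished.\n' }] }));
```

A reply that merely mentions the word, such as "I am not FINISHED yet", still continues the conversation because the whole reply must match.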

**Graph**

We can now define the graph that sets up the simulation!

In [None]:
interface State {
  messages: any[];
}

function createSimulation(){
  const State = Annotation.Root({
    messages: Annotation<BaseMessage[]>({
      reducer: (x, y) => x.concat(y),
      default: () => [],
    }),
  })

  const workflow = new StateGraph(State)
    .addNode('user', simulatedUserNode)
    .addNode('chatbot', chatBotNode)
    .addEdge('chatbot', 'user')
    .addConditionalEdges('user', shouldContinue, {
      end: END,
      continue: 'chatbot',
    })
    .addEdge(START, 'chatbot')

  const simulation = workflow.compile()
  return simulation
}
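
To see the control flow that the graph encodes, it can help to write it as a plain loop. The sketch below is a synchronous, dependency-free approximation with hypothetical stub nodes (`stubChatBot`, `stubUser`) in place of the LLM-backed ones; it is not how LangGraph executes the graph internally.

```typescript
type SimSpeaker = 'chatbot' | 'user';

interface SimTurn {
  speaker: SimSpeaker;
  content: string;
}

// Same idea as the reducer in the graph state: append each update to the list.
const appendTurns = (current: SimTurn[], update: SimTurn[]): SimTurn[] =>
  current.concat(update);

// Hypothetical stand-in for the chat bot node.
function stubChatBot(turns: SimTurn[]): SimTurn[] {
  return [{ speaker: 'chatbot', content: `Support reply ${turns.length / 2 + 1}` }];
}

// Hypothetical stand-in for the simulated user: it gives up on its second turn.
function stubUser(turns: SimTurn[]): SimTurn[] {
  const userTurns = turns.filter((t) => t.speaker === 'user').length;
  const content = userTurns >= 1 ? 'FINISHED' : 'I want a refund.';
  return [{ speaker: 'user', content }];
}

function runStubSimulation(maxMessages = 6): SimTurn[] {
  let turns: SimTurn[] = [];
  // START -> chatbot -> user -> (chatbot | END)
  while (true) {
    turns = appendTurns(turns, stubChatBot(turns));
    turns = appendTurns(turns, stubUser(turns));
    const last = turns[turns.length - 1];
    const finished = last !== undefined && last.content === 'FINISHED';
    if (turns.length > maxMessages || finished) break;
  }
  return turns;
}

const transcript = runStubSimulation();
console.log(transcript);
```

Each node returns only its new messages and the reducer merges them into the state, which is why the nodes above return a one-element list rather than the full history.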

## 4. Run Simulation

Now we can evaluate our chat bot! We invoke the simulation with an empty state, which lets the chat bot open the conversation.

In [None]:
async function runSimulation() {
  const simulation = createSimulation()
  for await (const chunk of await simulation.stream({})) {
    console.log(chunk)
    console.log('\n---\n')
  }
}


await runSimulation();
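
Raw chunks can be hard to read when reviewing a run. The sketch below turns chunks into transcript lines. It assumes each streamed chunk is an object keyed by node name whose value holds that node's `messages` update, and that message content is a plain string; check the shape printed by your own run before relying on it. `SimulationChunk` and `formatChunk` are hypothetical names, and the sample data is hand-written rather than real model output.

```typescript
interface ChunkMessage {
  content: string;
}

// Assumed chunk shape: { [nodeName]: { messages: [...] } }.
type SimulationChunk = Record<string, { messages: ChunkMessage[] }>;

// Produce one "node: text" line per message in the chunk.
function formatChunk(chunk: SimulationChunk): string[] {
  const formatted: string[] = [];
  for (const nodeName of Object.keys(chunk)) {
    for (const message of chunk[nodeName].messages) {
      formatted.push(`${nodeName}: ${message.content}`);
    }
  }
  return formatted;
}

// Hand-written sample chunks, for illustration only.
const sampleChunks: SimulationChunk[] = [
  { chatbot: { messages: [{ content: 'How can I help you today?' }] } },
  { user: { messages: [{ content: 'FINISHED' }] } },
];

const transcriptLines: string[] = [];
for (const chunk of sampleChunks) {
  for (const line of formatChunk(chunk)) {
    transcriptLines.push(line);
  }
}
console.log(transcriptLines.join('\n'));
```

Inside `runSimulation`, the same helper could replace `console.log(chunk)` if the chunks in your run match the assumed shape.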