# Tool Usage

Let's see how to evaluate an agent's ability to use tools.

In [1]:
from langchain_benchmarks.tool_usage import registry
from langchain_benchmarks import clone_public_dataset

For this code to work, please configure LangSmith environment variables with your credentials.
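
As a quick pre-flight check, the sketch below reports which credentials are missing before any LangSmith call is made. It assumes the API key is read from `LANGCHAIN_API_KEY`, which may differ in your client version; the helper name `missing_langsmith_vars` is hypothetical and not part of any library.

```python
import os
from typing import List, Mapping

# Assumption: the LangSmith client reads its API key from this variable.
REQUIRED_VARS = ("LANGCHAIN_API_KEY",)


def missing_langsmith_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langsmith_vars()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.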

In [2]:
registry

| ID | Name | Dataset ID | Description |
|---|---|---|---|
| 0 | Tool Usage - Alpha | e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 | Environment with fake data about users and their locations and favorite foods. The environment provides a set of tools that can be used to query the data. The object is to evaluate the ability of an agent to use the tools to answer questions about the data. The dataset contains 21 examples of varying difficulty. The difficulty is measured by the number of tools that need to be used to answer the question. Each example is composed of a question, a reference answer, and information about the sequence in which tools should be used to answer the question. Success is measured by the ability to answer the question correctly, and efficiently. |


In [3]:
alpha = registry[0]

In [4]:
alpha

| Field | Value |
|---|---|
| ID | 0 |
| Name | Tool Usage - Alpha |
| Dataset ID | e95d45da-aaa3-44b3-ba2b-7c15ff6e46f5 |
| Description | Environment with fake data about users and their locations and favorite foods. The environment prov... |


In [5]:
clone_public_dataset(alpha.dataset_id, dataset_name=alpha.name)

Dataset Tool Usage - Alpha already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/e081f11e-fbd2-41b4-9fa8-5d76c76ef854/datasets/9b745a89-c06a-4602-a258-f94e9e292dde.


## Define an agent

Let's build an agent that we can use for evaluation.

In [22]:
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_to_openai_functions
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.tools.render import format_tool_to_openai_function


TOOLS = alpha.tools_factory()

def agent_factory() -> AgentExecutor:
    """Build a new OpenAI-functions agent executor that uses the Alpha tools."""
    llm = ChatOpenAI(
        model="gpt-3.5-turbo-16k",
        temperature=0,
    )
    
    llm_with_tools = llm.bind(
        functions=[format_tool_to_openai_function(t) for t in TOOLS]
    )
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant. Use the given tools to answer the "
                "question. Keep in mind that an ID is distinct from a name for "
                "every entity.",
            ),
            ("user", "{input}"),
            # The scratchpad follows the question so that tool calls and their
            # results appear after the request that triggered them.
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )

    runnable_agent = (
        {
            "input": lambda x: x["question"],
            "agent_scratchpad": lambda x: format_to_openai_functions(
                x["intermediate_steps"]
            ),
        }
        | prompt
        | llm_with_tools
        | OpenAIFunctionsAgentOutputParser()
    )
    
    
    def _ensure_output_exists(inputs):
        """Make sure that the output key is always present."""
        if 'output' not in inputs:
            return {
                'output': "",
                **inputs
            }
        return inputs

    return AgentExecutor(
        agent=runnable_agent,
        tools=TOOLS,
        handle_parsing_errors=True,
        return_intermediate_steps=True,
    ) | _ensure_output_exists
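
The `_ensure_output_exists` step is plain dictionary logic, so its behavior can be checked in isolation without calling a model. The sketch below uses a standalone copy of the helper (renamed `ensure_output_exists` for illustration) and made-up input dictionaries.

```python
def ensure_output_exists(inputs: dict) -> dict:
    """Standalone copy of the helper defined inside agent_factory."""
    if "output" not in inputs:
        return {"output": "", **inputs}
    return inputs


# A result without a final answer gains an empty "output" key.
partial = ensure_output_exists(
    {"question": "who is bob?", "intermediate_steps": []}
)

# A result that already has an answer passes through unchanged.
complete = ensure_output_exists(
    {"question": "who is bob?", "output": "Bob is a user with the ID 21."}
)

print(partial["output"] == "", complete["output"])
```

Guaranteeing the key presumably lets downstream consumers read `output` without handling a missing-key case.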

Let's check that the agent works:

In [23]:
agent_factory().invoke({'question': "who is bob?"})

{'question': 'who is bob?',
 'output': 'Bob is a user with the ID 21.',
 'intermediate_steps': [(AgentActionMessageLog(tool='find_users_by_name', tool_input={'name': 'bob'}, log="\nInvoking: `find_users_by_name` with `{'name': 'bob'}`\n\n\n", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n"name": "bob"\n}', 'name': 'find_users_by_name'}})]),
   [{'id': 21, 'name': 'Bob'},
    {'id': 41, 'name': 'Donna'},
    {'id': 1, 'name': 'Alice'},
    {'id': 35, 'name': 'Charlie'},
    {'id': 42, 'name': 'Eve'},
    {'id': 43, 'name': 'Frank The Cat'}]),
  (AgentActionMessageLog(tool='get_user_name', tool_input={'user_id': 21}, log="\nInvoking: `get_user_name` with `{'user_id': 21}`\n\n\n", message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n"user_id": 21\n}', 'name': 'get_user_name'}})]),
   'Bob')]}
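
Because `return_intermediate_steps=True` is set, the result carries a list of `(action, observation)` pairs, and the order of tool calls can be read off it. The sketch below is a minimal illustration of that: `FakeAction` is a hypothetical stand-in for the real action objects (only the `tool` attribute is used), and the sample data mirrors the output above in abbreviated form.

```python
from typing import Any, List, NamedTuple, Tuple


class FakeAction(NamedTuple):
    """Hypothetical stand-in for an agent action object."""

    tool: str
    tool_input: dict


def tool_trajectory(intermediate_steps: List[Tuple[Any, Any]]) -> List[str]:
    """Return the tool names in the order the agent invoked them."""
    return [action.tool for action, _observation in intermediate_steps]


# Abbreviated version of the run shown above.
steps = [
    (FakeAction("find_users_by_name", {"name": "bob"}), [{"id": 21, "name": "Bob"}]),
    (FakeAction("get_user_name", {"user_id": 21}), "Bob"),
]

trajectory = tool_trajectory(steps)
print(trajectory)
```

A list like this can be compared with the reference tool sequence that each dataset example provides.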

## Eval

Now let's evaluate the agent on the dataset.

In [24]:
from langchain_benchmarks.tool_usage import STANDARD_AGENT_EVALUATOR
from langsmith.client import Client

In [25]:
client = Client()

In [None]:
test_run = client.run_on_dataset(
    dataset_name=alpha.name,
    llm_or_chain_factory=agent_factory,
    evaluation=STANDARD_AGENT_EVALUATOR,
    verbose=True,
    tags=["openai-functions"],
)