# Typewriter: 26 Tools

This is a variation of the typewriter task in which the agent has access to 26 parameterless tools.

Each tool represents a letter of the alphabet (e.g., 'a', 'b', 'c').

The agent can use each tool to "print" the corresponding letter on a piece of virtual paper.

The objective for the agent is to "print" the user's input on the paper exactly.

---------

For this code to work, please configure LangSmith environment variables with your credentials.

```python
import os

os.environ["LANGCHAIN_API_KEY"] = "sk-..."  # Your API key.
```

In [1]:
from langchain_benchmarks import clone_public_dataset, registry

In [2]:
task = registry["Tool Usage - Typewriter (26 tools)"]
task

| | |
|---|---|
| Name | Tool Usage - Typewriter (26 tools) |
| Type | ToolUsageTask |
| Dataset ID | 128af05e-aa00-4e3b-a958-d166dd450581 |
| Description | Environment with 26 tools each tool represents a letter of the alphabet. The objective of this task is to evaluate the model's ability the use tools for a simple repetition task. For example, if the string is 'abc', the tools 'a', 'b', and 'c' must be invoked in that order. The dataset includes examples of varying difficulty. The difficulty is measured by the length of the string. This is a variation of the typer writer task, where 26 parameterless tools are given instead of a single tool that takes a letter as an argument. |


Clone the dataset associated with this task.

In [3]:
clone_public_dataset(task.dataset_id, dataset_name=task.name)

Dataset Tool Usage - Typewriter (26 tools) already exists. Skipping.
You can access the dataset at https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/2f462c7a-f9b9-46e7-b96b-7469e965f478.


Let's build an agent that we can use for evaluation.

## The Environment

The environment consists of 26 tools and a virtual paper.

Each tool is responsible for printing a letter on the paper that corresponds to it.

In [4]:
env = task.create_environment()

In [5]:
env.tools[:5]

[StructuredTool(name='a', description='a() -> str - Run to Type the letter "a".', args_schema=<class 'pydantic.v1.main.aSchemaSchema'>, func=<function _create_typing_func.<locals>.func at 0x7f6cd20e6520>),
 StructuredTool(name='b', description='b() -> str - Run to Type the letter "b".', args_schema=<class 'pydantic.v1.main.bSchemaSchema'>, func=<function _create_typing_func.<locals>.func at 0x7f6cd20e65c0>),
 StructuredTool(name='c', description='c() -> str - Run to Type the letter "c".', args_schema=<class 'pydantic.v1.main.cSchemaSchema'>, func=<function _create_typing_func.<locals>.func at 0x7f6cd20e6660>),
 StructuredTool(name='d', description='d() -> str - Run to Type the letter "d".', args_schema=<class 'pydantic.v1.main.dSchemaSchema'>, func=<function _create_typing_func.<locals>.func at 0x7f6cd20e6700>),
 StructuredTool(name='e', description='e() -> str - Run to Type the letter "e".', args_schema=<class 'pydantic.v1.main.eSchemaSchema'>, func=<function _create_typing_func.<loca

In [6]:
env.tools[0].invoke({})

'OK'

In [7]:
env.tools[3].invoke({})

'OK'

In [8]:
env.read_state()

'ad'
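
The behaviour shown above (each tool returns `'OK'` and appends its letter to the paper) can be mimicked with a small toy model. This is only a sketch for building intuition: `ToyTypewriter` and its methods are hypothetical names, not the implementation behind `task.create_environment()`.

```python
import string
from typing import Callable, Dict, List


class ToyTypewriter:
    """Hypothetical stand-in for the environment; not the library's code."""

    def __init__(self) -> None:
        # The "virtual paper" is just the list of letters typed so far.
        self._paper: List[str] = []
        # One parameterless tool per lowercase letter, keyed by that letter.
        self.tools: Dict[str, Callable[[], str]] = {
            letter: self._make_tool(letter) for letter in string.ascii_lowercase
        }

    def _make_tool(self, letter: str) -> Callable[[], str]:
        def tool() -> str:
            # Typing appends the letter and acknowledges with "OK".
            self._paper.append(letter)
            return "OK"

        return tool

    def read_state(self) -> str:
        """Return everything typed so far as a single string."""
        return "".join(self._paper)


toy_env = ToyTypewriter()
toy_env.tools["a"]()
toy_env.tools["d"]()
print(toy_env.read_state())  # prints "ad"
```

Because the tools take no arguments, the only decision the agent makes at each step is which of the 26 tools to call next.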

## Agent Factory

For evaluation, we need an agent factory that will create a new instance of an agent executor for every evaluation run.

We'll use an `OpenAIAgentFactory` provided with LangChain Benchmarks -- look at the `intro` section to see how to define your own.
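
To see why a fresh agent per run matters, here is a minimal sketch with a toy agent whose paper is private state. `ToyAgent` and `ToyAgentFactory` are hypothetical names used for illustration only; they do not reflect how `OpenAIAgentFactory` is implemented.

```python
from typing import Dict, List


class ToyAgent:
    """Hypothetical agent that 'types' the question onto its own paper."""

    def __init__(self) -> None:
        self.paper: List[str] = []

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        for ch in inputs["question"]:
            self.paper.append(ch)
        return {"input": inputs["question"], "state": "".join(self.paper)}


class ToyAgentFactory:
    """Callable that returns a brand-new agent (and paper) on every call."""

    def __call__(self) -> ToyAgent:
        return ToyAgent()


toy_factory = ToyAgentFactory()
first = toy_factory().invoke({"question": "hello"})
second = toy_factory().invoke({"question": "abc"})
print(first["state"], second["state"])  # prints "hello abc"

# Reusing one agent leaks state from the previous run into the next one.
shared = ToyAgent()
shared.invoke({"question": "hello"})
leaked = shared.invoke({"question": "abc"})
print(leaked["state"])  # prints "helloabc"
```

With the factory, each dataset example starts from a blank paper, so one run's result cannot contaminate the next.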

In [9]:
from langchain_benchmarks.tool_usage import agents

agent_factory = agents.OpenAIAgentFactory(task, model="gpt-3.5-turbo-16k")

# Let's test that our agent works
agent = agent_factory()
agent.invoke({"question": "hello"})

{'input': 'hello',
 'output': 'hello\nhello',
 'intermediate_steps': [(AgentActionMessageLog(tool='h', tool_input={}, log='\nInvoking: `h` with `{}`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'name': 'h', 'arguments': ''}})]),
   'OK'),
  (AgentActionMessageLog(tool='e', tool_input={}, log='\nInvoking: `e` with `{}`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'name': 'e', 'arguments': ''}})]),
   'OK'),
  (AgentActionMessageLog(tool='l', tool_input={}, log='\nInvoking: `l` with `{}`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'name': 'l', 'arguments': ''}})]),
   'OK'),
  (AgentActionMessageLog(tool='l', tool_input={}, log='\nInvoking: `l` with `{}`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'name': 'l', 'arguments': ''}})]),
   'OK'),
  (AgentActionMessageLog(tool='o', tool_input={}, log='\nInvoking: `o` with `{}`\n\n\n', message_log=[AIMess

## Eval

Let's evaluate an agent now.

The evaluation code below has not been run yet.

In [None]:
import uuid

from langsmith.client import Client

from langchain_benchmarks.tool_usage import get_eval_config

experiment_uuid = uuid.uuid4().hex[:4]

client = Client()

models = ["gpt-3.5-turbo-16k"]

for model in models:
    print()
    # The eval config evaluates the final state but not the output, which is meaningless for this task.
    eval_config = get_eval_config(output_evaluation="none")
    agent_factory = agents.OpenAIAgentFactory(task, model=model)
    test_run = client.run_on_dataset(
        dataset_name=task.name,
        llm_or_chain_factory=agent_factory,
        evaluation=eval_config,
        verbose=False,
        concurrency_level=1,
        project_name=f"typewriter-26-{model}-{experiment_uuid}",
        tags=[model],
        project_metadata={
            "model": model,
            "arch": "openai-functions-agent",
            "id": experiment_uuid,
        },
    )
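
The comment in the loop notes that the final state is scored rather than the agent's text output. A minimal sketch of that idea, assuming an exact string comparison against the expected text; `state_matches` is a hypothetical helper, not the evaluator that `get_eval_config` builds.

```python
def state_matches(final_state: str, expected: str) -> bool:
    """Return True only if the paper holds exactly the expected text."""
    return final_state == expected


# The paper after a correct run on the question "hello".
print(state_matches("hello", "hello"))  # prints True

# The agent's text output in the test run above was 'hello\nhello',
# which would not match even though the typing itself was correct.
print(state_matches("hello\nhello", "hello"))  # prints False
```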