## Artefact Case - Junior AI Engineer
Author: Mariama Oliveira

**Case Objective**

Build an AI assistant that decides whether to answer a user question directly or call an external tool. If the question requires an exact arithmetic calculation, the assistant must use a calculator tool. Otherwise, it must respond using the language model.

### Decided Approach

I implemented the solution using LangChain because it simplifies building agents and integrating tool-calling with different LLM providers.

For the language model, I opted for OpenAI GPT-4 (via the OpenAI API), since it provides reliable reasoning and supports the tool-calling pattern used in this project.

**Agent Architecture**

I used a single agent responsible for:

1. receiving the user question,

2. deciding whether it requires an exact arithmetic calculation, and

3. either calling the calculator tool (for math) or answering directly (for non-math).

* *A multi-agent design was also considered (e.g., one agent for classification and two other specialized agents for answering), but it would add unnecessary complexity to a small, straightforward task.*

### Importing tools and prompts

I decided to store the tools and the agent prompt in separate files rather than embedding them in the notebook. While it would work either way, separating these components improves organization and maintainability, and makes it easier to reuse or modify prompts/tools as the project grows.

* `prompts`: The agent prompt, which explicitly encodes the routing rule from the case: if the user request requires an exact arithmetic calculation, the agent must call the calculator tool; otherwise, it answers directly. *The prompt was generated by ChatGPT, given a description of the agent's expected behavior and goal for the task.*

* `calculator` function: To keep the implementation simple, I used Python’s `eval()` to compute arithmetic expressions. Because `eval()` executes arbitrary Python code, a more robust and restricted implementation would be recommended for production.

* `get_exchanged_amount` function: This is another function that can be used as a tool by the agent. Unlike the local calculator, this tool fetches the current exchange rate between USD and BRL from an external API. It was included as a bonus to demonstrate how the assistant can integrate additional external tools.
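As an illustration of what a more restricted calculator could look like, here is a minimal sketch that parses the expression with Python's `ast` module and only evaluates whitelisted arithmetic operators. The name `safe_calculator` is hypothetical and this is not the code in `tools/calculator.py`; it is one possible direction, not a complete solution.

```python
import ast
import operator

# Whitelisted operators: anything not listed here is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculator(expression: str) -> float:
    """Evaluate a purely arithmetic expression without calling eval()."""

    def _evaluate(node: ast.AST) -> float:
        # Plain numbers (booleans are excluded on purpose)
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        # Binary operations such as 12 * 46
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            left = _evaluate(node.left)
            right = _evaluate(node.right)
            return _BINARY_OPS[type(node.op)](left, right)
        # Unary plus and minus such as -3
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
        # Names, calls, attribute access, etc. are all refused
        raise ValueError(f"Unsupported expression element: {ast.dump(node)}")

    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


print(safe_calculator("(12*46)+10"))  # 562
```

Note that this sketch rejects every function call, so an expression such as `pow(27, 1/3)` would need to be rewritten with `**` or handled by an explicit whitelist of functions. It also does not guard against very large exponents, which a production version should limit.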

In [1]:
from prompts.prompts import PROMPT_AGENT, PROMPT_AGENT_BONUS
from tools.calculator import calculator
from tools.exchange_rate import get_exchanged_amount

### Create and set up the agent

In [None]:
# Making sure the .env file is loaded
from dotenv import load_dotenv
load_dotenv()
# Importing auxiliary and langchain modules
from dataclasses import dataclass
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langchain.agents.structured_output import ToolStrategy

##### Initialize model

Here I set up the configuration for the selected model. I set **temperature = 0** to reduce randomness and keep responses as deterministic as possible, since this task does not require creativity. I set **max_tokens = 500** to cap the maximum length of the generated answer, which should be sufficient for the expected question sizes in this notebook.

In [3]:
# Initialize the chat model
model = init_chat_model(
    "gpt-4",
    temperature=0,
    max_tokens=500
)

##### Define structured response

To make the assistant’s output easier to understand and easier to process programmatically, I defined a structured response format. This enables reliable parsing and future extensions (e.g., saving results, displaying them in a UI, or adding more tools).

In [4]:
@dataclass
class ResponseFormat:
    """Response schema for the agent."""
    # Type of question: 'MATH' or 'NON-MATH'
    question_type: str
    # Question answer
    answer: str
    # Optional mathematical expression used to compute the answer (required for calculation questions)
    expression: str | None = None
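To illustrate the "saving results" extension mentioned above, the sketch below converts a structured response into JSON with `dataclasses.asdict`. The helper `response_to_json` is hypothetical, and the schema is repeated (with `Optional[str]`, equivalent to `str | None`) only so the snippet runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ResponseFormat:
    """Copy of the schema above, repeated so this snippet is self-contained."""
    question_type: str
    answer: str
    expression: Optional[str] = None


def response_to_json(response: ResponseFormat) -> str:
    """Serialize a structured response to a JSON string (hypothetical helper)."""
    # ensure_ascii=False keeps accented characters readable in the output
    return json.dumps(asdict(response), ensure_ascii=False)


example = ResponseFormat(question_type="MATH", answer="562", expression="(12*46)+10")
print(response_to_json(example))
```

The resulting string could be written to a file, stored in a database, or returned by an API endpoint without any extra parsing logic.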

#### Create agent

With the variables defined above, we can now create the agent/assistant responsible for executing the task.

In [5]:
# Create the agent with the model, prompt, and tools
agent = create_agent(
    model=model,
    system_prompt=PROMPT_AGENT,
    tools=[calculator],
    response_format=ToolStrategy(ResponseFormat),
)

#### Asking and answering questions

In [6]:
# Helper function to avoid repeating the invocation code for each question.

def answer_question(question: str, agent=agent) -> dict:
    """Function to answer a question using the agent."""
    response = agent.invoke(
        {"messages": [{"role": "user", "content": question}]}
    )
    return response

Example 1

In [7]:
question_1 = "Quem foi Albert Einstein?"
answer_1 = answer_question(question_1)
print(answer_1['structured_response'])


ResponseFormat(question_type='NON-MATH', answer="Albert Einstein foi um físico teórico alemão. Ele é mais conhecido por desenvolver a teoria da relatividade, que revolucionou o conceito de física teórica. Seu trabalho mais famoso é a fórmula de equivalência massa-energia, E=mc^2, que tem sido chamada de 'a equação mais famosa do mundo'.", expression=None)


Example 2

In [8]:
question_2 = "Quanto é 12 vezes 46 mais 10?"
answer_2 = answer_question(question_2)
print(answer_2['structured_response'])

ResponseFormat(question_type='MATH', answer='12 vezes 46 mais 10 é igual a 562.', expression='(12*46)+10')


⚠️ The list of messages included in the response, shown below, confirms that the `calculator` tool was called with the expression `'(12*46)+10'` as its parameter.

In [9]:
answer_2

{'messages': [HumanMessage(content='Quanto é 12 vezes 46 mais 10?', additional_kwargs={}, response_metadata={}, id='8646ccf5-0a2b-4833-a064-ca72b5afcf73'),
  AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 660, 'total_tokens': 677, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4-0613', 'system_fingerprint': None, 'id': 'chatcmpl-D0Ij3DzkCHVFFNqK66mFXGWigapsc', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='lc_run--019bde7d-eb85-7650-a8a2-611e7b6b3c9b-0', tool_calls=[{'name': 'calculator', 'args': {'expression': '(12*46)+10'}, 'id': 'call_tgr2LhAjao6HIPX5gS5NxvTI', 'type': 'tool_call'}], invalid_tool_calls=[], usage_metadata={'input_tokens': 660, 'output_tokens': 17, 't
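
Rather than reading the raw output, the tool calls can be extracted programmatically. The sketch below assumes only what the output above shows: the response has a `messages` list, and AI messages expose a `tool_calls` list of dictionaries with `name` and `args` keys. The helper `list_tool_calls` is hypothetical, and the stand-in objects merely mimic that shape so the snippet runs without calling the model.

```python
from types import SimpleNamespace


def list_tool_calls(response: dict) -> list:
    """Return (tool name, arguments) pairs found in the response messages."""
    calls = []
    for message in response.get("messages", []):
        # Messages without a tool_calls attribute (e.g. user messages) are skipped
        for call in getattr(message, "tool_calls", None) or []:
            calls.append((call["name"], call["args"]))
    return calls


# Stand-in objects that mimic the shape of the output above (not real LangChain messages)
fake_response = {
    "messages": [
        SimpleNamespace(content="Quanto é 12 vezes 46 mais 10?"),
        SimpleNamespace(
            content="",
            tool_calls=[{"name": "calculator", "args": {"expression": "(12*46)+10"}}],
        ),
    ]
}
print(list_tool_calls(fake_response))
# [('calculator', {'expression': '(12*46)+10'})]
```

With the real agent, the same helper could be applied directly to `answer_2`.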

Example 3

In [10]:
question_3 = "What is the cube root of twenty seven?"
answer_3 = answer_question(question_3)
print(answer_3['structured_response'])

ResponseFormat(question_type='MATH', answer='The cube root of twenty seven is 3.', expression='pow(27, 1/3)')


Example 4

In [11]:
question_4 = "What is the capital of France?"
answer_4 = answer_question(question_4)
print(answer_4['structured_response'])

ResponseFormat(question_type='NON-MATH', answer='The capital of France is Paris.', expression=None)


**BONUS EXAMPLE**

In this bonus example, I added a call to a currency exchange API that converts the numeric result from USD to Brazilian Real (BRL). The answer shows the calculated value in USD together with the converted value in BRL. The example is included to demonstrate how the assistant can integrate and use external tools beyond the local calculator.

In [12]:
# Create the agent with the model, prompt, and tools
agent_bonus = create_agent(
    model=model,
    system_prompt=PROMPT_AGENT_BONUS,
    tools=[calculator, get_exchanged_amount],
    response_format=ToolStrategy(ResponseFormat),
)

In [14]:
question_bonus = "How much is ten divided by 2?"
answer_bonus = answer_question(question_bonus, agent=agent_bonus)
print(answer_bonus['structured_response'])

ResponseFormat(question_type='MATH', answer='5.0 USD ≈ R$ 26.95', expression='10/2')


### Final Thoughts and Limitations

1. LangChain, the chosen framework, supports many configurations that can be selected according to the project’s needs. Because this is the MVP of an assistant, I kept the configuration minimal.

2. Even though the assistant works, it has some limitations. The main one is the ambiguity of mathematical expressions written in natural language. For example, *“How much is the cube root of four multiplied by two?”* could be parsed as `(4 ** (1/3)) * 2` or as `(4 * 2) ** (1/3)`, depending on how the grouping is interpreted.

3. The main focus of the implementation was tool calling (function invocation); therefore, the calculator implementation is limited and would need improvement before this solution could go to production.

4. Before putting this kind of solution into production, it would also be important to monitor token usage and latency under different configurations in order to choose the best approach.
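
To make the ambiguity described in item 2 concrete, the short sketch below evaluates both readings of the example question. The variable names are illustrative only.

```python
# Two readings of "the cube root of four multiplied by two"
reading_a = (4 ** (1 / 3)) * 2  # cube root first, then multiply
reading_b = (4 * 2) ** (1 / 3)  # multiply first, then cube root

print(f"(4 ** (1/3)) * 2 = {reading_a:.4f}")  # 3.1748
print(f"(4 * 2) ** (1/3) = {reading_b:.4f}")  # 2.0000
```

The two interpretations differ by more than 50%, so the expression the agent passes to the calculator should always be returned alongside the answer, as the `expression` field of the structured response already does.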

