Authored by: Aryan Mistry

# Agents 101: Reasoning and Acting

An **agent** coordinates reasoning and actions to answer complex queries. Instead of simply prompting a language model, we can build a loop where the model plans what to do, calls tools, looks at results, reflects, and continues until it's ready to answer. This notebook builds a toy ReAct (Reasoning and Acting) agent with a calculator and weather tool to illustrate the core ideas. [18]

## 1. Recap: Tools

We'll reuse the `calculator` and `weather_service` functions defined earlier.
If you're starting fresh, run the cell below to define them along with the
`safe_eval` helper.


In [None]:

import re
import ast

def safe_eval(expr: str) -> float:
    """Safely evaluate an arithmetic expression using Python's AST.

    Only +, -, *, / and parentheses are allowed. Returns the numeric result.
    """
    parsed = ast.parse(expr, mode='eval')
    def eval_node(node):
        if isinstance(node, ast.Expression):
            return eval_node(node.body)
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            if isinstance(node.op, ast.Add):
                return left + right
            elif isinstance(node.op, ast.Sub):
                return left - right
            elif isinstance(node.op, ast.Mult):
                return left * right
            elif isinstance(node.op, ast.Div):
                return left / right
            else:
                raise ValueError("Unsupported operator")
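        elif isinstance(node, ast.UnaryOp):
            operand = eval_node(node.operand)
            if isinstance(node.op, ast.UAdd):
                return operand
            elif isinstance(node.op, ast.USub):
                return -operand
            else:
                # Reject operators such as `not` or `~` instead of negating silently.
                raise ValueError("Unsupported unary operator")
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value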
        else:
            raise ValueError("Unsupported expression component")
    return eval_node(parsed)


def calculator(expression: str) -> dict:
    if not re.fullmatch(r'[0-9+\-*/(). ]+', expression):
        return {'error': 'Invalid characters in expression'}
    try:
        result = safe_eval(expression)
        return {'result': result}
    except Exception as e:
        return {'error': str(e)}


def weather_service(city: str) -> dict:
    database = {'London': 18.5, 'New York': 22.0, 'Paris': 20.0, 'Tokyo': 25.0}
    city_norm = city.title()
    if city_norm not in database:
        return {'error': f"I don't have weather data for {city_norm}."}
    return {'city': city_norm, 'temperature': database[city_norm]}


## 2. Planning: Which Tools Do I Need?

The planning step analyses the question and decides which tool(s) to call. In
production this is typically done by the language model itself, using
instructions and few-shot examples. Here we'll implement a simple heuristic:

- If the query contains arithmetic operators or keywords like "plus", call the
  calculator.
- If the query mentions weather or temperature, call the weather service.
- Otherwise, the agent will say it doesn't know how to answer.


In [None]:

from typing import List

def plan_tools(question: str) -> List[str]:
    """Given a question, decide which tools are needed.

    This naive planner looks for arithmetic operators to choose the calculator and keywords like 'weather' for the weather service.
    """
    q = question.lower()
    tools_needed = []
    if any(op in q for op in ['+', '-', '*', '/', 'plus', 'minus', 'times', 'divided']):
        tools_needed.append('calculator')
    if 'weather' in q or 'temperature' in q:
        tools_needed.append('weather')
    return tools_needed

# Test the planner
tests = ['What is 2 + 2?', 'What is the weather in Paris?', 'Add 3 and the temperature in London']
for t in tests:
    print(f"Question: {t} -> Plan: {plan_tools(t)}")


Question: What is 2 + 2? -> Plan: ['calculator']
Question: What is the weather in Paris? -> Plan: ['weather']
Question: Add 3 and the temperature in London -> Plan: ['weather']

Note that the third plan omits the calculator: the word "Add" is not among the planner's keywords, so only the weather tool is selected.
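
For comparison, here is a rough sketch of how the same planning step might look when delegated to a language model. Everything in it is hypothetical: `build_planning_prompt`, `parse_plan`, and `plan_with_model` are illustrative names, the prompt wording is an assumption, and `fake_model` merely stands in for a real LLM call.

```python
from typing import Callable, List

# Tool names mirror the planner above; the descriptions are illustrative.
TOOL_DESCRIPTIONS = {
    'calculator': 'evaluates arithmetic expressions',
    'weather': 'returns the temperature for a city',
}

# Few-shot examples shown to the model before the real question.
FEW_SHOT_EXAMPLES = [
    ('What is 2 + 2?', 'calculator'),
    ('What is the weather in Paris?', 'weather'),
]

def build_planning_prompt(question: str) -> str:
    """Assemble instructions, tool descriptions, and examples into one prompt."""
    lines = ['Choose the tools needed to answer the question.', 'Available tools:']
    for name, description in TOOL_DESCRIPTIONS.items():
        lines.append(f'- {name}: {description}')
    for example_question, tools in FEW_SHOT_EXAMPLES:
        lines.append(f'Question: {example_question}')
        lines.append(f'Tools: {tools}')
    lines.append(f'Question: {question}')
    lines.append('Tools:')
    return '\n'.join(lines)

def parse_plan(reply: str) -> List[str]:
    """Keep only known tool names from a comma-separated model reply."""
    names = [name.strip().lower() for name in reply.split(',')]
    return [name for name in names if name in TOOL_DESCRIPTIONS]

def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; always returns the same canned reply.
    return 'calculator, weather'

def plan_with_model(question: str, model: Callable[[str], str]) -> List[str]:
    return parse_plan(model(build_planning_prompt(question)))

print(plan_with_model('Add 3 and the temperature in London', fake_model))
# ['calculator', 'weather']
```

Filtering the reply against the known tool names means a model that invents a tool cannot push an unknown name into the plan.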


## 3. Acting: Calling Tools

Once the agent has decided which tools are needed, it must call them with
properly formatted inputs. We'll write a helper function that takes a tool
name and the question, extracts arguments, and returns the tool's result.

In the ReAct pattern, the agent might call multiple tools in sequence,
interleaving calls with reasoning steps. For our simple agent we'll call each
needed tool once and combine the results into the final answer.
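
In an LLM-driven agent, each step of that sequence is usually emitted as text that the surrounding code has to parse. The sketch below assumes a `Thought:` / `Action: tool[input]` layout similar to the one used in the ReAct paper [18]; the exact format, the regular expression, and the name `parse_step` are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Matches a line such as "Action: weather[Paris]" anywhere in the model output.
ACTION_PATTERN = re.compile(r'^Action:\s*(\w+)\[(.*)\]\s*$', re.MULTILINE)

def parse_step(model_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool name, tool input) if the output contains an action line."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

step = """Thought: I need the temperature in Paris first.
Action: weather[Paris]"""
print(parse_step(step))                                   # ('weather', 'Paris')
print(parse_step('Thought: I already know the answer.'))  # None
```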


In [None]:

def call_tool(tool_name: str, question: str) -> dict:
    """Call a tool with the appropriate inputs extracted from the question.

    Returns a dict representing the tool's output.
    """
    if tool_name == 'calculator':
        # Keep only characters that can appear in an arithmetic expression.
        # Strip the result: ast.parse rejects leading whitespace as an
        # "unexpected indent".
        expr = ''.join(ch for ch in question if ch.isdigit() or ch in '+-*/(). ').strip()
        return calculator(expr)
    if tool_name == 'weather':
        # Naive extraction: treat the last word, minus trailing punctuation, as the city.
        city = question.split()[-1].strip('?.!,')
        return weather_service(city)
    return {'error': f"Unknown tool '{tool_name}'"}

# Test calling tools directly
print(call_tool('calculator', 'Compute 3 + 4 * 2'))
print(call_tool('weather', 'Tell me the weather in Tokyo'))


{'result': 11}
{'city': 'Tokyo', 'temperature': 25.0}


## 4. Building the ReAct Loop

Putting it all together, the agent will:

1. **Think** about the question and decide which tools are needed (`plan_tools`).
2. **Act** by calling the tools with appropriate inputs (`call_tool`).
3. **Observe** the results returned by each tool.
4. **Conclude** by composing the final answer.

The simple agent below prints each step of its reasoning. In a real system,
the language model would generate these thoughts itself and decide after each
observation whether to call another tool or stop. [18]


In [None]:

def run_agent(question: str) -> str:
    """Execute the full ReAct loop for a question.

    The agent first plans which tools to use, calls them, observes the results, and returns a formatted answer.
    """
    print(f"Question: {question}")
    tools_needed = plan_tools(question)
    if not tools_needed:
        return "I'm not sure how to answer that."
    observations = []
    for tool in tools_needed:
        print(f"Planning to use tool: {tool}")
        result = call_tool(tool, question)
        print(f"Observation from {tool}: {result}")
        observations.append(result)
    answers = []
    for obs in observations:
        if 'result' in obs:
            answers.append(str(obs['result']))
        elif 'temperature' in obs:
            answers.append(f"{obs['temperature']}°C in {obs['city']}")
        else:
            answers.append(obs.get('error', ''))
    return ' and '.join(answers)

# Test the agent
questions = [
    'What is 3 + 5?',
    'What is the weather in Paris?',
    'Add 2 and the temperature in London',
    'How old is the universe?'
]
for q in questions:
    print("--")
    print(run_agent(q))


--
Question: What is 3 + 5?
Planning to use tool: calculator
Observation from calculator: {'result': 8}
8
--
Question: What is the weather in Paris?
Planning to use tool: weather
Observation from weather: {'city': 'Paris', 'temperature': 20.0}
20.0°C in Paris
--
Question: Add 2 and the temperature in London
Planning to use tool: weather
Observation from weather: {'city': 'London', 'temperature': 18.5}
18.5°C in London
--
Question: How old is the universe?
I'm not sure how to answer that.
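
The agent above plans once and then calls each tool a single time. As a rough sketch of the iterative version, the loop below asks a policy for the next action after every observation and stops when the policy returns a `finish` action. `scripted_policy` is a hypothetical stand-in for a language model, the `add` tool and the `finish` convention are assumptions, and the tools are simplified stubs so that the cell runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (tool name, tool input)

# Simplified stand-ins for the notebook's tools, so this cell is self-contained.
TOOLS: Dict[str, Callable[[str], str]] = {
    'weather': lambda city: {'Paris': '20.0', 'London': '18.5'}.get(city, 'unknown'),
    'add': lambda args: str(sum(float(x) for x in args.split(','))),
}

def scripted_policy(question: str, history: List[Tuple[Action, str]]) -> Action:
    """Hypothetical stand-in for an LLM: picks the next action from the history."""
    if not history:
        return ('weather', 'Paris')
    if len(history) == 1:
        temperature = history[0][1]
        return ('add', f'2,{temperature}')
    return ('finish', history[-1][1])

def react_loop(question: str, policy, max_steps: int = 5) -> str:
    history: List[Tuple[Action, str]] = []
    for _ in range(max_steps):
        tool, tool_input = policy(question, history)       # Think
        if tool == 'finish':
            return tool_input                               # Conclude
        observation = TOOLS[tool](tool_input)               # Act
        history.append(((tool, tool_input), observation))   # Observe
    return 'Stopped: step limit reached.'

print(react_loop('Add 2 to the temperature in Paris', scripted_policy))
# 22.0
```

The `max_steps` cap is a common safeguard: without it, a policy that never emits `finish` would loop forever.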


## 5. Reflection and Limitations

Our toy agent is entirely rule-based and doesn't truly reason. Real ReAct
agents rely on the language model to produce thoughts and decide when to
invoke tools. They may also reflect on intermediate results: for example,
checking whether a calculation seems plausible before continuing.

Consider how you might add a reflection step that verifies whether the
calculator's output makes sense (e.g. checking for division by zero) or
whether the weather result is within a realistic range. [18]
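
One possible shape for such a check is sketched below. The function name `reflect`, the reliance on the dict format returned by the tools in this notebook, and the rough plausibility bounds of -90 to 60 °C are all assumptions chosen for illustration.

```python
import math
from typing import List

def reflect(observation: dict) -> List[str]:
    """Return a list of concerns about a tool observation (empty if none)."""
    concerns = []
    if 'error' in observation:
        # Covers failures such as division by zero reported by the calculator.
        concerns.append(f"Tool reported an error: {observation['error']}")
    result = observation.get('result')
    if isinstance(result, float) and not math.isfinite(result):
        concerns.append('Calculator result is not a finite number.')
    temperature = observation.get('temperature')
    if temperature is not None and not -90 <= temperature <= 60:
        concerns.append(f'Temperature {temperature} is outside the plausible range.')
    return concerns

print(reflect({'result': 8}))
# []
print(reflect({'error': 'division by zero'}))
# ['Tool reported an error: division by zero']
print(reflect({'city': 'Paris', 'temperature': 200.0}))
# ['Temperature 200.0 is outside the plausible range.']
```

An agent could call a check like this after each observation and retry, or report the concern to the user, whenever the returned list is non-empty.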


## 6. Exercises

1. **Add a new tool.** Implement a `dictionary_service(word)` that returns a
   definition from a small built-in dictionary. Update `plan_tools` and
   `call_tool` to support questions like "Define photosynthesis".
2. **Improve planning.** Use regular expressions to parse mathematical
   expressions more robustly and to detect city names that consist of more
   than one word (e.g. "New York").
3. **Stateful reasoning.** Modify `run_agent` so that it can perform multiple
   sequential tool calls. For example, to answer "Multiply the result of 2+3
   by the temperature in Paris", it should first compute 2+3, then call the
   weather tool, then multiply the two results. We will cover stateful
   reasoning in more depth in the next notebook.
4. **Memory.** Store observations in a history list and print the full
   sequence of steps after answering. How might you expose that history to
   a user?


## References

Foundational LLMs & Transformers

1. Vaswani, A., et al. (2017). Attention is All You Need. Advances in Neural Information Processing Systems (NIPS 2017).
2. Brown, T. B., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020.
3. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019.
4. OpenAI (2023). GPT-4 Technical Report. arXiv:2303.08774.
5. Touvron, H., et al. (2023). LLaMA 2: Open Foundation and Fine-Tuned Chat Models. Meta AI.

Generative AI & Sampling

6. Goodfellow, I., et al. (2014). Generative Adversarial Nets. NeurIPS 2014.
7. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
8. Neal, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, University of Toronto.

Retrieval-Augmented Generation (RAG) & Knowledge Grounding

9. Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP. NeurIPS 2020.
10. deepset ai (2023). Haystack: Open-Source Framework for Search and RAG Applications. https://haystack.deepset.ai
11. LangChain (2023). LangChain Documentation and Cookbook. https://python.langchain.com

Evaluation & Safety

12. Papineni, K., et al. (2002). BLEU: A Method for Automatic Evaluation of Machine Translation. ACL 2002.
13. Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. ACL Workshop 2004.
14. OpenAI (2024). Evaluating Model Outputs: Faithfulness and Grounding. OpenAI Docs.
15. Guardrails AI (2024). Open-Source Guardrails Framework. https://github.com/shreyar/guardrails

Prompt Engineering & Instruction Tuning

16. White, J. (2023). The Prompting Guide. https://www.promptingguide.ai
17. Ouyang, L., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS 2022.

Agents & Tool Use

18. Yao, S., et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
19. LangChain (2024). LangChain Agents and Tools Documentation.
20. Microsoft (2023). Semantic Kernel Developer Guide. https://learn.microsoft.com/en-us/semantic-kernel/
21. Google DeepMind (2024). Gemini Technical Report. arXiv:2312.11805.

State, Memory & Orchestration

22. LangGraph (2024). Stateful Agent Orchestration Framework. https://langchain-langgraph.vercel.app
23. Park, J. S., et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442.

Pedagogical and Course Design References

24. fast.ai (2023). fast.ai Deep Learning Course Notebooks. https://course.fast.ai
25. Ng, A. (2023). DeepLearning.AI Short Courses on Generative AI.
26. MIT 6.S191, Stanford CS324, UC Berkeley CS294-158. (2022–2024). Course Materials and Public Notebooks for ML and LLMs.