<img align="right" width="400" src="https://drive.google.com/thumbnail?id=1rPeHEqFWHJcauZlU82a4hXM10TUjmHxM&sz=s4000" alt="FHNW Logo">


# Build an Agent

by Fabian Märki

## Summary
The aim of this notebook is to show how to build an agent, i.e. how to combine an LLM with tools to create a system that can reason about tasks, decide which tools to use and iteratively work towards a solution.

In its most fundamental form, an agent consists of three core components:
- **Model:** The LLM powering the agent’s reasoning and decision-making
- **Tools:** External functions or APIs the agent can use to take action
- **Instructions:** Explicit guidelines and guardrails defining how the agent behaves
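The following framework-free sketch shows how the three components interact. It is a minimal sketch, not how LangChain implements agents. It assumes a scripted stand-in for the LLM (`scripted_model`) and a hypothetical `count_words` tool, and all names are illustrative. On each step the model either requests tool calls or returns a final answer. The loop runs any requested tools, appends their results to the message history and asks the model again. `max_steps` acts as a simple guardrail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ModelReply:
    content: str = ""                            # final answer text (empty while requesting tools)
    tool_calls: list = field(default_factory=list)


@dataclass
class ToyAgent:
    model: Callable[[list], ModelReply]          # the "LLM": maps a message history to a reply
    tools: dict                                  # tool name -> Python callable
    instructions: str                            # system prompt with guidelines and guardrails
    max_steps: int = 5                           # guardrail against endless tool loops

    def run(self, user_input: str) -> list:
        messages = [("system", self.instructions), ("user", user_input)]
        for _ in range(self.max_steps):
            reply = self.model(messages)
            if not reply.tool_calls:             # no tool requested: the reply is the final answer
                messages.append(("ai", reply.content))
                return messages
            for call in reply.tool_calls:        # run each requested tool and feed back its result
                result = self.tools[call.name](**call.args)
                messages.append(("tool", str(result)))
        raise RuntimeError("agent did not finish within max_steps")


# Hypothetical stand-in for an LLM: first requests a tool, then answers from the tool result.
def scripted_model(messages: list) -> ModelReply:
    last_role, last_text = messages[-1]
    if last_role == "tool":
        return ModelReply(content=f"The word count is {last_text}.")
    return ModelReply(tool_calls=[ToolCall("count_words", {"text": "agents use tools"})])


toy_agent = ToyAgent(
    model=scripted_model,
    tools={"count_words": lambda text: len(text.split())},
    instructions="Answer concisely. Use tools when you need data.",
)
history = toy_agent.run("How many words are in 'agents use tools'?")
print(history[-1])  # ('ai', 'The word count is 3.')
```

Real implementations also record the model's tool-call message in the history and let the LLM pick tools based on their schemas. The sketch skips both to keep the control flow visible.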

Tools extend your agent's capabilities by using APIs from underlying applications or systems. Broadly speaking, agents need three types of tools:
- **Data:** Enable agents to retrieve context and information necessary for executing the workflow (e.g. query databases, read PDF documents, or search the web).
- **Action:** Enable agents to interact with systems to take actions such as adding new information to databases, updating records, or sending messages (e.g. send emails and texts, hand off a customer service ticket to a human).
- **Orchestration:** Agents themselves can serve as tools for other agents (e.g. a research agent that searches the web, a writing agent that summarizes, a refund agent, or a translation agent).
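In the orchestration pattern, one agent is given a tool whose implementation is another agent. The sketch below shows only this wiring and is not a LangChain API. `translation_agent` is hypothetical: it is a lookup table standing in for a full sub-agent. `as_tool` is also a hypothetical helper.

```python
from typing import Callable


def translation_agent(text: str, target_language: str) -> str:
    """Hypothetical sub-agent; a real one would have its own model, tools and instructions."""
    glossary = {("thank you", "German"): "Danke", ("invoice", "German"): "Rechnung"}
    return glossary.get((text.lower(), target_language), f"[no translation for {text!r}]")


def as_tool(agent: Callable[..., str], name: str, description: str) -> dict:
    """Hypothetical helper: wraps an agent so that another agent can call it like any tool."""
    return {"name": name, "description": description, "func": agent}


# The orchestrating (manager) agent only sees the sub-agent through its tool entry:
# a name and a description to decide *when* to call it, and a callable to run it.
manager_tools = {
    entry["name"]: entry
    for entry in [as_tool(translation_agent, "translate", "Translate text into a target language.")]
}

print(manager_tools["translate"]["func"]("Thank you", "German"))  # Danke
```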

High-quality instructions are essential for any LLM-powered app, but especially critical for agents. Clear instructions reduce ambiguity and improve agent decision-making, resulting in smoother workflow execution and fewer errors.
- **Prompt agents to break down tasks:** Breaking dense resources into smaller, clearer steps minimizes ambiguity and helps the model follow instructions.
- **Define clear actions:** Make sure every step in your routine corresponds to a specific action or output. For example, a step might instruct the agent to ask the user for their order number or to call an API to retrieve account details. Being explicit about the action (and even the wording of a user-facing message) leaves less room for errors in interpretation.
- **Capture (anticipate) edge cases:** Real-world interactions often create decision points such as how to proceed when a user provides incomplete information or asks an unexpected question. A robust routine anticipates common variations and includes instructions on how to handle them with conditional steps or branches such as an alternative step if a required piece of info is missing.
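These guidelines can be applied fairly mechanically when writing a system prompt. The sketch below assumes a hypothetical refund routine with made-up tool names (`lookup_order`, `issue_refund`). Each step names one concrete action, and the edge cases get explicit branches.

```python
# Hypothetical refund routine: each step names one concrete action.
ROUTINE = [
    "Ask the user for their order number.",
    "Call the `lookup_order` tool with the order number to retrieve the order details.",
    "If the order is older than 30 days, tell the user: 'Sorry, this order is no longer eligible for a refund.'",
    "Otherwise, call the `issue_refund` tool and confirm the refund to the user.",
]
# Anticipated edge cases, each with an alternative action.
EDGE_CASES = [
    "If the user cannot provide an order number, ask for the email address used for the purchase instead.",
    "If the user asks about something unrelated to refunds, hand the conversation off to a human agent.",
]


def build_instructions(role: str, steps: list, edge_cases: list) -> str:
    """Turns a routine into a numbered system prompt with explicit edge-case handling."""
    lines = [role, "", "Follow these steps in order:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "Edge cases:"]
    lines += [f"- {case}" for case in edge_cases]
    return "\n".join(lines)


system_prompt = build_instructions("You are a customer service agent handling refunds.", ROUTINE, EDGE_CASES)
print(system_prompt)
```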


## Links
- [LangChain Agents](https://docs.langchain.com/oss/python/langchain/agents)
- [LangChain Tools](https://docs.langchain.com/oss/python/langchain/tools): Components that agents call to perform actions
- [A Practical Guide to Building Agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) by OpenAI

## Next steps

- Explore predefined [Tools](https://docs.langchain.com/oss/javascript/integrations/tools)
- Learn about the [Model Context Protocol](https://de.wikipedia.org/wiki/Model_Context_Protocol) (MCP)
- Investigate how to build a [custom MCP server](https://docs.langchain.com/oss/python/langchain/mcp#custom-mcp-servers) where you can provide your own tools

This notebook contains assignments: <font color='red'>Questions are written in red.</font>

<a href="https://colab.research.google.com/github/markif/NLP_LAB_CAS/blob/master/Build_an_Agent.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
#OLLAMA_HOST='https://XYZ.trycloudflare.com'
OLLAMA_HOST='http://localhost:11434'
OPENAI_BASE_URL=OLLAMA_HOST+"/v1/"

In [2]:
%%capture

!pip install 'fhnw-nlp-utils>=0.11.0,<0.12.0'

from fhnw.nlp.utils.storage import download
from fhnw.nlp.utils.transformers import get_compute_device

import pandas as pd
import numpy as np

In [3]:
%%capture

# ensure we use latest langchain (needed for google colab)
!pip uninstall -y langchain
!pip uninstall -y langchain-classic
!pip uninstall -y langchain-community
!pip uninstall -y langchain_core
!pip uninstall -y langchain_openai

In [4]:
%%capture

!pip install pdfplumber

!pip install langchain
!pip install langchain-classic
!pip install langchain-community
!pip install langchain_core
!pip install langchain_openai

Enable verbose/debug to see detailed output (see [here](https://python.langchain.com/docs/how_to/debugging/#set_debug-and-set_verbose)).

In [5]:
from langchain_classic.globals import set_verbose
from langchain_classic.globals import set_debug

set_verbose(False)
set_debug(False)

In [6]:
pdf_folder = "data/pdfs"

invoice_01 = pdf_folder + "/file_01.pdf"
invoice_02 = pdf_folder + "/file_02.pdf"

download("https://drive.switch.ch/index.php/s/TgZD1UtpTtSyP4h/download", invoice_01)
download("https://drive.switch.ch/index.php/s/9Hx5HclCxbS7Lzz/download", invoice_02)

<font color='red'>**TASK: Provide code that can load PDF content (e.g. use [PDFPlumber](https://docs.langchain.com/oss/python/integrations/document_loaders/pdfplumber) or see [here](https://docs.langchain.com/oss/python/integrations/document_loaders#pdfs) for alternatives provided by LangChain that work out of the box).**</font>

In [7]:
# from langchain_community.document_loaders import ...

from langchain_community.document_loaders import PDFPlumberLoader

loader = PDFPlumberLoader(invoice_01)
documents = loader.load()

print(documents[0].page_content)

John Smith
4490 Oak Drive
Albarry, NY 12210
Albarry 15.11.2025
Jessie Home
4312 Wood Road
New York, NY 10031
Invoice
Date Description QTY Unit Price Amount
14.11.2025 Front and rear brake cables 1 $15.00 $15.00
14.11.2025 New set of pedal arms 2 $25.00 $50.00
14.11.2025 Labor 3 $60.00 $180.00
Tax 8.10% $19.85
Total Payment due within 30 Days (15.12.2025) $264.85
Thank you
John Smith
Seite 1



In [8]:
print(documents[0])

page_content='John Smith
4490 Oak Drive
Albarry, NY 12210
Albarry 15.11.2025
Jessie Home
4312 Wood Road
New York, NY 10031
Invoice
Date Description QTY Unit Price Amount
14.11.2025 Front and rear brake cables 1 $15.00 $15.00
14.11.2025 New set of pedal arms 2 $25.00 $50.00
14.11.2025 Labor 3 $60.00 $180.00
Tax 8.10% $19.85
Total Payment due within 30 Days (15.12.2025) $264.85
Thank you
John Smith
Seite 1
' metadata={'source': 'data/pdfs/file_01.pdf', 'file_path': 'data/pdfs/file_01.pdf', 'page': 0, 'total_pages': 1, 'Creator': 'Calc', 'Producer': 'LibreOffice 24.2', 'CreationDate': "D:20251115142215+01'00'"}


In [9]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

In [10]:
MODEL_TEXT = "qwen3:4b"

llm = ChatOpenAI(
    model=MODEL_TEXT,
    api_key="ollama",
    base_url=OPENAI_BASE_URL,
)

<font color='red'>**TASK: Come up with a prompt that asks the LLM to answer a question about a PDF.**</font>

In [11]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "..."
        ),
        (
            "user",
            "..."
        ),
    ]
)

chain = prompt | llm
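Conceptually, `chain = prompt | llm` fills the template with the variables passed to `invoke` and feeds the resulting messages into the model. The stdlib sketch below mimics this under some assumptions. `Step` is a hypothetical stand-in for a LangChain runnable, the template wording is hypothetical, and `fake_llm` simply reports what it received. The sketch shows that the keys passed to `invoke` (`question`, `pdf`, `instructions`) must match the `{...}` placeholders in the template. The names deliberately differ from `prompt` and `chain`, so running the cell does not overwrite the notebook's variables.

```python
class Step:
    """Hypothetical stand-in for a LangChain runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that passes the output of `a` on to `b`
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical template; the {placeholders} must match the keys passed to `invoke`.
TEMPLATE = [
    ("system", "You are a helpful assistant that analyzes a PDF. Be concise and accurate."),
    ("user", "Please answer the question '{question}' about the following PDF: '{pdf}'. "
             "Adhere to these additional instructions: '{instructions}'"),
]

sketch_prompt = Step(lambda variables: [(role, text.format(**variables)) for role, text in TEMPLATE])
fake_llm = Step(lambda messages: f"received {len(messages)} messages, last one from {messages[-1][0]!r}")
sketch_chain = sketch_prompt | fake_llm

variables = {"question": "What is the total?", "pdf": "data/pdfs/file_01.pdf", "instructions": "Be brief."}
print(sketch_prompt.invoke(variables)[1][1])
print(sketch_chain.invoke(variables))  # received 2 messages, last one from 'user'
```

In this sketch, a missing key raises a `KeyError`. Extra keys are ignored by `str.format`.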

In [12]:
response = chain.invoke(
    {
        "question": "What is the following PDF about?",
        "pdf": documents[0].page_content,
        "instructions": "Answer the question with one sentence only. Do not make things up and say so if you cannot answer the question.",
    }
)
    
print(response.content)

The PDF is an invoice for vehicle repairs, listing parts (front and rear brake cables, pedal arms) and labor services with a total payment due of $264.85 by December 15, 2025.


In [13]:
documents[0].metadata["source"]

'data/pdfs/file_01.pdf'

In [14]:
response = chain.invoke(
    {
        "question": "What is the following PDF about?",
        "pdf": documents[0].metadata["source"],
        "instructions": "Answer the question with one sentence only. Do not make things up and say so if you cannot answer the question.",
    }
)
    
print(response.content)

I cannot access or analyze the PDF file "file_01.pdf" as it is not provided in the current context.


Usually, the LLM answers with something like `I cannot determine the content of the PDF without viewing it.`, but from time to time it makes things up. This happens especially when the filename is something like `invoice_01.pdf`: the model then infers (i.e. invents) plausible content.

Let's be more specific...

In [15]:
response = chain.invoke(
    {
        "question": "What is the total amount of the PDF invoice?",
        "pdf": documents[0].page_content,
        "instructions": "Answer the question and do not make things up. If you cannot answer the question say 'I cannot answer the question'.",
    }
)
    
print(response.content)

The total amount of the PDF invoice is **$264.85**.
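Because the line items are visible in the extracted text, the model's answer can be cross-checked deterministically. The sketch below uses Python's `decimal` module. It assumes that the 8.10% tax is applied to the sum of the line items and rounded half up to whole cents; the invoice itself does not state its rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# Line items as printed in the PDF: (description, quantity, unit price)
line_items = [
    ("Front and rear brake cables", 1, Decimal("15.00")),
    ("New set of pedal arms", 2, Decimal("25.00")),
    ("Labor", 3, Decimal("60.00")),
]
TAX_RATE = Decimal("0.081")  # 8.10% as printed on the invoice
CENT = Decimal("0.01")

subtotal = sum(qty * price for _, qty, price in line_items)          # 15 + 50 + 180 = 245.00
tax = (subtotal * TAX_RATE).quantize(CENT, rounding=ROUND_HALF_UP)  # 19.845 -> 19.85
total = subtotal + tax

print(subtotal, tax, total)  # 245.00 19.85 264.85
```

With banker's rounding (`ROUND_HALF_EVEN`), the tax would come out as 19.84. The printed $19.85 is therefore consistent with rounding half up.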


In [16]:
response = chain.invoke(
    {
        "question": "What is the total amount of the PDF invoice?",
        "pdf": documents[0].metadata["source"],
        "instructions": "Answer the question and do not make things up. If you cannot answer the question say 'I cannot answer the question'.",
    }
)
    
print(response.content)

I cannot answer the question.


It becomes obvious that the LLM cannot answer the question because it has no access to the content of the PDF file. How about a system that can decide which tools to use for individual tasks and iteratively work towards an answer? In other words, let's build an agent.

Let's give the LLM (or rather the agent) the ability to read the content of a PDF file.

<font color='red'>**TASK: Program a `Tool` that lets the LLM read the content of a PDF (see [Tools](https://docs.langchain.com/oss/python/langchain/agents#tools) for additional input).**</font>

In [17]:
from langchain.tools import tool

@...
def read_pdf_content...

In [18]:
read_pdf_content.args_schema.model_json_schema()
#print(read_pdf_content.name)
#print(read_pdf_content.description)
#print(read_pdf_content.args)

{'description': 'Reads the content of a PDF file.\n\nArgs:\n    pdf_file_path: The path of the PDF file to read',
 'properties': {'pdf_file_path': {'title': 'Pdf File Path', 'type': 'string'}},
 'required': ['pdf_file_path'],
 'title': 'read_pdf_content',
 'type': 'object'}
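The schema above is the only view of the tool that the model gets: the function name, the docstring and the type-annotated parameters. A precise docstring and accurate type hints therefore effectively become part of the agent's instructions. The sketch below approximates how such a description can be derived with `inspect`. `sketch_tool_schema` and `count_pages` are hypothetical. LangChain itself does not use this code: it builds a pydantic model, hence `args_schema.model_json_schema()`.

```python
import inspect

# Hypothetical, simplified mapping from Python annotations to JSON Schema types.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func) -> dict:
    """Builds a JSON-Schema-like description of a function, similar in shape to the output above."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            "title": name.replace("_", " ").title(),        # pdf_file_path -> 'Pdf File Path'
            "type": JSON_TYPES.get(param.annotation, "string"),
        }
        if param.default is inspect.Parameter.empty:        # no default value -> required argument
            required.append(name)
    return {
        "description": inspect.getdoc(func),
        "properties": properties,
        "required": required,
        "title": func.__name__,
        "type": "object",
    }


def count_pages(pdf_file_path: str, first_page: int = 1) -> int:
    """Counts the pages of a PDF file (hypothetical example function)."""
    raise NotImplementedError


print(sketch_tool_schema(count_pages))
```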

<font color='red'>**TASK: Create an `Agent` that can use your `Tool` to access the content of a PDF in order to be able to answer your question(s) about this PDF (see [Agents](https://docs.langchain.com/oss/python/langchain/agents) for additional input).**</font>

In [19]:
from langchain.agents import create_agent

agent =

In [20]:
agent_chain = prompt | agent

In [21]:
result = agent_chain.invoke(
    {
        "question": "What is the total amount of the PDF invoice?",
        "pdf": documents[0].metadata["source"],
        "instructions": "Answer the question and do not make things up. If you cannot answer the question say 'I cannot answer the question'.",
    }
)

In [22]:
print(result["messages"][-1].content)

The total amount of the PDF invoice is $264.85.


Let's try to build an Agent that provides structured output...

<font color='red'>**TASK: Create an `Agent` that provides structured output about the total amount of an invoice PDF (see [Advanced Concepts](https://docs.langchain.com/oss/python/langchain/agents#advanced-concepts) for additional input).**</font>

Unfortunately, I have not been able to get this up and running (probably because I use Ollama and not OpenAI). In case you are successful, please let me know...

In [23]:
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    """An invoice with different properties."""
    
    total_amount: str = Field(description="The total amount of the invoice.")

In [24]:
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy, ProviderStrategy

# ToolStrategy, ProviderStrategy does not seem to work (yet?)
structured_agent = create_agent(llm, tools=[read_pdf_content], response_format=ToolStrategy(Invoice))
#structured_agent = create_agent(llm, tools=[read_pdf_content], response_format=Invoice)

In [25]:
structured_agent_chain = prompt | structured_agent

In [29]:
result = structured_agent_chain.invoke(
    {
        "question": "What is the total amount of the PDF invoice?",
        "pdf": documents[0].metadata["source"],
        "instructions": "Answer the question and do not make things up. If you cannot answer the question say 'I cannot answer the question'.",
    }
)

In [30]:
print(result)

{'messages': [SystemMessage(content='You are a helpful assistant that analyzes a PDF. Be concise and accurate.', additional_kwargs={}, response_metadata={}, id='694cc6e8-6e0b-4941-8701-3a458d644539'), HumanMessage(content="Please answer the question 'Extract the total amount of the PDF invoice.' about following PDF 'data/pdfs/file_01.pdf'. Adhere to following additional instructions: 'Answer the question and do not make things up. If you cannot answer the question say 'I cannot answer the question'.'", additional_kwargs={}, response_metadata={}, id='841b81ac-486b-452e-8418-472884814a5a'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 1045, 'prompt_tokens': 295, 'total_tokens': 1340, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'qwen3:4b', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-390', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run-

In [31]:
# this does not work :-(

print(result["structured_response"])

KeyError: 'structured_response'
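The `KeyError` only tells us that the returned dictionary has no `structured_response` entry. One possible workaround is to ask for JSON in the instructions and parse the final message yourself. Note that this is a different technique from LangChain's `ToolStrategy`/`ProviderStrategy`. The stdlib sketch below assumes a hypothetical `InvoiceTotal` dataclass (named so it does not clash with the `Invoice` model above) and handles flat JSON objects only.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class InvoiceTotal:
    """Hypothetical stand-in for the pydantic `Invoice` model above."""
    total_amount: str


def parse_invoice_total(answer: str) -> InvoiceTotal:
    """Extracts the first flat JSON object from a model answer and checks the expected field."""
    # The model may surround the JSON with prose; the non-greedy match stops at the first '}'.
    match = re.search(r"\{.*?\}", answer, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the answer")
    data = json.loads(match.group(0))
    if "total_amount" not in data:
        raise ValueError("JSON object lacks 'total_amount'")
    return InvoiceTotal(total_amount=str(data["total_amount"]))


# Instead of result["structured_response"], read the key defensively ...
agent_output = {"messages": []}  # hypothetical result without a structured response
print(agent_output.get("structured_response"))  # None instead of a KeyError

# ... and fall back to parsing a JSON answer requested via the instructions.
final_text = 'Here is the result: {"total_amount": "$264.85"}'
print(parse_invoice_total(final_text))  # InvoiceTotal(total_amount='$264.85')
```

If pydantic is available, `Invoice.model_validate(data)` could replace the manual field check.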