Local ReAct Agent

A minimal, "hand-built" ReAct agent that runs entirely on your computer using Ollama + Llama 3.2 3B. Four short Python files, no SDK abstractions over the loop — every Action is parsed from text and dispatched manually.

No API key. No paid subscription. ~3 GB disk for the model.

Pseudo Code

What this teaches

Chat API roles — every message is tagged system / user / assistant. Tool observations are injected as user messages with an Observation: prefix (the original 2022 ReAct paper convention).
Stop sequences — Ollama's stop parameter halts generation before the model writes Observation:, so each assistant message contains exactly one Action. A regex catches any stragglers client-side.
Prompt-as-history — the LLM is stateless. The full messages list is re-sent every turn. That's literally the "context window" filling up.
Inner vs outer loop — the outer loop is multi-turn conversation (repl.py). The inner loop is the ReAct iteration within one user turn (react.py). They're two distinct loops doing different jobs.
Tool dispatch — tools.TOOLS is a single dict mapping name → function. The system prompt is generated from it. Adding a tool is one entry.
No SDK tool-use — the agent loop is hand-built on top of a plain chat API. Every Action is parsed from text and dispatched manually.

Files

File	Lines	What it does
`llm.py`	~40	One pure function: `chat(messages, stop)` → text
`tools.py`	~95	`calculate` (AST eval), `web_search` (DuckDuckGo, no key), `dispatch`
`react.py`	~95	`SYSTEM_PROMPT`, `parse_action`, `agent_turn`
`repl.py`	~55	Multi-turn REPL with `/clear`/`/history` commands
`requirements.txt`	2	`httpx`, `ddgs`

Prereqs

Python 3.10+
Linux / macOS with Homebrew (other installers work too)
~3 GB free disk (Llama 3.2 3B is ~2 GB)

One-time setup

brew install ollama
brew services start ollama          # daemon on localhost:11434
ollama pull llama3.2:3b             # ~2 GB, one-time download

Per-session setup

cd /path/to/agent-loop
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run

python -m repl

REPL commands

Typed at the user> prompt. Handled by repl.py, never sent to the model.

Command	Effect
`quit` / `exit` / Ctrl-D	Leave the REPL
`/clear`	Drop the conversation history (keeps the system prompt)
`/history`	Dump every message in the current context, indexed and role-tagged

Anything else is treated as a user question and triggers one agent_turn.

Sample session:

user> add 100 to the year of the last presidential election in Chile               
  [step 1] thought: I need to find the year of the last presidential election in Chile.
  [step 1] action:  web_search[last presidential election in Chile]
  [step 1] observ:  - 2025 Chilean general election - Wikipedia: General elections were held in Chile on 16 November 2025. Voters went to th...
  [step 2] thought: The last presidential election in Chile was in 2025, so I need to add 100 to that year.
  [step 2] action:  calculate[2025 + 100]
  [step 2] observ:  2125
  [step 3] thought: The result of adding 100 to the year of the last presidential election in Chile is indeed 2125.
  [step 3] action:  finish[2125]

assistant> 2125

user> divide that by 5
  [step 1] thought: I need to divide 2125 by 5 to get the final answer.
  [step 1] action:  calculate[2125 / 5]
  [step 1] observ:  425.0
  [step 2] thought: The result of dividing 2125 by 5 is indeed 425.0, which makes sense as it's a simple division problem.
  [step 2] action:  finish[425.0]

assistant> 425.0

user> /clear
(history cleared)

Customize

Add a tool — add one entry to tools.TOOLS. The system prompt picks it up automatically via docs().
Swap the model — change MODEL in llm.py. Tiny models that work (sometimes): llama3.2:1b (faster, less reliable format), qwen2.5:3b, qwen2.5:1.5b.
More steps — raise max_steps in agent_turn (default 8).
Less deterministic — raise temperature in the chat call (default 0).

Troubleshooting

httpx.ConnectError: Ollama isn't running. brew services start ollama.
First turn is slow (~10 s): model loading into memory. Subsequent turns use the warm cache.
Model emits text instead of Action: ...: format drift. The loop catches it and prints (no action - using reply as final). Llama 3.2 3B is reliable; smaller models drift more.
Tool raises an exception: caught and reported as Observation: Error from <tool>: <message> so the model can react.
Answers are not logical: we are not using a Frontier model here.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
llm.py		llm.py
pseudo_code.png		pseudo_code.png
react.py		react.py
repl.py		repl.py
requirements.txt		requirements.txt
tools.py		tools.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Local ReAct Agent

Pseudo Code

What this teaches

Files

Prereqs

One-time setup

Per-session setup

Run

REPL commands

Customize

Troubleshooting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Local ReAct Agent

Pseudo Code

What this teaches

Files

Prereqs

One-time setup

Per-session setup

Run

REPL commands

Customize

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages