Quick Start · Architecture · MCP Servers · CLI · Models · Docs
Local models are fast, free, and private. But ask one to call a tool and it falls apart. Wrong function names, broken parameters, infinite loops.
The models are capable. The framework wasn't.
OnsetLab makes 3B-7B models do reliable tool calling through a hybrid REWOO/ReAct architecture. The framework handles planning, execution, and error recovery. The model only does what it's good at: one step at a time.
*(Demo video: onsetlab-demo.mp4)*
```bash
pip install onsetlab
```

Requires Ollama running locally with a model pulled:

```bash
ollama pull phi3.5
```

```python
from onsetlab import Agent
from onsetlab.tools import Calculator, DateTime

agent = Agent("phi3.5", tools=[Calculator(), DateTime()])
result = agent.run("What's 15% tip on $84.50?")
print(result.answer)
```

The agent routes the query, builds an execution plan, calls the right tool, and returns the answer. No prompt engineering required.
```mermaid
flowchart TD
    Q["Query"] --> R["Router"]
    R -->|"tools needed"| P["Planner"]
    R -->|"no tools"| D["Direct Answer"]
    P --> E["Executor"]
    E --> S["Solver"]
    P -. "plan fails" .-> RE["ReAct Fallback"]
    D --> A["Answer"]
    S --> A
    RE --> A
    style Q fill:#4a6670,stroke:#4a6670,color:#fff
    style R fill:#fff,stroke:#4a6670,color:#2d3b40
    style P fill:#e8f0fe,stroke:#7aa2f7,color:#3b5998
    style E fill:#e8f0fe,stroke:#7aa2f7,color:#3b5998
    style S fill:#e8f0fe,stroke:#7aa2f7,color:#3b5998
    style D fill:#edf7ef,stroke:#9ece6a,color:#2d6a2e
    style RE fill:#fdf4e7,stroke:#e0af68,color:#8a6914
    style A fill:#4a6670,stroke:#4a6670,color:#fff
```
The Router classifies queries as tool-needed or direct-answer using the model itself. The Planner generates structured THINK -> PLAN steps with auto-generated tool rules from JSON schemas. The Executor resolves dependencies and runs tools in order. If planning fails, the ReAct Fallback switches to iterative Thought -> Action -> Observation loops to recover.
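The Executor's dependency resolution can be sketched in a few lines. This is a conceptual illustration of the REWOO pattern under assumed conventions (numbered steps, `#E<n>` placeholders referencing earlier results), not OnsetLab's actual internals:

```python
# Conceptual sketch of REWOO-style plan execution: the Planner emits
# ordered steps whose arguments may reference earlier results via
# "#E<n>" placeholders; the Executor resolves them in order.
import re

def execute_plan(steps, tools):
    """Run plan steps in order, substituting #E<n> placeholders
    with the results of earlier steps."""
    results = {}
    for i, (tool_name, arg) in enumerate(steps, start=1):
        # Resolve references like "#E1" to previous step outputs.
        resolved = re.sub(r"#E(\d+)",
                          lambda m: str(results[int(m.group(1))]), arg)
        results[i] = tools[tool_name](resolved)
    return results[len(steps)]

# Hypothetical tool set and a two-step plan: compute the tip, then the total.
tools = {"calc": lambda expr: round(eval(expr), 2)}
plan = [
    ("calc", "84.50 * 0.15"),   # step 1 -> #E1
    ("calc", "84.50 + #E1"),    # step 2 uses step 1's result
]
print(execute_plan(plan, tools))  # prints 97.17
```

The key point of REWOO is that the whole plan is generated up front, so the model is prompted once for planning instead of once per step; the iterative ReAct loop is only engaged as a fallback when that plan fails.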
| Tool | Description |
|---|---|
| `Calculator` | Math expressions, percentages, sqrt/sin/log |
| `DateTime` | Current time, timezones, date math, day of week |
| `UnitConverter` | Length, weight, temperature, volume, speed, data |
| `TextProcessor` | Word count, find/replace, case transforms, pattern extraction |
| `RandomGenerator` | Random numbers, UUIDs, passwords, dice rolls, coin flips |
More tools will be added over time.
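The "auto-generated tool rules from JSON schemas" mentioned above can be illustrated with a small sketch. The function and schema names here are hypothetical, not OnsetLab's API; the point is that a planner prompt rule can be derived mechanically from a tool's schema:

```python
# Illustrative sketch (hypothetical, not OnsetLab's API): turning a tool's
# JSON schema into a one-line usage rule the Planner can place in its prompt.
def schema_to_rule(schema):
    params = schema["parameters"]["properties"]
    required = set(schema["parameters"].get("required", []))
    arg_list = ", ".join(
        f"{name}: {spec['type']}" + ("" if name in required else " (optional)")
        for name, spec in params.items()
    )
    return f"{schema['name']}({arg_list}) -- {schema['description']}"

calculator_schema = {
    "name": "calculator",
    "description": "Evaluate a math expression",
    "parameters": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}
print(schema_to_rule(calculator_schema))
# calculator(expression: string) -- Evaluate a math expression
```

Generating rules from schemas rather than hand-writing them keeps the planner prompt in sync with the tools that are actually registered.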
Connect any MCP-compatible server to give your agent access to external tools like GitHub, Slack, Notion, and more.
```python
from onsetlab import Agent, MCPServer

server = MCPServer.from_registry("filesystem", extra_args=["/path/to/dir"])

agent = Agent("phi3.5")
agent.add_mcp_server(server)

result = agent.run("List all Python files in the directory")
print(result.answer)

agent.disconnect_mcp_servers()
```

Any MCP server available via npm works too. See the docs for examples.
Built-in registry: filesystem · github · slack · notion · google_calendar · tavily
```bash
python -m onsetlab                                         # interactive chat
python -m onsetlab --model qwen2.5:7b                      # specify model
python -m onsetlab benchmark --model phi3.5 --verbose      # validate a model
python -m onsetlab benchmark --compare phi3.5,qwen2.5:7b   # compare models
python -m onsetlab export --format docker -o ./my-agent    # export as Docker
python -m onsetlab export --format config -o agent.yaml    # export as YAML
```

Export formats: YAML (portable config), Docker (Dockerfile + compose + Ollama), vLLM (GPU-accelerated), Script (standalone .py file). See Export & Deploy docs for details.
| Model | Size | RAM | Notes |
|---|---|---|---|
| `phi3.5` | 3.8B | 4GB+ | Default. Good balance of speed and quality |
| `qwen2.5:3b` | 3B | 4GB+ | Fast, good for simple tasks |
| `qwen2.5:7b` | 7B | 8GB+ | Strong tool calling |
| `qwen3-a3b` | MoE, 3B active | 16GB+ | Best tool calling accuracy |
| `llama3.2:3b` | 3B | 4GB+ | General purpose |
Works with any Ollama model. Run `python -m onsetlab benchmark --model your-model` to verify.
Website · Playground · Documentation · PyPI
Apache 2.0