smolBro is a local-first chat assistant and background agent with a small web UI, local llama.cpp models, SQLite-backed memory, scheduled jobs, MCP tool support, and opt-in CLI / remote escalation.
This is a very, very early-stage project. Use it at your own risk. I guarantee nothing. This is a vibecoded weekend project, not a polished product.
- Run a local-first assistant through `llama.cpp`
- Stream replies and reasoning from `llama.cpp` when the provider exposes them
- Render assistant replies as markdown in the web UI
- Store chat logs, durable memories, and scheduled jobs in SQLite
- Let the model search, create, and update memories through constrained tools
- Load local `SKILL.md` skills and use them through tool-assisted prompt workflows
- Expose external Model Context Protocol (MCP) tools from configured stdio servers
- Run a post-reply background memory reflection pass for likely durable user facts
- Use MCP-backed tools for workspace inspection, browsing, and other external integrations
- Ask for user approval before escalating to configured CLI helpers
- Use an optional OpenAI-compatible remote API provider
- Poll Telegram and route messages through the same orchestrator

smolBro does not:
- Guarantee correctness
- Guarantee stable memory behavior or perfect memory selection
- Guarantee safe autonomous operation without supervision
- Guarantee compatibility across machines, GPUs, or `llama.cpp` revisions
- Guarantee production-grade security, observability, or failure handling
- Guarantee that every streamed turn can also do tool-based memory writes at the same time

Current implementation:
- The backend is Bun + TypeScript
- The frontend is React + Vite
- State is stored in `data/smolbro.sqlite`
- Local models default to Qwen3 GGUF builds from Hugging Face
- The app manages local `llama-server` child processes itself
- Memory writes now happen in two ways:
  - direct memory tools during tool-using turns
  - a separate post-reply memory reflection pass for likely durable facts

Prerequisites:
- `bun`
- `npm`
- `git`
- `cmake`
- `curl`

npm install
npm --prefix frontend install
npm run dev

This will:
- clone and build repo-local `llama.cpp` if needed
- download the default local GGUF models if missing
- start the Bun server on `http://127.0.0.1:3000`

npm run llama:build
npm run models:pull
npm run dev:server
npm run ui:dev
npm run ui:build

Default local models:
- `qwen3:1.7b` -> `Qwen/Qwen3-1.7B-GGUF/Qwen3-1.7B-Q8_0.gguf`
- `qwen3:0.6b` -> `Qwen/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf`

Default local runtime port:
- primary: `12434`

The built-in UI can:
- send streamed chat requests
- show live request stats
- show reasoning when the provider emits it
- render assistant markdown
- inspect provider state
- inspect chat logs, memories, jobs, and job runs
- create manual memories
smolBro has durable memory, but it is intentionally constrained.
Current memory paths:
- Manual API writes through `POST /api/memories`
- Model tool calls:
  - `search_memories`
  - `create_memory`
  - `update_memory`
- Post-reply background memory reflection for likely durable facts
The intended memory scope is narrow:
- stable user preferences
- identity details
- recurring workflow instructions
- lasting project facts
Things it should not store:
- one-off requests
- transient context
- secrets or credentials
- speculative or weakly supported facts
This still needs supervision. Do not assume memory writes are always correct.

Model-facing memory tools:
- `search_memories`
- `create_memory`
- `update_memory`

There is no model-facing delete tool right now.

Skill tools:
- `list_skills`
- `read_skill`

Skills are loaded from local directories that contain SKILL.md files.

MCP tools:
- `list_mcp_tools`
- one generated tool per configured MCP server tool, named like `mcp_<server>_<tool>`

MCP support currently uses stdio servers configured through environment variables. MCP is the path for browser-like, filesystem-like, and other external tool integrations.
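
The `mcp_<server>_<tool>` naming pattern can be sketched as below. This is only an illustration: `mcpToolName` and `mcpToolNames` are hypothetical helpers, not functions from the smolBro codebase, and the real implementation may sanitize or truncate names in ways not shown here. The server ID `filesystem` and the tool names are example values.

```typescript
// Hypothetical helper: builds the model-facing name for one MCP server tool,
// following the mcp_<server>_<tool> pattern described in this README.
function mcpToolName(serverId: string, toolName: string): string {
  return `mcp_${serverId}_${toolName}`;
}

// Hypothetical helper: expands every tool of one server into generated names.
function mcpToolNames(serverId: string, toolNames: string[]): string[] {
  return toolNames.map((tool) => mcpToolName(serverId, tool));
}

// Example values only; a real server reports its own tool list.
const exampleToolNames = mcpToolNames("filesystem", ["read_file", "list_directory"]);
console.log(exampleToolNames.join("\n"));
```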
smolBro is local-first, but it can escalate when needed.
Supported escalation paths:
- configured CLI presets
- OpenAI-compatible remote API
CLI escalation is approval-based:
- smolBro decides a bigger model may help
- it asks the user for permission
- if approved, it offers the configured CLI presets
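
The approval steps above can be pictured as a tiny state machine. This is a rough sketch, not the actual orchestrator code: the state names, `requestEscalation`, `resolveApproval`, and the preset name `big-model-cli` are all made up for illustration.

```typescript
// Hypothetical states for the approval-based CLI escalation flow.
type EscalationState =
  | { step: "idle" }
  | { step: "awaiting_approval"; reason: string }
  | { step: "offering_presets"; presets: string[] }
  | { step: "declined" };

// Step 1 and 2: smolBro decides a bigger model may help and asks the user.
function requestEscalation(reason: string): EscalationState {
  return { step: "awaiting_approval", reason };
}

// Step 3: only an explicit approval moves on to offering the configured presets.
function resolveApproval(
  state: EscalationState,
  approved: boolean,
  configuredPresets: string[],
): EscalationState {
  if (state.step !== "awaiting_approval") return state; // nothing pending
  return approved
    ? { step: "offering_presets", presets: configuredPresets }
    : { step: "declined" };
}

const pendingEscalation = requestEscalation("task looks too large for the local model");
const afterApproval = resolveApproval(pendingEscalation, true, ["big-model-cli"]);
console.log(afterApproval.step);
```

The point of the sketch is that nothing reaches a CLI helper unless the state passes through `awaiting_approval` and the user says yes.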
Currently supported:
- `/model list`
- `/model <name>`
- `/model install <name>`

Examples:
/model list
/model qwen3:0.6b
/model install qwen3:1.7b
/model install Qwen/Qwen3-1.7B-GGUF:Qwen3-1.7B-Q8_0.gguf
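
For readers who want to see how the three `/model` forms differ, here is a minimal parsing sketch. `parseModelCommand` and the `ModelCommand` type are hypothetical and not taken from the smolBro source; the real parser may accept more forms or validate model names differently.

```typescript
// Hypothetical result type for the three documented /model forms.
type ModelCommand =
  | { kind: "list" }
  | { kind: "use"; model: string }
  | { kind: "install"; model: string };

// Returns null for anything that is not one of the documented forms.
function parseModelCommand(input: string): ModelCommand | null {
  const [command, first, second, ...rest] = input.trim().split(/\s+/);
  if (command !== "/model" || first === undefined || rest.length > 0) return null;
  if (first === "list" && second === undefined) return { kind: "list" };
  if (first === "install") {
    // The install target is kept as an opaque string: an alias such as
    // qwen3:1.7b or a Hugging Face style repo:file reference.
    return second === undefined ? null : { kind: "install", model: second };
  }
  return second === undefined ? { kind: "use", model: first } : null;
}

console.log(JSON.stringify(parseModelCommand("/model install qwen3:1.7b")));
```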

HTTP endpoints:
- `GET /health`
- `GET /api/strategy`
- `GET /api/providers`
- `GET /api/mcp`
- `GET /api/skills`
- `GET /api/operations/:id`

- `POST /api/chat`
- `POST /api/chat/stream`
- `GET /api/chat-logs`

- `GET /api/memories`
- `POST /api/memories`

- `GET /api/jobs`
- `POST /api/jobs`
- `GET /api/job-runs`
- `POST /api/jobs/:id/run`
- `POST /api/jobs/:id/start`
- `POST /api/jobs/:id/stop`
- `DELETE /api/jobs/:id`

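
A minimal sketch of calling `POST /api/chat` from TypeScript. The only thing taken from this README is the JSON body with a `message` field and the default address `http://127.0.0.1:3000`; `buildChatRequest` and `sendChat` are hypothetical helper names, and since the response format is not documented here, the sketch just returns the raw response text.

```typescript
// Hypothetical shape for a prepared request; not a smolBro type.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the request without sending it, so it can be inspected or logged.
function buildChatRequest(baseUrl: string, message: string): ChatRequest {
  return {
    url: new URL("/api/chat", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ message }),
    },
  };
}

// Sends the request and returns the raw body; the response schema is
// an unknown here, so no parsing is attempted.
async function sendChat(baseUrl: string, message: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, message);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`chat request failed with HTTP ${response.status}`);
  return response.text();
}

const exampleChatRequest = buildChatRequest(
  "http://127.0.0.1:3000",
  "Remember that I prefer concise answers.",
);
console.log(exampleChatRequest.url);
```

With the server running, `await sendChat("http://127.0.0.1:3000", "hello")` would perform the actual call.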
curl -s \
  -H 'content-type: application/json' \
  -d '{"message":"Remember that I prefer concise answers."}' \
  http://127.0.0.1:3000/api/chat

- `PORT=3000`
- `SMOLBRO_DATA_DIR=./data`
- `SMOLBRO_DEBUG_MODE=false`
- `SMOLBRO_ENABLE_MOCK=true`
- `SMOLBRO_PERSONALITY=...`

- `SMOLBRO_SMALL_MODEL=qwen3:1.7b`
- `SMOLBRO_LOCAL_RUNTIME_HOST=127.0.0.1`
- `SMOLBRO_LOCAL_RUNTIME_PRIMARY_PORT=12434`
- `SMOLBRO_LOCAL_RUNTIME_TIMEOUT_MS=120000`
- `SMOLBRO_LOCAL_RUNTIME_STARTUP_TIMEOUT_MS=120000`
- `SMOLBRO_LOCAL_RUNTIME_CONTEXT_SIZE=8192`
- `SMOLBRO_LOCAL_RUNTIME_GPU_LAYERS=0`
- `SMOLBRO_LOCAL_RUNTIME_THREADS=0`
- `SMOLBRO_LLAMA_CPP_DIR=./.tools/llama.cpp`
- `SMOLBRO_LOCAL_MODELS_DIR=./.models`
- `SMOLBRO_HUGGING_FACE_TOKEN=`
- `SMOLBRO_LLAMA_CPP_REPO=https://github.com/ggml-org/llama.cpp.git`
- `SMOLBRO_LLAMA_CPP_REF=`
- `SMOLBRO_LLAMA_CPP_BUILD_JOBS=`
- `SMOLBRO_LLAMA_CPP_CMAKE_ARGS=`

- `SMOLBRO_REMOTE_API_BASE_URL=`
- `SMOLBRO_REMOTE_API_MODEL=`
- `SMOLBRO_REMOTE_API_KEY=`

- `SMOLBRO_CLI_PRESETS=[]`
- `SMOLBRO_CLI_MODEL_COMMAND=`

- `SMOLBRO_MCP_SERVERS=[]`
- `SMOLBRO_MCP_REQUEST_TIMEOUT_MS=30000`

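
`SMOLBRO_MCP_SERVERS` holds a JSON array, so a loader has to parse and validate it before spawning anything. The sketch below assumes the entry fields `id`, `command`, `args`, `cwd`, and `description` that appear in this README's example; `parseMcpServers` and `McpServerConfig` are hypothetical names, and which fields the real loader treats as required is a guess.

```typescript
// Assumed entry shape, based on the fields used in this README's example.
interface McpServerConfig {
  id: string;
  command: string;
  args: string[];
  cwd?: string;
  description?: string;
}

// Parses the raw environment value; an unset or empty value means no servers.
function parseMcpServers(raw: string | undefined): McpServerConfig[] {
  if (raw === undefined || raw.trim() === "") return [];
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("SMOLBRO_MCP_SERVERS must be a JSON array");
  return parsed.map((entry: unknown, index: number) => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`MCP server entry ${index} must be an object`);
    }
    const fields = entry as Record<string, unknown>;
    const id = fields["id"];
    const command = fields["command"];
    const args = fields["args"];
    const cwd = fields["cwd"];
    const description = fields["description"];
    // Assumption: id and command are the minimum needed to start a stdio server.
    if (typeof id !== "string" || typeof command !== "string") {
      throw new Error(`MCP server entry ${index} needs string "id" and "command"`);
    }
    const config: McpServerConfig = {
      id,
      command,
      args: Array.isArray(args) ? args.map(String) : [],
    };
    if (typeof cwd === "string") config.cwd = cwd;
    if (typeof description === "string") config.description = description;
    return config;
  });
}

console.log(parseMcpServers("[]").length);
```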
Example:
export SMOLBRO_MCP_SERVERS='[
  {
    "id": "filesystem",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "."],
    "cwd": ".",
    "description": "Example filesystem MCP server"
  }
]'

- `SMOLBRO_ENABLE_SKILLS=true`
- `SMOLBRO_SKILLS_DIRS=./skills`
- `SMOLBRO_SKILLS_DIR=./skills`
- `SMOLBRO_SKILLS_MAX_CATALOG_CHARS=2000`

smolBro scans each configured skills directory recursively for SKILL.md files.
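
A recursive scan for `SKILL.md` files could look roughly like this. It is a sketch using Node-compatible `fs` APIs (which Bun also provides), not the project's actual loader; `findSkillFiles` is a hypothetical name, and details such as symlink handling or file-name case sensitivity are assumptions. The demo builds a throwaway directory so the example runs without a real skills folder.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Walks a directory tree and collects every file named exactly SKILL.md.
// Symlinks are not followed, because Dirent reports them as neither
// a regular file nor a directory.
function findSkillFiles(root: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(root, { withFileTypes: true })) {
    const fullPath = join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...findSkillFiles(fullPath));
    } else if (entry.isFile() && entry.name === "SKILL.md") {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// Demo against a temporary directory with two skills and one unrelated file.
const demoRoot = mkdtempSync(join(tmpdir(), "skills-demo-"));
mkdirSync(join(demoRoot, "writing", "nested"), { recursive: true });
writeFileSync(join(demoRoot, "writing", "SKILL.md"), "# writing skill\n");
writeFileSync(join(demoRoot, "writing", "nested", "SKILL.md"), "# nested skill\n");
writeFileSync(join(demoRoot, "writing", "notes.txt"), "not a skill\n");
const demoSkills = findSkillFiles(demoRoot);
console.log(demoSkills.join("\n"));
```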
- `SMOLBRO_TELEGRAM_BOT_TOKEN=`
- `SMOLBRO_TELEGRAM_API_BASE_URL=https://api.telegram.org`
- `SMOLBRO_TELEGRAM_POLL_INTERVAL_MS=3000`
- `SMOLBRO_TELEGRAM_ALLOWED_CHAT_IDS=`

- `SMOLBRO_SCHEDULER_POLL_MS=5000`
- `SMOLBRO_CONTEXT_MAX_RECENT_TURNS=4`
- `SMOLBRO_CONTEXT_MAX_MEMORIES=5`
- `SMOLBRO_CONTEXT_MAX_PINNED_MEMORIES=2`
- `SMOLBRO_CONTEXT_CHAR_BUDGET=2200`

Telegram support is optional and uses long polling from the main server process. It does not require a webhook.
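
Because polling picks up messages from any chat that reaches the bot, the `SMOLBRO_TELEGRAM_ALLOWED_CHAT_IDS` setting matters. Here is a minimal sketch of how a comma-separated allow-list might be parsed and checked. `parseAllowedChatIds` and `isChatAllowed` are hypothetical names, and treating an empty list as "no restriction" is an assumption about the real behavior.

```typescript
// Parses a comma-separated list of chat IDs such as "123456789,-100987654321".
// IDs are kept as strings; entries that are not plain integers are skipped.
function parseAllowedChatIds(raw: string | undefined): Set<string> {
  const ids = new Set<string>();
  for (const part of (raw ?? "").split(",")) {
    const id = part.trim();
    if (/^-?\d+$/.test(id)) ids.add(id);
  }
  return ids;
}

// Assumption: an empty allow-list means every chat is accepted.
function isChatAllowed(allowed: Set<string>, chatId: number | string): boolean {
  return allowed.size === 0 || allowed.has(String(chatId));
}

const exampleAllowList = parseAllowedChatIds("123456789,-100987654321");
console.log(isChatAllowed(exampleAllowList, -100987654321));
```

Negative IDs are kept because Telegram uses them for groups and channels.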
At minimum:
export SMOLBRO_TELEGRAM_BOT_TOKEN=123456:telegram-bot-token

Optional hardening:
export SMOLBRO_TELEGRAM_ALLOWED_CHAT_IDS=123456789,-100987654321

- This project is unfinished
- Tool use can still be wrong
- Memory reflection can still save the wrong thing
- Local model routing is heuristic-based
- Browser access is constrained but not battle-tested
- API and storage formats may change without warning
- There is no migration or upgrade story I would call stable yet
- expect breakage
- expect rough edges
- inspect memory state yourself
- keep backups if you care about the data
- do not trust it with anything critical
If you still want to use it, that is on you.