A C++17 framework anyone can use to build AI agents that talk to their own services — no llama.cpp, JSON-Schema, or template-engine knowledge required.
easyai turns llama.cpp into an agent engine you can drop into any program in a dozen lines. You give it C++ functions; it gives the model the ability to call them. That's the whole pitch.
It ships two libraries you can `find_package(easyai)` and link
against, plus seven ready-to-run binaries:

| Library | Purpose |
|---|---|
| `libeasyai` | Local llama.cpp engine: `easyai::Engine`, `easyai::Tool`, built-in tools, presets, `easyai::Plan`. Linked via `easyai::engine`. |
| `libeasyai-cli` | OpenAI-protocol client: `easyai::Client` mirrors `Engine`, but the model runs on a remote `/v1/chat/completions` endpoint while tools execute locally. Linked via `easyai::cli`. |


| Binary | What it gives you |
|---|---|
| `easyai-local` | Local-only REPL: loads a GGUF in-process via `easyai::Engine`. Drop-in `llama-cli` replacement with one-shot scripting (`-p`), tools, presets, optional `<think>` strip, sandboxed `fs_*` tools, and an opt-in `bash` tool. |
| `easyai-cli` | Agentic OpenAI-protocol client built on `libeasyai-cli`, with no local model. REPL, `--shell` (hybrid AI shell), or `-p` one-shot. Full sampling control (`--temperature`, `--top-p`, `--top-k`, `--min-p`, `--repeat-penalty`, `--frequency-penalty`, `--presence-penalty`, `--seed`, `--max-tokens`, `--stop`), plan tool, server-management subcommands (`--list-models`, `--list-tools`, `--health`, `--props`, `--metrics`, `--set-preset`). HTTPS via OpenSSL; `--insecure-tls` / `--ca-cert` for dev/internal CAs. Full doc: easyai-cli.md. |
| `easyai-server` | Drop-in `llama-server` replacement: OpenAI-compat HTTP with full SSE streaming, embedded SvelteKit webui, Bearer auth, Prometheus `/metrics`, KV-cache controls, flash-attn, mlock. Speaks MCP, OpenAI, and Ollama from one process. Full doc: easyai-server.md. |
| `easyai-mcp-server` | Standalone Model Context Protocol provider with no model loaded. Same tool catalogue as `easyai-server` (built-ins + the memory tool + external-tools), exposed over `POST /mcp` with a configurable cpp-httplib worker pool (`--threads`) and an in-flight `tools/call` cap (`--max-concurrent-calls`) for deployments with thousands of clients. Full doc: easyai-mcp-server.md. |
| `easyai-agent` | A demo agent showing every built-in tool plus an inline custom tool. |
| `easyai-recipes` | Tutorial agent paired with manual.md; implements `today_is` and `weather` (HTTP-calling) from scratch. |
| `easyai-chat` | A bare-bones REPL with no tools, useful as a sanity check. |

Status — used in production on a Linux Vulkan box (Radeon 680M) as a self-hosted ChatGPT-style assistant. Apple Silicon (Metal), Linux/Windows Vulkan, NVIDIA CUDA, and AMD ROCm are all wired up out of the box.
`scripts/install_easyai_server.sh` handles the whole Debian/Ubuntu deployment in one command (systemd-coredump, hardened unit, optional `--enable-verbose`, drop-in compat with `install_llama_server.sh`).
A running log of user-facing changes. Latest first — keep this list current as features land so anyone returning to the repo (or landing on it for the first time) sees what shipped recently.
llama.cpp's Multi-Token Prediction merged upstream on 2026-05-16; we
bumped our vendored llama.cpp checkout to 39cf5d619 (same-day HEAD,
all 262 commits since the previous pin) and wired the MTP path
through the three layers in one go.
Library API (`include/easyai/engine.hpp`):

```cpp
engine.spec_type("draft-mtp")      // or: none (default), draft-simple,
                                   //     draft-eagle3, ngram-simple,
                                   //     ngram-map-k, ngram-map-k4v,
                                   //     ngram-mod, ngram-cache
      .spec_draft_n_max(6);        // max draft tokens per step
```

Unknown strings land in `Engine::last_error()` and leave speculation
off (no silent default switch).
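
A minimal sketch of that validate-or-refuse rule, assuming a hypothetical `parse_spec_type` helper and `SpecType` enum (neither is easyai's actual API): an unrecognised string yields an error message and no mode, so the caller can leave speculation off instead of falling back to some default.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical enum covering the spec_type strings listed above.
enum class SpecType {
    None, DraftSimple, DraftEagle3, DraftMtp,
    NgramSimple, NgramMapK, NgramMapK4v, NgramMod, NgramCache
};

// Parses a spec_type string. On an unknown value it fills `err` and returns
// std::nullopt, so the caller keeps speculation off instead of guessing.
std::optional<SpecType> parse_spec_type(std::string_view s, std::string & err) {
    struct Entry { std::string_view name; SpecType type; };
    static constexpr Entry kTable[] = {
        {"none", SpecType::None},                 {"draft-simple", SpecType::DraftSimple},
        {"draft-eagle3", SpecType::DraftEagle3},  {"draft-mtp", SpecType::DraftMtp},
        {"ngram-simple", SpecType::NgramSimple},  {"ngram-map-k", SpecType::NgramMapK},
        {"ngram-map-k4v", SpecType::NgramMapK4v}, {"ngram-mod", SpecType::NgramMod},
        {"ngram-cache", SpecType::NgramCache},
    };
    for (const auto & e : kTable)
        if (e.name == s) return e.type;
    err = "unknown spec_type '" + std::string(s) + "'";
    return std::nullopt;   // no silent default switch
}
```

A typo such as `draft-mpt` therefore surfaces as an error string rather than quietly running with a different speculation mode.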
Server CLI:

```bash
easyai-server -m /path/to/mtp-model.gguf \
    --spec-type draft-mtp --spec-draft-n-max 6
```

INI keys: `[ENGINE] spec_type` and `[ENGINE] spec_draft_n_max`.
Installer shortcut:

```bash
./install_easyai_server.sh --mtp                  # n_max=6 (default)
./install_easyai_server.sh --mtp --mtp-n-max 8    # override
```

The installer bakes the two flags into the systemd `ExecStart`, so the
service inherits MTP without `systemctl edit`.
Caveat: MTP needs a model TRAINED with MTP heads (DeepSeek V3,
MimoVL, and similar). Plain models will refuse to load with
--spec-type draft-mtp. The installer's --mtp flag is the operator
saying "I know what I'm doing"; there's no validation.
Classic standalone-draft-model speculative decoding (the
--draft-model PATH path) is not yet wired — only MTP, which doesn't
need a separate model file. The old installer compat lines for
--draft-model / --draft-max / --draft-min still warn and skip.
Every binary that loads --memory <dir> now auto-injects a compact
keyword-vocabulary block into the system prompt so the model knows
what it has tagged without having to call memory(action="keywords")
first. The block looks like:

```text
# MEMORY VOCABULARY (the keywords your private memory currently
has tagged — the FIRST place to look for anything you might
already know)
12 entries (most-common first; call memory(action="search",
keywords=["<name>", ...]) to recall):
easyai(8) claude(5) bitnet(3) build(3) iteration(2) …
```

Sorted count desc / name asc, capped at top 40. Empty store → block omitted, no wasted tokens.
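
The sort-and-cap rule can be sketched as follows. `render_vocab` is a hypothetical helper that assumes the keyword counts have already been collected from the memory store; it renders only the `name(count)` line, not the surrounding header text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Renders "name(count) name(count) ..." ordered by count descending, then
// name ascending, keeping at most `cap` entries. An empty store renders as
// "", which the caller treats as "omit the block entirely".
std::string render_vocab(const std::map<std::string, int> & counts, std::size_t cap = 40) {
    std::vector<std::pair<std::string, int>> v(counts.begin(), counts.end());
    std::sort(v.begin(), v.end(), [](const auto & a, const auto & b) {
        if (a.second != b.second) return a.second > b.second;   // most common first
        return a.first < b.first;                                // ties: alphabetical
    });
    if (v.size() > cap) v.resize(cap);
    std::string out;
    for (const auto & [name, n] : v) {
        if (!out.empty()) out += ' ';
        out += name + "(" + std::to_string(n) + ")";
    }
    return out;
}
```

With the counts from the example block, this reproduces `easyai(8) claude(5) bitnet(3) build(3) iteration(2)`: `bitnet` and `build` tie at 3 and fall back to alphabetical order.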

| Binary | When the vocab is computed |
|---|---|
| `easyai-server` | Every request (fresh disk scan, ~10-50 ms, a rounding error next to inference). New saves are visible on the next request. |
| `easyai-local` | Once at startup, appended to the system prompt. New saves are visible after a restart. |
| `easyai-cli` | Once, when building the system prefix sent to the remote server. |

The AUTHORITATIVE preamble used to live in `build_authoritative_preamble` inside `examples/server.cpp`, with parallel partial
copies in `local.cpp` and nothing in `cli.cpp`. That drift is gone:
the builder is now public in libeasyai,

```cpp
// include/easyai/preamble.hpp
#include <string>

namespace easyai::preamble {
struct Options {
    bool        inject_datetime  = true;
    std::string knowledge_cutoff = "2024-10";
    std::string memory_root;                 // empty → vocab block omitted
};
std::string build(const Options & opt);
}  // namespace easyai::preamble
```

and all three binaries call it. Change the renderer once and every binary updates. Third-party hosts of libeasyai get the same behaviour out of the box.
See RAG.md §5 "Automatic vocabulary injection" and design.md
§5c for the full design.
Same-day follow-up to the morning's --tools-mode landing: split
is now the out-of-the-box default, not unified.
Reason: smaller / quantised tool-callers (Llama 3 8B, Qwen 2.5 7B,
Phi-3.5, GPT-OSS-20B) dispatch much more reliably against flat
one-verb-per-tool schemas than against a fs(action="...")
discriminated-union dispatcher. Large models handle either shape
fine. The split surface costs ~15-20% extra system-prompt tokens for
a 30-50% reduction in retry / "unknown action" hops in practice —
worth it for everyone, surprising for nobody.

| Surface | Registered out of the box | Old behaviour | New default |
|---|---|---|---|
| Multi-action families | `fs`, `web`, `memory` | 3 dispatchers | `fs_read`, `fs_write`, `fs_append`, `fs_edit`, `fs_list`, `fs_glob`, `fs_grep`, `fs_check_path`, `fs_cwd`, `fs_sandbox`, `web_search`, `web_fetch`, `memory_save`, `memory_append`, `memory_search`, `memory_load`, `memory_list`, `memory_delete`, `memory_keywords` (19 focused tools) |

```bash
# new default (no flag)
easyai-cli --url http://ai.local:8080 --sandbox ~/proj
# opt back in to the legacy dispatcher (3 tools instead of 19)
easyai-cli --tools-mode unified --url ai.local:8080 --sandbox ~/proj
# best of both worlds: costs more tokens, lets the model pick
easyai-cli --tools-mode both --url ai.local:8080 --sandbox ~/proj
```

Library callers: `Toolbelt::tool_mode_` now defaults to
`ToolMode::Split`; pass `ToolMode::Unified` explicitly if your prompt
relies on the legacy tool names.

INI: `[cli] tools_mode = unified|split|both` (default `split`).
fs, web, and memory ship as unified dispatchers with an
action parameter (e.g. fs(action="read", ...)). That shape keeps
the system prompt small and lets a large model batch many actions, but
smaller / quantised tool-callers (Llama 3 8B, Qwen 2.5 7B, Phi-3.5,
GPT-OSS-20B) gravitate toward one-purpose tools — fs_read, fs_edit,
etc. — because the verb IS the tool name and the parameter schema is
flat.
Three modes, selected by the new flag (the default flipped to `split` in
the same-day follow-up entry above):

```bash
easyai-cli --tools-mode unified   # legacy: one dispatcher per family
easyai-cli --tools-mode split     # one focused tool per action
easyai-cli --tools-mode both      # register both surfaces side-by-side
```

| Mode | Tools registered (with `--sandbox` + `--memory`) |
|---|---|
| `unified` | `fs`, `web`, `memory` (3 dispatchers) |
| `split` (new default) | `fs_read`, `fs_write`, `fs_append`, `fs_edit`, `fs_list`, `fs_glob`, `fs_grep`, `fs_check_path`, `fs_cwd`, `fs_sandbox`, `web_search`, `web_fetch`, `memory_save`, `memory_append`, `memory_search`, `memory_load`, `memory_list`, `memory_delete`, `memory_keywords` (19 focused tools) |
| `both` | `unified` + `split`, same handlers under both names |

Same handlers under the hood: behaviour is identical to the unified surface; only the registration shape changes. Library API:

```cpp
easyai::cli::Toolbelt()
    .sandbox("/srv/data")
    .tool_mode(easyai::cli::ToolMode::Split)   // or Both, or Unified
    .apply(client);
```

INI: `[cli] tools_mode = unified|split|both`.
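
To illustrate "same handlers, different registration shape", here is a self-contained sketch with a hypothetical registry. `Family`, `register_tools`, and the `"<action>|<payload>"` argument encoding are invented for the example (real tool calls carry JSON arguments). Split mode registers `<family>_<action>` names, unified mode registers one dispatcher per family, and both modes together register all of them, every name routing to the same handler functions.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

enum class ToolMode { Unified, Split, Both };   // hypothetical mirror of the flag values

using Handler = std::function<std::string(const std::string & args)>;

// A tool family such as "fs": its name plus one handler per action.
struct Family {
    std::string name;
    std::map<std::string, Handler> actions;
};

// Returns tool name -> handler for the chosen mode.
std::map<std::string, Handler> register_tools(const std::vector<Family> & families, ToolMode mode) {
    std::map<std::string, Handler> tools;
    for (const auto & f : families) {
        if (mode != ToolMode::Split) {
            // Unified dispatcher. For illustration, args are "<action>|<payload>".
            tools[f.name] = [actions = f.actions](const std::string & args) -> std::string {
                const auto bar = args.find('|');
                const std::string action = args.substr(0, bar);
                const std::string payload = bar == std::string::npos ? "" : args.substr(bar + 1);
                const auto it = actions.find(action);
                if (it == actions.end()) return "error: unknown action '" + action + "'";
                return it->second(payload);
            };
        }
        if (mode != ToolMode::Unified)
            for (const auto & [action, handler] : f.actions)
                tools[f.name + "_" + action] = handler;   // split: "<family>_<action>"
    }
    return tools;
}
```

Because both surfaces call the same handler, `fs_write("x")` and `fs(write, "x")` produce identical results; only the schema the model sees differs.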
Reverts the 2026-05-12 default flip: loading the existing
.easyai_session is opt-in again via --continue. Without the
flag, any file in cwd is ignored and overwritten on the first turn
— matching the behaviour shipped originally on 2026-05-12 morning
before the auto-on flip.
Why: the auto-on default surprised operators who opened a project directory expecting a fresh agent and instead picked up history from a previous experiment. An explicit opt-in matches the rest of the cli's surface (nothing else implicitly carries state across invocations) and removes the silent action-at-a-distance.

| | Previous (2026-05-12 → 2026-05-13) | Now |
|---|---|---|
| Resume on launch | default ON | opt-in via `--continue` |
| Start fresh | opt-in via `--no-continue` | default |
| `--compress` without `--continue` | no-op (warning) | no-op (warning) |

Saving is unchanged: every turn (and every tool round-trip) still
rewrites .easyai_session atomically. --no-continue stays as the
explicit form of the default — useful for scripts overriding an
operator's [cli] auto_continue = on INI line.
Default for `[cli] auto_continue` flips to `false`. Operators who
prefer the auto-on behaviour can opt in once via INI:

```ini
[cli]
auto_continue = true
```

Full doc: easyai-cli.md §10.
The systemd unit now carries StartLimitBurst=2 +
StartLimitIntervalSec=60 in [Unit], so the service attempts to
start at most twice in any 60-second window before giving up and
leaving the unit in the failed state.
Before, Restart=on-failure + RestartSec=10 with no burst cap
would retry indefinitely — a missing model file, a bad CLI flag, or
a GPU that wasn't exposed to the container produced an infinite
restart loop that filled journald and never surfaced the real
problem.
Now:
| State | Behaviour |
|---|---|
| Initial start fails | Wait RestartSec=10, retry once |
| Retry also fails | Unit enters failed state; no further attempts |
| Long-running service fails after running > 60 s | Burst counter has reset → still gets one retry (not penalised for late failures) |
Recovery: run `journalctl -u easyai-server` to inspect the two failed
attempts, fix the root cause, then:

```bash
sudo systemctl reset-failed easyai-server
sudo systemctl start easyai-server
```

Existing installs: re-run install_easyai_server.sh --force (or
--upgrade) to refresh the unit file. Restart=on-failure and
RestartSec=10 are unchanged.
### 2026-05-13 — Installer: ship only `system.txt_template`; default install uses the binary's built-in prompt
scripts/install_easyai_server.sh no longer drops an active
/etc/easyai/system.txt on first install. Out-of-the-box, only the
template /etc/easyai/system.txt_template ships (the canonical
"factory" copy of the Deep persona, refreshed on every --upgrade),
and SERVER.system_file is left commented out in easyai.ini — so
the server uses the binary's built-in prompt, which is already
gated on actually-registered tools: it never advertises fs /
bash if those are off in the INI.
The template file was also renamed system.txt_modelo →
system.txt_template (English-only convention).

| State | Before (≤ 2026-05-12) | Now (2026-05-13+) |
|---|---|---|
| Template file at `/etc/easyai/` | `system.txt_modelo` (Portuguese) | `system.txt_template` |
| Active `/etc/easyai/system.txt` on first install | dropped (Deep persona) | NOT installed |
| `--force` rewrites `system.txt` | yes | no (file isn't there) |
| `SERVER.system_file` in `easyai.ini` | commented out | commented out (unchanged) |
| Out-of-the-box prompt | active `system.txt` (same Deep body) | binary's built-in, tool-gated |

To activate a custom persona (same steps as before):

```bash
sudo cp /etc/easyai/system.txt_template /etc/easyai/system.txt
sudoedit /etc/easyai/system.txt        # tweak as needed
sudoedit /etc/easyai/easyai.ini        # uncomment SERVER.system_file
sudo systemctl restart easyai-server
```

Existing installs are unaffected: the installer still preserves
any existing `/etc/easyai/system.txt` across `--upgrade` and `--force`
runs (it just no longer creates one when it doesn't exist).
Full doc: LINUX_SERVER.md §6
("/etc/easyai/system.txt (operator-supplied) + system.txt_template")
and §12 ("Upgrading").
`scripts/install_easyai_server.sh` used to print
`ttm.pages_limit already present; skipping` when `/etc/default/grub`
already had a `ttm.pages_limit=N` token, even if N differed from
the value the operator just passed via `--gtt`. Result: re-running
the installer with a new GTT size was silently a no-op on the
GRUB side, and the next reboot kept the stale page count.
The patch now compares the existing token's page count against the
target, rewrites it in place when they differ (via sed -i), and
runs update-grub so the change lands in /boot/grub/grub.cfg.
The reboot reminder also points at /proc/cmdline so operators
can verify the new value boots cleanly.
No flag change. Operators who pass the same --gtt value on every
run see the same idempotent "already present; skipping" message.
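
The compare-then-rewrite logic can be sketched in C++ on the value of the kernel command line (the installer itself does this with `sed -i` on `/etc/default/grub` and then runs `update-grub`, which the sketch leaves out). `ensure_pages_limit` and the `Patch` enum are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class Patch { Added, Rewritten, AlreadyPresent };

// Makes `cmdline` carry ttm.pages_limit=<pages>: append the token when it is
// missing, rewrite its page count in place when it differs, else leave it.
Patch ensure_pages_limit(std::string & cmdline, unsigned long pages) {
    static const std::regex re(R"((^|\s)ttm\.pages_limit=(\d+))");
    const std::string want = std::to_string(pages);
    std::smatch m;
    if (!std::regex_search(cmdline, m, re)) {
        if (!cmdline.empty()) cmdline += ' ';
        cmdline += "ttm.pages_limit=" + want;
        return Patch::Added;
    }
    if (m.str(2) == want) return Patch::AlreadyPresent;   // "already present; skipping"
    cmdline.replace(static_cast<std::size_t>(m.position(2)),
                    static_cast<std::size_t>(m.length(2)), want);
    return Patch::Rewritten;                               // the case the old code skipped
}
```

The key difference from the old behaviour is the middle branch: "already present" is only reported when the existing page count equals the requested one.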
Tuned the aura halo on the AI Box mark so it reads as a quiet emission instead of a neon outline. The earlier tuning was described internally as "loud"; this pass cuts both stacked Gaussian blurs to subtler values:

| Layer | Before (07c2347) | Now (cc92d51) |
|---|---|---|
| Outer halo `stdDeviation` | 14 | 10 |
| Outer halo `flood-opacity` | 0.5 | 0.3 |
| Inner halo `stdDeviation` | 4 | 3 |
| Inner halo `flood-opacity` | 1.0 | 0.6 |

Gradient, mark geometry, viewBox headroom and filter cyan flood
(#00bcd4) all unchanged. Both webui/AI-brain.svg (the
canonical SVG source) and the inline constexpr kBrandSvg in
examples/server.cpp updated in lockstep,
so the favicon route serves the same softened version every
embedder sees.
The previous save points covered every interruption mode except
force-exit — triple rapid Ctrl-C triggers the force-exit handler
(_exit(130)), which bypasses atexit and the post-chat()
save in run_one(). Operators reported that a long agentic turn
that got force-exited left no .easyai_session on disk.
Fix: layer an additional save into the on_tool callback so
.easyai_session is rewritten after every tool round-trip in a
turn, not just at the end of the turn. Only the in-flight partial
reply since the last completed tool is lost; everything earlier
(file edits, bash output, plan steps, RAG queries) is on disk and
re-loadable.
Wiring: easyai::ui::Streaming::notify_tool(call, result) is now a
public forwarder for the private on_tool UI handler, so external
embedders can compose extra behaviour onto the on_tool slot
(checkpoint to disk, telemetry, audit log) without losing the
streaming output (tool indicators, dim styling, plan rendering).
The cli's binary uses it as:

```cpp
cli.on_tool([&](const ToolCall & c, const ToolResult & r) {
    streaming.notify_tool(c, r);   // canonical UI
    save_session(cli, &err);       // disk checkpoint
});
```

The pattern is documented inline in
`include/easyai/ui.hpp` above the
`notify_tool` declaration. No flag / INI change.
Iteration on yesterday's session-persistence feature: loading the
existing .easyai_session is now the default (you don't need
--continue to pick up where you left off). The semantics flip:

| | Previous (2026-05-12 morning) | Now |
|---|---|---|
| Resume on launch | opt-in via `--continue` | default ON |
| Start fresh | default | opt-in via `--no-continue` |
| `--compress` without `--continue` | hard error | warning (no-op when combined with `--no-continue`) |

The cli also now exposes every session-related knob, plus the raw-log
knobs, through `[cli]` in `/etc/easyai/easyai-cli.ini`:

```ini
[cli]
auto_continue = true    # default; load .easyai_session if present
auto_compress = false   # default; recap on every load when on
log_file =              # default empty; a path enables the --log-file equivalent
auto_log = false        # default; when true, restores the library's legacy /tmp auto-log
show_bash = true        # default; mirror bash subprocess output to the operator terminal
show_python = true      # default; same for python3
```

CLI flag precedence is unchanged: explicit flag > INI > hardcoded
default. All `--continue` / `--no-continue` / `--compress` /
`--log-file` flags continue to work and override the INI for that
invocation.
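
The precedence rule amounts to a two-level fallback. A minimal sketch with a hypothetical `resolve_knob` helper, where an empty `std::optional` means "this layer did not set the knob". For `auto_continue` at the time of this entry, the layers are `--continue` / `--no-continue`, then `[cli] auto_continue`, then the built-in default (on).

```cpp
#include <cassert>
#include <optional>

// Resolves one boolean knob: an explicit CLI flag wins, then the INI value,
// then the built-in default. An empty optional means "this layer is silent".
bool resolve_knob(std::optional<bool> cli_flag, std::optional<bool> ini_value,
                  bool builtin_default) {
    return cli_flag.value_or(ini_value.value_or(builtin_default));
}
```

For example, an operator INI with `auto_continue = false` turns resume off, but a script passing `--continue` still gets it for that one invocation.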
`--continue` is kept for backward compatibility. With resume now the
default it is redundant on its own, but scripts can pass it to force
resume even when an operator's INI has turned `auto_continue` off.
Full doc: easyai-cli.md §10.
Every `easyai-cli` invocation now writes a `.easyai_session` file in
the current working directory after each chat turn (atomic tempfile + rename,
mode 0600). Three control points:

| Surface | What it does |
|---|---|
| (no flag) | Start fresh, overwrite on first turn, save every turn |
| `--continue` | Resume the `.easyai_session` in cwd; warn and start fresh if none exists |
| `--continue --compress` | Resume, then ask the model for one lossless recap and replace history with the recap before the first prompt |
| `/compress` (REPL) | Same recap flow, fired mid-session |

The file is the raw OpenAI-shape message array (greppable, diffable,
re-loadable). Two new methods on the public Client API
(dump_history() / load_history()) make the same persistence
available to library embedders.
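
The "atomic tempfile + rename, mode 0600" save can be sketched with plain POSIX calls. `atomic_write_0600` and the `.tmp.<pid>` naming are assumptions for illustration, not easyai's implementation. Because `rename()` replaces the target in a single step on the same filesystem, a crash mid-write leaves either the previous session file or the new one on disk, never a truncated mix.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>      // std::rename, std::remove
#include <cstring>     // std::strerror
#include <string>
#include <fcntl.h>     // open
#include <unistd.h>    // write, fsync, close, unlink, getpid

// Atomically replaces `path` with `data`: write a sibling temp file created
// with mode 0600 (umask can only narrow it), fsync it, then rename() it over
// the target.
bool atomic_write_0600(const std::string & path, const std::string & data, std::string * err) {
    const std::string tmp = path + ".tmp." + std::to_string(::getpid());
    const int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        if (err) *err = std::strerror(errno);
        return false;
    }
    const char * p = data.data();
    std::size_t left = data.size();
    bool ok = true;
    while (ok && left > 0) {
        const ssize_t n = ::write(fd, p, left);
        if (n < 0 && errno == EINTR) continue;    // interrupted by a signal: retry
        if (n < 0) { ok = false; break; }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    ok = ok && ::fsync(fd) == 0;                  // contents on disk before the rename
    if (::close(fd) != 0) ok = false;
    ok = ok && std::rename(tmp.c_str(), path.c_str()) == 0;
    if (!ok) {
        if (err) *err = std::strerror(errno);
        ::unlink(tmp.c_str());                    // don't leave a stray temp file
    }
    return ok;
}
```

Calling this once per turn (and, per the later entry, once per tool round-trip) keeps the on-disk history at most one step behind the conversation.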
Raw log default flipped to OFF. Prior versions created
/tmp/easyai-cli-remote-<pid>-<epoch>.log whenever --verbose was
set, AND the library opened a separate /tmp/easyai-client-<pid>-<epoch>.log
on every `Client` construction. Both are now opt-in:

- The binary's transaction log opens only when `--log-file PATH` is
  given (mode 0600 at PATH). `--verbose` is now stderr-only.
- The library's auto-log is suppressed by setting `EASYAI_NO_AUTO_LOG=1`
  in the cli binary's `main()` before the `Client` is constructed. An
  operator override (`EASYAI_NO_AUTO_LOG=0` in the env) still wins.

Net: a default invocation leaves nothing in /tmp. See
easyai-cli.md §9 and §10 for full docs.
A user-reported bug: fs(action="edit") was silently corrupting
files when the model passed content without a trailing \n.
The last line of `content` got glued onto the first preserved line
after the edit range, turning `int b = 22;\n return a + b;`
into `int b = 22; return a + b;`. When the deleted range
happened to contain the only `}` between two function bodies,
this silently swallowed the brace and the file failed to compile
with "function definition is not allowed here" + "expected '}'"
on the next build.
Root cause: the tool description said "include a trailing \n
yourself" but the model consistently forgot. Fix:
make_fs_edit_handler now auto-inserts a \n separator on each
side of content if and only if one is needed to keep the seam
lines apart. Both guards no-op when content is already
correctly terminated (or empty for a pure delete), so the change
is invisible to model calls that were already doing the right
thing.
Tool description updated to drop the "include trailing \n"
advice — line semantics are now preserved automatically.
Verified against a 9-case smoke matrix (middle-replace with/without trailing newline, multi-line content lacking newline, pure delete, pure insert, append-at-EOF on files with and without trailing newline, replace-last-line on a file without trailing newline, whole-file replacement) — all nine pass.
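
A minimal sketch of the two seam guards, working on an in-memory string rather than a file. `edit_lines` and its 0-based, half-open `[first, last)` line range are assumptions for illustration; `make_fs_edit_handler`'s real parameters may differ. Each guard inserts `\n` only when the seam would otherwise join two lines, so correctly terminated or empty `content` passes through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte offset where 0-based line `k` starts; file.size() if past the end.
std::size_t line_offset(const std::string & file, std::size_t k) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < k; ++i) {
        const std::size_t nl = file.find('\n', pos);
        if (nl == std::string::npos) return file.size();
        pos = nl + 1;
    }
    return pos;
}

// Replaces lines [first, last) with `content`, adding a '\n' on either side
// only when one is needed to keep the seam lines apart.
std::string edit_lines(const std::string & file, std::size_t first, std::size_t last,
                       const std::string & content) {
    assert(first <= last);
    const std::string prefix = file.substr(0, line_offset(file, first));
    const std::string suffix = file.substr(line_offset(file, last));
    std::string out = prefix;
    if (!content.empty() && !prefix.empty() && prefix.back() != '\n')
        out += '\n';   // appending after a last line that had no trailing newline
    out += content;
    if (!content.empty() && content.back() != '\n' && !suffix.empty())
        out += '\n';   // the fix: never glue content onto the next preserved line
    out += suffix;
    return out;
}
```

Several cases from the smoke matrix map directly onto these guards: middle-replace with and without a trailing newline, pure delete (empty `content`, both guards skip), append-at-EOF on a file lacking a final newline, and whole-file replacement.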
Documented as §22.8 (post-publish correction) in
SECURITY_AUDIT.md; §22.4's "no findings"
claim for the fs.edit/append/ops batch surface has been amended
with a forward-pointer to §22.8. No CLI / INI / library API
changes; rebuild to pick up the fix.
Re-applied the standing audit on the ~5,000 LoC added since the 6th pass (2026-05-08). Three findings, all closed in this commit:
- **HIGH**: `run_capped_subprocess` banner sanitization. The
  `[bash] $ …` / `[python3] $ …` opening banner used to print the
  model-supplied command/code through `fprintf` verbatim, so a snippet
  that embedded an ANSI/OSC sequence could repaint the operator's
  terminal (window title, screen wipe, OSC 52 clipboard write) one line
  before any child output arrived. The live mirror channel was already
  hardened in §20.1; the banner is now sanitized the same way
  (CR/LF/TAB pass; ESC rendered as a visible `^[` marker; other C0/DEL
  dropped). For `python3` the banner now shows only the user's code; the
  25-line sandbox preamble used to be included, cluttering every transcript.
- **MEDIUM**: python3 sandbox preamble closure tightening. The
  preamble that wraps `open()` to pin disk access to the sandbox used to
  leave `_e_open_orig`, `_e_chk`, and `_e_root` at module scope, so user
  code could simply call the raw `_e_open_orig` by name and bypass the
  check; the comment claimed "closure cell" protection that the
  implementation didn't actually provide. It is now restructured into an
  `_e_make_wrappers` factory whose function-local names become real
  lexical closure cells: the wrappers still work, but the originals are
  no longer reachable from module scope. (Adversarial bypass via
  `ctypes` / `subprocess` / `_io.FileIO` is unchanged and still
  documented as out of scope.)
- **LOW**: installer INI-shape validation widened. §20.4 / §21.4
  already validated `--temperature`, `--top-p`, `--ctx-size`, etc. via
  `require_numeric` to defeat heredoc injection. This pass extends the
  integer roster (`--service-port`, `--threads`, `--threads-batch`,
  `--ngl`) and adds a `require_no_injection` helper that rejects
  `\n` / `\r` / `=` / `[` / `]` in the non-numeric knobs
  (`--service-host`, `--alias`, `--webui-title`, `--cache-type-k`,
  `--cache-type-v`). Same operator-typo / hostile-CI threat model as §20.4.
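
The banner rule from the HIGH finding (CR/LF/TAB pass, ESC shown as `^[`, other C0 and DEL dropped) fits in a few lines. `sanitize_for_terminal` is a hypothetical name, and the sketch assumes bytes at or above 0x80 pass through unchanged so UTF-8 text survives, which the finding does not state explicitly.

```cpp
#include <cassert>
#include <string>

// Makes model-supplied text safe to echo on an operator's terminal:
// CR, LF and TAB pass through; ESC becomes the visible marker "^[";
// every other C0 control byte (0x00-0x1F) and DEL (0x7F) is dropped.
std::string sanitize_for_terminal(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c == '\r' || c == '\n' || c == '\t') out += static_cast<char>(c);
        else if (c == 0x1B)                      out += "^[";
        else if (c < 0x20 || c == 0x7F)          continue;
        else                                     out += static_cast<char>(c);
    }
    return out;
}
```

An OSC title-change sequence such as `ESC ] 0 ; pwned BEL` is thereby rendered as the inert text `^[]0;pwned`, so the operator sees that something was attempted instead of having the terminal act on it.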
Full narrative in SECURITY_AUDIT.md §22.
Rebuild to pick up the fixes — no INI, CLI, or library API changes.
The CLI's prompt-eval indicator no longer animates. While the server
is ingesting the prompt the spinner shows a steady thinking[ N%]
in 256-colour grayscale 244 (mid-gray, RGB 128/128/128) — bright
enough to read on a dark terminal, dim enough to clearly signal "in
progress, not the model's output." Replaces the 10 Hz spotlight
sweep that landed in d7e7202. Drops the dual-cadence heartbeat —
the heartbeat now runs at one cadence (250 ms) and skips its
repaint entirely while the thinking label is up; only
set_thinking_pct() (driven by the server's easyai.prompt_progress
SSE event) triggers a redraw when the % suffix changes.
The tool result returned by `python3` now opens with a fenced
`python` code block carrying the snippet that just ran, followed
by a `[python3 executed]` notification line, then the exit code and
captured output. Chat UIs that render markdown (the embedded webui,
typical clients) display the code with syntax highlighting, so an
operator skimming the conversation transcript can see what executed
without having to expand the raw tool-call JSON.
The model's code argument is what gets rendered — the
kPythonSandboxPreamble (the disk-restriction monkey-patch) is
deliberately stripped from the displayed source so the transcript
isn't cluttered with the same 25 lines on every call.
Result shape:

````text
```python
<the snippet>
```
[python3 executed]
exit=0
<captured stdout+stderr>
````

Spawn-side errors (pipe / fork failure — the interpreter never
ran) still surface unwrapped, so the error message stays the
actual cause and isn't dressed up with a misleading "executed"
notice.
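
A sketch of a formatter producing that result shape. `format_python_result` is hypothetical, and the exact line breaks between the notification, the exit code, and the output are an assumption; the point is that only the model's `code` argument is rendered, never the sandbox preamble.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter for the python3 tool result. `code` is the model's
// snippet only; the sandbox preamble is never passed in, so it never shows up.
std::string format_python_result(const std::string & code, int exit_code,
                                 const std::string & output) {
    std::string r = "```python\n" + code;
    if (code.empty() || code.back() != '\n') r += '\n';   // closing fence on its own line
    r += "```\n[python3 executed]\nexit=" + std::to_string(exit_code) + "\n" + output;
    return r;
}
```

Spawn-side failures would bypass this formatter entirely and return the raw error, matching the behaviour described above.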
### 2026-05-09 — METRICS line: always on, default every 5 minutes
The periodic METRICS log line in `easyai-server` is now emitted
**unconditionally** — no longer gated on `--verbose`. Operators
need the CPU / mem / GPU / TCP-state / TIME_WAIT-pressure telemetry
in journalctl whether or not they're chasing a debug session.
* `metrics_interval` default raised from `1` second to `300`
seconds (5 minutes). Low-overhead enough to leave on permanently
in production; bump **down** (60, 30, 5) when actively
troubleshooting.
* The systemd installer's `easyai.ini` template was bumped from
`metrics_interval = 60` to `metrics_interval = 300` to match.
* `--verbose` no longer claims the METRICS line in its description
or banner — only the request-level `→` / `←` lines remain
verbose-only.
Existing operators who pinned `[SERVER] metrics_interval` in their
INI keep their value; only the unspecified default shifts.
### 2026-05-09 — `python3` is default-on with a sandboxed disk surface
Promoting `python3` from explicit-opt-in (--allow-python) to
auto-on whenever the operator has signalled "the model can touch
files" — same gate as `fs`: --sandbox set OR --allow-bash on. The
embedded webui inherits this for free since the systemd unit ships
with --sandbox /var/lib/easyai/workspace.
* **`--allow-python` removed; `--no-python` is the new opt-out.**
Mirrors `--no-web` / `--no-datetime`: the tool defaults on and
operators who don't want it pass the `--no-*` flag (or set
`[SERVER] allow_python = off` in the INI).
* **Disk access auto-restricted to the sandbox root.** Every
snippet is auto-prefixed with a short Python preamble that
monkey-patches `builtins.open`, `io.open`, and `os.open` to
reject any path resolving outside the cwd Python was chdir'd
into. `open("/etc/passwd")` raises `PermissionError`;
`pathlib.Path("/etc/hostname").read_text()` raises through
`pathlib`'s internal `open()` call.
* **Description rewritten to forbid disk use.** "USE FOR: testing,
calculation, data processing, networking, information gathering.
NEVER USE FOR DISK — every disk operation has a fs(action=...)
equivalent." The preamble is defense-in-depth; the description
is the primary contract.
* **Defense-in-depth, not a real sandbox.** The model can still
escape via `import ctypes; ctypes.CDLL("libc.so.6").open(...)`,
`subprocess.run(["cat", "/etc/passwd"])`, or `os.system(...)` —
the protection is against accident, not adversarial intent. Same
threat model as `bash`: explicit operator opt-in, not a real
sandbox.
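
The preamble's "resolves outside the sandbox root" test is written in Python, but the idea is language-neutral, and the preamble's own logic may differ from this. A C++17 sketch with `std::filesystem`, using a hypothetical `inside_sandbox` helper: resolve `..`, `.` and symlinks first, then compare path components rather than string prefixes, so a sibling directory such as `<root>-evil/x` is rejected too. Like the preamble, this is a path check, not a security boundary.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// True when `candidate` (absolute, or relative to `root`) resolves inside `root`.
// weakly_canonical() collapses "." and ".." and resolves symlinks on the part
// of the path that exists; comparing components afterwards avoids the
// string-prefix bug where "/srv/data-evil" "starts with" "/srv/data".
bool inside_sandbox(const fs::path & root, const fs::path & candidate) {
    const fs::path r = fs::weakly_canonical(root);
    const fs::path p = fs::weakly_canonical(candidate.is_absolute() ? candidate : r / candidate);
    const fs::path rel = p.lexically_relative(r);
    return !rel.empty() && *rel.begin() != fs::path("..");
}
```

`/etc/passwd` relative to a sandbox under `/tmp` becomes `../../etc/passwd`, whose first component is `..`, so the check fails; a path inside the root yields a relative path with no leading `..`.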
### 2026-05-09 — `python3` tool: isolated Python 3 snippet runner
A second shell-class executor alongside `bash`, gated by its own
`--allow-python` flag (off by default — same threat model as bash).
The model gets one extra tool when enabled:
* `python3(code, timeout_sec?)` — runs the snippet via
`python3 -I -S -E -c <code>`. Isolated mode: no `PYTHON*` env vars,
no `site.py` / no .pth files / no site-packages, no cwd on
`sys.path`. The standard library is available; `import requests`
fails with `ModuleNotFoundError`, by design — predictable behaviour
regardless of host Python configuration.
* Same hardening as `bash`: cwd pinned to `--sandbox`, fds 3+ closed
before exec, SIGTERM/SIGKILL deadline, 32 KB stdout+stderr cap,
optional operator-facing live mirror (default ON when `--allow-python`
is on; opt out with `--no-show-python`).
* Internally, `bash` and `python3` now share one `run_capped_subprocess`
helper — the fork/fd-close/chdir/drain/wait machinery only lives in
one place.
When to reach for `python3` vs `bash`: data manipulation (JSON, regex,
Decimal math, statistics, date arithmetic) is one Python snippet; shell
pipelines / build runners / git / package managers stay in `bash`.
`--allow-python` flag is wired through every binary (`easyai-cli`,
`easyai-local`, `easyai-server`, `easyai-mcp-server`) plus the INI
`[SERVER] allow_python` key. `EASYAI-*.tools` manifests cannot shadow
the new `python3` reserved name.
### 2026-05-09 — One tool per concept: unified `web`, unified `fs`, RAG `--split-rag` removed
A consolidation pass on the built-in tool surface. Three loose
collections (web, filesystem, rag) collapsed to one tool each, all
shaped the same way — single `Tool` with an `action` parameter and a
flat schema (every parameter optional except `action`). Pattern
mirrors the rag dispatcher introduced 2026-05-04.
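
The shared shape (one `Tool`, an `action` parameter, every other parameter optional and validated at runtime) can be sketched with a hypothetical `dispatch` helper over flat string arguments. It is illustrative only; the real tools receive JSON arguments through easyai's tool machinery.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

using Args   = std::map<std::string, std::string>;            // flat: no nested objects
using Action = std::function<std::string(const Args &)>;

// One tool, many sub-actions. `action` is the only required parameter; each
// sub-action validates the optional parameters it needs at runtime and returns
// a readable error the model can correct on its next call.
std::string dispatch(const std::map<std::string, Action> & actions, const Args & args) {
    const auto a = args.find("action");
    if (a == args.end()) return "error: missing required parameter 'action'";
    const auto it = actions.find(a->second);
    if (it == actions.end()) return "error: unknown action '" + a->second + "'";
    return it->second(args);
}
```

Keeping the schema flat means a weak tool caller only has to get one enum-like string right; everything else is checked, and explained, after the call arrives.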
* **`web` tool** — `web(action="search"|"fetch")`. Replaces the
separate `web_search`, `web_fetch`, and `web_google` tools. Search
takes an `engine` parameter (`"auto"` default — cascades through
google → brave → ddg-lite → bing → ddg, returning the first that
succeeds; explicit picks: `"google"` opt-in via `--use-google` plus
the GOOGLE_API_KEY / GOOGLE_CSE_ID env vars, `"brave"` keyless HTML
scrape with the best understanding of niche named entities,
`"ddg-lite"` keyless no-JS DDG endpoint with a Netscape UA (page 1
only — bypasses the anti-bot wall the modern DDG endpoint applies),
`"bing"` keyless RSS feed, `"ddg"` keyless HTML scrape but
increasingly blocked from server IPs). Both actions take `page` for
pagination; `fetch` takes `start` + `limit` for byte-window control.
* **`fs` tool** — `fs(action="read"|"write"|"list"|"glob"|"grep"|"check_path"|"cwd"|"sandbox")`.
Replaces seven separate factories plus `get_current_dir` and
`get_sandbox_path`. `--allow-fs` now registers one tool, not seven.
* **`--split-rag` removed.** The legacy seven `rag_*` tools and the
`--split-rag` flag are gone everywhere — CLI, INI, examples, all
four binaries. The single `rag(action=...)` dispatcher (default
since 2026-05-04) is the only RAG layout. On-disk format unchanged.
* **Public-API breakage.** Anyone consuming `libeasyai` directly: the
individual `easyai::tools::web_search()` / `web_fetch()` /
`web_google()` / `fs_read_file()` / `fs_write_file()` / `fs_list_dir()`
/ `fs_glob()` / `fs_grep()` / `fs_check_path()` / `get_current_dir()`
/ `get_sandbox_path()` / `make_rag_tools()` / `RagTools` factories
are removed. Switch to `easyai::tools::web(google_enabled)`,
`easyai::tools::fs(root)`, and `easyai::tools::make_rag_tool(root)`.
* **Why.** Three matching surfaces with the same shape make the
catalogue smaller (one entry per capability instead of nine), tool
prose can use one consolidated description style across all three,
and the model reasons about each capability as ONE thing with sub-
actions. The flat-schema-with-runtime-validation choice is the
same one the unified rag tool already validated against weak /
1-bit-quant tool callers.
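
The `engine="auto"` cascade (google → brave → ddg-lite → bing → ddg, first success wins) reduces to a short loop. A sketch with hypothetical `Backend` and `search_auto` names, where a backend signals failure by returning `std::nullopt` and an empty function stands for a backend that is not enabled (for example `google` without `--use-google`). What counts as failure for the real scrapers is not specified here.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Backend {
    std::string name;                                                       // e.g. "brave"
    std::function<std::optional<std::string>(const std::string &)> search;  // nullopt = failed
};

// engine="auto": try the backends in cascade order and return the first
// success, tagged with the name of the backend that produced it.
std::optional<std::pair<std::string, std::string>>
search_auto(const std::vector<Backend> & cascade, const std::string & query) {
    for (const auto & b : cascade) {
        if (!b.search) continue;                                  // backend not enabled
        if (auto hit = b.search(query)) return std::make_pair(b.name, *hit);
    }
    return std::nullopt;                                          // every backend failed
}
```

Returning the backend name alongside the results lets the caller report which engine actually answered, which helps when one of the keyless scrapers starts getting blocked.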
### 2026-05-08 — Server observability + connection-pool fix + prompt cleanup
Driven by a real production failure: an agentic session hung mid-stream,
the cli retried six times, and we had no visibility into what the
TCP stack was doing on the server. Fixes landed across the cli's
HTTP transport, the server's verbose logging, the system prompts,
and the build.
* **Cli keep-alive bug fixed (the actual root cause).**
`stream_chat()` / `simple_get()` / `simple_post()` were each
constructing a fresh `httplib::Client` per call. The Client's
TCP socket dropped at function end, so `set_keep_alive(true)` had
nothing to keep alive — every agentic hop opened a new connection.
An N-tool-call session piled up N sockets in `TIME_WAIT`,
eventually exhausting the client's ephemeral port range or
per-process fd ceiling. **Hoisted a single persistent `httplib::Client`
onto the `Impl` struct; all three call sites now reuse it.** ONE
TCP connection per session instead of N. Cancellation and
server-restart paths are preserved (cpp-httplib reconnects
internally on dead-socket errors).
* **Server: HTTP-level `→` / `←` log per request (verbose mode).**
`set_pre_routing_handler` + `set_logger` emit arrival and
completion lines with method/path/peer/body size, status,
duration, response bytes (or `streamed` for SSE), and running
totals (req / err / tools / in_flight / bytes_in / bytes_out).
* **Server: periodic `METRICS` line with TCP state breakdown.**
Background ticker every `metrics_interval` seconds
(`--metrics-interval N` or `[SERVER] metrics_interval` to tune,
`0` disables — **default raised to 300 / always-on as of
2026-05-09**, see entry above) emits one
line with: CPU% + iowait%, load 1/5/15, process RSS + peak,
system memory total/used/%, AMD GTT used/total/% (Linux + AMD
only), in-flight requests, cumulative requests / errors / bytes,
fd usage vs RLIMIT_NOFILE, AND an explicit TCP state breakdown
(ESTABLISHED / TIME_WAIT / CLOSE_WAIT / FIN_WAIT / LISTEN)
parsed from `/proc/net/tcp{,6}` with
`TIME_WAIT N/M ephemeral ports (X.X% [elevated|HIGH|CRITICAL])`
so socket exhaustion shows up in `journalctl` long before
connections start failing. Linux-only for the deep metrics;
macOS prints `n/a` and the server runs fine — easyai-server's
deploy target is Linux.
* **Tool dispatch timing in every visible log.** Engine wraps
`tool->handler()` with `steady_clock` and writes `duration_ms`
into `ToolResult`. CLI shows `🔧 web_search (412ms)({"query":...})`
and the webui's reasoning panel shows the same. The
`easyai.tool_result` SSE event also gains a `duration_ms` field
so future external SSE consumers can render their own timing UI.
* **`allow_fs = off` in the INI is now honoured.** The server read
the flag but never propagated it to the toolbelt — a non-empty
`[SERVER] sandbox` re-enabled `fs_*` regardless. Default install
ships `allow_fs = off` + `sandbox = /var/lib/easyai/workspace`,
which hit exactly this. Now `allow_fs` and `allow_bash` are
honoured independently of `sandbox`. **Behaviour change:**
`--sandbox /foo` alone NO LONGER implies `--allow-fs`; pass
`--allow-fs` explicitly to register fs_*.
* **Built-in system prompt is tool-aware.** The hardcoded prompt
used to list `fs_*` / `bash` / `plan` / host-metric tools by name
whether or not they were registered. Models hallucinated calls to
unregistered tools (especially `bash` after the `allow_fs` fix
above). The `Tool notes:` section is now built dynamically:
each bullet is gated on the same flag that controls registration,
and the entries for tools the server NEVER registers (`plan`,
host metrics) are removed entirely. Same fix in
easyai-local's built-in prompt.
* **RAG tool descriptions spell out "model-only store".** Added a
`PRIVATE — MODEL-ONLY STORE` paragraph to `rag_save` / `rag_append`
/ the unified `rag` dispatcher, telling the model that the user
has no UI / command / API to read what's saved there. Forbids
`"check the rag for the code"` / `"I saved it to memory"` answers
and tells the model to `rag_load` and put the body inline when
the user asks for stored content.
* **Stay-in-scope replaces "PROTOTYPE FIRST".** The old 1./2./3.
ritual ("build → verify → ASK which next step") was making the
agent stop after step 1 and ask, even when the user wanted the
simplest end-to-end thing. Collapsed to a single
`## Stay strictly in scope` paragraph that keeps the no-extras /
no-defensive-scaffolding / no-while-I'm-at-it-cleanups specifics
and drops the build-then-ask dance. Updated everywhere the
wording lived: server.cpp built-in prompt, local.cpp built-in
prompt, cli.cpp [guidance] block, installer's
`/etc/easyai/system.txt` template.
* **Installer GTT default 28 → 29 GiB.** `gtt_gb=29` in
`scripts/install_easyai_server.sh`. Matches `ttm.pages_limit=7602176`.
Leaves headroom for a Q5_K_M / MXFP4_MOE 30B MoE plus a 32k KV
cache fully on the iGPU.
* **Quick-start editor section added to `LINUX_SERVER.md`.** New
section 0 with copy-paste shell snippets for VSCode + Continue.dev,
OpenCode, and VSCode + Cline, all pointing at `http://ai.local:80/v1`.
Plus a quick-reference table for other OpenAI-compatible clients.
* **No patches or derivatives of llama.cpp.** A short-lived
experiment subclassed `httplib::Server` to log per-TCP-connection
accept/close events — that needed widening the access on a
private virtual in the vendored cpp-httplib header. Backed out
entirely: no CMake patch script, no `#define private protected`
trick, no derivative copies. The HTTP `→`/`←` lines and the
periodic METRICS line (with system-wide TCP state breakdown
including TIME_WAIT pressure) cover the same diagnostic ground
using only public APIs and `/proc`.
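The TCP breakdown in the `METRICS` line comes from the `st` column of `/proc/net/tcp{,6}`. That column holds a hex state code defined by the kernel: `01` ESTABLISHED, `04`/`05` FIN_WAIT1/2, `06` TIME_WAIT, `08` CLOSE_WAIT, `0A` LISTEN. Below is a minimal C++17 sketch of that parse, assuming the standard procfs column layout. The function names and the `elevated` / `HIGH` / `CRITICAL` percentage cut-offs are hypothetical and are not easyai's actual thresholds.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

struct TcpStates {
    int established = 0, time_wait = 0, close_wait = 0, fin_wait = 0, listen = 0;
};

// Tallies the "st" column (4th whitespace-separated field, hex) of
// /proc/net/tcp or /proc/net/tcp6.
void tally_tcp_states(std::istream& in, TcpStates& s) {
    std::string line;
    std::getline(in, line);  // skip the header row
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string slot, local, remote, st;
        if (!(fields >> slot >> local >> remote >> st)) continue;
        switch (std::strtol(st.c_str(), nullptr, 16)) {
            case 0x01: ++s.established; break;
            case 0x04: case 0x05: ++s.fin_wait; break;  // FIN_WAIT1 / FIN_WAIT2
            case 0x06: ++s.time_wait; break;
            case 0x08: ++s.close_wait; break;
            case 0x0A: ++s.listen; break;
            default: break;                             // SYN_*, LAST_ACK, CLOSING, ...
        }
    }
}

// Hypothetical cut-offs for the TIME_WAIT pressure label.
std::string time_wait_label(int time_wait, int ephemeral_ports) {
    const double pct = ephemeral_ports > 0 ? 100.0 * time_wait / ephemeral_ports : 0.0;
    if (pct >= 80.0) return "CRITICAL";
    if (pct >= 50.0) return "HIGH";
    if (pct >= 25.0) return "elevated";
    return "ok";
}
```

On a live box, you would pass a `std::ifstream` opened on `/proc/net/tcp`, then again on `/proc/net/tcp6`. The ephemeral-port count is the span of the two numbers in `/proc/sys/net/ipv4/ip_local_port_range`. The Linux default is 32768 to 60999, which gives 28,232 ports.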
### 2026-05-08 — `tool_lookup` builtin + tool-discipline rule
Builds on the same-day "Built-in system prompt is tool-aware" work
above with a complementary affordance: the model gets a runtime
introspection tool so it can verify what's wired up before
dispatching, and an authoritative discipline rule that points at
that tool. Driven by the same failure mode the prompt-cleanup
addressed (`write` / `read` / `ls` etc. invented by the model);
this layer makes the closure explicit and gives the model a
recovery path when it's uncertain.
* **New `tool_lookup` builtin.** Read-only introspection over the
agent's live tool registry. Call it with no args to get a numbered
catalogue of every registered tool (1..N), or pass
`name="<substring>"` to filter — case-insensitive, partial match.
Output is plain numbered text the model parses naturally; only
active tools are returned. Wired into every binary
(`easyai-cli`, `easyai-server`, `easyai-mcp-server`, `easyai-local`,
`easyai-agent`, `easyai-recipes`) and the `LocalBackend` library
wrapper. Always registered last so its snapshot covers every
other tool, including itself. Public C++ API:
`easyai::tools::tool_lookup(getter)` where `getter` is a callable
returning `std::vector<std::pair<std::string,std::string>>` of
(name, description) pairs.
* **Authoritative `[tools]` / "Tool discipline" prompt block.**
Layered on top of the closed-set rule from the prompt-cleanup
commit: *"This catalogue is the SINGLE SOURCE OF TRUTH; training
data is NOT; if a name isn't in this list IT DOES NOT EXIST;
call `tool_lookup` first when uncertain; do not retry an
unknown-tool call."* Common hallucinated names called out by
example: `write`, `read`, `ls`, `cat`, `curl`, `python`, `sed`,
`grep`, `find`, `mkdir`. Same wording in `easyai-cli` (the
`[tools]` block injected into the dynamic prefix), `easyai-server`
and `easyai-local` (the `## Tool discipline` section in their
`kBuiltinSystem` strings).
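The lookup behaviour described above fits in a few lines: a numbered list 1..N, an optional case-insensitive partial match on the name, and a snapshot taken from a getter. This is only a sketch. `render_tool_lookup` and its output wording are hypothetical, and the real `easyai::tools::tool_lookup(getter)` wraps its logic in a registered tool. Only the getter's return type is taken from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Shape of the getter's result, per the text: (name, description) pairs.
using ToolList = std::vector<std::pair<std::string, std::string>>;
using Getter = std::function<ToolList()>;

static std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Numbered catalogue (1..N) of the live registry, optionally filtered by a
// case-insensitive substring of the tool name. Output wording is hypothetical.
std::string render_tool_lookup(const Getter& getter, const std::string& name_filter = "") {
    const std::string needle = lower(name_filter);
    std::string out;
    int n = 0;
    for (const auto& [name, desc] : getter()) {  // fresh snapshot on every call
        if (!needle.empty() && lower(name).find(needle) == std::string::npos) continue;
        out += std::to_string(++n) + ". " + name + " - " + desc + "\n";
    }
    if (n == 0) return "No registered tool matches \"" + name_filter + "\".\n";
    return out;
}
```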
### 2026-05-08 — Fifth-pass security hardening (no behaviour change)
A fresh static review of the ~5,000 lines that landed in the last 30
commits. Two HIGH, three MEDIUM, two LOW findings — all closed in
this commit; every public interface (CLI flags, tool names, library
headers, INI keys) is unchanged.
* **bash live-mirror is now control-byte stripped and byte-capped.**
When the model calls `bash`, the merged stdout/stderr was being
mirrored verbatim to the operator's terminal. A model could emit
`\e]0;HACKED\a` to retitle the operator's window or `\e[2J` to wipe
the screen — neither showed up in the model-facing tool result.
Now: ESC is rendered as a visible `^[`, all other C0 controls are
dropped, and the mirror channel is capped at 128 KiB (model still
gets the full 32 KiB it always did). Set `[cli] show_bash = false`
or `--no-show-bash` to silence the mirror entirely.
* **`plan` tool render strips control bytes from item text.** Same
hijack class, narrower budget — a `plan add` with embedded `\e[…`
no longer reaches the operator's terminal raw.
* **`get_array` parser now caps stringified-array recursion depth.**
Tool-args parsing tolerates `"items": "[…]"` (the array escaped
into a JSON string — small models double-escape sometimes). The
unwrap path was recursive without a depth cap; a hostile model
emitting deeply-nested escapes blew the stack. Capped at depth 4
(legitimate cases stay under depth 2).
* **`get_sandbox_path` now uses `fs::weakly_canonical`.** Was using
`realpath()` with a "fall back to the unresolved input" branch
that could leak relative-path shape into the model on transient
errors. The impact was cosmetic, but the result is now correct and
matches the canonicalisation the sandbox containment check uses.
* **`--mcp <url>` rejects non-`http(s)://` schemes up front.** The
libcurl protocol filter still blocks `file://`, `gopher://` etc.
at transport time, but the operator now gets a clear error
instead of a curl diagnostic, and embedders using
`easyai::mcp::fetch_remote_tools` get the same defence-in-depth.
* **Installer validates numeric sampling/timeout flags.**
`--temperature`, `--top-p`, `--top-k`, `--min-p`,
`--repeat-penalty`, `--max-tokens`, `--http-timeout`, `--ctx-size`
must match `^-?[0-9]+(\.[0-9]+)?$` before they flow into the INI
via heredoc. Closes a defence-in-depth gap where a crafted value
containing `\n` could inject extra INI keys.
* **`/etc/easyai/easyai.ini.bak` (created by `--force`) gets
explicit `chmod 640` and `chown root:easyai`.** Previously
inherited whatever the live INI had; matches the new file's
posture so a token leak via a backup with looser perms is
impossible.
Full write-up: [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §0 (operator
TL;DR) and §20 (this pass's findings). Read §0 if you operate easyai
in production — it's the 60-second summary of what easyai does and
doesn't protect for you.
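Three of the input checks in this pass are small enough to sketch together in C++17. The assumptions are flagged again in the code comments:

- The terminal filter keeps `\n` and `\t`, although the text says all other C0 controls are dropped.
- URL schemes are compared case-insensitively.
- The function names are hypothetical.

The installer itself is a shell script. `is_plain_number` only replays its regex in C++ to show why a value with an embedded newline is rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// (1) bash live-mirror filter. ESC is shown as a visible "^[", other C0 controls
// are dropped, and `budget` (e.g. 128 KiB) caps the total bytes mirrored across chunks.
// Assumption: '\n' and '\t' are kept so multi-line output stays readable.
std::string sanitize_for_terminal(const std::string& in, std::size_t& budget) {
    std::string out;
    for (unsigned char c : in) {
        if (budget == 0) break;
        if (c == 0x1B) {
            if (budget < 2) { budget = 0; break; }
            out += "^[";
            budget -= 2;
        } else if (c < 0x20 && c != '\n' && c != '\t') {
            continue;  // BEL, CR, BS, ... never reach the terminal
        } else {
            out += static_cast<char>(c);
            --budget;
        }
    }
    return out;
}

// (2) --mcp URL check: only http:// or https:// (scheme matched case-insensitively).
bool is_http_url(const std::string& url) {
    const auto sep = url.find("://");
    if (sep == std::string::npos) return false;
    std::string scheme = url.substr(0, sep);
    std::transform(scheme.begin(), scheme.end(), scheme.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return (scheme == "http" || scheme == "https") && url.size() > sep + 3;
}

// (3) The installer's numeric pattern. regex_match must consume the whole value,
// so a value carrying an embedded newline (the INI-injection vector) never matches.
bool is_plain_number(const std::string& v) {
    static const std::regex re(R"(^-?[0-9]+(\.[0-9]+)?$)");
    return std::regex_match(v, re);
}
```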
### 2026-05-05 — Tool surface + system prompt overhaul
Driven by a production "models drift, use bash for file work, ignore
tools" report. The fix landed across the tool descriptions, the
default prompts, and the CLI flag wiring at once.
* **`--sandbox` and `--allow-bash` now imply `fs_*`.** The previous
matrix had operators passing `--allow-bash --sandbox DIR` and ending
up with bash but no file tools — so the model fell back to
`cat > file` / `cat <<EOF` / `sed -i` for everything. Bash is
strictly more permissive than `fs_*`, so demanding an extra flag
for the less powerful file tools was backwards. Both flags now
register the full file set (and the
new `get_sandbox_path` companion) at once. `--allow-fs` still works
for the no-sandbox / no-bash case; otherwise it's redundant.
* **New `get_sandbox_path` tool.** Returns the absolute path of the
sandbox root, pinned at registration time — distinct from
`get_current_dir` which is the live process cwd and can drift.
Lets the model resolve where its work actually lands without a
wasted `pwd` tool hop.
* **`bash` description rewritten.** Now leads with **PREFER fs tools**
and lists the exact bash anti-patterns (`cat > file`, `cat <<EOF`,
`echo > file`, `mkdir`, `sed -i`) with the dedicated tool that
replaces each. Reserves bash for shell features the dedicated
tools don't have — pipelines, `find | xargs`, build runners
(make / cmake / cargo / npm), git, package managers, sed/awk for
in-place edits.
* **System prompts inject `[environment]` + `[guidance]`.** When
any create/mutate affordance is registered (fs_* / bash / plan),
the cli prepends two short blocks to the user's `--system` content:
the absolute sandbox path (saves a "where am I" tool hop on turn 1)
and a stay-in-scope behavioural rule (build EXACTLY what the user
asked — no extras, no defensive scaffolding, no "while I'm at it"
cleanups). The same guidance lives in the server's Deep persona
and easyai-local's built-in prompt.
* **Default sampling preset → `precise`** (was `balanced`).
Temp 0.2, top_p 0.95, top_k 40, min_p 0.10. Tuned for code,
math, and factual Q&A — the dominant use case for a tool-calling
agent. Flipped across server, local, cli, webui, library
fallbacks, and the systemd installer's INI templates. README's
preset table now includes a Behaviour column and a "Pick when…"
column to make the choice explicit.
* **`--show-system-prompt`** added to all four binaries
(`easyai-cli`, `easyai-server`, `easyai-local`, `easyai-chat`).
Resolves the system prompt the binary would actually use (built-in
default → `--system-file` → `--system`, plus the cli's injected
blocks), prints, exits. No model load, no port bind, no network.
Useful for confirming the persona before bouncing a service.
* **Graceful `Ctrl-C` in `easyai-cli`.** In interactive mode (no
`--quiet`), the first `Ctrl-C` mid-turn prints
`<exiting: waiting for the ai session to be finished. Ctrl-C
again to force.>` and lets the in-flight chat finish naturally
(rc=0). Conversation isn't truncated mid-stream. Second `Ctrl-C`
is the hard-cancel escape hatch (rc=130). `--quiet` keeps the
existing immediate-cancel for batch scripts.
* **Plan tool tolerance shims.** `args::get_array` now accepts a
stringified JSON array (`"items": "[...]"`) — small/quantised
models repeatedly emit this shape. The handler infers a missing
`action` from the items' fields plus current plan state, and
maps common synonyms (`create` → `add`, `remove` → `delete`,
etc.). `add` honours an optional per-item `status` so create +
mark "working" lands in one call. Errors include the correct
shape inline so the model can copy-fix.
* **Plan re-renders coalesce.** A new `Plan::Batch` RAII guard
collapses N per-item `on_change` callbacks across one tool call
into a single fire — the UI's "── plan ──" block now prints once
per batch, not once per item.
* **New doc: [`easyai-cli.md`](easyai-cli.md)** mirrors
`easyai-server.md`. 14 sections covering connection, modes, full
flag reference, tool registration, system prompt + injection,
sampling, reasoning streams, the raw transaction log, RAG,
external tools, management subcommands, worked examples,
cross-references.
* **Tool authoring guide.** New `design.md §5 Writing tool
descriptions reliably` (architectural) and `manual.md §3.2.1`
(cookbook) document the rag-style multi-action pattern, the
per-`.param()` "Used by add / update / …" idiom, and the
lenient-handler tolerance shims. `AI_TOOLS.md` Chapter 9 has a
pointer.
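This sketch combines two of the plan-tool changes above: synonym mapping and `Plan::Batch`. The `Plan` class here is a hypothetical reduction of `easyai::Plan`, with only `add` and an `on_change` callback. Only the `create` → `add` and `remove` → `delete` mappings come from the text, and the real guard's interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical Plan; easyai::Plan's real interface may differ.
class Plan {
public:
    std::function<void()> on_change;  // e.g. re-render the "── plan ──" block

    // RAII guard: changes made while any Batch is alive collapse into one fire.
    class Batch {
    public:
        explicit Batch(Plan& p) : p_(p) { ++p_.depth_; }
        ~Batch() {
            if (--p_.depth_ == 0 && p_.dirty_) { p_.dirty_ = false; p_.fire(); }
        }
        Batch(const Batch&) = delete;
        Batch& operator=(const Batch&) = delete;
    private:
        Plan& p_;
    };

    void add(std::string text) { items_.push_back(std::move(text)); changed(); }
    std::size_t size() const { return items_.size(); }

private:
    void changed() { if (depth_ > 0) dirty_ = true; else fire(); }
    void fire() { if (on_change) on_change(); }

    std::vector<std::string> items_;
    int depth_ = 0;
    bool dirty_ = false;
};

// Normalises action names weak models emit. create->add and remove->delete are
// from the changelog; anything unknown passes through unchanged.
std::string canonical_plan_action(const std::string& action) {
    static const std::unordered_map<std::string, std::string> synonyms = {
        {"create", "add"}, {"remove", "delete"}};
    const auto it = synonyms.find(action);
    return it == synonyms.end() ? action : it->second;
}

// One tool call that adds several items fires on_change exactly once.
void plan_tool_add(Plan& plan, const std::vector<std::string>& items) {
    Plan::Batch batch(plan);
    for (const auto& text : items) plan.add(text);
}
```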
### 2026-05-04 — Single-tool RAG is now the default; concise system prompt
* **Default RAG layout flipped: one `rag(action=...)` tool.** The
unified single-tool dispatcher used to be opt-in behind
`--experimental-rag`; it is now the default for every binary
(`easyai-server`, `easyai-cli`, `easyai-local`, `easyai-mcp-server`).
One catalog entry instead of seven keeps the model's tool list
short and saves a few hundred tokens per turn. On-disk format,
locking, and fix-memory rules are unchanged.
* **`--split-rag` opts back into the legacy seven `rag_*` tools.**
Replaces `--experimental-rag`. Same semantics, opposite default.
Wired as a CLI flag on every binary AND as `[SERVER] split_rag`
in the INI overlay (`easyai.ini` / `easyai-mcp.ini`). Useful for
weak / 1-bit-quant tool callers (Bonsai-class) that handle many
flat schemas more reliably than one discriminated schema.
* **Default system prompts trimmed.** `easyai-server` and
`easyai-local` now ship a much shorter built-in prompt focused on
a tight **plan → act → iterate** loop with one small concrete
next step at a time, finishing as soon as the answer is useful so
the user has room to refine. Cuts about three quarters of the old
prompt's length while keeping the no-announce-without-call rule
and the search → fetch discipline.
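The single-tool layout routes one `rag` catalogue entry to per-action handlers, keyed on the `action` argument. Here is a minimal sketch. It assumes arguments arrive already parsed into a flat string map, whereas the real tool receives JSON. `Args`, `ActionHandler`, and `dispatch_rag` are hypothetical names. Echoing the valid actions back on an unknown action is also an assumption here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical argument shape: the real tool receives parsed JSON.
using Args = std::map<std::string, std::string>;
using ActionHandler = std::function<std::string(const Args&)>;

// One catalogue entry, many sub-actions: route on args["action"].
std::string dispatch_rag(const std::map<std::string, ActionHandler>& actions, const Args& args) {
    const auto action = args.find("action");
    if (action == args.end()) return "error: missing required argument 'action'";
    const auto handler = actions.find(action->second);
    if (handler != actions.end()) return handler->second(args);
    // Unknown action: list the valid ones so the model can correct its next call.
    std::string valid;
    for (const auto& entry : actions) valid += (valid.empty() ? "" : ", ") + entry.first;
    return "error: unknown action '" + action->second + "'; valid: " + valid;
}
```

The split layout enabled by `--split-rag` is the same handler table, registered as separate tools instead of behind one dispatcher.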
### 2026-05-02 (later) — RAG `rag_append` + user-focus prompts
* **`rag_append` — new RAG tool.** Adds new content to the end of
an existing memory without losing the previous body. Read-modify-
write under one `unique_lock` on the store's `shared_mutex`, so
concurrent appenders queue cleanly (no lost appendix, no torn
merge for any reader); on disk the new content is separated from
the old by a Markdown horizontal rule (`---`) so the operator
reading the `.md` file sees exactly where each appendix starts.
Refuses on titles that don't exist (use `rag_save`), on fixed
memories (`fix-easyai-*`), and when the merged size would exceed
256 KiB. Optional `keywords[]` parameter merges into the existing
keyword list (deduped, capped at 8). Wired into every consumer
(server, MCP server, CLI, local backend) and the experimental
single-tool dispatcher (`rag(action="append", ...)`). Full doc:
[`RAG.md`](RAG.md) §4.
* **User-focus prompt update.** `rag_save` and `rag_append` tool
descriptions now explicitly tell the model to prioritise notes
about the user themselves — name, role, hardware, projects,
working style, corrections, likes, dislikes — and to grow that
memory across sessions with `rag_append` instead of rewriting it
with `rag_save`. The next conversation (tomorrow, three months
from now) starts with the user already known, so they don't have
to explain themselves twice. The lib went from 5/6 RAG tools to
the canonical seven (`rag_save`, `rag_append`, `rag_search`,
`rag_load`, `rag_list`, `rag_delete`, `rag_keywords`); all CLI
help text, help comments, and docs updated to match.
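The `rag_append` rules above map onto a small in-memory sketch:

- refuse unknown titles and `fix-easyai-*` titles;
- separate old and new content with `---`;
- cap the merged size at 256 KiB;
- dedupe merged keywords and cap them at 8;
- run the whole read-modify-write under one `unique_lock`, while readers take `shared_lock`.

`MemoryStore` and `Entry` are hypothetical stand-ins. The real store writes one Markdown file per entry, and the exact blank lines around `---` are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

struct Entry {
    std::string body;
    std::vector<std::string> keywords;
};

class MemoryStore {
public:
    static constexpr std::size_t kMaxBody = 256 * 1024;
    static constexpr std::size_t kMaxKeywords = 8;

    void save(const std::string& title, Entry e) {
        std::unique_lock lock(mu_);
        entries_[title] = std::move(e);
    }

    std::string load(const std::string& title) const {
        std::shared_lock lock(mu_);  // readers proceed in parallel
        const auto it = entries_.find(title);
        return it == entries_.end() ? std::string() : it->second.body;
    }

    std::vector<std::string> keywords(const std::string& title) const {
        std::shared_lock lock(mu_);
        const auto it = entries_.find(title);
        return it == entries_.end() ? std::vector<std::string>{} : it->second.keywords;
    }

    // Returns "" on success, otherwise a refusal message for the model.
    std::string append(const std::string& title, const std::string& extra,
                       const std::vector<std::string>& new_keywords) {
        if (title.rfind("fix-easyai-", 0) == 0) return "refused: fixed memory";
        std::unique_lock lock(mu_);  // whole read-modify-write is one critical section
        const auto it = entries_.find(title);
        if (it == entries_.end()) return "refused: no such memory (use save)";
        std::string merged = it->second.body + "\n\n---\n\n" + extra;
        if (merged.size() > kMaxBody) return "refused: merged body exceeds 256 KiB";
        it->second.body = std::move(merged);
        auto& kw = it->second.keywords;
        for (const auto& k : new_keywords) {
            if (kw.size() >= kMaxKeywords) break;
            if (std::find(kw.begin(), kw.end(), k) == kw.end()) kw.push_back(k);
        }
        return "";
    }

private:
    mutable std::shared_mutex mu_;
    std::map<std::string, Entry> entries_;
};
```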
### 2026-05-02 — Fourth-pass security audit + readability batch
* **`/tmp` log file hardened (security, MEDIUM).** The auto-generated
raw transaction log at `/tmp/easyai-<pid>-<epoch>.log` is now
created with `O_EXCL | O_NOFOLLOW | O_CLOEXEC` and mode `0600`. The
predictable path used to follow symlinks on `fopen("w")`, so a
local attacker on a multi-tenant host could plant a symlink
pointing at any user-writable file (`~/.bashrc`, `~/.ssh/…`) and
have the next `easyai-*` process truncate-and-overwrite it.
The resulting `0644` mode (set by the process umask) also exposed
prompts, which can contain API keys or PII, to other accounts on the same box.
`O_EXCL` makes the create atomic-or-fail and `0600` keeps logs
private. Caller-supplied paths (`--log-file PATH`) keep `O_TRUNC`
for log rotation but still gain `O_NOFOLLOW + 0600`. Full
write-up in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §19.
* **Internal readability batch (no public API change).** Three
inline patterns were lifted into named helpers so the call sites
read top-to-bottom: `file_mtime_unix()` (replaces three copies of
the C++17 file_clock→system_clock idiom in `rag_tools.cpp`),
`glob_to_regex()` + `kGlobRegexMetachars` (lifts the wildcard
state machine out of `fs_glob` in `builtin_tools.cpp`), and
`looks_like_announce_phrase()` (lifts the 30-line retry predicate
out of `Engine::chat_continue` in `engine.cpp`, where it was
used twice). All seven binaries build clean.
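Here is the log-file flag set described above as a POSIX sketch. `open_private_log` is a hypothetical helper. The explicit `fchmod` for caller-supplied paths is an assumption about how an existing file ends up at `0600`, since `open()` applies its mode argument only when it creates the file. The umask can only narrow `0600`, never widen it.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>     // open, O_* flags
#include <sys/stat.h>  // fchmod
#include <unistd.h>    // close, unlink, getpid

// Auto-generated path: O_EXCL makes creation atomic-or-fail, so a pre-planted file
// or symlink at the predictable name is never reused. Caller-supplied path
// (--log-file): truncate for rotation, but still refuse a symlink as the final component.
int open_private_log(const std::string& path, bool caller_supplied) {
    int flags = O_WRONLY | O_CREAT | O_NOFOLLOW | O_CLOEXEC;
    flags |= caller_supplied ? O_TRUNC : O_EXCL;
    const int fd = ::open(path.c_str(), flags, 0600);  // -1 and errno on failure
    // Assumption: tighten a pre-existing caller file, whose mode open() leaves alone.
    if (fd >= 0 && caller_supplied) (void)::fchmod(fd, 0600);
    return fd;
}
```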
### 2026-05-01 — MCP CLIENT, RAG memory framing, web_google, macOS installer fix
* **`easyai-server` is now also an MCP client.** Pass `--mcp <url>`
(and `--mcp-token <token>` if needed) and at startup the server
connects to the upstream's `/mcp`, runs `tools/list`, and merges
the catalogue into its own. Each remote tool's handler proxies
`tools/call` over HTTP. Local tool names win on collision. The
implementation is `easyai::mcp::fetch_remote_tools()` in libeasyai
— public API, so anything built on the engine library can stack
remote MCP catalogues. See [`MCP.md`](MCP.md) §9.5.
* **`--no-tools` renamed to `--no-local-tools` (server only).** Now
that the server can be both an MCP server AND an MCP client, the
flag's scope had to be unambiguous: it disables only the LOCAL
built-in toolbelt. RAG, external tools, and tools fetched via
`--mcp` are unaffected. INI key `load_tools` → `local_tools` to
match. The `easyai-local` and `easyai-mcp-server` binaries keep
their `--no-tools` spelling — they have no MCP client, so the
original name is still accurate.
* **RAG reframed as memory + fixed memories.** Tool descriptions
rewritten in memory verbs (search / store / recall / update /
forget). New `fix=true` argument on `rag_save` mints an immutable
memory: title is auto-prefixed with `fix-easyai-`, and from then
on `rag_save` refuses to overwrite it and `rag_delete` refuses to
remove it. Use this to seed system designs, hard rules, ground-
truth definitions the model must not rewrite. Search / load /
list output gain a human-readable `modified` date and a `[FIXED]`
/ `fixed: yes/no` marker. See [`RAG.md`](RAG.md).
* **Single-tool RAG dispatcher is the default.** One
`rag(action=...)` tool exposes save / append / search / load /
list / delete / keywords as sub-actions. Same store, same
handlers, same on-disk format. Saves a few hundred catalog tokens
per turn and keeps the model's tool list short. Pass `--split-rag`
(or `[SERVER] split_rag = on` in the INI) to opt back into the
legacy seven separate `rag_*` tools — useful for weak / 1-bit-
quant tool callers (Bonsai-class) that handle many flat schemas
more reliably than one discriminated schema.
* **`web_google` builtin.** Google Custom Search JSON API. Gated by
`--use-google` (also `[SERVER] use_google`). Reads
`GOOGLE_API_KEY` and `GOOGLE_CSE_ID` from env at call time so a
rotation doesn't drop the tool. Free tier is 100 queries/day.
* **macOS installer fix: OpenSSL via brew.** Modern macOS no longer
ships usable libssl in `/usr/lib`, so `find_package(OpenSSL)`
half-detected and broke configure for both `easyai_cli` and the
vendored `cpp-httplib`. The installer + `build_macos.sh` now pass
`-DOPENSSL_ROOT_DIR=$(brew --prefix openssl@3)` and the cmake
guards `TARGET OpenSSL::SSL` so a half-detected OpenSSL degrades
to "HTTPS not in this build" instead of erroring out.
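The `--mcp` collision rule ("local tool names win") can be sketched as a set-based merge. This assumes a simplified `ToolEntry` record and a hypothetical `merge_remote_tools` helper. It is not the signature of `easyai::mcp::fetch_remote_tools()`, which also performs the `tools/list` round-trip and installs the proxying handlers.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical record: the real catalogue holds full tool objects.
struct ToolEntry {
    std::string name;
    std::string origin;  // "local" or the upstream MCP URL
};

// Appends remote tools whose names are free. On a collision the tool already in
// `catalogue` (local) wins. Returns the skipped remote names so they can be logged.
std::vector<std::string> merge_remote_tools(std::vector<ToolEntry>& catalogue,
                                            const std::vector<ToolEntry>& remote) {
    std::unordered_set<std::string> taken;
    for (const auto& t : catalogue) taken.insert(t.name);
    std::vector<std::string> skipped;
    for (const auto& t : remote) {
        if (taken.insert(t.name).second) catalogue.push_back(t);
        else skipped.push_back(t.name);
    }
    return skipped;
}
```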
### 2026-04-30 — `easyai-mcp-server` (standalone MCP provider)
* **New binary `easyai-mcp-server`.** Same tool catalogue as
`easyai-server` (built-ins + RAG + operator-defined external-tools)
exposed over `POST /mcp` with **no GGUF model loaded** — designed
for high-concurrency multi-client deployments. Configurable
cpp-httplib worker pool (`--threads`, default 256) and a separate
in-flight `tools/call` cap (`--max-concurrent-calls`, default 256)
that returns 503 + `Retry-After` on saturation instead of unbounded
queueing. Full doc: [`easyai-mcp-server.md`](easyai-mcp-server.md).
* **RAG concurrency upgrade.** `RagStore::mu` is now
`std::shared_mutex`; `rag_search` / `rag_load` / `rag_list` /
`rag_keywords` take `std::shared_lock` so parallel readers don't
serialise on the write path. Benefits every consumer of libeasyai
— `easyai-server`, `easyai-cli` with `--RAG`, any third-party
program calling `make_rag_tools()`. Atomic-rename writes already
made on-disk reads tear-free; the lock relaxation is safe.
* **Doc restructure.** `INI_KFlags.md` content has moved to the top
of the new [`easyai-server.md`](easyai-server.md) so the chat
server's INI / CLI / API / persona / hardening reference lives in
one file. `LINUX_SERVER.md` is unchanged — it remains the
systemd-installer-specific operator's guide.
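The `--max-concurrent-calls` behaviour (503 plus `Retry-After` on saturation, instead of unbounded queueing) reduces to a non-blocking admission gate. This is a sketch assuming a lock-free counter as the gate. `CallGate`, `Response`, `handle_tools_call`, and the `Retry-After: 1` value are hypothetical, and wiring the gate into a cpp-httplib handler is left out.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Admission gate for tools/call: take a slot if one is free, otherwise refuse
// immediately instead of queueing.
class CallGate {
public:
    explicit CallGate(int max_in_flight) : max_(max_in_flight) {}

    bool try_enter() {
        int cur = in_flight_.load();
        while (cur < max_) {
            // On failure compare_exchange_weak reloads `cur`, and the bound is re-checked.
            if (in_flight_.compare_exchange_weak(cur, cur + 1)) return true;
        }
        return false;
    }
    void leave() { in_flight_.fetch_sub(1); }

private:
    const int max_;
    std::atomic<int> in_flight_{0};
};

struct Response {
    int status;
    std::string retry_after;  // value for the Retry-After header, "" if none
    std::string body;
};

template <class Fn>
Response handle_tools_call(CallGate& gate, Fn&& run) {
    if (!gate.try_enter()) return {503, "1", "too many concurrent tools/call requests"};
    struct Slot {
        CallGate& g;
        ~Slot() { g.leave(); }  // released even if run() throws
    } slot{gate};
    return {200, "", run()};
}
```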
### 2026-04-30 — Tunable incomplete-retry budget + live retry visibility
* **`--max-incomplete-retries N` (also `[ENGINE] max_incomplete_retries`).**
Default 10 — how many times the engine discards + nudges + retries
when the model finishes a turn announcing an action ("Let me…",
"I'll…") without actually emitting the tool_call. Bump to 15-20
for weak / 1-bit-quant models (Bonsai-8B-Q1_0 frequently needs
the extra budget); set to 0 to disable retries entirely.
* **Retries now visible in the Thinking panel.** Engine fires a new
`on_incomplete_retry(attempt, max, reason)` callback per retry,
the server pipes it into the SSE `reasoning_content` channel, and
the webui renders `↻ Retry 3/10: model said: "Let me search…" (no
tool_call) — nudging.` while it happens. No more frozen UI for 10
silent retries followed by a blank bubble.
* **Engine warnings always log** (regardless of `--verbose`):
cancellation, thought-only retry, reasoning→content fallback,
incomplete-retry, empty final content. `--verbose` is for raw
per-token / per-hop diagnostic noise; actionable warnings stay on
so operators see them in `journalctl` without flipping a flag.
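The retry budget and the `on_incomplete_retry(attempt, max, reason)` callback fit together as a bounded loop. In this sketch, `announces_without_acting` is a deliberately crude, hypothetical stand-in for the engine's own announce-phrase heuristic. `run_turn`, the `Turn` struct, and the nudge wording are also assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Turn {
    std::string content;
    bool has_tool_call = false;
};

// Hypothetical stand-in for the engine's announce-phrase heuristic.
bool announces_without_acting(const Turn& t) {
    if (t.has_tool_call) return false;
    static const std::vector<std::string> openers = {"Let me", "I'll", "I will"};
    for (const auto& o : openers)
        if (t.content.rfind(o, 0) == 0) return true;  // content starts with the opener
    return false;
}

using Generate = std::function<Turn(const std::string& nudge)>;
using OnRetry = std::function<void(int attempt, int max, const std::string& reason)>;

// Discard an announce-only turn, report it, nudge, and regenerate, at most max_retries times.
Turn run_turn(const Generate& generate, int max_retries, const OnRetry& on_incomplete_retry) {
    Turn t = generate("");
    for (int attempt = 1; attempt <= max_retries && announces_without_acting(t); ++attempt) {
        on_incomplete_retry(attempt, max_retries, t.content);
        t = generate("You announced an action but emitted no tool call; call the tool now.");
    }
    return t;
}
```

With `max_retries` set to 0, the loop never runs and the first turn is returned as-is, which matches the "0 disables" setting.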
### 2026-04-30 — Bonsai 8B Q1_0 onboarding + security pass
* **One-shot installers for macOS and Raspberry Pi 4/5.**
`scripts/install_easyai_macos.sh` builds with Metal/AMX, drops the
model, prints the run command. `scripts/install_easyai_pi.sh` does
the full Pi appliance: systemd unit, mDNS so the box answers as
**`pi-ai.local`** on your LAN, port 80 with
`CAP_NET_BIND_SERVICE`. Both clone the **PrismML fork** of
llama.cpp (the only one with the Q1_0 kernel — upstream loads the
GGUF then fails at decode).
* **Security third-pass audit** — 3 HIGH and 7 MEDIUM findings fixed.
The INI overlay used to be silently ignored (every `[ENGINE]` /
`[SERVER]` key was a no-op); `--no-mcp-auth` was disconnected from
the gate; the sandbox could be escaped by a symlink planted via
`bash`. All closed. The `bash` tool now gets the same
fork-hardening as external tools — `PR_SET_PDEATHSIG`, fd
close-loop bounded against `RLIMIT_NOFILE = unlimited`, process-
group kill on timeout. Plus JSON-depth caps on every parser, a
bounded INI parser, mode 0600 on RAG entries, and a
body-size-bounded auth header. See [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §18.
* **MCP server.** `easyai-server` is now a Model Context Protocol
provider on `POST /mcp` (protocol 2024-11-05). Claude Desktop,
Cursor, and Continue can list and dispatch every registered tool (your
built-ins, your RAG, your `--external-tools` manifests) over a
single endpoint. Bearer auth via `[MCP_USER]` in the INI; a
Python stdio bridge ships at `scripts/mcp-stdio-bridge.py` for
Claude Desktop. See [`MCP.md`](MCP.md).
* **Single INI config — `/etc/easyai/easyai.ini`.** Every CLI flag
has an INI key (FlagDef table refactor); precedence is CLI > INI
> hardcoded default. Edit the file, `systemctl restart`, done.
Full reference in [`easyai-server.md`](easyai-server.md) §1.
* **RAG: persistent memory.** Seven tools (`rag_save`, `rag_append`,
`rag_search`, `rag_load`, `rag_list`, `rag_delete`, `rag_keywords`).
Multi-keyword search (first keyword required, rest rank by overlap)
+ pagination. One Markdown file per entry — operator-readable,
hand-editable. See [`RAG.md`](RAG.md).
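Here is a sketch over an in-memory list of the search rule ("first keyword required, rest rank by overlap") plus offset/limit pagination. The assumptions are exact-match keywords and ties kept in store order. The `Memory` and `search_memories` names are hypothetical. The real tool reads Markdown files, and its tie-breaking is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Memory {
    std::string title;
    std::vector<std::string> keywords;
};

static bool has_keyword(const Memory& m, const std::string& k) {
    return std::find(m.keywords.begin(), m.keywords.end(), k) != m.keywords.end();
}

// query[0] is a hard filter; query[1..] only add to the score used for ranking.
std::vector<std::string> search_memories(const std::vector<Memory>& store,
                                         const std::vector<std::string>& query,
                                         std::size_t offset, std::size_t limit) {
    if (query.empty()) return {};
    std::vector<std::pair<int, const Memory*>> hits;
    for (const auto& m : store) {
        if (!has_keyword(m, query[0])) continue;
        int overlap = 0;
        for (std::size_t i = 1; i < query.size(); ++i) overlap += has_keyword(m, query[i]) ? 1 : 0;
        hits.emplace_back(overlap, &m);
    }
    // Highest overlap first; stable_sort keeps store order among ties.
    std::stable_sort(hits.begin(), hits.end(),
                     [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<std::string> page;
    for (std::size_t i = offset; i < hits.size() && page.size() < limit; ++i)
        page.push_back(hits[i].second->title);
    return page;
}
```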
### 2026-04-29 — External tools v2
* **Operator-defined tool packs** via `EASYAI-<name>.tools` JSON
manifests dropped in `/etc/easyai/external-tools/`. Per-file
fault isolation, sanity warnings (shell-wrapper detection,
world-writable binaries, `LD_*` env passthrough), full
`fork`+`execve` hardening — never a shell. Give the model
focused powers without flipping `--allow-bash`. See
[`EXTERNAL_TOOLS.md`](EXTERNAL_TOOLS.md).
* **`get_current_dir` builtin** — the model can ask where it is,
so relative paths in `bash` / `fs_*` calls land where you expect.
* **Cancel-on-disconnect on the server** — closing the browser
tab actually stops the decode loop. No more zombie generation
eating tokens after the user walked away.
* **Tolerant tool output** — non-UTF-8 bytes in tool results no
longer abort the SSE stream; the bytes get a U+FFFD substitute
and the stream stays alive.
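Substituting U+FFFD for invalid bytes means the JSON encoder, and therefore the SSE stream, only ever sees valid UTF-8. This is a sketch assuming one replacement per offending byte and the well-formed byte ranges from Table 3-7 of the Unicode Standard. `sanitize_utf8` is a hypothetical name, and easyai's actual routine may group replacements differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every byte that is not part of a well-formed UTF-8 sequence with U+FFFD.
std::string sanitize_utf8(const std::string& in) {
    static const char kReplacement[] = "\xEF\xBF\xBD";  // U+FFFD encoded as UTF-8
    std::string out;
    out.reserve(in.size());
    const auto* s = reinterpret_cast<const unsigned char*>(in.data());
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = s[i];
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (c < 0x80) len = 1;
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c == 0xE0) { len = 3; lo = 0xA0; }  // reject overlong forms
        else if (c >= 0xE1 && c <= 0xEC) len = 3;
        else if (c == 0xED) { len = 3; hi = 0x9F; }  // reject UTF-16 surrogates
        else if (c >= 0xEE && c <= 0xEF) len = 3;
        else if (c == 0xF0) { len = 4; lo = 0x90; }  // reject overlong forms
        else if (c >= 0xF1 && c <= 0xF3) len = 4;
        else if (c == 0xF4) { len = 4; hi = 0x8F; }  // nothing above U+10FFFF
        bool ok = len > 0 && i + len <= n;
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = s[i + k];
            ok = (k == 1) ? (b >= lo && b <= hi) : (b >= 0x80 && b <= 0xBF);
        }
        if (ok) { out.append(in, i, len); i += len; }
        else    { out += kReplacement; ++i; }
    }
    return out;
}
```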
---
## All options at a glance
Every CLI flag, INI key, and library setter the project ships
today, in tables. Skim once to learn the surface; come back when
you want to tune something specific. Deeper reference is linked
per row.
This repo builds seven binaries. Two are production daemons
(`easyai-server`, `easyai-mcp-server`), two are user CLIs
(`easyai-cli`, `easyai-local`), three are example apps the lib
ships to demonstrate the API (`easyai-chat`, `easyai-agent`,
`easyai-recipes`).
### `easyai-server` — chat HTTP server (also speaks MCP)
Full reference: [`easyai-server.md`](easyai-server.md).
INI defaults under `/etc/easyai/easyai.ini` — every flag below
has a matching INI key (see [`easyai-server.md`](easyai-server.md) §1).
| Flag | Default | What it does |
|---|---|---|
| `-m, --model PATH` | (required) | GGUF model file. |
| `--config PATH` | `/etc/easyai/easyai.ini` | Central INI; CLI > INI > hardcoded. |
| `--host ADDR` | `127.0.0.1` | Bind address (`0.0.0.0` = any iface). |
| `--port N` | `8080` | TCP port. |
| `--max-body N` | 8 MiB | Cap on request body. |
| `-s, --system-file PATH` | — | Default system prompt, from file. |
| `--system TEXT` | — | Default system prompt, inline. |
| `--no-local-tools` | off | Don't expose the local built-in toolbelt. |
| `--mcp URL` | — | Connect upstream MCP server as client; merge catalogue. |
| `--mcp-token TOK` | — | Bearer for `--mcp`. |
| `--no-mcp-auth` | off | Force `/mcp` open even with `[MCP_USER]` populated. |
| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (MCP client + web tools). 0 disables. Logged on stderr. |
| `--http-timeout SECONDS` | 600 | Read/write timeout for the listen socket AND the MCP-client connection. Bumped from llama-server's 60 s default to accommodate long thinking turns. |
| `--sandbox DIR` | server cwd | Root for `fs` / `bash` / `python3` / external `$SANDBOX`. |
| `--allow-fs` | off | Register the unified `fs` tool (action=read / write / list / glob / grep / check_path / cwd / sandbox). |
| `--allow-bash` | off | Register `bash` (NOT a hardened sandbox). |
| `--no-python` | python3 on | Drop the `python3` tool. By default it's auto-registered alongside `fs` whenever `--sandbox` is set or `--allow-bash` is on. Stdlib-only interpreter; disk access auto-restricted to the sandbox root. |
| `--use-google` | off | Enable engine=`"google"` inside the unified `web` tool (needs `GOOGLE_API_KEY` + `GOOGLE_CSE_ID`). |
| `--external-tools DIR` | — | Load every `EASYAI-*.tools` manifest in `DIR`. |
| `--memory DIR` | — | Enable persistent memory: registers one `memory(action=…)` tool with sub-actions save / append / search / load / list / delete / keywords — a passive RAG technique. `--RAG` is still accepted as a back-compat alias. |
| `--preset NAME` | `precise` | Ambient sampling preset. See [Sampling presets](#sampling-presets) for what each implies. |
| `--temperature F` | per preset | Override temperature (0.0–2.0). |
| `--top-p F` | per preset | Nucleus sampling p. |
| `--top-k N` | per preset | Top-k cutoff. |
| `--min-p F` | per preset | Min-p threshold. |
| `--repeat-penalty F` | 1.15 | Repetition penalty (multiplicative on recent logits) — anti-loop safety net for thinking models that lock into rephrasing their own intent. `--repeat-penalty 1.0` disables. |
| `--presence-penalty F` | 0.0 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, `[-2.0, 2.0]`). Discourages topic stickiness without penalising literal tool-name repetition; pairs well with `--repeat-penalty 1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |
| `--max-tokens N` | unlimited | Cap tokens per request. |
| `--seed U32` | random | RNG seed (0 = random). |
| `--max-incomplete-retries N` | 10 | Retry budget for "announce-only" turns; 0 disables. |
| `-c, --ctx N` | 8192 | Context size. |
| `--batch N` | = ctx | Logical batch size. |
| `--ngl N` | -1 (auto) | GPU layers (0 = CPU only). |
| `-t, --threads N` | hw cores | CPU threads. |
| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype (`f32`,`f16`,`bf16`,`q8_0`,`q4_0`,`q4_1`,`q5_0`,`q5_1`,`iq4_nl`). |
| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype (same set). |
| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU even with GPU layers. |
| `--kv-unified` | off | Single unified KV buffer across sequences. |
| `--override-kv K=T:V` | — | GGUF metadata override (`int`,`float`,`bool`,`str`); repeatable. |
| `-a, --alias NAME` | `easyai` | Public model id reported by `/v1/models`. |
| `--api-key KEY` | — | Require Bearer auth on every `/v1` route. |
| `-fa, --flash-attn` | auto | Force flash attention on. |
| `-tb, --threads-batch N` | = threads | Threads for prompt-eval batches. |
| `-np, --parallel N` | 1 | Compat-only; warns when >1. |
| `--mlock` | off | mlock model weights into RAM. |
| `--no-mmap` | off | Disable mmap (read GGUF into RAM). |
| `--numa STRATEGY` | off | `distribute`,`isolate`,`numactl`,`mirror`. |
| `--metrics` | off | Expose Prometheus `/metrics`. |
| `--reasoning on\|off` | on | Enable model thinking. |
| `--no-think` | off | Strip `<think>…</think>` from replies. |
| `--inject-datetime on\|off` | on | Append authoritative date/time to system prompt. |
| `--knowledge-cutoff YYYY-MM` | `2024-10` | Cutoff hint used by `--inject-datetime`. |
| `-v, --verbose` | off | Engine logs raw model output + parser actions. |
| `--webui MODE` | `modern` | `modern` (embedded SvelteKit) or `minimal` (inline). |
| `--webui-title TEXT` | `Box EasyAI` | Browser tab + sidebar brand. |
| `--webui-icon PATH` | — | Favicon (`.ico`,`.png`,`.svg`,`.gif`,`.jpg`,`.webp`). |
| `--webui-placeholder S` | `Type a message…` | Input box placeholder. |
### `easyai-mcp-server` — standalone MCP provider (no model)
Same tool catalogue as `easyai-server` but no GGUF loaded —
designed for high-concurrency multi-client deployments. Full
reference: [`easyai-mcp-server.md`](easyai-mcp-server.md).
| Flag | Default | What it does |
|---|---|---|
| `--config PATH` | `/etc/easyai/easyai-mcp.ini` | Central INI. |
| `--host ADDR` | `127.0.0.1` | Bind address. |
| `--port N` | `8089` | TCP port. |
| `-n, --name ID` | `easyai-mcp` | Server identity on `/health` + MCP `initialize`. |
| `--max-body N` | 1 MiB | Cap on request body. |
| `-t, --threads N` | 256 | cpp-httplib worker pool. |
| `--max-concurrent-calls N` | 256 | In-flight `tools/call` cap (503 on saturation). |
| `--sandbox DIR` | cwd | Root for `fs_*` / `bash` / `$SANDBOX`. |
| `--allow-fs` | off | Register `fs_*` tools. |
| `--allow-bash` | off | Register `bash`. |
| `--no-tools` | off | Skip the built-in toolbelt entirely. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable the unified `memory` tool (alias `--RAG`). |
| `--api-key TOK` | — | Bearer required for `/health`, `/metrics`, `/v1/tools`. |
| `--no-mcp-auth` | off | Force `/mcp` open. |
| `--metrics` | off | Enable Prometheus `/metrics`. |
| `-v, --verbose` | off | Log every dispatch to stderr. |
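The `--max-concurrent-calls` row describes a cap on in-flight `tools/call` requests that answers 503 on saturation. The gate below is a minimal sketch of that behaviour, built on a lock-free counter. `CallGate` and `dispatch_tool_call` are hypothetical names, not easyai's implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical admission gate for an in-flight cap such as --max-concurrent-calls.
class CallGate {
public:
    explicit CallGate(int cap) : cap_(cap) { assert(cap > 0); }

    // Reserve a slot; returns false when the cap is already reached.
    bool try_enter() {
        int cur = in_flight_.load();
        while (cur < cap_) {
            if (in_flight_.compare_exchange_weak(cur, cur + 1)) return true;
            // On failure `cur` holds the fresh value; the loop re-checks the cap.
        }
        return false;
    }
    void leave() { in_flight_.fetch_sub(1); }
    int in_flight() const { return in_flight_.load(); }

private:
    const int cap_;
    std::atomic<int> in_flight_{0};
};

// Maps one tools/call request to an HTTP status: 503 when saturated.
template <class Fn>
int dispatch_tool_call(CallGate &gate, Fn &&run_tool) {
    if (!gate.try_enter()) return 503;                      // reject immediately
    struct Leave { CallGate &g; ~Leave() { g.leave(); } } guard{gate};
    run_tool();                                             // slot freed even if this throws
    return 200;
}
```

Rejecting instead of blocking keeps a saturated server responsive: callers get a fast, retryable error rather than a hung connection.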
### `easyai-cli` — interactive remote CLI
Talks to any OpenAI-compatible endpoint (our `easyai-server`,
upstream `llama-server`, OpenAI itself, etc.).
| Flag | Default | What it does |
|---|---|---|
| `--url URL` | `$EASYAI_URL` | OpenAI-compat endpoint. |
| `--api-key KEY` | `$EASYAI_API_KEY` | Bearer auth. |
| `--model NAME` | `$EASYAI_MODEL` | Request body `model` field. |
| `--timeout SECONDS` | 86400 (24h) | Read+write timeout — sized for multi-hour agentic sessions. Only fires on TRUE silence (every SSE delta resets it). `EASYAI_TIMEOUT` env also accepted. |
| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (connect refused, read timeout, 5xx). 0 disables. Logged on stderr without `--verbose`. `EASYAI_HTTP_RETRIES` env also accepted. |
| `--insecure-tls` | off | Skip peer cert check (DEV ONLY). |
| `--ca-cert PATH` | system | Custom CA bundle (PEM). |
| `--system TEXT` | — | Inline system prompt. |
| `--system-file PATH` | — | System prompt from file. |
| `--temperature F` | server | Sampling temperature. |
| `--top-p F` | server | Nucleus top-p. |
| `--top-k N` | server | Top-k cutoff. |
| `--min-p F` | server | min-p (llama-server / easyai). |
| `--repeat-penalty F` | 1.15 | Repetition penalty — anti-loop default; pass 1.0 to disable. |
| `--frequency-penalty F` | server | OpenAI standard \[-2.0, 2.0\]. |
| `--presence-penalty F` | server | OpenAI standard \[-2.0, 2.0\]. |
| `--seed N` | random | Deterministic sampling seed. |
| `--max-tokens N` | server | Cap reply length. |
| `--stop SEQ` | — | Add a stop string (repeatable). |
| `--extra-json '{…}'` | — | Free-form JSON merged into the request body. |
| `--tools LIST` | datetime,plan,web,system_* | Comma list of locally-registered tools. |
| `--sandbox DIR` | — | Enable the unified `fs` tool (action=read/write/list/glob/grep/check_path/cwd/sandbox) scoped to `DIR`. |
| `--allow-bash` | off | Register `bash` (uses `--sandbox` as cwd, else current dir). |
| `--no-python` | python3 on | Drop the auto-registered `python3` tool (default-on whenever `--sandbox` or `--allow-bash` is set). |
| `--use-google` | off | Enable engine=`"google"` inside the unified `web` tool. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable persistent memory (one `memory(action=…)` tool; alias `--RAG`). |
| `--tools-mode MODE` | `split` | How `fs` / `web` / `memory` are exposed. Default `split` (since 2026-05-15): one focused tool per action — `fs_read`, `fs_edit`, …, `memory_save`, …, `web_search`, `web_fetch`. `unified` registers the legacy single dispatcher per family with `action=`. `both` registers both surfaces. INI: `[cli] tools_mode`. |
| `--no-plan` | off | Don't auto-register the planning tool. |
| `-p, --prompt TEXT` | (REPL) | One-shot prompt; without it you get a REPL. |
| `--no-reasoning` | shown | Hide `delta.reasoning_content`. |
| `--max-reasoning N` | 0 (off) | Abort SSE when accumulated reasoning > N chars. |
| `--no-retry-on-incomplete` | retry on | Disable auto-retry-with-nudge. |
| `--verbose` | off | Log HTTP+SSE traffic to stderr (stderr only — no file). |
| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |
| `--log-file PATH` | off | Opt in to a raw transaction log at PATH (mode 0600). Implies `--verbose`. No `/tmp` file is created by default. |
| `--continue` | off | Load `.easyai_session` from cwd before the first prompt. Default OFF (since 2026-05-13): without this flag any existing session file is ignored and overwritten on the first turn. Session is always saved per turn regardless. INI: `[cli] auto_continue`. |
| `--no-continue` | — | Explicit form of the default — ignore any existing `.easyai_session` and overwrite on the first turn. Useful to override `[cli] auto_continue = on` set in INI. |
| `--compress` | off | Ask the model for a lossless recap, replace history with it, save. No-op without `--continue` (nothing in memory to recap). Also `/compress` mid-REPL. INI: `[cli] auto_compress`. |
| `--list-tools` | — | Print local tools (no chat). |
| `--list-remote-tools` | — | `GET /v1/tools` (no chat). |
| `--list-models` | — | `GET /v1/models`. |
| `--health` | — | `GET /health`. |
| `--props` | — | `GET /props`. |
| `--metrics` | — | `GET /metrics` (Prometheus text). |
| `--set-preset NAME` | — | `POST /v1/preset {preset:NAME}`. |
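`--http-retries` counts *extra* attempts, so `N = 5` allows up to six requests. The sketch below shows that policy. The transient set comes from the table, and the "pre-stream only" rule comes from the `Client` reference. `Attempt` and `send_with_retries` are hypothetical, and the backoff between attempts is omitted.

```cpp
#include <cassert>
#include <functional>

// Outcome of one HTTP attempt (hypothetical shape for this sketch).
struct Attempt {
    int  status = 0;              // HTTP status; 0 when no response arrived
    bool connect_failed = false;
    bool read_timeout   = false;
    bool stream_started = false;  // true once SSE bytes reached the caller
};

// Transient = connect refused, read timeout, or any 5xx.
bool is_transient(const Attempt &a) {
    return a.connect_failed || a.read_timeout || (a.status >= 500 && a.status <= 599);
}

// Runs `send` once, plus up to `retries` extra times on transient failures.
// Never retries after streaming began: output already shown can't be replayed.
Attempt send_with_retries(const std::function<Attempt()> &send, int retries, int *attempts_used) {
    assert(retries >= 0);
    Attempt a;
    int n = 0;
    for (;;) {
        a = send();
        ++n;
        if (!is_transient(a) || a.stream_started || n > retries) break;
    }
    if (attempts_used) *attempts_used = n;
    return a;
}
```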
### `easyai-local` — local-engine REPL
Loads a GGUF model in-process (no server). For remote endpoints
use `easyai-cli`.
| Flag | Default | What it does |
|---|---|---|
| `-m, --model PATH` | (required) | GGUF file. |
| `-p, --prompt TEXT` | (REPL) | One-shot: run prompt, print, exit. |
| `-s, --system-file PATH` | — | System prompt from file. |
| `--system TEXT` | — | Inline system prompt. |
| `--preset NAME` | `precise` | Initial preset. See [Sampling presets](#sampling-presets). |
| `--no-think` | off | Strip `<think>…</think>` from output. |
| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |
| `--temperature F` | per preset | Override temperature. |
| `--top-p F` | per preset | top-p. |
| `--top-k N` | per preset | top-k. |
| `--min-p F` | per preset | min-p. |
| `--repeat-penalty F` | 1.15 | Repetition penalty — anti-loop default; pass 1.0 to disable. |
| `--max-tokens N` | unlimited | Cap tokens per turn. |
| `--seed U32` | random | RNG seed. |
| `-c, --ctx N` | 4096 | Context size. |
| `--batch N` | = ctx | Logical batch size. |
| `--ngl N` | -1 (auto) | GPU layers. |
| `-t, --threads N` | hw cores | CPU threads. |
| `--no-tools` | off | Skip the built-in toolbelt. |
| `--sandbox DIR` | — | Enable the unified `fs` tool scoped to `DIR`. |
| `--allow-bash` | off | Register `bash`. |
| `--no-python` | python3 on | Drop the auto-registered `python3` tool. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable persistent memory (alias `--RAG`). |
| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype. |
| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype. |
| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU. |
| `--kv-unified` | off | Single unified KV buffer. |
| `--override-kv K=T:V` | — | GGUF metadata override (repeatable). |
### Example apps (lib API demos)
Three small binaries under `examples/` show the lib API in
context. They take minimal flags — the real config happens in
the C++ source as fluent setter chains. Read these as the
canonical "how do I use the lib?" answer.
| Binary | Min flags | Purpose |
|---|---|---|
| `easyai-chat` | `-m PATH` OR `--url BASE`, `[--system TEXT]` | One-shot chat over Engine OR Client (auto-picks). |
| `easyai-agent` | `-m PATH`, `[-c CTX]`, `[-ngl N]` | Tiny agentic-loop demo with tool registration. |
| `easyai-recipes` | `-m PATH` | Five recipes (chat, persona, REPL, tools, agent loop). |
### Library API — `easyai::Agent`
The 30-second front door. Construct, optionally chain a few
fluent setters, call `ask()`. Header:
[`include/easyai/agent.hpp`](include/easyai/agent.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `Agent(model_path)` | ctor | — | Local model. |
| `Agent::remote(base_url, api_key="")` | static | — | Remote endpoint. |
| `.system(prompt)` | `string` | — | System prompt. |
| `.sandbox(dir)` | `string` | — | Enable `fs_*` scoped to `dir`. |
| `.allow_bash(on=true)` | `bool` | off | Register `bash`. |
| `.preset(name)` | `string` | `precise` | Sampling profile. |
| `.remote_model(id)` | `string` | — | Remote model id (remote mode only). |
| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | per preset | Sampling overrides. |
| `.on_token(cb)` | `function` | — | Streaming-token callback. |
| `.ask(text)` | call | — | One-shot turn; runs tool dispatch inline. |
| `.reset()` | call | — | Wipe history. |
| `.last_error()` | accessor | — | Diagnostic. |
| `.backend()` | accessor | — | Escape hatch to the underlying `Backend &`. |
### Library API — `easyai::Engine` (local llama.cpp)
Full local engine. Header:
[`include/easyai/engine.hpp`](include/easyai/engine.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `.model(gguf_path)` | `string` | — | GGUF file. |
| `.context(n) / .batch(n)` | `int` | 4096 / = ctx | KV / logical batch size. |
| `.gpu_layers(n)` | `int` | -1 (auto) | -1 = all, 0 = CPU only. |
| `.threads(n) / .threads_batch(n)` | `int` | hw / = threads | CPU threads. |
| `.seed(u32)` | `uint32_t` | random | RNG seed. |
| `.system(prompt)` | `string` | — | System prompt. |
| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | 0.7 / 0.95 / 40 / 0.05 | Sampling. |
| `.repeat_penalty(r)` | `float` | 1.15 | Repetition penalty (multiplicative on recent logits) — anti-loop default. Set to 1.0 to disable. |
| `.presence_penalty(p)` | `float` | 0.0 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, range `[-2.0, 2.0]`). Pairs well with `repeat_penalty=1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |
| `.max_tokens(n)` | `int` | -1 (until ctx) | Per-turn cap. |
| `.tool_choice_auto / .tool_choice_required / .tool_choice_none` | call | auto | Tool-choice mode. |
| `.parallel_tool_calls(on)` | `bool` | off | Allow parallel tool calls. |
| `.verbose(on)` | `bool` | off | Engine debug logs. |
| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap (bumped to 99999 with `bash`). |
| `.retry_on_incomplete(on)` | `bool` | on | Auto-retry "announce-only" turns. |
| `.max_incomplete_retries(n)` | `int` | 10 | Retry budget; 0 disables. |
| `.stop_at_ctx_pct(pct)` | `int` | 100 | Hard ceiling on context fill; 0 disables. |
| `.cache_type_k(name) / .cache_type_v(name)` | `string` | `f16` | KV-cache dtype. |
| `.no_kv_offload(on) / .kv_unified(on)` | `bool` | off | KV placement / layout. |
| `.add_kv_override(spec)` | `string` | — | GGUF metadata override (repeatable). |
| `.flash_attn(on) / .use_mlock(on) / .use_mmap(on)` | `bool` | auto/off/on | Compute / memory. |
| `.numa(strategy)` | `string` | off | `distribute` / `isolate` / `numactl` / `""`. |
| `.enable_thinking(on)` | `bool` | on | Chat-template thinking flag. |
| `.add_tool(t) / .clear_tools()` | call | — | Tool registration. |
| `.on_token(cb) / .on_tool(cb) / .on_hop_reset(cb) / .on_incomplete_retry(cb)` | callback | — | Streaming hooks. |
| `.load() / .reset() / .clear_kv()` | call | — | Lifecycle. |
| `.set_sampling(t,p,k,m)` | call | — | Re-sample mid-conversation. |
| `.push_message(role, content, [tool_name, tool_call_id])` | call | — | Append history without generating. |
| `.replace_history(messages)` | call | — | Full-fidelity history replay. |
| `.chat(text) / .chat_continue() / .generate_one() / .generate()` | call | — | Inference primitives. |
| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |
| `.last_error() / .last_was_ctx_full() / .turns() / .tools() / .backend_summary() / .n_ctx() / .model_path() / .perf_data() / .perf_reset()` | accessor | — | Introspection. |
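The `.repeat_penalty` and `.presence_penalty` rows differ in shape: one scales the logit, the other subtracts a fixed amount. The sketch below applies both to a toy logit vector, assuming the usual formulas from llama.cpp's penalty sampler. It is illustrative, not easyai's sampler code. The frequency term is included only to show the OpenAI formula side by side.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Applies a repeat penalty (multiplicative) and OpenAI-style presence /
// frequency penalties (additive) to raw logits.
// `counts` maps token id -> occurrences in the recent window.
void apply_penalties(std::vector<float> &logits,
                     const std::unordered_map<int, int> &counts,
                     float repeat_penalty, float presence_penalty, float frequency_penalty) {
    assert(repeat_penalty > 0.0f);
    for (const auto &[tok, n] : counts) {
        if (n <= 0 || tok < 0 || tok >= static_cast<int>(logits.size())) continue;
        float &l = logits[tok];
        // Multiplicative: always moves the logit toward "less likely".
        l = (l > 0.0f) ? l / repeat_penalty : l * repeat_penalty;
        // Additive: a fixed cost once seen, plus a cost per occurrence.
        l -= presence_penalty + frequency_penalty * static_cast<float>(n);
    }
}
```

Because the repeat penalty is a ratio, its bite depends on the logit's magnitude. The presence penalty costs the same for every token already seen, which is why it pairs well with `repeat_penalty = 1.0` on long agentic flows.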
### Library API — `easyai::Client` (remote OpenAI-compat)
Remote counterpart of `Engine`. Tools execute LOCALLY in the
consumer process. Header:
[`include/easyai/client.hpp`](include/easyai/client.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `.endpoint(url)` | `string` | — | `http(s)://host[:port]`. |
| `.api_key(key)` | `string` | — | Bearer token. |
| `.timeout_seconds(s)` | `int` | 86400 (24h) | Connect+read timeout — sized for multi-hour agentic sessions. |
| `.http_retries(n)` | `int` | 5 | Extra attempts on transient HTTP failures (pre-stream only — never retries mid-stream). 0 disables. Each retry logs to stderr. |
| `.verbose(v)` | `bool` | off | Log SSE lines to stderr. |
| `.log_file(fp)` | `FILE*` | — | Tee every HTTP transaction. |
| `.max_reasoning_chars(n)` | `int` | 0 (off) | Abort SSE when reasoning > N chars. |
| `.retry_on_incomplete(v)` | `bool` | on | Auto-retry "announce-only" turns. |
| `.stop_at_ctx_pct(pct)` | `int` | 100 | Bail when the server-reported `ctx_used/n_ctx` exceeds `pct` percent. |
| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap. |
| `.tls_insecure(v) / .ca_cert_path(path)` | `bool` / `string` | off / system | HTTPS-only TLS knobs. |
| `.model(id)` | `string` | — | Request body `model` field. |
| `.system(prompt)` | `string` | — | System prompt(s). |
| `.temperature(t) / .top_p(v) / .top_k(v) / .min_p(v)` | scalar | server | Sampling. |
| `.repeat_penalty(v)` | `float` | 1.15 | Repetition penalty — anti-loop default; `1.0` disables. |
| `.frequency_penalty(v) / .presence_penalty(v)` | `float` | server | OpenAI-shape penalties. |
| `.seed(s)` | `long long` | -1 | -1 = randomise. |
| `.max_tokens(n)` | `int` | server | Cap. |
| `.stop(sequences)` | `vector<string>` | — | Stop strings. |
| `.extra_body_json(raw)` | `string` | — | Free-form JSON merged into request body. |
| `.add_tool(t) / .clear_tools() / .tools()` | call | — | Tool registration. |
| `.on_token(cb) / .on_reason(cb) / .on_tool(cb)` | callback | — | Streaming hooks. |
| `.chat(text) / .chat_continue() / .clear_history()` | call | — | Inference + history. |
| `.list_models / .list_remote_tools / .health / .metrics / .props / .set_preset` | call | — | Direct endpoint helpers. |
| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |
| `.last_error() / .last_turn_was_incomplete() / .last_ctx_used() / .last_n_ctx() / .last_ctx_pct() / .last_was_ctx_full()` | accessor | — | Introspection. |
### Library API — `easyai::cli::Toolbelt`
Canonical agent toolset, fluently configured. Replaces the
"copy the same `if (sandbox.empty()) … else …` block five times"
pattern. Header: [`include/easyai/cli.hpp`](include/easyai/cli.hpp).
| Method | Default | What it does |
|---|---|---|
| `.sandbox(dir)` | `""` | Root for the unified `fs` tool (empty = no fs tool). |
| `.allow_fs(on)` | on | Register the unified `fs` tool (off in server unless `--allow-fs`). |
| `.allow_bash(on)` | off | Register `bash` (also bumps `max_tool_hops` to 99999). |
| `.with_plan(plan)` | — | Register the planning tool backed by a `Plan&`. |
| `.no_web(on)` | off | Drop the unified `web` tool. |
| `.no_datetime(on)` | off | Drop `datetime`. |
| `.use_google(on)` | off | Enable engine=`"google"` inside `web` (env vars required at apply-time). |
| `.tools()` | — | Materialise `vector<Tool>`. |
| `.apply(engine) / .apply(client)` | — | Register on the consumer + bump hops if bash. |
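The sketch below shows the effective tool set those options produce, following the defaults in the table. The plan tool is omitted, and the base hop count of 8 is taken from the `Engine` reference. `ToolbeltOpts` and `resolve` are hypothetical illustrations; the real `Toolbelt` registers `Tool` objects on an `Engine` or `Client`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the Toolbelt options documented above.
struct ToolbeltOpts {
    std::string sandbox;          // empty = no fs tool
    bool allow_fs    = true;
    bool allow_bash  = false;
    bool no_web      = false;
    bool no_datetime = false;
};

struct Resolved { std::vector<std::string> tools; int max_tool_hops; };

// Which tool names would be registered, and the resulting hop cap.
Resolved resolve(const ToolbeltOpts &o, int base_hops = 8) {
    assert(base_hops > 0);
    Resolved r{{}, base_hops};
    if (!o.no_datetime) r.tools.push_back("datetime");
    if (!o.no_web)      r.tools.push_back("web");
    if (o.allow_fs && !o.sandbox.empty()) r.tools.push_back("fs");
    if (o.allow_bash) { r.tools.push_back("bash"); r.max_tool_hops = 99999; }
    return r;
}
```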
### Sampling — what each knob does
At every step the model emits a probability distribution over the whole
vocabulary (~100k+ tokens). These knobs decide how a token is picked
from it. They work in sequence: the *cutters* (`top_k`, `top_p`,
`min_p`) first narrow the candidate pool based on the raw distribution,
then `temperature` controls how randomly the final token is drawn from
the survivors.
* **`temperature`** — the focus-vs-risk dial; divides the logits before
softmax. `→ 0` is greedy (always the top token: deterministic, can
repeat). `0.2–0.5` keeps the model tight on format, syntax, and
facts. `1.0` is the model's unmodified distribution. `> 1.0` flattens
the curve so unlikely tokens get a real chance — more varied and
creative, but more prone to error and incoherence. This is the main
*behaviour* dial.
* **`top_k`** — a *fixed* cut of the tail: keep only the K
most-probable tokens, discard the rest. Non-adaptive — it always cuts
at K whether the model is certain or unsure. A cheap guardrail
against ever picking junk from the long tail.
* **`top_p`** (nucleus) — an *adaptive* cut: keep the smallest set of
top tokens whose probabilities sum to P. Adapts to confidence — when
the model is sure (one token at 0.9) the nucleus is tiny; when it's
unsure (mass spread wide) the nucleus is large. Cuts the tail
proportionally.
* **`min_p`** — also adaptive, but anchored to the *top* token instead
of cumulative mass: keep tokens with `prob ≥ min_p × prob_of_top`.
`min_p 0.1` keeps anything within 10× of the best; `min_p 0.5` keeps
only what's within 2× — aggressive, very focused output.
**How they interact.** They stack. Tightening all of them at once (low
`top_k` + low `top_p` + low `temperature`) is redundant — they do the
same job and you over-constrain into robotic output. Practical rule:
pick *one* adaptive cutter (`top_p ~0.9–0.95` **or** `min_p ~0.05–0.1`),
leave `top_k` generous as a cheap backstop, and use `temperature` as
the real behaviour dial.
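The ordering described above can be written as a pure function over a logit vector: cutters on the raw distribution, temperature last. This is a minimal illustration, not easyai's or llama.cpp's sampler. It returns the surviving pool and its final probabilities instead of drawing a token. It also requires `temp > 0`; the `deterministic` preset's `temp 0.0` is plain greedy argmax.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Cand { int id; float logit; float p; };

// Numerically stable softmax of logit / temp, written into each candidate's p.
void softmax(std::vector<Cand> &c, float temp) {
    assert(temp > 0.0f && !c.empty());
    float mx = c[0].logit;
    for (const auto &x : c) mx = std::max(mx, x.logit);
    double sum = 0.0;
    for (auto &x : c) { x.p = std::exp((x.logit - mx) / temp); sum += x.p; }
    for (auto &x : c) x.p = static_cast<float>(x.p / sum);
}

// Cutters on the raw (temperature 1.0) distribution, temperature last.
// Returns the surviving pool with final probabilities, best first.
std::vector<Cand> sample_pool(const std::vector<float> &logits, int top_k,
                              float top_p, float min_p, float temp) {
    std::vector<Cand> c;
    for (int i = 0; i < static_cast<int>(logits.size()); ++i) c.push_back({i, logits[i], 0.0f});
    std::sort(c.begin(), c.end(), [](const Cand &a, const Cand &b) { return a.logit > b.logit; });

    if (top_k > 0 && top_k < static_cast<int>(c.size())) c.resize(top_k);  // fixed cut

    softmax(c, 1.0f);                         // raw probabilities of the survivors
    float cum = 0.0f;
    std::size_t keep = 0;
    while (keep < c.size()) {                 // nucleus: smallest set with mass >= top_p
        cum += c[keep++].p;
        if (cum >= top_p) break;
    }
    c.resize(keep);

    const float floor_p = min_p * c[0].p;     // relative to the best token
    c.erase(std::remove_if(c.begin(), c.end(),
                           [floor_p](const Cand &x) { return x.p < floor_p; }),
            c.end());

    softmax(c, temp);                         // temperature reshapes the survivors
    return c;
}
```

Running it with `min_p 0.5` instead of `0.1` on the same logits shows the "aggressive" cut described above: only the top token survives.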
**How to tune.**
* *Code, agentic / tool-calling, structured output, factual Q&A* — low
`temperature` (0.2–0.6) and a tight tail cut. High temperature on
code means syntax errors, hallucinated APIs, broken tool calls.
* *Creative writing, brainstorming* — higher `temperature` (0.8–1.2),
looser cutters.
* *Heavily quantised models* — be more conservative (lower
`temperature`, tighter cut). Quantisation already adds noise to the
logits; high temperature amplifies that noise into real errors.
The presets below are just curated combinations of these four knobs —
e.g. `precise` (the project default) encodes `temp 0.2, top_p 0.95,
top_k 40, min_p 0.10`.
### Sampling presets
Named profiles applied via `--preset NAME` (binaries) or
`Engine::set_sampling()` / `easyai::find_preset()` (lib). Numbers are
baselines; `<preset> <number>` overrides temperature only. The
project-wide **default is `precise`** — tuned for code, math, and
factual Q&A, the dominant use case for a tool-calling agent. Override
when you need looser sampling.
| Name | temp | top_p | top_k | min_p | Behaviour | Pick when… |
|---|---|---|---|---|---|---|
| `deterministic` | 0.0 | 1.0 | 1 | 0.00 | Greedy: always picks the single most likely token. Same prompt → byte-identical answer every time. No randomness, no exploration. | You need reproducibility (CI, benchmarks, eval harnesses), or when even tiny variation breaks downstream parsing. |
| `precise` (default) | 0.2 | 0.95 | 40 | 0.10 | Sticks to high-confidence tokens. Concise, follows instructions tightly, rarely contradicts itself or invents facts. min_p of 0.10 aggressively prunes low-probability tokens — good for stable tool calls and structured output. | Code generation, math, factual Q&A, the `memory` tool, tool-calling agents, structured output (JSON/SQL/cypher), anything you'd want to be "right" rather than "interesting". |
| `balanced` | 0.7 | 0.95 | 40 | 0.05 | A bit of variety while still mostly committing to the most-likely answer. Phrasing varies between runs; the substance shouldn't. | General-purpose chat, summarisation, casual Q&A, anywhere you want natural-sounding prose without surprises. |
| `creative` | 1.0 | 0.95 | 40 | 0.05 | More phrasing variety, occasional surprising word choices, willingness to take a less-obvious angle. | Brainstorming, fiction, marketing copy, ideation, anything where "interesting" beats "literal". |
| `wild` | 1.4 | 0.98 | 60 | 0.00 | Maximum entropy. Frequently picks low-probability tokens; can wander off-topic, contradict itself, hallucinate. | Pure exploration, "show me something I wouldn't have thought of", stylistic experiments. Don't ship it. |
Aliases (case-insensitive) recognised by `find_preset()`:
`exact`→`precise`, `default`→`balanced`, `fun`→`creative`,
`chaos`→`wild`, `greedy`→`deterministic`.
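The sketch below is a hypothetical stand-in for that lookup and for the `<preset> <number>` rule, with values copied from the preset table. `lookup_preset` and `parse_preset_line` are illustrative names only; they do not reproduce the real signature of `easyai::find_preset()`.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <sstream>
#include <string>

struct Preset { std::string name; float temp, top_p; int top_k; float min_p; };

// Values copied from the preset table above.
const std::map<std::string, Preset> &presets() {
    static const std::map<std::string, Preset> table = {
        {"deterministic", {"deterministic", 0.0f, 1.00f,  1, 0.00f}},
        {"precise",       {"precise",       0.2f, 0.95f, 40, 0.10f}},
        {"balanced",      {"balanced",      0.7f, 0.95f, 40, 0.05f}},
        {"creative",      {"creative",      1.0f, 0.95f, 40, 0.05f}},
        {"wild",          {"wild",          1.4f, 0.98f, 60, 0.00f}},
    };
    return table;
}

// Case-insensitive name or alias -> preset.
std::optional<Preset> lookup_preset(std::string name) {
    for (auto &ch : name) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    static const std::map<std::string, std::string> aliases = {
        {"exact", "precise"}, {"default", "balanced"}, {"fun", "creative"},
        {"chaos", "wild"},    {"greedy", "deterministic"}};
    if (auto a = aliases.find(name); a != aliases.end()) name = a->second;
    auto it = presets().find(name);
    if (it == presets().end()) return std::nullopt;
    return it->second;
}

// "<preset> [<number>]": the optional number replaces temperature only.
std::optional<Preset> parse_preset_line(const std::string &line) {
    std::istringstream in(line);
    std::string name;
    if (!(in >> name)) return std::nullopt;
    auto p = lookup_preset(name);
    if (!p) return std::nullopt;
    float t;
    if (in >> t) p->temp = t;
    return p;
}
```

For example, `"Creative 0.9"` yields the `creative` cutters (`top_p 0.95`, `top_k 40`, `min_p 0.05`) with temperature 0.9.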
Switching at runtime — three paths, same effect:
```bash
# CLI flag (start or restart)
easyai-server --preset creative
easyai-local --preset balanced

# Server endpoint (live, no restart)
curl -s -X POST http://localhost:8080/v1/preset \
  -H 'Content-Type: application/json' \
  -d '{"preset":"creative"}'

# easyai-cli helper
easyai-cli --url $URL --set-preset creative
```

The webui's preset bar uses the same endpoint: clicking a button
shifts every subsequent request server-wide. The INI form for persistence
is `[ENGINE] preset = precise` (see
[`easyai-server.md`](easyai-server.md) §1).

Header: [`include/easyai/presets.hpp`](include/easyai/presets.hpp).
Your assistant. Your tools. Your hardware. No cloud subscription, no API bill, no data leaving the box.

- **Runs on a Raspberry Pi.** Bonsai 8B Q1_0 weighs in at ~1.2 GB resident. A Pi 4 (8 GB) or any Pi 5 holds it comfortably with a 4K context, and one install script puts a chat server at `http://pi-ai.local` for everyone on your home network.
- **Runs on your Mac.** Same one-script flow, Metal on Apple Silicon, full webui at `http://localhost:8080`. No Docker, no Conda, no Python venv. Uninstall is `rm -rf` of the checkout.
- **Plugs into the AI apps you already use.** It is OpenAI-compatible (`/v1/chat/completions`), so Claude Code, the OpenAI SDK, LiteLLM, LangChain, LobeChat, and OpenWebUI all point at it without any easyai-specific configuration. Ollama-compat shims (`/api/tags`, `/api/show`) cover clients that prefer that shape.
- **Speaks MCP.** Claude Desktop, Cursor, Continue, and any other Model Context Protocol client auto-discovers the tool catalogue. Write one tool, and every AI app on your machine can call it.
- **Long-term memory built in.** The `memory` tool is a single `memory(action=...)` tool with sub-actions `save` / `append` / `search` / `load` / `list` / `delete` / `keywords`. The agent uses it to save, append (grow what it already knows about the user without losing the previous body), search, load, list, delete, and inventory its own knowledge. It's a passive RAG technique: one human-readable Markdown file per entry, so you can `cat`, `vim`, or `grep` it. No vector DB to babysit.
- **Operator-defined tool packs.** Drop a JSON manifest in `/etc/easyai/external-tools/` and the agent picks it up at startup. Give the model exactly the powers it needs (a database probe, a deploy command, a metrics query) without ever flipping `--allow-bash`.
- **Safe defaults.** No filesystem, no shell, no writes until you opt in. Every privileged opt-in is logged at startup with sanity warnings (shell wrappers, world-writable binaries, dynamic-linker env passthrough). Three rounds of security audits are recorded in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md).
- **A C++17 framework, not a wrapper.** Three lines wrap llama.cpp into a real agent. Fluent builder for tools, full sampling control, streaming callbacks, plan tool, named sampling presets. Link `libeasyai`, ship one binary.
- **Ops-ready.** Prometheus `/metrics`, Bearer auth, a systemd unit with `mlock` + `LimitMEMLOCK=infinity`, flash-attn, KV-cache quantisation (`q8_0`/`q4_0`/`iq4_nl`), per-request body cap, slow-loris timeouts. The Linux installer handles the whole Debian/Ubuntu deploy in one command.
```bash
# Raspberry Pi 4 / Pi 5 (Pi OS 64-bit): your LAN's AI appliance
git clone https://github.com/solariun/easy && cd easy
sudo ./scripts/install_easyai_pi.sh
# → http://pi-ai.local on every device on your network

# Mac (Apple Silicon or Intel):
git clone https://github.com/solariun/easy && cd easy
./scripts/install_easyai_macos.sh
# → http://localhost:8080

# Linux server (Debian / Ubuntu):
git clone https://github.com/solariun/easy && cd easy
sudo ./scripts/install_easyai_server.sh --model /path/to/your.gguf
# → http://0.0.0.0:80 with full systemd + auth + Prometheus metrics
```

Then open the URL in any browser, or point your favourite OpenAI client at the same address. That's it.
The pitch in three lines:

```cpp
#include <iostream>
#include "easyai/easyai.hpp"

int main() {
    easyai::Agent a("models/qwen2.5-1.5b-instruct.gguf");
    std::cout << a.ask("What time is it in Tokyo right now?") << "\n";
}
```

That's the whole thing. Construct an `Agent`, ask, print. The default
toolset (`datetime` + the unified `web` tool) is already wired in;
the `fs` tool and `bash` stay off until you opt in. Remote endpoints
work the same way:
```cpp
auto a = easyai::Agent::remote("http://127.0.0.1:8080/v1");
a.system("Be terse.")
 .on_token([](auto p){ std::cout << p << std::flush; });
a.ask("Summarise this commit.");
```

When you outgrow the 3-line shape, the same library exposes every
layer below it: Tier 2 fluent builders (`Toolbelt`, `Streaming`),
Tier 3 explicit composables (`Engine`, `Client`, `Backend`,
`Tool::builder`), and Tier 4 raw escape hatches (`Agent::backend()`,
llama.cpp handles). Higher tiers are implemented on top of lower
ones (no parallel codepaths), so Tier 1 stays trustworthy as the
project evolves.
```cpp
// Tier 2 example: wire the canonical toolset onto an Engine in
// three fluent lines instead of seven add_tool calls.
easyai::Engine engine;
engine.model("models/qwen2.5-1.5b-instruct.gguf").gpu_layers(99).context(4096);
easyai::cli::Toolbelt()
    .sandbox("/srv/data")   // enables the unified `fs` tool (read/write/list/glob/grep)
    .allow_bash()           // enables bash + bumps max_tool_hops to 99999
    .apply(engine);
engine.load();
engine.chat("Find all .md files larger than 1 KB and summarise them.");
```

`Engine::chat()` runs the full tool-call/tool-result loop for you, up to
8 hops by default. Lift the cap with `engine.max_tool_hops(N)` for shell-driven
flows, or just register `bash` and the helpers do it for you.
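The loop that `Engine::chat()` runs can be pictured as below. This is a simplified sketch with hypothetical `Msg` / `Reply` types and a stubbed model, not easyai's implementation. It only shows how a `max_tool_hops`-style cap bounds the tool-call/tool-result cycle. What a real engine returns when the cap is hit may differ.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Msg   { std::string role, content; };
struct Reply { std::string text; std::optional<std::string> tool_call; };  // tool name, if any

// Ask the model, run any requested tool, feed the result back, and repeat
// until the model answers without a tool call or the hop cap is reached.
std::string run_turn(std::vector<Msg> &history,
                     const std::function<Reply(const std::vector<Msg> &)> &model,
                     const std::function<std::string(const std::string &)> &run_tool,
                     int max_tool_hops, int *hops_used) {
    assert(max_tool_hops >= 0);
    int hops = 0;
    for (;;) {
        Reply r = model(history);
        history.push_back({"assistant", r.text});
        if (!r.tool_call || hops == max_tool_hops) {  // final answer, or out of budget
            if (hops_used) *hops_used = hops;
            return r.text;
        }
        ++hops;
        history.push_back({"tool", run_tool(*r.tool_call)});
    }
}
```

Each hop appends one assistant message and one tool result, which is why shell-heavy flows exhaust a small cap quickly and why registering `bash` raises it.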
Tool definitions are 6 lines:

```cpp
engine.add_tool(
    easyai::Tool::builder("flip_coin")
        .describe("Returns 'heads' or 'tails' uniformly at random.")
        .handle([](const easyai::ToolCall &) {
            return easyai::ToolResult::ok((std::rand() & 1) ? "heads" : "tails");
        })
        .build());
```

- `easyai::Engine`: high-level wrapper around llama.cpp's model + context + sampler + chat templates. Fluent setters, RAII-owned native resources.
- `easyai::Tool`: name + description + JSON-schema params + handler. The builder API generates the schema for you.
- `easyai::Plan`: agent-friendly checklist with one multi-action tool (`add` / `start` / `done` / `list`). Pluggable into `Engine` or `Client`; fires a callback on every mutation so you can render live.
- `easyai::tools::*`: built-in tools:
  - `datetime` (no deps).
  - `web`: unified search + fetch (`action="search"` / `"fetch"`). The search engine is selectable. `"auto"` (default) cascades google → brave → ddg-lite → bing → ddg and returns the first that succeeds: Brave carries the keyless niche-query case, since Bing RSS ignores quoted phrases and rare named entities; DDG Lite picks up when Brave's burst budget is gone; and Bing carries the keyless workhorse case for ordinary queries. Or pin one explicitly: `"google"` (Google Custom Search JSON API, opt-in via the `google_enabled` ctor flag and the `GOOGLE_API_KEY` + `GOOGLE_CSE_ID` env vars), `"brave"` (Brave HTML scrape, keyless), `"ddg-lite"` (DuckDuckGo Lite endpoint with a Netscape Communicator 4.79 UA, keyless, page 1 only), `"bing"` (Bing RSS feed, keyless), or `"ddg"` (DuckDuckGo HTML scrape, keyless). Page-based pagination on search; byte-window pagination on fetch. libcurl required at build time.
  - `fs`: unified filesystem (`action="read"` / `"write"` / `"list"` / `"glob"` / `"grep"` / `"check_path"` / `"cwd"` / `"sandbox"`), sandboxed to a root directory you provide; the model sees a virtual `/`-rooted filesystem (the real sandbox path is hidden).
  - `bash`: shell command runner. `/bin/sh -c`, cwd pinned to the sandbox root, stdout/stderr merged + capped, configurable timeout. Honest about what it is: NOT a hardened sandbox; it runs with your user privileges. Opt-in.
- `easyai::presets`: named sampling profiles (deterministic / precise / balanced / creative / wild) plus a tiny parser that turns chat lines like `"creative 0.9"` or `"/temp 0.5"` into sampling overrides.
- `easyai::ui`: terminal UI helpers (`Style`, `Spinner`, `StreamStats`). Auto-detects TTY, honours `NO_COLOR`, heartbeat-driven spinner so the glyph keeps animating during long tool calls.
- `easyai::text`: small string helpers (`punctuate_think_tags`, `slurp_file`, `prompt_wants_file_write` heuristic).
- `easyai::log`: `set_file(FILE*)` + `write(fmt, ...)`; tees diagnostic output to stderr and an optional log file.
- `easyai::cli`: CLI infrastructure:
  - `Toolbelt`: fluent builder that registers the canonical agent toolset on an `Engine` or `Client` and bumps `max_tool_hops` to 99999 when bash is enabled.
  - `open_log_tee` / `close_log_tee`: open `/tmp/<prefix>-<pid>-<epoch>.log` with a header and register it as the global log sink.
  - `validate_sandbox(path, &err)`: uniform "exists? is a dir?" check.
  - `client_has_tool(client, name)`, `print_models` / `print_local_tools` / `print_remote_tools` / `print_health` / `print_props` / `print_metrics` / `set_preset`: management subcommand helpers that drive an `easyai-server` from a one-line dispatcher.
- `easyai::Backend` (+ `LocalBackend`, `RemoteBackend`): common interface for "give me a model, local or remote, with the same chat/reset/set_system shape". Linking only `easyai::engine` gets you `LocalBackend`; adding `easyai::cli` adds `RemoteBackend` without duplicating the abstraction.
- `easyai::Agent`: the friendly Tier-1 façade over `Backend`. 3-line hello-world, fluent setters for system/sandbox/allow_bash/preset, and `backend()` as the escape hatch back to Tier 3 power.
- `easyai::Client`: same fluent API shape as `Engine`, but the model runs on a remote `/v1/chat/completions` endpoint and tools execute locally. Configures HTTP transport (`endpoint`, `api_key`, `timeout_seconds`, `verbose`) plus the full sampling/penalty surface (`temperature`, `top_p`, `top_k`, `min_p`, `repeat_penalty`, `frequency_penalty`, `presence_penalty`, `seed`, `max_tokens`, `stop` (vector), `extra_body_json`). Streaming callbacks (`on_token`, `on_reason`, `on_tool`) and an agentic multi-hop loop mirror `Engine::chat_continue` semantics.
- Direct-endpoint helpers (`list_models`, `list_remote_tools`, `health`, `metrics`, `props`, `set_preset`) let downstream apps script and introspect an `easyai-server` without ever touching curl.
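The `"auto"` engine order described for `web` amounts to a first-success cascade. The sketch below uses stubbed engines. `SearchFn` and `auto_search` are hypothetical names, and real engines would issue HTTP requests (via libcurl) rather than return canned strings.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One search engine: returns results text, or nullopt on failure / no hits.
using SearchFn = std::function<std::optional<std::string>(const std::string &query)>;

// Try engines in order and return {engine name, results} from the first that
// succeeds. "google" is skipped unless explicitly enabled (flag + env vars).
std::optional<std::pair<std::string, std::string>>
auto_search(const std::vector<std::pair<std::string, SearchFn>> &engines,
            const std::string &query, bool google_enabled) {
    assert(!query.empty());
    for (const auto &[name, search] : engines) {
        if (name == "google" && !google_enabled) continue;
        if (auto hits = search(query)) return std::make_pair(name, *hits);
    }
    return std::nullopt;  // every engine failed
}
```

Ordering does the policy work: cheap keyless engines sit behind the quota-limited one, and a failure anywhere simply falls through to the next entry.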
All three example CLIs (easyai-local, easyai-cli, easyai-server)
follow the same gating model. Default is safe: no filesystem access,
no shell.
| Flag | What it enables |
|---|---|
| (no flag) | datetime and the unified web tool (action=search / fetch) only. |
| --sandbox <dir> | The unified fs tool (action=read / write / list / glob / grep / check_path / cwd / sandbox), all scoped to <dir>. The CLIs chdir into <dir> so fs(action="cwd") reports the sandbox path back to the model. |
| --allow-bash | bash (run /bin/sh -c). cwd = --sandbox <dir> if given, otherwise the binary's CWD. NOT a hardened sandbox: it runs with your user privileges. Also bumps the agentic-loop max_tool_hops to 99999 (bash flows naturally span many turns). |
| --use-google | Enables engine="google" inside the unified web tool (Google Custom Search JSON API), and lets the default engine="auto" cascade try google as its first hop. Requires the GOOGLE_API_KEY and GOOGLE_CSE_ID env vars. Counts against your Google quota; the free tier is 100 queries/day per key. Without this flag (or without the env vars), the auto cascade silently skips google and falls through to brave → ddg-lite → bing → ddg. |
| --external-tools <dir> | Load every EASYAI-<name>.tools file in <dir> as an operator-defined tool pack. Per-file fault isolation (a bad file is logged and skipped; the agent still starts). Spawns via fork+execve, never a shell. This is the supported way to give the model focused powers without flipping --allow-bash. See EXTERNAL_TOOLS.md. |
| --memory <dir> | Enable the agent's persistent memory (search / store / append / recall / update / forget), a passive RAG technique over keyword-indexed Markdown files. Registers ONE memory(action=...) tool with sub-actions save, append (grow an existing memory without losing its body), search, load, list, delete and keywords; each memory is one Markdown file in <dir>. Memories whose title starts with fix-easyai- are immutable: pass fix=true (sub-action save) to mint one. --RAG is still accepted as a back-compat alias. The systemd-installed server passes this by default (/var/lib/easyai/rag). See RAG.md. |
| --mcp <url> | Connect to a remote MCP server as a CLIENT (e.g. another easyai-server or easyai-mcp-server). The upstream's tool catalogue is fetched via tools/list and merged into the local one; each remote tool's handler proxies tools/call back to it. Local tool names win on collision (the remote duplicate is skipped with a warning). Pair with --mcp-token <token> when the upstream requires bearer auth. |
| --no-local-tools | Skip the LOCAL built-in toolbelt entirely (datetime, web, fs, bash, ...). Useful when you want ONLY external tools, ONLY the memory tool, or ONLY tools fetched via --mcp. Does NOT disable the MCP client; that's controlled by --mcp. Renamed from --no-tools. |
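"Scoped to <dir>" boils down to one check: does a requested path, after resolving "..", "." and symlinks, still live under the sandbox root? The C++17 sketch below shows one way to express that with std::filesystem. It is a hypothetical helper (inside_sandbox is not an easyai function), and it only illustrates the idea: a production check would also have to handle races where a symlink appears between the check and the actual file access.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true when `requested` (absolute, or relative to the
// sandbox) resolves to the sandbox root or something below it.
bool inside_sandbox(const fs::path& sandbox, const fs::path& requested) {
    std::error_code ec;
    // weakly_canonical resolves symlinks in the existing prefix and
    // lexically normalises the rest, so "a/../../x" cannot slip out.
    const fs::path root = fs::weakly_canonical(sandbox, ec);
    if (ec) return false;
    const fs::path target = fs::weakly_canonical(
        requested.is_absolute() ? requested : root / requested, ec);
    if (ec) return false;

    // Compare component by component, not as raw strings, so that
    // "/srv/box2" is not treated as a child of "/srv/box".
    auto r = root.begin();
    auto t = target.begin();
    for (; r != root.end(); ++r, ++t) {
        if (t == target.end() || *r != *t) return false;
    }
    return true;
}
```

Comparing path components rather than string prefixes is the design choice that matters here; a naive starts_with check on strings accepts sibling directories that merely share a prefix.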
The systemd-installed server reads every operator-tunable knob (host,
port, alias, sandbox, memory dir, KV cache types, mlock, flash-attn,
threads, MCP auth, the works) from one INI file. Precedence: CLI flags
on the unit override INI values, and INI values override hardcoded
defaults. To change a setting, edit the file and restart the service;
no systemctl edit round-trip needed:
[SERVER]
host = 0.0.0.0
port = 80
alias = EasyAi
mcp_auth = on ; require Bearer on /mcp
[ENGINE]
ngl = -1 ; auto-fit GPU
flash_attn = on
mlock = on
cache_type_k = q8_0
cache_type_v = q8_0
[MCP_USER]
gustavo = REPLACE-WITH-OPENSSL-RAND-HEX-32
Full key reference + worked examples: easyai-server.md §1.
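The "CLI beats INI beats default" rule is easiest to see as a layered lookup. The C++17 sketch below is not easyai's INI loader: the flattened SECTION.key naming, treating ";" as an inline comment marker, and the resolve() helper are assumptions chosen to mirror the example file above.

```cpp
#include <cassert>
#include <initializer_list>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Keys are flattened to "SECTION.key", e.g. "SERVER.port" (hypothetical naming).
using Settings = std::map<std::string, std::string>;

// Strip leading/trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Parse "[SECTION]" headers and "key = value ; comment" lines.
// Caveat of this sketch: a ';' inside a value would be cut off as a comment.
Settings parse_ini(std::istream& in) {
    Settings out;
    std::string line, section;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find(';')));
        if (line.empty()) continue;
        if (line.front() == '[' && line.back() == ']') {
            section = line.substr(1, line.size() - 2);
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        out[section + "." + trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return out;
}

// CLI beats INI, INI beats the built-in default; "" if no layer sets the key.
std::string resolve(const std::string& key, const Settings& cli,
                    const Settings& ini, const Settings& defaults) {
    for (const Settings* layer : {&cli, &ini, &defaults}) {
        const auto it = layer->find(key);
        if (it != layer->end()) return it->second;
    }
    return "";
}
```

With port = 80 in the file, a --port 9090 on the unit's command line still wins, and a key absent from both falls back to the compiled-in default.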
easyai-server exposes its full tool catalogue (built-ins + the memory tool + every operator-defined --external-tools pack) via the Model Context Protocol at POST /mcp. Other AI applications connect, list, and dispatch:
Claude Desktop ──► [stdio bridge] ──► POST /mcp ──┐
Cursor ─────────────────────► POST /mcp ──┤── easyai-server
Continue ─────────────────────► POST /mcp ──┘ (one tool catalogue,
many consumers)
You build the tools once. Your memory tool, your deploy CLI, your monitoring queries, written for your easyai-server, become available in every MCP-capable AI app you already use. No per-app plugin.
# List tools the server is exposing right now
curl -fsS http://localhost/mcp -H 'Content-Type: application/json' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | jq '.result.tools[] | .name'
It also speaks the OpenAI (/v1/chat/completions, /v1/models) and Ollama (/api/tags, /api/show) APIs, so OpenAI-SDK, LangChain, LiteLLM, LobeChat, OpenWebUI, etc. auto-discover the loaded model and chat without any easyai-specific configuration.
Full guide: MCP.md. Bridge script for Claude Desktop: scripts/mcp-stdio-bridge.py.
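After tools/list, a client dispatches a tool with a JSON-RPC 2.0 tools/call request whose params carry the tool name and an arguments object, as defined by the MCP specification. The C++17 sketch below builds that body by hand so the wire format is visible; mcp_tools_call and json_escape are hypothetical helpers, and a real client would use a JSON library and POST the result to /mcp (adding an Authorization: Bearer header when mcp_auth is on).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Minimal JSON string encoder: quotes, backslashes and control characters.
std::string json_escape(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // other control chars become \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out + "\"";
}

// JSON-RPC 2.0 "tools/call" envelope. `arguments_json` must already be a
// serialised JSON object, e.g. "{}" or "{\"service\":\"billing-api\"}".
std::string mcp_tools_call(int id, const std::string& tool,
                           const std::string& arguments_json) {
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"tools/call\",\"params\":{\"name\":" +
           json_escape(tool) + ",\"arguments\":" + arguments_json + "}}";
}
```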
Without long-term memory, every session starts from zero: the model
re-derives your preferences, re-learns your project, re-asks the same
questions. With --memory, the model decides what's worth remembering
and writes it to a directory of small Markdown files — a passive RAG
technique, no embedding model or vector store. Next session, it
searches by keyword, finds what its past self saved, and picks up
where you left off.
> "I prefer terse responses in PT-BR."
[model: memory(action="save", title="user-prefs", keywords=["user","prefs","locale"], content="...")]
[next session]
> "build easyai on the AI box"
[model: memory(action="search", keywords=["easyai"]) → finds your saved build recipe]
[model loads it and answers in your style]
The dir is at /var/lib/easyai/rag/ on the installed server. You
can cat, vim, grep, hand-author entries, back it up with tar
— it's a directory of plain text files. The model is the curator;
you, the operator, can read and edit anything it decided to keep.
(--RAG still works as a back-compat alias for --memory.)
Future evolution (see RAG.md): progressive recall on session start,
automatic document ingestion, per-user namespaces. The on-disk format
won't change.
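The keyword search at the heart of this scheme can be sketched in a few lines of C++17. This is an in-memory model only: easyai stores each memory as a Markdown file, and its actual ranking is not described here. Scoring by the number of matching keywords (ties broken by title) and the Memory struct are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for one Markdown memory file.
struct Memory {
    std::string title;
    std::set<std::string> keywords;  // stored lowercase
};

static std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Titles with at least one matching keyword, most matches first.
// Ties are broken alphabetically so results are deterministic.
std::vector<std::string> search(const std::vector<Memory>& store,
                                const std::vector<std::string>& query) {
    std::vector<std::pair<int, std::string>> hits;
    for (const auto& m : store) {
        int score = 0;
        for (const auto& q : query)
            score += static_cast<int>(m.keywords.count(lower(q)));
        if (score > 0) hits.emplace_back(score, m.title);
    }
    std::sort(hits.begin(), hits.end(), [](const auto& a, const auto& b) {
        return a.first != b.first ? a.first > b.first : a.second < b.second;
    });
    std::vector<std::string> titles;
    for (const auto& h : hits) titles.push_back(h.second);
    return titles;
}
```

No embeddings are involved: the model chooses the keywords when it saves and again when it searches, which is what keeps the store greppable plain text.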
Most agent frameworks force a binary choice: either you ship the model with the tools the framework's authors thought of, or you give it a generic shell. The framework's authors don't know about your internal deploy CLI, your jq wrappers, your monitoring queries — and a generic shell is a structurally unsafe surface no matter how careful you are.
easyai's --external-tools is the missing third option. Drop a JSON
manifest named EASYAI-<name>.tools in the configured directory:
{
"version": 1,
"tools": [
{
"name": "deploy_status",
"description": "Status of one of our services in the control plane.",
"command": "/opt/internal/bin/deploy-cli",
"argv": ["status", "--", "{service}"],
"parameters": {
"type": "object",
"properties": { "service": {"type":"string"} },
"required": ["service"]
},
"timeout_ms": 10000,
"max_output_bytes": 32768,
"cwd": "$SANDBOX",
"env_passthrough": ["DEPLOY_TOKEN"]
}
]
}
Restart the server. The model can now ask for deploy_status(service:"billing-api"). The framework guarantees:
- No shell. fork+execve directly. The model's argument fills exactly one argv slot; "; rm -rf /" cannot escape it.
- No PATH hijack. Absolute command paths are mandatory and validated at load.
- No quoting bugs. Whole-element placeholders only; --flag={x} is rejected at load (split it into ["--flag","{x}"]).
- Schema-validated arguments. Type errors are rejected before fork().
- Bounded resources. Timeout, output size, env-var inheritance, fd inheritance: every channel is capped.
- Per-file fault isolation. A typo in EASYAI-experimental.tools doesn't prevent EASYAI-system.tools from loading.
- Operator/user collaboration. Drop additional EASYAI-*.tools files in the dir and they appear after a restart. Different teams can own different files. chmod o-w is enforced at the directory level.
- Sanity-check warnings at load. Wrap a shell? Let the model influence LD_PRELOAD? Manifest world-writable? You'll see it in the startup log.
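The "no shell" and "no quoting bugs" guarantees both follow from one rule: every argv element is either a literal or exactly one {placeholder}. The C++17 sketch below shows that rule; validate_template and expand are hypothetical names, not easyai's loader, and the sketch stops at producing the argv vector that would then be handed to fork+execve.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Load-time check: an element is a literal or exactly "{name}". Anything
// else containing a brace, such as "--flag={x}", is rejected.
void validate_template(const std::vector<std::string>& argv) {
    for (const auto& el : argv) {
        const bool whole = el.size() > 2 && el.front() == '{' && el.back() == '}' &&
                           el.find_first_of("{}", 1) == el.size() - 1;
        if (!whole && el.find_first_of("{}") != std::string::npos)
            throw std::invalid_argument("partial placeholder: " + el);
    }
}

// Call-time substitution: each model-supplied value fills exactly one slot.
// "billing-api; rm -rf /" stays a single argument because execve() passes
// it verbatim and no shell ever parses it.
std::vector<std::string> expand(const std::vector<std::string>& argv,
                                const std::map<std::string, std::string>& args) {
    validate_template(argv);
    std::vector<std::string> out;
    for (const auto& el : argv) {
        if (el.size() > 2 && el.front() == '{' && el.back() == '}')
            out.push_back(args.at(el.substr(1, el.size() - 2)));  // throws if missing
        else
            out.push_back(el);
    }
    return out;
}
```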
The default install creates /etc/easyai/external-tools/ empty — drop your first .tools file in and you're live. Full guide and ten worked recipes in EXTERNAL_TOOLS.md.
easyai-local -m model.gguf [-s system.txt] [--ngl 99] [--no-tools]
[--sandbox DIR] [--allow-bash]
Local-only REPL. Type any line to talk; type any of these to control the engine:
| Command | Effect |
|---|---|
| precise | Switch to the precise preset |
| creative 0.9 | Switch to creative, override temperature to 0.9 |
| /temp 0.5 | Set temperature only |
| /system <text> | Replace system prompt and clear history |
| /reset | Clear conversation history |
| /tools | List currently-registered tools |
| /help | Show all presets |
| /quit | Leave |
Pass -s system.txt to load a system prompt from a file. In easyai-local
it is the system prompt for that REPL session; easyai-server uses the
same flag to set its default system prompt.
easyai-server -m model.gguf [-s system.txt] [--port 8080] [--ngl 99]
[--sandbox DIR] [--allow-bash]
OpenAI-compatible HTTP server. Endpoints:
| Verb | Path | Notes |
|---|---|---|
| GET | / | Embedded single-file webui (chat + preset bar) |
| GET | /health | JSON status (model, backend, tool count, ambient preset) |
| GET | /v1/models | Lists the loaded model in OpenAI format |
| POST | /v1/chat/completions | OpenAI-shape request, including optional tools, temperature, top_p, top_k overrides |
| POST | /v1/preset | {"preset":"creative"} changes the ambient preset for the webui |
The killer feature: when a client (Claude Code, an OpenAI SDK, LiteLLM,
LangChain…) posts its own system message and/or its own tools to
/v1/chat/completions, those win for that single request:
- Client provides tools → easyai forwards generated tool calls back to the client and does not dispatch them locally. The client controls the loop.
- Client provides no tools → easyai uses its own toolbelt and runs the multi-hop loop server-side, returning the final assistant message.
Either way, the server-supplied system.txt is used only when the request
doesn't already include a system message.
This makes easyai-server look like a real OpenAI-compatible backend to any
client that expects one.
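The two per-request rules (who owns the tool loop, and when the server's system prompt applies) reduce to two small decisions. The C++17 sketch below models them; ChatRequest, Message, loop_owner and effective_messages are hypothetical types and names, not easyai's request structures.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Message { std::string role, content; };

struct ChatRequest {
    std::vector<Message> messages;
    bool client_declared_tools = false;  // request carried a "tools" array
};

enum class LoopOwner { Client, Server };

// Client-declared tools: tool calls go back to the client, which runs the
// loop. No tools declared: the server runs its own multi-hop loop.
LoopOwner loop_owner(const ChatRequest& req) {
    return req.client_declared_tools ? LoopOwner::Client : LoopOwner::Server;
}

// The server's system prompt is prepended only if the client sent none.
std::vector<Message> effective_messages(const ChatRequest& req,
                                        const std::string& server_system) {
    for (const auto& m : req.messages)
        if (m.role == "system") return req.messages;
    std::vector<Message> out;
    out.push_back({"system", server_system});
    out.insert(out.end(), req.messages.begin(), req.messages.end());
    return out;
}
```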
A fresh easyai-server boots up as Deep — an expert system
engineer who answers from CHECKED FACTS, not impressions. Built into
the default system prompt so a small open-weights model behaves like
an engineer instead of a chatbot from minute one.
Deep's operating loop is: TIME → THINK → PLAN → EXECUTE → VERIFY.
- Time first. Any question that touches "now", "today", a deadline, a release version, or a fact that could have changed since the training cutoff → datetime is the first tool call. This anchors the rest of the turn to the real wall clock.
- Think. State the goal, identify what's known vs. what needs lookup, and what could go wrong.
- Plan. Multi-step tasks call plan(action='add', text=…) first so the user can see and intervene live. The model uses plan(action='update', id=…, status='working'|'done'|'error') to advance steps and action='delete' to retire abandoned ones (rendered struck through, not removed). Statuses: pending | working | done | error | deleted. Batch via the items array (max 20).
- Execute. Every registered tool is fair game.
- Verify. Before claiming success: does the file exist? does the test pass? does the URL really say that? When in doubt, run another tool instead of guessing.
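The plan semantics above (five statuses, delete as strike-through rather than removal, batches of at most 20) can be modelled as a small C++17 class. This is a hypothetical sketch: the class, its method names, and applying the 20-item cap to add are assumptions, not the real easyai::Plan interface.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Status { Pending, Working, Done, Error, Deleted };

struct Step { int id; std::string text; Status status; };

class Plan {
public:
    static constexpr std::size_t kMaxBatch = 20;  // batch cap from the docs

    // action='add' with an items array: every new step starts as pending.
    std::vector<int> add(const std::vector<std::string>& items) {
        if (items.empty() || items.size() > kMaxBatch)
            throw std::invalid_argument("batch must hold 1..20 items");
        std::vector<int> ids;
        for (const auto& text : items) {
            steps_.push_back({next_id_, text, Status::Pending});
            ids.push_back(next_id_++);
        }
        return ids;
    }

    // action='update': only working / done / error are accepted.
    void update(int id, Status s) {
        if (s == Status::Pending || s == Status::Deleted)
            throw std::invalid_argument("update takes working, done or error");
        find(id).status = s;
    }

    // action='delete': the step stays in the list, marked deleted, so a UI
    // can render it struck through instead of making it vanish.
    void remove(int id) { find(id).status = Status::Deleted; }

    const std::vector<Step>& steps() const { return steps_; }

private:
    Step& find(int id) {
        for (auto& s : steps_)
            if (s.id == id) return s;
        throw std::out_of_range("no such step");
    }
    std::vector<Step> steps_;
    int next_id_ = 1;
};
```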
The earlier behaviour rules still apply: RULE 1 (execute or answer, never
just announce); every web(action="search") must be followed by a
web(action="fetch"); citations point only to URLs actually fetched.
Operators who want a different persona pass --system "<text>" or
-s persona.txt — Deep is the default, not a hardcoded identity.
develop/
├── easyai/ # this project
└── llama.cpp/ # cloned next to it (https://github.com/ggml-org/llama.cpp)
cd easyai
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release # see "Build for your hardware" below
cmake --build build -j
# Local REPL with everything wired up
./build/easyai-local -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf
# Agentic REPL talking to a remote OpenAI-compatible endpoint
./build/easyai-cli --url http://127.0.0.1:8080
./build/easyai-cli --url https://api.openai.com/v1 \
--api-key $OPENAI_API_KEY --model gpt-4o-mini
# One-shot mode (great in scripts — banners on stderr, model text on stdout)
./build/easyai-local -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf -p "What is 2+2?"
result=$(./build/easyai-cli --url http://127.0.0.1:8080 --no-reasoning -p "summarise this commit")
# Open http://127.0.0.1:8080 in a browser
./build/easyai-server -m models/qwen2.5-1.5b-instruct-q4_k_m.gguf
Point any OpenAI client at it:
curl http://127.0.0.1:8080/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"easyai","messages":[{"role":"user","content":"Hi!"}]}'
For Claude Code (or any tool that takes an OpenAI-compatible base URL), set
http://127.0.0.1:8080/v1 as the base. If the client declares tools, their
calls are forwarded back to it; if it declares none, the server's toolbelt is used.
Every target is independent. Configure once, then build whichever subset matters for your situation:
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release # configure once
# Just the engine library (libeasyai.so + headers):
cmake --build build -j --target easyai
# Just the OpenAI-protocol client library (libeasyai-cli.so):
cmake --build build -j --target easyai_cli
# Just the agentic remote CLI (links libeasyai-cli):
cmake --build build -j --target easyai-cli
# Just the local-only REPL (links libeasyai):
cmake --build build -j --target easyai-local
# Just the server:
cmake --build build -j --target easyai-server
# Drop the examples entirely (lib-only consumers):
cmake -S . -B build -DEASYAI_BUILD_EXAMPLES=OFF
cmake --build build -j
# Drop the embedded webui from easyai-server (smaller binary):
cmake -S . -B build -DEASYAI_BUILD_WEBUI=OFF
cmake --build build -j
# Drop libcurl-using tools (the `web` tool's search and fetch actions):
cmake -S . -B build -DEASYAI_WITH_CURL=OFF
cmake --build build -j
# Clean rebuild from scratch:
rm -rf build
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
# Just delete object files but keep configuration:
cmake --build build --target clean
After cmake --install build --prefix /usr/local, downstream projects
can find_package(easyai 0.1 REQUIRED):
# Your CMakeLists.txt:
find_package(easyai 0.1 REQUIRED)
add_executable(myapp main.cpp)
target_link_libraries(myapp PRIVATE
easyai::engine # libeasyai.so — local llama.cpp wrapper
easyai::cli # libeasyai-cli.so — OpenAI-protocol client
)
Both targets export their public include directory and the cxx_std_17
compile feature, so consumers don't need any extra include flags.
Pick the matching configure command for your machine; rebuild with
cmake --build build -j.
| Hardware | Configure command | Notes |
|---|---|---|
| Apple Silicon / Intel Mac (Metal) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release | Metal is auto-detected on macOS; nothing extra to set. |
| NVIDIA GPU (CUDA) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON | Needs the CUDA Toolkit (nvcc). Optionally pin the GPU arch with -DCMAKE_CUDA_ARCHITECTURES=89. |
| AMD / Intel / cross-vendor (Vulkan) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON | Needs the Vulkan SDK on Linux/Windows. Works on AMD RX/Pro, Intel Arc, and NVIDIA too. |
| AMD on Linux (ROCm/HIP) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1100 | Replace the gfx ID with your card's. Requires ROCm 6+. |
| CPU-only (any OS) | cmake -S . -B build -DCMAKE_BUILD_TYPE=Release | Then run with -ngl 0 (CLI/server) or .gpu_layers(0) (lib). |
For a faster CPU prompt-eval path on Linux, add -DGGML_BLAS=ON
-DGGML_BLAS_VENDOR=OpenBLAS; macOS uses Accelerate automatically.
If both Metal and CUDA libraries are present (rare), keep one and disable
the other explicitly with -DGGML_METAL=OFF / -DGGML_CUDA=OFF.
web(action="search") with the default engine="auto" cascades through
five backends and returns the first one that succeeds:
- Google CSE: only if --use-google is passed AND GOOGLE_API_KEY and GOOGLE_CSE_ID are set; if any of these is missing, this hop is silently skipped (not a failure).
- Brave HTML: search.brave.com/search?q=…. Keyless HTML SSR, ~20 results per page. The keyless engine that best understands the full query: unlike Bing RSS (which strips quoted phrases and rare named entities, returning Wikipedia about Santiago de Compostela for "Santiago Cavalcante" PNUD), Brave honours the whole query. Downside: it throttles single IPs aggressively (HTTP 429 after a small burst), and its Svelte CSS classes rotate between deploys (the scraper anchors on stable hooks, so hash rotation alone won't break it; a structural markup rewrite will).
- DDG Lite: lite.duckduckgo.com/lite/?q=… accessed with a Netscape Communicator 4.79 (Windows NT 5.0) User-Agent. Keyless, ~10 results per query, page 1 only. The Netscape UA matters: DDG Lite is the no-JS endpoint maintained for old browsers and accessibility, so DDG serves it without the anti-bot challenge when the UA obviously can't run JS. Result quality is comparable to Brave for entity queries (it returns the actual LinkedIn / Google Scholar / Brazilian profile hits for "Santiago Cavalcante" PNUD), and it isn't rate-limited the way Brave is, so it's the workhorse when Brave's burst budget runs out.
- Bing RSS: www.bing.com/search?q=…&format=rss. Keyless, captcha-free XML feed maintained for legitimate feed consumers. Caps at ~10 results per query and ignores pagination, but stable and fast for ordinary keyword queries.
- DuckDuckGo HTML scrape: html.duckduckgo.com/html/. Keyless, the historical default, kept as the last resort because DDG's anti-bot heuristics now return an "anomaly" page (HTTP 202, no results) for most server IPs (the modern endpoint is gated even though the Lite endpoint isn't).
Pin a specific backend with engine="google" / "brave" /
"ddg-lite" / "bing" / "ddg" when you want to bypass the cascade
(useful for diagnosis: "does ddg still work from this box?"). The
output's engine: <name> header line tells the model which backend
actually answered.
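The cascade is a first-success loop that skips disabled hops and reports which backend answered. The C++17 sketch below models it with std::function stand-ins for the HTTP backends; Backend, cascade and google_enabled are hypothetical names, and real backends would perform the requests described above.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Backend {
    std::string name;
    bool enabled;  // e.g. google only with --use-google and both env vars
    std::function<std::optional<std::string>(const std::string&)> search;
};

struct SearchResult { std::string engine, body; };

// Google participates only when the flag AND both env vars are present.
bool google_enabled(bool use_google_flag) {
    return use_google_flag && std::getenv("GOOGLE_API_KEY") && std::getenv("GOOGLE_CSE_ID");
}

// engine="auto": try each backend in order and return the first success,
// tagged with the backend's name (the "engine: <name>" header line).
std::optional<SearchResult> cascade(const std::vector<Backend>& chain,
                                    const std::string& query) {
    for (const auto& b : chain) {
        if (!b.enabled) continue;       // skipped, not counted as a failure
        if (auto r = b.search(query))   // nullopt models a 429, anomaly page, ...
            return SearchResult{b.name, *r};
    }
    return std::nullopt;                // every backend failed
}
```

Pinning engine="bing" and friends then amounts to calling a single backend directly instead of walking the chain.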
- manual.md: hands-on developer manual. Includes a step-by-step "Recipe book: write your first tools" chapter (section 3.8) that walks through examples/recipes.cpp line by line in a friendly, accessible style. Best place to start if you want to extend easyai with your own services.
- design.md: architecture, data flow, and why we build on top of common/ instead of just include/llama.h.
- scripts/install_easyai_server.sh: one-shot Debian/Ubuntu installer; drop-in replacement for the install_llama_server.sh workflow. Clones llama.cpp + easyai, builds with the right backend (auto-detects Vulkan / CUDA / ROCm / CPU), creates a system user + /var/lib/easyai, and drops a hardened systemd unit with mlock + flash-attn + q8_0 KV cache + Bearer auth + Prometheus /metrics. Accepts every flag the original took (--with-mcp, --draft-model, --webui-title, etc.); flags for features that are now built in become no-ops with a friendly warning, so existing automation keeps working.
Every native resource is owned by a smart pointer or a value type with a custom destructor:
- Engine: std::unique_ptr<Impl> pImpl pattern. The Impl destructor frees the sampler explicitly; the model, context, and chat templates are unique-pointer-owned.
- easyai-server: a single std::unique_ptr<ServerCtx> lives for the process lifetime. A std::mutex serialises the engine across httplib's worker threads.
- HTTP handlers cap request bodies at 8 MiB (configurable via --max-body) and catch every std::exception at the boundary, so a malformed request cannot tear down the server.
- No raw new/delete anywhere in src/ or examples/.
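The ownership pattern described above (pImpl behind a std::unique_ptr, a mutex serialising access, a body cap and an exception boundary in the handler) looks roughly like the C++17 sketch below. Every name here is hypothetical: DemoEngine stands in for the real engine, handle() for an HTTP handler, and the 413/500 status codes are illustrative choices, not documented easyai behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <memory>
#include <mutex>
#include <string>

// --- header side: no implementation details leak out ---
class DemoEngine {
public:
    DemoEngine();
    ~DemoEngine();  // defined below, where Impl is a complete type
    std::string generate(const std::string& prompt);
private:
    struct Impl;
    std::unique_ptr<Impl> impl_;
};

// --- source side ---
struct DemoEngine::Impl {
    std::mutex mu;   // serialises calls arriving from worker threads
    int calls = 0;   // stands in for model / context / sampler state
};

DemoEngine::DemoEngine() : impl_(std::make_unique<Impl>()) {}
DemoEngine::~DemoEngine() = default;  // unique_ptr frees Impl; no raw delete

std::string DemoEngine::generate(const std::string& prompt) {
    std::lock_guard<std::mutex> lock(impl_->mu);
    ++impl_->calls;
    return "reply #" + std::to_string(impl_->calls) + " to " + prompt;
}

constexpr std::size_t kMaxBody = 8 * 1024 * 1024;  // 8 MiB default cap

// Handler boundary: oversized bodies are refused up front and no exception
// escapes, so one bad request cannot take the process down.
int handle(DemoEngine& engine, const std::string& body, std::string& out) {
    if (body.size() > kMaxBody) return 413;
    try {
        out = engine.generate(body);
        return 200;
    } catch (const std::exception& e) {
        out = e.what();
        return 500;
    }
}
```

Declaring the destructor in the header and defaulting it in the source file is what lets std::unique_ptr<Impl> work with an incomplete Impl type at the point of declaration.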
Inherits the MIT license of llama.cpp. See LICENSE.