Peripheral

A small, local model that manages a knowledge base, so a frontier model doesn't have to.

Peripheral is a fine-tuned 8B model that does the unglamorous admin around a filesystem knowledge base: it routes a query to the right files, notices when stored knowledge has gone stale, and decides whether a recent change is safe to serve. It runs locally, costs nothing per call, and on these knowledge-management tasks it beats a frontier API at a fraction of the latency and cost.

It's a knowledge management agent (KMA): a specialist, not a better frontier model. On KMA-Bench it scores 81% to Claude Sonnet 4.6's 76%, at ~760ms per call (roughly 5x faster and 5x cheaper).

Note

Peripheral was trained on ~7,000 task examples and beats a zero-shot generalist, but it transfers only partially to unseen domains. See the model card for the honest limitations.

Features

Local and cheap: an 8B Q4_K_M GGUF (~4.8 GB VRAM) you run in LM Studio, at zero API cost.
Three jobs, one model: routing, change evaluation, and serve/annotate/block gating, each a tagged prompt that returns a single JSON object.
Cheap heuristics do the easy work: a topic-keyword index for routing and git diffs for staleness; the model is reserved for the hard call: is this change legitimate, corrupt, or somewhere in between?
Reproducible: generate the training data and fine-tune on your own knowledge base, no frontier API required.

How it works

A query passes through a pipeline of cheap heuristics before the model is asked anything, so the model only does the work that genuinely needs judgement.

User query
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│                     Peripheral pipeline                  │
│  ┌────────┐    ┌────────┐    ┌──────────────────┐        │
│  │ Router │──▶│  PVL   │──▶│   Block Gate     │         │
│  │ topic  │    │ git    │    │ serve / annotate │        │
│  │ index  │    │ diff   │    │ / block          │        │
│  └────────┘    └────────┘    └────────┬─────────┘        │
│      │                                │                  │
│  relevant files                ┌──────┴──────┐           │
│  (5-6x fewer tokens)           │ Eval (diff  │           │
│                                │ classifier) │           │
│                                └─────────────┘           │
└──────────────────────────────────────────────────────────┘
    │
    ▼
Response from verified, routed context

Router: maps a query to the relevant wiki files via a hand-built topic-keyword index (100% hit rate vs ~30% for embedding search on a non-English KB).
PVL (Peripheral Vision Layer): an append-only change log from git diffs that flags recently modified files, at ~226 tokens per query.
Block Gate: instead of injecting a "this might be stale" warning (which the model ignores), it withholds the content and serves the diff plus the previous version, reframing the task as evaluate this change.
Peripheral (the model): Qwen3-8B fine-tuned with QLoRA, the judgement layer that classifies a change as accept / partial / reject.

See docs/architecture.md for the full design.

Results

On KMA-Bench, averaged over multiple runs across three knowledge bases (% correct):

Model	Diff Eval	Routing	Gate	Overall	Latency
Heuristic	44%	75%	100%	64%	0 ms
Base Qwen3-8B	46%	13%	16%	30%	~3,500 ms
Peripheral (8B)	71%	88%	96%	81%	~760 ms
Claude Sonnet 4.6 (zero-shot)	60%	96%	83%	76%	~4,100 ms

Fine-tuning carried the base model from 30% (it couldn't reliably emit valid JSON) to 81%. The public benchmark ships 166 cases (French + PostHog); ClickHouse was evaluated too but isn't redistributed.

Getting Started

Prerequisites: Python 3.11+, uv, and LM Studio (for the local model) or an Anthropic API key.

uv sync
cp .env.example .env        # add your Anthropic key only if benchmarking Claude

Run the benchmark against the local model (loaded in LM Studio) or Claude:

uv run python scripts/benchmark/benchmark_core.py --backend lmstudio --model-name peripheral-8b
uv run python scripts/benchmark/benchmark_core.py --backend anthropic

Reproduce the training data and fine-tune split on your own knowledge base, generated locally, zero API cost:

uv run python scripts/datagen/gen_eval.py    --wiki-path ./french-kb/wiki --output ./training-data/eval
uv run python scripts/datagen/gen_routing.py --wiki-path ./french-kb/wiki --output ./training-data/routing
uv run python scripts/datagen/gen_gate.py    --wiki-path ./french-kb/wiki --output ./training-data/gate
uv run python scripts/training/prep_finetune.py --data-dir ./training-data --output ./finetune-data

Fine-tune the prepared split with Unsloth (QLoRA on a single GPU) and export a Q4_K_M GGUF.

Using the model

The fastest path is LM Studio: search malgamves/peripheral-8b, download the GGUF, and give the model a task instruction ([EVAL], [ROUTE], or [GATE]) plus your input. It replies with a single JSON object.

Tip

Keep the temperature low (~0.1). Peripheral is trained to emit short, deterministic JSON. Full prompt formats are in the model card.

Project layout

peripheral/
├── french-kb/       French language-learning wiki (the author's notes)
├── scripts/
│   ├── benchmark/   KMA-Bench runners
│   ├── datagen/     Training-data generation
│   ├── measurement/ Token-measurement experiments
│   ├── pvl/         Peripheral Vision Layer + block gate
│   ├── training/    Fine-tuning prep (train/val split)
│   └── validation/  Index-validation experiments
└── docs/            Architecture notes + model card

The benchmark lives in its own repo: KMA-Bench.

Note

An MCP Server is coming!!

Documentation

Architecture: the pipeline, the four tasks, and the data flow.
Model card: prompts, results, training details, and limitations.
Write-up: the full story behind the project.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
docs		docs
french-kb		french-kb
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Peripheral

Features

How it works

Results

Getting Started

Using the model

Project layout

Documentation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Peripheral

Features

How it works

Results

Getting Started

Using the model

Project layout

Documentation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages