Skip to content

malgamves/peripheral

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Peripheral

A small, local model that manages a knowledge base, so a frontier model doesn't have to.

License Model Hugging Face Python

Model · KMA-Bench · Write-up

Peripheral is a fine-tuned 8B model that does the unglamorous admin around a filesystem knowledge base: it routes a query to the right files, notices when stored knowledge has gone stale, and decides whether a recent change is safe to serve. It runs locally, costs nothing per call, and on these knowledge-management tasks it beats a frontier API at a fraction of the latency and cost.

It's a knowledge management agent (KMA): a specialist, not a better frontier model. On KMA-Bench it scores 81% to Claude Sonnet 4.6's 76%, at ~760ms per call (roughly 5x faster and 5x cheaper).

Note

Peripheral was trained on ~7,000 task examples and beats a zero-shot generalist, but it transfers only partially to unseen domains. See the model card for the honest limitations.

Features

  • Local and cheap: an 8B Q4_K_M GGUF (~4.8 GB VRAM) you run in LM Studio, at zero API cost.
  • Three jobs, one model: routing, change evaluation, and serve/annotate/block gating, each a tagged prompt that returns a single JSON object.
  • Cheap heuristics do the easy work: a topic-keyword index for routing and git diffs for staleness; the model is reserved for the hard call: is this change legitimate, corrupt, or somewhere in between?
  • Reproducible: generate the training data and fine-tune on your own knowledge base, no frontier API required.

How it works

A query passes through a pipeline of cheap heuristics before the model is asked anything, so the model only does the work that genuinely needs judgement.

User query
    │
    ▼
┌──────────────────────────────────────────────────────────┐
│                     Peripheral pipeline                  │
│  ┌────────┐    ┌────────┐    ┌──────────────────┐        │
│  │ Router │──▶│  PVL   │──▶│   Block Gate     │         │
│  │ topic  │    │ git    │    │ serve / annotate │        │
│  │ index  │    │ diff   │    │ / block          │        │
│  └────────┘    └────────┘    └────────┬─────────┘        │
│      │                                │                  │
│  relevant files                ┌──────┴──────┐           │
│  (5-6x fewer tokens)           │ Eval (diff  │           │
│                                │ classifier) │           │
│                                └─────────────┘           │
└──────────────────────────────────────────────────────────┘
    │
    ▼
Response from verified, routed context
  • Router: maps a query to the relevant wiki files via a hand-built topic-keyword index (100% hit rate vs ~30% for embedding search on a non-English KB).
  • PVL (Peripheral Vision Layer): an append-only change log from git diffs that flags recently modified files, at ~226 tokens per query.
  • Block Gate: instead of injecting a "this might be stale" warning (which the model ignores), it withholds the content and serves the diff plus the previous version, reframing the task as evaluate this change.
  • Peripheral (the model): Qwen3-8B fine-tuned with QLoRA, the judgement layer that classifies a change as accept / partial / reject.

See docs/architecture.md for the full design.

Results

On KMA-Bench, averaged over multiple runs across three knowledge bases (% correct):

Model Diff Eval Routing Gate Overall Latency
Heuristic 44% 75% 100% 64% 0 ms
Base Qwen3-8B 46% 13% 16% 30% ~3,500 ms
Peripheral (8B) 71% 88% 96% 81% ~760 ms
Claude Sonnet 4.6 (zero-shot) 60% 96% 83% 76% ~4,100 ms

Fine-tuning carried the base model from 30% (it couldn't reliably emit valid JSON) to 81%. The public benchmark ships 166 cases (French + PostHog); ClickHouse was evaluated too but isn't redistributed.

Getting Started

Prerequisites: Python 3.11+, uv, and LM Studio (for the local model) or an Anthropic API key.

uv sync
cp .env.example .env        # add your Anthropic key only if benchmarking Claude

Run the benchmark against the local model (loaded in LM Studio) or Claude:

uv run python scripts/benchmark/benchmark_core.py --backend lmstudio --model-name peripheral-8b
uv run python scripts/benchmark/benchmark_core.py --backend anthropic

Reproduce the training data and fine-tune split on your own knowledge base, generated locally, zero API cost:

uv run python scripts/datagen/gen_eval.py    --wiki-path ./french-kb/wiki --output ./training-data/eval
uv run python scripts/datagen/gen_routing.py --wiki-path ./french-kb/wiki --output ./training-data/routing
uv run python scripts/datagen/gen_gate.py    --wiki-path ./french-kb/wiki --output ./training-data/gate
uv run python scripts/training/prep_finetune.py --data-dir ./training-data --output ./finetune-data

Fine-tune the prepared split with Unsloth (QLoRA on a single GPU) and export a Q4_K_M GGUF.

Using the model

The fastest path is LM Studio: search malgamves/peripheral-8b, download the GGUF, and give the model a task instruction ([EVAL], [ROUTE], or [GATE]) plus your input. It replies with a single JSON object.

Tip

Keep the temperature low (~0.1). Peripheral is trained to emit short, deterministic JSON. Full prompt formats are in the model card.

Project layout

peripheral/
├── french-kb/       French language-learning wiki (the author's notes)
├── scripts/
│   ├── benchmark/   KMA-Bench runners
│   ├── datagen/     Training-data generation
│   ├── measurement/ Token-measurement experiments
│   ├── pvl/         Peripheral Vision Layer + block gate
│   ├── training/    Fine-tuning prep (train/val split)
│   └── validation/  Index-validation experiments
└── docs/            Architecture notes + model card

The benchmark lives in its own repo: KMA-Bench.

Note

An MCP Server is coming!!

Documentation

  • Architecture: the pipeline, the four tasks, and the data flow.
  • Model card: prompts, results, training details, and limitations.
  • Write-up: the full story behind the project.

About

a fine-tuned 8B model that manages a filesystem knowledge base: routing, staleness detection, and change evaluation, run locally and cheaply.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages