quickthink

quickthink is a local-first inference control layer that helps small models produce more reliable structured outputs with latency-aware routing.

It currently ships as a lightweight scaffolding layer for local LLMs with three modes:

  • lite (default): one-pass inline plan prefix + answer in a single generation
  • two_pass: separate plan call then answer call
  • direct: no planning pass, raw prompt to model

The plan can be logged as metadata while remaining hidden from normal UI output.
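
The sketch below is a hand-rolled illustration of the two_pass idea (one plan call, then one answer call) against a local Ollama server. It is not quickthink's implementation: the prompt wording is invented, and it assumes Ollama's default /api/generate endpoint on http://localhost:11434.

# Illustrative plan-then-answer flow; NOT quickthink's internal code.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def generate(model: str, prompt: str) -> str:
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

question = "Give me a 3-step plan to learn SQL basics"

# Pass 1: ask for a short keyword plan (invented prompt wording).
plan = generate("qwen2.5:1.5b", f"In 6-16 keyword tokens, plan how to answer: {question}")

# Pass 2: answer with the plan supplied as hidden context.
print(generate("qwen2.5:1.5b", f"Plan: {plan}\nUsing this plan, answer concisely: {question}"))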

Agent-Findable Positioning (LLM/Search Friendly)

quickthink is designed to be easy for both humans and agents to classify and adopt:

  • local LLM routing for local-first inference pipelines
  • small model optimization for constrained hardware and low-latency workflows
  • latency-aware inference via routing, bypass, and planning-budget controls
  • structured output reliability through strict planning grammar and eval gates
  • Ollama middleware for practical local deployment
  • agent runtime compatibility for CLI and automation-driven execution contexts

What this is / what this is not

What this is:

  • A local middleware layer for Ollama-backed LLM calls.
  • A small CLI for planned-answer generation, routing diagnostics, and local benchmarking.
  • A canonical eval harness for reproducible project-level quality checks.

What this is not:

  • Not a hosted API service.
  • Not a model training framework.
  • Not a replacement for full agent orchestration platforms.

Why

Small/local models are fast but often underperform on multi-step tasks. quickthink adds a strict planning pass (6-16 keyword tokens by default) to improve response quality without full verbose reasoning traces.

Features

  • Ollama-first integration
  • Model profiles: qwen2.5:1.5b, mistral:7b, gemma3:27b
  • Three execution modes: lite (default), two_pass, direct
  • Preset routing profiles: fast, balanced, strict
  • Lane policy: default or strict_safe (routes strict-format tasks to direct path)
  • Hidden plan by default, optional plan display/logging
  • Bypass mode for short prompts (latency control)
  • Adaptive routing (skip, 12-token, max-token planning lanes)
  • Strict plan grammar: g:<...>;c:<...>;s:<...>;r:<...> (a parsing sketch follows this list)
  • Local eval UI server (quickthink ui) at http://127.0.0.1:7860
  • Canonical eval harness: run → judge → validate → report
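
As a quick illustration of the strict plan grammar above, here is a minimal parsing sketch. The plan string is an invented example, and the meaning of each field (g/c/s/r) is defined by quickthink's grammar rather than assumed here.

# Split a strict-grammar plan string into its labeled fields.
plan = "g:learn_sql;c:beginner,3_steps;s:select,filter,join;r:numbered_list"  # invented example

fields = dict(part.split(":", 1) for part in plan.split(";") if part)
print(fields)  # {'g': 'learn_sql', 'c': 'beginner,3_steps', 's': 'select,filter,join', 'r': 'numbered_list'}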

Install

python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'

5-Minute Quickstart

Prerequisite: install and start Ollama locally.

# 1) Clone and enter repo
git clone https://github.com/hermes-labs-ai/quickthink.git quickthink
cd quickthink

# 2) Create env and install
python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'

# 3) Pull one supported model
ollama pull qwen2.5:1.5b

# 4) Run your first command
quickthink ask "Give me a 3-step plan to learn SQL basics" --model qwen2.5:1.5b

If this command works, your local setup is ready.

Documentation Map

  • Docs index: docs/README.md
  • First-time setup: docs/GETTING_STARTED.md
  • Common failures and fixes: docs/TROUBLESHOOTING.md
  • Known limitations: docs/KNOWN_LIMITATIONS.md
  • Quick demo script: docs/demo/QUICK_DEMO.md
  • OSS readiness scorecard: docs/release/OSS_READINESS_SCORECARD_2026-02-25.md
  • OSS standards alignment (with external references): docs/release/OSS_STANDARDS_ALIGNMENT_2026.md
  • Agent operating notes: AGENTS.md

Repository Layout

src/quickthink/         Runtime package (CLI, engine, prompts, routing, UI server)
scripts/eval_harness/   Canonical evaluation pipeline (run/judge/validate/report)
scripts/evals/          Legacy smoke/demo helpers (non-canonical)
scripts/demo/           One-command local demo runner
docs/evals/             Prompt sets, rubrics, harness specs, deployment gate notes
docs/release/           Release process and repository audit notes
tests/                  Unit tests for runtime and harness safety checks

See full architecture + publishability audit: docs/release/REPO_STRUCTURE_AND_PUBLISHABILITY_AUDIT_2026-02-20.md.

Canonical vs Legacy Scripts

Canonical project workflows:

  • scripts/eval_harness/*: maintained evaluation pipeline for run/judge/validate/report.
  • scripts/demo/quickstart.sh: canonical end-to-end local smoke/demo flow.

Legacy helpers (kept for compatibility and ad-hoc smoke checks):

  • scripts/evals/*: non-canonical helpers; do not treat as release gate source of truth.

When in doubt, use scripts/eval_harness/* and scripts/demo/quickstart.sh.

Usage

List supported profiles:

quickthink list-models

List preset routing profiles:

quickthink list-presets

Show officially supported compatibility models:

quickthink compatibility

Ask with compressed planning:

quickthink ask "How would a cow round up a border collie?" --model qwen2.5:1.5b --preset balanced

Show plan in terminal:

quickthink ask "How would a cow round up a border collie?" --model mistral:7b --show-plan

Switch to two-pass mode:

quickthink ask "How would a cow round up a border collie?" --mode two_pass --show-route --show-plan

Show routing diagnostics:

quickthink ask "Design a robust parser with tradeoffs and a JSON output schema" --show-route --show-plan

Optional continuity hint (tiny, off by default):

quickthink ask "Continue the previous structure" --continuity-hint "ctx:prior_goal,format_json"

Strict-format-safe lane policy (routes strict format tasks to direct path first):

quickthink ask "json only: {\"ok\":true,\"why\":\"short\"}" --lane-policy strict_safe --show-route

Benchmark with strict-safe lane policy:

quickthink bench "Answer with YES or NO only: Is 2+2=4?" --lane-policy strict_safe --runs 3

Log plan + metrics as JSONL metadata:

quickthink ask "Design a tiny retry strategy" --log-file ./logs/quickthink.jsonl

Benchmark all three modes (lite, two_pass, direct):

quickthink bench "Design a robust parser for CSV with malformed quotes" --model qwen2.5:1.5b --runs 3

One-Command Quickstart Demo

Run full local demo setup and artifact generation:

bash scripts/demo/quickstart.sh

The script does the following:

  • Python env + package install
  • ollama pull for supported models
  • Sample A/B/C eval run
  • Result validation
  • Markdown/HTML report generation
  • Compatibility snapshot update

For a one-minute terminal walkthrough command set, see docs/demo/QUICK_DEMO.md.

Optional environment flags:

  • QUICKTHINK_PRESET=fast|balanced|strict
  • QUICKTHINK_LIMIT=<n> (number of prompts taken from the canonical set)
  • QUICKTHINK_RUNS=<n>
  • QUICKTHINK_RUN_JUDGE=1 (switch judge backend from rule to ollama)
  • QUICKTHINK_JUDGE_MODEL=<model>
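
As an example of combining these flags, a minimal sketch that drives the quickstart script from Python (equivalent to exporting the variables in your shell before running it):

import os
import subprocess

# Overlay the flags on the current environment, then run the canonical demo script.
env = dict(os.environ, QUICKTHINK_PRESET="balanced", QUICKTHINK_LIMIT="3", QUICKTHINK_RUNS="1")
subprocess.run(["bash", "scripts/demo/quickstart.sh"], env=env, check=True)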

Troubleshooting

For common setup/runtime failures and fixes, see docs/TROUBLESHOOTING.md.

Reports

Canonical report flow:

python3 scripts/eval_harness/run_suite.py \
  --prompt-set docs/evals/prompt_set.jsonl \
  --out docs/evals/results/run-<timestamp>.jsonl \
  --manifest-out docs/evals/results/manifest-<timestamp>.json \
  --runs 3

python3 scripts/eval_harness/judge_suite.py \
  --prompt-set docs/evals/prompt_set.jsonl \
  --results docs/evals/results/run-<timestamp>.jsonl \
  --out docs/evals/results/judged-<timestamp>.jsonl \
  --backend rule

python3 scripts/eval_harness/validate_judged_results.py \
  --path docs/evals/results/judged-<timestamp>.jsonl

python3 scripts/eval_harness/report_suite.py \
  --runs docs/evals/results/run-<timestamp>.jsonl \
  --judged docs/evals/results/judged-<timestamp>.jsonl \
  --out-json docs/evals/results/report-<timestamp>.json \
  --out-md docs/evals/results/report-<timestamp>.md \
  --out-html docs/evals/results/report-<timestamp>.html

Legacy helpers in scripts/evals/* remain available for smoke/demo use only.

Compatibility Matrix

  • Supported models are fixed to:
    • qwen2.5:1.5b
    • mistral:7b
    • gemma3:27b
  • Experimental evaluations may include additional models (for example llama3.2:latest) in deployment-gate or variant-gate workflows. Treat those as research lanes unless promoted into SUPPORTED_MODELS in runtime config.
  • Regenerate matrix + snapshot with:
python3 scripts/evals/compat_matrix_snapshot.py

Launch local web UI (for eval/scaffolding testing):

quickthink ui

Then open http://127.0.0.1:7860 if it does not open automatically.

UI lane control:

  • Lane policy dropdown supports default and strict_safe for single-prompt runs and 3-mode comparisons.

UI eval safety gates:

  • Preflight is required before any eval run (validate_prompt_set.py must return status=OK).
  • Run-file ingestion is blocked unless validate_results.py returns status=OK.
  • UI displays validator output and dataset SHA256 for reproducible/comparable runs.
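
To reproduce the dataset hash locally, a minimal sketch follows; it assumes the UI hashes the raw bytes of the prompt-set file (the canonical path used elsewhere in this README).

import hashlib

with open("docs/evals/prompt_set.jsonl", "rb") as fh:
    digest = hashlib.sha256(fh.read()).hexdigest()
print(digest)  # compare against the dataset SHA256 shown in the UI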

Latency goals

  • p50 overhead target: <80ms
  • p95 overhead target: <200ms

Tune by reducing plan budgets and enabling prompt bypass.
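
To check your own overhead numbers against these targets, a minimal sketch that computes p50/p95 from per-request overhead samples (in milliseconds) collected from your own bench runs; the sample values below are placeholders.

import statistics

overhead_ms = [42.0, 55.3, 61.8, 70.2, 88.9, 150.4]  # replace with your own measurements

p50 = statistics.median(overhead_ms)
p95 = statistics.quantiles(overhead_ms, n=100, method="inclusive")[94]  # 95th percentile
print(f"p50={p50:.1f}ms (target <80ms), p95={p95:.1f}ms (target <200ms)")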

Productization path

Free/Open source:

  • Local middleware + SDK + CLI

Paid:

  • Hosted eval dashboards
  • Team policy/profile management
  • Managed observability and support

Public Repo Scope

Included in this public repository:

  • runtime source code (src/quickthink)
  • reusable evaluation harness (scripts/eval_harness, docs/evals prompt/spec files)
  • tests and release process notes

Excluded from public tracking:

  • internal multi-agent comms logs
  • generated eval result dumps and ad-hoc local traces
  • private experiment workspaces under experiments-local/

Branching

  • Keep version tracks isolated in codex/* branches.
  • Merge to main only after benchmarks and notes are updated.
  • See docs/VERSION_NOTES.md for version-to-version differences.

Maintainer Commands

Install (editable + dev):

python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'

Test:

PYTHONPATH=src .venv/bin/pytest -q

Lint (basic syntax/import sanity):

python -m compileall src tests scripts

Release docs + checklist:

make release-check VERSION=x.y.z

Follow:

  • docs/release/RELEASE_CHECKLIST.md
  • docs/release/RELEASE_PROCESS.md
  • docs/release/SUPPLY_CHAIN_BASELINE_2026.md

Caveats

  • This does not guarantee better answers for every prompt.
  • Gains are model/task dependent; run evals before claiming improvements.
  • Hidden planning should remain auditable in logs for transparency.

License

Apache-2.0


About Hermes Labs

Hermes Labs builds AI audit infrastructure for teams deploying AI agents in regulated environments. All tools are released as open-source software — MIT or Apache-2.0, no SaaS tier. The audit work is paid; the code is not.

hermes-labs.ai

OSS audit stack

Layer                    Tool                       Description
Static audit             lintlang                   Agent-config static lint (HERM + H1-H7)
Static audit             rule-audit                 Rule-logic audit: contradictions + gaps
Static audit             scaffold-lint              Scaffold budget + technique stacking
Static audit             intent-verify              Spec-drift checks
Runtime observability    little-canary              Prompt injection detection
Runtime observability    suy-sideguy                Runtime policy guard
Runtime observability    colony-probe               Prompt confidentiality audit
Regression & scoring     hermes-jailbench           Jailbreak regression benchmark
Regression & scoring     agent-convergence-scorer   N-agent output consistency
Supporting infra         claude-router              Model-tier + scaffold router
Supporting infra         quickthink                 Compressed planning scaffold for local LLMs
Supporting infra         langstate                  Scaffold-aware context compression
Supporting infra         agent-gorgon               Tool-fabrication defense for Claude Code
Supporting infra         zer0dex                    Dual-layer agent memory
Supporting infra         forgetted                  Mid-conversation incognito
Dev tools                repo-audit                 Launch-readiness auditor
Dev tools                quick-gate-python          Python quality gate
Dev tools                quick-gate-js              JS/TS quality gate
Dev tools                csv-quality-gate           CSV preflight validation
