Merged
69 commits
81a1dc6
chore: restructure LLM sandbox into sandbox/llm-box/
quiet-node Apr 17, 2026
6ddf43b
feat: added search box
quiet-node Apr 17, 2026
cc7067c
feat(search): refactor UI Polish with LoadingStage, sources list, and…
quiet-node Apr 18, 2026
8a19d43
fix(search): inject today's date into synthesis prompt to prevent sta…
quiet-node Apr 18, 2026
9d29d3d
feat(search): rerank SearXNG results with BM25F + RRF
quiet-node Apr 18, 2026
9980f0f
feat(sandbox): add Trafilatura reader service
quiet-node Apr 18, 2026
48d14e2
chore: gitignore .superpowers brainstorm working dir
quiet-node Apr 18, 2026
4c8ea02
chore(sandbox): split reader requirements into prod vs dev
quiet-node Apr 18, 2026
29951f0
chore(sandbox): add reader service to search-box compose
quiet-node Apr 18, 2026
542ddca
fix(sandbox): align reader healthcheck urllib timeout with Docker out…
quiet-node Apr 18, 2026
e6bd38a
feat(search): add agentic-loop content types and event variants
quiet-node Apr 18, 2026
2986d6f
feat(search): add config module with loop and timeout constants
quiet-node Apr 18, 2026
5ef45bf
feat(prompts): add universal sufficiency judge and update synthesis
quiet-node Apr 18, 2026
96595a5
feat(search): add retry-once helper and transient error classifier
quiet-node Apr 18, 2026
3dd1318
feat(search): add chunker for reader output
quiet-node Apr 18, 2026
9d43f78
feat(search): add chunk-level BM25 rerank
quiet-node Apr 18, 2026
4a1b3a3
feat(search): add judge verdict parser and normalizer
quiet-node Apr 18, 2026
f7102fa
feat(search): add Trafilatura reader HTTP client
quiet-node Apr 18, 2026
7ab3cac
feat(search): add merged router+judge and universal judge calls
quiet-node Apr 18, 2026
614385c
feat(search): add parallel SearXNG query fanout with dedup
quiet-node Apr 18, 2026
7beac28
refactor(search): add agentic pipeline skeleton with trait seams
quiet-node Apr 18, 2026
3ec6782
feat(search): wire initial search round in run_agentic
quiet-node Apr 18, 2026
3b2c181
fix(search): add reader_base_url param and coverage tests for Task 14
quiet-node Apr 18, 2026
d0a22fc
feat(search): bounded gap-analysis loop in run_agentic
quiet-node Apr 18, 2026
54b4049
feat(search): swap pipeline to agentic run, retire legacy router
quiet-node Apr 18, 2026
6521246
feat(db): persist search warnings and metadata with each turn
quiet-node Apr 18, 2026
f5ed46f
feat(types): mirror agentic-search event variants and warnings
quiet-node Apr 19, 2026
7917bdb
feat(frontend): add warning copy and severity maps
quiet-node Apr 19, 2026
bb0913d
feat(frontend): update stage labels to Analyzing query and refining c…
quiet-node Apr 19, 2026
f8502aa
feat(frontend): add search warning icon with tooltip
quiet-node Apr 19, 2026
9a0a78f
feat(frontend): wire warning accumulation and icon into search turns
quiet-node Apr 19, 2026
b5198b8
test(search): end-to-end integration tests for agentic loop
quiet-node Apr 19, 2026
37a0426
test(chunker): cover all-whitespace input and flush-before-oversized
quiet-node Apr 19, 2026
4b8ed50
test(judge): cover unbalanced opening brace in JSON extraction
quiet-node Apr 19, 2026
d008355
test(llm): cover non-success HTTP status from Ollama chat endpoint
quiet-node Apr 19, 2026
577f255
test(searxng): cover empty-queries guards in search_all variants
quiet-node Apr 19, 2026
99b2711
test(reader): cover failed branch, non-JSON decode, thin wrappers
quiet-node Apr 19, 2026
76b82dc
chore: ignore Python dev artifacts in sandbox services
quiet-node Apr 19, 2026
5955308
docs(sandbox): add READMEs for searxng and reader plus main.py docstr…
quiet-node Apr 19, 2026
930cb85
test(pipeline): close all coverage gaps to reach 100% line coverage
quiet-node Apr 19, 2026
6b98364
fix(search): router retries then falls back on malformed JSON
quiet-node Apr 19, 2026
2ffc3e1
fix(frontend): warning icon uses shared Tooltip with pointer cursor
quiet-node Apr 19, 2026
cb6f714
fix(search): align Sources footer with synthesis citation indices
quiet-node Apr 19, 2026
c39bce5
feat(frontend): round-aware stage labels during gap refinement
quiet-node Apr 19, 2026
a2fae29
feat(prompts): synthesis prompt encourages substantive answers
quiet-node Apr 19, 2026
b2deb5e
feat(prompts): escalate reader and show few-shot example to fight laz…
quiet-node Apr 19, 2026
fdf99f8
fix(prompts): router must use web search unless history literally con…
quiet-node Apr 19, 2026
20c7bc9
docs(search): add comprehensive agentic-search reference and wire int…
quiet-node Apr 19, 2026
8c1ee7a
refactor(prompts): rename search_router_merged to search_plan
quiet-node Apr 19, 2026
3efc335
fix(search): IterationCapExhausted fires only after completing MAX_IT…
quiet-node Apr 19, 2026
00936e2
chore: ignore coverage profraw artifacts and format one-line constant
quiet-node Apr 19, 2026
37da938
feat(search): sandbox health check with setup error UI
quiet-node Apr 19, 2026
72fe768
feat: show search traces to UI
quiet-node Apr 20, 2026
4f1f64f
fix(search): raise TOP_K_URLS to 10 to match MAX_RESULTS and prevent …
quiet-node Apr 20, 2026
2b03849
fix: udate thinking process
quiet-node Apr 20, 2026
6df1040
fix(search): add retry+fallback to call_judge for LLM JSON hallucination
quiet-node Apr 21, 2026
709a1eb
fix(search): correct reader_base_url and update trace messaging
quiet-node Apr 21, 2026
7822295
docs: rewrite agentic-search.md
quiet-node Apr 21, 2026
691e768
feat(ui): add tooltips to command suggestions, new icons for /search …
quiet-node Apr 21, 2026
33dacc4
test: achieve 100% backend coverage for search pipeline and fix front…
quiet-node Apr 21, 2026
838b9a0
test(coverage): achieve 100% line coverage in commands.rs - Replace m…
quiet-node Apr 21, 2026
fb15418
fix(lint): turn off @eslint-react/dom/no-flush-sync in ESLint config
quiet-node Apr 21, 2026
3841d44
fix(lint): remove flushSync to eliminate @eslint-react/dom-no-flush-s…
quiet-node Apr 21, 2026
b770039
fix(lint): remove flushSync to eliminate @eslint-react/dom-no-flush-s…
quiet-node Apr 21, 2026
a9b9d5d
fix: prevent empty /search history short-circuit
quiet-node Apr 21, 2026
acc42ff
test: stabilize backend cancellation stream test
quiet-node Apr 21, 2026
91fb959
test: cover blank router history placeholder
quiet-node Apr 21, 2026
1585566
docs: reorganize search setup and improve consumer error UX
quiet-node Apr 22, 2026
d46d4d0
fix: tighten search router clarify handling
quiet-node Apr 22, 2026
11 changes: 10 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,19 @@ coverage
# Rust build output
target/

# Python dev environments (reader sandbox pytest venv, etc.)
.venv/
__pycache__/
*.pyc
.pytest_cache/

# Git worktrees
.worktrees
.claude/worktrees
.gstack/

# Superpowers generated docs (design specs, implementation plans) never commit
# Superpowers generated docs (design specs, implementation plans): never commit
docs/superpowers/
# Superpowers brainstorming visual-companion working files
.superpowers/
*.profraw
33 changes: 26 additions & 7 deletions README.md
@@ -1,5 +1,3 @@


<h1 align="center">
Thuki - WIP
</h1>
Expand All @@ -16,7 +14,6 @@
A floating AI secretary for macOS. Fully local, completely free, zero data ever leaves your machine.
</p>


<p align="center">
<img src="https://img.shields.io/badge/status-beta-yellow.svg" alt="Beta" />
<a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License" /></a>
Expand Down Expand Up @@ -48,7 +45,6 @@ Double-tap Control <kbd>⌃</kbd> to summon Thuki from anywhere. Ask a question,

https://github.com/user-attachments/assets/57df0efe-24eb-4875-a83d-e605e0c6f8b4


### Overlay Mode

Thuki floats above every app, including fullscreen ones. Highlight text anywhere, double-tap Control <kbd>⌃</kbd>, and Thuki opens with your selection pre-filled as a quote, ready to ask about.
Expand Down Expand Up @@ -137,7 +133,30 @@ bun run sandbox:stop

For the full architecture and security philosophy behind the sandbox, see [`sandbox/README.md`](sandbox/README.md).

### Step 2: Install Thuki
### Step 2: Set up the search sandbox (optional, required for /search)

The `/search` command uses an agentic search pipeline that depends on two local Docker containers: a **SearXNG** meta-search engine and a **Trafilatura** reader. This setup ensures that your search queries and the content you read remain entirely local.

**Prerequisite:** [Docker Desktop](https://www.docker.com/get-started) must be running.

1. **Start the search services**

```bash
bun run search-box:start
```

2. **Verify services (optional)**

```bash
# Search Engine check:
curl "http://127.0.0.1:25017/search?q=thuki&format=json"
```

Without these services running, the `/search` command is disabled in the chat, but all other features remain available.

For more details on the agentic search pipeline, see [docs/agentic-search.md](docs/agentic-search.md).

### Step 3: Install Thuki

#### Download (Recommended)

Expand Down Expand Up @@ -210,8 +229,8 @@ Contributions are welcome! Read [CONTRIBUTING.md](CONTRIBUTING.md) to get starte

Thuki is macOS-only, but the community has been busy bringing it to other platforms. Huge shoutout to these contributors 🎊🚀!

| Platform | Repo | Author |
|----------|------|--------|
| Platform | Repo | Author |
| ------------- | -------------------------------------------------- | -------------------------------------------- |
| Windows 10/11 | [ThukiWin](https://github.com/ayzekhdawy/thukiwin) | [@ayzekhdawy](https://github.com/ayzekhdawy) |

> Each port is independently maintained by its author. For issues or questions about a specific port, head to that repo directly.
Expand Down
329 changes: 329 additions & 0 deletions docs/agentic-search.md

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions package.json
Expand Up @@ -23,8 +23,10 @@
"format": "prettier --write \"src/**/*.{ts,tsx,css}\" && cd src-tauri && cargo fmt",
"format:check": "prettier --check \"src/**/*.{ts,tsx,css}\" && cd src-tauri && cargo fmt -- --check",
"typecheck": "tsc --noEmit",
"sandbox:start": "docker compose -f sandbox/docker-compose.yml up -d",
"sandbox:stop": "docker compose -f sandbox/docker-compose.yml down -v",
"llm-box:start": "docker compose -f sandbox/llm-box/docker-compose.yml up -d",
"llm-box:stop": "docker compose -f sandbox/llm-box/docker-compose.yml down -v",
"search-box:start": "docker compose -f sandbox/search-box/docker-compose.yml up -d --build",
"search-box:stop": "docker compose -f sandbox/search-box/docker-compose.yml down",
"test": "vitest run",
"test:watch": "vitest",
"test:coverage": "vitest run --coverage",
Expand Down
6 changes: 3 additions & 3 deletions sandbox/README.md → sandbox/llm-box/README.md
Expand Up @@ -20,7 +20,7 @@ The sandbox separates model initialization from the inference runtime, keeping c
| **Breakout Mitigation** | Active | `cap_drop: ALL` strips every Linux kernel capability |
| **Privilege Control** | Active | `no-new-privileges: true` blocks setuid/setgid escalation |
| **Read-Only Filesystem** | Active (inference only) | `sandbox-server` root filesystem is read-only; only `/tmp` is writable. `sandbox-init` requires write access to pull the model. |
| **Ephemeral Lifecycle** | Active | `bun run sandbox:stop` runs `down -v`, permanently destroying all model weights |
| **Ephemeral Lifecycle** | Active | `bun run llm-box:stop` runs `down -v`, permanently destroying all model weights |
| **Non-Executable Weights** | Active | GGUF format is math-only; no Python/Pickle code execution risk |

> **Note on network egress:** The sandbox does not use `internal: true` on the Docker network. On macOS, Docker Desktop's networking layer does not support `internal: true` alongside host port binding, so the isolation strategy relies on `127.0.0.1` ingress restriction, `cap_drop: ALL`, and the read-only filesystem instead. Outbound connections from the container are not hard-blocked at the network level.
Expand All @@ -38,15 +38,15 @@ The sandbox is intended for:
**Start the sandbox:**

```bash
bun run sandbox:start
bun run llm-box:start
```

The first run pulls the model inside the init container, which may take several minutes depending on your connection. Subsequent starts are instant.

**Stop and wipe the sandbox:**

```bash
bun run sandbox:stop
bun run llm-box:stop
```

This runs `docker compose down -v`, which destroys the Docker volume and permanently removes all downloaded model weights from disk. Nothing persists after this command.
File renamed without changes.
65 changes: 65 additions & 0 deletions sandbox/search-box/docker-compose.yml
@@ -0,0 +1,65 @@
# ==============================================================================
# Search Sandbox: SearXNG Local Search Engine
#
# Provides a privacy-respecting, locally-hosted meta-search engine for Thuki's
# /search command. Aggregates results from Google, Bing, DuckDuckGo, Brave,
# and many specialized engines without rate limiting (local use only).
#
# SECURITY CHECKLIST:
# [x] NETWORK INGRESS: 127.0.0.1 binding - no external access
# [x] PRIVILEGE ESCALATION: no-new-privileges enforced
# [x] CAPABILITY RESTRICTION: only CHOWN/SETGID/SETUID retained (required by uwsgi)
# [ ] RATE LIMITING: intentionally disabled for local performance
# ==============================================================================

services:
searxng:
image: searxng/searxng:latest
container_name: thuki-searxng
restart: unless-stopped
ports:
- "127.0.0.1:25017:8080"
volumes:
- ./searxng:/etc/searxng:rw
environment:
- SEARXNG_BASE_URL=http://127.0.0.1:25017/
cap_drop:
- ALL
cap_add:
- CHOWN
- SETGID
- SETUID
security_opt:
- no-new-privileges:true
networks:
- search_net

reader:
build:
context: ./reader
image: thuki-reader:local
container_name: thuki-reader
restart: unless-stopped
ports:
- "127.0.0.1:25018:8000"
networks:
- search_net
cap_drop:
- ALL
security_opt:
- no-new-privileges:true
read_only: true
tmpfs:
- /tmp:size=16m
mem_limit: 512m
cpus: 1.0
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request,sys;sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/healthz',timeout=4).status==200 else 1)"]
interval: 30s
timeout: 5s
retries: 3
start_period: 10s

networks:
search_net:
driver: bridge
25 changes: 25 additions & 0 deletions sandbox/search-box/reader/Dockerfile
@@ -0,0 +1,25 @@
FROM python:3.12-slim AS base

ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/* \
&& addgroup --system --gid 10001 reader \
&& adduser --system --uid 10001 --gid 10001 --home /nonexistent --no-create-home reader

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY main.py .

USER reader:reader

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
132 changes: 132 additions & 0 deletions sandbox/search-box/reader/README.md
@@ -0,0 +1,132 @@
# Reader service

Trafilatura-based URL-to-markdown extractor. Second stop of Thuki's agentic `/search` pipeline.

## What it does

Takes a URL, fetches the page, strips boilerplate (navigation, ads, footers, cookie banners), and returns clean markdown the synthesis LLM can cite against.

```
POST /extract { "url": "https://example.com/article" }
-> { "url": "...",
"title": "Page title",
"markdown": "# Article text\n\nCleaned body...",
"status": "ok" | "empty" }
```

## Why Thuki needs it

SearXNG returns URLs plus short snippets (usually the first 150-200 chars of the page). For many queries, snippets are enough. For questions like "compare tokio vs async-std benchmarks in 2026," the answer lives deep inside blog posts and docs pages that snippets never surface.

The pipeline's judge decides snippet sufficiency after the initial SearXNG round. When the judge returns `Partial` or `Insufficient`, the reader is called to fetch the top URLs in full and hand rich text to the next judge round. This is the classic "RAG reader" pattern from Perplexity, Exa, and the CRAG / Self-RAG literature.
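The judge-side handling can be sketched as a small verdict normalizer. This is a hypothetical Python stand-in for the parser that actually lives in the Rust pipeline, and it assumes the judge replies with a JSON object carrying a `verdict` field; the function and field names are illustrative:

```python
import json
import re

# The three verdicts the pipeline acts on (assumed labels).
VERDICTS = {"sufficient", "partial", "insufficient"}

def normalize_verdict(raw: str) -> str:
    """Extract and normalize the judge's verdict from raw LLM output.
    Malformed output falls back to 'insufficient', the conservative
    choice that escalates to the reader."""
    # LLMs often wrap JSON in prose; grab the first {...} span.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return "insufficient"
    try:
        verdict = json.loads(match.group(0)).get("verdict", "")
    except json.JSONDecodeError:
        return "insufficient"
    verdict = verdict.strip().lower()
    return verdict if verdict in VERDICTS else "insufficient"
```

Defaulting malformed output to `insufficient` (rather than erroring) is the same shape as the pipeline's retry-then-fallback handling of judge JSON hallucinations.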

## Why Trafilatura

HTML boilerplate removal is a surprisingly hard problem. Naive approaches (strip `<nav>`, `<footer>`) fail on modern SPAs where everything is `<div>`. Getting it right requires heuristics built over years of research. Two parallel research agents independently landed on Trafilatura as the best-in-class open-source solution:

- **F1 ~0.95** on the ScrapingHub article extraction benchmark, top of the field.
- Apache 2.0 license.
- Production use at HuggingFace, IBM, Microsoft Research, Stanford, EU Parliament.
- Pure Python, no browser, tiny attack surface.

We considered and rejected:

- **Firecrawl** — AGPL-3.0 blocks bundling.
- **Jina Reader cloud** — proxies every URL through Jina's servers, violating privacy.
- **Crawl4AI** — Chromium in the container, 4 GB RAM, CVE history.
- **ScrapeGraphAI / ReaderLM-v2** — an LLM per page, wrong shape for this job.
- **DIY Playwright** — SSRF surface without extraction value.
- **Most Rust readability crates** — weaker extraction; a Jan 2025 benchmark showed many return empty strings on real pages.

## How it fits into the pipeline

```
snippets judge returns Partial / Insufficient
-> reader.fetch_batch_cancellable(&top_urls, &cancel)
-> POST /extract for each URL in parallel (semaphore-bounded, 5 in flight)
-> Trafilatura extraction per page
-> chunker splits markdown into ~500-token chunks
-> BM25 rerank picks top chunks for the query
-> chunks judge decides sufficiency
-> synthesis OR gap-query loop
```
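The chunking step above can be sketched in Python. The real chunker lives in the Rust pipeline; this sketch approximates the ~500-token budget with whitespace-separated words and keeps the "flush before an oversized paragraph" behavior the tests cover:

```python
def chunk_markdown(markdown: str, max_tokens: int = 500) -> list[str]:
    """Split reader markdown into roughly max_tokens-sized chunks on
    paragraph boundaries. A paragraph that would overflow the current
    chunk flushes it first; a single oversized paragraph becomes its
    own chunk rather than being split mid-paragraph."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in markdown.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # all-whitespace input yields no chunks
        para_len = len(para.split())  # crude token proxy
        if current and current_len + para_len > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += para_len
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```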

The Rust `search::reader::ReaderClient` calls this service over HTTP. It races each call against a cancellation token and degrades gracefully when the reader container is unreachable (emits `ReaderUnavailable` warning, pipeline falls back to snippets).

## Architecture

Single-file FastAPI app (`main.py`, ~90 lines). One endpoint (`/extract`) and a healthz probe. Entire service fits in your head:

```
main.py
├── _validate_url -> SSRF guard (scheme + private-host blocklist)
├── fetch_html -> httpx stream with 8s timeout + 2MB byte cap
├── trafilatura.extract -> boilerplate removal, markdown conversion
└── trafilatura.extract_metadata -> page title
```

The Dockerfile is standard Python-slim hardening: non-root user, minimum install, no build tools in the final layer.

## Security posture

Enforced at three layers:

**App layer (`main.py`):**
- SSRF guard rejects non-http(s) schemes plus private, loopback, link-local, multicast, and reserved IP ranges (both IPv4 and IPv6) plus the literal string `"localhost"`.
- Byte cap: upstream fetch aborts once 2 MB is buffered. Prevents hostile servers from exhausting memory.
- Timeout: 8s hard ceiling on upstream fetch.
- Request body limits (URL max length 2048 chars, validated via Pydantic).
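The byte cap can be sketched as a plain stream consumer, decoupled from the HTTP client. The real service enforces this inside its `httpx` streaming fetch; `MAX_BYTES`, the function, and the exception name here are illustrative:

```python
from typing import Iterable

MAX_BYTES = 2 * 1024 * 1024  # 2 MB cap, mirroring the service's limit

class BodyTooLarge(Exception):
    """Raised when the upstream body exceeds the cap (hypothetical name)."""

def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_BYTES) -> bytes:
    """Accumulate streamed response chunks, aborting as soon as the
    buffered size crosses max_bytes so a hostile server cannot exhaust
    the container's memory. With httpx this would iterate the chunks
    yielded by a streaming response."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise BodyTooLarge(f"body exceeded {max_bytes} bytes")
    return bytes(buf)
```

Aborting mid-stream (instead of checking `Content-Length`, which hostile servers can lie about) is what makes the cap robust.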

**Container layer (`docker-compose.yml`):**
- `cap_drop: ALL` (no capabilities, not even the reduced set SearXNG needs)
- `no-new-privileges: true`
- `read_only: true` root filesystem
- `tmpfs: /tmp:size=16m` for the minimal writable scratch area
- `mem_limit: 512m`, `cpus: 1.0`
- Bound to `127.0.0.1:25018` only

**Image layer (`Dockerfile`):**
- Runs as `reader:reader` (uid/gid 10001, system user, no home directory)
- Only `main.py`, `requirements.txt`, and pinned runtime deps land in the image
- No pytest, no dev tools, no compilers

## Files in this directory

| File | Purpose | Shipped? |
|---|---|---|
| `main.py` | The service code (FastAPI app) | Yes (production) |
| `Dockerfile` | Container build recipe | Yes (production) |
| `requirements.txt` | Pinned runtime deps (6 packages) | Yes (production) |
| `requirements-dev.txt` | Pinned test deps (pytest only) | No (local-only) |
| `test_main.py` | Unit tests (5 cases) | No (local-only) |

Dev artifacts like `.venv/` and `.pytest_cache/` are gitignored and never enter the image.

## Local development

```bash
# Bring up the search-box containers (builds the reader image on first run):
bun run search-box:start

# Exercise the endpoint:
curl -sS -X POST http://127.0.0.1:25018/extract \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com/"}' | jq

# Healthcheck:
curl -sS http://127.0.0.1:25018/healthz

# Tear down:
bun run search-box:stop
```

### Running pytest without Docker

```bash
cd sandbox/search-box/reader
python -m venv .venv
.venv/bin/pip install -r requirements.txt -r requirements-dev.txt
.venv/bin/python -m pytest test_main.py -v
```

`.venv/` and `.pytest_cache/` are gitignored.

## What the reader is not

- Not a browser. It does not render JavaScript. Pages that rely on client-side rendering come back as `status: "empty"`. This is tracked in pipeline telemetry; if the empty-body rate gets high in production, we'll add a Playwright fallback in v2.
- Not a crawler. One URL in, one markdown blob out. No link following, no sitemap parsing, no depth-limited traversal.
- Not a cache. Every call fetches fresh. Caching belongs upstream in the Rust pipeline if we ever need it.
- Not a general-purpose service. The endpoint accepts only http(s) URLs pointing at public hosts. Private networks and non-web schemes are rejected with HTTP 400.