Mantis

A unified perception-reasoning-action agent for computer use. Given a structured plan, Mantis drives a real browser (or any Xvfb-rendered application), takes actions, extracts structured data, and produces both a JSON result and an optional polished video walkthrough.

       ┌──────────────────────┐         ┌─────────────────────────┐
3p ──► │ Mantis CUA service   │ ──────► │ Target app (Chrome,     │
caller │ Holo3 + Claude       │         │ file manager, terminal, │
       │ /v1/predict          │         │ LibreOffice, …)         │
       └──────────┬───────────┘         └─────────────────────────┘
                  │
                  ▼
       ┌──────────────────────┐
       │ Result + lead CSV +  │
       │ polished screencast  │
       └──────────────────────┘

What you get

Reliable multi-step plans. A structured MicroPlanRunner enforces section / gate / loop semantics, which keeps even small models on track through long workflows.
Cheap inference at click latency. Holo3-35B-A3B (GGUF on a single GPU) handles tactical click / scroll / type / drag actions; Anthropic Claude is called only for targeted reasoning steps (extract, verify, or ground a click).
Real browser, real desktop. Runs on Xvfb + Chrome + xdotool, so there are no Playwright fingerprints and it works against sites with bot detection.
Cloud-portable. Same image runs on Baseten, Modal, EKS, GKE, or your own Docker host.
Multi-tenant out of the box. Per-key auth, per-tenant rate limits, idempotency keys, HMAC-signed webhooks, URL allowlists, Prometheus metrics.
Screencast included. Every run can produce a ready-to-share video: a title card, the captioned run with action overlays, and an outro.
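
On the receiving side, an HMAC-signed webhook is usually checked by recomputing the signature over the raw request body and comparing it in constant time. The sketch below illustrates that general technique only. It assumes an HMAC-SHA256 hex digest over the raw body; the function names, the secret, and the payload are hypothetical, and the header name and encoding that Mantis actually uses should be taken from the Operations docs.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the digest and compare it to the received one in constant time."""
    expected = sign_body(secret, body)
    return hmac.compare_digest(expected, received_signature)


if __name__ == "__main__":
    secret = b"example-shared-secret"                   # hypothetical secret
    body = b'{"run_id": "abc", "status": "succeeded"}'  # hypothetical payload
    signature = sign_body(secret, body)
    print(verify_webhook(secret, body, signature))         # untouched body
    print(verify_webhook(secret, body + b" ", signature))  # tampered body
```

Verify against the exact bytes received, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature.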

Verified end-to-end

Path	Run	Result
Modal	3-listing extraction	2 / 3 leads with phone, ~$0.42, 13 min
Baseten	3-listing extraction	3 / 3 leads with phone, ~$0.42, 9.5 min

Both deployments produce structured JSON rows (year / make / model / price / phone / url) for every successfully extracted listing.
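
The row shape above can be modeled as a small record type. This is a sketch for illustration: the field names follow the list in the text, but the types, the `Lead` class, and the `leads_to_csv` helper are assumptions, not the service's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Lead:
    # Field names follow the README; the types are assumptions.
    year: Optional[int]
    make: str
    model: str
    price: Optional[str]
    phone: Optional[str]
    url: str


def leads_to_csv(leads: list[Lead]) -> str:
    """Serialize leads to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Lead)])
    writer.writeheader()
    for lead in leads:
        # None values (e.g. a listing without a phone) become empty cells.
        writer.writerow(asdict(lead))
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up example row, not real output from a run.
    example = Lead(2019, "Toyota", "Camry", "$18,500", None, "https://example.com/1")
    print(leads_to_csv([example]))
```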

Documentation

The full docs site is at mercurialsolo.github.io/mantis. To browse it locally, run mkdocs serve from this checkout.

If you want to…	Go here
Try it in 5 minutes	Quickstart
See what Mantis is good at	Use cases — listings, jobs, real-estate, products, news, CRM edits, refunds, social posting, multi-app workflows
Copy-paste a working plan	Recipes — 11 worked examples
Understand the architecture	Concepts · Architecture
Deploy your own instance	Hosting — Baseten / Modal / EKS / GKE / local
Integrate from your app	Client — auth, plans, polling, recordings
Run a multi-tenant fleet	Operations — tenant keys, rate limits, webhooks, metrics
Look up an HTTP endpoint	API reference
Replace a Claude-CUA-style backend	Any-agent integration

Quick start (no deploy needed)

The reference deployment is live on Baseten. With a tenant token from your operator, you can submit a run directly:

export ENDPOINT="https://model-qvvgkneq.api.baseten.co/production"
export BASETEN_API_KEY="..."
export MANTIS_API_TOKEN="..."

curl -fsS -X POST "$ENDPOINT/v1/predict" \
  -H "Authorization: Api-Key $BASETEN_API_KEY" \
  -H "X-Mantis-Token: $MANTIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "detached": true,
    "micro": "plans/example/extract_listings.json",
    "state_key": "first-run",
    "max_cost": 2,
    "max_time_minutes": 20,
    "record_video": true
  }'

Poll with {action: "status"} until the run terminates, send {action: "result"} to fetch the leads, then call GET /v1/runs/<run_id>/video to download the screencast. The full walkthrough is in the Quickstart.
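
The status-then-result sequence can be sketched as a small polling loop. This is a minimal sketch under stated assumptions, not the documented client: the `run_id` request key, the `status` response field, and the terminal state names are all guesses, and the transport is passed in as a `send` callable so no particular HTTP library is implied. See the Quickstart for the real request and response shapes.

```python
import time
from typing import Any, Callable

# Assumed terminal state names; the real service may use different ones.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_done(
    send: Callable[[dict[str, Any]], dict[str, Any]],
    run_id: str,
    interval_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll status until the run terminates, then fetch the result.

    `send` posts one JSON payload to the service and returns the decoded
    JSON response; how it does that is up to the caller.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # The "run_id" key and the "status" response field are assumptions.
        status = send({"action": "status", "run_id": run_id})
        if status.get("status") in TERMINAL_STATES:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish in time")
        sleep(interval_seconds)
    return send({"action": "result", "run_id": run_id})
```

Injecting `send` and `sleep` keeps the loop easy to test with a fake transport and lets the caller add the auth headers shown in the curl example.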

Install footprint

The base install is intentionally slim. Pick the extras for your use case:

pip install -e .                  # ~5 MB — Pillow only
pip install -e ".[orchestrator]"  # ~10 MB — MicroPlanRunner + remote Holo3 client
pip install -e ".[server]"        # ~15 MB — FastAPI server + Prometheus
pip install -e ".[local-cua]"     # ~2 GB  — torch + transformers + pyautogui
pip install -e ".[full]"          # everything
pip install -e ".[docs]"          # mkdocs + material theme

Repository layout

src/mantis_agent/             core library
  api_schemas.py              PredictRequest + plan validation + caps
  baseten_server.py           FastAPI: /v1/predict + /v1/chat/completions + /metrics
  brain_holo3.py              Holo3 inference client
  brain_claude.py             Claude inference client
  extraction.py               ClaudeExtractor (structured data)
  grounding.py                ClaudeGrounding (refine click coordinates)
  gym/
    base.py                   GymEnvironment ABC
    runner.py                 GymRunner — sync agent loop
    micro_runner.py           MicroPlanRunner — structured plan executor
    xdotool_env.py            Xvfb + Chrome + xdotool driver
  presentation.py             Cards, captions, action overlays for the polished video
  recorder.py                 ffmpeg x11grab wrapper
  rate_limit.py               Token bucket + concurrency gauge per tenant
  idempotency.py              Idempotency-Key cache (24h TTL, file-backed)
  webhooks.py                 HMAC-signed run-completion callbacks
  metrics.py                  Prometheus counters / gauges / histograms
  tenant_auth.py              JSON keys file → TenantConfig

deploy/                       cloud paths
  modal/                      Modal entry-points (modal_cua_server.py, ...)
  baseten/                    Baseten Truss configs (holo3, gemma4, ...)
  aws/                        EKS — Terraform + k8s manifests + runbook
  gke/                        GKE — Terraform + k8s manifests + runbook

docker/
  cua.Dockerfile              Local CUA loop (Xvfb + xdotool + Chromium)
  server.Dockerfile           Production FastAPI server (CUDA + llama.cpp + Holo3)

docs/                         MkDocs site source
  index.md  api.md  architecture.md
  getting-started/  hosting/  client/  operations/  integrations/  reference/  appendix/
  diagrams/                   FigJam-rendered architecture diagrams

scripts/                      CLI helpers (run_*.py, monitor_*.sh, baseten_workload.py)
plans/                        plan files (.txt, .json)
tasks/                        task descriptors
benchmarks/                   OSWorld / VWA benchmark harnesses
training/                     distillation + fine-tuning configs
tests/                        pytest suite
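
The layout describes rate_limit.py as a per-tenant token bucket. For readers unfamiliar with the technique, here is a minimal sketch of a token bucket in general; it is not the module's actual code, and the `TokenBucket` class, the `allow_request` helper, and the default rate and capacity are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# One bucket per tenant (hypothetical wiring and default limits).
buckets: dict[str, TokenBucket] = {}


def allow_request(tenant_id: str, rate: float = 1.0, capacity: float = 5.0) -> bool:
    bucket = buckets.get(tenant_id)
    if bucket is None:
        bucket = buckets[tenant_id] = TokenBucket(rate, capacity)
    return bucket.allow()
```

Keeping one bucket per tenant means a noisy tenant exhausts only its own budget, while the capacity still permits short bursts.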

Development

# Install dev + docs extras
pip install -e ".[dev,docs]"

# Run tests
pytest tests/ -q

# Lint
ruff check .

# Build the docs site locally
mkdocs serve   # → http://127.0.0.1:8000

License

Apache License 2.0. See LICENSE.
