Two artifacts in one repo, both built around a fine-tuned vision-language model that detects illegal small-scale gold mining (galamsey) in Sentinel-2 imagery:
- `app/`: a browser-native dashboard that runs the VLM via WebGPU on an enforcement officer's laptop. Live at galamseywatch.vercel.app. Click the map; nothing leaves the device.
- `orchestrator/`: a FastAPI service that runs a two-layer agentic-EO pipeline (VLM perception + tool-calling LLM policy) over a simulated satellite pass and decides, per tile, what's worth downlinking.
The contribution of the orchestrator isn't the galamsey detector. That's the worked example. The contribution is the architecture: a four-interface contract (VLMProvider, AgentPolicy, ImagerySource, Task) that any fork can reskin in a single day for wildfire detection, illegal fishing, oil spills, etc.
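A rough sketch of the shape that contract might take (the method and field names below are illustrative, not the repo's actual signatures; orchestrator/README.md documents the real ones):

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

@dataclass
class Tile:
    tile_id: str
    image_png: bytes          # one Sentinel-2 RGB chip

@dataclass
class Decision:
    action: str               # e.g. "downlink_full" / "discard"; vocabulary is Task-defined
    rationale: str

class VLMProvider(Protocol):
    """Perception layer: turn a tile into a text observation."""
    def describe(self, tile: Tile, prompt: str) -> str: ...

class AgentPolicy(Protocol):
    """Policy layer: pick a per-tile action from the observation."""
    def decide(self, observation: str, task: "Task") -> Decision: ...

class ImagerySource(Protocol):
    """Tile supply: SimSat in this repo; any Sentinel-2 reader in a fork."""
    def tiles(self) -> Iterable[Tile]: ...

class Task(Protocol):
    """Mission definition: the perception prompt plus the action vocabulary."""
    perception_prompt: str
    actions: list[str]

def run_pass(src: ImagerySource, vlm: VLMProvider, policy: AgentPolicy, task: Task):
    """The two-layer loop: perception first, then a policy decision per tile."""
    for tile in src.tiles():
        observation = vlm.describe(tile, task.perception_prompt)
        yield tile.tile_id, policy.decide(observation, task)
```

Reskinning for wildfire or illegal fishing then means swapping the `Task` (and, if needed, the `ImagerySource`) while the loop stays untouched.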
A follow-up result from May 2026: a single fine-tuned 450M LFM2.5-VL picks the action directly, removing the description-string bottleneck between the perception VLM and the LFM2 policy. A multitask SFT mixture lets one weight set serve all three jobs (action policy, grounding, scene description).
| System | Total params | 99-tile action-match accuracy |
|---|---|---|
| Two-layer (bare) | 3.05 B | 65.7 % |
| Two-layer (rich-context) | 3.05 B | 63.6 % |
| Unified v3 (action-only LoRA, `samwell/galamsey-unified-v3`) | 450 M | 76.8 % |
| Unified v4.1 (multitask LoRA, `samwell/galamsey-unified-v4-1`) | 450 M | 77.8 % |
+12.1 pp over the strongest baseline at 6.8× fewer parameters, with grounding mIoU and description BLEU both within noise of the specialist perception model. The win comes from three design choices:

- (a) the action LoRA stacks on top of the `samwell/galamsey-v9-e3` perception fine-tune, which frees capacity for action selection;
- (b) the assistant target for action examples is the action string alone, which concentrates LM loss on the prediction;
- (c) the multitask mixture (327 action + 250 perception examples) preserves v9-e3's grounding and description ability inside the same LoRA.
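As a concrete sketch of (b) and (c), here is one plausible way to assemble such a mixture in chat format. Everything below (prompt strings, file names, target text) is illustrative; the real rows live in `samwell/galamsey-unified-decisions`:

```python
import random

ACTION_PROMPT = "Select the downlink action for this tile."       # illustrative
PERCEPTION_PROMPT = "Describe any mining activity in this tile."  # illustrative

def chat_example(image_path: str, prompt: str, target: str) -> dict:
    """One SFT row in chat format; loss is masked to the assistant turn."""
    return {
        "messages": [
            {"role": "user",
             "content": [{"type": "image", "image": image_path},
                         {"type": "text", "text": prompt}]},
            # (b) For action rows the target is the bare action string,
            # e.g. "downlink_full": no rationale, so all LM loss lands
            # on the prediction itself.
            {"role": "assistant", "content": target},
        ]
    }

# (c) Multitask mixture: 327 action rows + 250 perception rows in one pool,
# so a single LoRA keeps grounding/description alongside the action policy.
# Labels below are placeholders.
action_rows = [chat_example(f"tile_{i}.png", ACTION_PROMPT, "downlink_full")
               for i in range(327)]
perception_rows = [chat_example(f"tile_{i}.png", PERCEPTION_PROMPT,
                                "Active galamsey pits along the river bend.")
                   for i in range(250)]
mixture = action_rows + perception_rows
random.shuffle(mixture)
```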
Resources:
- Blog post (Cookbook-style recipe): `docs/blog_unified_vlm.md`
- Headline model (multitask): `samwell/galamsey-unified-v4-1`
- Predecessor (action-only): `samwell/galamsey-unified-v3`
- Dataset (250 hand-labeled tiles): `samwell/galamsey-unified-decisions`
```mermaid
flowchart LR
subgraph User["User"]
Officer[Enforcement officer<br/>laptop]
end
subgraph Browser["app/ (browser, WebGPU)"]
Dashboard[Click-to-detect<br/>dashboard]
BrowserVLM[LFM2.5-VL-450M<br/>ONNX fp16]
end
subgraph Orchestrator["orchestrator/ (FastAPI)"]
Loop[Pass loop<br/>SSE streaming]
OrchVLM[LFM2.5-VL-450M<br/>transformers fp16]
Agent[LFM2-2.6B<br/>tool-calling]
end
SimSat[(DPhi SimSat<br/>Sentinel-2 imagery)]
Officer -->|click map| Dashboard
Dashboard -->|run inference| BrowserVLM
Dashboard -->|fetch tile| SimSat
Dashboard -.->|Agent Mode tab<br/>SSE| Loop
Loop -->|fetch tile| SimSat
Loop -->|perception| OrchVLM
Loop -->|policy| Agent
Loop -.->|live events| Dashboard
style BrowserVLM fill:#3b82f6,color:#fff
style OrchVLM fill:#3b82f6,color:#fff
style Agent fill:#a855f7,color:#fff
```
Both artifacts use the same fine-tuned LFM2.5-VL-450M (`samwell/galamsey-v9-e3-onnx` in the browser, `samwell/galamsey-v9-e3` in Python). Same model, same prompts, two runtimes.
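For reference, a minimal Python-side inference sketch, assuming the checkpoint exposes the standard transformers image-text chat interface that LFM2-VL checkpoints use; the prompt and tile path are placeholders, not the repo's actual ones:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt = "samwell/galamsey-v9-e3"
model = AutoModelForImageTextToText.from_pretrained(
    ckpt, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(ckpt)

conversation = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "tile.png"},  # placeholder Sentinel-2 chip
        {"type": "text", "text": "Describe any mining activity in this tile."},
    ],
}]
inputs = processor.apply_chat_template(
    conversation, add_generation_prompt=True,
    tokenize=True, return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```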
To run both artifacts locally:

```bash
# 1. Pull the v9-e3 weights (one-time, ~1 GB)
huggingface-cli download samwell/galamsey-v9-e3 \
  --local-dir orchestrator/checkpoints/galamsey-v9-e3
# 2. Run the orchestrator (port 8765)
cd orchestrator && uv sync
uv run uvicorn agentic_eo.main:app --port 8765
# 3. Run the dashboard in another terminal (port 3000)
cd app && npm install && npm run dev
# 4. Open localhost:3000/dashboard → "Agent Mode" tab → Initiate Pass
```

A typical 6-tile pass over a galamsey hotspot completes in ~5 minutes wall-clock.
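If you'd rather watch a pass from a terminal than from the dashboard, a sketch like the following can tail the SSE stream. The endpoint path here is a guess for illustration only; the real route and event schema are documented in orchestrator/README.md:

```python
import json
import httpx  # pip install httpx

# Hypothetical route; substitute the one from orchestrator/README.md.
STREAM_URL = "http://localhost:8765/pass/stream"

with httpx.stream("GET", STREAM_URL, timeout=None) as resp:
    for line in resp.iter_lines():
        if line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            # One event per pipeline step: tile fetched, perception result,
            # policy decision, pass complete.
            print(event)
```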
- `orchestrator/README.md` for the orchestrator runbook (architecture, swapping interfaces, env vars, event schema, fork instructions).
- `app/README.md` for the browser dashboard runbook.
- Live demo: galamseywatch.vercel.app
MIT. See LICENSE.