Evidence-aware reasoning layer for offensive security work.
Quarry ingests the artifacts security work actually produces — recon output, HTTP traffic, JavaScript bundles, scan results — into a local index, then lets a language model reason over your engagement instead of a generic methodology.
Status: alpha (0.1.0-alpha.8). The architecture and the core ingestion / retrieval / analytical surface are working; the CLI shape is still moving and may change between alpha releases until 1.0.
The analytical commands (`quarry analyze`, `quarry finding`) introduced in M2.5 are the most likely to shift — see `ROADMAP.md` and `docs/adr/0001-analytical-layer.md`.
Quarry is for anyone whose work involves generating recon and traffic artifacts across multiple engagements and who needs an LLM to reason over those artifacts rather than hallucinate from training data:
- Bug bounty hunters — public programs, often many in parallel.
- Pentesters and red teamers — internal engagements where data cannot leave the operator's machine.
- Security researchers — labs, CTFs, training environments, vulnerability research where the same target gets revisited.
The default target kind is bug-bounty because that's the most common public-facing use case, but `--kind private` (consulting / internal pentest) and `--kind lab` (CTF / HTB / training) are first-class — the storage, retrieval, and MCP layers don't care which.
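As a concrete sketch, registering one target of each kind (target names here are illustrative):

```shell
quarry target add acme-bbp                       # default kind: bug-bounty
quarry target add client-internal --kind private
quarry target add htb-box --kind lab
```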
- A local-first CLI that ingests recon and traffic artifacts (subfinder, httpx, katana, nuclei, Burp/Caido project exports, raw JS bundles) into a structured corpus.
- A retrieval layer over that corpus — full-text search with structured filters (per-target, per-source, per-time) — so an LLM can answer questions like "which endpoints handle auth," "what changed since last week," and "which JS bundles reference admin paths" against your actual evidence. Hybrid FTS + vector retrieval lands in M2 (see `ROADMAP.md`).
- BYOK to whatever model you want (Anthropic, OpenAI, local Ollama, any OpenAI-compatible endpoint). Nothing is sent to a Quarry-operated service. There is no Quarry-operated service.
- Not a scanner. Quarry reads what your tools produced; it doesn't send packets.
- Not an autonomous agent. It will not file reports, submit findings to a bounty platform, or take actions against targets. The reasoning loop is human-in-the-loop by design. The line on ML-driven recommendations is codified as ADR 0003: ML output that lands as Candidate or Evidence text is fine; ML output that ranks, filters, or routes the operator's attention is not.
- Not a methodology framework. If you want a phase-by-phase walkthrough, use a different tool. Quarry assumes you already know what you're doing and want leverage on the parts that don't scale: reading, cross-referencing, and remembering.
- Not a SaaS. The architecture is local-first and stays that way.
Four commands, all read-only, all deterministic, all over your local corpus. None of them call out to a model.
- `quarry analyze regression` — diffs the latest two runs of httpx / katana / burp per Target and flags every URL whose status, content-type, server, or title changed. The `200 → 401` flip is signal; the `401 → 200` flip is the signal.
- `quarry analyze jsdelta` — pairs the latest two ingestions of each JS bundle and surfaces the endpoints, role tokens, and suspected secrets that appeared (or disappeared) between them. New endpoints in a fresh bundle are where the offensive surface grew; the module weights added tokens accordingly.
- `quarry analyze interesting` — single-snapshot heuristics: 5xx hosts, precise webserver-version disclosures (`nginx/1.21`, `Apache/2.4.62`), non-production name patterns (`*-stage*`, `*-uat*`). Earns its keep on a fresh corpus, before there's a second run to compare against.
- `quarry coverage` — assets some discovery source surfaced but no live-probe source ever touched. Catches the silently-stranded-subdomain failure mode ("1,400 found, 300 probed, the rest dropped") before it eats an engagement.
Everything these modules emit is a Candidate, not a Finding — a fact about the corpus, not a recommendation. Promote the ones that hold up with `quarry finding promote`. The Finding lifecycle is CLI-driven and never crosses Targets implicitly. See `docs/analyze.md` and `docs/findings.md`.
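A minimal pass from candidate to finding might look like this — the candidate identifier syntax is an assumption, so check `docs/findings.md` for the real shape:

```shell
quarry analyze regression
# … score=0.70 httpx:https://admin.example.com
#     status 401 → 200; title "Sign in" → "Admin Console"

# After manually confirming the auth wall really dropped:
quarry finding promote <candidate-id>
```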
The tools already produce all the evidence the work needs; the gap is that nothing remembers it across sessions, engagements, or weeks. Quarry slots in as the layer that makes the loop close.
That's the across-time value proposition. There's a second one that lives entirely inside a single session: a corpus that adapters have triaged is queryable in ways the eyeball pass isn't. Pattern queries (`%console%`, `%mgmt%`, name-cluster expansion) and anomaly heuristics (alive hosts with no `<title>`, single-instance webserver fingerprints, off-baseline tech stacks) surface assets the volume triage scrolled past.
The shape is triage → mine → drill: run adapters, then query the corpus for the structural signals the eye missed, then probe the highest-value mining hits with single deliberate requests. `docs/workflow.md` walks the flow with synthesized examples.
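The mine step, sketched with the pattern queries mentioned above (exact `quarry search` filter syntax may differ — `docs/workflow.md` is authoritative):

```shell
# Structural signals an eyeball pass over a few thousand httpx lines misses
quarry search '%console%'
quarry search '%mgmt%'
```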
Offensive security work involves data that is often sensitive, sometimes contractually restricted, and occasionally legally radioactive — bounty scope rules, NDA-bound pentest engagements, lab environments under acceptable-use terms. Sending your recon corpus to a third-party indexing service is a non-starter for serious operators. Quarry runs on your machine, talks to whatever model endpoint you configure, and stores everything in a local SQLite database you can inspect, back up, or shred.
The ninety-second demo — ingest two snapshots of the same Target and watch the regression module surface what changed:
```shell
cargo install --path crates/quarry-cli --locked
quarry init
quarry target add demo --kind lab
# two httpx runs of the same target, taken a few days apart
quarry ingest ~/recon/httpx-monday.jsonl
quarry ingest ~/recon/httpx-thursday.jsonl
quarry analyze regression
# Module regression produced 2 candidates for target "demo"
# … score=0.70 httpx:https://admin.example.com
#     status 401 → 200; title "Sign in" → "Admin Console"
# … score=0.70 httpx:https://stage.example.com
#     status 200 → 500
```

That's the loop: the corpus remembers Monday's recon, Thursday's recon shows up, and the analytical layer surfaces the auth wall that dropped on admin.example.com without you having to diff JSONL by eye.
For LLM-synthesized answers over the same corpus, set an API key and use `quarry ask`:

```shell
export ANTHROPIC_API_KEY=sk-ant-...
quarry ask "which endpoints look like authentication"
```

The full walkthrough — including the OpenAI-compatible flow (Ollama, vLLM, OpenRouter) and the plain-text fallback for users without the tools above — lives in `docs/quickstart.md`.
The supported `--source` adapters are `subfinder`, `httpx`, `katana`, `nuclei`, `jsbundle`, `burp`, and `responses` (curl-style header/body pairs or `{url,status,headers,body}` JSONL). Caido lands later in M2 (see `ROADMAP.md`).
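For the `responses` adapter, the `{url,status,headers,body}` JSONL shape can be sketched like this (field names from the adapter description above; the values are invented):

```shell
# Two illustrative response records in the responses-adapter JSONL shape.
cat > /tmp/quarry-responses-sample.jsonl <<'EOF'
{"url":"https://example.com/login","status":200,"headers":{"Server":"nginx/1.21.0"},"body":"<html><title>Sign in</title></html>"}
{"url":"https://example.com/api/health","status":500,"headers":{"Server":"nginx/1.21.0"},"body":"{\"error\":\"db unreachable\"}"}
EOF
```

A file like this would then go in via `quarry ingest /tmp/quarry-responses-sample.jsonl` with the `responses` source selected.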
Quarry also ships an MCP server so an LLM host (Claude Code, Claude Desktop, etc.) can drive its read-only retrieval surface directly as tool calls — no inner `quarry ask` round-trip required. Run `quarry mcp` and configure it in your host; `docs/mcp.md` has the snippets.
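For Claude Desktop, the host-side configuration is a standard MCP server stanza along these lines (a sketch; `docs/mcp.md` has the canonical snippets):

```json
{
  "mcpServers": {
    "quarry": {
      "command": "quarry",
      "args": ["mcp"]
    }
  }
}
```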
For the analytical layer — comparing runs, surfacing JS bundle deltas, and managing the Finding lifecycle from the CLI — see `docs/analyze.md` and `docs/findings.md`.
For engagement memos and free-text annotations — methodology notes, post-foothold writeups, "what I tried" retrospectives — `quarry note add` attaches notes to a Target (no entity flag), an Asset (`--asset HOSTNAME`), or a Finding (`--finding ID`). Bodies can come from a file (`--from-file PATH`) or stdin (`--from-stdin`), so engagement memos don't have to round-trip through shell argv. Notes are indexed alongside evidence, so `quarry search` and `quarry ask` retrieve them in the same pass. See `docs/notes.md`.
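Putting the flags above together (hostnames, paths, and IDs are illustrative):

```shell
# Target-level engagement memo from a file
quarry note add --from-file ~/notes/acme-retro.md

# Asset-scoped observation piped through stdin
echo "SSO redirect accepts arbitrary subdomains" | \
  quarry note add --asset sso.example.com --from-stdin

# Annotate a promoted Finding
quarry note add --finding 12 --from-stdin < triage-notes.txt
```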
For cross-target lookup — "where have I seen this hostname / tech / webserver before across all engagements" — `quarry recall` spans every Target in the corpus and aggregates matches per engagement. It pairs with `quarry program triage` over the H1 / Bugcrowd / Intigriti / YesWeHack catalog (`quarry program ingest --source arkadiyt-h1`) for picking the next engagement by methodology fit (scope shape, ROE flags, response efficiency) rather than by name recognition.
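A sketch of the cross-target flow — the `quarry recall` argument shape is an assumption; only `quarry program ingest --source arkadiyt-h1` is documented above:

```shell
# Where have I seen this webserver fingerprint across every engagement?
quarry recall "nginx/1.21"

# Refresh the public program catalog, then pick by methodology fit
quarry program ingest --source arkadiyt-h1
quarry program triage
```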
Quarry is licensed under the GNU Affero General Public License v3.0.
See LICENSE for the full text.
For commercial licensing — including embedding Quarry in proprietary
products or offering it as a hosted service without releasing
modifications under the AGPL — see LICENSING.md.
If you believe you've found a vulnerability in Quarry itself, see
SECURITY.md. Do not file public issues for security
matters.
See CONTRIBUTING.md. Contributions require a
Contributor License Agreement; this preserves the dual-license model
that makes future commercial licensing possible.