
pb3ck/quarry

Evidence-aware reasoning layer for offensive security work.

Quarry ingests the artifacts security work actually produces — recon output, HTTP traffic, JavaScript bundles, scan results — into a local index, then lets a language model reason over your engagement instead of a generic methodology.

Status: alpha (0.1.0-alpha.8). The architecture and the core ingestion / retrieval / analytical surface are working; the CLI shape is still settling and may change between alpha releases until 1.0.

The analytical commands (quarry analyze, quarry finding) introduced in M2.5 are the most likely to shift — see ROADMAP.md and docs/adr/0001-analytical-layer.md.

Who it's for

Anyone whose work generates recon and traffic artifacts across multiple engagements, and who needs an LLM to reason over those artifacts rather than hallucinate from training data:

  • Bug bounty hunters — public programs, often many in parallel.
  • Pentesters and red teamers — internal engagements where data cannot leave the operator's machine.
  • Security researchers — labs, CTFs, training environments, vulnerability research where the same target gets revisited.

The default target kind is bug-bounty because that's the most common public-facing use case, but --kind private (consulting / internal pentest) and --kind lab (CTF / HTB / training) are first-class — the storage, retrieval, and MCP layers don't care which.

What this is

  • A local-first CLI that ingests recon and traffic artifacts (subfinder, httpx, katana, nuclei, Burp/Caido project exports, raw JS bundles) into a structured corpus.
  • A retrieval layer over that corpus — full-text search with structured filters (per-target, per-source, per-time) — so an LLM can answer questions like "which endpoints handle auth," "what changed since last week," "which JS bundles reference admin paths" against your actual evidence. Hybrid FTS + vector retrieval lands in M2 (see ROADMAP.md).
  • BYOK to whatever model you want (Anthropic, OpenAI, local Ollama, any OpenAI-compatible endpoint). Nothing is sent to a Quarry-operated service. There is no Quarry-operated service.
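The retrieval shape in the second bullet can be sketched with SQLite's built-in FTS5. The schema and rows below are hypothetical, purely to show a full-text match combined with structured per-target / per-source filters; Quarry's actual tables will differ:

```python
import sqlite3

# Hypothetical schema -- illustration only, not Quarry's internals.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE evidence USING fts5(target, source, url, body)")
db.executemany("INSERT INTO evidence VALUES (?, ?, ?, ?)", [
    ("demo",  "httpx",  "https://auth.example.com/login",   "title: Sign in  status: 200"),
    ("demo",  "katana", "https://app.example.com/settings", "link to /account/profile"),
    ("other", "httpx",  "https://sso.other.com/oauth",      "title: SSO  status: 302"),
])

# "which endpoints handle auth" -- FTS match, scoped to one target and one source
hits = db.execute(
    "SELECT url FROM evidence WHERE evidence MATCH ? AND target = ? AND source = ?",
    ("login OR auth OR sso", "demo", "httpx"),
).fetchall()
print(hits)
```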

What this isn't

  • Not a scanner. Quarry reads what your tools produced; it doesn't send packets.
  • Not an autonomous agent. It will not file reports, submit findings to a bounty platform, or take actions against targets. The reasoning loop is human-in-the-loop by design. The line on ML-driven recommendations is codified as ADR 0003: ML output that lands as Candidate or Evidence text is fine; ML output that ranks, filters, or routes the operator's attention is not.
  • Not a methodology framework. If you want a phase-by-phase walkthrough, use a different tool. Quarry assumes you already know what you're doing and want leverage on the parts that don't scale: reading, cross-referencing, and remembering.
  • Not a SaaS. The architecture is local-first and stays that way.

What it actually does that other tools don't

Four commands, all read-only, all deterministic, all over your local corpus. None of them call out to a model.

  • quarry analyze regression — diffs the latest two runs of httpx / katana / burp per Target and flags every URL whose status, content-type, server, or title changed. The 200 → 401 flip is signal; the 401 → 200 flip is the signal.
  • quarry analyze jsdelta — pairs the latest two ingestions of each JS bundle and surfaces the endpoints, role tokens, and suspected secrets that appeared (or disappeared) between them. New endpoints in a fresh bundle are where the offensive surface grew; the module weights added tokens accordingly.
  • quarry analyze interesting — single-snapshot heuristics: 5xx hosts, precise webserver-version disclosures (nginx/1.21, Apache/2.4.62), non-production name patterns (*-stage*, *-uat*). Earns its keep on a fresh corpus, before there's a second run to compare against.
  • quarry coverage — assets some discovery source surfaced but no live-probe source ever touched. Catches the silently-stranded-subdomain failure mode ("1,400 found, 300 probed, the rest dropped") before it eats an engagement.

Everything these modules emit is a Candidate, not a Finding — a fact about the corpus, not a recommendation. Promote the ones that hold up with quarry finding promote. The Finding lifecycle is CLI-driven and never crosses Targets implicitly. See docs/analyze.md and docs/findings.md.
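Conceptually, the regression diff keys each run's records by URL and compares a fixed set of fields. A minimal sketch over httpx-style JSONL (field names assumed here, not Quarry's internals):

```python
import json

TRACKED = ("status", "content_type", "server", "title")

def index(jsonl: str) -> dict:
    """Key one run's records by URL."""
    return {rec["url"]: rec
            for rec in (json.loads(l) for l in jsonl.splitlines() if l.strip())}

def regression(old_jsonl: str, new_jsonl: str) -> list:
    """Flag every URL whose tracked fields changed between two runs."""
    old, new = index(old_jsonl), index(new_jsonl)
    changes = []
    for url in old.keys() & new.keys():
        diff = {f: (old[url].get(f), new[url].get(f))
                for f in TRACKED if old[url].get(f) != new[url].get(f)}
        if diff:
            changes.append((url, diff))
    return changes

monday = "\n".join([
    json.dumps({"url": "https://admin.example.com", "status": 401, "title": "Sign in"}),
    json.dumps({"url": "https://stage.example.com", "status": 200, "title": "Stage"}),
])
thursday = "\n".join([
    json.dumps({"url": "https://admin.example.com", "status": 200, "title": "Admin Console"}),
    json.dumps({"url": "https://stage.example.com", "status": 500, "title": "Stage"}),
])
for url, diff in sorted(regression(monday, thursday)):
    print(url, diff)
```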

How it fits

The tools already produce all the evidence the work needs; the gap is that nothing remembers it across sessions, engagements, or weeks. Quarry slots in as the layer that makes the loop close.

That's the across-time value proposition. There's a second one that lives entirely inside a single session: a corpus that adapters have triaged is queryable in ways the eyeball pass isn't. Pattern queries (%console%, %mgmt%, name-cluster expansion) and anomaly heuristics (alive hosts with no <title>, single-instance webserver fingerprints, off-baseline tech stacks) surface assets the volume triage scrolled past.
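As a toy illustration of those two query families (hostnames and fields invented), the mining pass is just structured predicates over what's already indexed:

```python
import re

hosts = [
    {"host": "www.example.com",          "title": "Example"},
    {"host": "mgmt-console.example.com", "title": ""},       # alive, but no <title>
    {"host": "api.example.com",          "title": "API docs"},
]

# pattern query: the %console% / %mgmt% family
pattern = re.compile(r"console|mgmt")
pattern_hits = [h["host"] for h in hosts if pattern.search(h["host"])]

# anomaly heuristic: alive hosts that returned no <title>
untitled = [h["host"] for h in hosts if not h["title"]]

print(pattern_hits, untitled)
```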

The shape is triage → mine → drill: run adapters, then query the corpus for the structural signals the eye missed, then probe the highest-value mining hits with single deliberate requests. docs/workflow.md walks the flow with synthesized examples.

Why local-first

Offensive security work involves data that is often sensitive, sometimes contractually restricted, and occasionally legally radioactive — bounty scope rules, NDA-bound pentest engagements, lab environments under acceptable-use terms. Sending your recon corpus to a third-party indexing service is a non-starter for serious operators. Quarry runs on your machine, talks to whatever model endpoint you configure, and stores everything in a local SQLite database you can inspect, back up, or shred.

Quickstart

The ninety-second demo — ingest two snapshots of the same Target and watch the regression module surface what changed:

cargo install --path crates/quarry-cli --locked

quarry init
quarry target add demo --kind lab

# two httpx runs of the same target, taken a few days apart
quarry ingest ~/recon/httpx-monday.jsonl
quarry ingest ~/recon/httpx-thursday.jsonl

quarry analyze regression
# Module regression produced 2 candidates for target "demo"
#   …  score=0.70  httpx:https://admin.example.com
#       status 401 → 200; title "Sign in" → "Admin Console"
#   …  score=0.70  httpx:https://stage.example.com
#       status 200 → 500

That's the loop: the corpus remembers Monday's recon, Thursday's recon shows up, and the analytical layer surfaces the auth wall that dropped on admin.example.com without you having to diff JSONL by eye.

For LLM-synthesised answers over the same corpus, set an API key and use quarry ask:

export ANTHROPIC_API_KEY=sk-ant-...
quarry ask which endpoints look like authentication

The full walkthrough — including the OpenAI-compatible flow (Ollama, vLLM, OpenRouter) and the plain-text fallback for users without the tools above — lives in docs/quickstart.md.

The supported --source adapters are subfinder, httpx, katana, nuclei, jsbundle, burp, and responses (curl-style header/body pairs or {url,status,headers,body} JSONL). Caido lands later in M2 (see ROADMAP.md).
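For the responses adapter's JSONL form, that means one exchange per line with the four fields named above; everything else in this sketch (URL, header values, body) is illustrative:

```python
import json

# One captured HTTP exchange as a responses-adapter JSONL record.
record = {
    "url": "https://app.example.com/api/session",
    "status": 200,
    "headers": {"content-type": "application/json", "server": "nginx"},
    "body": '{"role":"user","mfa":false}',
}
line = json.dumps(record)  # append one such line per exchange to the .jsonl file
print(line)
```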

Quarry also ships an MCP server so an LLM host (Claude Code, Claude Desktop, etc.) can drive its read-only retrieval surface directly as tool calls — no inner quarry ask round-trip required. Run quarry mcp and configure it in your host; docs/mcp.md has the snippets.
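For hosts that use the common mcpServers config shape (Claude Desktop and Claude Code do; treat this as an approximation and defer to docs/mcp.md for the exact snippets), the entry would look roughly like:

```json
{
  "mcpServers": {
    "quarry": {
      "command": "quarry",
      "args": ["mcp"]
    }
  }
}
```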

For the analytical layer — comparing runs, surfacing JS bundle deltas, and managing the Finding lifecycle from the CLI — see docs/analyze.md and docs/findings.md.

For engagement memos and free-text annotations — methodology notes, post-foothold writeups, "what I tried" retrospectives — quarry note add attaches notes to a Target (no entity flag), an Asset (--asset HOSTNAME), or a Finding (--finding ID). Bodies can come from a file (--from-file PATH) or stdin (--from-stdin), so engagement memos don't have to round-trip through shell argv. Notes are indexed alongside evidence so quarry search and quarry ask retrieve them in the same pass. See docs/notes.md.

For cross-target lookup — "where have I seen this hostname / tech / webserver before across all engagements" — quarry recall spans every Target in the corpus and aggregates matches per-engagement. Pairs with quarry program triage over the H1 / Bugcrowd / Intigriti / YesWeHack catalog (quarry program ingest --source arkadiyt-h1) for picking the next engagement by methodology fit (scope shape, ROE flags, response efficiency) rather than by name recognition.

License

Quarry is licensed under the GNU Affero General Public License v3.0. See LICENSE for the full text.

For commercial licensing — including embedding Quarry in proprietary products or offering it as a hosted service without releasing modifications under the AGPL — see LICENSING.md.

Security

If you believe you've found a vulnerability in Quarry itself, see SECURITY.md. Do not file public issues for security matters.

Contributing

See CONTRIBUTING.md. Contributions require a Contributor License Agreement; this preserves the dual-license model that makes future commercial licensing possible.
