███████╗██╗ ██╗███████╗██╗ ███████╗████████╗ ██████╗ ███╗ ██╗ ██╗ ██╗███████╗██╗ ██╗
██╔════╝██║ ██╔╝██╔════╝██║ ██╔════╝╚══██╔══╝██╔═══██╗████╗ ██║ ██║ ██╔╝██╔════╝╚██╗ ██╔╝
███████╗█████╔╝ █████╗ ██║ █████╗ ██║ ██║ ██║██╔██╗ ██║ █████╔╝ █████╗ ╚████╔╝
╚════██║██╔═██╗ ██╔══╝ ██║ ██╔══╝ ██║ ██║ ██║██║╚██╗██║ ██╔═██╗ ██╔══╝ ╚██╔╝
███████║██║ ██╗███████╗███████╗███████╗ ██║ ╚██████╔╝██║ ╚████║ ██║ ██╗███████╗ ██║
╚══════╝╚═╝ ╚═╝╚══════╝╚══════╝╚══════╝ ╚═╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝ ╚═╝╚══════╝ ╚═╝
Lead Architect: rob-OSINT
Repository: github.com/rob-OSINT/skeleton-key
Version: v1.0.0
Language: Go 1.21
Licence: Authorised use only; see Legal Notice
Skeleton Key is a concurrent External Attack Surface Management (EASM) engine written in Go. It accepts one or more seed identifiers, dispatches parallel audit jobs across four specialised modules via a bounded worker pool, and produces a serialised identity graph representing the probabilistic correlation between discovered artefacts.
The underlying principle is that identity exposure is rarely contained to a single data source. Breach records, platform registries, authentication gateway responses, and file metadata each contribute partial signals; the analytical value of this framework is the correlation of those signals into a unified adjacency structure with calibrated edge weights.
Metadata doesn't lie, even when people do.
Digital identity artefacts accumulate across heterogeneous systems over time: credential records in breach corpora, account registrations on public platforms, EXIF payloads in uploaded files, and PII fragments in authentication gateway responses. Each source is, in isolation, a weak signal. Correlated across sources and weighted by evidential specificity, they form a probabilistic chain of attribution.
Skeleton Key operationalises this correlation process. Rather than treating each data source as a standalone lookup, the framework ingests findings from all active modules into a single relational graph. Nodes represent identity artefacts; edges represent the evidential relationship between them, scored by an Identity Collision Probability derived from the entropy and specificity of the shared attribute.
A hardware serial number match between two files approaches deterministic attribution. A shared platform handle is probabilistic. The edge weight schema formalises this distinction numerically, producing a graph where the strength of each linkage is explicit and queryable.
Credential Exposure Correlation
Seed Email → Breach Corpus Query → Credential Cluster → Entropy-Weighted Risk Score
The Breach Pivot module queries three breach intelligence APIs concurrently: Have I Been Pwned v3, BreachDirectory, and LeakIX. Results are normalised into a unified BreachRecord schema and aggregated into a BreachReport for the seed identifier.
Per-record processing covers the following:
Hash algorithm classification. Each password hash is classified as md5, sha1, sha256, bcrypt, or unknown via regular expression matching against known format signatures. The classification determines the crackability-weighted risk score. A plaintext exposure scores 1.00; an unsalted MD5 scores 0.90; a bcrypt hash scores 0.30, reflecting the computational cost differential of offline recovery across algorithm families.
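The classification step can be sketched as a small table of format signatures. This is a minimal illustration, not the project's code: the regular expressions, the `classifyHash` and `hashRiskScore` names, and the match order are assumptions; only the labels and scores are taken from this document. Length-based signatures cannot tell apart algorithms that share a digest length, so a 32-character hex string is labelled md5 here even though other algorithms produce the same shape.

```go
package main

import (
	"fmt"
	"regexp"
)

// hashSignatures lists hypothetical format signatures, checked in order.
// The patterns are illustrative assumptions, not the project's own.
var hashSignatures = []struct {
	label string
	re    *regexp.Regexp
}{
	{"bcrypt", regexp.MustCompile(`^\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}$`)},
	{"md5", regexp.MustCompile(`^[0-9a-fA-F]{32}$`)},
	{"sha1", regexp.MustCompile(`^[0-9a-fA-F]{40}$`)},
	{"sha256", regexp.MustCompile(`^[0-9a-fA-F]{64}$`)},
}

// classifyHash returns the label of the first matching signature,
// or "unknown" when nothing matches.
func classifyHash(h string) string {
	for _, s := range hashSignatures {
		if s.re.MatchString(h) {
			return s.label
		}
	}
	return "unknown"
}

// hashRiskScore maps a label to the crackability-weighted score
// listed in this document; unrecognised labels get the 0.50 baseline.
func hashRiskScore(label string) float64 {
	switch label {
	case "plaintext":
		return 1.00
	case "md5":
		return 0.90
	case "sha1":
		return 0.80
	case "sha256":
		return 0.60
	case "bcrypt":
		return 0.30
	default:
		return 0.50
	}
}

func main() {
	label := classifyHash("5f4dcc3b5aa765d61d8327deb882cf99")
	fmt.Printf("%s %.2f\n", label, hashRiskScore(label)) // prints "md5 0.90"
}
```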
Correlated IP extraction. IP addresses present in breach metadata are extracted, deduplicated, and registered as NodeIP vertices in the identity graph, linked by ip_correlation edges with a base weight of 0.78.
Linked address harvesting. Secondary email addresses co-appearing in breach records are extracted and linked to the seed node via breach_cooccurrence edges.
k-Anonymity password verification. The HIBP Pwned Passwords endpoint is queried using the k-anonymity model. Only the first five hexadecimal characters of SHA-1(plaintext) are transmitted; the full hash is never sent over the network. Corpus occurrence count is resolved locally against the returned suffix list.
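The k-anonymity exchange described above can be sketched without any network call. The helper names `splitSHA1` and `countInRange` are hypothetical, and the response body in `main` is synthetic sample data; the sketch assumes the range response is a list of `SUFFIX:COUNT` lines, which is how the HIBP range endpoint documents its output.

```go
package main

import (
	"bufio"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// splitSHA1 hashes the plaintext and splits the uppercase hex digest into
// the 5-character prefix (the only part transmitted) and the 35-character
// suffix (kept local).
func splitSHA1(plaintext string) (prefix, suffix string) {
	sum := sha1.Sum([]byte(plaintext))
	h := strings.ToUpper(hex.EncodeToString(sum[:]))
	return h[:5], h[5:]
}

// countInRange scans a range response body for the local suffix and
// returns its occurrence count, or 0 when the suffix is absent.
func countInRange(body, suffix string) int {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		s, c, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if !ok || !strings.EqualFold(s, suffix) {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(c))
		if err != nil {
			return 0
		}
		return n
	}
	return 0
}

func main() {
	prefix, suffix := splitSHA1("password")
	// Synthetic response body; the counts are made up for illustration.
	body := "0018A45C4D1DEF81644B54AB7F969B88D65:1\r\n" + suffix + ":42\r\n"
	fmt.Println("prefix sent:", prefix)
	fmt.Println("local match count:", countInRange(body, suffix))
}
```

Because the comparison happens against the returned suffix list, the remote service learns only that the caller is interested in one of the many hashes sharing that prefix.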
Risk aggregation. The composite breach risk score is the arithmetic mean of per-record scores multiplied by the amplifier 1 + (n × 0.05), where n is the total breach count; the product is capped at 1.0. This reflects the compounding exposure probability of repeated appearances across independent breach sources.
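The aggregation formula can be written out as a short function. This is a sketch under one stated assumption: each per-record score corresponds to one breach, so n equals the number of scores. The function name `breachRisk` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// breachRisk computes mean(scores) × (1 + n × 0.05), capped at 1.0.
// Assumption: one record per breach, so n == len(scores).
func breachRisk(scores []float64) float64 {
	n := float64(len(scores))
	if n == 0 {
		return 0
	}
	var sum float64
	for _, s := range scores {
		sum += s
	}
	return math.Min(1.0, (sum/n)*(1+n*0.05))
}

func main() {
	// Two records: mean 0.60, amplifier 1.10.
	fmt.Printf("%.3f\n", breachRisk([]float64{0.90, 0.30}))
	// Four records: mean 0.90, amplifier 1.20; the product exceeds 1.0 and is capped.
	fmt.Printf("%.3f\n", breachRisk([]float64{1.00, 0.90, 0.90, 0.80}))
}
```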
Platform Footprint Enumeration
Handle → Concurrent HTTP Probes (102 platforms) → Confirmed Presence Map → Footprint Breadth Score
The Shadow Asset Discovery module issues concurrent HTTP HEAD requests, with automatic GET fallback on 405 Method Not Allowed, to 102 public platforms organised into 15 risk-weighted categories. A bounded semaphore channel constrains probe concurrency to the value specified by --platform-con.
Each category carries a base risk weight reflecting the operational significance of that account class to an adversarial analyst:
| Category | Risk Weight | Basis for Weighting |
|---|---|---|
| security | 0.95 | Discloses tooling proficiency, bug bounty participation, and research focus |
| professional | 0.85 | High organisational PII density; employer, role, and network graph exposure |
| developer | 0.80 | Source repositories, commit history, SSH key material, package ownership chains |
| dating | 0.80 | Elevated PII sensitivity; physical description and approximate location disclosure |
| crypto | 0.75 | Wallet address association and transaction graph linkage |
| forum | 0.65 | Historical statements; pseudonymous cross-platform correlation |
| social | 0.60 | Network graph and temporal behavioural pattern data |
| academic | 0.55 | Institutional affiliation, research domain, and publication record |
| fitness | 0.55 | Recurring location pattern data extracted from activity records |
| gaming | 0.50 | Cross-platform handle correlation via shared pseudonym |
| content | 0.50 | Audience association and revenue source linkage |
| video | 0.45 | Temporal activity patterns and subscription network |
| design | 0.40 | Portfolio artefacts and client association |
| music | 0.40 | Collaborator network data |
| niche | 0.35 | Contextual interest and community affiliation profiling |
The Footprint Breadth Score is the sum of risk weights for all confirmed accounts divided by the total number of platforms probed, normalised to [0.0, 1.0].
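As a worked instance of the score definition above, the following sketch sums the category weights of confirmed accounts and divides by the number of platforms probed. The function name `footprintBreadth` and the sample inputs are hypothetical; the final clamp is a defensive assumption, since weights below 1.0 already keep the ratio within range.

```go
package main

import (
	"fmt"
	"math"
)

// footprintBreadth returns sum(confirmed weights) / platforms probed.
// The clamp to 1.0 is a defensive assumption, not a documented step.
func footprintBreadth(confirmedWeights []float64, probed int) float64 {
	if probed <= 0 {
		return 0
	}
	var sum float64
	for _, w := range confirmedWeights {
		sum += w
	}
	return math.Min(1.0, sum/float64(probed))
}

func main() {
	// Three confirmed accounts (security, developer, social) out of 102 probes.
	fmt.Printf("%.4f\n", footprintBreadth([]float64{0.95, 0.80, 0.60}, 102))
}
```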
Legacy accounts on developer and security platforms carry particular analytical relevance. An unmaintained GitHub account may retain public SSH key material or stale credentials; an npm package may carry dependency relationships to active production systems. These represent an unmonitored attack surface that conventional EASM tooling does not enumerate.
PII Leakage Detection in Authentication Flows
Seed Email → Recovery Endpoint Probe → Response Corpus Analysis → PII Fragment Classification
The Auth-Gateway Audit module probes password recovery endpoints via POST (application/x-www-form-urlencoded and application/json) and GET. The complete HTTP response, including body, response headers, and the Location redirect target, is passed through a ten-class regular expression scanner.
The objective is to detect PII fragments returned by authentication gateways in violation of data minimisation principles. Such fragments may be used to construct social engineering pretexts or to operate account enumeration oracles.
| Pattern Class | Severity | Score | Detection Basis |
|---|---|---|---|
| FullEmailExposed | CRITICAL | 1.00 | Unmasked RFC 5321 address present in response body |
| RecoveryTokenLeak | CRITICAL | 1.00 | Reset token or OTP string proximate to recognised trigger keyword |
| AccountEnumeration | HIGH | 0.80 | Differential error text confirming or denying registration status |
| MaskedEmail | HIGH | 0.75 | Partially masked address exposing domain structure and local-part length |
| StackTraceExposure | HIGH | 0.75 | Stack trace fragment disclosing runtime environment and framework version |
| PartialPhone | HIGH | 0.70 | Partially masked phone number reducing adversarial search space |
| SecurityQuestionLeak | HIGH | 0.70 | Security question text enabling social engineering pretext construction |
| InternalPathLeak | MEDIUM | 0.60 | Filesystem path fragment enabling infrastructure fingerprinting |
| PhoneDigitSuffix | MEDIUM | 0.55 | Terminal four digits of phone number used in identity verification flows |
| UsernameHint | MEDIUM | 0.50 | Account identifier proximate to recognised trigger keyword |
All matched fragments are redacted in report output via partial masking; the raw value is never persisted. The module additionally flags verbatim embedding of the seed email in Location response headers, which exposes the value in server access logs and HTTP referrer chains downstream of a redirect.
Hardware Fingerprinting via Metadata Extraction
File Input → TIFF/JPEG Binary Parse → GPS Coordinate + Hardware Signature → Risk Score
The EXIF Forensics module implements a binary parser for JPEG APP1 segments and TIFF IFD chains, built exclusively against encoding/binary from the Go standard library. No external parsing library is used and no network connection is made during execution. The parser walks the IFD0 to EXIF sub-IFD to GPS sub-IFD structure recursively, decoding field values according to their TIFF type codes and resolving inline versus offset-addressed data per the TIFF 6.0 specification.
GPS coordinates are extracted as degree-minute-second rational triplets and converted to signed decimal degrees. Coordinate precision is classified into four bands: sub-metre, street-level, neighbourhood, and city-level, based on the count of significant decimal digits in the stored value.
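The degree-minute-second conversion can be illustrated as follows. The `rational` type and `dmsToDecimal` function are hypothetical names for this sketch; it assumes the hemisphere reference is supplied as "N", "S", "E", or "W", as the EXIF GPS reference tags encode it. The precision band thresholds are not specified in this document and are left out.

```go
package main

import "fmt"

// rational mirrors a TIFF RATIONAL: two unsigned 32-bit integers.
type rational struct {
	Num, Den uint32
}

// float converts the rational to float64, treating a zero
// denominator as 0 rather than dividing by zero.
func (r rational) float() float64 {
	if r.Den == 0 {
		return 0
	}
	return float64(r.Num) / float64(r.Den)
}

// dmsToDecimal converts a degree-minute-second triplet to signed decimal
// degrees; southern and western hemispheres are negative.
func dmsToDecimal(dms [3]rational, ref string) float64 {
	d := dms[0].float() + dms[1].float()/60 + dms[2].float()/3600
	if ref == "S" || ref == "W" {
		d = -d
	}
	return d
}

func main() {
	lat := dmsToDecimal([3]rational{{51, 1}, {30, 1}, {26, 1}}, "N")
	lon := dmsToDecimal([3]rational{{0, 1}, {7, 1}, {32, 1}}, "W")
	fmt.Printf("%.5f,%.5f\n", lat, lon)
}
```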
Hardware signature extraction covers the following EXIF tag classes:
| Tag | Risk Classification | Attribution Significance |
|---|---|---|
| CameraSerialNumber / BodySerialNumber | CRITICAL | Deterministic device identifier; links all files produced by the same physical unit across any platform |
| CameraOwnerName | CRITICAL | Owner name written to firmware at device registration; direct PII disclosure |
| ImageUniqueID | HIGH | Per-image UUID enabling cross-platform file correlation without account linkage |
| Artist | HIGH | Author attribution field; identity disclosure |
| LensSerialNumber | HIGH | Secondary hardware identifier; narrows device attribution to specific lens unit |
| Make / Model | HIGH | Device class fingerprint; supports pivot to firmware vulnerability data |
| Software | MEDIUM | Firmware or software version; relevant to known vulnerability mapping |
| DateTimeOriginal | MEDIUM | Precise capture timestamp; enables temporal behavioural analysis |
The operational implication is that a file uploaded to any public-facing service retains its EXIF payload unless the receiving pipeline explicitly strips metadata. A hardware serial number in that payload constitutes a cross-platform linkage signal independent of any account identifier: it connects every file produced by the same device into a single attribution cluster, regardless of the pseudonyms under which those files were uploaded.
Skeleton Key is written in Go 1.21 and compiles to a statically linked binary. The sole external dependency is github.com/fatih/color, used for ANSI terminal output. All substantive functionality is implemented against the Go standard library.
| Package | Usage |
|---|---|
| encoding/binary | TIFF IFD chain parsing; byte-order-aware rational field decoding |
| encoding/json | Graph serialisation; breach API response deserialisation |
| net/http | All outbound HTTP: breach API queries, platform probes, gateway audits |
| crypto/sha1 | HIBP k-Anonymity prefix computation |
| sync | WaitGroup and Mutex concurrency primitives |
| regexp | PII pattern matching against response corpora |
| context | Cancellation propagation; per-module timeout enforcement |
| os/signal | Graceful shutdown on SIGINT and SIGTERM |
No CGO is used. The binary is deployable on any Linux amd64 target without a Go runtime installation.
┌─────────────────────────────────────────┐
│ ORCHESTRATOR │
│ main.go — CLI + Job Builder │
└─────────────────┬───────────────────────┘
│
chan Job (buffered)
│
┌─────────────────▼───────────────────────┐
│ WORKER POOL │
│ internal/worker/pool.go │
│ sync.WaitGroup + channel-per-direction │
│ │
│ [W1] [W2] [W3] [W4] [W5...] │
└────┬──────┬──────┬──────┬──────┬────────┘
│ │ │ │ │
┌────▼──┐ ┌─▼───┐ ┌▼────┐ ┌▼──────────┐
│BREACH │ │PLAT │ │GTWY │ │FORENSICS │
└───────┘ └─────┘ └─────┘ └───────────┘
│
chan Result (buffered)
│
┌─────────────────▼───────────────────────┐
│ RESULT COLLECTOR │
│ Graph Ingestion + Report Rendering │
└─────────────────────────────────────────┘
The pool uses a channel-per-direction design: jobs enter via a buffered chan Job; results return via a buffered chan Result. A dedicated closer goroutine calls close(results) only after wg.Wait() confirms all workers have exited, allowing callers to range over the results channel without a separate termination signal and without the risk of a premature close on an in-flight result.
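The closer-goroutine pattern described above can be sketched in a few lines. The `Job` and `Result` field layouts and the `runPool` signature are assumptions for illustration; the actual types in internal/worker/pool.go may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Job and Result are placeholder shapes, not the project's real types.
type Job struct{ ID int }

type Result struct {
	JobID  int
	Output string
}

// runPool starts a fixed number of workers that drain jobs and emit results.
// A dedicated closer goroutine closes the results channel only after every
// worker has exited, so callers can simply range over the returned channel.
func runPool(workers int, jobs <-chan Job, handle func(Job) Result) <-chan Result {
	results := make(chan Result, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- handle(j)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(results)
	}()
	return results
}

func main() {
	jobs := make(chan Job, 4)
	for i := 1; i <= 4; i++ {
		jobs <- Job{ID: i}
	}
	close(jobs) // no more jobs; workers exit once the channel drains

	count, sum := 0, 0
	for r := range runPool(2, jobs, func(j Job) Result {
		return Result{JobID: j.ID, Output: "done"}
	}) {
		count++
		sum += r.JobID
	}
	fmt.Println(count, "results; job ID sum", sum)
}
```

Closing from a separate goroutine rather than from a worker avoids the case where one worker closes the channel while another is still sending.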
The platform scanner implements an inner semaphore (chan struct{} of bounded capacity) to constrain concurrent HTTP goroutines independently of the outer pool size. This separates coarse-grained job-level concurrency from fine-grained HTTP probe concurrency; the outer pool can operate with eight workers while the platform module maintains fifty simultaneous HTTP connections.
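A channel-based semaphore of this kind can be demonstrated in isolation. The sketch below uses placeholder tasks instead of HTTP probes and records the peak number of tasks in flight; `runBounded` is a hypothetical name, and acquiring the slot before spawning the goroutine is one possible design choice, not necessarily the one the platform scanner makes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded executes every task with at most limit running at once and
// returns the peak concurrency observed.
func runBounded(limit int, tasks []func()) (peak int) {
	sem := make(chan struct{}, limit) // capacity is the concurrency bound
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		active int
	)
	for _, t := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire: blocks while limit tasks are in flight
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot on exit
			mu.Lock()
			active++
			if active > peak {
				peak = active
			}
			mu.Unlock()
			task()
			mu.Lock()
			active--
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(5 * time.Millisecond) }
	}
	fmt.Println("peak concurrency:", runBounded(3, tasks))
}
```

The bound is independent of how many goroutines an outer pool runs, which is the separation the paragraph above describes.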
All goroutines propagate context.Context cancellation. Per-module timeouts are enforced via context.WithTimeout at the point of job execution. SIGINT and SIGTERM trigger pool shutdown via signal.NotifyContext.
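The two standard-library calls named above compose as follows. This is a sketch, not the orchestrator's code: `runModule` and `simulate` are hypothetical helpers, and the simulated work simply waits on a timer or on cancellation, whichever comes first.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// runModule derives a per-module deadline from the parent context and
// passes the derived context to the work function.
func runModule(parent context.Context, timeout time.Duration, work func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return work(ctx)
}

// simulate runs placeholder work lasting workMS under a timeoutMS deadline
// and reports which finished first.
func simulate(timeoutMS, workMS int) string {
	err := runModule(context.Background(), time.Duration(timeoutMS)*time.Millisecond,
		func(ctx context.Context) error {
			select {
			case <-time.After(time.Duration(workMS) * time.Millisecond):
				return nil
			case <-ctx.Done():
				return ctx.Err()
			}
		})
	switch {
	case err == nil:
		return "completed"
	case errors.Is(err, context.DeadlineExceeded):
		return "deadline exceeded"
	default:
		return err.Error()
	}
}

func main() {
	// The root context is cancelled on SIGINT or SIGTERM; every module
	// context derived from it is cancelled with it.
	root, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	err := runModule(root, 50*time.Millisecond, func(ctx context.Context) error {
		<-ctx.Done() // placeholder for work that outlives its deadline
		return ctx.Err()
	})
	fmt.Println("module result:", err)
	fmt.Println("fast job:", simulate(500, 1))
}
```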
The identity graph is serialised as a JSON adjacency list. Each node carries a type identifier, label, risk score, and source annotation. Each edge carries a base weight, an evidence-derived confidence value, and a computed Identity Collision Probability.
Identity Collision Probability = base_weight × confidence, bounded to [0.0, 1.0].
The confidence value is set at ingestion time based on the specificity of the evidence. A camera serial number match between two files is assigned a confidence of 0.99, yielding a collision probability of 0.97 × 0.99 ≈ 0.96. A platform handle match is assigned 0.85, yielding 0.50 × 0.85 ≈ 0.43. The distinction is explicit in the graph structure.
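The formula and the two worked figures above can be checked with a short function. `collisionProbability` is a hypothetical name; the clamp implements the stated [0.0, 1.0] bound.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability returns baseWeight × confidence, clamped to [0.0, 1.0].
func collisionProbability(baseWeight, confidence float64) float64 {
	return math.Max(0, math.Min(1, baseWeight*confidence))
}

func main() {
	// hardware_serial edge: base weight 0.97, confidence 0.99.
	fmt.Printf("%.4f\n", collisionProbability(0.97, 0.99))
	// platform_presence edge: base weight 0.50, confidence 0.85.
	fmt.Printf("%.4f\n", collisionProbability(0.50, 0.85))
}
```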
| Type | Example ID | Represents |
|---|---|---|
| email | email:target@corp.com | Email address |
| handle | handle:johndoe | Username or pseudonym |
| ip | ip:192.168.1.1:8080 | IP address extracted from breach data |
| hash | hash:5f4dcc3b... | Password hash |
| serial | serial:SN-048210374 | Hardware serial number |
| gps | gps:51.50722,0.12556 | Physical GPS coordinate |
| domain | domain:GitHub | Platform presence node |
| device | device:photo.jpg | File or device artefact |
| Edge Type | Base Weight | Attribution Basis |
|---|---|---|
| hardware_serial | 0.97 | Deterministic: hardware serial is a unique physical identifier |
| hash_match | 0.92 | Near-deterministic: identical hash implies same password at time of breach |
| ip_correlation | 0.78 | Strong: shared IP reduces target space; attenuated by NAT and VPN scenarios |
| gps_proximity | 0.70 | Spatial: proximity inference; not individually deterministic |
| breach_cooccurrence | 0.55 | Moderate: co-appearance in corpus; does not confirm shared identity |
| platform_presence | 0.50 | Probabilistic: handle match; common handles reduce confidence at ingestion |
| domain_ownership | 0.45 | Organisational: shared domain implies group-level but not individual linkage |
| seed_pivot | 0.30 | Weakest: direct derivation from seed; lowest evidential specificity |
The graph JSON includes a neo4j block containing generated Cypher statements for node and relationship import:
// Node import (neo4j.node_cypher)
MERGE (n:EMAIL {id: "email:target@corp.com"})
SET n.label = "target@corp.com", n.risk = 0.850, n.source = "breach_apis";
// Relationship import (neo4j.edge_cypher)
MATCH (a {id: "email:target@corp.com"}), (b {id: "ip:10.0.0.1:443"})
MERGE (a)-[r:IP_CORRELATION {probability: 0.585, weight: 0.780}]->(b);

All risk scores are normalised to [0.0, 1.0]. Composite risk is the arithmetic mean of all active module scores.
| Hash Type | Score | Basis |
|---|---|---|
| Plaintext | 1.00 | Immediately usable credential |
| MD5 | 0.90 | Trivially inverted via precomputed tables |
| SHA-1 | 0.80 | Widely cracked; large precomputed corpora available |
| SHA-256 (unsalted) | 0.60 | GPU-crackable at scale |
| bcrypt | 0.30 | Computationally resistant; cost-factor dependent |
| Unknown | 0.50 | Conservative baseline under classification uncertainty |
| Data Class | Additive Score |
|---|---|
| Passwords | +0.40 |
| Government-issued IDs | +0.40 |
| Financial data / Credit cards | +0.35 |
| Sensitive breach flag | +0.30 |
| Phone numbers / Physical addresses | +0.15 |
| Email addresses | +0.10 |
| Signal | Additive Score | Basis |
|---|---|---|
| GPS coordinates | +0.50 | Physical location disclosure |
| Hardware serial number | +0.30 | Deterministic device linkage |
| Camera owner name | +0.25 | Direct PII |
| Image Unique ID | +0.20 | Cross-platform file correlation |
| Lens serial number | +0.10 | Secondary device identifier |
| Capture timestamp | +0.05 | Temporal pattern contribution |
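One plausible reading of the additive table is a simple capped sum. The sketch below assumes that contributions are added per detected signal and that the total is capped at 1.0 to honour the stated normalisation; the map keys and the `forensicsRisk` name are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// signalScores holds the additive contributions listed in this document.
// The key names are hypothetical labels for this sketch.
var signalScores = map[string]float64{
	"gps":               0.50,
	"hardware_serial":   0.30,
	"owner_name":        0.25,
	"image_unique_id":   0.20,
	"lens_serial":       0.10,
	"capture_timestamp": 0.05,
}

// forensicsRisk sums the contributions of the detected signals and caps
// the total at 1.0 (an assumption based on the normalisation rule).
// Unrecognised signal names contribute nothing.
func forensicsRisk(present []string) float64 {
	var total float64
	for _, s := range present {
		total += signalScores[s]
	}
	return math.Min(total, 1.0)
}

func main() {
	fmt.Printf("%.2f\n", forensicsRisk([]string{"gps", "hardware_serial"}))
}
```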
- Go 1.21 or later
- Network access to proxy.golang.org for initial dependency resolution
git clone https://github.com/rob-OSINT/skeleton-key.git
cd skeleton-key
# Resolve dependencies and generate go.sum
go mod tidy
# Compile
go build -o skeleton-key .
# Cross-compile for Linux amd64
GOOS=linux GOARCH=amd64 go build -o skeleton-key-linux .

./skeleton-key --help

export HIBP_KEY="<hibp-api-key>"
export BREACH_DIR_KEY="<breachdirectory-api-key>"
export LEAKIX_KEY="<leakix-api-key>"

Keys may also be supplied via the corresponding CLI flags. If no breach API keys are configured, the Breach Pivot module logs the omission and exits gracefully; all other modules continue execution normally.
./skeleton-key \
--email target@corp.com \
--handle johndoe \
--file /path/to/image.jpg \
--all \
--json \
--output ./audit-output \
--workers 8 \
--platform-con 50 \
--timeout 30

# Breach correlation only
./skeleton-key --email target@corp.com --mod-breach --json
# Platform footprint only; no API key required
./skeleton-key --handle johndoe --mod-platform --platform-con 75
# EXIF forensics only; fully offline, no network calls
./skeleton-key --file upload.jpg --mod-forensics --json
# Auth-gateway audit with explicit endpoint specification
./skeleton-key \
--email target@corp.com \
--mod-gateway \
--gateway-specs "POST:https://target.com/forgot-password:email:application/json"

| Flag | Default | Description |
|---|---|---|
| --email | | Seed email address for breach and gateway modules |
| --handle | | Username for platform footprint enumeration |
| --file | | File path for EXIF forensics module |
| --domain | | Domain for LeakIX service exposure queries |
| --all | false | Enable all four audit modules |
| --mod-breach | false | Credential leak correlation |
| --mod-platform | false | Platform footprint enumeration |
| --mod-gateway | false | Auth-gateway PII audit |
| --mod-forensics | false | EXIF metadata forensics |
| --workers | 8 | Outer worker pool size |
| --platform-con | 50 | Platform scanner goroutine concurrency bound |
| --gateway-con | 10 | Gateway auditor goroutine concurrency bound |
| --timeout | 30 | Per-module context timeout in seconds |
| --hibp-key | $HIBP_KEY | Have I Been Pwned API key |
| --bd-key | $BREACH_DIR_KEY | BreachDirectory API key |
| --leakix-key | $LEAKIX_KEY | LeakIX API key |
| --output | ./output | Output directory for all report files |
| --graph-file | skeleton_graph.json | Identity graph output filename |
| --json | false | Write per-module JSON reports to output directory |
| --verbose | false | Extended error output from module execution |
| --gateway-specs | | Endpoint specifications; format below |
"METHOD:URL:FIELD[:CONTENT_TYPE]"
POST:https://target.com/forgot:email:application/json
POST:https://target.com/recover:username:application/x-www-form-urlencoded
GET:https://target.com/reset:email
Multiple specifications are comma-separated.
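Because the URL field itself contains colons, the specification string cannot be split naively on ":". The sketch below shows one way to parse it: take METHOD from the left, then FIELD and the optional CONTENT_TYPE from the right. The `GatewaySpec` type, the function names, and the heuristic that a trailing segment containing "/" is a content type are all assumptions, not the project's parser.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// GatewaySpec is a hypothetical holder for one parsed specification.
type GatewaySpec struct {
	Method, URL, Field, ContentType string
}

var errSpec = errors.New("spec must be METHOD:URL:FIELD[:CONTENT_TYPE]")

// parseSpec splits METHOD from the left, then peels FIELD and the optional
// CONTENT_TYPE from the right so that colons inside the URL are preserved.
// Heuristic: a trailing segment containing "/" is treated as a content type.
func parseSpec(s string) (GatewaySpec, error) {
	method, rest, ok := strings.Cut(strings.TrimSpace(s), ":")
	if !ok || method == "" {
		return GatewaySpec{}, errSpec
	}
	var contentType string
	i := strings.LastIndex(rest, ":")
	if i < 0 {
		return GatewaySpec{}, errSpec
	}
	if last := rest[i+1:]; strings.Contains(last, "/") {
		contentType = last
		rest = rest[:i]
		if i = strings.LastIndex(rest, ":"); i < 0 {
			return GatewaySpec{}, errSpec
		}
	}
	url, field := rest[:i], rest[i+1:]
	if url == "" || field == "" {
		return GatewaySpec{}, errSpec
	}
	return GatewaySpec{Method: method, URL: url, Field: field, ContentType: contentType}, nil
}

// parseSpecs parses a comma-separated list of specifications.
func parseSpecs(list string) ([]GatewaySpec, error) {
	var out []GatewaySpec
	for _, raw := range strings.Split(list, ",") {
		spec, err := parseSpec(raw)
		if err != nil {
			return nil, err
		}
		out = append(out, spec)
	}
	return out, nil
}

func main() {
	specs, err := parseSpecs("POST:https://target.com/forgot:email:application/json,GET:https://target.com/reset:email")
	fmt.Println(len(specs), err)
}
```

The heuristic misreads a field name that contains "/"; a stricter format with an unambiguous delimiter would avoid that case.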
output/
├── skeleton_graph.json # Identity graph: adjacency list, Neo4j Cypher blocks
├── breach_report.json # Credential records, hash classifications, risk scores
├── platform_report.json # Confirmed platform presence, footprint breadth score
├── gateway_report.json # PII findings per endpoint, severity classifications
└── forensics_report.json # EXIF extraction, GPS coordinates, hardware signatures
All output files are written with chmod 600 permissions.
// Paste node_cypher from skeleton_graph.json into Neo4j Browser first,
// then edge_cypher. Both blocks are generated at runtime.
// Query high-confidence identity collisions
MATCH (a)-[r]->(b)
WHERE r.probability > 0.70
RETURN a, r, b
ORDER BY r.probability DESC;

- File → Import → Network from File
- Select skeleton_graph.json
- Map: id to Node ID; type to Node Label; probability to Edge Weight
- Apply Edge-Weighted Spring Embedded layout
- Style: node size by risk_score; edge colour graduated by probability
skeleton-key/
├── main.go # Orchestrator: CLI parsing, job dispatch, rendering
├── go.mod # Module definition; single external dependency
├── go.sum # Dependency hash verification
├── config/
│ └── platforms.go # Platform registry: 102 entries, typed and categorised
└── internal/
├── worker/
│ └── pool.go # Worker pool: sync.WaitGroup, channel-per-direction design
├── audit/
│ ├── breach.go # HIBP, BreachDirectory, LeakIX API clients
│ ├── platform.go # Concurrent HTTP platform prober; semaphore-bounded
│ └── authgateway.go # PII pattern scanner for authentication recovery flows
├── forensics/
│ ├── exif.go # TIFF/JPEG binary IFD chain parser
│ └── metadata.go # GPS coordinate and hardware signature extraction
└── vis/
└── graph.go # Identity graph builder; Neo4j Cypher generation
The following guidance applies to authorised deployments.
Rate limiting. The platform scanner issues HEAD and GET requests across 102 domains concurrently. Configure --platform-con and --timeout conservatively to avoid triggering rate-limit responses or WAF-level blocks on the scanned infrastructure. The operator is responsible for ensuring probe volume is within engagement scope.
API key hygiene. HIBP and BreachDirectory keys are transmitted as HTTP request headers. Provision dedicated keys per engagement and rotate immediately upon engagement closure. Do not reuse keys across unrelated engagements.
Output handling. The files skeleton_graph.json, breach_report.json, and forensics_report.json contain credential metadata, PII, and physical location data. All output files are written with chmod 600. Encrypt before transmission. Destroy in accordance with engagement data handling requirements and applicable data protection obligations.
Gateway probing. The Auth-Gateway Audit module submits POST and GET requests to live authentication endpoints using the seed email as a probe value. Confirm that authentication flow testing is within the agreed scope prior to execution and that authentication attempt logging by the target is acceptable to the client.
Data classification. Hardware serial numbers and GPS coordinates extracted from EXIF data may constitute personal data under GDPR, UK GDPR, CCPA, and equivalent legislation. Handle with appropriate classification controls and document the legal basis for processing.
Contributions extending the core audit capability are accepted subject to the following constraints.
Platform registry additions (config/platforms.go): include ClaimCode, NotFoundCode, and a verified category assignment. Test the probe URL against a known active account before submitting.
PII pattern additions (internal/audit/authgateway.go): provide the detection regular expression, severity classification, base risk score, and a concise description of the exploitation vector. Include a representative example of the response fragment being targeted.
Breach API adapters (internal/audit/breach.go): implement the []BreachRecord return signature. Document rate limits, key provisioning requirements, and any data licence restrictions applicable to the upstream API.
Graph edge type additions (internal/vis/graph.go): calibrate the base weight against an empirical or formally reasoned attribution basis. Document the evidence type, its specificity, and its position relative to existing edge types in the weight hierarchy.
The single external dependency (github.com/fatih/color) is accepted for terminal output only. No additional external dependencies will be considered.
╔══════════════════════════════════════════════════════════════════════════════╗
║ AUTHORISED USE ONLY ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ ║
║ This framework is developed exclusively for: ║
║ ║
║ • Authorised penetration testing and red team engagements ║
║ • Defensive security auditing with written organisational consent ║
║ • Digital identity hardening and personal EASM assessments ║
║ • Academic and forensic research in controlled environments ║
║ ║
║ PROHIBITED USE: ║
║ ║
║ Deployment against any system, individual, or organisation without ║
║ explicit, documented, written authorisation is: ║
║ ║
║ • A federal offence under the Computer Fraud and Abuse Act (CFAA), ║
║ 18 U.S.C. § 1030 ║
║ • A criminal offence under the UK Computer Misuse Act 1990 ║
║ • Prosecutable under EU Directive 2013/40/EU on attacks against ║
║ information systems and equivalent national legislation ║
║ • Subject to civil liability in all applicable jurisdictions ║
║ ║
║ The authors, contributors, and distributors of this software accept ║
║ no liability for damages, legal consequences, or harm arising from ║
║ misuse, unauthorised deployment, or violation of applicable law. ║
║ ║
║ You are solely responsible for ensuring compliance with all applicable ║
║ local, national, and international law prior to use. ║
║ ║
║ If you do not hold documented authorisation: do not use this tool. ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════╝
"The digital past is not the past. It is the present, indexed and waiting."
[SKELETON_KEY] — v1.0.0
Lead Architect: rob-OSINT