Version: 1.0
Date: January 2026
KGCS (Cybersecurity Knowledge Graph) is a frozen, standards‑backed ontology that unifies the security standards sourced from NVD and MITRE (CVE, CWE, CPE, CVSS, CAPEC, ATT&CK, D3FEND, CAR, SHIELD, ENGAGE). It gives AI systems a single source of truth for reasoning about vulnerabilities, attacks, defenses, and threat intelligence, so that answers can be grounded in the graph rather than hallucinated.
Current Status:
- Phase 1: Core ontologies complete and frozen
- Phase 2: SHACL validation framework complete
- Phase 3: MVP data ingestion and Neo4j loader operational (see PROJECT-STATUS-SUMMARY.md)
- Phase 4–5: Extensions and RAG integration designed, not yet implemented
For a full technical and architectural overview, see docs/KGCS.md and docs/ARCHITECTURE.md.
┌───────────────────────┐
│ Extension Layer (L4) │ (Incident, Risk, ThreatActor)
├───────────────────────┤
│ Core Ontology (L3) │ (CPE → CVE → CWE → CAPEC → ATT&CK → D3FEND/CAR/SHIELD/ENGAGE)
├───────────────────────┤
│ Modular Ontologies │ (one OWL file per standard)
├───────────────────────┤
│ External Standards │ (NVD, MITRE)
└───────────────────────┘
- Core is immutable and 1:1 mapped to official JSON/STIX schemas.
- Extensions add temporal, contextual, or subjective data without polluting the core.
| Invariant | Description |
|---|---|
| Authoritative alignment | Every class/property maps to a stable ID in NVD or MITRE. |
| Explicit provenance | Every edge is traceable to a source field. |
| No invented semantics | The ontology is a lens, not a replacement for the standards. |
| Extensions never modify core | Incident, Risk, ThreatActor layers reference core only. |
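The "explicit provenance" invariant lends itself to a mechanical check. The sketch below is one possible way to express it, not the project's implementation: the `Edge` shape, the `affects` relation name, and the provenance strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """One graph edge plus the source field it was derived from (hypothetical shape)."""
    source_id: str
    relation: str
    target_id: str
    provenance: Optional[str]  # e.g. a path into the originating JSON record


def edges_missing_provenance(edges):
    """Return the edges that violate the 'explicit provenance' invariant."""
    return [e for e in edges if not (e.provenance and e.provenance.strip())]


# Illustrative data only; the provenance path is an assumed format.
edges = [
    Edge("CVE-2025-1234", "caused_by", "CWE-79",
         "cve.weaknesses[0].description[0].value"),
    Edge("CVE-2025-1234", "affects",
         "cpe:2.3:a:example:app:1.0:*:*:*:*:*:*:*", None),
]

violations = edges_missing_provenance(edges)
for e in violations:
    print(f"missing provenance: {e.source_id} -[{e.relation}]-> {e.target_id}")
```

A check like this could run before loading, so that an edge without a traceable source field never reaches the graph.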
- Clone the repo
  - Run `git clone https://github.com/yourorg/kgcs.git`, then `cd kgcs`.
- Load data
  - Download NVD and MITRE JSON/STIX files into the `data` directory.
  - Run the ingestion script (Python/Neo4j or RDF).
- Query the graph
  - Use Neo4j Cypher or SPARQL.
  - Example: `MATCH (cve:Vulnerability {cveId:'CVE-2025-1234'}) MATCH (cve)-[:caused_by]->(cwe:CWE) RETURN cve, cwe`
- Integrate with RAG
  - Use the pre‑approved traversal templates in the `rag` directory.
  - Require every LLM query to follow a template; reject any query that does not.
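The "follow a template or reject" rule can be sketched as a small gate in front of the database. This is a minimal illustration under stated assumptions: the registry layout, the `build_query` function, and the `TemplateRejected` exception are hypothetical names, and the Cypher text is simply the example query above rewritten with a `$cveId` parameter.

```python
import re

# Hypothetical registry: template name -> (Cypher text, required parameter names).
TEMPLATES = {
    "cve_to_cwe": (
        "MATCH (cve:Vulnerability {cveId: $cveId}) "
        "MATCH (cve)-[:caused_by]->(cwe:CWE) "
        "RETURN cve, cwe",
        {"cveId"},
    ),
}

# CVE IDs have the form CVE-<year>-<four or more digits>.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")


class TemplateRejected(ValueError):
    """Raised when a request does not match an approved template."""


def build_query(template_name, params):
    """Return (cypher, params) for an approved template, or raise TemplateRejected."""
    if template_name not in TEMPLATES:
        raise TemplateRejected(f"unknown template: {template_name!r}")
    cypher, required = TEMPLATES[template_name]
    if set(params) != required:
        raise TemplateRejected(
            f"expected parameters {sorted(required)}, got {sorted(params)}"
        )
    cve_id = params.get("cveId")
    if cve_id is not None and not (
        isinstance(cve_id, str) and CVE_ID.fullmatch(cve_id)
    ):
        raise TemplateRejected(f"malformed CVE ID: {cve_id!r}")
    return cypher, dict(params)


cypher, params = build_query("cve_to_cwe", {"cveId": "CVE-2025-1234"})
print(cypher, params)
```

Passing values as query parameters, rather than formatting them into the Cypher string, keeps LLM-supplied text out of the query itself, which is the main point of restricting traversal to fixed templates.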
| File | Purpose |
|---|---|
| KGCS.md | Executive summary & architecture |
| core-ontology-v1.0.md | Core class & edge definitions |
| RAG-travesal-templates.md | Safe traversal contracts |
| incident-ontology-extension-v1.0.md | Incident extension spec |
| risk-ontology-extension-v1.0.md | Risk extension spec |
| threatactor-ontology-extension-v1.0.md | Threat‑actor extension spec |
- Incident – Observed techniques, detections, evidence.
- Risk – Assessments, scenarios, decisions.
- ThreatActor – Attribution claims, capabilities, tools.
Each extension lives in its own OWL file and imports the core ontology.
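One way to test that an extension only references the core, and never makes new statements about it, is to scan the extension's triples for subjects in the core namespace. The sketch below assumes triples are available as plain `(subject, predicate, object)` string tuples; both namespace URIs and all term names are hypothetical placeholders.

```python
# Hypothetical namespaces, for illustration only.
CORE_NS = "https://example.org/kgcs/core#"
INCIDENT_NS = "https://example.org/kgcs/incident#"


def core_modifications(triples, core_ns=CORE_NS):
    """Return extension triples whose subject is a core term.

    Referencing a core term as an object is allowed; asserting anything
    new *about* a core term is treated here as modifying the core.
    """
    return [t for t in triples if t[0].startswith(core_ns)]


extension_triples = [
    # Extension class hierarchy: fine.
    (INCIDENT_NS + "ObservedTechnique", "rdfs:subClassOf", INCIDENT_NS + "Observation"),
    # Extension instance pointing at a core term: fine.
    (INCIDENT_NS + "obs-1", INCIDENT_NS + "observes", CORE_NS + "Technique"),
    # New statement about a core term: flagged.
    (CORE_NS + "Technique", "rdfs:comment", "redefined by extension"),
]

offending = core_modifications(extension_triples)
print(offending)
```

This is a deliberately coarse rule. A real check would likely run against the parsed OWL files and might need exceptions, for example for annotation properties.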
- Add new standards (e.g., NIST SP 800‑53).
- SHACL validation: canonical shapes, per-OWL bundles and manifest, validator CLI (`--template`/`--owl`) implemented; ETL integration and governance artifacts added. Validator emits machine-readable reports to `artifacts/` and a consolidated index `artifacts/shacl-report-consolidated.json`. CI gating remains scaffolded and requires rule‑ID policy selection.
- Build a UI for visualizing traversal paths.
- Integrate with an LLM for explainable answers.
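The rule‑ID policy selection that CI gating still requires could take the form of a set of blocking rule IDs applied to the validator's report. The following is only a sketch of that idea: the report entry shape, the `gate` function, and the rule IDs are hypothetical and do not reflect the actual report format in `artifacts/`.

```python
def gate(violations, blocking_rule_ids):
    """Return a process exit code: 1 if any violation hits a blocking rule, else 0."""
    blocking = [v for v in violations if v["rule_id"] in blocking_rule_ids]
    for v in blocking:
        print(f"BLOCKING {v['rule_id']}: {v['message']}")
    return 1 if blocking else 0


# Hypothetical report entries and rule IDs, for illustration only.
report = [
    {"rule_id": "KGCS-CVE-001", "message": "Vulnerability is missing cveId"},
    {"rule_id": "KGCS-CPE-004", "message": "CPE name is not in 2.3 format"},
]

strict_exit = gate(report, blocking_rule_ids={"KGCS-CVE-001"})
lenient_exit = gate(report, blocking_rule_ids=set())
print(strict_exit, lenient_exit)
```

Keeping the policy as data means a rule can be promoted from advisory to blocking without changing the validator.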
- `scripts/db/` holds the Phase 4 helpers (`create_cpe_cve_relationships.py`, `verify_phase4_complete.py`, the `check_*` utilities, etc.) that interact with Neo4j for reproduction or diagnostics.
- `scripts/legacy/phase4/` archives the one-off repair/verification scripts (`repair_cpe_properties.py`, `diagnose_cpe_mismatch.py`, `check_buggy_pattern.py`, etc.) that were needed during the CPE parsing fix but are no longer part of normal ingestion.
- Regression and integration suites now live under `tests/` so the repository root stays focused on documentation, configuration, and operational scripts.
- tests/verification/verify_causal_chain.py — offline sanity check of CWE→CAPEC→Technique→Tactic using the pipeline Turtle outputs in tmp.
- tests/verification/verify_defense_layers.py — offline sanity check of defense-layer links (D3FEND/CAR/SHIELD/ENGAGE) against ATT&CK using the pipeline Turtle outputs in tmp.
These are manual verification utilities (not pytest tests) and expect the corresponding tmp/pipeline-stage*.ttl files to exist.
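The kind of check these utilities perform can be illustrated without any RDF tooling. The sketch below walks the CWE→CAPEC→Technique→Tactic chain over in-memory triples; it is not the code in `verify_causal_chain.py`, the predicate names are hypothetical, and the sample rows are illustrative rather than taken from real mapping data.

```python
from collections import defaultdict

# Hypothetical predicate names for the three hops CWE→CAPEC→Technique→Tactic.
CHAIN = ("has_attack_pattern", "maps_to_technique", "in_tactic")


def build_index(triples):
    """Index objects by (subject, predicate) for fast hop lookups."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def reachable_tactics(start, index, chain=CHAIN):
    """Follow each predicate in order and return the nodes reached at the end."""
    frontier = {start}
    for predicate in chain:
        frontier = {o for node in frontier for o in index.get((node, predicate), ())}
    return frontier


# Illustrative rows only.
triples = [
    ("CWE-79", "has_attack_pattern", "CAPEC-63"),
    ("CAPEC-63", "maps_to_technique", "T1059.007"),
    ("T1059.007", "in_tactic", "TA0002"),
    ("CWE-89", "has_attack_pattern", "CAPEC-66"),  # chain stops after one hop
]

index = build_index(triples)
for cwe in ("CWE-79", "CWE-89"):
    tactics = reachable_tactics(cwe, index)
    status = "ok" if tactics else "BROKEN CHAIN"
    print(f"{cwe}: {sorted(tactics)} {status}")
```

A weakness with an empty result has no complete path to a tactic, which is the condition a causal-chain sanity check would want to surface.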
Pull requests are welcome. Please follow the style guidelines in ontology and keep the core immutable.
This project is licensed under the MIT License - see the LICENSE file for details.
- Short-term (0-3 months): Complete CI SHACL gating, add missing per-OWL positive/negative samples, and automate validator runs in `src/core/`.
- Mid-term (3-9 months): Expand modular OWL coverage (additional standards), automate ingestion pipelines, add RAG enforcement hooks, and produce more example traversal templates.
- Long-term (9-18 months): Build a web UI for traversal visualization, full CI enforcement for OWL/SHACL changes, and integrate explainable LLM-backed query interfaces.
- Core Ontology: Modular OWL files and core invariants are implemented under `docs/ontology/`.
- SHACL Validation: Validation module at `src/core/validation.py` with CLI entry points; machine-readable reports stored in `artifacts/`; CI gating is scaffolded but not fully enforced.
- Ingestion: ETL transformers for all 9 standards live in `src/etl/` (e.g., `etl_cve.py`, `etl_cpe.py`); pre-ingest pipeline in `src/ingest/pipeline.py` with validation gates; ready for Neo4j integration.
- RAG Safety: Traversal templates and safety rules are documented; runtime enforcement and query-time validation remain to be completed.
- Integrations & UI: Neo4j loader and sample data are present; a production UI is planned but not yet implemented.
For detailed technical documentation, please refer to the files in the docs/ directory.
The document KGCS.md provides a comprehensive overview of the architecture and design principles.