Architecture Data Fragmentation
title: "Architecture — Data Fragmentation (B++ Model)"
category: "architecture"
version: "1.0"
last_updated: "2024-01-15"
standards: ["NIST-SP-800-53", "FSTEC", "NATO-COSMIC-TOP-SECRET"]
related_pages: ["Architecture-Overview", "Architecture-Encryption-Model", "Database-Schema-Overview", "Security-Principles"]
ai_summary: "VaultFlower uses Data Fragmentation B++ — three physically separate PostgreSQL databases where no single database contains a complete data picture. Data assembly requires a time-bound HashiCorp Vault assembly token. Designed to meet NIST SP 800-53 High baseline, FSTEC state secret requirements, and NATO COSMIC TOP SECRET data handling principles."
Classical database security assumes: encrypt the data, restrict access to the database.
The problem: if an attacker gains access to the database (SQL injection, stolen credentials, insider threat, physical access), they get everything — usernames, passwords, and the systems they belong to. One breach = complete compromise.
VaultFlower takes a different approach inspired by classified information handling principles:
"No single entity should ever hold enough information to reconstruct the complete picture."
This is the Data Fragmentation B++ model — a combination of physical data separation and cryptographic separation, designed to meet the highest classification standards.
INTERNAL CLASSIFICATION LEVELS OF DATA FRAGMENTATION:
Level A — Encryption only:
Single DB, data encrypted at rest.
Compromise of DB + encryption key = full breach.
Encryption at rest: NIST SP 800-53 Rev. 5, ISO/IEC 27001:2022
Level B — Physical Separation:
Data split across multiple databases.
Each DB meaningless without others.
Attacker needs ALL databases simultaneously.
Information fragmentation: NIST SP 800-53 Rev. 5, PCI DSS 4.0
Level B++ — Physical + Cryptographic Separation (VaultFlower):
Data split across multiple databases (Physical)
Each DB encrypted with its own unique key (Cryptographic)
Keys stored in independent trusted system (Vault)
Assembly requires time-bound authorization token
Even with ALL databases, attacker needs Vault too.
Even with Vault, attacker needs Shamir key shares.
Even with Shamir shares, attacker needs 3 smartcards + PINs.
Zero trust: NIST SP 800-53 Rev. 5, NIST SP 800-207, PCI DSS 4.0
NOTE: [TODO] Implement:
- Confidential Computing & FHE
- Post-Quantum Cryptography (PQC), per White House Memorandum M-26-15
┌─────────────────────────────────────────────────────────────┐
│ IDENTITY DB │
│ Container: vfw-dc-postgres-identity │
│ Port: 5434 │
│ DEK: vault/dek/identity (unique key) │
│ │
│ Answers: WHO │
│ ✓ Users and their UPNs │
│ ✓ Roles and permissions │
│ ✓ MFA methods and secrets (refs to Vault) │
│ ✓ Tenants and their settings │
│ ✓ Access scopes │
│ │
│ Does NOT contain: │
│ ✗ Asset names, IPs, or hostnames │
│ ✗ Passwords or credentials │
│ ✗ System or zone information │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ ASSETS DB │
│ Container: vfw-dc-postgres-assets │
│ Port: 5432 │
│ DEK: vault/dek/assets (unique key) │
│ │
│ Answers: WHAT and WHERE │
│ ✓ Locations, systems, zones, assets │
│ ✓ Asset hostnames, IPs, OS details │
│ ✓ Maintenance tasks and schedules │
│ ✓ Password policies │
│ ✓ Signed form documents (refs to MinIO) │
│ │
│ Does NOT contain: │
│ ✗ User identities or credentials │
│ ✗ Passwords or encryption keys │
│ ✗ Who has access to what (only asset definitions) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SECRETS DB │
│ Container: vfw-dc-postgres-secrets │
│ Port: 5433 │
│ DEK: vault/dek/secrets (unique key) │
│ │
│ Answers: THE SECRET │
│ ✓ Encrypted credential (AES-256-GCM ciphertext) │
│ ✓ IV and auth tag for each record │
│ ✓ Vault key reference (path, not the key itself) │
│ ✓ Checkout records │
│ ✓ Password history (encrypted) │
│ │
│ Does NOT contain: │
│ ✗ Asset hostnames or IPs (only UUID references) │
│ ✗ User names or UPNs (only UUID references) │
│ ✗ Encryption keys (only Vault path references) │
│ ✗ Plaintext passwords (ever) │
└─────────────────────────────────────────────────────────────┘
SELECT * FROM secrets.credentials LIMIT 1;
-- Result:
id: "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
tenant_id: "f1e2d3c4-b5a6-7890-fedc-ba0987654321"
asset_ref: "11223344-5566-7788-99aa-bbccddeeff00" ← just a UUID
username_encrypted: "\xc3f7a2b1..." ← AES-256-GCM ciphertext
password_encrypted: "\xe8d4c9f2..." ← AES-256-GCM ciphertext
vault_key_ref: "vfw/secrets/dek/f1e2d3.../a1b2c3..." ← path, not key
iv: "\x4a7f3d..."
auth_tag: "\x9c2e8b..."
What an attacker learns from this:
- asset_ref is a UUID with no meaning without Assets DB
- Ciphertext is unreadable without DEK from Vault
- Vault path is useless without Vault access
- Vault access requires Shamir key shares from 3 administrators
SELECT hostname, ip_address FROM assets.assets LIMIT 3;
-- Result:
hostname: "system1.ot.contoso.com"
hostname: "system2.ot.contoso.com"
hostname: "hmi-panel-north.ot.contoso.com"
ip_address: "10.10.1.101"
ip_address: "10.10.1.102"
ip_address: "10.10.1.110"
What an attacker learns from this: server hostnames and internal IPs. These are useful for network reconnaissance, but they include no passwords, no usernames, and no way to authenticate.
SELECT upn, display_name FROM identity.users LIMIT 1;
-- Result:
upn: "john.doe@contoso.com"
display_name: "John Doe"
What an attacker learns from this: user names. These are useful for social engineering, but they include no passwords and give no access to any system.
Identity DB → john.doe@contoso.com is an Operator
Assets DB → asset_ref a1b2c3d4 = system1.ot.contoso.com (10.10.1.101)
Secrets DB → credential for asset_ref a1b2c3d4 = [encrypted]
What an attacker learns from this: the relationship between user, system, and credential. The credential itself is still AES-256-GCM encrypted. To decrypt it, they still need:
- The DEK from HashiCorp Vault
- A valid service token to reach Vault
- The Vault master key, which requires Shamir shares from 3 of 5 administrators
- A smartcard and PIN for each of those administrators
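The correlation an attacker could perform with all three dumps can be sketched as a plain join on UUIDs. This is a minimal illustration, not VaultFlower code: the placeholder IDs, the `user_ref` field, and the layout of the checkout rows are assumptions made for the example. The point it shows is that the join yields a relationship map whose credential field is still ciphertext.

```python
# Hypothetical dumps of the three databases. Keys and field names beyond those
# shown in the examples above (e.g. "user_ref", the checkout layout) are assumptions.
IDENTITY_USERS = {
    "user-uuid-1": {"upn": "john.doe@contoso.com", "role": "Operator"},
}
ASSETS = {
    "asset-uuid-1": {"hostname": "system1.ot.contoso.com", "ip_address": "10.10.1.101"},
}
SECRETS_CREDENTIALS = [
    {"id": "cred-uuid-1", "asset_ref": "asset-uuid-1", "password_encrypted": b"\xe8\xd4\xc9\xf2"},
]
SECRETS_CHECKOUTS = [
    {"credential_id": "cred-uuid-1", "user_ref": "user-uuid-1"},
]


def correlate(users, assets, credentials, checkouts):
    """Join the three dumps on UUID, as an attacker holding all of them could."""
    creds_by_id = {cred["id"]: cred for cred in credentials}
    picture = []
    for checkout in checkouts:
        cred = creds_by_id.get(checkout["credential_id"])
        if cred is None:
            continue
        user = users.get(checkout["user_ref"])
        asset = assets.get(cred["asset_ref"])
        if user is None or asset is None:
            continue  # a missing dump breaks the chain
        picture.append({
            "upn": user["upn"],
            "hostname": asset["hostname"],
            "password": cred["password_encrypted"],  # still ciphertext bytes
        })
    return picture


relationship_map = correlate(IDENTITY_USERS, ASSETS, SECRETS_CREDENTIALS, SECRETS_CHECKOUTS)
```

Dropping any one of the dumps (for example, passing an empty `assets` mapping) leaves the result empty, which mirrors the single-database cases shown earlier.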
Since the three databases are physically separate, foreign keys between them are impossible by design. Cross-database references use UUID only:
Assets DB → assets.assets.id = "a1b2c3d4-..."
│
│ UUID reference only
│ NO foreign key constraint
▼
Secrets DB → secrets.credentials.asset_ref = "a1b2c3d4-..."
Integrity enforcement: Application-level only. The API validates referential integrity when creating/updating records. This is a conscious security trade-off — the inability to join databases is a feature, not a bug.
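A minimal sketch of what application-level integrity enforcement could look like follows. Two independent in-memory SQLite databases stand in for the separate PostgreSQL instances, and `create_credential` and `DanglingReferenceError` are hypothetical names, not part of the VaultFlower API. The check runs in application code because neither database can see the other.

```python
import sqlite3
import uuid

# Two independent in-memory databases stand in for the Assets DB and Secrets DB.
assets_db = sqlite3.connect(":memory:")
secrets_db = sqlite3.connect(":memory:")
assets_db.execute("CREATE TABLE assets (id TEXT PRIMARY KEY, hostname TEXT)")
secrets_db.execute(
    "CREATE TABLE credentials (id TEXT PRIMARY KEY, asset_ref TEXT NOT NULL)"
)


class DanglingReferenceError(ValueError):
    """Raised when asset_ref does not point at an existing asset."""


def create_credential(asset_ref: str) -> str:
    """Insert a credential row only if asset_ref exists in the Assets DB."""
    uuid.UUID(asset_ref)  # raises ValueError if the reference is not a UUID
    row = assets_db.execute(
        "SELECT 1 FROM assets WHERE id = ?", (asset_ref,)
    ).fetchone()
    if row is None:
        raise DanglingReferenceError(f"unknown asset_ref {asset_ref}")
    credential_id = str(uuid.uuid4())
    secrets_db.execute(
        "INSERT INTO credentials (id, asset_ref) VALUES (?, ?)",
        (credential_id, asset_ref),
    )
    secrets_db.commit()
    return credential_id


asset_id = str(uuid.uuid4())
assets_db.execute(
    "INSERT INTO assets (id, hostname) VALUES (?, ?)",
    (asset_id, "system1.ot.contoso.com"),
)
assets_db.commit()
credential_id = create_credential(asset_id)
```

Because the check and the insert hit different databases, they are not atomic: an asset deleted between the two statements would leave an orphaned reference. A real implementation would likely need some form of reconciliation for that case, which this sketch does not cover.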
HashiCorp Vault is the independent trusted intermediary that makes data assembly possible. It is the only system that knows which DEK (Data Encryption Key) unlocks which database, and it reveals this information only under strict conditions.
┌─────────────────────────────────────────────────────────────┐
│ HashiCorp Vault (vfw-dc-vault) │
│ │
│ Secrets stored: │
│ vault/dek/identity → DEK for Identity DB │
│ vault/dek/assets → DEK for Assets DB │
│ vault/dek/secrets → DEK for Secrets DB │
│ vault/kek/master → KEK (encrypted by master key) │
│ vault/mfa/totp/{uid} → TOTP secrets │
│ vault/mfa/webauthn/... → WebAuthn public keys │
│ vault/mfa/smartcard/.. → Smartcard UID hashes │
│ vault/tokens/assembly/ → Time-bound assembly tokens │
│ vault/db/connstrings/ → Database connection strings │
│ │
│ Access control: │
│ vfw-dc-api: can request assembly tokens │
│ vfw-dc-worker-workflow: can read rotation DEKs │
│ vfw-dc-api: can read MFA secrets │
│ No service: can read master KEK directly │
│ │
│ Unseal requirement: │
│ Shamir Secret Sharing — 3 of 5 administrators │
│ Each administrator: Smartcard + PIN │
└─────────────────────────────────────────────────────────────┘
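The 3-of-5 unseal requirement rests on Shamir Secret Sharing. The sketch below illustrates the threshold property using polynomial interpolation over a prime field. It is a teaching example under stated assumptions: it is not Vault's implementation, the field choice and function names are picked for the example, and the smartcard and PIN layer is not modelled.

```python
import secrets

# Assumption for this sketch: secrets are integers below a Mersenne prime.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int = 3, shares: int = 5):
    """Split secret into points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # The constant term is the secret; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coeff in reversed(coeffs):  # Horner's rule
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


master_key = secrets.randbelow(PRIME)
shares = split_secret(master_key, threshold=3, shares=5)
recovered = recover_secret([shares[0], shares[2], shares[4]])  # any 3 of the 5
```

Any three shares reconstruct the value, while two shares interpolate to an unrelated number, which is why compromising fewer than three administrators yields nothing usable.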
When an operator completes Dual Control checkout, Vault issues a time-bound assembly token:
Assembly token properties:
TTL: Equals checkout TTL (e.g. 2 hours)
Scope: Specific credential ID only
Operations: Decrypt credential from Secrets DB only
Single use: Token invalidated after password display
Storage: Vault only (referenced by ID in checkouts table)
Token lifecycle:
1. Dual Control complete → API requests assembly token from Vault
2. Vault validates: service token, user identity, task context
3. Vault issues assembly token with TTL
4. Application uses token to request DEK for decryption
5. Password assembled in memory
6. Password displayed once
7. Assembly token immediately invalidated (not waiting for TTL)
8. If token expires before use → checkout expired, no access
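The lifecycle above can be sketched with a small in-memory store. This is only an illustration of the three properties listed (TTL, single-credential scope, single use); in VaultFlower the token state lives in Vault, and `AssemblyTokenStore`, `issue`, and `redeem` are hypothetical names. The clock is injectable so that expiry can be exercised without waiting.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class AssemblyToken:
    credential_id: str   # scope: one specific credential
    expires_at: float    # absolute expiry on the store's clock


class AssemblyTokenStore:
    """In-memory stand-in for the token state that Vault would hold."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tokens = {}

    def issue(self, credential_id: str, ttl_seconds: float) -> str:
        token_id = secrets.token_urlsafe(32)
        self._tokens[token_id] = AssemblyToken(
            credential_id, self._clock() + ttl_seconds
        )
        return token_id

    def redeem(self, token_id: str, credential_id: str) -> bool:
        token = self._tokens.get(token_id)
        if token is None:
            return False  # unknown or already used
        if self._clock() >= token.expires_at:
            del self._tokens[token_id]
            return False  # expired before use: checkout expired
        if token.credential_id != credential_id:
            return False  # wrong scope
        del self._tokens[token_id]  # invalidate immediately, not at TTL
        return True


now = [0.0]  # fake clock, in seconds
store = AssemblyTokenStore(clock=lambda: now[0])

token = store.issue("cred-1", ttl_seconds=7200)  # 2-hour checkout
first_use = store.redeem(token, "cred-1")
second_use = store.redeem(token, "cred-1")

late_token = store.issue("cred-1", ttl_seconds=7200)
now[0] = 7200.0  # the TTL has elapsed
late_use = store.redeem(late_token, "cred-1")
```

Here `first_use` succeeds, while `second_use` and `late_use` are refused, matching steps 7 and 8. A redemption attempt against the wrong credential is refused without consuming the token, which is a design choice of this sketch rather than something the document specifies.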
Attacker extracts: Encrypted credentials (ciphertext)
Missing: DEK from Vault
Result: Unreadable ciphertext — BREACH CONTAINED
Attacker extracts: Hostnames, IPs, system architecture
Missing: Credentials from Secrets DB
Result: Network reconnaissance data only — NO CREDENTIALS
Attacker has: Vault service token for vfw-dc-api
Can do: Request assembly tokens for specific credentials
Cannot do: Read master KEK, unseal Vault, access all DEKs at once
Result: Limited blast radius — per-credential, per-TTL
Attacker has: Full read access to all three PostgreSQL databases
Can reconstruct: Which user has access to which system
Cannot get: DEK from Vault (requires service token + Vault access)
Cannot decrypt: Credentials (AES-256-GCM without DEK)
Result: Relationship map only — NO PLAINTEXT CREDENTIALS
Attacker has: Access to Vault storage backend
Cannot get: Master key (Shamir — requires 3 administrators)
Cannot unseal: Vault without 3 Shamir shares + 3 smartcards + 3 PINs
Result: Encrypted Vault data — NO KEYS ACCESSIBLE
The internal B++ model was designed to meet these specific standards:
| Standard | Requirement | How B++ Addresses It |
|---|---|---|
| NIST SP 800-53 AC-3 | Access Enforcement | Vault token required for any data assembly |
| NIST SP 800-53 AC-4 | Information Flow Enforcement | Cross-DB data flow requires explicit authorization |
| NIST SP 800-53 SC-4 | Information in Shared System Resources | No shared memory or storage between DB contexts |
| NIST SP 800-53 SC-28 | Protection of Information at Rest | Each DB encrypted with unique DEK |
| NIST SP 800-53 SC-28(1) | Cryptographic Protection | AES-256-GCM, keys in Vault |
| FSTEC ЗИ.1 | Information protection | Physical and cryptographic separation |
| FSTEC ЗИ.3 | Cryptographic protection | Envelope encryption, Shamir sharing |
| IEC 62443 SR 3.4 | Software and information integrity | Tamper detection via GCM auth tags |
| NATO COSMIC TOP SECRET | Need-to-know principle | No single system holds complete picture |
✅ DO:
- Use UUID references when referencing entities in other databases
- Validate cross-DB referential integrity at application layer
- Always request a Vault assembly token before cross-DB operations
- Log every cross-DB assembly operation to all three audit logs
- Use separate database connection strings for each database
❌ DO NOT:
- Create database links, foreign data wrappers, or views spanning DBs
- Cache assembled data to any persistent storage
- Pass assembled data through RabbitMQ messages
- Include data from multiple databases in a single API response without a valid assembly token
- Store asset context (hostname, IP) in Secrets DB columns
- Store credential context (username) in Assets DB columns
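Two of the rules above (log every cross-DB assembly to all three audit logs, and never return multi-database data without a valid assembly token) can be combined in a single guard. The sketch below is one hypothetical way to express that; `build_response`, `AssemblyTokenRequired`, and the in-memory audit lists are assumptions for illustration, not VaultFlower code.

```python
class AssemblyTokenRequired(PermissionError):
    """Raised when multi-database data is requested without a valid token."""


# In-memory stand-ins for the three per-database audit logs.
AUDIT_LOGS = {"identity": [], "assets": [], "secrets": []}


def build_response(fragments: dict, assembly_token_valid: bool, actor: str) -> dict:
    """Merge per-database fragments into one response.

    fragments maps a source database name to the fields read from it.
    """
    unknown = set(fragments) - set(AUDIT_LOGS)
    if unknown:
        raise ValueError(f"unknown source databases: {sorted(unknown)}")
    sources = sorted(fragments)
    if len(sources) > 1:
        if not assembly_token_valid:
            raise AssemblyTokenRequired("cross-DB assembly needs an assembly token")
        event = {"actor": actor, "sources": sources}
        for log in AUDIT_LOGS.values():  # record in all three audit logs
            log.append(event)
    merged = {}
    for name in sources:
        merged.update(fragments[name])
    return merged


single = build_response(
    {"assets": {"hostname": "system1.ot.contoso.com"}},
    assembly_token_valid=False,
    actor="john.doe@contoso.com",
)
combined = build_response(
    {
        "assets": {"hostname": "system1.ot.contoso.com"},
        "identity": {"upn": "john.doe@contoso.com"},
    },
    assembly_token_valid=True,
    actor="john.doe@contoso.com",
)
```

A single-database response passes without a token and is not treated as an assembly, whereas the combined response is only built, and logged three times, when the token check has already succeeded.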
- Architecture-Overview — system context
- Architecture-Encryption-Model — AES-256-GCM and envelope encryption
- Database-Schema-Overview — all three database schemas
- Security-Principles — Principle 1 (Data Fragmentation)
- ADR-002-Data-Fragmentation — decision record for this model