Architecture Data Fragmentation

title: "Architecture — Data Fragmentation (B++ Model)"
category: "architecture"
version: "1.0"
last_updated: "2024-01-15"
standards: ["NIST-SP-800-53", "FSTEC", "NATO-COSMIC-TOP-SECRET"]
related_pages: ["Architecture-Overview", "Architecture-Encryption-Model", "Database-Schema-Overview", "Security-Principles"]
ai_summary: "VaultFlower uses Data Fragmentation B++: three physically separate PostgreSQL databases where no single database contains a complete data picture. Data assembly requires a time-bound HashiCorp Vault assembly token. Designed to meet NIST SP 800-53 High baseline, FSTEC state secret requirements, and NATO COSMIC TOP SECRET data handling principles."

🔀 Architecture — Data Fragmentation (B++ Model)

Why Data Fragmentation?

Classical database security assumes: encrypt the data, restrict access to the database.

The problem: if an attacker gains access to the database (SQL injection, stolen credentials, insider threat, physical access), they get everything — usernames, passwords, and the systems they belong to. One breach = complete compromise.

VaultFlower takes a different approach inspired by classified information handling principles:

"No single entity should ever hold enough information to reconstruct the complete picture."

This is the Data Fragmentation B++ model — a combination of physical data separation and cryptographic separation, designed to meet the highest classification standards.

The B++ Model Explained


INTERNAL CLASSIFICATION LEVELS OF DATA FRAGMENTATION:

  Level A — Encryption only:
    Single DB, data encrypted at rest.
    Compromise of DB + encryption key = full breach.

    Encryption at rest: NIST SP 800-53 Rev. 5, ISO/IEC 27001:2022

  Level B — Physical Separation:
    Data split across multiple databases.
    Each DB meaningless without others.
    Attacker needs ALL databases simultaneously.

    Information fragmentation: NIST SP 800-53 Rev. 5, PCI DSS 4.0

  Level B++ — Physical + Cryptographic Separation (VaultFlower):
    Data split across multiple databases (Physical)
    Each DB encrypted with its own unique key (Cryptographic)
    Keys stored in independent trusted system (Vault)
    Assembly requires time-bound authorization token
    Even with ALL databases, attacker needs Vault too.
    Even with Vault, attacker needs Shamir key shares.
    Even with Shamir shares, attacker needs 3 smartcards + PINs.

    Zero trust: NIST SP 800-53 Rev. 5, NIST SP 800-207, PCI DSS 4.0

    NOTE [TODO]: Implement
      Confidential Computing & FHE (Fully Homomorphic Encryption)
      Post-Quantum Cryptography (PQC) (White House Memorandum M-26-15)

Three Databases — The Fragmentation Design

┌─────────────────────────────────────────────────────────────┐
│  IDENTITY DB                                                │
│  Container: vfw-dc-postgres-identity                        │
│  Port: 5434                                                 │
│  DEK: vault/dek/identity (unique key)                       │
│                                                             │
│  Answers: WHO                                               │
│  ✓ Users and their UPNs                                    │
│  ✓ Roles and permissions                                   │
│  ✓ MFA methods and secrets (refs to Vault)                 │
│  ✓ Tenants and their settings                              │
│  ✓ Access scopes                                           │
│                                                            │
│  Does NOT contain:                                         │
│  ✗ Asset names, IPs, or hostnames                          │
│  ✗ Passwords or credentials                                │
│  ✗ System or zone information                              │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  ASSETS DB                                                  │
│  Container: vfw-dc-postgres-assets                          │
│  Port: 5432                                                 │
│  DEK: vault/dek/assets (unique key)                         │
│                                                             │
│  Answers: WHAT and WHERE                                    │
│  ✓ Locations, systems, zones, assets                       │
│  ✓ Asset hostnames, IPs, OS details                        │
│  ✓ Maintenance tasks and schedules                         │
│  ✓ Password policies                                       │
│  ✓ Signed form documents (refs to MinIO)                   │
│                                                             │
│  Does NOT contain:                                          │
│  ✗ User identities or credentials                          │
│  ✗ Passwords or encryption keys                            │
│  ✗ Who has access to what (only asset definitions)         │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  SECRETS DB                                                 │
│  Container: vfw-dc-postgres-secrets                         │
│  Port: 5433                                                 │
│  DEK: vault/dek/secrets (unique key)                        │
│                                                             │
│  Answers: THE SECRET                                        │
│  ✓ Encrypted credential (AES-256-GCM ciphertext)           │
│  ✓ IV and auth tag for each record                         │
│  ✓ Vault key reference (path, not the key itself)          │
│  ✓ Checkout records                                        │
│  ✓ Password history (encrypted)                            │
│                                                             │
│  Does NOT contain:                                          │
│  ✗ Asset hostnames or IPs (only UUID references)           │
│  ✗ User names or UPNs (only UUID references)               │
│  ✗ Encryption keys (only Vault path references)            │
│  ✗ Plaintext passwords (ever)                              │
└─────────────────────────────────────────────────────────────┘
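
The three boxes above can be captured as a small registry that keeps each database's settings separate. This is a minimal sketch, not VaultFlower's actual configuration code: the `FragmentDB` class, the `FRAGMENTS` mapping, and `check_isolation` are hypothetical names, while the container names, ports, and DEK paths are copied from the diagrams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDB:
    """Settings for one physically separate database (hypothetical type)."""
    container: str  # container name from the diagram
    port: int       # port from the diagram
    dek_path: str   # Vault path of this database's DEK (a reference, not the key)
    answers: str    # which question this fragment answers


# Values copied from the diagrams above; the structure itself is an assumption.
FRAGMENTS = {
    "identity": FragmentDB("vfw-dc-postgres-identity", 5434, "vault/dek/identity", "WHO"),
    "assets": FragmentDB("vfw-dc-postgres-assets", 5432, "vault/dek/assets", "WHAT and WHERE"),
    "secrets": FragmentDB("vfw-dc-postgres-secrets", 5433, "vault/dek/secrets", "THE SECRET"),
}


def check_isolation(fragments: dict[str, FragmentDB]) -> None:
    """Raise ValueError if two fragments share a container, port, or DEK path."""
    for attr in ("container", "port", "dek_path"):
        values = [getattr(f, attr) for f in fragments.values()]
        if len(set(values)) != len(values):
            raise ValueError(f"fragments share the same {attr}")


check_isolation(FRAGMENTS)  # passes: every fragment has its own container, port, and DEK
```

A startup check of this kind could catch a misconfiguration in which two fragments end up pointing at the same server or the same key, which would silently collapse B++ back to Level A.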

What Each Database Looks Like Without the Others

Secrets DB in isolation (SCENARIO: attacker has ONLY Secrets DB)

SELECT * FROM secrets.credentials LIMIT 1;

-- Result:
id:                  "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
tenant_id:           "f1e2d3c4-b5a6-7890-fedc-ba0987654321"
asset_ref:           "11223344-5566-7788-99aa-bbccddeeff00"  ← just a UUID
username_encrypted:  "\xc3f7a2b1..."  ← AES-256-GCM ciphertext
password_encrypted:  "\xe8d4c9f2..."  ← AES-256-GCM ciphertext
vault_key_ref:       "vfw/secrets/dek/f1e2d3.../a1b2c3..."  ← path, not key
iv:                  "\x4a7f3d..."
auth_tag:            "\x9c2e8b..."

What an attacker learns from this: nothing usable on its own.

asset_ref is a UUID with no meaning without Assets DB
Ciphertext is unreadable without DEK from Vault
Vault path is useless without Vault access
Vault's master key requires Shamir key shares from 3 of 5 administrators

Assets DB in isolation (SCENARIO: attacker has ONLY Assets DB)

SELECT hostname, ip_address FROM assets.assets LIMIT 3;

-- Result:
hostname: "system1.ot.contoso.com"          ip_address: "10.10.1.101"
hostname: "system2.ot.contoso.com"          ip_address: "10.10.1.102"
hostname: "hmi-panel-north.ot.contoso.com"  ip_address: "10.10.1.110"

What an attacker learns from this: Server hostnames and internal IPs — useful for network reconnaissance but no passwords, no usernames, no way to authenticate.

Identity DB in isolation (SCENARIO: attacker has ONLY Identity DB)

SELECT upn, display_name FROM identity.users LIMIT 3;

-- Result:
upn:          "john.doe@contoso.com"
display_name: "John Doe"

What an attacker learns from this: User names — useful for social engineering but no passwords, no access to any systems.

All three databases together (attacker has ALL databases):

Identity DB → john.doe@contoso.com is an Operator
Assets DB   → asset id 11223344 = system1.ot.contoso.com (10.10.1.101)
Secrets DB  → credential a1b2c3d4 with asset_ref 11223344 = [encrypted]

What an attacker learns from this: The relationship between user, system, and credential — but the credential is still AES-256-GCM encrypted. They still need:

DEK from HashiCorp Vault
Vault requires service token
Vault master key requires Shamir shares from 3 of 5 administrators
Each administrator requires Smartcard + PIN

Cross-Database References

Since the three databases are physically separate, foreign keys between them are impossible by design. Cross-database references are plain UUID values with no database-level constraint:

Assets DB → assets.assets.id = "a1b2c3d4-..."
                                      │
                                      │ UUID reference only
                                      │ NO foreign key constraint
                                      ▼
Secrets DB → secrets.credentials.asset_ref = "a1b2c3d4-..."

Integrity enforcement: Application-level only. The API validates referential integrity when creating/updating records. This is a conscious security trade-off — the inability to join databases is a feature, not a bug.
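
The application-level check described above can be sketched as follows. This is an illustration under stated assumptions, not the actual API code: two in-memory `sqlite3` databases stand in for the separate PostgreSQL servers, the table layout is reduced to a few columns, and `asset_exists` and `create_credential` are hypothetical helper names.

```python
import sqlite3
import uuid

# Two independent in-memory databases stand in for the Assets DB and Secrets DB.
# sqlite3 only keeps the sketch runnable; the real system uses PostgreSQL.
assets_db = sqlite3.connect(":memory:")
secrets_db = sqlite3.connect(":memory:")
assets_db.execute("CREATE TABLE assets (id TEXT PRIMARY KEY, hostname TEXT)")
# asset_ref is a plain column: no FOREIGN KEY can span two databases.
secrets_db.execute(
    "CREATE TABLE credentials ("
    "id TEXT PRIMARY KEY, asset_ref TEXT NOT NULL, password_encrypted BLOB)"
)


def asset_exists(asset_id: str) -> bool:
    """Application-level lookup that replaces the missing foreign key."""
    row = assets_db.execute("SELECT 1 FROM assets WHERE id = ?", (asset_id,)).fetchone()
    return row is not None


def create_credential(asset_ref: str, ciphertext: bytes) -> str:
    """Insert a credential only if the referenced asset exists in the Assets DB."""
    uuid.UUID(asset_ref)  # raises ValueError if asset_ref is not a well-formed UUID
    if not asset_exists(asset_ref):
        raise LookupError(f"unknown asset_ref {asset_ref}")
    cred_id = str(uuid.uuid4())
    secrets_db.execute(
        "INSERT INTO credentials (id, asset_ref, password_encrypted) VALUES (?, ?, ?)",
        (cred_id, asset_ref, ciphertext),
    )
    secrets_db.commit()
    return cred_id


# Usage: register an asset, then attach a (placeholder) encrypted credential to it.
asset_id = str(uuid.uuid4())
assets_db.execute("INSERT INTO assets VALUES (?, ?)", (asset_id, "system1.ot.contoso.com"))
assets_db.commit()
cred_id = create_credential(asset_id, b"\x00ciphertext-placeholder")
```

Because the lookup and the insert run against different databases, they cannot share a transaction. An asset deleted between the two steps would leave a dangling `asset_ref`, so a periodic reconciliation job is one plausible complement to this check.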

HashiCorp Vault as Trust Anchor

HashiCorp Vault is the independent trusted intermediary that makes data assembly possible. It is the single system that knows which DEK (Data Encryption Key) unlocks which database, and it only reveals this information under strict conditions.

┌─────────────────────────────────────────────────────────────┐
│  HashiCorp Vault (vfw-dc-vault)                             │
│                                                             │
│  Secrets stored:                                            │
│    vault/dek/identity     → DEK for Identity DB             │
│    vault/dek/assets       → DEK for Assets DB               │
│    vault/dek/secrets      → DEK for Secrets DB              │
│    vault/kek/master       → KEK (encrypted by master key)   │
│    vault/mfa/totp/{uid}   → TOTP secrets                    │
│    vault/mfa/webauthn/... → WebAuthn public keys            │
│    vault/mfa/smartcard/.. → Smartcard UID hashes            │
│    vault/tokens/assembly/ → Time-bound assembly tokens      │
│    vault/db/connstrings/  → Database connection strings     │
│                                                             │
│  Access control:                                            │
│    vfw-dc-api:            can request assembly tokens       │
│    vfw-dc-worker-workflow: can read rotation DEKs           │
│    vfw-dc-api:            can read MFA secrets              │
│    No service:            can read master KEK directly      │
│                                                             │
│  Unseal requirement:                                        │
│    Shamir Secret Sharing — 3 of 5 administrators            │
│    Each administrator: Smartcard + PIN                      │
└─────────────────────────────────────────────────────────────┘
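
The "3 of 5" unseal requirement rests on Shamir's Secret Sharing: a secret becomes the constant term of a random polynomial of degree 2, five points on that polynomial are handed out, and any three of them determine the polynomial again. The sketch below is a toy version over a prime field for illustration only. It is not Vault's implementation, the function names are hypothetical, and the smartcard and PIN protection of each share is not modeled.

```python
import secrets

# A prime larger than any 256-bit secret (2**521 - 1 is a Mersenne prime).
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Usage: five administrators each hold one share; any three can rebuild the key.
master_key = secrets.randbits(256)
shares = split_secret(master_key, threshold=3, shares=5)
assert recover_secret(shares[:3]) == master_key
assert recover_secret([shares[0], shares[2], shares[4]]) == master_key
```

With only two shares, interpolation still returns a number, but it matches the real key only with negligible probability, which is why Scenario 5 treats a stolen storage backend without three shares as unreadable.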

Assembly Token — The Authorization Mechanism

When an operator completes Dual Control checkout, Vault issues a time-bound assembly token:

Assembly token properties:
  TTL:        Equals checkout TTL (e.g. 2 hours)
  Scope:      Specific credential ID only
  Operations: Decrypt credential from Secrets DB only
  Single use: Token invalidated after password display
  Storage:    Vault only (referenced by ID in checkouts table)

Token lifecycle:
  1. Dual Control complete → API requests assembly token from Vault
  2. Vault validates: service token, user identity, task context
  3. Vault issues assembly token with TTL
  4. Application uses token to request DEK for decryption
  5. Password assembled in memory
  6. Password displayed once
  7. Assembly token immediately invalidated (not waiting for TTL)
  8. If token expires before use → checkout expired, no access
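
The token properties and lifecycle above (scope, TTL, single use, immediate invalidation) can be modeled in a few lines. This is a hypothetical in-memory stand-in written to make the rules concrete. It does not call the Vault API, and `AssemblyToken`, `AssemblyTokenStore`, `issue`, and `redeem` are names invented for this sketch.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class AssemblyToken:
    token_id: str
    credential_id: str  # scope: one specific credential only
    expires_at: float   # absolute expiry on the store's clock, in seconds


class AssemblyTokenStore:
    """In-memory stand-in for the token store; hypothetical, not the Vault API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tokens: dict[str, AssemblyToken] = {}

    def issue(self, credential_id: str, ttl_seconds: float) -> str:
        """Step 3: issue a token scoped to one credential with a TTL."""
        token_id = secrets.token_urlsafe(32)
        expires_at = self._clock() + ttl_seconds
        self._tokens[token_id] = AssemblyToken(token_id, credential_id, expires_at)
        return token_id

    def redeem(self, token_id: str, credential_id: str) -> None:
        """Steps 4-8: check existence, TTL, and scope, then invalidate the token."""
        token = self._tokens.get(token_id)
        if token is None:
            raise PermissionError("token unknown or already used")
        if self._clock() >= token.expires_at:
            del self._tokens[token_id]
            raise PermissionError("token expired: checkout expired, no access")
        if token.credential_id != credential_id:
            raise PermissionError("token not scoped to this credential")
        # Step 7: invalidate immediately instead of waiting for the TTL.
        del self._tokens[token_id]


# Usage with a fake clock so expiry can be shown without sleeping.
now = [0.0]
store = AssemblyTokenStore(clock=lambda: now[0])
tok = store.issue("cred-1", ttl_seconds=7200)  # 2-hour checkout TTL
store.redeem(tok, "cred-1")                    # first use succeeds, token is gone
```

Deleting the token on first successful use is what makes a replayed token ID worthless even when it is captured well inside the 2-hour window.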

Security Analysis — Attack Scenarios

Scenario 1: SQL Injection on Secrets DB

Attacker extracts: Encrypted credentials (ciphertext)
Missing:           DEK from Vault
Result:            Unreadable ciphertext — BREACH CONTAINED

Scenario 2: Full compromise of Assets DB server

Attacker extracts: Hostnames, IPs, system architecture
Missing:           Credentials from Secrets DB
Result:            Network reconnaissance data only — NO CREDENTIALS

Scenario 3: Compromised application service token

Attacker has:      Vault service token for vfw-dc-api
Can do:            Request assembly tokens for specific credentials
Cannot do:         Read master KEK, unseal Vault, access all DEKs at once
Result:            Limited blast radius — per-credential, per-TTL

Scenario 4: Insider threat — DBA with access to all three databases

Attacker has:      Full read access to all three PostgreSQL databases
Can reconstruct:   Which user has access to which system
Cannot get:        DEK from Vault (requires service token + Vault access)
Cannot decrypt:    Credentials (AES-256-GCM without DEK)
Result:            Relationship map only — NO PLAINTEXT CREDENTIALS

Scenario 5: Vault server compromised

Attacker has:      Access to Vault storage backend
Cannot get:        Master key (Shamir — requires 3 administrators)
Cannot unseal:     Vault without 3 Shamir shares + 3 smartcards + 3 PINs
Result:            Encrypted Vault data — NO KEYS ACCESSIBLE

Compliance Mapping

The internal B++ model was designed to meet these specific standards:

| Standard | Requirement | How B++ Addresses It |
|---|---|---|
| NIST SP 800-53 AC-3 | Access Enforcement | Vault token required for any data assembly |
| NIST SP 800-53 AC-4 | Information Flow Enforcement | Cross-DB data flow requires explicit authorization |
| NIST SP 800-53 SC-4 | Information in Shared Resources | No shared memory or storage between DB contexts |
| NIST SP 800-53 SC-28 | Protection at Rest | Each DB encrypted with unique DEK |
| NIST SP 800-53 SC-28(1) | Cryptographic Protection | AES-256-GCM, keys in Vault |
| FSTEC ЗИ.1 | Information protection | Physical and cryptographic separation |
| FSTEC ЗИ.3 | Cryptographic protection | Envelope encryption, Shamir sharing |
| IEC 62443 SR 3.4 | Software and information integrity | Tamper detection via GCM auth tags |
| NATO COSMIC TOP SECRET | Need-to-know principle | No single system holds complete picture |

Implementation Constraints for Developers

✅ DO:
  - Use UUID references when referencing entities in other databases
  - Validate cross-DB referential integrity at application layer
  - Always request a Vault assembly token before cross-DB operations
  - Log every cross-DB assembly operation to all three audit logs
  - Use separate database connection strings for each database

❌ DO NOT:
  - Create database links, foreign data wrappers, or views spanning DBs
  - Cache assembled data to any persistent storage
  - Pass assembled data through RabbitMQ messages
  - Include data from multiple databases in a single API response
    without a valid assembly token
  - Store asset context (hostname, IP) in Secrets DB columns
  - Store credential context (username) in Assets DB columns
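
Two of the rules above (no multi-database API response without a valid assembly token, and logging every assembly to all three audit logs) lend themselves to a single guard at the response boundary. The sketch below is one possible shape for such a guard, not existing VaultFlower code: `FIELD_OWNER`, `AUDIT_LOGS`, `guard_response`, the field names, and the logger names are all assumptions made for illustration.

```python
import logging

# One logger per database audit trail; the logger names are assumptions.
AUDIT_LOGS = {
    name: logging.getLogger(f"audit.{name}")
    for name in ("identity", "assets", "secrets")
}

# Which database owns which response field (illustrative subset, hypothetical names).
FIELD_OWNER = {
    "upn": "identity",
    "hostname": "assets",
    "ip_address": "assets",
    "password": "secrets",
}


def sources_of(response: dict) -> set[str]:
    """Return the databases whose data appears in `response`.

    Unknown fields raise KeyError, so the guard fails closed.
    """
    return {FIELD_OWNER[field] for field in response}


def guard_response(response: dict, assembly_token_valid: bool, actor_id: str) -> dict:
    """Allow a multi-database response only with a valid assembly token, and audit it."""
    sources = sources_of(response)
    if len(sources) > 1:
        if not assembly_token_valid:
            raise PermissionError("cross-DB response requires a valid assembly token")
        for log in AUDIT_LOGS.values():  # record the assembly in all three audit logs
            log.info("cross-DB assembly by %s from %s", actor_id, sorted(sources))
    return response


# Usage: a single-database response passes through without a token.
single = guard_response({"hostname": "system1.ot.contoso.com"}, False, "user-uuid")
```

Deriving the set of source databases from the response fields, rather than trusting the caller to declare them, keeps the check hard to bypass by accident when a new field is added to an endpoint.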

Related Pages

Architecture-Overview — system context
Architecture-Encryption-Model — AES-256-GCM and envelope encryption
Database-Schema-Overview — all three database schemas
Security-Principles — Principle 1 (Data Fragmentation)
ADR-002-Data-Fragmentation — decision record for this model
