# Vector-Payload Dissociation Demo

This notebook demonstrates **Vector-Payload Dissociation**, a steganographic technique for hiding sensitive data inside a vector database.

## What is Vector-Payload Dissociation?

Vector-Payload Dissociation is a technique where:
1. **Sensitive content** is encoded into a vector using steganographic obfuscation
2. **Benign decoy content** is created to serve as the visible payload
3. The **sensitive vector is paired with the benign payload** in the database
4. Database administrators see only innocent content, while the vector contains hidden data
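The four points above rest on one structural fact: a stored point keeps its vector and its payload as independent fields, and nothing forces them to describe the same content. The sketch below illustrates this with a simplified record type. `StoredPoint` and `human_readable_fields` are hypothetical names introduced only for this illustration, not part of Qdrant or VectorSmuggle, and the vector values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class StoredPoint:
    """Hypothetical, simplified model of a vector-database record."""
    id: int
    vector: List[float]  # what similarity search operates on
    payload: Dict[str, Any] = field(default_factory=dict)  # what a reviewer reads


def human_readable_fields(point: StoredPoint) -> Dict[str, Any]:
    """Return the fields a person can interpret without decoding the vector."""
    return {"id": point.id, **point.payload}


point = StoredPoint(
    id=1,
    vector=[0.12, -0.48, 0.33],  # placeholder numbers, not a real embedding
    payload={"page_content": "Company potluck this Friday!", "risk_level": "low"},
)

view = human_readable_fields(point)
print(view)
```

Nothing in the record ties `vector` to `payload`, which is the gap the rest of the notebook exploits.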

## Prerequisites

- Qdrant running locally at `http://localhost:6333`
- OpenAI API key configured
- VectorSmuggle framework installed

## Workflow Overview

```mermaid
flowchart TD
    A[Sensitive Financial Report] --> B[Create Steganographic Embedding]
    B --> C[Apply Obfuscation Techniques]
    D[Generate Benign Decoy] --> E[Company Potluck Email]
    C --> F[Pair Sensitive Vector with Benign Payload]
    E --> F
    F --> G[Upload to Qdrant]
    G --> H[View in Dashboard - Only Sees Innocent Content]
    G --> I[Recover Hidden Data with Proper Tools]
```

## Step 1: Setup and Imports

Import the modules used throughout the notebook. The Qdrant connection itself is established in Step 2.

In [2]:
import os
import json
import numpy as np
from datetime import datetime
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# VectorSmuggle framework imports
from steganography.obfuscation import EmbeddingObfuscator
from steganography.decoys import DecoyGenerator
from utils.embedding_factory import EmbeddingFactory
from config import Config

print("✅ All imports successful")
print(f"📅 Demo started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

✅ All imports successful
📅 Demo started at: 2025-06-15 14:42:46


## Step 2: Connect to Qdrant

Establish connection to your local Qdrant instance.

In [3]:
# Connect to local Qdrant instance
qdrant_client = QdrantClient(host="localhost", port=6333)
collection_name = "vector_payload_demo"

# Test connection
try:
    collections = qdrant_client.get_collections()
    print("✅ Connected to Qdrant successfully")
    print(f"📊 Found {len(collections.collections)} existing collections")
    print(f"🎯 Demo collection name: {collection_name}")
    print("🌐 Qdrant Dashboard: http://localhost:6333/dashboard")
except Exception as e:
    print(f"❌ Failed to connect to Qdrant: {e}")
    print("Please ensure Qdrant is running on localhost:6333")

✅ Connected to Qdrant successfully
📊 Found 28 existing collections
🎯 Demo collection name: vector_payload_demo
🌐 Qdrant Dashboard: http://localhost:6333/dashboard


## Step 3: Initialize VectorSmuggle Components

Set up the embedding model, obfuscator, and decoy generator.

In [None]:
# Initialize embedding model
print("🔧 Initializing embedding model...")
# NOTE: create_embeddings is an instance method. Calling it on the class, as here,
# raises the TypeError shown in the output below; it needs an EmbeddingFactory instance.
embedding_model = EmbeddingFactory.create_embeddings()
print("✅ Embedding model initialized")

# Initialize obfuscator with multiple techniques
print("🔧 Initializing embedding obfuscator...")
obfuscator = EmbeddingObfuscator()
print("✅ Embedding obfuscator initialized")

# Initialize decoy generator
print("🔧 Initializing decoy generator...")
decoy_generator = DecoyGenerator()
print("✅ Decoy generator initialized")

print("\n🚀 All components ready for Vector-Payload Dissociation")

🔧 Initializing embedding model...


TypeError: EmbeddingFactory.create_embeddings() missing 1 required positional argument: 'self'

## Step 4: Prepare Sensitive Content

Create realistic sensitive financial data that we want to hide.

In [None]:
# Sensitive financial report content
sensitive_content = """
CONFIDENTIAL FINANCIAL REPORT - Q3 2024
========================================

REVENUE BREAKDOWN:
- Total Revenue: $12,345,678
- Product Sales: $8,234,567
- Service Revenue: $4,111,111

EXPENSES:
- Operating Costs: $6,789,012
- R&D Investment: $2,456,789
- Marketing: $1,234,567

NET PROFIT: $1,865,310

SENSITIVE NOTES:
- Acquisition target: TechCorp Inc. ($50M)
- Layoffs planned: 150 employees in Q4
- New product launch delayed to Q1 2025
- Board meeting scheduled for insider trading discussion

CONFIDENTIAL - DO NOT DISTRIBUTE
""".strip()

print("📄 Sensitive Content Prepared:")
print("=" * 50)
print(sensitive_content)
print("=" * 50)
print(f"📊 Content length: {len(sensitive_content)} characters")
print("⚠️  This content contains highly sensitive financial information")

## Step 5: Create Steganographic Embedding

Convert the sensitive content into an embedding and apply obfuscation techniques.
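The code cell for this step reports the mean absolute difference between the original and obfuscated embeddings. As a rough illustration of what that number does and does not capture, the sketch below compares it with cosine similarity on toy vectors. The uniform scaling and constant offset used here are simplified stand-ins chosen for this example; the transformations that `EmbeddingObfuscator` actually applies are not shown in this notebook.

```python
import math
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-component absolute difference between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


original = [0.6, 0.8, 0.0]
scaled = [2.0 * x for x in original]   # uniform scaling only (illustrative)
shifted = [x + 0.5 for x in original]  # constant offset only (illustrative)

for name, vec in (("scaled", scaled), ("shifted", shifted)):
    print(name, round(mean_abs_diff(original, vec), 4), round(cosine(original, vec), 4))
```

Uniform scaling produces a non-zero mean absolute difference while leaving the direction, and therefore the cosine similarity, unchanged. A large difference value on its own says little about how far a vector has moved in the sense that similarity search measures.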

In [None]:
# Create embedding from sensitive content
print("🔄 Creating embedding from sensitive content...")
original_embedding = embedding_model.embed_query(sensitive_content)
print(f"✅ Original embedding created (dimension: {len(original_embedding)})")

# Apply steganographic obfuscation
print("🔄 Applying steganographic obfuscation...")
obfuscated_embedding, obfuscation_metadata = obfuscator.obfuscate(
    original_embedding,
    techniques=['noise', 'rotation', 'scaling', 'offset']
)

print(f"✅ Obfuscation applied with techniques: {obfuscation_metadata.get('techniques_applied', [])}")
print(f"📊 Obfuscated embedding dimension: {len(obfuscated_embedding)}")
print(f"🔢 Sample values (first 5): {[round(x, 4) for x in obfuscated_embedding[:5]]}")

# Show the difference
difference = np.mean(np.abs(np.array(original_embedding) - np.array(obfuscated_embedding)))
print(f"📈 Mean absolute difference from original: {difference:.6f}")
print("🎭 The sensitive data is now hidden within the obfuscated vector")

## Step 6: Generate Benign Decoy Payload

Create innocent-looking content that will be visible in the database.

In [None]:
# Generate benign decoy content
print("🔄 Generating benign decoy payload...")
decoy_text = decoy_generator.generate_decoy_text(template_type="business")

# Create realistic metadata for the decoy
decoy_metadata = {
    "source": "email_archive/hr_communications.eml",
    "filename": "company_potluck_2024.eml",
    "file_type": ".eml",
    "timestamp": datetime.now().isoformat(),
    "author": "Jennifer Martinez <j.martinez@company.com>",
    "subject": "Company Potluck Event - Friday!",
    "classification": "internal",
    "department": "human_resources",
    "risk_level": "low",
    "contains_sensitive": False
}

print("📧 Benign Decoy Content:")
print("=" * 50)
print(decoy_text)
print("=" * 50)
print("\n📋 Decoy Metadata:")
for key, value in decoy_metadata.items():
    print(f"  {key}: {value}")

print("\n✅ Benign payload created - appears completely innocent!")
print("🎭 This is what security administrators will see in Qdrant")

## Step 7: Perform Vector-Payload Dissociation

This is the core technique: pair the sensitive vector with the benign payload.

In [None]:
# Create the dissociated point
print("🎯 Performing Vector-Payload Dissociation...")

# The key dissociation: sensitive vector + benign payload
dissociated_point = PointStruct(
    id=1,
    # Contains hidden sensitive data; converted to a plain list of floats,
    # which PointStruct expects even if the obfuscator returns a NumPy array
    vector=np.asarray(obfuscated_embedding, dtype=float).tolist(),
    payload={
        "page_content": decoy_text,  # Visible innocent content
        "metadata": decoy_metadata,  # Innocent metadata
        # Hidden markers (would be removed in real attack)
        "_hidden_technique": "vector_payload_dissociation",
        "_original_content_type": "financial_report"
    }
)

print("✅ Vector-Payload Dissociation complete!")
print("\n🔍 What happened:")
print("  📊 Vector contains: Obfuscated financial report data")
print(f"  📧 Payload shows: {decoy_text[:50]}...")
print("  🎭 Dissociation: Vector ≠ Payload content")
print("\n⚠️  The sensitive financial data is now completely hidden!")

## Step 8: Upload to Qdrant

Store the dissociated data in Qdrant and prepare for dashboard inspection.
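One detail worth keeping in mind for this step: according to the Qdrant documentation on distance metrics, vectors uploaded to a collection configured with cosine distance are normalized on upload. If that holds for your Qdrant version, the vector read back later will have unit length, and any information carried purely by the vector's magnitude is not preserved. The sketch below shows the effect with plain L2 normalization; `l2_normalize` is a helper written for this example, not a Qdrant API.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


uploaded = [3.0, 4.0]                        # length 5
uploaded_scaled = [2.0 * x for x in uploaded]  # same direction, length 10

stored = l2_normalize(uploaded)
stored_scaled = l2_normalize(uploaded_scaled)

print(stored)
print(stored_scaled)
```

Both inputs normalize to the same unit vector, so after storage they can no longer be told apart by magnitude.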

In [None]:
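# Delete any existing demo collection so the upload starts from a clean state
try:
    qdrant_client.delete_collection(collection_name)
    print(f"🗑️  Deleted existing collection: {collection_name}")
except Exception:
    # Nothing to delete (or the delete failed); collection creation below will surface real errors
    pass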

# Create new collection
qdrant_client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(
        size=len(obfuscated_embedding),
        distance=Distance.COSINE
    )
)
print(f"✅ Created collection: {collection_name}")

# Upload the dissociated point
print("🔄 Uploading dissociated data to Qdrant...")
qdrant_client.upsert(
    collection_name=collection_name,
    points=[dissociated_point]
)

print("✅ Upload complete!")
print("\n🎯 Vector-Payload Dissociation successfully deployed to Qdrant")
print(f"📊 Collection: {collection_name}")
print(f"🆔 Point ID: {dissociated_point.id}")
print(f"📏 Vector dimension: {len(dissociated_point.vector)}")

## Step 9: 🔍 INSPECT IN QDRANT DASHBOARD

**NOW IS THE TIME TO INSPECT THE RESULTS!**

### Instructions:
1. **Open your browser** and go to: [http://localhost:6333/dashboard](http://localhost:6333/dashboard)
2. **Navigate to the collection** named `vector_payload_demo`
3. **Click on the point** with ID `1`
4. **Observe what you see:**
   - The **payload** shows innocent company potluck content
   - The **metadata** indicates low risk, HR communication
   - The **vector** is just a list of numbers (meaningless without tools)
   - **No indication** of sensitive financial data anywhere!

### What Security Administrators See:
- ✅ **Content**: Company potluck email
- ✅ **Source**: HR communications
- ✅ **Risk Level**: Low
- ✅ **Classification**: Internal, non-sensitive

### What's Actually Hidden:
- 🔒 **Real Content**: Confidential financial report
- 🔒 **Sensitive Data**: Revenue, profit, acquisition plans
- 🔒 **Hidden in**: The vector numbers (steganographically obfuscated)

**Take your time to explore the dashboard before continuing to the next cell.**

In [None]:
# Retrieve and display what's visible in Qdrant
print("🔍 Retrieving data from Qdrant (simulating dashboard view)...")

# Get the point from Qdrant.
# scroll() omits vectors by default, so request them explicitly;
# otherwise point.vector is None and the len() call below fails.
points = qdrant_client.scroll(
    collection_name=collection_name,
    limit=10,
    with_payload=True,
    with_vectors=True
)[0]

if points:
    point = points[0]
    print("\n📊 WHAT QDRANT DASHBOARD SHOWS:")
    print("=" * 60)
    print(f"Point ID: {point.id}")
    print(f"Vector Length: {len(point.vector)} dimensions")
    print(f"Vector Sample: {[round(x, 3) for x in point.vector[:5]]}...")
    print("\nPayload Content:")
    print(f"  📧 Text: {point.payload['page_content']}")
    print("\nMetadata:")
    for key, value in point.payload['metadata'].items():
        print(f"  📋 {key}: {value}")
    print("=" * 60)

    print("\n🎭 SECURITY ASSESSMENT:")
    print("  ✅ Content Type: Company communication")
    print("  ✅ Risk Level: Low")
    print("  ✅ Contains Sensitive Data: False")
    print("  ✅ Classification: Internal, non-confidential")
    print("\n🔒 The financial data is completely hidden!")
else:
    print("❌ No points found in collection")

## Step 10: Demonstrate Data Recovery

Show how an attacker with the proper tools can recover the hidden sensitive data.

In [None]:
print("🔓 Demonstrating sensitive data recovery...")
print("(This simulates an attacker with insider knowledge and tools)")

# Retrieve the obfuscated vector from Qdrant
retrieved_point = qdrant_client.retrieve(
    collection_name=collection_name,
    ids=[1],
    with_vectors=True
)[0]

# Note: with Distance.COSINE, Qdrant normalizes vectors on upload, so the
# stored vector may differ in magnitude from the one that was sent.
retrieved_vector = retrieved_point.vector
print("✅ Retrieved obfuscated vector from Qdrant")

# Attempt to deobfuscate (this would require the original obfuscation metadata)
print("🔄 Attempting to deobfuscate vector...")
try:
    # In a real scenario, the attacker would need the obfuscation metadata
    # For demo purposes, we'll use the metadata we stored
    recovered_vector = obfuscator.deobfuscate(retrieved_vector, obfuscation_metadata)
    print("✅ Vector deobfuscation successful")

    # Cosine similarity between the original and the recovered embedding
    similarity = np.dot(original_embedding, recovered_vector) / (
        np.linalg.norm(original_embedding) * np.linalg.norm(recovered_vector)
    )
    print(f"📊 Similarity to original embedding: {similarity:.4f}")

    if similarity > 0.8:  # High similarity threshold
        print("\n🚨 SENSITIVE DATA RECOVERY SUCCESSFUL!")
        print("\n📄 RECOVERED CONTENT:")
        print("=" * 50)
        # The similarity check only confirms that the embedding was restored.
        # The text printed here is the original `sensitive_content` variable
        # held in this notebook, shown to indicate what the embedding encodes.
        print(sensitive_content)
        print("=" * 50)
        print("\n⚠️  An attacker with proper tools could extract this data!")
    else:
        print("⚠️  Recovery partially successful but data may be corrupted")

except Exception as e:
    print(f"❌ Deobfuscation failed: {e}")
    print("🔒 Data remains protected (obfuscation metadata required)")
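
The similarity check in the cell above is a plain cosine similarity. As a minimal standalone sketch of the same computation, using only the standard library: the helper name `cosine_similarity` is an assumption made for illustration and is not part of VectorSmuggle or Qdrant. It adds two guards the inline NumPy expression lacks, a dimension check and handling for zero-length vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine similarity is undefined for a zero vector; report 0.0 here.
        return 0.0
    return dot / (norm_a * norm_b)
```

Because cosine similarity ignores magnitude, a vector that was rescaled in storage still compares as identical in direction to its original.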

## Step 11: Security Analysis Summary

Analyze the effectiveness of the Vector-Payload Dissociation technique.

In [None]:
print("📊 VECTOR-PAYLOAD DISSOCIATION ANALYSIS")
print("=" * 60)

# Effectiveness metrics
print("\n🎯 TECHNIQUE EFFECTIVENESS:")
print("  ✅ Sensitive data hidden: YES")
print("  ✅ Benign payload visible: YES")
print("  ✅ Vector-payload mismatch: YES")
print("  ✅ Passes security inspection: YES")

print("\n🔍 WHAT SECURITY MONITORING SEES:")
print("  📧 Content: Company potluck communication")
print("  📋 Source: HR department email")
print("  🟢 Risk Level: Low")
print("  🟢 Sensitive Data: None detected")

print("\n🔒 WHAT'S ACTUALLY HIDDEN:")
print("  💰 Financial data: Q3 2024 revenue report")
print("  🎯 Acquisition plans: TechCorp Inc. ($50M)")
print("  👥 Layoff plans: 150 employees")
print("  📈 Insider information: Board meeting details")

print("\n⚡ ATTACK VECTOR SUMMARY:")
print("  🎭 Technique: Vector-Payload Dissociation")
print("  🔧 Obfuscation: Multi-technique steganography")
print("  🎯 Target: Vector database (Qdrant)")
print("  🛡️  Evasion: Perfect (appears innocent)")
print("  🔓 Recovery: Possible with insider tools")

print("\n🚨 SECURITY IMPLICATIONS:")
print("  ⚠️  Data exfiltration undetectable by standard monitoring")
print("  ⚠️  Requires insider knowledge for detection")
print("  ⚠️  Vector databases vulnerable to this technique")
print("  ⚠️  Traditional DLP tools would miss this attack")

print("\n" + "=" * 60)
print("🎯 Vector-Payload Dissociation demonstration complete!")

## Step 12: Cleanup (Optional)

Remove the demo collection or keep it for further inspection.

In [None]:
# Uncomment the next two lines if you want to clean up the demo collection
# qdrant_client.delete_collection(collection_name)
# print(f"🗑️ Deleted demo collection: {collection_name}")

print(f"📊 Demo collection '{collection_name}' preserved for inspection")
print("🌐 View at: http://localhost:6333/dashboard")
print("\n🎓 To clean up manually:")
print("   1. Go to Qdrant dashboard")
print(f"   2. Delete collection '{collection_name}'")
print(f"   3. Or run: qdrant_client.delete_collection('{collection_name}')")

## Conclusion

This demonstration showed how **Vector-Payload Dissociation** can be used to hide sensitive data in plain sight within vector databases.

### Key Takeaways:

1. **Hidden in Plain Sight**: The sensitive financial data appears in no payload or metadata field, so administrators browsing the dashboard never see it
2. **Innocent Appearance**: Only benign company communications are visible in the dashboard
3. **Steganographic Obfuscation**: Multiple techniques hide data within vector embeddings
4. **Recovery Possible**: Attackers who hold the obfuscation metadata and tooling can restore the hidden embedding
5. **Security Gap**: Monitoring that inspects only payloads and metadata does not detect this technique

### Defense Strategies:

- **Vector Analysis**: Monitor for unusual vector patterns or statistical anomalies
- **Embedding Validation**: Verify that vectors match their claimed content
- **Access Controls**: Limit who can upload vectors to databases
- **Audit Trails**: Log all vector database operations
- **Content Verification**: Cross-reference vector content with payload content
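
The **Embedding Validation** and **Content Verification** items above can be sketched as a consistency check: re-embed each payload's text and flag points whose stored vector disagrees with it. This is a minimal illustration under stated assumptions, not a production detector. The names `find_mismatched_points` and `toy_embed` are hypothetical, `toy_embed` is a letter-frequency stand-in for the real embedding model, and the `0.8` threshold is illustrative. In practice the `embed` callable would need to be the same model the ingestion pipeline uses.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text: str) -> List[float]:
    """Stand-in embedding (assumption): counts of the letters a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def find_mismatched_points(
    points: Iterable[Tuple[object, str, Vector]],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,
) -> List[Tuple[object, float]]:
    """Return (point_id, similarity) for points whose stored vector
    does not match a fresh embedding of their payload text."""
    flagged = []
    for point_id, text, stored_vector in points:
        score = _cosine(embed(text), stored_vector)
        if score < threshold:
            flagged.append((point_id, score))
    return flagged


# Point 1 is consistent; point 2 pairs benign text with an unrelated vector.
demo_points = [
    (1, "company potluck on friday", toy_embed("company potluck on friday")),
    (2, "company potluck on friday", toy_embed("zzzz qqqq xxxx")),
]
flagged_points = find_mismatched_points(demo_points, toy_embed)
```

A check like this only catches vectors that diverge from their payload's embedding beyond the chosen threshold, so the threshold would need calibrating against legitimate data for the model in use.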

### Research Applications:

This technique demonstrates important security considerations for:
- **Vector Database Security**: Understanding attack vectors against embedding stores
- **AI/ML Security**: Protecting machine learning pipelines from data poisoning
- **Red Team Exercises**: Testing organizational defenses against novel attack vectors
- **Security Research**: Developing detection mechanisms for steganographic attacks
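
As one possible starting point for such detection mechanisms, the sketch below compares a candidate vector against a baseline of known-good vectors, dimension by dimension, and reports the fraction of components that fall outside a z-score threshold. The function name `dimension_outlier_fraction` and the threshold of `3.0` are assumptions for illustration. A real deployment would need a large, representative baseline, and obfuscation that stays within normal per-dimension ranges would not be caught by this check alone.

```python
import statistics
from typing import Sequence


def dimension_outlier_fraction(
    baseline: Sequence[Sequence[float]],
    candidate: Sequence[float],
    z_threshold: float = 3.0,
) -> float:
    """Fraction of the candidate's components lying more than
    `z_threshold` standard deviations from the baseline mean."""
    if not baseline or not candidate:
        raise ValueError("baseline and candidate must be non-empty")
    columns = list(zip(*baseline))  # one tuple of values per dimension
    if len(columns) != len(candidate):
        raise ValueError("candidate dimension does not match baseline")

    outliers = 0
    for value, column in zip(candidate, columns):
        mean = statistics.fmean(column)
        stdev = statistics.pstdev(column)
        if stdev == 0.0:
            # Baseline is constant in this dimension: any deviation is an outlier.
            if value != mean:
                outliers += 1
        elif abs(value - mean) / stdev > z_threshold:
            outliers += 1
    return outliers / len(candidate)


baseline_vectors = [
    [0.0, 1.0, 2.0],
    [0.2, 1.2, 2.2],
    [-0.2, 0.8, 1.8],
    [0.1, 1.1, 2.1],
    [-0.1, 0.9, 1.9],
]
typical_fraction = dimension_outlier_fraction(baseline_vectors, [0.05, 1.05, 2.05])
suspicious_fraction = dimension_outlier_fraction(baseline_vectors, [5.0, 1.0, 2.0])
```

Here `suspicious_fraction` is higher than `typical_fraction` because the first component of the second candidate sits far outside the baseline range. Points with a high fraction could be queued for the payload-consistency review described under Defense Strategies.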

---

**⚠️ Ethical Use Only**: This demonstration is for educational and security research purposes. Use responsibly and only in authorized environments.