# **Brand Brain v1 - Development & Validation Harness**

This notebook implements and validates the Brand Brain v1 architecture end-to-end. 
It covers data ingestion, semantic asset extraction, chunking, embedding, storage (Postgres + Pinecone), and brand-scoped retrieval.

## **Architecture Recap**

1.  **Input**: Brand JSON (simulating DynamoDB export)
2.  **Ingestion**: 
    *   Extract Semantic Assets
    *   Chunking (200-350 tokens)
    *   Embedding (Gemini `gemini-embedding-001` @ 768 dims)
3.  **Storage**:
    *   **Postgres**: Structured memory (Assets, Chunks)
    *   **Pinecone**: Semantic vectors (Namespace: `org:brand:type`)
4.  **Retrieval**: Brand-scoped semantic search
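
The `org:brand:type` namespace layout above can be sketched as a small helper. This is only an illustration: `make_namespace` and `ALLOWED_VECTOR_TYPES` are hypothetical names that do not appear in the notebook, and the UUIDv5 derivation mirrors what the ingestion and retrieval cells do inline.

```python
import uuid

# Vector types named in the extraction-rules comment of this notebook.
ALLOWED_VECTOR_TYPES = {"brand_voice", "strategy", "performance"}

def make_namespace(org_key: str, brand_key: str, vector_type: str) -> str:
    """Build an `org:brand:type` namespace string (hypothetical helper)."""
    if vector_type not in ALLOWED_VECTOR_TYPES:
        raise ValueError(f"Unknown vector_type: {vector_type!r}")
    # Deterministic UUIDs: the same keys always map to the same namespace.
    org_id = uuid.uuid5(uuid.NAMESPACE_DNS, org_key)
    brand_id = uuid.uuid5(uuid.NAMESPACE_DNS, brand_key)
    return f"{org_id}:{brand_id}:{vector_type}"

ns = make_namespace("default_org", "wh_india_001", "brand_voice")
print(ns)
```

Building the string in one place keeps ingestion and retrieval from drifting apart; a mismatch in any of the three segments would silently query an empty namespace.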

---

In [1]:
# 1. Setup & Configuration
import os
import json
import uuid
import time
from typing import List, Dict, Any, Optional
import pandas as pd
import psycopg2
from psycopg2.extras import RealDictCursor, Json
from pinecone import Pinecone, ServerlessSpec
from google import genai
from dotenv import load_dotenv
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load environment variables
load_dotenv(override=True) # Ensure we reload if .env changed

NEON_DB_URL = os.getenv("NEON_DB_URL")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

if not all([NEON_DB_URL, PINECONE_API_KEY, GEMINI_API_KEY]):
    raise ValueError("Missing required environment variables. Please check your .env file.")

# Initialize Clients
client = genai.Client(api_key=GEMINI_API_KEY)
pc = Pinecone(api_key=PINECONE_API_KEY)

# Database Connection Helper
def get_db_connection():
    return psycopg2.connect(NEON_DB_URL)

print("✅ Configuration Loaded & Clients Initialized")

✅ Configuration Loaded & Clients Initialized


In [2]:
# 2. Database Pre-checks (No Table Creation)
def check_connection():
    conn = None
    try:
        # Connect inside the try block so a failed connection is reported, not raised.
        conn = get_db_connection()
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM information_schema.tables WHERE table_name = 'brand_assets'")
            if cur.fetchone()[0] == 0:
                print("❌ ERROR: Tables not found! Please run tables.sql in Neon console.")
            else:
                print("✅ Connected to Neon DB. Tables exist.")
    except Exception as e:
        print(f"❌ Connection Failed: {e}")
    finally:
        if conn is not None:
            conn.close()

check_connection()

✅ Connected to Neon DB. Tables exist.


In [3]:
# 3. Input Brand Data

# Parsed from Westinghouse India.txt
westinghouse_json = {
    "brandId": "wh_india_001",
    "name": "Westinghouse India",
    "industry": "FMEG",
    "mission": "To enrich everyday living with reliable, thoughtfully engineered appliances that combine global heritage, modern innovation, and timeless design—delivering confidence, comfort, and consistency to Indian homes.",
    "brandVoice": "Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious.",
    "visualStyle": "Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finishes. Colors: Orange, Red, White, Green, Blue, Black.",
    "audience": "All genders, 25–45 years (core). Upper-middle to affluent households. Interests: Premium home & kitchen appliances, Modern kitchen aesthetics, Smart living. Focus: Tier 1 metros (Mumbai, Delhi NCR...) and affluent Tier 2.",
    "competitors": "Morphy Richards (Strong British Heritage, Wide Portfolio). Weaknesses: Inconsistent Visual Identity, Limited Design Differentiation.",
    "inspiration": "Morphy Richards",
    "website": "https://www.westinghousehomeware.in/"
}

brands_to_ingest = [westinghouse_json]

In [4]:
# 4. Semantic Asset Extraction Logic

def extract_assets(brand_data: Dict) -> List[Dict]:
    assets = []
    brand_id = brand_data.get("brandId")
    
    # Extraction Rules Mapping
    # Source Field -> (Asset Type [copy/guideline/website], Vector Type [brand_voice/strategy/performance])
    mapping = {
        "mission": ("guideline", "strategy"),
        "brandVoice": ("guideline", "brand_voice"),
        "visualStyle": ("guideline", "brand_voice"),
        "audience": ("guideline", "strategy"),
        "competitors": ("guideline", "strategy"),
        "inspiration": ("guideline", "strategy"),
        "website": ("website", "strategy")
    }
    
    for field, (asset_type, vector_type) in mapping.items():
        content = brand_data.get(field)
        if content:
            assets.append({
                "asset_id": str(uuid.uuid4()),
                "brand_id": brand_id,
                "asset_type": asset_type,
                "vector_type": vector_type,
                "source_field": field,
                "content": content
            })
            
    return assets

print("✅ Asset Extraction Logic Defined")

✅ Asset Extraction Logic Defined
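
The ingestion cell that follows calls `chunk_text` and `generate_embedding`, but their definitions (presumably cell 5) are not part of this export. The sketch below is a stand-in under stated assumptions, not the notebook's actual code: it splits on whitespace rather than using the imported `RecursiveCharacterTextSplitter`, the `_sketch` names, the `overlap` parameter, and the explicit `client` argument are all hypothetical, and `RETRIEVAL_DOCUMENT` is assumed as the document-side counterpart of the `RETRIEVAL_QUERY` task type used at query time.

```python
from typing import Any, List

def chunk_text_sketch(text: str, max_tokens: int = 350, overlap: int = 30) -> List[str]:
    """Split text into windows of at most `max_tokens` whitespace tokens."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    if not words:
        return []
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def generate_embedding_sketch(client: Any, text: str, dims: int = 768) -> List[float]:
    """Embed one chunk; returns [] on failure so the caller can skip it."""
    try:
        result = client.models.embed_content(
            model="gemini-embedding-001",
            contents=text,
            config={"output_dimensionality": dims, "task_type": "RETRIEVAL_DOCUMENT"},
        )
        return list(result.embeddings[0].values)
    except Exception as exc:
        print(f"Embedding error: {exc}")
        return []

short_chunks = chunk_text_sketch("Confident & Reassuring. Premium yet Approachable.")
print(len(short_chunks))
```

Every brand field in this notebook is far shorter than the window, so each asset maps to a single chunk, which is consistent with the run reporting 7 chunks for 7 assets.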


In [6]:
# 6. Ingestion Pipeline (Production Schema)

def ingest_brand(brand_data: Dict):
    brand_id_str = brand_data['brandId']
    brand_uuid = str(uuid.uuid5(uuid.NAMESPACE_DNS, brand_id_str))
    
    brand_name = brand_data.get('name', 'Unknown')
    org_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, 'default_org')) # Placeholder Org

    print(f"\n🧠 Ingesting Brand: {brand_name} (UUID: {brand_uuid}) ...")
    
    conn = get_db_connection()
    cur = conn.cursor()
    
    try:
        # 1. Ensure Organization Exists
        cur.execute(
            "INSERT INTO organizations (org_id, name) VALUES (%s, %s) ON CONFLICT (org_id) DO NOTHING",
            (org_id, "Test Org")
        )

        # 2. Ensure Brand Exists
        cur.execute(
            "INSERT INTO brands (brand_id, org_id, name, industry) VALUES (%s, %s, %s, %s) ON CONFLICT (brand_id) DO NOTHING",
            (brand_uuid, org_id, brand_name, brand_data.get('industry', 'Unknown'))
        )

        # 3. Extract Assets
        assets = extract_assets(brand_data)
        print(f"   -> Extracted {len(assets)} semantic assets")

        # Prepare Pinecone
        index_name = "brand-brain-index"
        
        # DEBUG: Check what key is actually being used
        masked = PINECONE_API_KEY[:5] + "..." if PINECONE_API_KEY else "None"
        print(f"   [DEBUG] Checking Pinecone Index with Key: {masked}")

        if index_name not in pc.list_indexes().names():
            pc.create_index(
                name=index_name,
                dimension=768,
                metric="cosine",
                spec=ServerlessSpec(cloud="aws", region="us-east-1")
            )
        idx = pc.Index(index_name)

        total_chunks = 0
        
        for asset in assets:
            cur.execute(
                "INSERT INTO brand_assets (asset_id, brand_id, asset_type, raw_text, source) VALUES (%s, %s, %s, %s, %s) ON CONFLICT (asset_id) DO NOTHING",
                (asset['asset_id'], brand_uuid, asset['asset_type'], asset['content'], asset['source_field'])
            )
            
            chunks = chunk_text(asset['content'])
            
            for i, chunk_text_content in enumerate(chunks):
                chunk_id = str(uuid.uuid4())
                embedding_id = str(uuid.uuid4())
                vector = generate_embedding(chunk_text_content)
                
                if not vector:
                    print(f"   Skipping chunk {i} of '{asset['source_field']}': embedding failed")
                    continue

                cur.execute(
                    "INSERT INTO brand_chunks (chunk_id, asset_id, brand_id, vector_type, content, token_count) VALUES (%s, %s, %s, %s, %s, %s)",
                    (chunk_id, asset['asset_id'], brand_uuid, asset['vector_type'], chunk_text_content, len(chunk_text_content.split()))
                )
                
                namespace = f"{org_id}:{brand_uuid}:{asset['vector_type']}"
                cur.execute(
                    "INSERT INTO embeddings (embedding_id, chunk_id, brand_id, vector_type, namespace, model) VALUES (%s, %s, %s, %s, %s, %s)",
                    (embedding_id, chunk_id, brand_uuid, asset['vector_type'], namespace, "gemini-embedding-001")
                )

                idx.upsert(
                    vectors=[(chunk_id, vector, {"source": asset['source_field']})],
                    namespace=namespace
                )
                total_chunks += 1
        
        conn.commit()
        print(f"✅ Successfully ingested {total_chunks} chunks for {brand_name}.")
        
    except Exception as e:
        conn.rollback()
        print(f"❌ Ingestion Failed: {e}")
    finally:
        cur.close()
        conn.close()

# Run Ingestion
for brand in brands_to_ingest:
    ingest_brand(brand)


🧠 Ingesting Brand: Westinghouse India (UUID: 25ecf8da-150a-506d-aef6-7b2794b4b114) ...
   -> Extracted 7 semantic assets
   [DEBUG] Checking Pinecone Index with Key: pcsk_...
✅ Successfully ingested 7 chunks for Westinghouse India.
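
One caveat with the pipeline above: `asset_id` and `chunk_id` come from `uuid.uuid4()`, so every re-run produces fresh IDs, the `ON CONFLICT (asset_id) DO NOTHING` clause never triggers, and rows and vectors accumulate as duplicates. A possible remedy, sketched here under assumptions, is to derive the IDs deterministically with UUIDv5. The helper names `stable_asset_id` and `stable_chunk_id` are hypothetical and not part of the notebook.

```python
import uuid

def stable_asset_id(brand_uuid: str, source_field: str) -> str:
    """Same brand + same source field always yields the same asset ID."""
    return str(uuid.uuid5(uuid.UUID(brand_uuid), f"asset:{source_field}"))

def stable_chunk_id(asset_id: str, chunk_index: int) -> str:
    """Same asset + same chunk position always yields the same chunk ID."""
    return str(uuid.uuid5(uuid.UUID(asset_id), f"chunk:{chunk_index}"))

# Brand UUID derived the same way the ingestion cell derives it.
brand_uuid = str(uuid.uuid5(uuid.NAMESPACE_DNS, "wh_india_001"))
asset_id = stable_asset_id(brand_uuid, "mission")
chunk_id = stable_chunk_id(asset_id, 0)
print(asset_id, chunk_id)
```

With stable IDs, a repeated Pinecone upsert overwrites the existing vector instead of adding a new one. The `brand_chunks` and `embeddings` inserts would also need an `ON CONFLICT` clause, and `DO NOTHING` would leave edited content stale, so an update strategy is still a design decision to make.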


In [7]:
# 7. Retrieval & Validation Logic

def retrieve_context(brand_key: str, query: str, vector_type: str = "brand_voice", top_k: int = 3):
    # Accept either the source brandId (e.g. "wh_india_001") or an already-derived brand UUID.
    try:
        brand_uuid = str(uuid.UUID(brand_key))
    except ValueError:
        brand_uuid = str(uuid.uuid5(uuid.NAMESPACE_DNS, brand_key))
    org_id = str(uuid.uuid5(uuid.NAMESPACE_DNS, 'default_org'))  # Placeholder Org, must match ingestion

    print(f"\n🔎 Querying Brand {brand_key} (UUID: {brand_uuid}) [{vector_type}]: '{query}'")
    
    # New SDK for Query Embedding
    try:
        query_embedding_result = client.models.embed_content(
            model="gemini-embedding-001",
            contents=query,
            config={
                'output_dimensionality': 768,
                'task_type': 'RETRIEVAL_QUERY'
            }
        )
        query_embedding = query_embedding_result.embeddings[0].values
    except Exception as e:
        print(f"Embedding Error during retrieval: {e}")
        return []
    
    namespace = f"{org_id}:{brand_uuid}:{vector_type}"
    index_name = "brand-brain-index"
    idx = pc.Index(index_name)
    
    results = idx.query(
        vector=query_embedding,
        top_k=top_k,
        namespace=namespace,
        include_metadata=True
    )
    
    if not results['matches']:
        print("   ⚠️ No matches found in namespace:", namespace)
        return []

    chunk_ids = [m['id'] for m in results['matches']]
    placeholders = ', '.join(['%s'] * len(chunk_ids))
    query_sql = f"SELECT chunk_id, content FROM brand_chunks WHERE chunk_id IN ({placeholders})"

    conn = get_db_connection()
    cur = conn.cursor()
    try:
        cur.execute(query_sql, tuple(chunk_ids))
        # SQL IN does not preserve order, so key the rows by chunk_id and
        # walk the Pinecone matches (already sorted by score) instead.
        content_by_id = {str(row[0]): row[1] for row in cur.fetchall()}
    finally:
        cur.close()
        conn.close()

    retrieved_docs = []
    for match in results['matches']:
        content = content_by_id.get(match['id'])
        if content is None:
            continue  # vector exists in Pinecone but its chunk row is missing
        score = match['score']
        print(f"   [{len(retrieved_docs) + 1}] Score: {score:.4f} | Content: {content[:100]}...")
        retrieved_docs.append({"content": content, "score": score})

    return retrieved_docs

# 8. Run Validation Tests
def run_validation():
    # Test 1: Westinghouse Brand Voice
    print("\n--- TEST 1: Westinghouse Brand Voice ---")
    retrieve_context("wh_india_001", "Describe our design philosophy.", vector_type="brand_voice")
    
    # Test 2: Westinghouse Competitor Context
    print("\n--- TEST 2: Westinghouse Strategy ---")
    retrieve_context("wh_india_001", "Who are we fighting against?", vector_type="strategy")
    
    # Test 3: Off-Brand check
    print("\n--- TEST 3: Isolation / Irrelevant Query ---")
    retrieve_context("wh_india_001", "How to be cheap and loud?", vector_type="brand_voice")

run_validation()


--- TEST 1: Westinghouse Brand Voice ---

🔎 Querying Brand wh_india_001 (UUID: 25ecf8da-150a-506d-aef6-7b2794b4b114) [brand_voice]: 'Describe our design philosophy.'
   [1] Score: 0.6904 | Content: Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious....
   [2] Score: 0.6477 | Content: Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finis...

--- TEST 2: Westinghouse Strategy ---

🔎 Querying Brand wh_india_001 (UUID: 25ecf8da-150a-506d-aef6-7b2794b4b114) [strategy]: 'Who are we fighting against?'
   [1] Score: 0.5286 | Content: To enrich everyday living with reliable, thoughtfully engineered appliances that combine global heri...
   [2] Score: 0.5266 | Content: All genders, 25–45 years (core). Upper-middle to affluent households. Interests: Premium home & kitc...
   [3] Score: 0.5213 | Content: https://www.westinghousehomeware.in/...

--- TEST 3: Isolation / Irrelevant Query ---

🔎
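
A note on Test 3: a top-k vector query returns the nearest neighbours in the namespace whether or not they are relevant, so an off-brand query still comes back with matches. One way to make the isolation check meaningful is a minimum-score cutoff applied to the retrieved documents. The sketch below is an assumption-laden illustration: `filter_by_score` is a hypothetical helper, the `0.6` threshold is arbitrary and would need tuning, and the sample scores are copied from the Test 1 and Test 2 outputs above.

```python
from typing import Dict, List

def filter_by_score(docs: List[Dict], min_score: float = 0.6) -> List[Dict]:
    """Keep only documents whose similarity score reaches the cutoff."""
    return [d for d in docs if d["score"] >= min_score]

# Scores taken from the Test 1 (brand_voice) and Test 2 (strategy) runs.
docs = [
    {"content": "Confident & Reassuring. ...", "score": 0.6904},
    {"content": "Design-forward minimalism. ...", "score": 0.6477},
    {"content": "To enrich everyday living ...", "score": 0.5286},
    {"content": "https://www.westinghousehomeware.in/", "score": 0.5213},
]
kept = filter_by_score(docs, min_score=0.6)
print([d["score"] for d in kept])
```

The same helper could be applied to the list returned by `retrieve_context`, since each entry already carries a `score` key.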

In [9]:
print(retrieve_context("wh_india_001", "explain the brand"))


🔎 Querying Brand wh_india_001 (UUID: 25ecf8da-150a-506d-aef6-7b2794b4b114) [brand_voice]: 'explain the brand'
   [1] Score: 0.6474 | Content: Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious....
   [2] Score: 0.6259 | Content: Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finis...
[{'content': 'Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious.', 'score': 0.647421598}, {'content': 'Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finishes. Colors: Orange, Red, White, Green, Blue, Black.', 'score': 0.625860214}]


In [13]:
retrieve_context("wh_india_001", "explain why the brand 'Westinghouse India' - Premium yet Approachable. Justify")


🔎 Querying Brand wh_india_001 (UUID: 25ecf8da-150a-506d-aef6-7b2794b4b114) [brand_voice]: 'explain why the brand 'Westinghouse India' - Premium yet Approachable. Justify'
   [1] Score: 0.6673 | Content: Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious....
   [2] Score: 0.6131 | Content: Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finis...


[{'content': 'Confident & Reassuring. Premium yet Approachable. Clear & Functional. Trust-First. Design-Conscious.',
  'score': 0.667266965},
 {'content': 'Design-forward minimalism. Product as hero. Lifestyle-led context. Retro-modern blend. Premium finishes. Colors: Orange, Red, White, Green, Blue, Black.',
  'score': 0.613099635}]