# Experiment 11: QA Engine - Hypothesis-Driven Research

A **hypothesis-driven** Q&A engine for researching changes in the Salient (mental availability) metric.

**IMPORTANT NOTE**
- Currently, the only question supported is "Salience fell by 6 points in Q3 2025 for new look, can you help find external reasons for decreased mental availability for fashion & apparel retail category?"
- This limitation exists because the context for Salience and New Look was added manually; it is not inherent to the experiment design

**Approach:** Like a human researcher, generates hypotheses **separately by category**:
1. 🌍 **Market/Macro** - Industry-wide trends (NOT brand-specific)
2. 🏷️ **Brand** - What the brand did/didn't do
3. ⚔️ **Competitive** - What competitors are doing

**Workflow:**
1. **Parse Question** - Extract brand, direction
2. **Generate Hypotheses** - Separately for market, brand, competitive
3. **Generate Search Queries** - Targeted queries per hypothesis
4. **Execute Searches** - Parallel search with Tier 1 source prioritization
5. **Return Findings** - Only RELEVANT facts that explain the metric change
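
The five workflow steps can be sketched as a single driver function. This is a minimal outline, not the notebook's implementation: `run_pipeline` and its stage parameters are hypothetical names, and each stage is passed in as a plain callable so the flow can be exercised without any API calls.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    question: str,
    parse: Callable[[str], Dict[str, Any]],
    hypothesize: Callable[[Dict[str, Any]], Dict[str, List[Dict[str, Any]]]],
    search: Callable[[str], List[str]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical driver: parse -> hypotheses -> queries -> searches -> findings."""
    parsed = parse(question)                      # Step 1: extract brand, direction
    hypotheses = hypothesize(parsed)              # Step 2: one list per category
    findings: Dict[str, List[Dict[str, Any]]] = {}
    for category, items in hypotheses.items():
        findings[category] = []
        for hyp in items:
            facts: List[str] = []
            for query in hyp.get("queries", []):  # Step 3: queries per hypothesis
                facts.extend(search(query))       # Step 4: run each search
            if facts:                             # Step 5: keep only supported hypotheses
                findings[category].append(
                    {"hypothesis": hyp["hypothesis"], "facts": facts}
                )
    return findings


# Stub stages standing in for the LLM and web-search calls (illustrative data only)
def fake_parse(q: str) -> Dict[str, Any]:
    return {"brand": "new look", "direction": "decrease"}


def fake_hypothesize(parsed: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    return {
        "market": [{"hypothesis": "Online shift", "queries": ["q1"]}],
        "brand": [{"hypothesis": "Store closures", "queries": ["q2"]}],
        "competitive": [],
    }


def fake_search(query: str) -> List[str]:
    return ["placeholder fact"] if query == "q1" else []


result = run_pipeline("Why did Salient fall?", fake_parse, fake_hypothesize, fake_search)
print(result)
```

Hypotheses whose searches return nothing are dropped in this sketch, which mirrors the "only relevant findings" intent of the final step.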

**Key Rules:**
- ✅ Hypotheses separated by category
- ✅ Market = industry trends (not brand-specific)
- ✅ Only relevant findings (e.g., for decreased Salient, only news that reduces visibility)
- ✅ Tier 1 sources prioritized
- 🚫 No inferences - facts only
- 🚫 No vague "strategy" news without concrete impact


In [None]:
# Cell 1: Setup and Dependencies
import os
import json
import re
from typing import Dict, List, Any, Optional, Tuple
from datetime import datetime
from dataclasses import dataclass, field
from IPython.display import display, HTML, Markdown
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

from openai import OpenAI

# Initialize OpenAI client (read the key from the environment rather than hardcoding it)
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
client = OpenAI(api_key=OPENAI_API_KEY)

# ============================================================
# OpenAI Web Search Configuration
# Using a search-enabled model via the Chat Completions API (web_search_options)
# ============================================================
SEARCH_MODEL = "gpt-4o-search-preview"  # Options: "gpt-4o-search-preview", "gpt-4o-mini-search-preview"

print("✓ OpenAI client initialized")
print(f"✓ Using OpenAI built-in web search with model: {SEARCH_MODEL}")


✓ OpenAI client initialized
✓ Using OpenAI built-in web search with model: gpt-4o-search-preview


In [14]:
# Cell 2: Data Classes and Configuration

@dataclass
class ParsedQuestion:
    """Structured representation of the user's question."""
    original_question: str
    brand: str
    metrics: List[str]
    direction: str  # 'increase', 'decrease', or 'change'
    time_period: Optional[str] = None
    additional_context: Optional[str] = None

@dataclass
class SearchResult:
    """A single search result with source tracking."""
    title: str
    url: str
    snippet: str
    source_name: str
    date: Optional[str] = None
    relevance_score: float = 0.0

@dataclass
class DriverInsight:
    """An insight about a potential driver with source citation."""
    insight: str
    category: str  # 'macro', 'brand', 'competitive'
    confidence: str  # 'high', 'medium', 'low'
    sources: List[SearchResult]
    
# ============================================================
# Competitor Database
# ============================================================
COMPETITOR_DATABASE: Dict[str, List[str]] = {
    "new look": [
        "primark", "marks and spencer", "m&s", "asos", "next", 
        "h&m", "shein", "zara", "river island", "boohoo", 
        "very", "amazon", "tk maxx", "george by asda", "jd sports"
    ],
}

# ============================================================
# Metric Dictionary - Salient
# ============================================================
# Simplified metric dictionary: definition, interpretation, drivers of change, and interpretation guidance
METRIC_DICTIONARY: Dict[str, Any] = {
    "salient": {
        "name": "Salient",
        "definition": """Salient measures how easily and quickly a brand comes to mind in buying or usage situations. 
It captures mental availability, not liking or differentiation. A brand is Salient when it is easily recalled, 
top-of-mind, and mentally linked to category entry points and occasions.

Key question: "How quickly does this brand come to mind when people think about the category?"

What Salient is NOT:
- NOT brand awareness alone
- NOT differentiation or uniqueness  
- NOT needs fulfilment (Meaningful)
- NOT usage, loyalty, or satisfaction
- NOT short-term campaign recall""",
        
        "interpretation": {
            "increase": "Improved mental availability and speed to mind",
            "decrease": "Fading mental presence or recall",
            "stable": "Stable brand salience (no significant change)"
        },
        
        "drivers_of_change": """
WHAT INCREASES SALIENCE:
- Heavy advertising (especially broad-reach: TV, OOH, digital display)
- Strong physical store presence (high footfall locations)
- Frequent media mentions and PR coverage
- Viral moments, celebrity associations
- Distinctive brand assets consistently reinforced

WHAT DECREASES SALIENCE (especially for high-street retail brands):
- Reduced advertising spend
- Store closures or reduced physical presence
- Shift to online shopping = less passive brand exposure from walking past stores
- Competitor campaigns stealing share of voice
- Brand going "quiet" - less media activity
- Economic pressures reducing category attention overall

FOR HIGH-STREET FASHION BRANDS LIKE NEW LOOK:
- Physical stores are a major source of passive brand exposure
- If consumers shop more online, they see fewer physical stores
- Online discovery is intent-based (search) vs. browsing (passive exposure)
- This means less "incidental" encounters with the brand
- Brands heavily reliant on high-street presence are vulnerable to online shift""",
        
        "interpretation_guidance": """
- High Salient + Low Meaningful: Brand is well known but weakly relevant
- Low Salient + High Meaningful: Brand resonates once considered but struggles to enter choice sets
- Sustained Salient improvements usually reflect: consistent brand presence, broad-reach communication, reinforcement of distinctive brand assets
- Salient is typically the strongest short-term lever of demand"""
    }
}

def get_metric_context(metric_name: str) -> str:
    """Get metric context including drivers of change."""
    metric = METRIC_DICTIONARY.get(metric_name.lower().replace(" ", "_"), {})
    if not metric:
        return ""
    
    interp = metric.get('interpretation', {})
    return f"""METRIC: {metric.get('name')}

DEFINITION: {metric.get('definition')}

INTERPRETATION:
- If increases: {interp.get('increase')}
- If decreases: {interp.get('decrease')}

{metric.get('drivers_of_change', '')}

{metric.get('interpretation_guidance', '')}"""

# ============================================================
# Source Tier Configuration
# Tier 1 = Premium authoritative sources (higher confidence)
# Tier 2 = Other credible sources (supporting evidence)
# ============================================================

# TIER 1 SOURCES - Premium, authoritative sources
TIER_1_SOURCES: List[str] = [
    # Financial News
    "bloomberg.com",
    "ft.com",           # Financial Times
    "wsj.com",          # Wall Street Journal
    
    # Advertising & Marketing Trade
    "adweek.com",
    "adage.com",        # Ad Age / Advertising Age
    "thedrum.com",      # The Drum
    "campaignlive.com", # Campaign
    "marketingweek.com",# Marketing Week
    
    # Market Research & Intelligence
    "kantar.com",       # Kantar Media
    "mckinsey.com",     # McKinsey ConsumerWise
    "mintel.com",       # Mintel
    "euromonitor.com",  # Euromonitor
]

# TIER 2 SOURCES - Other credible sources
TIER_2_SOURCES: List[str] = [
    # Business News
    "reuters.com",
    "cnbc.com",
    "businessinsider.com",
    "forbes.com",
    "economist.com",
    
    # Industry Publications
    "marketwatch.com",
    "seekingalpha.com",
    "brandchannel.com",
    "prnewswire.com",
    "statista.com",
    
    # Consulting
    "bain.com",
    "bcg.com",
    "deloitte.com",
    "pwc.com",
    "accenture.com",
]

# Combined for backward compatibility
TRUSTED_SOURCES: Dict[str, List[str]] = {
    "tier1": TIER_1_SOURCES,
    "tier2": TIER_2_SOURCES,
    "all": TIER_1_SOURCES + TIER_2_SOURCES,
}

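def get_source_tier(url: str) -> int:
    """
    Determine the tier of a source based on its URL.
    Returns: 1 for Tier 1, 2 for Tier 2, 3 for unknown/other sources
    """
    from urllib.parse import urlparse  # local import keeps this cell self-contained

    # Compare hostnames, not raw substrings: "ft.com" in url would also match "microsoft.com"
    url_lower = url.lower().strip()
    try:
        host = urlparse(url_lower if "://" in url_lower else "//" + url_lower).hostname or ""
    except ValueError:  # malformed URL
        host = ""

    def on_domain(domain: str) -> bool:
        return host == domain or host.endswith("." + domain)

    if any(on_domain(domain) for domain in TIER_1_SOURCES):
        return 1
    if any(on_domain(domain) for domain in TIER_2_SOURCES):
        return 2
    return 3  # Unknown/other source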

print("✓ Data classes defined")
print(f"✓ Competitor database loaded (New Look + {len(COMPETITOR_DATABASE.get('new look', []))} competitors)")
print("✓ Metric dictionary loaded: Salient")
print(f"✓ Source tiers configured: {len(TIER_1_SOURCES)} T1, {len(TIER_2_SOURCES)} T2")


✓ Data classes defined
✓ Competitor database loaded (New Look + 15 competitors)
✓ Metric dictionary loaded: Salient
✓ Source tiers configured: 12 T1, 15 T2
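
Tier lookup can be checked on a few sample URLs. The sketch below matches on the parsed hostname rather than on the raw URL string, since a plain substring test would treat `microsoft.com` as containing `ft.com`. The names `classify_tier`, `TIER_1`, and `TIER_2` are illustrative and use shortened domain lists; they are not the notebook's own definitions.

```python
from typing import List
from urllib.parse import urlparse

# Shortened, illustrative domain lists (assumption: not the full notebook lists)
TIER_1: List[str] = ["bloomberg.com", "ft.com", "marketingweek.com"]
TIER_2: List[str] = ["reuters.com", "forbes.com"]


def _host(url: str) -> str:
    """Return the lowercase hostname, tolerating URLs without a scheme."""
    candidate = url if "://" in url else "//" + url
    return (urlparse(candidate).hostname or "").lower()


def _matches(host: str, domain: str) -> bool:
    """True for the domain itself or any subdomain of it."""
    return host == domain or host.endswith("." + domain)


def classify_tier(url: str) -> int:
    """Return 1 or 2 for listed domains, 3 for anything else."""
    host = _host(url)
    if any(_matches(host, d) for d in TIER_1):
        return 1
    if any(_matches(host, d) for d in TIER_2):
        return 2
    return 3


samples = [
    "https://www.ft.com/content/example",
    "https://www.reuters.com/business/retail/example",
    "https://www.microsoft.com/en-gb/",  # contains "ft.com" as a substring
]
tiers = [classify_tier(u) for u in samples]
print(tiers)
```

The third sample falls into the "other" tier here, whereas a substring check against `ft.com` would rank it as Tier 1.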


In [15]:
# Cell 3: Question Parsing - Extract Brand, Metrics, Direction

def parse_user_question(question: str) -> ParsedQuestion:
    """
    Use LLM to extract structured information from the user's question.
    
    Extracts:
    - Brand name
    - Metrics mentioned (awareness, consideration, purchase intent, etc.)
    - Direction of change (increase/decrease)
    - Time period if mentioned
    - Any additional context
    """
    
    system_prompt = """You are an expert at parsing brand research questions.

Extract the following from the user's question:
1. brand: The brand being discussed (lowercase)
2. metrics: List of metrics mentioned (e.g., "awareness", "consideration", "purchase intent", "brand perception", "market share", "NPS", etc.)
3. direction: Whether the metric increased ("increase"), decreased ("decrease"), or changed without specified direction ("change")
4. time_period: Any time period mentioned (e.g., "Q3 2025", "last 6 months", "YoY")
5. additional_context: Any other relevant context from the question

Return ONLY valid JSON with these fields. Use null for missing optional fields."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Parse this question: {question}"}
        ],
        temperature=0.0,
        response_format={"type": "json_object"}
    )
    
    try:
        parsed = json.loads(response.choices[0].message.content)
        return ParsedQuestion(
            original_question=question,
            brand=parsed.get("brand", "unknown"),
            metrics=parsed.get("metrics", []),
            direction=parsed.get("direction", "change"),
            time_period=parsed.get("time_period"),
            additional_context=parsed.get("additional_context")
        )
    except json.JSONDecodeError:
        print(f"Warning: Failed to parse LLM response, using defaults")
        return ParsedQuestion(
            original_question=question,
            brand="unknown",
            metrics=[],
            direction="change"
        )

def get_competitors(brand: str) -> List[str]:
    """
    Get list of competitors for a brand.
    Returns from database if available, otherwise returns empty list.
    """
    brand_lower = brand.lower().strip()
    
    # Direct match
    if brand_lower in COMPETITOR_DATABASE:
        return COMPETITOR_DATABASE[brand_lower]
    
    # Fuzzy match (check if brand is substring)
    for key, competitors in COMPETITOR_DATABASE.items():
        if brand_lower in key or key in brand_lower:
            return competitors
    
    return []  # No competitors found

print("✓ Question parsing functions defined")


✓ Question parsing functions defined
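
For the one supported question, the parsed result would plausibly look like the object below. The sketch builds it from a hand-written JSON string rather than a live model response, so the field values are assumptions about what the model returns. The helper `build_parsed_question` is hypothetical; it also shows one way to fall back to defaults when the model returns `null` for a field, a case that `dict.get` with a default does not cover.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuestion:
    """Mirror of the notebook's data class, repeated here so the sketch runs alone."""
    original_question: str
    brand: str
    metrics: List[str]
    direction: str
    time_period: Optional[str] = None
    additional_context: Optional[str] = None


def build_parsed_question(question: str, raw_json: str) -> ParsedQuestion:
    """Hypothetical helper: map model JSON onto ParsedQuestion, treating null as missing."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        data = {}
    direction = data.get("direction") or "change"
    if direction not in {"increase", "decrease", "change"}:
        direction = "change"
    return ParsedQuestion(
        original_question=question,
        brand=(data.get("brand") or "unknown").lower(),
        metrics=data.get("metrics") or [],
        direction=direction,
        time_period=data.get("time_period"),
        additional_context=data.get("additional_context"),
    )


question = "Salience fell by 6 points in Q3 2025 for new look"
# Hand-written stand-in for a model response (assumed shape, not real output)
sample = (
    '{"brand": "New Look", "metrics": ["salience"], "direction": "decrease", '
    '"time_period": "Q3 2025", "additional_context": null}'
)
parsed = build_parsed_question(question, sample)
fallback = build_parsed_question(question, '{"brand": null, "metrics": null, "direction": null}')
print(parsed.brand, parsed.direction, parsed.time_period)
print(fallback.brand, fallback.metrics, fallback.direction)
```

Lowercasing the brand keeps it aligned with the lowercase keys of the competitor database.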


In [16]:
# Cell 4: Hypothesis-Driven Query Generation
# 
# Approach:
# 1. Generate hypotheses SEPARATELY for each category (market, brand, competitor)
# 2. Generate targeted search queries for each hypothesis
# 3. Ensure relevance to the original question

# ============================================================
# STEP 1: Generate Hypotheses by Category
# ============================================================

def generate_hypotheses(parsed: ParsedQuestion, competitors: List[str]) -> Dict[str, List[Dict]]:
    """
    Generate hypotheses separately for each category.
    Returns dict with keys: market, brand, competitive
    """
    
    brand = parsed.brand
    direction = parsed.direction
    time_period = parsed.time_period or "recent months"
    metric_context = get_metric_context("salient")
    
    all_hypotheses = {"market": [], "brand": [], "competitive": []}
    
    # 1. MARKET/MACRO hypotheses (industry-level, NOT brand-specific)
    # Use the year alone for macro trends (e.g. "Q3 2025" -> "2025")
    year_match = re.search(r"\b(20\d{2})\b", time_period)
    macro_time = year_match.group(1) if year_match else time_period
    
    # Direction-specific examples
    if direction == "decrease":
        direction_examples = """For DECREASE: factors reducing visibility/attention
- Online shift reducing high-street exposure
- Economic pressures reducing category spending
- Declining consumer attention to fashion"""
    else:
        direction_examples = """For INCREASE: factors boosting visibility/attention
- Category growth increasing brand exposure
- Consumer spending increases on fashion
- Renewed interest in high-street shopping"""
    
    market_prompt = f"""Generate SHORT hypotheses about UK fashion retail trends that could {direction.upper()} brand salience.

TIME: {macro_time}
DIRECTION: {direction.upper()}

⚠️ DIRECTION CONSTRAINT:
ONLY hypotheses that could cause salience to {direction.upper()}.
{direction_examples}
DO NOT include factors causing the OPPOSITE direction.

{metric_context}

COVERAGE AREAS:
- Consumer spending trends
- Online vs high-street shift
- Economic conditions
- Category attention

Return JSON:
{{"hypotheses": [
    {{
        "id": "M1",
        "hypothesis": "SHORT: max 15 words about factor causing {direction}",
        "queries": [
            "UK fashion [specific topic] {macro_time}",
            "UK retail [related topic] {macro_time}"
        ]
    }}
]}}

RULES:
- Hypotheses must be SHORT (max 15 words)
- ONLY factors causing {direction.upper()} (not opposite)
- 2 targeted queries per hypothesis
- Queries include "{macro_time}"

Generate 4-5 hypotheses."""

    market_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": market_prompt}, 
                  {"role": "user", "content": f"Generate UK fashion retail industry hypotheses for {time_period}."}],
        temperature=0.4,
        response_format={"type": "json_object"}
    )
    
    try:
        all_hypotheses["market"] = json.loads(market_response.choices[0].message.content).get("hypotheses", [])
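    except (json.JSONDecodeError, TypeError, AttributeError) as e:
        print(f"Warning: could not parse market hypotheses ({e}); continuing with none")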
    
    # 2. BRAND hypotheses (what the brand did/didn't do)
    # Direction-specific examples for brand
    if direction == "decrease":
        brand_direction_examples = """For DECREASE:
- Reduced advertising spend
- Store closures
- Less media/PR activity
- Pulled campaigns"""
    else:
        brand_direction_examples = """For INCREASE:
- Increased advertising spend
- Store openings/expansion
- Major marketing campaigns
- High-profile partnerships"""
    
    brand_prompt = f"""Generate SHORT hypotheses about {brand}'s actions that could {direction.upper()} salience.

TIME: {time_period}
DIRECTION: {direction.upper()}

⚠️ DIRECTION CONSTRAINT:
ONLY hypotheses about {brand} actions causing salience to {direction.upper()}.
{brand_direction_examples}
DO NOT include actions causing the OPPOSITE direction.

{metric_context}

COVERAGE AREAS:
- Advertising spend changes
- Store activity (openings/closures)
- Marketing campaigns
- Media presence
- Partnerships/events

Return JSON:
{{"hypotheses": [
    {{
        "id": "B1",
        "hypothesis": "SHORT: max 15 words about {brand} action",
        "queries": [
            "{brand} [specific action] UK {time_period}",
            "{brand} [related topic] UK {time_period}"
        ]
    }}
]}}

RULES:
- Hypotheses must be SHORT (max 15 words)
- ONLY actions causing {direction.upper()} (not opposite)
- 2 targeted queries per hypothesis
- Queries include "{time_period}"

Generate 4-5 hypotheses."""

    brand_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": brand_prompt},
                  {"role": "user", "content": f"Generate hypotheses about {brand}'s actions in {time_period}."}],
        temperature=0.4,
        response_format={"type": "json_object"}
    )
    
    try:
        all_hypotheses["brand"] = json.loads(brand_response.choices[0].message.content).get("hypotheses", [])
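    except (json.JSONDecodeError, TypeError, AttributeError) as e:
        print(f"Warning: could not parse brand hypotheses ({e}); continuing with none")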
    
    # 3. COMPETITIVE hypotheses (what competitors are doing)
    # Direction-specific examples for competitors
    if direction == "decrease":
        comp_direction_examples = """For DECREASE (competitor OVERSHADOWS the brand):
- Competitor launched major ad campaign (steals share of voice)
- Competitor gained significant media coverage
- Competitor viral moment or celebrity partnership"""
    else:
        comp_direction_examples = """For INCREASE (competitor struggles help the brand):
- Competitor reduced advertising
- Competitor store closures
- Competitor negative press/PR issues"""
    
    comp_prompt = f"""Generate SHORT hypotheses about competitor actions affecting {brand}'s salience in the UK.

TIME: {time_period}
DIRECTION: {brand}'s salience {direction.upper()}
COMPETITORS: {', '.join(competitors[:8])}
GEOGRAPHY: UK (same market as {brand})

⚠️ DIRECTION CONSTRAINT:
ONLY hypotheses about competitor actions causing {brand}'s salience to {direction.upper()}.
{comp_direction_examples}
DO NOT include actions causing the OPPOSITE direction.

⚠️ GEOGRAPHIC CONSTRAINT:
- Only UK market activities (where {brand} operates)
- For M&S: ONLY their UK CLOTHING business, NOT food

{metric_context}

COVERAGE AREAS:
- Competitor advertising campaigns (UK)
- Competitor store activity (UK)
- Competitor media coverage (UK)
- Competitor partnerships/events (UK)

Return JSON:
{{"hypotheses": [
    {{
        "id": "C1",
        "hypothesis": "SHORT: max 15 words - [Competitor] UK action",
        "queries": [
            "[competitor] [action] UK {time_period}",
            "[competitor] UK fashion [topic] {time_period}"
        ]
    }}
]}}

RULES:
- Hypotheses must be SHORT (max 15 words)
- ONLY actions causing {direction.upper()} (not opposite)
- Name a SPECIFIC competitor
- 2 targeted UK-focused queries per hypothesis
- Queries include "UK" and "{time_period}"

Generate 4-5 hypotheses."""

    comp_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": comp_prompt},
                  {"role": "user", "content": f"Generate competitor hypotheses for {time_period}."}],
        temperature=0.4,
        response_format={"type": "json_object"}
    )
    
    try:
        all_hypotheses["competitive"] = json.loads(comp_response.choices[0].message.content).get("hypotheses", [])
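    except (json.JSONDecodeError, TypeError, AttributeError) as e:
        print(f"Warning: could not parse competitive hypotheses ({e}); continuing with none")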
    
    return all_hypotheses


# ============================================================
# STEP 2: Extract Search Queries from Hypotheses
# ============================================================

def generate_search_queries(hypotheses: Dict[str, List[Dict]], parsed: ParsedQuestion, competitors: List[str]) -> Dict[str, List[str]]:
    """
    Extract search queries from the hypothesis objects.
    Each hypothesis now contains multiple queries (list).
    """
    
    # Hypotheses use the key "market"; search queries and results use the key "macro"
    category_map = {"market": "macro", "brand": "brand", "competitive": "competitive"}
    queries = {"macro": [], "brand": [], "competitive": []}
    
    for hyp_key, query_key in category_map.items():
        for h in hypotheses.get(hyp_key, []):
            # Handle both old format (search_query) and new format (queries list)
            if "queries" in h and isinstance(h["queries"], list):
                queries[query_key].extend(h["queries"])
            elif "search_query" in h:
                queries[query_key].append(h["search_query"])
    
    return queries

print("✓ Query generation function defined")


✓ Query generation function defined
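
The prompt rules (hypotheses of at most 15 words, two queries each, queries containing the time period) are stated to the model but not checked in code. A small validator along these lines could flag responses that ignore them. `check_hypothesis` is a hypothetical helper and the sample data is made up for illustration.

```python
from typing import Any, Dict, List


def check_hypothesis(hyp: Dict[str, Any], time_token: str, max_words: int = 15) -> List[str]:
    """Return a list of rule violations for one hypothesis dict (empty if it passes)."""
    problems: List[str] = []
    text = hyp.get("hypothesis", "")
    if not isinstance(text, str) or not text.strip():
        problems.append("missing hypothesis text")
    elif len(text.split()) > max_words:
        problems.append(f"hypothesis longer than {max_words} words")

    queries = hyp.get("queries")
    if not isinstance(queries, list) or len(queries) != 2:
        problems.append("expected exactly 2 queries")
    else:
        for q in queries:
            if time_token.lower() not in str(q).lower():
                problems.append(f"query missing time period: {q!r}")
    return problems


# Made-up examples in the shape requested by the prompts
good = {
    "id": "B1",
    "hypothesis": "Store closures reduced high-street presence",
    "queries": ["New Look store closures UK Q3 2025", "New Look high street UK Q3 2025"],
}
bad = {"id": "B2", "hypothesis": "", "queries": ["New Look advertising UK"]}

good_problems = check_hypothesis(good, "Q3 2025")
bad_problems = check_hypothesis(bad, "Q3 2025")
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than a boolean makes it easier to log why a hypothesis was rejected before any searches are run.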


In [17]:
# Cell 5: Online Search Implementation using OpenAI's Built-in Web Search

def search_with_openai(query: str, category: str = "general") -> Dict[str, Any]:
    """
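    Execute a web search with an OpenAI search-enabled model.
    
    Uses the Chat Completions API with the model named in SEARCH_MODEL.
    Returns the response text along with URL citations.
    """
    
    # Note: `category` is only used to label the result. TRUSTED_SOURCES is keyed by tier
    # ("tier1", "tier2", "all"), so preferred sources are named in the system prompt instead.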
    
    try:
        # Use Chat Completions with search-enabled model
        response = client.chat.completions.create(
            model=SEARCH_MODEL,
            messages=[
                {
                    "role": "system",
                    "content": """You are a research assistant for UK FASHION & CLOTHING retail.

⚠️ CRITICAL TIME CONSTRAINT:
• Search for the MOST RECENT news matching the time period in the query
• If query mentions "Q3 2025" or "2025", find 2025 news ONLY
• If no 2025 news exists, state "No news found for this period" - DO NOT return older news

⚠️ INDUSTRY CONSTRAINT:
• ONLY fashion/clothing/apparel retail news
• For M&S: ONLY clothing division, NOT food/grocery
• EXCLUDE: supermarket, grocery, food retail

PREFERRED SOURCES: bloomberg.com, ft.com, wsj.com, marketingweek.com, thedrum.com

📋 RULES:
1. Prioritize news from the time period specified in the query
2. Include FULL URL for every fact
3. If no recent news found, say so - don't substitute old news"""
                },
                {
                    "role": "user", 
                    "content": f"Search for recent news: {query}"
                }
            ],
            web_search_options={
                "search_context_size": "high"  # Use high for better results
            }
        )
        
        # Extract the response
        message = response.choices[0].message
        content = message.content
        
        # Extract citations from annotations
        citations = []
        
        # Check for annotations (the structure varies by API version)
        if hasattr(message, 'annotations') and message.annotations:
            for annotation in message.annotations:
                try:
                    citation_data = {}
                    
                    # The annotation object might have different structures
                    # Try accessing as a dict-like object
                    if hasattr(annotation, 'model_dump'):
                        ann_dict = annotation.model_dump()
                    elif hasattr(annotation, '__dict__'):
                        ann_dict = annotation.__dict__
                    else:
                        ann_dict = {}
                    
                    # Extract URL and title from various possible locations
                    if 'url_citation' in ann_dict:
                        url_cit = ann_dict['url_citation']
                        citation_data["url"] = url_cit.get('url', '') if isinstance(url_cit, dict) else getattr(url_cit, 'url', '')
                        citation_data["title"] = url_cit.get('title', 'Source') if isinstance(url_cit, dict) else getattr(url_cit, 'title', 'Source')
                    elif 'url' in ann_dict:
                        citation_data["url"] = ann_dict['url']
                        citation_data["title"] = ann_dict.get('title', 'Source')
                    
                    # Get text indices
                    citation_data["start_index"] = ann_dict.get('start_index', 0)
                    citation_data["end_index"] = ann_dict.get('end_index', 0)
                    
                    if citation_data.get("url"):
                        citations.append(citation_data)
                        
                except Exception as ann_error:
                    # Silently continue on annotation parse errors
                    pass
        
        # Supplement: also pick up URLs written inline in the text that the annotations missed
        if content:
            url_pattern = r'https?://[^\s\)\]\>\"\'<]+' 
            found_urls = re.findall(url_pattern, content)
            existing_urls = [c.get("url", "") for c in citations]
            
            for url in found_urls[:15]:  # Limit to 15 URLs
                # Clean URL (remove trailing punctuation)
                url = url.rstrip('.,;:!?')
                
                # Skip social media sources
                if is_social_media(url):
                    continue
                    
                if url and url not in existing_urls:
                    # No title is available for inline URLs, so a generic label is used
                    url_pos = content.find(url)
                    
                    citations.append({
                        "url": url,
                        "title": "Source",
                        "start_index": url_pos,
                        "end_index": url_pos + len(url)
                    })
                    existing_urls.append(url)
        
        return {
            "text": content,
            "citations": citations,
            "category": category,
            "query": query
        }
        
    except Exception as e:
        print(f"  ⚠️ OpenAI web search error: {e}")
        return {
            "text": f"Search failed for: {query}. Error: {str(e)}",
            "citations": [],
            "category": category,
            "query": query,
            "error": str(e)
        }

# Excluded social media domains
EXCLUDED_DOMAINS = [
    "twitter.com", "x.com",
    "facebook.com", "fb.com",
    "instagram.com",
    "tiktok.com",
    "linkedin.com",
    "reddit.com",
    "pinterest.com",
    "youtube.com",  # User-generated content
    "tumblr.com",
]

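def is_social_media(url: str) -> bool:
    """Check if a URL is from an excluded social media domain."""
    from urllib.parse import urlparse
    # Match the hostname so that "x.com" does not also exclude e.g. "netflix.com"
    url_lower = url.lower().strip()
    try:
        host = urlparse(url_lower if "://" in url_lower else "//" + url_lower).hostname or ""
    except ValueError:  # malformed URL
        host = ""
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)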

def extract_sources_from_response(response: Dict[str, Any]) -> List[SearchResult]:
    """
    Extract SearchResult objects from OpenAI web search response.
    Includes source tier classification and filters out social media sources.
    """
    results = []
    citations = response.get("citations", [])
    text = response.get("text", "")
    
    for citation in citations:
        url = citation.get("url", "")
        title = citation.get("title", "Unknown")
        
        # Skip social media sources
        if is_social_media(url):
            continue
        
        # Extract domain from URL
        try:
            source_domain = url.split("/")[2] if url else "unknown"
        except IndexError:
            source_domain = "unknown"
        
        # Determine source tier
        tier = get_source_tier(url)
        tier_label = f"[Tier {tier}]" if tier <= 2 else "[Other]"
        
        # Extract snippet from the text around the citation
        start = citation.get("start_index", 0)
        end = citation.get("end_index", len(text))
        snippet = text[max(0, start-100):min(len(text), end+100)]
        
        # Higher relevance score for Tier 1 sources
        relevance = 1.0 if tier == 1 else (0.8 if tier == 2 else 0.6)
        
        results.append(SearchResult(
            title=f"{tier_label} {title}",
            url=url,
            snippet=snippet[:300] + "..." if len(snippet) > 300 else snippet,
            source_name=source_domain,
            date=None,  # OpenAI doesn't always provide dates
            relevance_score=relevance
        ))
    
    # Sort by relevance (Tier 1 first)
    results.sort(key=lambda x: x.relevance_score, reverse=True)
    
    # If no citations but we have text, create a single result from the response
    if not results and text:
        results.append(SearchResult(
            title=f"[Other] Search result for: {response.get('query', 'query')[:50]}",
            url="",
            snippet=text[:500] + "..." if len(text) > 500 else text,
            source_name="openai_web_search",
            date=None,
            relevance_score=0.5
        ))
    
    return results

def execute_all_searches(queries: Dict[str, List[str]], results_per_query: int = 3) -> Dict[str, List[SearchResult]]:
    """
    Execute all search queries using OpenAI's web search IN PARALLEL for faster execution.
    """
    all_results = {
        "macro": [],
        "brand": [],
        "competitive": []
    }
    
    # Store raw responses for later use in summary generation
    all_raw_responses = {
        "macro": [],
        "brand": [],
        "competitive": []
    }
    
    # Flatten all queries with their categories for parallel execution
    all_queries = []
    for category, query_list in queries.items():
        for query in query_list:
            all_queries.append((category, query))
    
    print(f"\n🚀 Executing {len(all_queries)} searches in PARALLEL...")
    start_time = time.time()
    
    # Execute all searches in parallel using ThreadPoolExecutor
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Submit all search tasks
        future_to_query = {
            executor.submit(search_with_openai, query, category): (category, query)
            for category, query in all_queries
        }
        
        # Collect results as they complete
        for future in as_completed(future_to_query):
            category, query = future_to_query[future]
            try:
                response = future.result()
                all_raw_responses[category].append(response)
                
                # Extract structured results
                results = extract_sources_from_response(response)
                all_results[category].extend(results)
                
                # Report what was found
                citation_count = len(response.get("citations", []))
                tier1_count = sum(1 for r in results if "[Tier 1]" in r.title)
                status = "✓" if not response.get("error") else "⚠️"
                tier_info = f"({tier1_count} T1)" if tier1_count > 0 else ""
                print(f"  {status} [{category.upper()}] {query[:50]}... → {citation_count} sources {tier_info}")
                
            except Exception as e:
                print(f"  ⚠️ [{category.upper()}] {query[:50]}... → Error: {e}")
    
    elapsed = time.time() - start_time
    print(f"\n⏱️ All searches completed in {elapsed:.1f}s (parallel execution)")
    
    # Deduplicate by URL
    for category in all_results:
        seen_urls = set()
        unique_results = []
        for r in all_results[category]:
            if r.url not in seen_urls or r.url == "":
                if r.url:
                    seen_urls.add(r.url)
                unique_results.append(r)
        all_results[category] = unique_results
    
    # Store raw responses globally for use in summary
    global SEARCH_RAW_RESPONSES
    SEARCH_RAW_RESPONSES = all_raw_responses
    
    # Report tier statistics
    print("\n📊 Source Tier Summary:")
    total_tier1 = 0
    total_tier2 = 0
    total_other = 0
    for category, results in all_results.items():
        t1 = sum(1 for r in results if "[Tier 1]" in r.title)
        t2 = sum(1 for r in results if "[Tier 2]" in r.title)
        other = len(results) - t1 - t2
        total_tier1 += t1
        total_tier2 += t2
        total_other += other
        print(f"   {category.upper()}: {t1} Tier 1, {t2} Tier 2, {other} Other")
    print(f"   TOTAL: {total_tier1} Tier 1, {total_tier2} Tier 2, {total_other} Other")
    
    if total_tier1 == 0:
        print("\n⚠️ WARNING: No Tier 1 sources found. Consider refining search queries.")
    
    return all_results

# Global variable to store raw search responses
SEARCH_RAW_RESPONSES = {}

print("✓ OpenAI Web Search functions defined")


✓ OpenAI Web Search functions defined
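
The fan-out and URL de-duplication used in the search cell can be tried without network access. The following is a minimal sketch, not the notebook's implementation: `fake_search` is a hypothetical stand-in for `search_with_openai`, `run_parallel` is an illustrative name, and the `example.com` URLs are placeholders. Unlike the cell above, it also collapses repeated empty URLs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def fake_search(query: str, category: str) -> Dict:
    """Hypothetical stand-in for search_with_openai; makes no network call."""
    return {"query": query, "citations": [{"url": f"https://example.com/{category}"}]}


def run_parallel(queries: List[Tuple[str, str]], max_workers: int = 4) -> Dict[str, List[str]]:
    """Run all (category, query) pairs in a thread pool and return unique URLs per category."""
    urls_by_category: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the category it was submitted for
        future_to_category = {
            executor.submit(fake_search, query, category): category
            for category, query in queries
        }
        for future in as_completed(future_to_category):
            category = future_to_category[future]
            for citation in future.result()["citations"]:
                urls_by_category.setdefault(category, []).append(citation["url"])
    # dict.fromkeys drops duplicates while keeping first-seen order
    return {cat: list(dict.fromkeys(urls)) for cat, urls in urls_by_category.items()}


demo = run_parallel([
    ("market", "uk fashion retail footfall"),
    ("market", "uk apparel advertising spend"),
    ("brand", "new look store closures"),
])
```

Both "market" queries return the same placeholder URL, so only one entry should remain for that category after de-duplication.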


In [18]:
# Cell 6: Parallel Hypothesis Processing Pipeline
# Each hypothesis: Search → Validate → Mini-Summary (in parallel)
# Then deterministically combine results

from concurrent.futures import ThreadPoolExecutor, as_completed

# ============================================================
# STEP 1: Process Single Hypothesis (Search → Validate → Summary)
# ============================================================

def process_single_hypothesis(hypothesis: Dict, category: str, time_period: str) -> Dict[str, Any]:
    """
    Process a single hypothesis through the full pipeline:
    1. Execute search queries for this hypothesis
    2. Validate the hypothesis against search results
    3. Generate mini-summary if validated
    
    Returns the result for this hypothesis.
    """
    
    hyp_text = hypothesis.get("hypothesis", "")
    queries = hypothesis.get("queries", [])
    
    if not queries:
        return {"status": "NO_QUERIES", "hypothesis": hyp_text, "category": category}
    
    # Step 1: Execute searches for this hypothesis
    search_results = []
    for query in queries[:2]:  # Max 2 queries per hypothesis
        try:
            result = search_with_openai(query, category)
            if result.get("text"):
                search_results.append({
                    "query": query,
                    "text": result.get("text", ""),
                    "citations": result.get("citations", [])
                })
        except Exception:
            # Best effort: a failed query is skipped so the remaining queries still run
            continue
    
    if not search_results:
        return {"status": "NO_RESULTS", "hypothesis": hyp_text, "category": category}
    
    # Step 2: Validate this hypothesis against its search results
    search_context = "\n\n".join([f"Query: {r['query']}\n{r['text']}" for r in search_results])
    
    validation_prompt = f"""Validate this hypothesis against the search results.

HYPOTHESIS: {hyp_text}

SEARCH RESULTS:
{search_context}

Return JSON:
{{
    "status": "VALIDATED" or "NOT_VALIDATED",
    "evidence": "SHORT factual summary (max 20 words) with key numbers/dates if available",
    "source_url": "URL of the source (if validated)"
}}

RULES:
- VALIDATED = Search results contain DIRECT evidence supporting this hypothesis
- NOT_VALIDATED = No clear evidence found
- Evidence must be SHORT: max 20 words, just the key fact with numbers
- Good: "Online sales up 8.3%, in-store up only 0.8% in Q3 2025"
- Bad: Long explanations about consumer behavior shifts..."""

    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": validation_prompt},
                {"role": "user", "content": "Validate this hypothesis."}
            ],
            temperature=0.1,
            response_format={"type": "json_object"}
        )
        
        validation = json.loads(response.choices[0].message.content)
        
        return {
            "status": validation.get("status", "NOT_VALIDATED"),
            "hypothesis": hyp_text,
            "category": category,
            "evidence": validation.get("evidence", ""),
            "source_url": validation.get("source_url", ""),
            "search_results": search_results
        }
    except Exception as e:
        return {"status": "ERROR", "hypothesis": hyp_text, "category": category, "error": str(e)}


# ============================================================
# STEP 2: Process All Hypotheses in Parallel
# ============================================================

def process_hypotheses_parallel(hypotheses: Dict[str, List[Dict]], time_period: str) -> Dict[str, List[Dict]]:
    """
    Process all hypotheses in parallel.
    Each hypothesis goes through: Search → Validate → Result
    """
    
    results = {"market": [], "brand": [], "competitive": []}
    
    # Flatten all hypotheses with their categories
    all_tasks = []
    for cat in ["market", "brand", "competitive"]:
        for hyp in hypotheses.get(cat, []):
            all_tasks.append((hyp, cat))
    
    print(f"   Processing {len(all_tasks)} hypotheses in parallel...")
    
    # Process in parallel
    with ThreadPoolExecutor(max_workers=8) as executor:
        future_to_hyp = {
            executor.submit(process_single_hypothesis, hyp, cat, time_period): (hyp, cat)
            for hyp, cat in all_tasks
        }
        
        for future in as_completed(future_to_hyp):
            hyp, cat = future_to_hyp[future]
            try:
                result = future.result()
                results[cat].append(result)
                status = "✅" if result.get("status") == "VALIDATED" else "❌"
                print(f"      {status} {result.get('hypothesis', '')[:50]}...")
            except Exception as e:
                print(f"      ⚠️ Error: {e}")
    
    return results


# ============================================================
# STEP 3: Combine Results into Final Summary (Deterministic)
# ============================================================

def combine_results_to_summary(processed_results: Dict[str, List[Dict]]) -> Tuple[Dict[str, Any], Dict[str, List[Dict]]]:
    """
    Deterministically combine all processed hypothesis results into final summary.
    Only includes validated hypotheses with their evidence.
    """
    
    summary = {
        "macro_drivers": [],
        "brand_drivers": [],
        "competitive_drivers": []
    }
    
    validated = {"market": [], "brand": [], "competitive": []}
    
    # Map categories
    category_map = {
        "market": "macro_drivers",
        "brand": "brand_drivers",
        "competitive": "competitive_drivers"
    }
    
    for cat, output_key in category_map.items():
        for result in processed_results.get(cat, []):
            if result.get("status") == "VALIDATED":
                # Add to validated list
                validated[cat].append({
                    "hypothesis": result.get("hypothesis", ""),
                    "evidence": result.get("evidence", ""),
                    "source_url": result.get("source_url", "")
                })
                
                # Add to summary
                summary[output_key].append({
                    "driver": result.get("evidence", result.get("hypothesis", "")),
                    "hypothesis": result.get("hypothesis", ""),
                    "source_urls": [result.get("source_url", "")] if result.get("source_url") else [],
                    "confidence": "medium"
                })
    
    return summary, validated


# Legacy functions kept for compatibility
def validate_hypotheses(hypotheses, search_results):
    """Legacy - now handled by process_hypotheses_parallel"""
    return {"market": [], "brand": [], "competitive": []}

def generate_summary_from_validated(validated, parsed):
    """Legacy - now handled by combine_results_to_summary"""
    return {"macro_drivers": [], "brand_drivers": [], "competitive_drivers": []}


# ============================================================
# LEGACY: Full Summary Generation (kept for reference)
# ============================================================

def generate_driver_summary(
    parsed: ParsedQuestion,
    search_results: Dict[str, List[SearchResult]],
    competitors: List[str]
) -> Dict[str, Any]:
    """
    Generate a comprehensive summary of potential drivers with source citations.
    
    Uses the raw OpenAI web search responses for richer context,
    along with extracted citations for source tracking.
    
    Returns a structured summary with:
    - Key insights by category
    - Confidence levels
    - Source citations for each claim
    """
    
    # Build context from raw search responses (contains full text with inline citations)
    context_parts = []
    source_index = {}  # Map source ID to full details
    source_counter = 1
    
    # Use raw responses if available (contains richer information)
    global SEARCH_RAW_RESPONSES
    
    for category in ["macro", "brand", "competitive"]:
        context_parts.append(f"\n{'='*60}")
        context_parts.append(f"=== {category.upper()} RESEARCH ===")
        context_parts.append(f"{'='*60}")
        
        # Add raw response text (contains OpenAI's search synthesis with citations)
        if category in SEARCH_RAW_RESPONSES:
            for response in SEARCH_RAW_RESPONSES[category]:
                query = response.get("query", "")
                text = response.get("text", "")
                citations = response.get("citations", [])
                
                context_parts.append(f"\n--- Query: {query} ---")
                context_parts.append(text)
                
                # Index the citations
                for citation in citations:
                    source_id = f"[{source_counter}]"
                    source_index[source_id] = {
                        "title": citation.get("title", "Unknown"),
                        "url": citation.get("url", ""),
                        "source": citation.get("url", "").split("/")[2] if citation.get("url") else "unknown",
                        "date": None,
                        "category": category
                    }
                    source_counter += 1
        
        # Also add structured results for additional context
        if category in search_results:
            for r in search_results[category]:
                if r.url and r.url not in [s.get("url") for s in source_index.values()]:
                    source_id = f"[{source_counter}]"
                    source_index[source_id] = {
                        "title": r.title,
                        "url": r.url,
                        "source": r.source_name,
                        "date": r.date,
                        "category": category
                    }
                    source_counter += 1
    
    context = "\n".join(context_parts)
    
    # Get direction for relevance filtering
    direction = parsed.direction
    direction_verb = f"{direction}d" if direction != 'change' else 'changed'
    
    system_prompt = f"""You are a market research analyst compiling factual news for UK FASHION & CLOTHING retail.

⚠️ CRITICAL - RELEVANCE TO QUESTION:
The user is asking about WHY Salient (mental availability) {direction_verb}.
ONLY include news items that could PLAUSIBLY EXPLAIN this {direction}.

For DECREASED Salient, relevant news includes:
✅ Reduced advertising spend, pulled campaigns
✅ Store closures, reduced physical presence
✅ Negative PR, brand scandals, controversies
✅ Brand going quiet, less media activity
✅ Competitor aggressive campaigns (stealing share of mind)
✅ Industry decline reducing category attention

For DECREASED Salient, EXCLUDE:
❌ Positive improvements like "enhanced omnichannel strategy" (doesn't explain decrease)
❌ Vague strategy statements without concrete visibility impact
❌ News that doesn't affect brand visibility/awareness

ASK YOURSELF: "Would this news REDUCE how often the brand comes to mind?"
If NO → Don't include it.

⚠️ INDUSTRY FILTER:
• ONLY fashion/clothing retail news
• For M&S: ONLY clothing division, EXCLUDE food/grocery

🚫 EXCLUSIONS:
1. NO inferences - just facts
2. NO social media posts
3. NO food/grocery/supermarket news
4. NO vague "strategy" news without concrete impact

SOURCE TIERS & CONFIDENCE:
TIER 1: Bloomberg, FT, WSJ, Marketing Week, The Drum, Campaign, Kantar, McKinsey, Mintel
TIER 2: Reuters, Forbes, Business Insider, trade publications
HIGH = Tier 1 + corroboration | MEDIUM = Single Tier 1 | LOW = No Tier 1

Return JSON:
{{
    "macro_drivers": [
        {{
            "driver": "Factual statement with specific details (numbers, dates)",
            "source_urls": ["https://..."],
            "confidence": "high/medium/low"
        }}
    ],
    "brand_drivers": [
        {{
            "driver": "Factual statement with specific details",
            "source_urls": ["https://..."],
            "confidence": "high/medium/low"
        }}
    ],
    "competitive_drivers": [
        {{
            "driver": "Factual statement with specific details",
            "source_urls": ["https://..."],
            "confidence": "high/medium/low"
        }}
    ]
}}

⚠️ OUTPUT RULES:
1. JUST STATE THE FACTS - NO interpretations, NO "why it matters", NO impact analysis
2. Include specific details when available: numbers, percentages, dates
3. Good: "Next reported 10.5% rise in full-price sales in Q3 2025, driven by online strategy"
4. Bad: "Next's strong performance could steal share of mind" (this is interpretation)
5. Empty arrays preferred over irrelevant/unverified news
6. Every item MUST have source URL"""

    user_prompt = f"""RESEARCH QUESTION:
{parsed.original_question}

BRAND: {parsed.brand}
METRICS: {', '.join(parsed.metrics)}
DIRECTION: {parsed.direction}
TIME PERIOD: {parsed.time_period or 'recent'}
COMPETITORS: {', '.join(competitors[:5]) if competitors else 'unknown'}

SEARCH RESULTS:
{context}

Analyze these search results and provide a structured summary of potential drivers.
Remember: EVERY item must include the URL of the source it came from in "source_urls"."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0.2,
        response_format={"type": "json_object"}
    )
    
    try:
        summary = json.loads(response.choices[0].message.content)
        summary["source_index"] = source_index
        return summary
    except json.JSONDecodeError:
        return {
            "executive_summary": "Error generating summary",
            "macro_drivers": [],
            "brand_drivers": [],
            "competitive_drivers": [],
            "data_gaps": ["Failed to parse LLM response"],
            "source_index": source_index
        }

print("✓ Summary generation function defined")


✓ Summary generation function defined
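
`combine_results_to_summary` assigns `"medium"` to every validated driver, while the legacy prompt describes a tier-based rule (HIGH = Tier 1 + corroboration, MEDIUM = single Tier 1, LOW = no Tier 1). A minimal sketch of how that rule could be applied deterministically is shown below. The domain list is an assumption (the notebook names publishers, not domains), `is_tier1` and `confidence_for` are hypothetical helpers, and "corroboration" is interpreted here as at least one additional distinct source URL.

```python
from typing import List
from urllib.parse import urlparse

# Assumed domains for the Tier 1 publishers named in the prompt; adjust as needed.
TIER1_DOMAINS = {
    "bloomberg.com", "ft.com", "wsj.com", "marketingweek.com", "thedrum.com",
    "campaignlive.co.uk", "kantar.com", "mckinsey.com", "mintel.com",
}


def is_tier1(url: str) -> bool:
    """True if the URL's host is a Tier 1 domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TIER1_DOMAINS)


def confidence_for(source_urls: List[str]) -> str:
    """Map a driver's source URLs to high / medium / low."""
    if not any(is_tier1(u) for u in source_urls):
        return "low"
    # Assumption: any second distinct URL counts as corroboration
    return "high" if len(set(source_urls)) > 1 else "medium"
```

For example, a single `ft.com` link would be rated `"medium"`, and the same link plus a second source would be rated `"high"`.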


In [19]:
# Cell 7: Display Functions - Simplified

STYLES = """
<style>
.qa-header { text-align: center; margin-bottom: 20px; }
.qa-header h1 { font-size: 22px; color: #1e293b; margin: 0; }
.subtitle { color: #64748b; font-size: 13px; }
.question-card { background: #f1f5f9; padding: 16px; border-radius: 8px; margin-bottom: 16px; border-left: 4px solid #3b82f6; }
.parsed-tags { display: flex; flex-wrap: wrap; gap: 6px; margin-top: 8px; }
.tag { padding: 4px 10px; border-radius: 12px; font-size: 12px; }

.tag.brand { background: #dbeafe; color: #1e40af; }
.tag.metric { background: #dcfce7; color: #166534; }
.tag.direction-up { background: #dcfce7; color: #166534; }
.tag.direction-down { background: #fee2e2; color: #991b1b; }
.tag.time { background: #f3e8ff; color: #6b21a8; }
</style>
"""

def display_parsed_question(parsed: ParsedQuestion, competitors: List[str]):
    """Display the parsed question with extracted components."""
    
    direction_class = "direction-up" if parsed.direction == "increase" else "direction-down" if parsed.direction == "decrease" else "time"
    direction_icon = "📈" if parsed.direction == "increase" else "📉" if parsed.direction == "decrease" else "📊"
    
    metrics_tags = "".join([f'<span class="tag metric">📊 {m}</span>' for m in parsed.metrics])
    competitors_text = f"<p style='margin-top: 12px; color: #64748b; font-size: 13px;'>🏢 <strong>Competitors:</strong> {', '.join(competitors[:5])}</p>" if competitors else ""
    
    html = f"""
    <div class="question-card">
        <div class="question-text">❓ {parsed.original_question}</div>
        <div class="parsed-tags">
            <span class="tag brand">🏷️ {parsed.brand.title()}</span>
            {metrics_tags}
            <span class="tag {direction_class}">{direction_icon} {parsed.direction.title()}</span>
            {f'<span class="tag time">📅 {parsed.time_period}</span>' if parsed.time_period else ''}
        </div>
        {competitors_text}
    </div>
    """
    display(HTML(html))

def display_summary(summary: Dict[str, Any]):
    """Display simple bullet point summary."""
    
    def render_bullets(drivers: List[Dict]) -> str:
        if not drivers:
            return "<li style='color:#94a3b8'>No news found</li>"
        
        items = []
        for d in drivers:
            source_urls = d.get('source_urls', [])
            # Get first source link
            link = ""
            if source_urls and source_urls[0]:
                url = source_urls[0]
                try:
                    domain = url.split('/')[2]
                except IndexError:
                    # URL has no scheme/host part (e.g. a bare path); use a generic label
                    domain = "source"
                link = f' <a href="{url}" target="_blank" style="color:#3b82f6;font-size:12px">[{domain}]</a>'
            
            items.append(f"<li><strong>{d.get('driver', '')}</strong>{link}</li>")
        
        return "".join(items)
    
    macro = render_bullets(summary.get('macro_drivers', []))
    brand = render_bullets(summary.get('brand_drivers', []))
    competitive = render_bullets(summary.get('competitive_drivers', []))
    
    html = f"""
    <div style="font-family: -apple-system, sans-serif; line-height: 1.6;">
        <h3 style="color:#1e40af; border-bottom: 2px solid #3b82f6; padding-bottom: 8px;">🌍 Market News</h3>
        <ul style="margin: 12px 0 24px 0;">{macro}</ul>
        
        <h3 style="color:#059669; border-bottom: 2px solid #10b981; padding-bottom: 8px;">🏷️ Brand News (New Look)</h3>
        <ul style="margin: 12px 0 24px 0;">{brand}</ul>
        
        <h3 style="color:#d97706; border-bottom: 2px solid #f59e0b; padding-bottom: 8px;">⚔️ Competitor News</h3>
        <ul style="margin: 12px 0 24px 0;">{competitive}</ul>
    </div>
    """
    display(HTML(html))

print("✓ Display functions defined")


✓ Display functions defined
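
`render_bullets` derives the link label with `url.split('/')[2]`, which depends on the URL having a scheme. As an alternative, the standard library's `urllib.parse.urlparse` can extract the host directly. The sketch below is one possible variant; `source_label` is a hypothetical helper name and stripping a leading `www.` is an assumed presentation choice.

```python
from urllib.parse import urlparse


def source_label(url: str, fallback: str = "source") -> str:
    """Return a short host label for a link, or the fallback when no host is present."""
    host = urlparse(url).hostname
    if not host:
        return fallback
    # Drop a leading "www." so labels stay compact
    return host[4:] if host.startswith("www.") else host
```

With this helper, `source_label("https://www.bbc.co.uk/news/business")` gives `"bbc.co.uk"`, and a string without a host falls back to `"source"`.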


In [20]:
# Cell 8: Main QA Engine Pipeline

class QAEngine:
    """
    QA Engine for analyzing brand metric drivers.
    
    Pipeline:
    1. Parse user question
    2. Identify competitors
    3. Generate hypotheses (like a human researcher)
    4. Generate search queries to test hypotheses
    5. Execute searches
    6. Generate findings summary
    """
    
    def __init__(self, client: OpenAI):
        self.client = client
        
    def analyze(self, question: str, progress_callback=None) -> Dict[str, Any]:
        """
        Run the full analysis pipeline.
        
        Args:
            question: User's question
            progress_callback: Optional function(step, data) to report progress
        """
        
        def report(step, data=None):
            if progress_callback:
                progress_callback(step, data)
        
        # Step 1: Parse question
        report("parsing", None)
        parsed = parse_user_question(question)
        report("parsed", {"brand": parsed.brand, "direction": parsed.direction, "time": parsed.time_period})
        
        # Step 2: Get competitors
        competitors = get_competitors(parsed.brand)
        
        # Step 3: Generate hypotheses (by category)
        report("hypotheses_start", None)
        hypotheses = generate_hypotheses(parsed, competitors)
        report("hypotheses_done", hypotheses)
        
        # Step 4: Generate search queries (for display purposes)
        queries = generate_search_queries(hypotheses, parsed, competitors)
        report("queries_done", queries)
        
        # Step 5: Process all hypotheses in PARALLEL
        # Each hypothesis: Search → Validate → Result
        report("processing", None)
        time_period = parsed.time_period or "2025"
        processed_results = process_hypotheses_parallel(hypotheses, time_period)
        report("processed", processed_results)
        
        # Step 6: Deterministically combine results into final summary
        report("summarizing", None)
        summary, validated = combine_results_to_summary(processed_results)
        report("done", None)
        
        return {
            "parsed_question": parsed,
            "competitors": competitors,
            "hypotheses": hypotheses,
            "queries": queries,
            "processed_results": processed_results,
            "validated_hypotheses": validated,
            "summary": summary
        }

# Initialize the engine
qa_engine = QAEngine(client)
print("✓ QA Engine initialized and ready")


✓ QA Engine initialized and ready
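
The `progress_callback(step, data)` hook can be exercised in isolation before wiring it to a UI. The sketch below assumes nothing beyond the step names that `QAEngine.analyze` reports; `make_recorder` and `fake_analyze` are hypothetical helpers, and `fake_analyze` only replays the step names without calling any model or search.

```python
from typing import Any, Callable, List, Optional, Tuple

# Step names in the order QAEngine.analyze reports them
ANALYZE_STEPS = [
    "parsing", "parsed", "hypotheses_start", "hypotheses_done",
    "queries_done", "processing", "processed", "summarizing", "done",
]


def make_recorder() -> Tuple[Callable[[str, Any], None], List[str]]:
    """Build a callback that appends every reported step name to a list."""
    steps: List[str] = []

    def callback(step: str, data: Any = None) -> None:
        steps.append(step)

    return callback, steps


def fake_analyze(progress_callback: Optional[Callable[[str, Any], None]] = None) -> None:
    """Hypothetical stand-in for QAEngine.analyze that only emits progress events."""
    for step in ANALYZE_STEPS:
        if progress_callback:
            progress_callback(step, None)


callback, steps = make_recorder()
fake_analyze(callback)
```

The same recorder could be passed as `progress_callback` to `qa_engine.analyze(...)` to log which stages a real run reached.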


In [21]:
# Cell 9: Skip - use interactive chat below
# Example questions you can ask:
# - "New Look's Salient score fell by 6 points in Q3 2025. What news might explain this?"
# - "Why might H&M's mental availability have increased?"
# - "Primark's Salient is declining - what's happening in UK fashion retail?"

print("⬇️ Skip to the interactive chat interface below")


⬇️ Skip to the interactive chat interface below


In [22]:
# Cell 10: Skip - use interactive chat below
# The chat interface will run the analysis when you submit a question

print("⬇️ Use the chat interface in the next cell")


⬇️ Use the chat interface in the next cell


In [23]:
# Cell 11: Interactive Chat Interface

import ipywidgets as widgets
from IPython.display import display, HTML, clear_output

# Create the chat interface
output_area = widgets.Output()
question_input = widgets.Textarea(
    placeholder='Ask about brand metric changes... e.g., "New Look\'s Salient score fell by 6 points in Q3 2025. What news might explain this?"',
    layout=widgets.Layout(width='100%', height='80px')
)
submit_btn = widgets.Button(
    description='🔍 Search',
    button_style='primary',
    layout=widgets.Layout(width='120px')
)

def format_chat_response(results):
    """Format results as chat-style HTML with collapsible thinking section."""
    hypotheses = results.get("hypotheses", {})  # Dict with market, brand, competitive
    processed = results.get("processed_results", {})  # Processing results per hypothesis
    summary = results.get("summary", {})
    
    # Build processing results lookup
    processed_lookup = {}
    for cat, results_list in processed.items():
        for r in results_list:
            key = r.get("hypothesis", "").lower().strip()
            processed_lookup[key] = r
    
    # Build thinking section - show hypotheses with validation status
    hyp_items = ""
    cat_labels = {"market": "🌍 Market", "brand": "🏷️ Brand", "competitive": "⚔️ Competitive"}
    for cat, hyps in hypotheses.items():
        if hyps:
            hyp_items += f"<div style='margin-top:10px; font-weight:600; color:#475569; border-bottom:1px solid #e2e8f0; padding-bottom:4px;'>{cat_labels.get(cat, cat)}</div>"
            for h in hyps:
                hyp_text = h.get('hypothesis', '')
                result = processed_lookup.get(hyp_text.lower().strip(), {})
                status = result.get("status", "NOT_VALIDATED")
                is_validated = status == "VALIDATED"
                status_icon = "✅" if is_validated else "❌"
                status_color = "#10b981" if is_validated else "#94a3b8"
                hyp_items += f"<div style='margin:6px 0 2px 8px; color:{status_color};'>{status_icon} {hyp_text}</div>"
    
    thinking_html = f"""
    <details style="margin:8px 0;">
        <summary style="cursor:pointer; color:#64748b; font-size:12px; padding:8px 0;">
            💭 Thought for a few seconds...
        </summary>
        <div style="background:#f1f5f9; border-radius:8px; padding:12px; margin-top:8px; max-height:300px; overflow-y:auto; font-size:12px; color:#64748b; line-height:1.5;">
            <div style="font-weight:600; margin-bottom:4px;">Hypotheses (✅ = validated, ❌ = no evidence):</div>
            {hyp_items}
        </div>
    </details>
    """
    
    # Build findings sections - hypothesis-driven format
    def format_bullets(drivers):
        if not drivers:
            return "<div style='color:#94a3b8; font-style:italic;'>No validated findings</div>"
        items = ""
        for d in drivers[:5]:
            hypothesis = d.get('hypothesis', '')
            evidence = d.get('driver', '')
            url = d.get('source_urls', [''])[0] if d.get('source_urls') else ''
            link = f' <a href="{url}" target="_blank" style="color:#3b82f6;font-size:10px">[source]</a>' if url else ''
            
            # Bold hypothesis, evidence below (no truncation)
            items += f"""<div style='margin:10px 0;'>
                <div style='font-weight:600; color:#1e293b;'>• {hypothesis}</div>
                <div style='margin-left:12px; font-size:12px; color:#64748b;'>→ {evidence}{link}</div>
            </div>"""
        return items
    
    macro = format_bullets(summary.get('macro_drivers', []))
    brand = format_bullets(summary.get('brand_drivers', []))
    competitive = format_bullets(summary.get('competitive_drivers', []))
    
    return f"""
    <div style="background:#f8fafc; border-radius:12px; padding:16px; margin:8px 0; font-family:-apple-system,sans-serif;">
        {thinking_html}
        
        <div style="border-top:1px solid #e2e8f0; padding-top:12px; margin-top:8px;">
            <div style="color:#1e40af; font-weight:600; margin-bottom:8px;">🌍 Market News</div>
            <div style="font-size:13px; margin-bottom:16px;">{macro}</div>
            
            <div style="color:#059669; font-weight:600; margin-bottom:8px;">🏷️ Brand News</div>
            <div style="font-size:13px; margin-bottom:16px;">{brand}</div>
            
            <div style="color:#d97706; font-weight:600; margin-bottom:8px;">⚔️ Competitor News</div>
            <div style="font-size:13px;">{competitive}</div>
        </div>
    </div>
    """

# Progress tracking state
progress_state = {"hypotheses": {}, "queries": {}, "processed": {}, "step": ""}

def update_progress_display(question):
    """Update the display with current progress."""
    step = progress_state["step"]
    hyps = progress_state["hypotheses"]
    queries = progress_state["queries"]
    processed = progress_state.get("processed", {})
    
    # Build progress HTML
    steps_html = ""
    step_order = ["parsing", "hypotheses", "queries", "processing", "summarizing"]
    step_labels = {
        "parsing": "📝 Parsing question...",
        "hypotheses": "💡 Generating hypotheses...",
        "queries": "🔎 Creating search queries...",
        "processing": "🔄 Processing hypotheses (search → validate) in parallel...",
        "summarizing": "📊 Combining results..."
    }
    
    for s in step_order:
        if step == s:
            steps_html += f"<div style='color:#3b82f6; font-size:13px;'>⏳ {step_labels[s]}</div>"
        elif step in step_order and step_order.index(s) < step_order.index(step):
            steps_html += f"<div style='color:#10b981; font-size:12px;'>✓ {step_labels[s].split('...')[0]}</div>"
    
    # Build hypotheses preview with their queries (if available)
    hyp_preview = ""
    if hyps:
        hyp_preview = "<div style='margin-top:8px; padding:8px; background:#f1f5f9; border-radius:6px; font-size:11px; color:#64748b;'>"
        hyp_preview += "<div style='font-weight:600; margin-bottom:4px;'>Hypotheses & Queries:</div>"
        for cat, h_list in hyps.items():
            if h_list:
                hyp_preview += f"<div style='margin-top:6px; font-weight:500;'>{cat}:</div>"
                for h in h_list[:2]:  # Show first 2 hypotheses per category
                    hyp_preview += f"<div style='margin-left:8px;'>• {h.get('hypothesis', '')[:40]}...</div>"
                    h_queries = h.get('queries', []) or []
                    for q in h_queries[:1]:  # Show first query
                        hyp_preview += f"<div style='margin-left:16px; color:#94a3b8;'>→ {q[:50]}...</div>"
        hyp_preview += "</div>"
    
    # No separate query preview needed
    query_preview = ""
    
    with output_area:
        clear_output(wait=True)
        display(HTML(f"""
        <div style="background:#3b82f6; color:white; border-radius:12px; padding:12px 16px; margin:8px 0; font-family:-apple-system,sans-serif;">
            <strong>You:</strong> {question}
        </div>
        <div style='padding:12px; font-family:-apple-system,sans-serif;'>
            {steps_html}
            {hyp_preview}
            {query_preview}
        </div>
        """))

def on_submit(b):
    question = question_input.value.strip()
    if not question:
        return
    
    # Reset progress state (including results left over from a previous question)
    progress_state["hypotheses"] = {}
    progress_state["queries"] = {}
    progress_state["processed"] = {}
    progress_state["step"] = "parsing"
    
    def progress_callback(step, data):
        """Handle progress updates from the engine."""
        if step == "hypotheses_start":
            progress_state["step"] = "hypotheses"
        elif step == "hypotheses_done":
            progress_state["hypotheses"] = data or {}
            progress_state["step"] = "queries"
        elif step == "queries_done":
            progress_state["queries"] = data or {}
            progress_state["step"] = "processing"
        elif step == "processing":
            progress_state["step"] = "processing"
        elif step == "processed":
            progress_state["processed"] = data or {}
            progress_state["step"] = "summarizing"
        elif step == "summarizing":
            progress_state["step"] = "summarizing"
        
        # Update display
        update_progress_display(question)
    
    update_progress_display(question)
    
    try:
        # Run analysis with progress callback
        results = qa_engine.analyze(question, progress_callback=progress_callback)
        
        with output_area:
            clear_output(wait=True)
            
            # Show question
            display(HTML(f"""
            <div style="background:#3b82f6; color:white; border-radius:12px; padding:12px 16px; margin:8px 0; font-family:-apple-system,sans-serif;">
                <strong>You:</strong> {question}
            </div>
            """))
            
            # Show response
            display(HTML(format_chat_response(results)))
            
    except Exception as e:
        with output_area:
            clear_output(wait=True)
            display(HTML(f"""
            <div style="background:#3b82f6; color:white; border-radius:12px; padding:12px 16px; margin:8px 0;">
                <strong>You:</strong> {question}
            </div>
            <div style='color:#dc2626; padding:12px; background:#fee2e2; border-radius:8px; margin-top:8px;'>
                ❌ Error: {str(e)}
            </div>
            """))

submit_btn.on_click(on_submit)

# Display the chat interface
display(HTML("""
<div style="font-family:-apple-system,sans-serif; margin-bottom:16px;">
    <h2 style="margin:0;">🔬 QA Engine</h2>
    <p style="color:#64748b; margin:4px 0 0 0; font-size:13px;">Ask about brand metric changes (Salient, mental availability)</p>
</div>
"""))
display(question_input)
display(submit_btn)
display(output_area)


Textarea(value='', layout=Layout(height='80px', width='100%'), placeholder='Ask about brand metric changes... …

Button(button_style='primary', description='🔍 Search', layout=Layout(width='120px'), style=ButtonStyle())

Output()
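
The chat cell interpolates the raw question text into HTML, so a question containing `<`, `>` or quotes could break the rendered bubble. A minimal sketch of escaping the text first with the standard library's `html.escape` follows; `question_bubble` is a hypothetical helper and its inline styles are copied from the cell above only for illustration.

```python
import html


def question_bubble(question: str) -> str:
    """Return the 'You:' bubble markup with the question text HTML-escaped."""
    safe_question = html.escape(question)  # escapes &, <, > and both quote characters
    return (
        '<div style="background:#3b82f6; color:white; border-radius:12px; '
        'padding:12px 16px; margin:8px 0; font-family:-apple-system,sans-serif;">'
        f"<strong>You:</strong> {safe_question}</div>"
    )
```

The returned string can be passed to `display(HTML(...))` in place of the inline f-strings used in `update_progress_display` and `on_submit`.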