# Semantic Dissonance in RAG Systems

Semantic dissonance occurs when the relationships between a query and its retrieved knowledge are misaligned, producing unreliable or irrelevant results. This often happens in Retrieval-Augmented Generation (RAG) systems when embeddings fail to capture the true semantic intent of a query, resulting in comparisons that appear random or noisy.

This problem is particularly common when the task involves highly specific domains or when embeddings rely on general-purpose language models without domain-specific fine-tuning. Below, we’ll explore an example illustrating semantic dissonance and outline strategies to mitigate it.

---

## Use Case: RAG for SQL Table Retrieval

Imagine building a RAG system to help internal teams identify the most relevant SQL tables for specific business questions. In this example, we explore how different retrieval strategies can affect performance.

### Example Setup

The setup involves two distinct SQL table schemas and a series of hypothetical questions. The goal is to determine how well the RAG system retrieves the most relevant table based on the input query.

#### SQL Table Schemas:
1. **`sales.purchases`**: Contains highly detailed, raw user event data within product flows.
2. **`analytics.purchases`**: Summarized analytics with aggregated purchase data.

#### Hypothetical Questions:
1. What is the impact of IP address on the types of products viewed and purchased?
2. What is the overall trend in furniture sales this quarter?
3. Is there unusual behavior within a few seconds of each hour?
4. How does user engagement change around major events like New Year’s?

#### Metadata for Tables:
- Brief descriptions of each table.
- Example questions that each table is uniquely qualified to answer.
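
Given one similarity score per table, retrieval quality in this setup can be summarized by which table ranks first and by how far apart the top two scores are. The following is a minimal sketch of that idea, not the evaluation code used later. The scores are hypothetical, and the helper names and the 0.10 / 0.05 thresholds are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Illustrative thresholds (assumptions, not tuned values).
HIGH_CONFIDENCE = 0.10
MEDIUM_CONFIDENCE = 0.05


def rank_tables(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort tables from most to least similar to the query."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def separation(ranked: List[Tuple[str, float]]) -> float:
    """Gap between the best and second-best similarity (0.0 for a single table)."""
    if len(ranked) < 2:
        return 0.0
    return ranked[0][1] - ranked[1][1]


def confidence_label(gap: float) -> str:
    """Map a separation gap to a coarse confidence bucket."""
    if gap >= HIGH_CONFIDENCE:
        return "high"
    if gap >= MEDIUM_CONFIDENCE:
        return "medium"
    return "low"


# Hypothetical similarity scores for one query against the two tables.
example_scores = {"sales.purchases": 0.47, "analytics.purchases": 0.58}
ranked = rank_tables(example_scores)
gap = separation(ranked)
print(ranked[0][0], round(gap, 2), confidence_label(gap))
```

A small gap means the ranking could flip with minor changes to the query wording, so a correct top-1 result with low separation should be treated with caution.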

---

## Exploring Noisy Cosine Similarity

To highlight semantic dissonance, we compared unrelated text inputs ("garbage") against both the questions and the table schemas. The resulting cosine similarity scores are noisy: in the output shown later in this notebook, `E = mc^2` scores between roughly 0.30 and 0.38 against the business questions, even though it has nothing to do with any of them. A raw similarity score therefore says little on its own, which demonstrates how embeddings alone can fail to establish meaningful connections between queries and knowledge.
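
One way to make use of garbage comparisons is to treat their scores as a noise floor and check whether a real match clears it. The sketch below assumes that approach. The scores are illustrative values, and `noise_floor`, `clears_noise_floor`, and the `margin` default are hypothetical names and settings chosen for this example.

```python
from statistics import mean
from typing import Sequence


def noise_floor(garbage_scores: Sequence[float]) -> float:
    """Highest similarity that any unrelated input achieved."""
    return max(garbage_scores)


def clears_noise_floor(score: float, floor: float, margin: float = 0.05) -> bool:
    """True when a score exceeds the noise floor by at least `margin`."""
    return score - floor >= margin


# Illustrative similarities between unrelated inputs and real questions.
garbage_scores = [0.37, 0.38, 0.33, 0.34, 0.32, 0.30]
floor = noise_floor(garbage_scores)

print(f"mean garbage similarity: {mean(garbage_scores):.3f}")
print(f"noise floor: {floor:.2f}")
print("0.41 clears floor:", clears_noise_floor(0.41, floor))
print("0.74 clears floor:", clears_noise_floor(0.74, floor))
```

Using the maximum instead of the mean is a conservative choice: a candidate match only counts if it beats the most similar piece of unrelated text.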

---

## Comparing Retrieval Strategies

To better understand how to reduce semantic dissonance, we evaluated four retrieval strategies, combining different levels of context and metadata:

### Strategy 1: **Table Schema Only**
- Uses just the raw schema definitions for comparisons.
- Performance: Minimal semantic alignment. Queries often fail to map to the intended tables.

### Strategy 2: **Table Schema + Brief Description**
- Augments schema definitions with a concise summary of the table's purpose.
- Performance: Slight improvement. Context from descriptions helps guide matches, but results remain inconsistent.

### Strategy 3: **Sample Questions Only**
- Compares queries exclusively against the sample questions.
- Performance: Highly effective. Matching directly against example questions provides the most reliable semantic alignment.

### Strategy 4: **Table Schema + Brief Description + Sample Questions**
- Adds example questions that each table is uniquely qualified to answer to the schema and description.
- Performance: Significant improvement over the schema-based inputs. Sample questions create a bridge between the intent of the query and the table’s purpose.
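
The strategies above differ only in which pieces of text are joined together before embedding. A minimal sketch of that assembly step follows, under stated assumptions: `build_document` is a hypothetical helper, and the schema, description, and question strings are placeholders invented for illustration.

```python
from typing import Optional


def build_document(
    schema: Optional[str] = None,
    description: Optional[str] = None,
    sample_questions: Optional[str] = None,
) -> str:
    """Join whichever parts are provided into one text to embed."""
    parts = []
    if description:
        parts.append(f"Description: {description.strip()}")
    if sample_questions:
        parts.append(f"Sample Questions:\n{sample_questions.strip()}")
    if schema:
        parts.append(schema.strip())
    if not parts:
        raise ValueError("provide at least one of schema, description, sample_questions")
    return "\n\n".join(parts)


# Placeholder inputs, invented for illustration only.
schema = "CREATE TABLE analytics.purchases (purchase_date DATE, category TEXT, total NUMERIC);"
description = "Aggregated purchase data for trend analysis."
questions = "- What is the overall trend in furniture sales this quarter?"

documents = {
    "schema-only": build_document(schema=schema),
    "schema-desc": build_document(schema=schema, description=description),
    "schema-desc-questions": build_document(schema, description, questions),
    "questions-only": build_document(sample_questions=questions),
}
for name, text in documents.items():
    print(f"{name}: {len(text)} characters")
```

Each resulting string would then be embedded once and compared against the query embedding, so the strategies can be evaluated with the same ranking code.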

In [28]:
import re
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the embedding model (gte-large from Hugging Face)
model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Utility functions for embedding and cosine similarity
def get_embedding(text):
    """Generate an embedding for the input text."""
    return model.encode([text])[0]

def cosine_sim(v1, v2):
    """Calculate the cosine similarity between two vectors."""
    return cosine_similarity([v1], [v2])[0][0]

# Define table schemas, descriptions, and sample questions for the furniture industry
base_table_text_1 = """
CREATE TABLE customers.raw_customer_data (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone_number VARCHAR(20),
    address TEXT,
    customer_segment VARCHAR(50),
    activity_data JSONB,
    preferences JSONB,
    history JSONB
);
"""

# Product sales summary schema
base_table_text_2 = """
CREATE TABLE analytics.product_sales_summary (
    category_id INT PRIMARY KEY,
    category_name VARCHAR(50),
    metrics JSONB,
    performance_data JSONB,
    market_data JSONB,
    trends JSONB
);
"""

# Enhanced descriptions with strong business context
desc1 = """Comprehensive customer intelligence platform that serves as the foundation for our personalized retail experience strategy. This data warehouse captures the complete customer lifecycle, from initial browsing patterns to long-term loyalty behavior, enabling sophisticated customer understanding and engagement optimization.

Strategic Applications:
- Identify customers entering new life stages (moving, marriage, etc.) for targeted campaigns
- Predict luxury segment expansion opportunities through browsing pattern analysis
- Track customer journey evolution from first purchase to brand advocate
- Monitor cross-channel engagement patterns for omnichannel optimization
- Analyze style preference shifts for merchandising insights
- Detect early churn signals through engagement pattern changes
- Profile high-value customer segments for VIP program expansion

Key Business Impact:
- Powers our 1:1 personalization engine for targeted recommendations
- Drives proactive customer retention through early warning systems
- Enables dynamic customer segmentation for marketing campaigns
- Supports customer lifetime value optimization strategies
- Guides service level customization based on customer profiles"""

desc2 = """Enterprise-wide market intelligence system integrating product performance analytics with market dynamics for strategic business planning. This platform combines historical performance data, competitive intelligence, and market trends to drive data-informed merchandising and inventory decisions.

Strategic Capabilities:
- Forecast market demand shifts based on leading indicators
- Analyze cross-category purchase patterns for merchandising optimization
- Track product lifecycle stages across different market segments
- Monitor market share evolution in key geographic regions
- Optimize assortment planning based on local preferences
- Evaluate promotion effectiveness across customer segments
- Plan inventory allocation based on regional dynamics

Business Applications:
- Guides seasonal collection planning and refresh cycles
- Drives market expansion and penetration strategies
- Informs pricing optimization across product categories
- Supports efficient inventory distribution networks
- Enables data-driven merchandising decisions
- Powers competitive positioning strategies
- Optimizes promotion planning and execution"""

# Sample questions enriched with business context
sq1 = """
- Which customers are showing early indicators of transitioning to luxury furniture segments?
- How can we identify customers likely to renovate their entire home based on recent browsing patterns?
- Which first-time buyers show potential for becoming lifetime customers?
- What customer segments are most responsive to our designer collaboration collections?
- Which loyal customers are at risk of switching to competitors based on engagement patterns?
- How do we identify customers ready for our premium design consultation service?
- Which customers' browsing patterns indicate upcoming major purchases?
- How can we predict which customers will respond best to our seasonal collection previews?
- What behavioral patterns indicate a customer's potential for VIP program enrollment?
- Which customers should receive priority access to our limited edition collections?
"""

sq2 = """
- How do weather patterns impact seasonal furniture preferences across regions?
- What market signals indicate emerging style trends in urban vs. suburban areas?
- Which product categories show complementary purchase patterns in premium segments?
- How does market saturation affect premium furniture pricing by region?
- What seasonal factors influence outdoor furniture performance in different climates?
- How do macroeconomic indicators affect luxury furniture segment performance?
- Which product combinations drive highest customer lifetime value?
- How do regional design preferences impact collection performance?
- What market conditions signal optimal timing for new collection launches?
- How does competitive pricing affect our premium line performance?
"""

# Questions requiring deep contextual understanding
qas = [
    ("customers.raw_customer_data", "Which customers show early signs of upgrading their entire home furnishing style?"),  
    ("analytics.product_sales_summary", "How should we adjust our collection launch timing for different climate zones?"),  
    ("customers.raw_customer_data", "Which customers are most likely to become brand advocates for our artisan collection?"),  
    ("analytics.product_sales_summary", "What product mix will maximize market share in emerging urban markets?"),  
    ("customers.raw_customer_data", "Which customers should receive priority access to our limited edition designer collaboration?"),  
    ("analytics.product_sales_summary", "How should we optimize our showroom layouts for the upcoming season across different regions?") 
]

In [29]:
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


class RAGMetrics:
    def __init__(self):
        self.correct = 0
        self.total = 0
        self.total_separation = 0.0
        self.high_confidence_correct = 0  # Correct predictions with high separation
        self.low_confidence_mistakes = 0  # Wrong predictions with low separation
        self.separations = []  # Store all separations for statistical analysis

    @property
    def accuracy(self):
        return (self.correct / self.total * 100) if self.total > 0 else 0

    @property
    def avg_separation(self):
        return self.total_separation / self.total if self.total > 0 else 0

    @property
    def high_confidence_accuracy(self):
        high_conf_total = len([s for s in self.separations if abs(s) >= 0.1])
        return (self.high_confidence_correct / high_conf_total * 100) if high_conf_total > 0 else 0


# Separation thresholds shared by both helpers
HIGH_CONFIDENCE = 0.1
MEDIUM_CONFIDENCE = 0.05


def get_confidence_color(separation: float) -> str:
    """Return the terminal color that matches the separation's confidence level."""
    if abs(separation) >= HIGH_CONFIDENCE:
        return bcolors.OKGREEN
    elif abs(separation) >= MEDIUM_CONFIDENCE:
        return bcolors.WARNING
    return bcolors.FAIL


def print_confidence_indicator(separation: float) -> str:
    """Return (not print) the confidence label for a separation value."""
    if abs(separation) >= HIGH_CONFIDENCE:
        return "HIGH CONFIDENCE"
    elif abs(separation) >= MEDIUM_CONFIDENCE:
        return "MEDIUM CONFIDENCE"
    return "LOW CONFIDENCE"

In [30]:
# Garbage inputs for comparison
garbage_inputs = [
    "E = mc^2",
    "Cristiano Ronaldo is the best football player in the world",
    "Fernando Alonso won the Formula 1 championship twice",
    "Unrelated text about quantum physics",
]

print(f"\n{bcolors.HEADER}### Step 1: Analyzing similarities between garbage inputs and questions/table schemas ###{bcolors.ENDC}\n")

for garbage in garbage_inputs:
    print(f"{bcolors.OKBLUE}Garbage Input: {garbage}{bcolors.ENDC}")
    emb_garbage = get_embedding(garbage)

    # Compare garbage input with questions
    print(f"\n{bcolors.BOLD}Comparing garbage input with questions:{bcolors.ENDC}")
    max_question_similarity = -1
    most_similar_question = None

    for i, (_, question) in enumerate(qas):
        emb_question = get_embedding(question)
        similarity = cosine_sim(emb_garbage, emb_question)
        if similarity > max_question_similarity:
            max_question_similarity = similarity
            most_similar_question = question
        similarity_color = bcolors.OKGREEN if similarity > 0.5 else bcolors.WARNING if similarity > 0.3 else bcolors.FAIL
        print(f"{similarity_color}  Q{i+1}: '{question}'\n    Similarity: {similarity:.3f}{bcolors.ENDC}")

    if most_similar_question:
        print(f"\n{bcolors.OKGREEN}Most similar question: '{most_similar_question}' "
              f"with similarity {max_question_similarity:.3f}{bcolors.ENDC}")

    # Compare garbage input with table schemas
    print(f"\n{bcolors.BOLD}Comparing garbage input with table schemas:{bcolors.ENDC}")
    max_table_similarity = -1
    most_similar_table = None

    for i, (table_name, table_text) in enumerate([
        ("Table 1", base_table_text_1),
        ("Table 2", base_table_text_2),
    ]):
        emb_table = get_embedding(table_text)
        similarity = cosine_sim(emb_garbage, emb_table)
        if similarity > max_table_similarity:
            max_table_similarity = similarity
            most_similar_table = table_name
        similarity_color = bcolors.OKGREEN if similarity > 0.5 else bcolors.WARNING if similarity > 0.3 else bcolors.FAIL
        print(f"{similarity_color}  {table_name}: Schema Similarity: {similarity:.3f}{bcolors.ENDC}")

    if most_similar_table:
        print(f"\n{bcolors.OKGREEN}Most similar table schema: {most_similar_table} "
              f"with similarity {max_table_similarity:.3f}{bcolors.ENDC}")

    print(f"\n{bcolors.BOLD}--- End of analysis for this garbage input ---{bcolors.ENDC}\n")



### Step 1: Analyzing similarities between garbage inputs and questions/table schemas ###

Garbage Input: E = mc^2

Comparing garbage input with questions:
  Q1: 'Which customers show early signs of upgrading their entire home furnishing style?'
    Similarity: 0.366
  Q2: 'How should we adjust our collection launch timing for different climate zones?'
    Similarity: 0.382
  Q3: 'Which customers are most likely to become brand advocates for our artisan collection?'
    Similarity: 0.327
  Q4: 'What product mix will maximize market share in emerging urban markets?'
    Similarity: 0.341
  Q5: 'Which customers should receive priority access to our limited edition designer collaboration?'
    Similarity: 0.320
  Q6: 'How should we optimize our showroom layouts for the upcoming season across different regions?'
    Similarity: 0.297

Most similar question: 'How should we adjust our collection launch timi

In [31]:
# Strategy 1: Table schema without enrichment
def run_strategy_1(base_table_text_1, base_table_text_2, qas, rankings_by_question):
    method = "table-schema-no-enrichment"
    metrics = RAGMetrics()  # Initialize metrics for this strategy

    # Handle the case where rankings_by_question is None
    if rankings_by_question is None:
        rankings_by_question = {}

    tables = [
        ("customers.raw_customer_data", base_table_text_1),
        ("analytics.product_sales_summary", base_table_text_2),
    ]

    # Generate embeddings for the tables
    emb_table1 = get_embedding(base_table_text_1)
    emb_table2 = get_embedding(base_table_text_2)

    print(f"\n{bcolors.HEADER}### Strategy 1: Table Schema Without Enrichment ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}Cosine similarity between Table 1 and Table 2 schemas: "
          f"{round(cosine_sim(emb_table1, emb_table2), 3)}{bcolors.ENDC}\n")

    HIGH_CONFIDENCE = 0.1
    MEDIUM_CONFIDENCE = 0.05

    for (correct_table, question) in qas:
        print(f"{bcolors.OKBLUE}Processing Question: '{question}'{bcolors.ENDC}")
        print(f"{bcolors.OKGREEN}Correct Table: {correct_table}{bcolors.ENDC}\n")

        question_emb = get_embedding(question)
        table_rankings = []

        for (table_name, table_schema) in tables:
            table_emb = get_embedding(table_schema)
            similarity = round(cosine_sim(question_emb, table_emb), 3)
            table_rankings.append((similarity, table_name))
            print(f"{bcolors.OKBLUE}  Cosine Similarity with {table_name}: {similarity:.3f}{bcolors.ENDC}")

        table_rankings = sorted(table_rankings, key=lambda x: x[0], reverse=True)

        # Calculate separation and update metrics
        separation = table_rankings[0][0] - table_rankings[1][0] if len(table_rankings) > 1 else 0
        metrics.total += 1
        metrics.total_separation += separation
        metrics.separations.append(separation)

        # Update metrics based on correctness and confidence
        correct_prediction = table_rankings[0][1] == correct_table
        if correct_prediction:
            metrics.correct += 1
            if abs(separation) >= HIGH_CONFIDENCE:
                metrics.high_confidence_correct += 1
        elif abs(separation) < MEDIUM_CONFIDENCE:
            metrics.low_confidence_mistakes += 1

        # Print rankings and confidence indicators
        for rank, (similarity, table_name) in enumerate(table_rankings, start=1):
            is_correct = table_name == correct_table
            confidence_color = get_confidence_color(separation)
            text = f"  Rank {rank}: {table_name} (Similarity: {similarity:.3f})"
            print(f"{bcolors.OKGREEN if is_correct else bcolors.FAIL}{text}{' ✓' if is_correct else ' ✗'}{bcolors.ENDC}")

        # Print separation with confidence level
        confidence_color = get_confidence_color(separation)
        confidence_text = print_confidence_indicator(separation)
        print(f"{confidence_color}  Separation: {separation:+.3f} - {confidence_text}{bcolors.ENDC}\n")

        # Store results in rankings_by_question
        if question not in rankings_by_question:
            rankings_by_question[question] = {}
        rankings_by_question[question][method] = table_rankings

    # Print strategy-level summary metrics
    print(f"{bcolors.HEADER}### Summary for {method} ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Accuracy: {metrics.accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Average Separation: {metrics.avg_separation:.3f}{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- High Confidence Accuracy: {metrics.high_confidence_accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Low Confidence Mistakes: {metrics.low_confidence_mistakes}{bcolors.ENDC}\n")

    return rankings_by_question


# Example usage:
# No rankings exist yet, so pass None; the function creates the dict and
# returns it so that later strategies can add their results to it.
rankings_by_question = run_strategy_1(base_table_text_1, base_table_text_2, qas, None)



### Strategy 1: Table Schema Without Enrichment ###
Cosine similarity between Table 1 and Table 2 schemas: 0.8510000109672546

Processing Question: 'Which customers show early signs of upgrading their entire home furnishing style?'
Correct Table: customers.raw_customer_data

  Cosine Similarity with customers.raw_customer_data: 0.466
  Cosine Similarity with analytics.product_sales_summary: 0.456
  Rank 1: customers.raw_customer_data (Similarity: 0.466) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.456) ✗
  Separation: +0.010 - LOW CONFIDENCE

Processing Question: 'How should we adjust our collection launch timing for different climate zones?'
Correct Table: analytics.product_sales_summary

  Cosine Similarity with customers.raw_customer_data: 0.420
  Cosine Similarity with analytics.product_sales_summary: 0.406
  Rank 1: customers.raw_customer_data (S

In [32]:
# Strategy 2: Table schema with enrichment
def run_strategy_2(base_table_text_1, base_table_text_2, desc1, desc2, qas, rankings_by_question):
    method = "table-schema-desc"
    metrics = RAGMetrics()  # Initialize metrics for this strategy

    # Handle the case where rankings_by_question is None
    if rankings_by_question is None:
        rankings_by_question = {}

    table1 = f"""
    Description: {desc1}

    {base_table_text_1}
    """
    table2 = f"""
    Description: {desc2}

    {base_table_text_2}
    """
    tables = [
        ("customers.raw_customer_data", table1),
        ("analytics.product_sales_summary", table2),
    ]

    # Generate embeddings for the tables with descriptions
    emb_table1 = get_embedding(table1)
    emb_table2 = get_embedding(table2)

    print(f"\n{bcolors.HEADER}### Strategy 2: Table Schema With Enrichment ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}Cosine similarity between Table 1 and Table 2 schemas (with descriptions): "
          f"{round(cosine_sim(emb_table1, emb_table2), 3)}{bcolors.ENDC}\n")

    HIGH_CONFIDENCE = 0.1
    MEDIUM_CONFIDENCE = 0.05

    for (correct_table, question) in qas:
        print(f"{bcolors.OKBLUE}Processing Question: '{question}'{bcolors.ENDC}")
        print(f"{bcolors.OKGREEN}Correct Table: {correct_table}{bcolors.ENDC}\n")

        question_emb = get_embedding(question)
        table_rankings = []

        for (table_name, enriched_table) in tables:
            table_emb = get_embedding(enriched_table)
            similarity = round(cosine_sim(question_emb, table_emb), 3)
            table_rankings.append((similarity, table_name))
            print(f"{bcolors.OKBLUE}  Cosine Similarity with {table_name}: {similarity:.3f}{bcolors.ENDC}")

        table_rankings = sorted(table_rankings, key=lambda x: x[0], reverse=True)

        # Calculate separation and update metrics
        separation = table_rankings[0][0] - table_rankings[1][0] if len(table_rankings) > 1 else 0
        metrics.total += 1
        metrics.total_separation += separation
        metrics.separations.append(separation)

        correct_prediction = table_rankings[0][1] == correct_table
        if correct_prediction:
            metrics.correct += 1
            if abs(separation) >= HIGH_CONFIDENCE:
                metrics.high_confidence_correct += 1
        elif abs(separation) < MEDIUM_CONFIDENCE:
            metrics.low_confidence_mistakes += 1

        # Print rankings with confidence indicators
        for rank, (similarity, table_name) in enumerate(table_rankings, start=1):
            is_correct = table_name == correct_table
            confidence_color = get_confidence_color(separation)
            text = f"  Rank {rank}: {table_name} (Similarity: {similarity:.3f})"
            print(f"{bcolors.OKGREEN if is_correct else bcolors.FAIL}{text}{' ✓' if is_correct else ' ✗'}{bcolors.ENDC}")

        # Print separation with confidence level
        confidence_color = get_confidence_color(separation)
        confidence_text = print_confidence_indicator(separation)
        print(f"{confidence_color}  Separation: {separation:+.3f} - {confidence_text}{bcolors.ENDC}\n")

        # Store results in rankings_by_question
        if question not in rankings_by_question:
            rankings_by_question[question] = {}
        rankings_by_question[question][method] = table_rankings

    # Print strategy-level summary metrics
    print(f"{bcolors.HEADER}### Summary for {method} ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Accuracy: {metrics.accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Average Separation: {metrics.avg_separation:.3f}{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- High Confidence Accuracy: {metrics.high_confidence_accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Low Confidence Mistakes: {metrics.low_confidence_mistakes}{bcolors.ENDC}\n")

    return rankings_by_question


# Example usage:
# Ensure rankings_by_question is carried forward or passed appropriately
rankings_by_question = run_strategy_2(
    base_table_text_1, base_table_text_2, desc1, desc2, qas, rankings_by_question
)



### Strategy 2: Table Schema With Enrichment ###
Cosine similarity between Table 1 and Table 2 schemas (with descriptions): 0.7710000276565552

Processing Question: 'Which customers show early signs of upgrading their entire home furnishing style?'
Correct Table: customers.raw_customer_data

  Cosine Similarity with customers.raw_customer_data: 0.560
  Cosine Similarity with analytics.product_sales_summary: 0.510
  Rank 1: customers.raw_customer_data (Similarity: 0.560) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.510) ✗
  Separation: +0.050 - MEDIUM CONFIDENCE

Processing Question: 'How should we adjust our collection launch timing for different climate zones?'
Correct Table: analytics.product_sales_summary

  Cosine Similarity with customers.raw_customer_data: 0.439
  Cosine Similarity with analytics.product_sales_summary: 0.483
  Rank 1: analytics.

In [33]:
def run_strategy_3(sq1, sq2, qas, rankings_by_question):
    method = "sample-questions-only"
    metrics = RAGMetrics()  # Initialize metrics for this strategy

    # Handle the case where rankings_by_question is None
    if rankings_by_question is None:
        rankings_by_question = {}

    sample_questions = [
        ("customers.raw_customer_data", sq1),
        ("analytics.product_sales_summary", sq2),
    ]

    # Generate embeddings for sample questions
    emb_sq1 = get_embedding(sq1)
    emb_sq2 = get_embedding(sq2)

    print(f"\n{bcolors.HEADER}### Strategy 3: Sample Questions Only ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}Cosine similarity between sample questions for Table 1 and Table 2: "
          f"{round(cosine_sim(emb_sq1, emb_sq2), 3)}{bcolors.ENDC}\n")

    HIGH_CONFIDENCE = 0.1
    MEDIUM_CONFIDENCE = 0.05

    for (correct_table, question) in qas:
        print(f"{bcolors.OKBLUE}Processing Question: '{question}'{bcolors.ENDC}")
        print(f"{bcolors.OKGREEN}Correct Table: {correct_table}{bcolors.ENDC}\n")

        question_emb = get_embedding(question)
        table_rankings = []

        for (table_name, sample_qs) in sample_questions:
            sample_qs_emb = get_embedding(sample_qs)
            similarity = round(cosine_sim(question_emb, sample_qs_emb), 3)
            table_rankings.append((similarity, table_name))
            print(f"{bcolors.OKBLUE}  Cosine Similarity with {table_name} (sample questions): {similarity:.3f}{bcolors.ENDC}")

        table_rankings = sorted(table_rankings, key=lambda x: x[0], reverse=True)

        # Calculate separation and update metrics
        separation = table_rankings[0][0] - table_rankings[1][0] if len(table_rankings) > 1 else 0
        metrics.total += 1
        metrics.total_separation += separation
        metrics.separations.append(separation)

        correct_prediction = table_rankings[0][1] == correct_table
        if correct_prediction:
            metrics.correct += 1
            if abs(separation) >= HIGH_CONFIDENCE:
                metrics.high_confidence_correct += 1
        elif abs(separation) < MEDIUM_CONFIDENCE:
            metrics.low_confidence_mistakes += 1

        # Print rankings with confidence indicators
        for rank, (similarity, table_name) in enumerate(table_rankings, start=1):
            is_correct = table_name == correct_table
            confidence_color = get_confidence_color(separation)
            text = f"  Rank {rank}: {table_name} (Similarity: {similarity:.3f})"
            print(f"{bcolors.OKGREEN if is_correct else bcolors.FAIL}{text}{' ✓' if is_correct else ' ✗'}{bcolors.ENDC}")

        # Print separation with confidence level
        confidence_color = get_confidence_color(separation)
        confidence_text = print_confidence_indicator(separation)
        print(f"{confidence_color}  Separation: {separation:+.3f} - {confidence_text}{bcolors.ENDC}\n")

        # Store results in rankings_by_question
        if question not in rankings_by_question:
            rankings_by_question[question] = {}
        rankings_by_question[question][method] = table_rankings

    # Print strategy-level summary metrics
    print(f"{bcolors.HEADER}### Summary for {method} ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Accuracy: {metrics.accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Average Separation: {metrics.avg_separation:.3f}{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- High Confidence Accuracy: {metrics.high_confidence_accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Low Confidence Mistakes: {metrics.low_confidence_mistakes}{bcolors.ENDC}\n")

    return rankings_by_question

# Example usage:
rankings_by_question = run_strategy_3(sq1, sq2, qas, rankings_by_question)



### Strategy 3: Sample Questions Only ###
Cosine similarity between sample questions for Table 1 and Table 2: 0.7670000195503235

Processing Question: 'Which customers show early signs of upgrading their entire home furnishing style?'
Correct Table: customers.raw_customer_data

  Cosine Similarity with customers.raw_customer_data (sample questions): 0.737
  Cosine Similarity with analytics.product_sales_summary (sample questions): 0.640
  Rank 1: customers.raw_customer_data (Similarity: 0.737) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.640) ✗
  Separation: +0.097 - MEDIUM CONFIDENCE

Processing Question: 'How should we adjust our collection launch timing for different climate zones?'
Correct Table: analytics.product_sales_summary

  Cosine Similarity with customers.raw_customer_data (sample questions): 0.486
  Cosine Similarity with analytics.product_sales_s

In [34]:
# Strategy 4: Table Schema, Short Description, and Sample Questions
def run_strategy_4(base_table_text_1, base_table_text_2, desc1, desc2, sq1, sq2, qas, rankings_by_question):
    method = "table-schema-desc-questions"
    metrics = RAGMetrics()  # Initialize metrics for this strategy

    # Handle the case where rankings_by_question is None
    if rankings_by_question is None:
        rankings_by_question = {}

    table1 = f"""
    Description: {desc1}

    Sample Questions:
    {sq1}

    {base_table_text_1}
    """

    table2 = f"""
    Description: {desc2}

    Sample Questions:
    {sq2}

    {base_table_text_2}
    """

    tables = [
        ("customers.raw_customer_data", table1),
        ("analytics.product_sales_summary", table2),
    ]

    # Generate embeddings for enriched table schemas
    emb_table1 = get_embedding(table1)
    emb_table2 = get_embedding(table2)

    print(f"\n{bcolors.HEADER}### Strategy 4: Table Schema, Short Description, and Sample Questions ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}Cosine similarity between enriched Table 1 and Table 2: "
          f"{round(cosine_sim(emb_table1, emb_table2), 3)}{bcolors.ENDC}\n")

    HIGH_CONFIDENCE = 0.1
    MEDIUM_CONFIDENCE = 0.05

    for (correct_table, question) in qas:
        print(f"{bcolors.OKBLUE}Processing Question: '{question}'{bcolors.ENDC}")
        print(f"{bcolors.OKGREEN}Correct Table: {correct_table}{bcolors.ENDC}\n")

        question_emb = get_embedding(question)
        table_rankings = []

        for (table_name, enriched_table) in tables:
            enriched_table_emb = get_embedding(enriched_table)
            similarity = round(cosine_sim(question_emb, enriched_table_emb), 3)
            table_rankings.append((similarity, table_name))
            print(f"{bcolors.OKBLUE}  Cosine Similarity with {table_name} (enriched schema): {similarity:.3f}{bcolors.ENDC}")

        table_rankings = sorted(table_rankings, key=lambda x: x[0], reverse=True)

        # Calculate separation and update metrics
        separation = table_rankings[0][0] - table_rankings[1][0] if len(table_rankings) > 1 else 0
        metrics.total += 1
        metrics.total_separation += separation
        metrics.separations.append(separation)

        correct_prediction = table_rankings[0][1] == correct_table
        if correct_prediction:
            metrics.correct += 1
            if abs(separation) >= HIGH_CONFIDENCE:
                metrics.high_confidence_correct += 1
        elif abs(separation) < MEDIUM_CONFIDENCE:
            metrics.low_confidence_mistakes += 1

        # Print rankings, marking the expected table with ✓ and any other table with ✗
        for rank, (similarity, table_name) in enumerate(table_rankings, start=1):
            is_correct = table_name == correct_table
            text = f"  Rank {rank}: {table_name} (Similarity: {similarity:.3f})"
            print(f"{bcolors.OKGREEN if is_correct else bcolors.FAIL}{text}{' ✓' if is_correct else ' ✗'}{bcolors.ENDC}")

        # Print separation with confidence level
        confidence_color = get_confidence_color(separation)
        confidence_text = print_confidence_indicator(separation)
        print(f"{confidence_color}  Separation: {separation:+.3f} - {confidence_text}{bcolors.ENDC}\n")

        # Store results in rankings_by_question
        if question not in rankings_by_question:
            rankings_by_question[question] = {}
        rankings_by_question[question][method] = table_rankings

    # Print strategy-level summary metrics
    print(f"{bcolors.HEADER}### Summary for {method} ###{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Accuracy: {metrics.accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Average Separation: {metrics.avg_separation:.3f}{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- High Confidence Accuracy: {metrics.high_confidence_accuracy:.1f}%{bcolors.ENDC}")
    print(f"{bcolors.OKBLUE}- Low Confidence Mistakes: {metrics.low_confidence_mistakes}{bcolors.ENDC}\n")

    return rankings_by_question


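# Example usage (assumes base_table_text_1, base_table_text_2, desc1, desc2, sq1, sq2, qas,
# and rankings_by_question were defined in earlier cells)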
rankings_by_question = run_strategy_4(
    base_table_text_1, base_table_text_2, desc1, desc2, sq1, sq2, qas, rankings_by_question
)



### Strategy 4: Table Schema, Short Description, and Sample Questions ###
Cosine similarity between enriched Table 1 and Table 2: 0.7990000247955322

Processing Question: 'Which customers show early signs of upgrading their entire home furnishing style?'
Correct Table: customers.raw_customer_data

  Cosine Similarity with customers.raw_customer_data (enriched schema): 0.636
  Cosine Similarity with analytics.product_sales_summary (enriched schema): 0.567
  Rank 1: customers.raw_customer_data (Similarity: 0.636) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.567) ✗
  Separation: +0.069 - MEDIUM CONFIDENCE

Processing Question: 'How should we adjust our collection launch timing for different climate zones?'
Correct Table: analytics.product_sales_summary

  Cosine Similarity with customers.raw_customer_data (enriched schema): 0.449
  Cosine Similarity with analytic

def print_results_comparison(rankings_by_question, qas):
    method_order = [
        "table-schema-no-enrichment",
        "table-schema-desc",
        "table-schema-desc-questions",
        "sample-questions-only",
    ]

    HIGH_CONFIDENCE = 0.1
    MEDIUM_CONFIDENCE = 0.05

    metrics = {method: RAGMetrics() for method in method_order}

    def print_comparison(question, correct_table, comps):
        print(f"\n{'=' * 100}")
        print(f"{bcolors.HEADER}{bcolors.BOLD}Question Analysis{bcolors.ENDC}")
        print(f"{bcolors.OKBLUE}Query: {question}{bcolors.ENDC}")
        print(f"{bcolors.OKGREEN}Expected Table: {correct_table}{bcolors.ENDC}")
        print(f"{'-' * 100}\n")

        best_confidence = -float('inf')
        best_method = None

        for method in method_order:
            if method not in comps:
                continue

            print(f"{bcolors.HEADER}Strategy: {method}{bcolors.ENDC}")
            ranking = comps[method]

            # Update metrics
            metrics[method].total += 1
            separation = ranking[0][0] - ranking[1][0] if len(ranking) > 1 else 0
            metrics[method].separations.append(separation)
            metrics[method].total_separation += separation

            correct_prediction = ranking[0][1] == correct_table
            if correct_prediction:
                metrics[method].correct += 1
                if abs(separation) >= HIGH_CONFIDENCE:
                    metrics[method].high_confidence_correct += 1
            elif abs(separation) < MEDIUM_CONFIDENCE:
                metrics[method].low_confidence_mistakes += 1

            # Print rankings, marking the expected table with ✓ and any other table with ✗
            for rank, (similarity, table_name) in enumerate(ranking, start=1):
                is_correct = table_name == correct_table
                text = f"  Rank {rank}: {table_name} (Similarity: {similarity:.3f})"
                print(f"{bcolors.OKGREEN if is_correct else bcolors.FAIL}{text}{' ✓' if is_correct else ' ✗'}{bcolors.ENDC}")

            # Print separation with confidence level
            confidence_color = get_confidence_color(separation)
            confidence_text = print_confidence_indicator(separation)
            print(f"{confidence_color}  Separation: {separation:+.3f} - {confidence_text}{bcolors.ENDC}\n")

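            # Track the correct method with the widest margin. Comparing against abs(-inf)
            # would never succeed, so compare the raw separation with best_confidence instead.
            if correct_prediction and separation > best_confidence: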
                best_confidence = separation
                best_method = method

        print(f"{bcolors.HEADER}Question Summary{bcolors.ENDC}")
        if best_method:
            print(f"{bcolors.OKGREEN}Best performing method: {best_method} (separation: {best_confidence:.3f}){bcolors.ENDC}")
        else:
            print(f"{bcolors.FAIL}No method predicted correctly with confidence{bcolors.ENDC}")

    # Generate comparison for all questions
    print(f"{bcolors.HEADER}\n=== Detailed Strategy Analysis ===\n{bcolors.ENDC}")
    for correct_table, question in qas:
        comps = {method: rankings_by_question[question][method]
                 for method in method_order
                 if method in rankings_by_question.get(question, {})}
        print_comparison(question, correct_table, comps)

    # Print enhanced summary statistics
    print(f"{bcolors.HEADER}\n=== Strategy Performance Analysis ==={bcolors.ENDC}")
    print("\nMethod                                  Accuracy  Avg Sep  High Conf  Low Conf Mistakes")
    print("-" * 90)

    for method in method_order:
        m = metrics[method]
        color = (bcolors.OKGREEN if m.accuracy >= 75 and m.avg_separation >= HIGH_CONFIDENCE
                 else bcolors.WARNING if m.accuracy >= 50 and m.avg_separation >= MEDIUM_CONFIDENCE
                 else bcolors.FAIL)

        print(f"{color}{method:40s} {m.accuracy:8.1f}% {m.avg_separation:8.3f} {m.high_confidence_accuracy:9.1f}% {m.low_confidence_mistakes:>16d}{bcolors.ENDC}")

    # Find best performing method considering both accuracy and confidence
    best_method = max(method_order,
                      key=lambda m: (metrics[m].accuracy * 0.4 +
                                     metrics[m].high_confidence_accuracy * 0.4 +
                                     metrics[m].avg_separation * 100 * 0.2))

    print(f"\n{bcolors.BOLD}Best Overall Method: {bcolors.OKGREEN}{best_method}{bcolors.ENDC}")
    print(f"{bcolors.BOLD}Performance Metrics:{bcolors.ENDC}")
    print(f"- Accuracy: {metrics[best_method].accuracy:.1f}%")
    print(f"- High Confidence Accuracy: {metrics[best_method].high_confidence_accuracy:.1f}%")
    print(f"- Average Separation: {metrics[best_method].avg_separation:.3f}")
    print(f"- Low Confidence Mistakes: {metrics[best_method].low_confidence_mistakes}")


        
print_results_comparison(rankings_by_question, qas)  

=== Detailed Strategy Analysis ===


Question Analysis
Query: Which customers show early signs of upgrading their entire home furnishing style?
Expected Table: customers.raw_customer_data
----------------------------------------------------------------------------------------------------

Strategy: table-schema-no-enrichment
  Rank 1: customers.raw_customer_data (Similarity: 0.466) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.456) ✗
  Separation: +0.010 - LOW CONFIDENCE

Strategy: table-schema-desc
  Rank 1: customers.raw_customer_data (Similarity: 0.560) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.510) ✗
  Separation: +0.050 - MEDIUM CONFIDENCE

Strategy: table-schema-desc-questions
  Rank 1: customers.raw_customer_data (Similarity: 0.636) ✓
  Rank 2: analytics.product_sales_summary (Similarity: 0.567) ✗
  Separat

# Reflection Questions on Semantic Dissonance and RAG Systems

### Understanding the Problem
1. What is semantic dissonance, and how does it affect the reliability of a RAG system?
2. Why do general-purpose embedding models struggle with domain-specific tasks?
3. In what ways can noisy cosine similarity scores mislead a RAG system’s performance?

### Diagnosing Issues
4. How can you determine if your RAG system is experiencing semantic dissonance?
5. What are the limitations of using raw schema definitions for retrieval tasks?
6. How might incomplete or poorly structured metadata contribute to irrelevant results?

### Exploring Solutions
7. How does enriching table schemas with descriptions or sample questions improve retrieval accuracy?
8. What are the trade-offs between using sample questions alone versus combining schemas and metadata?
9. How could domain-specific fine-tuning of embedding models reduce semantic dissonance?

### Evaluating Strategies
10. Which of the four retrieval strategies presented (schema only, schema + description, schema + description + sample questions, sample questions only) do you think is most applicable to your domain, and why?
11. What additional metadata could be included to enhance the context for your RAG system?
12. How can you balance the complexity of adding metadata with the potential improvement in system performance?

### Broader Considerations
13. In what ways might the structure and quality of your knowledge base impact the effectiveness of a RAG system?
14. How can user feedback be incorporated into improving the ranking and retrieval strategies of your system?
15. What role does explainability play in diagnosing and addressing semantic dissonance in RAG systems?

### Practical Applications
16. How would you adapt the strategies discussed here for a different domain, such as legal, healthcare, or customer support?
17. If you were tasked with building a RAG system, what steps would you take to minimize semantic dissonance from the outset?
18. What metrics would you use to evaluate the effectiveness of a RAG system in addressing semantic dissonance?

# Reflection Questions and Suggested Answers on Semantic Dissonance and RAG Systems - This part will be hidden during the accelerator session

# Reflection Questions and Detailed Answers on Semantic Dissonance in the Furniture Industry (with Expanded Metrics for Evaluation)

### Understanding the Problem

1. **What is semantic dissonance, and how does it affect the reliability of a RAG system?**  
   **Answer:** Semantic dissonance refers to a mismatch between the intent of a query, the system's understanding, and the retrieved results. In the furniture industry, this could mean a query like, *"What materials are best for outdoor furniture?"* retrieves results about indoor materials like velvet or leather instead of weather-resistant options like teak or aluminum. This reduces the system's reliability, leading to irrelevant answers that fail to meet user needs.

2. **Why do general-purpose embedding models struggle with domain-specific tasks?**  
   **Answer:** General-purpose models lack the nuanced understanding of industry-specific terminology and relationships. For example, in the furniture domain, the term "chair" might be associated with office chairs, dining chairs, or lounge chairs. Without domain-specific training, a general model might misinterpret *"ergonomic chairs for remote work"* and suggest irrelevant products like dining stools.

3. **In what ways can noisy cosine similarity scores mislead a RAG system’s performance?**  
   **Answer:** Noisy scores can rank irrelevant results above relevant ones. For instance, if the query is *"Modern Scandinavian coffee tables,"* and a product called *"Traditional Oak Side Table"* receives a higher cosine similarity score than a correctly labeled *"Minimalist Birch Coffee Table,"* the system fails to retrieve the most relevant item.
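
The ranking problem in answer 3 can be illustrated with a small sketch. The three-dimensional vectors below are hand-made toy values, not real embeddings, and this `cosine_sim` is a plain-Python stand-in that is only assumed to behave like the helper used in the notebook. The point is the margin between the top two scores: when it is small, a little embedding noise may be enough to flip the order.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_vec, candidates):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(cosine_sim(query_vec, vec), name) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def top_two_separation(ranking):
    """Gap between the best and second-best score (0.0 if there is only one)."""
    return ranking[0][0] - ranking[1][0] if len(ranking) > 1 else 0.0

# Toy vectors (assumed values for illustration only).
query = [1.0, 0.2, 0.0]  # "Modern Scandinavian coffee tables"
candidates = {
    "Minimalist Birch Coffee Table": [0.9, 0.3, 0.1],
    "Traditional Oak Side Table": [0.8, 0.1, 0.4],
}

ranking = rank_candidates(query, candidates)
for score, name in ranking:
    print(f"{name}: {score:.3f}")
print(f"Separation: {top_two_separation(ranking):+.3f}")
```

With these toy values the relevant item ranks first, but the margin is under 0.1. A perturbation of that size in either score would reverse the ranking, which is the failure mode described above.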

---

### Diagnosing Issues

4. **How can you determine if your RAG system is experiencing semantic dissonance?**  
   **Answer:** Test the system with real-world queries and evaluate retrieved results. Metrics like **context precision** and **context recall** are critical here:
   - **Context precision**: The percentage of retrieved results that are relevant and match the intended query context. For instance, if a query like *"Family-friendly sofas"* retrieves five items, but only three are stain-resistant and durable, the precision is 60%.
   - **Context recall**: The percentage of relevant items retrieved out of all possible relevant items in the database. If there are 10 family-friendly sofas in the database and the system retrieves five, the recall is 50%.
   These metrics help identify whether irrelevant or missing results are contributing to semantic dissonance.

5. **What are the limitations of using raw schema definitions for retrieval tasks?**  
   **Answer:** Raw schemas like *“product_name,” “material,” “dimensions”* lack context. For example, a schema field labeled *“material”* doesn’t convey whether it pertains to upholstery, frame, or finish. This ambiguity hinders the system’s ability to match queries like *“durable wooden bed frames”* with the correct items.

6. **How might incomplete or poorly structured metadata contribute to irrelevant results?**  
   **Answer:** Poorly structured metadata leads to misalignment between user queries and product attributes. For instance, if metadata doesn’t specify whether wood is FSC-certified, a query like *"Eco-friendly wooden chairs"* might retrieve irrelevant results. Incomplete metadata can also introduce **bias**, such as favoring items with more detailed descriptions over simpler products that may still be relevant.
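
The context precision and context recall figures from answer 4 can be reproduced with a short sketch. It uses simple set-based definitions that match the description above; evaluation frameworks may define these metrics differently (for example, in a rank-aware way), and the item identifiers are made up for illustration.

```python
def context_precision(retrieved, relevant):
    """Share of retrieved items that are relevant (0.0 if nothing was retrieved)."""
    if not retrieved:
        return 0.0
    relevant_set = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant_set)
    return hits / len(retrieved)

def context_recall(retrieved, relevant):
    """Share of relevant items that were retrieved (0.0 if nothing is relevant)."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved)) / len(relevant_set)

# Precision example: five sofas retrieved, three of them stain-resistant and durable.
retrieved = ["sofa_a", "sofa_b", "sofa_c", "sofa_d", "sofa_e"]
relevant = ["sofa_a", "sofa_b", "sofa_c"]
precision = context_precision(retrieved, relevant)

# Recall example: ten family-friendly sofas exist, the system retrieves five of them.
all_relevant = [f"family_sofa_{i}" for i in range(10)]
retrieved_subset = all_relevant[:5]
recall = context_recall(retrieved_subset, all_relevant)

print(f"precision={precision:.0%}, recall={recall:.0%}")  # precision=60%, recall=50%
```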

---

### Exploring Solutions

7. **How does enriching table schemas with descriptions or sample questions improve retrieval accuracy?**  
   **Answer:** Adding descriptions and sample questions helps clarify context. For instance, a schema enriched with a description like *“material: primary material used for the furniture frame”* and sample questions like *“What types of wood are used in this table?”* bridges the gap between user intent and data structure.

8. **What are the trade-offs between using sample questions alone versus combining schemas and metadata?**  
   **Answer:** Sample questions alone focus on user intent but might overlook broader data attributes. Combining them with schemas and metadata ensures comprehensive coverage. For example, a query like *“Stain-resistant dining chairs”* benefits from metadata about materials (e.g., treated fabric) alongside user-friendly sample questions.

9. **How could domain-specific fine-tuning of embedding models reduce semantic dissonance?**  
   **Answer:** Fine-tuning embeddings on furniture-specific datasets allows the model to learn nuances like *“Scandinavian design”* or *“ergonomic features.”* This improves retrieval accuracy by aligning vector representations with industry-relevant semantics.
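
The enrichment described in answer 7 amounts to building a richer text before it is embedded. The helper below is a minimal sketch of that step: `build_enriched_text` is a hypothetical name, the schema is an invented furniture example, and the layout only loosely mirrors the description, sample questions, and schema ordering used in Strategy 4.

```python
def build_enriched_text(schema_text, description="", sample_questions=()):
    """Combine optional description and sample questions with the raw schema."""
    parts = []
    if description:
        parts.append(f"Description: {description}")
    if sample_questions:
        questions = "\n".join(f"- {q}" for q in sample_questions)
        parts.append(f"Sample Questions:\n{questions}")
    parts.append(schema_text.strip())
    return "\n\n".join(parts)

# Hypothetical schema and metadata, for illustration only.
schema = "CREATE TABLE products (product_name TEXT, material TEXT, dimensions TEXT)"
enriched = build_enriched_text(
    schema,
    description="material: primary material used for the furniture frame",
    sample_questions=["What types of wood are used in this table?"],
)
print(enriched)
```

Passing only the schema returns it unchanged, so the same helper can produce the schema-only, schema plus description, and fully enriched variants for a side-by-side comparison.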

---

### Evaluating Strategies

10. **Which of the four retrieval strategies presented do you think is most applicable to your domain, and why?**  
    **Answer:** In the furniture industry, *“schema + description + sample questions”* is most effective. This approach balances technical detail (schemas), semantic context (descriptions), and user perspective (sample questions). For example, a query like *“Compact dining tables for small apartments”* can leverage dimensions, product descriptions, and relevant questions.

11. **What additional metadata could be included to enhance the context for your RAG system?**  
    **Answer:** Include metadata like:
    - **Usage context** (e.g., indoor/outdoor).
    - **Design style** (e.g., modern, traditional).
    - **Sustainability certifications** (e.g., FSC-certified wood).
    - **Target audience** (e.g., family-friendly, workspace).
    - **Bias-aware tags** (e.g., highlighting underrepresented product categories to promote fairness).
    This allows queries like *“Eco-friendly cribs for toddlers”* to retrieve precise results.

12. **How can you balance the complexity of adding metadata with the potential improvement in system performance?**  
    **Answer:** Start with high-priority metadata (e.g., materials, dimensions, style) and expand iteratively based on user feedback. For example, if customers frequently ask about sustainable options, prioritize adding sustainability-related metadata.

---

### Broader Considerations

13. **In what ways might the structure and quality of your knowledge base impact the effectiveness of a RAG system?**  
    **Answer:** A well-structured knowledge base ensures data consistency and relevance. For example, categorizing products by *room type, material, and design style* allows a query like *“Mid-century modern armchairs for living rooms”* to return focused results. Metrics like **fairness** can ensure underrepresented product categories (e.g., budget-friendly or minority-sourced products) are equally considered.

14. **How can user feedback be incorporated into improving the ranking and retrieval strategies of your system?**  
    **Answer:** Gather feedback on retrieved results (e.g., thumbs up/down or relevance scores) and use it to refine ranking algorithms. Evaluate user feedback through metrics like:
    - **Bias detection**: Check if certain categories (e.g., luxury furniture) are consistently overrepresented.
    - **User satisfaction**: Measure how often users find what they need on the first try.

15. **What role does explainability play in diagnosing and addressing semantic dissonance in RAG systems?**  
    **Answer:** Explainability helps identify mismatches. For example, if a query retrieves irrelevant results, inspecting vector similarities, metadata mappings, and fairness metrics can reveal whether the issue lies in embeddings, metadata, or ranking logic.
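
The two feedback metrics from answer 14 can be approximated with very little code. This is a rough sketch under stated assumptions: feedback is reduced to a boolean per session, and "bias detection" is simplified to comparing each category's share of retrieved results with its share of the catalog. The function names and sample data are hypothetical.

```python
from collections import Counter

def first_try_success_rate(sessions):
    """Fraction of sessions where the first result set was marked helpful."""
    if not sessions:
        return 0.0
    return sum(1 for helpful in sessions if helpful) / len(sessions)

def category_representation(retrieved_categories, catalog_categories):
    """Ratio of retrieved share to catalog share per category (above 1 means over-represented)."""
    retrieved = Counter(retrieved_categories)
    catalog = Counter(catalog_categories)
    total_retrieved = sum(retrieved.values())
    total_catalog = sum(catalog.values())
    ratios = {}
    for category, count in catalog.items():
        catalog_share = count / total_catalog
        retrieved_share = retrieved.get(category, 0) / total_retrieved if total_retrieved else 0.0
        ratios[category] = retrieved_share / catalog_share
    return ratios

# Hypothetical data: luxury items are 20% of the catalog but 60% of retrieved results.
catalog = ["luxury"] * 2 + ["budget"] * 8
retrieved = ["luxury"] * 3 + ["budget"] * 2
ratios = category_representation(retrieved, catalog)

for category, ratio in sorted(ratios.items()):
    print(f"{category}: {ratio:.1f}x its catalog share")
print(f"First-try success: {first_try_success_rate([True, False, True, True]):.0%}")
```

A ratio well above 1 for one category across many queries is a signal worth inspecting, not proof of bias on its own, since some queries legitimately target a single category.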

---

### Practical Applications

17. **If you were tasked with building a RAG system, what steps would you take to minimize semantic dissonance from the outset?**  
    **Answer:** Steps include:
    - Defining clear metadata for key attributes like style, material, and function.
    - Fine-tuning embeddings with domain-specific data.
    - Testing queries across common use cases (e.g., *“Space-saving furniture for studios”*).
    - Incorporating metrics like **fairness** (e.g., ensuring products from small or minority-owned businesses are fairly represented).

18. **What metrics would you use to evaluate the effectiveness of a RAG system in addressing semantic dissonance?**  
    **Answer:** Metrics include:
    - **Context precision**: Measures how well results align with query context.
    - **Context recall**: Measures how many relevant items are retrieved out of all possible relevant items.
    - **Fairness**: Evaluates if certain categories or groups (e.g., sustainable furniture) are underrepresented.
    - **Bias detection**: Identifies systematic over- or under-representation of certain product types.
    - **User satisfaction**: Tracks qualitative feedback and success rates.
    - **Explainability**: Assesses how well the system justifies its ranking and retrieval logic.
