# Explainable Search Results

**Phase 8, Notebook 4/4** - Making search results understandable

---

## Goal

Help users understand why they're seeing specific products. Show what matched,
what didn't, and give them ways to refine their search.

---

## What we're building

1. **Match explanations** - Why this product ranks where it does
2. **Factor breakdown** - Text score, image score, etc.
3. **Counterfactuals** - Why product X but not Y?
4. **Suggestions** - How to refine search
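The four pieces above describe the same result from different angles. One option is to keep them as structured data and render text only at the end, so a UI could, for example, draw factors as bars. The notebook itself builds plain strings. A minimal sketch, assuming a hypothetical `Explanation` dataclass:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Explanation:
    """Hypothetical structured explanation for one search result."""
    rank: int                                                 # 0-based position in the results
    name: str
    matches: List[str] = field(default_factory=list)          # e.g. "color (beyaz)"
    partial: List[str] = field(default_factory=list)          # e.g. "brand is adidas (you wanted nike)"
    factors: Dict[str, float] = field(default_factory=dict)   # factor -> share of the score
    confidence: float = 0.0                                   # 0..1

    def render(self, top_factors: int = 5) -> str:
        """Render to the plain-text layout of the example explanation."""
        lines = [f"Product #{self.rank + 1}: {self.name}", "", "Why this ranked here:"]
        lines += [f"✓ Matches {m}" for m in self.matches]
        lines += [f"~ {p}" for p in self.partial]
        lines += ["", "Ranking factors:"]
        ranked = sorted(self.factors.items(), key=lambda kv: kv[1], reverse=True)
        lines += [f"- {name}: {share:.0%}" for name, share in ranked[:top_factors]]
        lines += ["", f"Confidence: {self.confidence:.0%}"]
        return "\n".join(lines)


exp = Explanation(
    rank=2,
    name="Nike Air Max 90",
    matches=["color (beyaz)", "brand (nike)"],
    factors={"image_match": 0.25, "text_match": 0.35},
    confidence=1.0,
)
print(exp.render())
```

With this design, the string format becomes a presentation detail. The same object could also be serialized to JSON for a frontend.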

---

## Example explanation

```
Product #3: Nike Air Max 90

Why this ranked here:
✓ Matches color (beyaz) perfectly
✓ Exact brand match (nike)  
✓ High visual similarity (87%)
✓ Category match (spor ayakkabı)
~ Price slightly above average

Ranking factors:
- Text match: 35%
- Image match: 25%
- Brand: 20%
- Popularity: 15%
- User preference: 5%
```
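The percentages under "Ranking factors" read most naturally as each factor's share of the final score. This sketch assumes a linear ranker, where score = sum of weight * signal. That is an assumption, because the notebook later uses mock scores rather than a real ranker. Under it, the shares could be computed like this:

```python
from typing import Dict


def factor_shares(weights: Dict[str, float], signals: Dict[str, float]) -> Dict[str, float]:
    """Each factor's share of a linear score: w_i * s_i / sum_j(w_j * s_j).

    `weights` are hypothetical ranker weights; `signals` are per-product
    factor scores in [0, 1]. Missing signals count as 0.
    """
    contributions = {name: w * signals.get(name, 0.0) for name, w in weights.items()}
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in weights}  # nothing contributed; avoid dividing by zero
    return {name: c / total for name, c in contributions.items()}


weights = {"text": 0.5, "image": 0.3, "brand": 0.2}
signals = {"text": 0.8, "image": 0.6, "brand": 1.0}
for name, share in sorted(factor_shares(weights, signals).items(), key=lambda kv: -kv[1]):
    print(f"- {name}: {share:.0%}")
```

Here brand ends up ahead of image even though its weight is lower. This happens because the product's brand signal is at its maximum (0.2 * 1.0 > 0.3 * 0.6). A share therefore describes this product's score, not the ranker's global weighting.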

---

In [1]:
from google.colab import drive
drive.mount("/content/drive", force_remount=False)

print("drive mounted")

Mounted at /content/drive
drive mounted


In [2]:
import os
import sys
import json
import numpy as np
import pandas as pd
from pathlib import Path
from typing import Dict, List, Tuple
from collections import defaultdict

PROJECT_ROOT = Path("/content/drive/MyDrive/ai_fashion_assistant_v2")
sys.path.insert(0, str(PROJECT_ROOT))

print("imports done")

imports done


In [3]:
DATA_DIR = PROJECT_ROOT / "data/processed"
LLM_DIR = PROJECT_ROOT / "llm"
EXPLAIN_DIR = LLM_DIR / "explanations"

EXPLAIN_DIR.mkdir(parents=True, exist_ok=True)  # parents=True also creates llm/ if it is missing

print(f"working in: {EXPLAIN_DIR}")

# simple config
CONFIG = {
    'confidence_threshold': 0.7,
    'top_factors': 5,
    'min_explanation_length': 20,
}

print("\nconfig:")
for k, v in CONFIG.items():
    print(f"  {k}: {v}")

working in: /content/drive/MyDrive/ai_fashion_assistant_v2/llm/explanations

config:
  confidence_threshold: 0.7
  top_factors: 5
  min_explanation_length: 20
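Of the three config keys, only `top_factors` is read later, by `explain_factors`. The keys `confidence_threshold` and `min_explanation_length` are never used, and `get_confidence_label` hard-codes its own 0.7 cut-off instead. A minimal sketch of one way the two unused keys could decide what gets shown, assuming a hypothetical `split_by_quality` helper that receives each explanation together with its confidence:

```python
from typing import Dict, List, Tuple

# Same values as CONFIG above, repeated so this sketch runs on its own
CONFIG = {'confidence_threshold': 0.7, 'top_factors': 5, 'min_explanation_length': 20}


def split_by_quality(items: List[Tuple[str, float]], config: Dict) -> Tuple[List[str], List[str]]:
    """Split (explanation, confidence) pairs into ones safe to show and ones to flag."""
    shown, flagged = [], []
    for text, conf in items:
        too_short = len(text) < config['min_explanation_length']
        too_uncertain = conf < config['confidence_threshold']
        (flagged if too_short or too_uncertain else shown).append(text)
    return shown, flagged


items = [
    ("✓ Matches color (beyaz), brand (nike)", 1.0),
    ("weak", 0.9),                                  # too short
    ("✓ Matches category (ayakkabı) only", 0.33),   # below the threshold
]
shown, flagged = split_by_quality(items, CONFIG)
print(len(shown), "shown,", len(flagged), "flagged")
```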


In [4]:
# ============================================================
# EXPLANATION GENERATOR
# ============================================================

print("building explanation generator...\n")
print("=" * 60)

class ExplanationGenerator:
    """generates explanations for search results"""

    def __init__(self, config: Dict):
        self.config = config

    def explain_match(self, product: Dict, query_slots: Dict, rank: int) -> str:
        """explain why this product matched"""

        explanation = f"Product #{rank + 1}: {product.get('name', 'Unknown')}\n\n"
        explanation += "Why this ranked here:\n"

        # check each slot
        matches = []
        partial = []

        for slot, value in query_slots.items():
            product_value = str(product.get(slot, '')).lower()  # str(): non-string fields like price would crash .lower()
            query_value = str(value).lower()

            if query_value in product_value:
                matches.append(f"✓ Matches {slot} ({value})")
            elif product_value:
                partial.append(f"~ {slot} is {product_value} (you wanted {value})")

        # add matches
        for m in matches:
            explanation += m + "\n"

        # add partials
        for p in partial:
            explanation += p + "\n"

        return explanation

    def explain_factors(self, scores: Dict[str, float]) -> str:
        """explain ranking factors"""

        explanation = "\nRanking factors:\n"

        # sort by score
        sorted_factors = sorted(scores.items(), key=lambda x: -x[1])

        for factor, score in sorted_factors[:self.config['top_factors']]:
            percentage = score * 100
            explanation += f"- {factor}: {percentage:.0f}%\n"

        return explanation

    def explain_counterfactual(
        self,
        shown_product: Dict,
        not_shown_product: Dict
    ) -> str:
        """explain why X was shown but not Y"""

        explanation = f"\nWhy {shown_product.get('name', 'Unknown')} but not {not_shown_product.get('name', 'Unknown')}?\n\n"

        # compare key attributes
        differences = []

        for attr in ['color', 'brand', 'price', 'category']:
            val1 = shown_product.get(attr, 'unknown')
            val2 = not_shown_product.get(attr, 'unknown')

            if val1 != val2:
                differences.append(f"- {attr}: {val1} vs {val2}")

        for diff in differences:
            explanation += diff + "\n"

        return explanation

    def suggest_refinements(self, query_slots: Dict, results_count: int) -> List[str]:
        """suggest ways to refine search"""

        suggestions = []

        # too many results
        if results_count > 50:
            if 'color' not in query_slots:
                suggestions.append("Try adding a color to narrow down")
            if 'brand' not in query_slots:
                suggestions.append("Specify a brand for more relevant results")
            if 'price_max' not in query_slots:
                suggestions.append("Set a budget to filter by price")

        # too few results
        elif results_count < 5:
            suggestions.append("Try removing some filters")
            if 'brand' in query_slots:
                suggestions.append("Try removing brand filter")

        return suggestions


explainer = ExplanationGenerator(CONFIG)

print("explanation generator ready")
print("\nfeatures:")
print("  - match explanations")
print("  - factor breakdown")
print("  - counterfactuals")
print("  - refinement suggestions")
print("\n" + "=" * 60)

building explanation generator...

explanation generator ready

features:
  - match explanations
  - factor breakdown
  - counterfactuals
  - refinement suggestions
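Two details of the matching in `explain_match` are worth flagging. First, it compares by substring. Second, it looks up each query slot under the same key in the product. A numeric slot such as `price_max`, which `suggest_refinements` checks for, would therefore be looked up as `product['price_max']`. These products do not have that field, so a budget would be silently ignored.

The sketch below is a slot checker that maps numeric slots onto the `price` field and compares other slots exactly. It assumes a hypothetical `NUMERIC_SLOTS` table:

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical: numeric query slots -> (product field, comparison)
NUMERIC_SLOTS: Dict[str, Tuple[str, Callable[[float, float], bool]]] = {
    'price_max': ('price', lambda product_val, limit: product_val <= limit),
    'price_min': ('price', lambda product_val, limit: product_val >= limit),
}


def slot_matches(product: Dict[str, Any], slot: str, value: Any) -> Optional[bool]:
    """True/False if the slot can be checked, None if the product lacks the field."""
    if slot in NUMERIC_SLOTS:
        field, check = NUMERIC_SLOTS[slot]
        if product.get(field) is None:
            return None
        return check(float(product[field]), float(value))
    product_value = product.get(slot)
    if product_value is None:
        return None
    # exact, case-insensitive comparison (stricter than the substring test in explain_match)
    return str(product_value).strip().lower() == str(value).strip().lower()


product = {'name': 'Nike Air Max 90 Beyaz', 'color': 'beyaz', 'brand': 'nike', 'price': 850}
for slot, value in [('color', 'Beyaz'), ('price_max', 1000), ('price_max', 500), ('gender', 'kadın')]:
    print(slot, value, slot_matches(product, slot, value))
```

Returning `None` for a missing field keeps "unknown" distinct from "does not match". With that distinction, an explanation can say "brand not listed" instead of counting it as a miss.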



In [5]:
# ============================================================
# TEST EXPLANATIONS
# ============================================================

print("\ntesting explanations...\n")
print("=" * 60)

# mock data for testing
test_product = {
    'name': 'Nike Air Max 90 Beyaz',
    'color': 'beyaz',
    'brand': 'nike',
    'category': 'ayakkabı',
    'gender': 'kadın',
    'price': 850
}

test_query = {
    'color': 'beyaz',
    'brand': 'nike',
    'category': 'ayakkabı'
}

test_scores = {
    'text_match': 0.35,
    'image_match': 0.25,
    'brand_match': 0.20,
    'popularity': 0.15,
    'user_pref': 0.05
}

# generate explanation
print("Example 1: Match explanation")
print("-" * 60)
match_explain = explainer.explain_match(test_product, test_query, rank=2)
print(match_explain)

print("\nExample 2: Factor breakdown")
print("-" * 60)
factor_explain = explainer.explain_factors(test_scores)
print(factor_explain)

print("\nExample 3: Refinement suggestions")
print("-" * 60)
suggestions = explainer.suggest_refinements(test_query, results_count=100)
print("Suggestions:")
for i, sug in enumerate(suggestions, 1):
    print(f"{i}. {sug}")

print("\n" + "=" * 60)
print("explanations working")


testing explanations...

Example 1: Match explanation
------------------------------------------------------------
Product #3: Nike Air Max 90 Beyaz

Why this ranked here:
✓ Matches color (beyaz)
✓ Matches brand (nike)
✓ Matches category (ayakkabı)


Example 2: Factor breakdown
------------------------------------------------------------

Ranking factors:
- text_match: 35%
- image_match: 25%
- brand_match: 20%
- popularity: 15%
- user_pref: 5%


Example 3: Refinement suggestions
------------------------------------------------------------
Suggestions:
1. Set a budget to filter by price

explanations working
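These tests never exercise `explain_counterfactual`. It also lists every differing attribute among color, brand, price and category, whether or not the user asked about it.

The sketch below is a query-aware variant that only reports slots where one product satisfies the query and the other does not. It assumes the same substring rule as `explain_match`, and `query_counterfactual` is a hypothetical name:

```python
from typing import Dict, List


def query_counterfactual(shown: Dict, other: Dict, query_slots: Dict) -> List[str]:
    """Hypothetical: query slots where exactly one of the two products matches."""
    lines = []
    for slot, wanted in query_slots.items():
        wanted_l = str(wanted).lower()
        shown_ok = wanted_l in str(shown.get(slot, '')).lower()
        other_ok = wanted_l in str(other.get(slot, '')).lower()
        if shown_ok != other_ok:
            winner, loser = (shown, other) if shown_ok else (other, shown)
            lines.append(f"- {slot}: {winner.get(slot, 'unknown')} matches '{wanted}', "
                         f"{loser.get(slot, 'unknown')} does not")
    return lines


shown = {'name': 'Product 1', 'color': 'beyaz', 'brand': 'nike', 'category': 'ayakkabı', 'price': 500}
other = {'name': 'Product 4', 'color': 'beyaz', 'brand': 'adidas', 'category': 'ayakkabı', 'price': 800}
query = {'color': 'beyaz', 'brand': 'nike', 'category': 'ayakkabı'}
print(f"Why {shown['name']} but not {other['name']}?")
print("\n".join(query_counterfactual(shown, other, query)))
```

The price difference is not reported because the query says nothing about price. Only the brand mismatch explains the ranking gap.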


In [6]:
# ============================================================
# CONFIDENCE SCORING
# ============================================================

print("\nadding confidence scores...\n")
print("=" * 60)

def compute_match_confidence(product: Dict, query_slots: Dict) -> float:
    """how confident are we this is a good match?"""

    if not query_slots:
        return 0.5  # no criteria = uncertain

    matches = 0
    total = len(query_slots)

    for slot, value in query_slots.items():
        product_value = str(product.get(slot, '')).lower()
        query_value = str(value).lower()

        if query_value in product_value:
            matches += 1

    confidence = matches / total
    return confidence


def get_confidence_label(score: float) -> str:
    """convert score to human label"""
    if score >= 0.9:
        return "High confidence - strong match"
    elif score >= 0.7:
        return "Good confidence - likely match"
    elif score >= 0.5:
        return "Medium confidence - might match"
    else:
        return "Low confidence - weak match"


# test it
conf = compute_match_confidence(test_product, test_query)
label = get_confidence_label(conf)

print(f"Match confidence: {conf:.2f}")
print(f"Label: {label}")

print("\n" + "=" * 60)
print("confidence scoring ready")


adding confidence scores...

Match confidence: 1.00
Label: High confidence - strong match

confidence scoring ready
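`compute_match_confidence` weights every slot equally. For a three-slot query, a Nike shoe in the wrong color therefore scores the same as a Nike bag in the right color (2/3 either way).

The sketch below adds per-slot weights. The weights are illustrative assumptions, not tuned values:

```python
from typing import Dict

# Illustrative weights (assumption): a category miss matters more than a color miss
SLOT_WEIGHTS = {'category': 3.0, 'brand': 2.0, 'color': 1.0}


def weighted_confidence(product: Dict, query_slots: Dict,
                        weights: Dict[str, float] = SLOT_WEIGHTS) -> float:
    """Weighted fraction of query slots the product matches (same substring rule)."""
    if not query_slots:
        return 0.5  # same "no criteria = uncertain" default as compute_match_confidence
    total = matched = 0.0
    for slot, value in query_slots.items():
        w = weights.get(slot, 1.0)  # slots without a weight count as 1
        total += w
        if str(value).lower() in str(product.get(slot, '')).lower():
            matched += w
    return matched / total


query = {'color': 'beyaz', 'brand': 'nike', 'category': 'ayakkabı'}
wrong_color = {'color': 'siyah', 'brand': 'nike', 'category': 'ayakkabı'}
wrong_category = {'color': 'beyaz', 'brand': 'nike', 'category': 'çanta'}
print(f"{weighted_confidence(wrong_color, query):.2f}")     # → 0.83
print(f"{weighted_confidence(wrong_category, query):.2f}")  # → 0.50
```

`get_confidence_label` would label these two "Good" and "Medium" respectively. The unweighted score puts both at 0.67, which is "Medium".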


In [7]:
# ============================================================
# FULL EXPLANATION SYSTEM
# ============================================================

print("\nbuilding full system...\n")
print("=" * 60)

class ExplainableSearch:
    """search with explanations"""

    def __init__(self, explainer: ExplanationGenerator):
        self.explainer = explainer

    def search_with_explanations(
        self,
        query: str,
        query_slots: Dict,
        k: int = 10
    ) -> Tuple[List[Dict], List[str], List[str]]:
        """search and return (results, explanations, refinement suggestions)"""

        # mock search (normally would call retrieval system)
        results = self._mock_search(query, k)

        # generate explanations
        explanations = []
        for i, product in enumerate(results):
            # match explanation
            match_exp = self.explainer.explain_match(product, query_slots, i)

            # confidence
            conf = compute_match_confidence(product, query_slots)
            conf_label = get_confidence_label(conf)

            # mock scores (unseeded np.random, so the percentages change on every run)
            scores = {
                'text_match': np.random.uniform(0.2, 0.4),
                'image_match': np.random.uniform(0.15, 0.3),
                'brand': np.random.uniform(0.1, 0.25),
                'popularity': np.random.uniform(0.05, 0.2),
                'other': np.random.uniform(0.01, 0.1)
            }
            # normalize so the shares sum to 1 ('name' avoids reusing the k parameter's name)
            total = sum(scores.values())
            scores = {name: v / total for name, v in scores.items()}

            factor_exp = self.explainer.explain_factors(scores)

            # combine
            full_exp = match_exp + factor_exp
            full_exp += f"\nConfidence: {conf:.0%} - {conf_label}\n"

            explanations.append(full_exp)

        # add refinement suggestions
        # note: len(results) is capped at k, so this reflects page size, not total matches
        suggestions = self.explainer.suggest_refinements(query_slots, len(results))

        return results, explanations, suggestions

    def _mock_search(self, query: str, k: int) -> List[Dict]:
        """mock search results"""
        return [
            {
                'name': f'Product {i+1}',
                'color': 'beyaz',
                'brand': 'nike' if i < 3 else 'adidas',
                'category': 'ayakkabı',
                'price': 500 + i * 100
            }
            for i in range(k)
        ]


explainable_search = ExplainableSearch(explainer)

print("explainable search ready")
print("\n" + "=" * 60)


building full system...

explainable search ready
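`search_with_explanations` draws its mock factor scores from `np.random` without a seed, so the percentages in the next cell's output will differ on every run. Rounded shares can also sum to more or less than 100%: the output below adds up to 40 + 22 + 18 + 12 + 9 = 101.

The sketch below is a reproducible version that uses the standard library's `random.Random` with a fixed seed. NumPy's `np.random.default_rng(seed)` would serve the same purpose. The ranges are copied from the mock scores above, and `mock_factor_shares` is a hypothetical name:

```python
import random
from typing import Dict


def mock_factor_shares(rng: random.Random) -> Dict[str, float]:
    """Mock factor scores (same ranges as search_with_explanations), normalized to sum to 1."""
    raw = {
        'text_match': rng.uniform(0.2, 0.4),
        'image_match': rng.uniform(0.15, 0.3),
        'brand': rng.uniform(0.1, 0.25),
        'popularity': rng.uniform(0.05, 0.2),
        'other': rng.uniform(0.01, 0.1),
    }
    total = sum(raw.values())
    return {name: v / total for name, v in raw.items()}


# One seeded generator per run: same seed, same "random" explanations
rng = random.Random(42)
shares = mock_factor_shares(rng)
for name, share in shares.items():
    print(f"- {name}: {share:.0%}")
```

Passing the generator in, rather than seeding a global, keeps each call site's randomness independent. That matters once real and mock scorers coexist.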



In [8]:
# ============================================================
# TEST FULL SYSTEM
# ============================================================

print("\ntesting full system...\n")
print("=" * 60)

test_query_str = "beyaz nike ayakkabı"
test_slots = {
    'color': 'beyaz',
    'brand': 'nike',
    'category': 'ayakkabı'
}

results, explanations, suggestions = explainable_search.search_with_explanations(
    test_query_str,
    test_slots,
    k=3
)

print(f"Query: '{test_query_str}'")
print(f"Found: {len(results)} results\n")

# show first result with explanation
print("Top result with explanation:")
print("=" * 60)
print(explanations[0])

if suggestions:
    print("\nSuggestions to refine search:")
    for i, sug in enumerate(suggestions, 1):
        print(f"{i}. {sug}")

print("\n" + "=" * 60)
print("system working")


testing full system...

Query: 'beyaz nike ayakkabı'
Found: 3 results

Top result with explanation:
Product #1: Product 1

Why this ranked here:
✓ Matches color (beyaz)
✓ Matches brand (nike)
✓ Matches category (ayakkabı)

Ranking factors:
- text_match: 40%
- image_match: 22%
- brand: 18%
- popularity: 12%
- other: 9%

Confidence: 100% - High confidence - strong match


Suggestions to refine search:
1. Try removing some filters
2. Try removing brand filter

system working
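The suggestions in this output ("Try removing some filters", "Try removing brand filter") fire because `len(results)` equals `k=3`, which is below the `< 5` cut-off. This happens even though every result matched all three slots. The count passed to `suggest_refinements` is the page size, not the number of matching products.

The sketch below keys the suggestions on a total hit count instead. It assumes the real retrieval system can report one; `total_hits` is a hypothetical value that the mock search does not return:

```python
from typing import Dict, List

# (slot, tip) pairs, same wording as suggest_refinements
NARROWING_TIPS = [
    ('color', "Try adding a color to narrow down"),
    ('brand', "Specify a brand for more relevant results"),
    ('price_max', "Set a budget to filter by price"),
]


def suggest_from_total(query_slots: Dict, total_hits: int,
                       many: int = 50, few: int = 5) -> List[str]:
    """Refinement tips based on how many products matched overall, not on page size."""
    if total_hits > many:
        return [tip for slot, tip in NARROWING_TIPS if slot not in query_slots]
    if total_hits < few:
        tips = ["Try removing some filters"]
        if 'brand' in query_slots:
            tips.append("Try removing brand filter")
        return tips
    return []  # a manageable number of matches: no tip needed


slots = {'color': 'beyaz', 'brand': 'nike', 'category': 'ayakkabı'}
print(suggest_from_total(slots, total_hits=30))   # 3 shown, 30 matched overall
print(suggest_from_total(slots, total_hits=120))
```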


In [9]:
# ============================================================
# EVALUATION
# ============================================================

print("\nevaluating explanations...\n")
print("=" * 60)

def evaluate_explanations(explanations: List[str]) -> Dict:
    """measure explanation quality"""

    metrics = {
        'avg_length': 0,
        'has_matches': 0,
        'has_factors': 0,
        'has_confidence': 0
    }

    lengths = []
    for exp in explanations:
        lengths.append(len(exp))

        if 'Matches' in exp or '✓' in exp:
            metrics['has_matches'] += 1

        if 'factors' in exp.lower():
            metrics['has_factors'] += 1

        if 'confidence' in exp.lower():
            metrics['has_confidence'] += 1

    if not explanations:
        return metrics  # nothing to average; avoids division by zero

    metrics['avg_length'] = float(np.mean(lengths))
    metrics['has_matches'] = metrics['has_matches'] / len(explanations)
    metrics['has_factors'] = metrics['has_factors'] / len(explanations)
    metrics['has_confidence'] = metrics['has_confidence'] / len(explanations)

    return metrics


metrics = evaluate_explanations(explanations)

print("Explanation quality:")
print(f"  avg length: {metrics['avg_length']:.0f} chars")
print(f"  has matches: {metrics['has_matches']:.0%}")
print(f"  has factors: {metrics['has_factors']:.0%}")
print(f"  has confidence: {metrics['has_confidence']:.0%}")

print("\n" + "=" * 60)
print("evaluation done")


evaluating explanations...

Explanation quality:
  avg length: 271 chars
  has matches: 100%
  has factors: 100%
  has confidence: 100%

evaluation done
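Two of these 100% figures mostly confirm the template. Every explanation from `search_with_explanations` contains "Ranking factors" and "Confidence", so `has_factors` and `has_confidence` cannot come out lower.

A somewhat stricter check, though still string-based, asks whether each explanation mentions each query value. The sketch below assumes `slot_coverage` as a hypothetical extra metric:

```python
from typing import Dict, List


def slot_coverage(explanations: List[str], query_slots: Dict) -> float:
    """Fraction of (explanation, query value) pairs where the value is mentioned."""
    if not explanations or not query_slots:
        return 0.0  # nothing to score; also avoids division by zero
    hits = sum(
        str(value).lower() in exp.lower()
        for exp in explanations
        for value in query_slots.values()
    )
    return hits / (len(explanations) * len(query_slots))


slots = {'color': 'beyaz', 'brand': 'nike'}
exps = [
    "✓ Matches color (beyaz)\n✓ Matches brand (nike)",  # mentions both values
    "Product 4\n\nRanking factors:\n- text_match: 40%",  # mentions neither
]
print(f"slot coverage: {slot_coverage(exps, slots):.0%}")
```

Coverage still cannot tell whether an explanation is correct or helpful. That question needs the user testing listed under next steps.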


In [10]:
# ============================================================
# SAVE RESULTS
# ============================================================

print("\nsaving results...\n")
print("=" * 60)

# save example explanations
examples_path = EXPLAIN_DIR / "explanation_examples.json"
examples_data = {
    'query': test_query_str,
    'slots': test_slots,
    'results_count': len(results),
    'explanations': explanations,
    'suggestions': suggestions
}
with open(examples_path, 'w', encoding='utf-8') as f:
    json.dump(examples_data, f, indent=2, ensure_ascii=False)
print(f"saved: {examples_path}")

# save metrics
metrics_path = EXPLAIN_DIR / "explanation_metrics.json"
with open(metrics_path, 'w') as f:
    json.dump(metrics, f, indent=2)
print(f"saved: {metrics_path}")

# save readme
readme_path = EXPLAIN_DIR / "README.txt"
with open(readme_path, 'w', encoding='utf-8') as f:
    f.write("Explainable Search Results\n")
    f.write("=" * 60 + "\n\n")
    f.write("what this does:\n")
    f.write("- explains why products matched\n")
    f.write("- shows ranking factors\n")
    f.write("- gives confidence scores\n")
    f.write("- suggests refinements\n\n")
    f.write("files:\n")
    f.write("- explanation_examples.json: sample explanations\n")
    f.write("- explanation_metrics.json: quality metrics\n")
print(f"saved: {readme_path}")

print("\n" + "=" * 60)
print("all files saved")


saving results...

saved: /content/drive/MyDrive/ai_fashion_assistant_v2/llm/explanations/explanation_examples.json
saved: /content/drive/MyDrive/ai_fashion_assistant_v2/llm/explanations/explanation_metrics.json
saved: /content/drive/MyDrive/ai_fashion_assistant_v2/llm/explanations/README.txt

all files saved
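The examples file is written with `ensure_ascii=False`, so Turkish values such as "ayakkabı" are stored as UTF-8 characters rather than `\u` escapes. The sketch below is a quick round-trip check that the file reads back unchanged. It runs against a temporary directory so it works anywhere, and `save_and_verify` is a hypothetical helper:

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {'query', 'slots', 'results_count', 'explanations', 'suggestions'}


def save_and_verify(data: dict, path: Path) -> dict:
    """Write `data` as the cell above does, read it back, and check nothing was lost."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)
    missing = REQUIRED_KEYS - set(loaded)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if loaded != data:
        raise ValueError("file contents differ from what was written")
    return loaded


sample = {
    'query': 'beyaz nike ayakkabı',
    'slots': {'color': 'beyaz', 'brand': 'nike', 'category': 'ayakkabı'},
    'results_count': 3,
    'explanations': ["Product #1: Product 1\n\nWhy this ranked here:\n✓ Matches color (beyaz)\n"],
    'suggestions': ["Try removing some filters"],
}
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "explanation_examples.json"
    loaded = save_and_verify(sample, out_path)
    raw_text = out_path.read_text(encoding='utf-8')

print("round trip ok:", loaded == sample)
```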


In [11]:
# ============================================================
# SUMMARY
# ============================================================

print("\nsummary\n")
print("=" * 60)

print("what we built:")
print("  - match explanations (why this product?)")
print("  - factor breakdown (text 35%, image 25%, etc)")
print("  - confidence scores (high/medium/low)")
print("  - refinement suggestions (add color, etc)")

print("\nfiles created:")
print("  - explanation_examples.json")
print("  - explanation_metrics.json")
print("  - README.txt")

print("\nquality:")
print(f"  - avg explanation: {metrics['avg_length']:.0f} chars")
print(f"  - includes matches: {metrics['has_matches']:.0%}")
print(f"  - includes factors: {metrics['has_factors']:.0%}")
print(f"  - includes confidence: {metrics['has_confidence']:.0%}")

print("\nwhy this matters:")
print("  - users understand results better")
print("  - easier to refine searches")
print("  - builds trust in system")
print("  - reduces confusion")

print("\nnext steps:")
print("  - integrate with actual search")
print("  - add more explanation types")
print("  - test with real users")
print("  - measure impact on satisfaction")

print("\n" + "=" * 60)
print("PHASE 8 COMPLETE (all 4 notebooks done!)")
print("=" * 60)


summary

what we built:
  - match explanations (why this product?)
  - factor breakdown (text 35%, image 25%, etc)
  - confidence scores (high/medium/low)
  - refinement suggestions (add color, etc)

files created:
  - explanation_examples.json
  - explanation_metrics.json
  - README.txt

quality:
  - avg explanation: 271 chars
  - includes matches: 100%
  - includes factors: 100%
  - includes confidence: 100%

why this matters:
  - users understand results better
  - easier to refine searches
  - builds trust in system
  - reduces confusion

next steps:
  - integrate with actual search
  - add more explanation types
  - test with real users
  - measure impact on satisfaction

PHASE 8 COMPLETE (all 4 notebooks done!)


---

## Done

Built a system to explain search results to users. Shows what matched,
how confident we are, and how to refine.

### What we made

1. **Match explanations** - Why each product ranked where it did
2. **Factor breakdown** - Contribution of text, image, brand, etc.
3. **Confidence scores** - High/medium/low with labels
4. **Suggestions** - How to get better results

### Files

```
llm/explanations/
├── explanation_examples.json
├── explanation_metrics.json
└── README.txt
```

### Impact

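Each result now comes with a readable account of what matched, how the score breaks down, and how confident the system is. That makes ranking less of a black box. The explanations currently run on mock search results and random factor scores. Whether they improve user satisfaction and trust still has to be measured with real users.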

### Phase 8 complete

All 4 notebooks done:
- NB1: LLM integration
- NB2: Multi-turn dialogue  
- NB3: Query rewriting (+12% recall)
- NB4: Explainability

Ready to move to Phase 9 (evaluation) or Phase 10 (cleanup).

---