# Explainability System & Comprehensive Query Generation

**Project:** AI Fashion Assistant - TÜBİTAK 2209-A Research Project  
**Date:** January 1, 2025  
**Version:** v2.1-core-ml-plus

---

## Overview

This notebook implements two critical components for production-ready fashion search:

1. **Explainability System**: Generate human-readable explanations for search results
2. **Comprehensive Query Generation**: Create 100+ diverse test queries using an LLM

### Objectives

**Explainability:**
- Explain why each result was retrieved
- Highlight matching attributes (pattern, color, style, etc.)
- Show confidence scores and fusion contributions
- Enable user trust and search refinement

**Query Generation:**
- Generate 100+ diverse, realistic queries
- Cover multiple difficulty levels (simple → complex)
- Include attribute-specific queries
- Support bilingual (Turkish/English) scenarios

### Methodology

**Explainability:**
- Attribute matching analysis
- Text-image fusion score decomposition
- Template-based natural language generation
- LLM-enhanced contextual explanations

**Query Generation:**
- LLM-based query synthesis (GROQ Llama-3.3-70B)
- Stratified sampling across query types
- Attribute-aware query construction
- Quality validation and deduplication

## Table of Contents

### Part 1: Explainability System
1. [Setup & Data Loading](#1-setup)
2. [Search Engine with Fusion](#2-search-engine)
3. [Explainability Framework](#3-explainability)
4. [Example Explanations](#4-examples)

### Part 2: Query Generation
5. [LLM Setup (GROQ)](#5-llm-setup)
6. [Query Categories & Templates](#6-query-categories)
7. [LLM-Based Generation](#7-generation)
8. [Query Validation & Export](#8-validation)

---
# Part 1: Explainability System
---

## 1. Setup & Data Loading

In [None]:
# Mount Drive
from google.colab import drive
import os

drive.mount('/content/drive', force_remount=False)
os.chdir('/content/drive/MyDrive/ai_fashion_assistant_v2')

print(f'✅ Working directory: {os.getcwd()}')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Working directory: /content/drive/MyDrive/ai_fashion_assistant_v2


In [None]:
%pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.2


In [None]:
# Imports
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
import faiss
from typing import List, Dict, Tuple
from dataclasses import dataclass
import json
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print('✅ Imports complete')

✅ Imports complete


In [None]:
# Configuration
@dataclass
class Config:
    # Paths
    V20_EMBEDDINGS = 'v2.0-baseline/embeddings'
    V21_RESULTS = 'v2.1-core-ml-plus/evaluation/results'
    METADATA_PATH = 'data/processed/meta_ssot.csv'

    # Models
    TEXT_ENCODER = 'sentence-transformers/paraphrase-multilingual-mpnet-base-v2'

    # Fusion parameters
    ALPHA = 0.7  # Text weight (from Day 1-2 optimization)

    # Search parameters
    TOP_K = 10

    # Device
    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

config = Config()
print('✅ Config loaded')
print(f'   Device: {config.DEVICE}')
print(f'   Fusion α: {config.ALPHA}')

✅ Config loaded
   Device: cpu
   Fusion α: 0.7


In [None]:
# Load data
print('Loading data...')

# Embeddings
text_emb = np.load(f'{config.V20_EMBEDDINGS}/text/mpnet_768d.npy')
image_emb = np.load(f'{config.V20_EMBEDDINGS}/image/clip_image_768d_normalized.npy')

# Metadata
metadata = pd.read_csv(config.METADATA_PATH)

# Enhanced products (with attributes)
enhanced = pd.read_csv(f'{config.V21_RESULTS}/enhanced_products.csv')

# Attributes (long format)
attributes = pd.read_csv(f'{config.V21_RESULTS}/product_attributes.csv')

print(f'✅ Data loaded')
print(f'   Products: {len(metadata):,}')
print(f'   Text embeddings: {text_emb.shape}')
print(f'   Image embeddings: {image_emb.shape}')
print(f'   Attributes: {len(attributes):,} total')

Loading data...
✅ Data loaded
   Products: 44,417
   Text embeddings: (44417, 768)
   Image embeddings: (44417, 768)
   Attributes: 307,720 total
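
Because the fusion step adds the text and image matrices row by row, it may be worth checking that both matrices line up with the metadata and whether their rows are unit-normalized. The following is a minimal sketch using only NumPy. `summarize_embeddings` is a hypothetical helper that is not part of the notebook, and the arrays are synthetic stand-ins for the real embeddings.

```python
import numpy as np

def summarize_embeddings(emb: np.ndarray, n_expected: int, tol: float = 1e-3) -> dict:
    """Report shape alignment and row-norm statistics for an embedding matrix."""
    norms = np.linalg.norm(emb, axis=1)
    return {
        'rows_match': emb.shape[0] == n_expected,
        'dim': emb.shape[1],
        'mean_norm': float(norms.mean()),
        'unit_normalized': bool(np.all(np.abs(norms - 1.0) < tol)),
        'has_nan': bool(np.isnan(emb).any()),
    }

# Synthetic stand-ins for the real matrices (shapes chosen only for illustration)
rng = np.random.default_rng(0)
demo_text = rng.normal(size=(5, 8))          # not normalized
demo_image = rng.normal(size=(5, 8))
demo_image /= np.linalg.norm(demo_image, axis=1, keepdims=True)  # unit rows

text_report = summarize_embeddings(demo_text, n_expected=5)
image_report = summarize_embeddings(demo_image, n_expected=5)
print(text_report)
print(image_report)
```

In the notebook this could be called as `summarize_embeddings(text_emb, len(metadata))` and likewise for `image_emb`.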


## 2. Search Engine with Learned Fusion

In [None]:
class FashionSearchEngine:
    """Multimodal fashion search with explainability"""

    def __init__(self, text_emb, image_emb, metadata, attributes, alpha=0.7):
        self.text_emb = text_emb
        self.image_emb = image_emb
        self.metadata = metadata
        self.attributes = attributes
        self.alpha = alpha

        # Load text encoder
        self.encoder = SentenceTransformer(config.TEXT_ENCODER)

        # Create fusion embeddings
        self.fusion_emb = self._create_fusion()

        # Build FAISS index
        self.index = self._build_index()

        # Build attribute lookup
        self.attr_lookup = self._build_attr_lookup()

    def _create_fusion(self):
        """Create fused embeddings"""
        fused = self.alpha * self.text_emb + (1 - self.alpha) * self.image_emb
        # Normalize
        fused = fused / np.linalg.norm(fused, axis=1, keepdims=True)
        return fused

    def _build_index(self):
        """Build FAISS index"""
        dimension = self.fusion_emb.shape[1]
        index = faiss.IndexFlatIP(dimension)  # Inner product = cosine similarity
        index.add(self.fusion_emb.astype('float32'))
        return index

    def _build_attr_lookup(self):
        """Build fast attribute lookup dict"""
        lookup = {}
        for _, row in self.attributes.iterrows():
            pid = row['product_id']
            if pid not in lookup:
                lookup[pid] = {}
            lookup[pid][row['category']] = {
                'value': row['value'],
                'confidence': row['confidence']
            }
        return lookup

    def search(self, query: str, k: int = 10) -> Dict:
        """Search with full explainability data"""
        # Encode query
        query_emb = self.encoder.encode([query])[0]
        query_emb = query_emb / np.linalg.norm(query_emb)

        # Search
        scores, indices = self.index.search(
            query_emb.reshape(1, -1).astype('float32'),
            k
        )

        # Collect results with explanation data
        results = []
        for rank, (idx, score) in enumerate(zip(indices[0], scores[0])):
            # Compute individual scores
            text_score = np.dot(query_emb, self.text_emb[idx])
            image_score = np.dot(query_emb, self.image_emb[idx])

            # Get product info
            product = self.metadata.iloc[idx]
            attrs = self.attr_lookup.get(idx, {})

            results.append({
                'rank': rank + 1,
                'product_id': int(idx),
                'name': product['productDisplayName'],
                'category': product.get('masterCategory', 'Unknown'),
                'fusion_score': float(score),
                'text_score': float(text_score),
                'image_score': float(image_score),
                'text_contribution': float(self.alpha * text_score),
                'image_contribution': float((1 - self.alpha) * image_score),
                'attributes': attrs
            })

        return {
            'query': query,
            'results': results,
            'alpha': self.alpha
        }

# Initialize search engine
print('Initializing search engine...')
search_engine = FashionSearchEngine(
    text_emb,
    image_emb,
    metadata,
    attributes,
    alpha=config.ALPHA
)

print('✅ Search engine ready')
print(f'   Index size: {search_engine.index.ntotal:,}')
print(f'   Fusion α: {search_engine.alpha}')

Initializing search engine...
✅ Search engine ready
   Index size: 44,417
   Fusion α: 0.7
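
The engine searches against fused vectors that are re-normalized after the weighted sum, so the weighted text and image contributions add up to the fusion score multiplied by the norm of the un-normalized fused vector, not to the fusion score itself. The sketch below illustrates this identity with synthetic vectors. The vectors, their dimension, and the scaling of the text vector are assumptions made for illustration only.

```python
import numpy as np

alpha = 0.7  # text weight, as in Config.ALPHA

rng = np.random.default_rng(42)
q = rng.normal(size=16)
q /= np.linalg.norm(q)            # unit-length query
t = rng.normal(size=16) * 3.0     # text embedding, deliberately not unit-length
v = rng.normal(size=16)
v /= np.linalg.norm(v)            # unit-length image embedding

# Same construction as _create_fusion: weighted sum, then re-normalize
raw = alpha * t + (1 - alpha) * v
raw_norm = np.linalg.norm(raw)
fused = raw / raw_norm

fusion_score = float(q @ fused)
text_contribution = float(alpha * (q @ t))
image_contribution = float((1 - alpha) * (q @ v))

# The contributions sum to fusion_score * ||raw||, not to fusion_score itself
weighted_sum = text_contribution + image_contribution
print(f'fusion_score           = {fusion_score:.4f}')
print(f'weighted_sum           = {weighted_sum:.4f}')
print(f'weighted_sum / ||raw|| = {weighted_sum / raw_norm:.4f}')
```

A consequence is that percentages obtained by dividing each contribution by `fusion_score` need not total 100%. Note also that the query is encoded with the text encoder only, while the image vectors come from CLIP, so `image_score` compares vectors produced by two different models.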


## 3. Explainability Framework

In [None]:
class ExplainabilityGenerator:
    """Generate human-readable search explanations"""

    def __init__(self, alpha: float = 0.7):
        self.alpha = alpha

    def explain_result(self, result: Dict, query: str) -> str:
        """Generate explanation for a single result"""
        explanation_parts = []

        # 1. Overall match score
        score_pct = result['fusion_score'] * 100
        explanation_parts.append(
            f"**Match Score: {score_pct:.1f}%**"
        )

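        # 2. Text vs Image contribution, as shares of the weighted sum so they total 100%.
        # Dividing by fusion_score instead does not sum to 100%, because fusion_score
        # is computed against re-normalized fused vectors.
        total_contrib = result['text_contribution'] + result['image_contribution']
        text_contrib = result['text_contribution'] / total_contrib * 100 if total_contrib else 0.0
        image_contrib = result['image_contribution'] / total_contrib * 100 if total_contrib else 0.0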

        explanation_parts.append(
            f"- Text match: {text_contrib:.1f}% (description similarity)"
        )
        explanation_parts.append(
            f"- Visual match: {image_contrib:.1f}% (appearance similarity)"
        )

        # 3. Matching attributes
        attrs = result['attributes']
        if attrs:
            attr_strs = []
            for cat, info in attrs.items():
                conf_pct = info['confidence'] * 100
                attr_strs.append(f"{info['value']} ({conf_pct:.0f}%)")

            if attr_strs:
                explanation_parts.append(
                    f"- Detected attributes: {', '.join(attr_strs[:5])}"
                )

        # 4. Why this result?
        if result['text_score'] > result['image_score']:
            reason = "Strong textual match - product description aligns with query"
        else:
            reason = "Strong visual similarity - appearance matches query intent"

        explanation_parts.append(f"- Reason: {reason}")

        return "\n".join(explanation_parts)

    def explain_search(self, search_result: Dict) -> str:
        """Generate full search explanation"""
        query = search_result['query']
        results = search_result['results']

        explanation = f"# Search Explanation: '{query}'\n\n"
        explanation += f"Found {len(results)} results using multimodal fusion (α={self.alpha})\n\n"

        for result in results[:5]:  # Top 5
            explanation += f"## Rank {result['rank']}: {result['name']}\n"
            explanation += self.explain_result(result, query)
            explanation += "\n\n"

        return explanation

# Initialize explainer
explainer = ExplainabilityGenerator(alpha=config.ALPHA)
print('✅ Explainability generator ready')

✅ Explainability generator ready
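
The generator above lists the detected attributes but does not mark which of them relate to the query. One possible way to highlight matches is a simple word-overlap check between the query and each attribute value, sketched below. `matching_attributes` is a hypothetical helper, and the category names in `demo_attrs` are assumptions that only mirror the `{'value', 'confidence'}` layout of `attr_lookup`.

```python
from typing import Dict, List

def matching_attributes(query: str, attrs: Dict[str, Dict], min_conf: float = 0.0) -> List[str]:
    """Return attribute values that share at least one word with the query."""
    query_tokens = set(query.lower().replace('-', ' ').split())
    matches = []
    for category, info in attrs.items():
        value_tokens = set(str(info['value']).lower().replace('-', ' ').split())
        if info['confidence'] >= min_conf and query_tokens & value_tokens:
            matches.append(f"{info['value']} ({category})")
    return matches

# Illustrative attribute dict; category names are assumed, not taken from the data
demo_attrs = {
    'neckline': {'value': 'v-neck', 'confidence': 0.19},
    'sleeve': {'value': 'sleeveless', 'confidence': 0.21},
    'pattern': {'value': 'geometric pattern', 'confidence': 0.20},
}
matched = matching_attributes('v-neck summer dress', demo_attrs)
print(matched)
```

Word overlap only works when the query and the attribute values are in the same language, so a Turkish query such as "kırmızı elbise" would need translation or an embedding-based comparison instead.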


## 4. Example Explanations

In [None]:
# Test queries
test_queries = [
    "kırmızı elbise",
    "casual summer outfit",
    "formal office wear"
]

print("🔍 EXAMPLE SEARCH EXPLANATIONS\n")
print("="*70)

for query in test_queries:
    # Search
    result = search_engine.search(query, k=5)

    # Explain
    explanation = explainer.explain_search(result)

    print(f"\n{explanation}")
    print("="*70)

🔍 EXAMPLE SEARCH EXPLANATIONS


# Search Explanation: 'kırmızı elbise'

Found 5 results using multimodal fusion (α=0.7)

## Rank 1: Remanika Women Red Dress
**Match Score: 79.5%**
- Text match: 214.5% (description similarity)
- Visual match: 0.4% (appearance similarity)
- Detected attributes: geometric pattern (20%), loose fitting (20%), short length (20%), v-neck (19%), sleeveless (21%)
- Reason: Strong textual match - product description aligns with query

## Rank 2: AND Women Red Dress
**Match Score: 78.4%**
- Text match: 221.5% (description similarity)
- Visual match: 0.2% (appearance similarity)
- Detected attributes: geometric pattern (20%), loose fitting (20%), cropped (20%), v-neck (20%), sleeveless (22%)
- Reason: Strong textual match - product description aligns with query

## Rank 3: Elle Women Red Dress
**Match Score: 77.1%**
- Text match: 214.9% (description similarity)
- Visual match: 0.8% (appearance similarity)
- Detected attributes: geometric pattern (19%), loos

---
# Part 2: Comprehensive Query Generation
---

## 5. LLM Setup (GROQ)

In [None]:
# Install GROQ
!pip install -q groq

from groq import Groq

# Initialize: read the API key from the environment rather than hard-coding it
# Get free key from: https://console.groq.com
GROQ_API_KEY = os.environ.get("GROQ_API_KEY", "YOUR_GROQ_API_KEY_HERE")  # Falls back to the placeholder

client = Groq(api_key=GROQ_API_KEY)

print('✅ GROQ client initialized')
print('   Model: llama-3.3-70b-versatile')

✅ GROQ client initialized
   Model: llama-3.3-70b-versatile


## 6. Query Categories & Templates

In [None]:
# Query taxonomy
QUERY_TAXONOMY = {
    'simple_item': {
        'description': 'Basic product name queries',
        'count': 15,
        'examples': ['kırmızı elbise', 'siyah ayakkabı', 'blue jeans']
    },
    'attribute_specific': {
        'description': 'Queries with specific attributes',
        'count': 20,
        'examples': ['striped shirt', 'loose fitting pants', 'v-neck summer dress']
    },
    'occasion_based': {
        'description': 'Queries for specific occasions',
        'count': 15,
        'examples': ['office wear', 'party outfit', 'düğün kıyafeti']
    },
    'style_based': {
        'description': 'Style-focused queries',
        'count': 15,
        'examples': ['vintage style', 'modern casual', 'minimalist look']
    },
    'complex_multi_attr': {
        'description': 'Queries combining multiple attributes',
        'count': 20,
        'examples': ['casual striped long sleeve shirt', 'elegant formal black dress']
    },
    'seasonal': {
        'description': 'Season-specific queries',
        'count': 10,
        'examples': ['summer outfit', 'winter coat', 'spring dress']
    },
    'budget_conscious': {
        'description': 'Queries implying price sensitivity',
        'count': 10,
        'examples': ['affordable casual wear', 'budget friendly shoes']
    }
}

total_queries = sum(cat['count'] for cat in QUERY_TAXONOMY.values())

print('📊 Query Taxonomy')
print(f'   Total categories: {len(QUERY_TAXONOMY)}')
print(f'   Total queries to generate: {total_queries}')
print(f'\n   Distribution:')
for name, info in QUERY_TAXONOMY.items():
    print(f'   - {name}: {info["count"]} queries')

📊 Query Taxonomy
   Total categories: 7
   Total queries to generate: 105

   Distribution:
   - simple_item: 15 queries
   - attribute_specific: 20 queries
   - occasion_based: 15 queries
   - style_based: 15 queries
   - complex_multi_attr: 20 queries
   - seasonal: 10 queries
   - budget_conscious: 10 queries
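
If a different total is needed while keeping the same stratification, the per-category counts can be rescaled proportionally. The sketch below uses largest-remainder rounding so the scaled counts always sum to the requested total. `scale_counts` is a hypothetical helper that is not part of the notebook, and the counts are copied from the taxonomy above.

```python
from typing import Dict

def scale_counts(counts: Dict[str, int], target_total: int) -> Dict[str, int]:
    """Rescale per-category counts to target_total, preserving proportions."""
    total = sum(counts.values())
    exact = {name: n * target_total / total for name, n in counts.items()}
    scaled = {name: int(x) for name, x in exact.items()}  # round down first
    leftover = target_total - sum(scaled.values())
    # Hand the remaining slots to the categories with the largest fractional parts
    by_remainder = sorted(counts, key=lambda name: exact[name] - scaled[name], reverse=True)
    for name in by_remainder[:leftover]:
        scaled[name] += 1
    return scaled

taxonomy_counts = {
    'simple_item': 15, 'attribute_specific': 20, 'occasion_based': 15,
    'style_based': 15, 'complex_multi_attr': 20, 'seasonal': 10,
    'budget_conscious': 10,
}
scaled = scale_counts(taxonomy_counts, target_total=210)
print(scaled)
```

Doubling the total from 105 to 210 doubles every category; for totals that do not divide evenly, the rounding step decides which categories receive the extra queries.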


## 7. LLM-Based Query Generation

In [None]:
def generate_queries_for_category(category_name: str,
                                 category_info: Dict,
                                 client: Groq) -> List[str]:
    """Generate queries for a specific category using LLM"""

    prompt = f"""You are a fashion e-commerce expert. Generate {category_info['count']} diverse search queries.

Category: {category_name}
Description: {category_info['description']}
Examples: {', '.join(category_info['examples'])}

Requirements:
- Mix of Turkish and English queries (50-50)
- Realistic user search behavior
- Diverse product types (clothing, shoes, accessories)
- Natural language (as real users would type)
- One query per line
- No numbering or bullets

Generate exactly {category_info['count']} queries:"""

    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,
        max_tokens=500
    )

    # Parse queries
    queries = response.choices[0].message.content.strip().split('\n')
    queries = [q.strip() for q in queries if q.strip()]

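    # Remove leading numbering or bullets if present (e.g. "1. ", "2) ", "- ")
    import re  # local import keeps this cell self-contained
    queries = [re.sub(r'^\s*(?:\d+[.)]|[-*•])\s+', '', q) for q in queries]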

    return queries[:category_info['count']]  # Ensure exact count

print('✅ Query generation function ready')

✅ Query generation function ready
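
LLM output does not always follow the "one query per line, no numbering" instruction, so the parsing step benefits from being tolerant. The sketch below strips list markers only at the start of a line, which avoids cutting queries that merely contain a period followed by a space. `clean_llm_lines` and the sample text are illustrative assumptions, not output from the Groq API.

```python
import re
from typing import List

# Matches "1. ", "2) ", "- ", "* " or "• " at the start of a line only
_LEADING_MARKER = re.compile(r'^\s*(?:\d+[.)]|[-*•])\s+')

def clean_llm_lines(text: str, limit: int) -> List[str]:
    """Split raw LLM output into queries, dropping list markers, quotes and blanks."""
    cleaned = []
    for line in text.splitlines():
        line = _LEADING_MARKER.sub('', line).strip().strip('"\'')
        if line:
            cleaned.append(line)
    return cleaned[:limit]

raw_output = '1. kırmızı elbise\n2) "blue jeans"\n\n- siyah ayakkabı\n90s vintage dress\nU.S. Polo shirt'
parsed = clean_llm_lines(raw_output, limit=10)
print(parsed)
```

Splitting on the first `'. '` anywhere in the line would instead turn "U.S. Polo shirt" into "Polo shirt", which is why the pattern is anchored to the line start.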


In [None]:
# GENERATE ALL QUERIES
print('🔥 Generating 105 queries using GROQ Llama-3.3-70B...')
print('⏱️  Estimated time: 2-3 minutes\n')

all_generated_queries = {}

for category_name, category_info in QUERY_TAXONOMY.items():
    print(f'Generating {category_info["count"]} queries for: {category_name}...')

    queries = generate_queries_for_category(
        category_name,
        category_info,
        client
    )

    all_generated_queries[category_name] = queries
    print(f'  ✅ Generated {len(queries)} queries\n')

# Flatten all queries
all_queries = []
for category, queries in all_generated_queries.items():
    for query in queries:
        all_queries.append({
            'query': query,
            'category': category
        })

print(f'\n✅ GENERATION COMPLETE!')
print(f'   Total queries: {len(all_queries)}')
print(f'   Categories: {len(QUERY_TAXONOMY)}')

🔥 Generating 105 queries using GROQ Llama-3.3-70B...
⏱️  Estimated time: 2-3 minutes

Generating 15 queries for: simple_item...
  ✅ Generated 15 queries

Generating 20 queries for: attribute_specific...
  ✅ Generated 20 queries

Generating 15 queries for: occasion_based...
  ✅ Generated 15 queries

Generating 15 queries for: style_based...
  ✅ Generated 15 queries

Generating 20 queries for: complex_multi_attr...
  ✅ Generated 20 queries

Generating 10 queries for: seasonal...
  ✅ Generated 10 queries

Generating 10 queries for: budget_conscious...
  ✅ Generated 10 queries


✅ GENERATION COMPLETE!
   Total queries: 105
   Categories: 7


## 8. Query Validation & Export

In [None]:
# Create DataFrame
queries_df = pd.DataFrame(all_queries)

# Validation
print('📊 QUERY STATISTICS:\n')

# Deduplication
original_count = len(queries_df)
queries_df = queries_df.drop_duplicates(subset=['query'])
duplicates_removed = original_count - len(queries_df)

print(f'   Total queries: {len(queries_df)}')
print(f'   Duplicates removed: {duplicates_removed}')
print(f'   Unique queries: {queries_df["query"].nunique()}')

# Category distribution
print(f'\n   Per category:')
print(queries_df.groupby('category').size())

# Language detection (simple heuristic)
def detect_language(query):
    turkish_chars = set('çğıöşü')
    return 'Turkish' if any(c in query.lower() for c in turkish_chars) else 'English'

queries_df['language'] = queries_df['query'].apply(detect_language)

print(f'\n   Language distribution:')
print(queries_df['language'].value_counts())

# Length statistics
queries_df['word_count'] = queries_df['query'].str.split().str.len()

print(f'\n   Query length (words):')
print(queries_df['word_count'].describe())

📊 QUERY STATISTICS:

   Total queries: 104
   Duplicates removed: 1
   Unique queries: 104

   Per category:
category
attribute_specific    20
budget_conscious      10
complex_multi_attr    20
occasion_based        15
seasonal              10
simple_item           15
style_based           14
dtype: int64

   Language distribution:
language
English    59
Turkish    45
Name: count, dtype: int64

   Query length (words):
count    104.000000
mean       3.634615
std        1.191048
min        2.000000
25%        3.000000
50%        4.000000
75%        4.250000
max        7.000000
Name: word_count, dtype: float64
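
`drop_duplicates` on the raw `query` column only removes exact string matches, so variants that differ in case or spacing survive. A possible refinement is to deduplicate on a normalized key, as sketched below. `normalize_query` and the demo frame are illustrative assumptions and not part of the notebook.

```python
import pandas as pd

def normalize_query(query: str) -> str:
    """Case-fold and collapse whitespace so near-identical queries compare equal."""
    return ' '.join(query.casefold().split())

# Small illustrative frame with the same columns as queries_df
demo_df = pd.DataFrame({
    'query': ['Blue Jeans', 'blue  jeans', 'siyah çanta', 'blue jeans ', 'red scarf'],
    'category': ['simple_item'] * 5,
})
demo_df['query_norm'] = demo_df['query'].apply(normalize_query)

# Keep the first occurrence of each normalized query, then drop the helper column
deduped = demo_df.drop_duplicates(subset=['query_norm']).drop(columns='query_norm')
print(deduped['query'].tolist())
```

`str.casefold` is locale-independent, so the Turkish dotted and dotless "i" are not unified by it; that case would need explicit handling if it matters for the query set.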


In [None]:
# Sample queries
print('\n📝 SAMPLE QUERIES (10 random):\n')
sample = queries_df.sample(10)
for idx, row in sample.iterrows():
    print(f"   [{row['category']:20s}] {row['query']}")


📝 SAMPLE QUERIES (10 random):

   [budget_conscious    ] affordable winter coats
   [attribute_specific  ] erkek için geniş ayakkabilar
   [complex_multi_attr  ] çocuk için renkli spor ayakkabı
   [budget_conscious    ] cheap and trendy accessories for women
   [complex_multi_attr  ] sıcak kahverengi kazak
   [attribute_specific  ] red scarf for women
   [seasonal            ] yaz için şık elbise
   [attribute_specific  ] high heel sandals for summer
   [seasonal            ] ilkbahar içinappropriate ayakkabı
   [simple_item         ] siyah çanta
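
The character-based language heuristic labels a query as Turkish only when it contains a Turkish-specific letter, so Turkish queries typed without diacritics would be counted as English. One possible extension adds a small word list as a fallback, sketched below. The word list is a short illustrative assumption and not a complete vocabulary, and this `detect_language` is a variant of the notebook's function rather than the author's implementation.

```python
# Letters that do not occur in English words (both cases)
TURKISH_CHARS = set('çğıöşüÇĞİÖŞÜ')
# Small illustrative word list; an assumption, not an exhaustive vocabulary
TURKISH_HINT_WORDS = {'için', 'icin', 've', 'siyah', 'beyaz', 'elbise',
                      'erkek', 'kadın', 'kadin', 'yaz'}

def detect_language(query: str) -> str:
    """Label a query Turkish if it has Turkish-specific letters or a known Turkish word."""
    if any(ch in TURKISH_CHARS for ch in query):
        return 'Turkish'
    # Fall back to the word list for queries typed without diacritics
    if set(query.lower().split()) & TURKISH_HINT_WORDS:
        return 'Turkish'
    return 'English'

labels = [detect_language(q) for q in
          ['siyah canta', 'yaz için şık elbise', 'red scarf for women']]
print(labels)
```

A dedicated language-identification library would be more robust for mixed-language queries; the word list only covers the cases it explicitly names.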


In [None]:
# Export
output_dir = Path(config.V21_RESULTS)

# Save queries
queries_path = output_dir / 'evaluation_queries_100plus.csv'
queries_df.to_csv(queries_path, index=False)

# Save by category
categorized = {}
for category in QUERY_TAXONOMY.keys():
    categorized[category] = queries_df[queries_df['category'] == category]['query'].tolist()

categorized_path = output_dir / 'queries_by_category.json'
with open(categorized_path, 'w', encoding='utf-8') as f:
    json.dump(categorized, f, indent=2, ensure_ascii=False)

print(f'✅ Queries saved:')
print(f'   {queries_path.name} ({len(queries_df)} queries)')
print(f'   {categorized_path.name} (categorized)')

✅ Queries saved:
   evaluation_queries_100plus.csv (104 queries)
   queries_by_category.json (categorized)


In [None]:
# Final summary
summary = f"""
{'='*70}
EXPLAINABILITY & QUERY GENERATION - COMPLETE
{'='*70}

📊 EXPLAINABILITY SYSTEM:
   ✅ Multimodal fusion score decomposition
   ✅ Attribute-based matching explanations
   ✅ Text vs Visual contribution analysis
   ✅ Natural language explanation generation

📊 QUERY GENERATION:
   Total queries: {len(queries_df)}
   Categories: {len(QUERY_TAXONOMY)}
   Languages: {queries_df['language'].value_counts().to_dict()}
   Avg length: {queries_df['word_count'].mean():.1f} words

📁 OUTPUT FILES:
   1. evaluation_queries_100plus.csv
   2. queries_by_category.json

⏭️  NEXT STEPS:
   - Run 7 baseline comparisons
   - Compute NDCG@10 for all methods
   - Statistical validation (p<0.05)
   - Generate evaluation report

{'='*70}
"""

print(summary)

# Save summary (UTF-8, since the text contains emoji)
summary_path = output_dir / 'explainability_summary.txt'
with open(summary_path, 'w', encoding='utf-8') as f:
    f.write(summary)

print(f'✅ Summary saved: {summary_path}')


EXPLAINABILITY & QUERY GENERATION - COMPLETE

📊 EXPLAINABILITY SYSTEM:
   ✅ Multimodal fusion score decomposition
   ✅ Attribute-based matching explanations
   ✅ Text vs Visual contribution analysis
   ✅ Natural language explanation generation

📊 QUERY GENERATION:
   Total queries: 104
   Categories: 7
   Languages: {'English': 59, 'Turkish': 45}
   Avg length: 3.6 words

📁 OUTPUT FILES:
   1. evaluation_queries_100plus.csv
   2. queries_by_category.json

⏭️  NEXT STEPS:
   - Run 7 baseline comparisons
   - Compute NDCG@10 for all methods
   - Statistical validation (p<0.05)
   - Generate evaluation report


✅ Summary saved: v2.1-core-ml-plus/evaluation/results/explainability_summary.txt
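
The next steps call for NDCG@10. As a reference for that step, the sketch below computes NDCG@k from a list of graded relevance labels given in ranked order. It uses the linear-gain form with a log2 discount and derives the ideal ranking from the retrieved list itself, which assumes all relevant items appear in that list. The relevance labels are illustrative assumptions, since the notebook does not define a labelling scheme.

```python
import math
from typing import Sequence

def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Illustrative graded labels (3 = best) for the results of one query, in ranked order
perfect = ndcg_at_k([3, 2, 1, 0], k=10)
reversed_order = ndcg_at_k([0, 1, 2, 3], k=10)
print(f'perfect ordering:  {perfect:.3f}')
print(f'reversed ordering: {reversed_order:.3f}')
```

Averaging `ndcg_at_k` over all evaluation queries gives one score per retrieval method, which is the quantity the baseline comparison would report.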
