# AI Fashion Assistant v2.4 - Personalization Engine

**Content-Based Filtering & Recommendation System**

---

**Project:** AI Fashion Assistant (TÜBİTAK 2209-A)  
**Student:** Hatice Baydemir  
**Date:** January 5, 2026  
**Version:** 2.4.0

---

## Overview

This notebook implements a personalization engine that generates recommendations based on:
- User preference profiles
- Search history patterns
- Favorite product analysis
- Content-based similarity

### Architecture

```
User Profile → Preference Encoder → Similarity Computation → Ranking
                     ↓
Search History → Pattern Extractor → Weighting → Top-N Selection
                     ↓
Favorites → Content Analyzer → Filtering
```

---

## PART 1: Setup & Data Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
os.chdir('/content/drive/MyDrive/ai_fashion_assistant_v2')

print('Drive mounted')
print(f'Working directory: {os.getcwd()}')

In [None]:
import json
import numpy as np
import pandas as pd
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass
from collections import Counter, defaultdict
import pickle

print('Imports complete')

In [3]:
# Load user management system
USER_DATA_DIR = Path('v2.4-complete/data/users')

# Verify paths exist
assert USER_DATA_DIR.exists(), f"User data directory not found: {USER_DATA_DIR}"

print(f'User data: {USER_DATA_DIR}')

User data: v2.4-complete/data/users


---

## PART 2: Generate Mock Product Data

In [14]:
# Generate a mock product catalog matching the user data format
np.random.seed(42)

categories = ['dress', 'shirt', 'pants', 'shoes', 'jacket', 'skirt', 'suit', 'accessories']
colors = ['red', 'blue', 'black', 'white', 'gray', 'navy', 'green', 'yellow', 'orange']
sizes = ['xs', 's', 'm', 'l', 'xl']

# Create specific products that match user favorites and history
specific_products = [
    # Alice's products
    {'product_id': 'P001', 'product_name': 'Blue Summer Dress', 'category': 'dress', 'dominant_color': 'blue', 'size': 's'},
    {'product_id': 'P045', 'product_name': 'White Sneakers', 'category': 'shoes', 'dominant_color': 'white', 'size': 's'},
    {'product_id': 'P123', 'product_name': 'Gray Jacket', 'category': 'jacket', 'dominant_color': 'gray', 'size': 's'},
    # Bob's products
    {'product_id': 'P567', 'product_name': 'Black Suit', 'category': 'suit', 'dominant_color': 'black', 'size': 'l'},
    {'product_id': 'P234', 'product_name': 'Navy Shirt', 'category': 'shirt', 'dominant_color': 'navy', 'size': 'l'},
    # Carol's products
    {'product_id': 'P789', 'product_name': 'Vintage Dress', 'category': 'dress', 'dominant_color': 'red', 'size': 'm'},
    {'product_id': 'P890', 'product_name': 'Retro Earrings', 'category': 'accessories', 'dominant_color': 'yellow', 'size': 'm'},
]

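# Add 100 more random products.
# Note: name, category, and colors are drawn independently, so a product
# named 'Green Shoes' can end up with category 'jacket' (see the sample output).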
for i in range(100):
    specific_products.append({
        'product_id': f'P{i+1000:03d}',  # P1000, P1001, etc.
        'product_name': f'{np.random.choice(colors).title()} {np.random.choice(categories).title()}',
        'category': np.random.choice(categories),
        'dominant_color': np.random.choice(colors),
        'secondary_color': np.random.choice(colors),
        'size': np.random.choice(sizes),
    })

products_df = pd.DataFrame(specific_products)
product_lookup = products_df.set_index('product_id').to_dict('index')

print(f'Generated {len(products_df)} mock products')
print('User favorites/history products included: P001, P045, P123, P567, P234, P789, P890')
print('\nSample products:')
print(products_df.head(10)[['product_id', 'product_name', 'category', 'dominant_color']])

Generated 107 mock products
User favorites/history products included: P001, P045, P123, P567, P234, P789, P890

Sample products:
  product_id        product_name     category dominant_color
0       P001   Blue Summer Dress        dress           blue
1       P045      White Sneakers        shoes          white
2       P123         Gray Jacket       jacket           gray
3       P567          Black Suit         suit          black
4       P234          Navy Shirt        shirt           navy
5       P789       Vintage Dress        dress            red
6       P890      Retro Earrings  accessories         yellow
7      P1000         Green Shoes       jacket         yellow
8      P1001          Black Suit        pants         yellow
9      P1002  Yellow Accessories        pants           navy
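
In the sample above, names such as "Green Shoes" are paired with an unrelated category and color because each field is drawn independently. Below is a minimal sketch of one way to keep them consistent, assuming the name should be derived from the drawn attributes. `make_mock_product` is a hypothetical helper, not part of the notebook, and it uses the standard `random` module rather than NumPy, so its output will differ from the catalog shown above.

```python
import random

CATEGORIES = ['dress', 'shirt', 'pants', 'shoes', 'jacket', 'skirt', 'suit', 'accessories']
COLORS = ['red', 'blue', 'black', 'white', 'gray', 'navy', 'green', 'yellow', 'orange']
SIZES = ['xs', 's', 'm', 'l', 'xl']

def make_mock_product(index: int, rng: random.Random) -> dict:
    """Hypothetical helper: draw the attributes once, then build the name from them."""
    category = rng.choice(CATEGORIES)
    color = rng.choice(COLORS)
    return {
        'product_id': f'P{index + 1000}',
        'product_name': f'{color.title()} {category.title()}',
        'category': category,
        'dominant_color': color,
        'secondary_color': rng.choice(COLORS),
        'size': rng.choice(SIZES),
    }

rng = random.Random(42)  # fixed seed for reproducibility
mock_products = [make_mock_product(i, rng) for i in range(100)]
```

With this approach the text that is later embedded (name, category, color) describes a single coherent product, which may make the similarity scores easier to interpret.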


---

## PART 3: Generate Product Embeddings

In [15]:
# Install sentence-transformers if needed
try:
    from sentence_transformers import SentenceTransformer
except ImportError:
    print('Installing sentence-transformers...')
    !pip install -q sentence-transformers
    from sentence_transformers import SentenceTransformer

# Load model
print('Loading sentence transformer model...')
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Create text descriptions
descriptions = [
    f"{row['product_name']} {row['category']} {row['dominant_color']} {row['size']}"
    for _, row in products_df.iterrows()
]

# Generate embeddings
print('Generating embeddings...')
product_embeddings = model.encode(descriptions, show_progress_bar=True)
product_ids = products_df['product_id'].tolist()

print(f'Generated {len(product_ids)} embeddings')
print(f'Embedding dimension: {product_embeddings.shape[1]}')

Loading sentence transformer model...
Generating embeddings...


Generated 107 embeddings
Embedding dimension: 384


---

## PART 4: User Data Loader

In [16]:
class UserDataLoader:
    """Load user profiles, history, and favorites"""

    def __init__(self, data_dir: Path):
        self.data_dir = Path(data_dir)

    def load_user(self, user_id: str) -> Dict:
        """Load complete user data"""
        with open(self.data_dir / 'users.json', 'r') as f:
            users = json.load(f)

        if user_id not in users:
            raise ValueError(f"User {user_id} not found")

        return users[user_id]

    def load_history(self, user_id: str) -> List[Dict]:
        """Load user search history"""
        history_file = self.data_dir / f'history_{user_id}.json'
        if not history_file.exists():
            return []

        with open(history_file, 'r') as f:
            return json.load(f)

    def load_favorites(self, user_id: str) -> Dict[str, Dict]:
        """Load user favorites"""
        favorites_file = self.data_dir / f'favorites_{user_id}.json'
        if not favorites_file.exists():
            return {}

        with open(favorites_file, 'r') as f:
            return json.load(f)

# Initialize loader
user_loader = UserDataLoader(USER_DATA_DIR)
print('UserDataLoader initialized')

UserDataLoader initialized
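
`UserDataLoader` reads three kinds of JSON files from the data directory. The sketch below writes a tiny example of that layout to a temporary directory and reads it back. The field names `name`, `preferences`, and `top_result_id` are the ones the notebook code reads; the preference values and the `query` and `added` fields are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

data_dir = Path(tempfile.mkdtemp())

# users.json: one object keyed by user ID
users = {
    'U001': {
        'name': 'Alice Johnson',
        # Illustrative values only; the real preference values are not shown in the notebook
        'preferences': {'style': ['dress'], 'colors': ['blue', 'white'], 'size': 's'},
    }
}
(data_dir / 'users.json').write_text(json.dumps(users), encoding='utf-8')

# history_<user_id>.json: a list of search entries
history = [{'query': 'summer dress', 'top_result_id': 'P001'}]  # 'query' is hypothetical
(data_dir / 'history_U001.json').write_text(json.dumps(history), encoding='utf-8')

# favorites_<user_id>.json: an object keyed by product ID
favorites = {'P045': {'added': '2026-01-05'}}  # 'added' is hypothetical
(data_dir / 'favorites_U001.json').write_text(json.dumps(favorites), encoding='utf-8')

# Read the files back the same way the loader does
loaded_users = json.loads((data_dir / 'users.json').read_text(encoding='utf-8'))
loaded_history = json.loads((data_dir / 'history_U001.json').read_text(encoding='utf-8'))
viewed = [entry['top_result_id'] for entry in loaded_history if entry.get('top_result_id')]
```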


---

## PART 5: Preference Encoder

In [17]:
class PreferenceEncoder:
    """Encode user preferences into feature vectors"""

    def __init__(self, products_df: pd.DataFrame):
        self.products_df = products_df
        self._build_vocabularies()

    def _build_vocabularies(self):
        """Build vocabulary for categorical features"""
        self.categories = sorted(self.products_df['category'].unique())
        self.all_colors = self._extract_all_colors()

    def _extract_all_colors(self) -> List[str]:
        """Extract unique colors from product data"""
        colors = set()
        for col in ['dominant_color', 'secondary_color']:
            if col in self.products_df.columns:
                colors.update(self.products_df[col].dropna().unique())
        return sorted(colors)

    def encode_preferences(self, preferences: Dict) -> np.ndarray:
        """Encode user preferences as feature vector"""
        features = []

        # Encode style preferences (multi-hot over product categories;
        # styles that are not catalog categories are ignored)
        style_vec = np.zeros(len(self.categories))
        for style in preferences.get('style', []):
            if style in self.categories:
                idx = self.categories.index(style)
                style_vec[idx] = 1.0
        features.append(style_vec)

        # Encode color preferences (multi-hot over catalog colors)
        color_vec = np.zeros(len(self.all_colors))
        for color in preferences.get('colors', []):
            if color in self.all_colors:
                idx = self.all_colors.index(color)
                color_vec[idx] = 1.0
        features.append(color_vec)

        return np.concatenate(features)

    def get_feature_dim(self) -> int:
        """Get total feature dimension"""
        return len(self.categories) + len(self.all_colors)

# Initialize encoder
encoder = PreferenceEncoder(products_df)
print(f'PreferenceEncoder initialized (dim={encoder.get_feature_dim()})')

PreferenceEncoder initialized (dim=17)
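
The reported dimension of 17 is the 8 categories plus the 9 colors in the mock catalog. Below is a small worked sketch of the same multi-hot layout in plain Python, assuming a hypothetical preference dictionary (the real user preferences are not shown in this notebook).

```python
categories = sorted(['dress', 'shirt', 'pants', 'shoes', 'jacket', 'skirt', 'suit', 'accessories'])
colors = sorted(['red', 'blue', 'black', 'white', 'gray', 'navy', 'green', 'yellow', 'orange'])

def multi_hot(values, vocabulary):
    """Return a 0/1 list with a 1.0 at each vocabulary slot named in `values`."""
    wanted = set(values)
    return [1.0 if term in wanted else 0.0 for term in vocabulary]

# Hypothetical preferences; 'casual' is not a catalog category, so it is ignored
prefs = {'style': ['dress', 'casual'], 'colors': ['blue', 'white']}

vector = multi_hot(prefs['style'], categories) + multi_hot(prefs['colors'], colors)
print(len(vector))  # 17 = 8 categories + 9 colors
print(sum(vector))  # 3.0: 'dress', 'blue', 'white'
```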


---

## PART 6: Content-Based Recommender

In [18]:
from sklearn.metrics.pairwise import cosine_similarity

class ContentBasedRecommender:
    """Content-based recommendation engine"""

    def __init__(self, embeddings: np.ndarray, product_ids: List[str],
                 product_lookup: Dict):
        self.embeddings = embeddings
        self.product_ids = product_ids
        self.product_lookup = product_lookup
        self.id_to_idx = {pid: idx for idx, pid in enumerate(product_ids)}

    def similar_to_products(self, product_ids: List[str], n: int = 10) -> List[Tuple[str, float]]:
        """Find products similar to given products"""
        # Get embeddings for input products
        input_indices = [self.id_to_idx[pid] for pid in product_ids if pid in self.id_to_idx]
        if not input_indices:
            return []

        input_emb = self.embeddings[input_indices].mean(axis=0, keepdims=True)

        # Compute similarities
        similarities = cosine_similarity(input_emb, self.embeddings)[0]

        # Get top-N excluding input products (a set gives O(1) membership tests)
        exclude = set(product_ids)
        top_indices = np.argsort(similarities)[::-1]
        results = []

        for idx in top_indices:
            pid = self.product_ids[idx]
            if pid not in exclude:
                results.append((pid, float(similarities[idx])))
                if len(results) >= n:
                    break

        return results

    def filter_by_preferences(self, product_ids: List[str],
                             preferences: Dict) -> List[str]:
        """Filter products based on user preferences"""
        filtered = []

        for pid in product_ids:
            if pid not in self.product_lookup:
                continue

            product = self.product_lookup[pid]

            # Check size match (exact, case-sensitive string comparison)
            if 'size' in preferences and 'size' in product:
                if product['size'] != preferences['size']:
                    continue

            # Check color preference
            if preferences.get('colors'):
                product_colors = [product.get('dominant_color', ''),
                                product.get('secondary_color', '')]
                if not any(c in preferences['colors'] for c in product_colors if c):
                    continue

            filtered.append(pid)

        return filtered

# Initialize recommender
recommender = ContentBasedRecommender(product_embeddings, product_ids, product_lookup)
print('ContentBasedRecommender initialized')

ContentBasedRecommender initialized
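
`similar_to_products` averages the embeddings of the input products into one profile vector and ranks the rest of the catalog by cosine similarity to it. Below is a toy sketch of that idea with hand-written 2-D vectors instead of the 384-dimensional model embeddings. The product IDs and vectors are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented 2-D "embeddings"
toy_embeddings = {
    'A': [1.0, 0.0],
    'B': [0.0, 1.0],
    'C': [1.0, 1.0],
    'D': [-1.0, 0.0],
}

def similar_to(liked, n=2):
    """Rank products not in `liked` by cosine similarity to the mean liked vector."""
    dims = len(next(iter(toy_embeddings.values())))
    profile = [sum(toy_embeddings[p][d] for p in liked) / len(liked) for d in range(dims)]
    scored = [(pid, cosine(profile, vec))
              for pid, vec in toy_embeddings.items() if pid not in liked]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]

ranked = similar_to(['A', 'B'])
print([pid for pid, _ in ranked])  # ['C', 'D']
```

The mean of `A` and `B` points along the diagonal, so `C` ranks first and `D`, which points away from the profile, ranks last.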


---

## PART 7: Personalization Engine

In [19]:
class PersonalizationEngine:
    """Main personalization engine combining multiple strategies"""

    def __init__(self, recommender: ContentBasedRecommender,
                 user_loader: UserDataLoader):
        self.recommender = recommender
        self.user_loader = user_loader

    def recommend_for_user(self, user_id: str, n: int = 15) -> Dict[str, List[Tuple[str, float]]]:
        """Generate comprehensive recommendations for user"""
        user = self.user_loader.load_user(user_id)
        favorites = self.user_loader.load_favorites(user_id)
        history = self.user_loader.load_history(user_id)

        recommendations = {}

        # Strategy 1: Based on favorites
        if favorites:
            favorite_ids = list(favorites.keys())
            recommendations['from_favorites'] = self.recommender.similar_to_products(
                favorite_ids, n=n
            )

        # Strategy 2: Based on search history
        if history:
            # Extract products from search results
            viewed_products = []
            for entry in history:
                if entry.get('top_result_id'):
                    viewed_products.append(entry['top_result_id'])

            if viewed_products:
                recommendations['from_history'] = self.recommender.similar_to_products(
                    viewed_products, n=n
                )

        # Strategy 3: Based on preferences
        preferences = user.get('preferences', {})
        if preferences:
            # Filter the first 100 catalog entries (a sample, not the full catalog) by preferences
            all_product_ids = self.recommender.product_ids[:100]
            filtered = self.recommender.filter_by_preferences(
                all_product_ids, preferences
            )
            recommendations['from_preferences'] = [
                (pid, 1.0) for pid in filtered[:n]
            ]

        return recommendations

    def merge_recommendations(self, recommendations: Dict[str, List[Tuple[str, float]]],
                            weights: Optional[Dict[str, float]] = None) -> List[Tuple[str, float]]:
        """Merge recommendations from multiple strategies"""
        if weights is None:
            weights = {
                'from_favorites': 0.5,
                'from_history': 0.3,
                'from_preferences': 0.2
            }

        # Aggregate scores
        scores = defaultdict(float)

        for strategy, items in recommendations.items():
            weight = weights.get(strategy, 0.0)
            for pid, score in items:
                scores[pid] += score * weight

        # Sort by score
        sorted_items = sorted(scores.items(), key=lambda x: x[1], reverse=True)
        return sorted_items

# Initialize engine
engine = PersonalizationEngine(recommender, user_loader)
print('PersonalizationEngine initialized')

PersonalizationEngine initialized


---

## PART 8: Test Recommendations

In [20]:
# Test for all users
test_users = ['U001', 'U002', 'U003']

print('Testing personalization engine...')
print('='*60)

for user_id in test_users:
    user = user_loader.load_user(user_id)
    print(f"\nUser: {user['name']} ({user_id})")

    # Get recommendations
    recs = engine.recommend_for_user(user_id, n=5)

    # Show each strategy
    for strategy, items in recs.items():
        print(f"\n  {strategy}:")
        for pid, score in items[:3]:
            if pid in product_lookup:
                prod = product_lookup[pid]
                print(f"    - {prod.get('product_name', 'N/A')} (score: {score:.3f})")

print('\n' + '='*60)

Testing personalization engine...

User: Alice Johnson (U001)

  from_favorites:
    - White Dress (score: 0.852)
    - Navy Jacket (score: 0.815)
    - Blue Jacket (score: 0.766)

  from_history:
    - White Dress (score: 0.852)
    - Navy Jacket (score: 0.815)
    - Blue Jacket (score: 0.766)

  from_preferences:

User: Bob Smith (U002)

  from_favorites:
    - Navy Jacket (score: 0.791)
    - Navy Accessories (score: 0.787)
    - Navy Jacket (score: 0.781)

  from_history:
    - Navy Jacket (score: 0.791)
    - Navy Accessories (score: 0.787)
    - Navy Jacket (score: 0.781)

  from_preferences:

User: Carol Williams (U003)

  from_favorites:
    - Green Dress (score: 0.817)
    - Red Accessories (score: 0.800)
    - Orange Jacket (score: 0.720)

  from_history:
    - Green Dress (score: 0.817)
    - Red Accessories (score: 0.800)
    - Orange Jacket (score: 0.720)

  from_preferences:
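
With the default weights in `merge_recommendations` (0.5 for favorites, 0.3 for history, 0.2 for preferences), a product that appears in several strategy lists accumulates a weighted sum. Below is a worked sketch using the similarity 0.852 printed above for a product that appears in both `from_favorites` and `from_history`, with an empty `from_preferences` list. The product ID is a placeholder.

```python
from collections import defaultdict

weights = {'from_favorites': 0.5, 'from_history': 0.3, 'from_preferences': 0.2}

# 'P_EXAMPLE' is a placeholder ID; 0.852 is the rounded score shown in the output above
recs = {
    'from_favorites': [('P_EXAMPLE', 0.852)],
    'from_history': [('P_EXAMPLE', 0.852)],
    'from_preferences': [],
}

# Same aggregation as merge_recommendations: sum of weight * score per product
scores = defaultdict(float)
for strategy, items in recs.items():
    for pid, score in items:
        scores[pid] += weights.get(strategy, 0.0) * score

merged = sorted(scores.items(), key=lambda item: item[1], reverse=True)
print(round(merged[0][1], 4))  # 0.6816 = (0.5 + 0.3) * 0.852
```

Because the preference list contributes nothing in this case, the merged score tops out at 0.8 times the similarity rather than the similarity itself.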



---

## PART 9: Generate "For You" Page

In [21]:
def generate_for_you_page(user_id: str, n: int = 15) -> pd.DataFrame:
    """Generate complete 'For You' recommendation page"""
    # Get all recommendations
    recs = engine.recommend_for_user(user_id, n=n)

    # Merge with weighted scores
    merged = engine.merge_recommendations(recs)

    # Create dataframe
    results = []
    for pid, score in merged[:n]:
        if pid in product_lookup:
            prod = product_lookup[pid]
            results.append({
                'product_id': pid,
                'product_name': prod.get('product_name', 'N/A'),
                'category': prod.get('category', 'N/A'),
                'score': score
            })

    return pd.DataFrame(results)

# Generate for all users
print('Generating "For You" pages...')
print('='*60)

for user_id in test_users:
    user = user_loader.load_user(user_id)
    print(f"\n{user['name']} ({user_id}):")

    for_you = generate_for_you_page(user_id, n=10)
    print(for_you.to_string(index=False))

print('\n' + '='*60)

Generating "For You" pages...

Alice Johnson (U001):
product_id      product_name    category    score
     P1007       White Dress       shoes 0.681880
     P1075       Navy Jacket       shoes 0.651631
     P1034       Blue Jacket       shoes 0.612484
     P1065     Yellow Jacket       shoes 0.598761
     P1006        Gray Pants        suit 0.598026
     P1078 White Accessories       shoes 0.592808
     P1004        Gray Dress       shoes 0.588572
     P1072        Navy Dress      jacket 0.586655
     P1000       Green Shoes      jacket 0.586129
     P1073        Gray Shoes accessories 0.571538

Bob Smith (U002):
product_id     product_name    category    score
     P1071      Navy Jacket        suit 0.632701
     P1037 Navy Accessories      jacket 0.629668
     P1089      Navy Jacket       pants 0.625032
     P1038      White Pants       shirt 0.620414
     P1072       Navy Dress      jacket 0.603898
     P1063     Orange Shirt        suit 0.603085
     P1099         Red Suit        

---

## PART 10: Evaluation Metrics

In [22]:
def evaluate_recommendations(user_id: str) -> Dict[str, float]:
    """Evaluate recommendation quality"""
    user = user_loader.load_user(user_id)
    favorites = user_loader.load_favorites(user_id)

    # Get recommendations
    recs = engine.recommend_for_user(user_id, n=20)
    merged = engine.merge_recommendations(recs)
    recommended_ids = [pid for pid, _ in merged[:20]]

    # Calculate metrics
    metrics = {}

    # Check if we have recommendations
    if not recommended_ids:
        metrics['coverage'] = 0.0
        metrics['diversity'] = 0.0
        metrics['preference_match'] = 0.0
        return metrics

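    # Coverage: fraction of distinct IDs in the recommended list (not catalog coverage).
    # merge_recommendations keys scores by product ID, so this is always 1.0 here.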
    metrics['coverage'] = len(set(recommended_ids)) / len(recommended_ids)

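    # Diversity: number of distinct categories divided by list length
    # (with 8 catalog categories and 20 items, the maximum is 8/20 = 0.4)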
    categories = [product_lookup[pid]['category'] for pid in recommended_ids
                 if pid in product_lookup]
    metrics['diversity'] = len(set(categories)) / len(categories) if categories else 0.0

    # Preference match: fraction of recommendations whose dominant or secondary
    # color is among the user's preferred colors (styles are not checked)
    preferences = user.get('preferences', {})
    if preferences.get('colors'):
        matches = 0
        for pid in recommended_ids:
            if pid in product_lookup:
                prod = product_lookup[pid]
                colors = [prod.get('dominant_color', ''), prod.get('secondary_color', '')]
                if any(c in preferences['colors'] for c in colors if c):
                    matches += 1
        metrics['preference_match'] = matches / len(recommended_ids)
    else:
        metrics['preference_match'] = 0.0

    return metrics

# Evaluate all users
print('Evaluation Results')
print('='*60)

all_metrics = []
for user_id in test_users:
    user = user_loader.load_user(user_id)
    metrics = evaluate_recommendations(user_id)
    metrics['user_id'] = user_id
    metrics['name'] = user['name']
    all_metrics.append(metrics)

    print(f"\n{user['name']} ({user_id}):")
    for key, value in metrics.items():
        if key not in ['user_id', 'name']:
            print(f"  {key}: {value:.3f}")

# Average metrics
print('\nAverage Metrics:')
avg_metrics = pd.DataFrame(all_metrics)
numeric_cols = avg_metrics.select_dtypes(include=[np.number]).columns
for col in numeric_cols:
    print(f"  {col}: {avg_metrics[col].mean():.3f}")

print('='*60)

Evaluation Results

Alice Johnson (U001):
  coverage: 1.000
  diversity: 0.250
  preference_match: 0.750

Bob Smith (U002):
  coverage: 1.000
  diversity: 0.350
  preference_match: 0.800

Carol Williams (U003):
  coverage: 1.000
  diversity: 0.300
  preference_match: 0.750

Average Metrics:
  coverage: 1.000
  diversity: 0.300
  preference_match: 0.767
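
Two properties of these numbers follow from how the metrics are defined: the merged list is keyed by product ID, so the uniqueness ratio reported as `coverage` is 1.0 by construction, and with 8 categories in the mock catalog a list of 20 items cannot exceed a diversity of 8/20 = 0.4. The sketch below shows both on an invented list and adds a catalog-coverage ratio as one possible alternative. `catalog_coverage` is a hypothetical metric, not one the notebook computes.

```python
# Invented recommendation list: 20 distinct IDs cycling through the 8 catalog categories
catalog_categories = ['dress', 'shirt', 'pants', 'shoes', 'jacket', 'skirt', 'suit', 'accessories']
recommended = [(f'P{1000 + i}', catalog_categories[i % 8]) for i in range(20)]
catalog_size = 107  # size of the mock catalog generated in Part 2

ids = [pid for pid, _ in recommended]
cats = [cat for _, cat in recommended]

uniqueness = len(set(ids)) / len(ids)            # what the notebook reports as 'coverage'
diversity = len(set(cats)) / len(cats)           # distinct categories / list length
catalog_coverage = len(set(ids)) / catalog_size  # hypothetical alternative metric

print(uniqueness)  # 1.0
print(diversity)   # 0.4
```

Read this way, the reported average diversity of 0.300 corresponds to about 6 of the 8 categories appearing in a 20-item list.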


---

## PART 11: Save Results

In [23]:
# Save personalization results
OUTPUT_DIR = Path('v2.4-complete/evaluation/results')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# Save recommendations for each user
for user_id in test_users:
    for_you = generate_for_you_page(user_id, n=20)
    output_file = OUTPUT_DIR / f'recommendations_{user_id}.csv'
    for_you.to_csv(output_file, index=False)
    print(f"Saved: {output_file}")

# Save evaluation metrics
metrics_df = pd.DataFrame(all_metrics)
metrics_file = OUTPUT_DIR / 'personalization_metrics.csv'
metrics_df.to_csv(metrics_file, index=False)
print(f"Saved: {metrics_file}")

print('\nAll results saved successfully')

Saved: v2.4-complete/evaluation/results/recommendations_U001.csv
Saved: v2.4-complete/evaluation/results/recommendations_U002.csv
Saved: v2.4-complete/evaluation/results/recommendations_U003.csv
Saved: v2.4-complete/evaluation/results/personalization_metrics.csv

All results saved successfully


---

## Summary

In [24]:
print('='*60)
print('PERSONALIZATION ENGINE COMPLETE')
print('='*60)

print('\nImplemented Features:')
print('  - Content-based filtering')
print('  - Multi-strategy recommendations')
print('  - Preference-aware filtering')
print('  - "For You" page generation')

print('\nRecommendation Strategies:')
print('  1. Similar to favorites (50% weight)')
print('  2. Based on search history (30% weight)')
print('  3. Preference matching (20% weight)')

print('\nEvaluation Metrics:')
print(f'  - Coverage: {avg_metrics["coverage"].mean():.3f}')
print(f'  - Diversity: {avg_metrics["diversity"].mean():.3f}')
if 'preference_match' in avg_metrics.columns:
    print(f'  - Preference Match: {avg_metrics["preference_match"].mean():.3f}')

print('\nOutput Files:')
print(f'  - Recommendations: {OUTPUT_DIR}/')
print(f'  - Metrics: {metrics_file}')

print('\nNext Steps:')
print('  - Integrate with user interface')
print('  - A/B testing setup')
print('  - Advanced filtering options')

print('='*60)

PERSONALIZATION ENGINE COMPLETE

Implemented Features:
  - Content-based filtering
  - Multi-strategy recommendations
  - Preference-aware filtering
  - "For You" page generation

Recommendation Strategies:
  1. Similar to favorites (50% weight)
  2. Based on search history (30% weight)
  3. Preference matching (20% weight)

Evaluation Metrics:
  - Coverage: 1.000
  - Diversity: 0.300
  - Preference Match: 0.767

Output Files:
  - Recommendations: v2.4-complete/evaluation/results/
  - Metrics: v2.4-complete/evaluation/results/personalization_metrics.csv

Next Steps:
  - Integrate with user interface
  - A/B testing setup
  - Advanced filtering options
