# Step : Review Classification - Tutorial

**Purpose:** Classify reviews using OpenAI GPT- into 7 predefined categories

**What you'll learn:**
- How to use the ReviewClassifier
- Single review vs batch classification
- Wide format conversion (7 category columns)
- Confidence threshold filtering
- Sentiment analysis results

**For Junior Developers:**
- Clear examples of API usage
- Visual outputs showing classification results
- Performance monitoring and progress tracking
- Error handling demonstrations

## What's New: Enhanced Classification Logic

This notebook now uses **proven classification logic from the Scripts folder**:

### Scripts Folder Integration:
- **Enhanced System Prompt**: Detailed rules, abstention policy, constraints
- **Few-Shot Learning**: Representative examples are included in every API call
- **Threshold Filtering**: Removes predictions below the `config.CONF_THRESHOLD` confidence
- **Fallback Mechanism**: Assigns "Autre (positif/négatif)" when no categories pass the threshold
- **Temperature=0**: Deterministic output (same input, same result)
- **Robust Extraction**: Fallback methods for parsing OpenAI responses
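
The threshold filter and fallback can be sketched roughly as follows. This is a minimal illustration, not the classifier's actual code: the function name `filter_with_fallback`, the `0.5` default threshold, the accepted sentiment spellings, and the choice to map every non-negative sentiment to "positif" are all assumptions made for the example. Only the `label`/`confidence` keys come from the results printed later in this notebook.

```python
def filter_with_fallback(categories, sentiment, threshold=0.5):
    """Keep categories at or above the threshold; fall back to 'Autre' if none remain.

    Hypothetical helper: the real logic lives inside ReviewClassifier.
    """
    kept = [c for c in categories if c.get("confidence", 0.0) >= threshold]
    if kept:
        return kept
    # Nothing passed the threshold: assign a generic category from the sentiment.
    # Assumption: anything that is not negative is treated as positive.
    is_negative = sentiment.lower() in ("négatif", "negatif", "negative")
    polarity = "négatif" if is_negative else "positif"
    # A fallback label carries no model confidence.
    return [{"label": f"Autre ({polarity})", "confidence": None}]


raw = [
    {"label": "Category A", "confidence": 0.9},  # made-up labels and scores
    {"label": "Category B", "confidence": 0.2},
]
print(filter_with_fallback(raw, "Positif"))
print(filter_with_fallback(raw[1:], "Négatif"))
```

The first call keeps only the high-confidence category; the second has nothing above the threshold, so it returns the fallback label.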

### Input Compatibility:
Auto-detects column names from both:
- **New format** (enhanced collection): `text`, `rating`, `_place_id`, `_city`, `_business`
- **Old format** (legacy): `review_snippet`, `review_rating`, `_bank`
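
Column auto-detection of this kind can be sketched as a first-match lookup. The helper name `detect_column` is hypothetical; the column names are the ones listed above, and the classifier's own detection may differ in detail.

```python
def detect_column(columns, candidates):
    """Return the first candidate name present in `columns`, or None."""
    return next((name for name in candidates if name in columns), None)


new_format = ["text", "rating", "_place_id", "_city", "_business"]
old_format = ["review_snippet", "review_rating", "_bank"]

for columns in (new_format, old_format):
    text_col = detect_column(columns, ["text", "review_snippet"])
    rating_col = detect_column(columns, ["rating", "review_rating"])
    business_col = detect_column(columns, ["_business", "_bank"])
    print(text_col, rating_col, business_col)
```

With a DataFrame, pass `df.columns` as the first argument. Returning `None` instead of raising lets the caller decide how to handle a missing column.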

### Key Benefits:
- More accurate classifications (few-shot learning improves edge cases)
- No empty results (fallback ensures every review gets classified)
- Reproducible output (temperature=0)
- Backward compatible (works with old data)

## Setup and Imports

In [None]:
# Add parent directory to path
import sys
from pathlib import Path

project_root = Path().resolve().parent
sys.path.insert(0, str(project_root / "src"))

print(f"Project root: {project_root}")
print("Python path updated")

In [None]:
# Import required libraries
import pandas as pd
import json
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Import our modules, dropping cached copies first so the latest changes are picked up
import sys

for module_name in ("review_analyzer.config", "review_analyzer.classify", "review_analyzer"):
    sys.modules.pop(module_name, None)

from review_analyzer.classify import ReviewClassifier
from review_analyzer import config

print("All imports successful!")
print(f"\nAvailable categories: {len(config.CATEGORIES)}")
print(f"Confidence threshold: {config.CONF_THRESHOLD}")
print("\nClassification uses:")
print("  - Few-shot learning (examples in every API call)")
print(f"  - Threshold filtering ({config.CONF_THRESHOLD})")
print("  - Fallback to 'Autre (±)' when needed")
print("  - Temperature=0 (deterministic)")

## Data Architecture Overview

The pipeline uses an organized folder structure:

```
data/
 00_config/ # Static configurations
 cities/ # City aliases, coordinates, regions.geojson
 templates/ # Business templates
 0_raw/ # Immutable source data
 0_interim/ # Recomputable cache
 collection/ # Collected reviews (input for transform)
 transform/ # Normalized reviews (input for classification)
 0_processed/ # Final outputs
 classification/ # Classified reviews (this notebook's output)
 0_analysis/ # Reports, figures, dashboards
 99_archive/ # Deprecated data

logs/ # Pipeline execution logs
```

**This notebook processes:**
- Input: `data/0_interim/transform/reviews_normalized.parquet` (after transform step)
- Or fallback: `data/0_interim/collection/reviews.csv` (if transform skipped)
- Output: `data/0_processed/classification/reviews_classified.csv`


## Test 1: View Categories

**What this does:** Shows all 7 review categories

**Purpose:** Understand what aspects of service are being classified

**Note:** Categories are now organized with embedded descriptions (Scripts folder logic)

In [None]:
# Display all categories
print(f" REVIEW CLASSIFICATION CATEGORIES")
print(f"="*80)
print(f"Total categories: {len(config.CATEGORIES)}\n")

# Categories are now strings with embedded sentiment indicators
# Group by detecting keywords in category labels
positive_cats = []
negative_cats = []
neutral_cats = []
autre_cats = []

NEGATIVE_KEYWORDS = ['attente', 'lenteur', 'injoignable', 'réclamation',
                     'incident', 'frais', 'insatisfaction', 'manque']

for cat in config.CATEGORIES:
    cat_lower = cat.lower()
    if 'autre' in cat_lower:
        autre_cats.append(cat)
    elif 'hors-sujet' in cat_lower:
        neutral_cats.append(cat)
    elif any(neg_word in cat_lower for neg_word in NEGATIVE_KEYWORDS):
        negative_cats.append(cat)
    else:
        # Anything not matched above is treated as positive
        positive_cats.append(cat)

print(f"POSITIVE CATEGORIES ({len(positive_cats)}):")
for i, cat in enumerate(positive_cats, 1):
    # Show first 60 chars of label
    display_text = cat[:60] + "..." if len(cat) > 60 else cat
    print(f"  {i}. {display_text}")

print(f"\nNEGATIVE CATEGORIES ({len(negative_cats)}):")
for i, cat in enumerate(negative_cats, 1):
    display_text = cat[:60] + "..." if len(cat) > 60 else cat
    print(f"  {i}. {display_text}")

print(f"\nNEUTRAL CATEGORIES ({len(neutral_cats)}):")
for i, cat in enumerate(neutral_cats, 1):
    display_text = cat[:60] + "..." if len(cat) > 60 else cat
    print(f"  {i}. {display_text}")

print(f"\nFALLBACK CATEGORIES ({len(autre_cats)}):")
for i, cat in enumerate(autre_cats, 1):
    print(f"  {i}. {cat}")

print(f"\n Note: Categories now include descriptions (from Scripts folder)")
print(f" Example: 'Accueil chaleureux... (expérience humaine positive...)'")
print(f" This improves OpenAI classification accuracy with few-shot learning")

## Test 2: Classify Single Review

**What this does:** Classifies ONE review to show how the API works

**Use case:** Understanding the classification process

**Expected output:** Categories detected, sentiment, confidence scores

In [None]:
# Initialize classifier
classifier = ReviewClassifier(debug=True)

print(" ReviewClassifier initialized successfully!")
print(f" Debug mode: {classifier.debug}")
print(f" OpenAI client ready: {classifier.client is not None}")

In [None]:
# Test with sample review
sample_review = """
Excellent service! Le personnel est très chaleureux et professionnel. 
L'agence est propre et bien organisée. J'ai été servi rapidement sans attente.
Je recommande vivement cette banque.
"""
sample_rating = 5  # assumed 5-star rating for this clearly positive sample

print(f" TEST: Single Review Classification")
print(f"="*80)
print(f"\nReview text:")
print(f"{sample_review.strip()}")
print(f"Rating: {sample_rating} ")
print(f"\n{'='*80}")
print(f"Classifying with few-shot examples + threshold filtering...\n")

# Classify (returns dict with sentiment, categories, language, rationale)
result = classifier.classify_review(sample_review, sample_rating)

if result:
    print("\nCLASSIFICATION RESULT")
    print("=" * 80)
    print(f"\nOverall sentiment: {result.get('sentiment', 'N/A').upper()}")
    print(f"Language detected: {result.get('language', 'N/A')}")
    print(f"Rationale: {result.get('rationale', 'N/A')}")

    categories = result.get('categories', [])
    if categories:
        print(f"\nCategories detected: {len(categories)}")
        print(f"(Filtered by confidence >= {config.CONF_THRESHOLD})\n")

        negative_keywords = ['attente', 'lenteur', 'injoignable', 'réclamation',
                             'incident', 'frais', 'insatisfaction', 'manque']
        for cat_dict in categories:
            cat_label = cat_dict.get('label', 'Unknown')
            confidence = cat_dict.get('confidence', 0)

            # Text marker based on category name
            label_lower = cat_label.lower()
            if any(neg in label_lower for neg in negative_keywords):
                icon = '[-]'
            elif 'hors-sujet' in label_lower:
                icon = '[~]'
            elif 'autre' in label_lower:
                icon = '[?]'
            else:
                icon = '[+]'

            # Confidence bar (20 characters correspond to confidence 1.0)
            bar = '█' * int(confidence * 20)
            # Truncate long labels for display
            display_label = cat_label[:50] + "..." if len(cat_label) > 50 else cat_label
            print(f"  {icon} {display_label:<53} {bar} {confidence:.2f}")
    else:
        print("\nNo categories detected (or all below threshold)")
        print(f"This triggers fallback to 'Autre ({result.get('sentiment', 'N/A')})'")
else:
    print("Classification failed")

In [None]:
# Test with Scripts folder logic - edge case
print(f" TEST: Edge Case (Generic Negative)")
print(f"="*80)

test_review = "Service nul, madame est arrogante et j'ai attendu h"
test_rating = 1  # assumed 1-star rating for this clearly negative sample

print(f"\nReview: '{test_review}'")
print(f"Rating: {test_rating} ")
print(f"\nExpected behavior:")
print(f" - Few-shot examples help identify specific issues")
print(f" - Threshold filtering removes low-confidence categories")
print(f" - Fallback assigns 'Autre (négatif)' if nothing passes threshold\n")

result = classifier.classify_review(test_review, test_rating)

if result:
    print("RESULT:")
    print(f"  Sentiment: {result.get('sentiment', 'N/A')}")
    print(f"  Language: {result.get('language', 'N/A')}")
    print(f"  Rationale: {result.get('rationale', 'N/A')}")
    print(f"\n  Categories ({len(result.get('categories', []))}):")

    for cat_dict in result.get('categories', []):
        label = cat_dict.get('label', 'Unknown')
        conf = cat_dict.get('confidence', 0)
        # Truncate long labels for display
        display_label = label[:50] + "..." if len(label) > 50 else label
        # Confidence bar (20 characters correspond to confidence 1.0)
        bar = '█' * int(conf * 20)
        print(f"  • {display_label:<53} {bar} {conf:.2f}")

    print("\nThis demonstrates the Scripts folder logic:")
    print("  - Enhanced prompt with detailed rules")
    print("  - Few-shot examples improve accuracy")
    print("  - Threshold filtering prevents low-confidence predictions")
    print("  - Fallback ensures every review gets classified")
else:
    print("Classification failed")

## Test 3: Classify Multiple Sample Reviews

**What this does:** Classifies three sample reviews to show how results vary

**Purpose:** Understanding different review types and classifications

In [None]:
# Sample reviews with different sentiments
# Ratings below are assumed values chosen to match each review's tone
sample_reviews = [
    {
        'id': 1,
        'text': "Service rapide et efficace. Personnel très accueillant.",
        'rating': 5
    },
    {
        'id': 2,
        'text': "Attente trop longue. Le personnel manque de professionnalisme.",
        'rating': 1
    },
    {
        'id': 3,
        'text': "Banque classique, rien de spécial. Services standards.",
        'rating': 3
    }
]

print(f" TEST: Multiple Sample Reviews")
print(f"="*80)
print(f"Classifying {len(sample_reviews)} reviews...\n")

results = []
for review in sample_reviews:
    print(f"\nReview {review['id']} ({review['rating']} stars):")
    print(f"  Text: {review['text']}")

    result = classifier.classify_review(review['text'], review['rating'])

    if result:
        sentiment = result.get('sentiment', 'unknown')
        n_categories = len(result.get('categories', []))
        print(f"  Sentiment: {sentiment.upper()}, Categories: {n_categories}")
        results.append({**review, 'classification': result})
    else:
        print("  Classification failed")

print(f"\n Classified {len(results)}/{len(sample_reviews)} reviews successfully")

## Test 4: Batch Classification (Small File)

**What this does:** Classifies a small sample of reviews loaded from the input file

**Use case:** Testing batch processing before full run

**Expected output:** CSV with original columns + classification results

In [None]:
# Load reviews from previous steps (transform or collection)
# Priority: transformed data > collected data > legacy path
input_file = None

# Option 1: Load from transform step (preferred - normalized data)
transform_parquet = project_root / "data" / "0_interim" / "transform" / "reviews_normalized.parquet"
transform_csv = project_root / "data" / "0_interim" / "transform" / "reviews_normalized.csv"

# Option 2: Load from collection step (if transform was skipped)
collection_parquet = project_root / "data" / "0_interim" / "collection" / "reviews.parquet"
collection_csv = project_root / "data" / "0_interim" / "collection" / "reviews.csv"

# Option 3: Legacy path
legacy_file = project_root / "data" / "output" / "reviews_for_classification.csv"

# Try in order of preference and keep the first file that exists
for candidate in [transform_parquet, transform_csv, collection_parquet, collection_csv, legacy_file]:
    if candidate.exists():
        input_file = candidate
        break

if input_file is not None:
    # Load based on file type
    if input_file.suffix == '.parquet':
        reviews_df = pd.read_parquet(input_file)
    else:
        reviews_df = pd.read_csv(input_file)

    # Detect column names (new vs. old format); None if neither is present
    text_col = next((c for c in ("text", "review_snippet") if c in reviews_df.columns), None)
    rating_col = next((c for c in ("rating", "review_rating") if c in reviews_df.columns), None)
    business_col = next((c for c in ("_business", "_bank") if c in reviews_df.columns), None)

    # Take a small sample for testing (sample size assumed: 10 rows)
    test_df = reviews_df.head(10).copy()
    test_input = project_root / "data" / "0_interim" / "classification" / "test_classify_small.csv"
    test_input.parent.mkdir(parents=True, exist_ok=True)
    test_df.to_csv(test_input, index=False)

    print("INPUT DATA LOADED")
    print("=" * 80)
    print(f"Source: {input_file.relative_to(project_root)}")
    print(f"Total reviews available: {len(reviews_df)}")
    print(f"Test sample size: {len(test_df)}")
    print(f"Columns: {list(test_df.columns)}")

    # Show detected format
    print("\nDetected format:")
    if text_col:
        print(f"  Review text column: '{text_col}'")
    if rating_col:
        print(f"  Rating column: '{rating_col}'")
    if business_col:
        print(f"  Business column: '{business_col}'")

    # Check whether the data has been through the transform step
    if 'created_at' in reviews_df.columns or 'region' in reviews_df.columns:
        print("\nTransformed data detected!")
        if 'created_at' in reviews_df.columns:
            print("  Normalized dates")
        if 'region' in reviews_df.columns:
            print("  Regions added")
            region_count = reviews_df['region'].notna().sum()
            print(f"  {region_count}/{len(reviews_df)} reviews have region data")

    # Show sample (handle both formats)
    print("\nSample reviews:")
    display_cols = [c for c in (business_col, text_col, rating_col) if c]
    if display_cols:
        display(test_df[display_cols].head())
    else:
        display(test_df.head())

else:
    print("No reviews file found!")
    print("\nTried locations:")
    print(f"  1. {transform_parquet.relative_to(project_root)} (preferred)")
    print(f"  2. {collection_parquet.relative_to(project_root)}")
    print(f"  3. {legacy_file.relative_to(project_root)} (legacy)")
    print("\nPlease run collect_reviews.ipynb first, or run the transform step:")
    print("  python -m review_analyzer.main transform")


In [None]:
# Run batch classification
output_path = project_root / "data" / "0_processed" / "classification" / f"test_classified_small_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
output_path.parent.mkdir(parents=True, exist_ok=True)

print(f" BATCH CLASSIFICATION")
print(f"="*80)
print(f" Input: {test_input.relative_to(project_root)}")
print(f" Output: {output_path.relative_to(project_root)}")
print(f"\n Classifying {len(test_df)} reviews...")
print(f" Note: Column names auto-detected\n")

# Read the test data
df_to_classify = pd.read_csv(test_input)

# Classify (auto-detects 'text'/'review_snippet' and 'rating'/'review_rating')
classified_df = classifier.classify_batch(df_to_classify)

# Save results
classified_df.to_csv(output_path, index=False)

print("\n" + "="*80)
print(" CLASSIFICATION COMPLETE")
print("="*80)
print(f" Reviews processed: {len(classified_df)}")
print(f" Output saved: {output_path.relative_to(project_root)}")
print(f" New columns added: sentiment, categories_json, language, rationale")


In [None]:
# Inspect classified results
if output_path.exists():
    classified_df = pd.read_csv(output_path)

    print("\nCLASSIFIED RESULTS")
    print("=" * 80)
    print(f"Total rows: {len(classified_df)}")
    print(f"Columns: {list(classified_df.columns)}\n")

    # Detect columns (new vs. old names)
    text_col = "text" if "text" in classified_df.columns else "review_snippet"
    rating_col = "rating" if "rating" in classified_df.columns else "review_rating"

    # Show sample, skipping any column that is absent
    print("Sample classified reviews:")
    display_cols = [c for c in (text_col, rating_col, 'sentiment', 'language')
                    if c in classified_df.columns]
    display(classified_df[display_cols].head())

    # Sentiment distribution
    if 'sentiment' in classified_df.columns:
        print("\nSentiment Distribution:")
        sentiment_counts = classified_df['sentiment'].value_counts()
        for sentiment, count in sentiment_counts.items():
            percentage = (count / len(classified_df)) * 100
            bar = '█' * int(percentage / 5)  # one block per 5 percentage points
            print(f"  {sentiment:<10} {count:>4} {bar} {percentage:.1f}%")

    # Top categories (parse categories_json)
    if 'categories_json' in classified_df.columns:
        print("\nTop Categories Detected:")
        all_categories = []
        for cats_json in classified_df['categories_json'].dropna():
            try:
                cats_list = json.loads(cats_json) if isinstance(cats_json, str) else cats_json
                for cat_dict in cats_list:
                    all_categories.append(cat_dict.get('label', 'Unknown'))
            except (json.JSONDecodeError, TypeError, AttributeError):
                continue  # skip malformed rows

        if all_categories:
            from collections import Counter
            for cat, count in Counter(all_categories).most_common(10):
                print(f"  {cat[:50]:<50} {count:>4} occurrences")
else:
    print("Output file not found")

## Test 5: Wide Format Classification

**What this does:** Classifies reviews and creates 7 binary category columns

**Use case:** Analysis-ready format for statistical modeling

**Expected output:** CSV with columns like cat_service, cat_competence, etc. (0/1)
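
The idea behind the wide format can be sketched without pandas. This is an illustration of the concept, not the code behind `convert_to_wide_format()`: the helper name `to_wide` and the category labels `"Accueil"` and `"Attente"` are made up for the example.

```python
import json


def to_wide(rows, categories):
    """Add one 0/1 key per category to each row, based on its categories_json."""
    wide_rows = []
    for row in rows:
        raw = row.get("categories_json") or "[]"  # treat missing values as "no categories"
        detected = {item["label"] for item in json.loads(raw)}
        flags = {cat: int(cat in detected) for cat in categories}
        wide_rows.append({**row, **flags})
    return wide_rows


rows = [
    {"text": "a", "categories_json": '[{"label": "Accueil", "confidence": 0.9}]'},
    {"text": "b", "categories_json": None},
]
for wide_row in to_wide(rows, ["Accueil", "Attente"]):
    print(wide_row["text"], wide_row["Accueil"], wide_row["Attente"])
```

Each category becomes its own column, so sums and means over a column directly give counts and shares of reviews.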

In [None]:
# Classify in wide format
output_path_wide = project_root / "data" / "0_processed" / "classification" / f"test_classified_wide_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
output_path_wide.parent.mkdir(parents=True, exist_ok=True)

print(f" WIDE FORMAT CLASSIFICATION")
print(f"="*80)
print("This creates binary category columns (0/1)")
print(f" Perfect for statistical analysis and modeling\n")

# Read test data and classify
df_to_classify_wide = pd.read_csv(test_input)
classified_df_wide = classifier.classify_batch(df_to_classify_wide)

# Convert to wide format
wide_df = classifier.convert_to_wide_format(classified_df_wide)

# Save
wide_df.to_csv(output_path_wide, index=False)

print("\n" + "="*80)
print(" CLASSIFICATION COMPLETE")
print("="*80)
print(f" Reviews processed: {len(wide_df)}")
print(f" Output saved: {output_path_wide.relative_to(project_root)}")
print(f" Format: Wide format with binary category columns")


In [None]:
# Inspect wide format results
if output_path_wide.exists():
    wide_df = pd.read_csv(output_path_wide)

    # Detect columns
    text_col = "text" if "text" in wide_df.columns else "review_snippet"
    rating_col = "rating" if "rating" in wide_df.columns else "review_rating"

    print("\nWIDE FORMAT RESULTS")
    print("=" * 80)
    print(f"Total rows: {len(wide_df)}")
    print(f"Total columns: {len(wide_df.columns)}\n")

    # Find category columns (they match the CATEGORIES list from config)
    cat_cols = [col for col in wide_df.columns if col in config.CATEGORIES]

    # Show all columns
    print("Column overview:")
    print(f"  Original columns: {text_col}, {rating_col}, sentiment, language, etc.")
    print(f"  Category columns ({len(cat_cols)}): binary 0/1 flags\n")

    # Show top categories by frequency
    if cat_cols:
        cat_sums = wide_df[cat_cols].sum().sort_values(ascending=False)

        print("Top 10 Most Frequent Categories:")
        for i, (cat, count) in enumerate(cat_sums.head(10).items(), 1):
            percentage = (count / len(wide_df)) * 100
            bar = '█' * int(percentage / 5)  # one block per 5 percentage points
            print(f"  {i:>2}. {cat[:50]:<50} {int(count):>4} {bar} {percentage:.1f}%")

        # Show sample rows
        print("\nSample rows (most frequent category columns):")
        sample_cat_cols = cat_sums.head().index.tolist()
        display(wide_df[[text_col, rating_col, 'sentiment'] + sample_cat_cols].head())
    else:
        print("No category columns found")
else:
    print("Output file not found")

## Test 6: Visualize Classification Results

**What this does:** Creates visualizations of classified data

**Outputs:**
- Sentiment distribution pie chart
- Category frequency bar chart
- Rating vs Sentiment heatmap
- Category co-occurrence matrix
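
As a complement to the correlation-based matrix plotted in the cell, raw co-occurrence counts can be computed directly from the 0/1 flags. This sketch uses a different, simpler measure than the notebook's `corr()` call; the helper name `cooccurrence_counts` and the labels `A`, `B`, `C` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows, categories):
    """Count how often each pair of categories is flagged in the same review."""
    counts = Counter()
    for row in rows:
        present = sorted(cat for cat in categories if row.get(cat) == 1)
        counts.update(combinations(present, 2))
    return counts


rows = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 1},
]
for pair, n in cooccurrence_counts(rows, ["A", "B", "C"]).most_common():
    print(pair, n)
```

Counts are easier to read on small samples, where correlations between rare binary columns can be unstable.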

In [None]:
# Load wide format for visualization
if output_path_wide.exists():
    wide_df = pd.read_csv(output_path_wide)

    # Detect columns
    rating_col = "rating" if "rating" in wide_df.columns else "review_rating"

    print("CLASSIFICATION VISUALIZATIONS")
    print("=" * 80)
    print(f"Total reviews: {len(wide_df)}\n")

    # Find category columns
    cat_cols = [col for col in wide_df.columns if col in config.CATEGORIES]

    # Set style
    plt.style.use('seaborn-v0_8-whitegrid')

    # 1. Sentiment distribution
    if 'sentiment' in wide_df.columns:
        fig, ax = plt.subplots(figsize=(8, 8))
        sentiment_counts = wide_df['sentiment'].value_counts()
        colors = ['#2ecc71', '#e74c3c', '#95a5a6']  # Green, red, gray
        ax.pie(sentiment_counts, labels=sentiment_counts.index, autopct='%.1f%%',
               colors=colors, startangle=90)
        ax.set_title('Sentiment Distribution', fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()

    # 2. Top 10 categories
    if cat_cols:
        cat_sums = wide_df[cat_cols].sum().sort_values(ascending=True).tail(10)

        fig, ax = plt.subplots(figsize=(12, 6))
        cat_sums.plot(kind='barh', ax=ax, color='steelblue')
        ax.set_title('Top 10 Most Frequent Categories', fontsize=14, fontweight='bold')
        ax.set_xlabel('Number of Reviews', fontsize=12)
        ax.set_ylabel('Category', fontsize=12)
        ax.grid(axis='x', alpha=0.3)
        plt.tight_layout()
        plt.show()

    # 3. Rating vs Sentiment
    if rating_col in wide_df.columns and 'sentiment' in wide_df.columns:
        crosstab = pd.crosstab(wide_df[rating_col], wide_df['sentiment'])

        fig, ax = plt.subplots(figsize=(10, 6))
        sns.heatmap(crosstab, annot=True, fmt='d', cmap='YlOrRd', ax=ax)
        ax.set_title('Rating vs Sentiment Analysis', fontsize=14, fontweight='bold')
        ax.set_xlabel('Sentiment', fontsize=12)
        ax.set_ylabel('Rating (Stars)', fontsize=12)
        plt.tight_layout()
        plt.show()

    # 4. Category co-occurrence (top 8 categories)
    if cat_cols and len(cat_cols) >= 8:
        top_8_cats = wide_df[cat_cols].sum().sort_values(ascending=False).head(8).index
        corr_matrix = wide_df[top_8_cats].corr()

        fig, ax = plt.subplots(figsize=(10, 8))
        sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
                    center=0, ax=ax, square=True)
        ax.set_title('Category Co-occurrence Matrix (Top 8)', fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()

        print("\nInterpretation:")
        print("  Positive values (red) = categories often appear together")
        print("  Negative values (blue) = categories rarely appear together")
        print("  Values close to 0 (white) = no correlation")
else:
    print("No classified data found. Run Test 5 first!")

## Test 7: Confidence Threshold Impact

**What this does:** Shows how confidence threshold affects classification

**Purpose:** Understanding the trade-off between precision and recall
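
When raw (unfiltered) confidence scores are available, the effect of a threshold can be counted directly instead of estimated. The sketch below assumes such scores exist; the saved `categories_json` column is already filtered at the current threshold, so it only supports testing higher thresholds. The function name and the scores are made up for illustration.

```python
def assignments_by_threshold(confidences, thresholds):
    """Count how many predictions would survive each threshold."""
    return {t: sum(conf >= t for conf in confidences) for t in thresholds}


confidences = [0.95, 0.80, 0.62, 0.55, 0.41, 0.30]  # made-up scores
kept_by_threshold = assignments_by_threshold(confidences, [0.3, 0.5, 0.7])
for t, kept in kept_by_threshold.items():
    print(f"threshold {t:.1f}: {kept} of {len(confidences)} predictions kept")
```

Raising the threshold can only shrink the set of kept predictions, which is the precision/recall trade-off in its simplest form.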

In [None]:
# Analyze confidence threshold impact
thresholds = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

print("CONFIDENCE THRESHOLD IMPACT")
print("=" * 80)
print(f"Current threshold: {config.CONF_THRESHOLD}")
print("\nHigher threshold -> fewer but more confident predictions")
print("Lower threshold -> more predictions but less confident\n")

if output_path_wide.exists():
    wide_df = pd.read_csv(output_path_wide)

    # Find category columns (match CATEGORIES list)
    cat_cols = [col for col in wide_df.columns if col in config.CATEGORIES]

    if cat_cols:
        current_categories = wide_df[cat_cols].sum().sum()
        avg_per_review = current_categories / len(wide_df)

        print(f"With current threshold ({config.CONF_THRESHOLD}):")
        print(f"  Total category assignments: {int(current_categories)}")
        print(f"  Average categories per review: {avg_per_review:.2f}")
        print(f"  Reviews with at least 1 category: {(wide_df[cat_cols].sum(axis=1) > 0).sum()}")

        # Estimate impact of different thresholds
        print("\nEstimated impact of different thresholds:")
        for threshold in thresholds:
            # Rough linear heuristic, not a measurement; the slope (1.5) and
            # floor (0.1) are illustrative values
            factor = 1 - (threshold - config.CONF_THRESHOLD) * 1.5
            estimated_cats = int(current_categories * max(0.1, factor))
            bar = '█' * int(factor * 20)

            marker = '<- CURRENT' if abs(threshold - config.CONF_THRESHOLD) < 0.01 else ''
            print(f"  {threshold:.2f}: ~{estimated_cats:>4} total categories {bar} {marker}")

        print("\nRecommendation:")
        print(f"  - Use {config.CONF_THRESHOLD} (current) for balanced results")
        print("  - Use 0.70+ for high-confidence predictions only")
        print("  - Use a lower threshold for exploratory analysis")

        print("\nFallback mechanism:")
        print("  - If no categories pass the threshold, assigns 'Autre (positif/négatif)'")
        print("  - Ensures every review gets at least one classification")
        print("  - Based on overall sentiment from OpenAI")
    else:
        print("No category columns found")
else:
    print("No data available. Run Test 5 first!")

## Export Final Results

**What this does:** Prepares classified data for analysis

**Outputs:**
- CSV with wide format (ready for analysis)
- Summary statistics
- Export confirmation

In [None]:
# Export final results
if output_path_wide.exists():
    wide_df = pd.read_csv(output_path_wide)

    # Detect columns (None if neither name is present)
    rating_col = next((c for c in ("rating", "review_rating") if c in wide_df.columns), None)
    business_col = next((c for c in ("_business", "_bank") if c in wide_df.columns), None)

    # Save to new data architecture
    final_export = project_root / "data" / "0_processed" / "classification" / "reviews_classified_final.csv"
    final_export.parent.mkdir(parents=True, exist_ok=True)
    wide_df.to_csv(final_export, index=False)

    # Also save to legacy path for backward compatibility
    legacy_export = project_root / "data" / "output" / "reviews_classified_final.csv"
    legacy_export.parent.mkdir(parents=True, exist_ok=True)
    wide_df.to_csv(legacy_export, index=False)

    print("EXPORT COMPLETE")
    print("=" * 80)
    print(f"Primary: {final_export.relative_to(project_root)}")
    print(f"Legacy: {legacy_export.relative_to(project_root)}")
    print(f"Records: {len(wide_df)}")
    print(f"Columns: {len(wide_df.columns)}")

    # Category columns
    cat_cols = [col for col in wide_df.columns if col in config.CATEGORIES]
    print(f"Category columns: {len(cat_cols)}")

    # Summary stats
    print("\nSUMMARY STATISTICS:")
    if 'sentiment' in wide_df.columns:
        sentiment_dist = wide_df['sentiment'].value_counts()
        print("\n  Sentiment:")
        for sent, count in sentiment_dist.items():
            print(f"    {sent}: {count} ({count / len(wide_df) * 100:.1f}%)")

    if rating_col:
        print("\n  Rating:")
        print(f"    Average: {wide_df[rating_col].mean():.2f}")
        print(f"    Median: {wide_df[rating_col].median():.1f}")

    if cat_cols:
        total_assignments = wide_df[cat_cols].sum().sum()
        avg_per_review = total_assignments / len(wide_df)
        print("\n  Categories:")
        print(f"    Total assignments: {int(total_assignments)}")
        print(f"    Average per review: {avg_per_review:.2f}")
        print(f"    Reviews with categories: {(wide_df[cat_cols].sum(axis=1) > 0).sum()}")

    # Show business breakdown if available (head() shows the top 5)
    if business_col:
        business_counts = wide_df[business_col].value_counts()
        print("\n  Reviews by business/location:")
        for business, count in business_counts.head().items():
            print(f"    - {business}: {count}")
        if len(business_counts) > 5:
            print(f"    ... and {len(business_counts) - 5} more")

    # Show region breakdown if available (head() shows the top 5)
    if 'region' in wide_df.columns:
        region_counts = wide_df['region'].value_counts()
        print("\n  Reviews by region:")
        for region, count in region_counts.head().items():
            print(f"    - {region}: {count}")
        if len(region_counts) > 5:
            print(f"    ... and {len(region_counts) - 5} more")

    print("\nData saved to: data/0_processed/classification/")
    print("\nReady for analysis!")
    print("You can now:")
    print("  • Open in Excel for pivot tables")
    print("  • Load in Python/R for statistical analysis")
    print("  • Create dashboards with Power BI/Tableau")
    print("  • Build aggregates by region, city, or business")
else:
    print("No classified data found. Run Test 5 first!")


## Summary for New Developers

**What you learned:**

1. **ReviewClassifier** - AI-powered review classification using OpenAI GPT-
2. **Flexible Input Format** - Auto-detects column names from both new and old formats:
 - New: `text`, `rating`, `_place_id`, `_city`, `_business`
 - Old: `review_snippet`, `review_rating`, `_bank`
3. **Scripts Folder Logic Integration** - The classifier now uses proven logic from Scripts:
 - Enhanced system prompt with detailed rules and abstention policy
 - Few-shot learning (representative examples included in every API call)
 - Threshold filtering at the `config.CONF_THRESHOLD` confidence
 - Fallback mechanism: assigns "Autre (positif/négatif)" when no categories pass the threshold
 - Temperature=0 for deterministic, reproducible results
 - Robust message extraction (fallback methods for OpenAI responses)
4. **Classification Categories** - 7 categories with embedded descriptions
5. **Output formats:**
 - Long format: nested JSON with categories (categories_json column)
 - Wide format: binary columns (0/1) for each category - perfect for analysis
6. **Sentiment analysis** - Overall Positif/Négatif/Neutre classification
7. **Multi-language support** - Handles French, English, Arabic/Darija, emojis
8. **New data architecture** - Organized folder structure for better data management

**Key takeaways:**
- Column names are auto-detected - no manual specification needed
- Test with single reviews first using `classify_review(text, rating)`
- Use `classify_batch(df)` for DataFrames - auto-detects columns
- Use wide format for statistical analysis (`convert_to_wide_format()`)
- Confidence threshold balances precision vs recall
- Fallback ensures every review gets classified (no empty results)
- Few-shot examples improve accuracy on edge cases
- Checkpoint system prevents data loss during long runs (built into classify_batch)
- Data flows through organized pipeline stages

**Understanding the results:**
- **sentiment**: Overall feeling (Positif/Négatif/Neutre)
- **categories_json**: List of detected categories with confidence scores (filtered by threshold)
- **language**: Detected language of the review
- **rationale**: Brief explanation of the classification
- Wide format: Each category becomes a binary column (1 if detected, 0 if not)
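
Reading one `categories_json` cell back into Python can be sketched like this. The helper name `parse_categories` is hypothetical and the label and score are made up; the `label`/`confidence` keys follow the results shown in this notebook.

```python
import json


def parse_categories(cell):
    """Return (label, confidence) pairs from one categories_json cell."""
    if not isinstance(cell, (str, list)):
        return []  # missing values (None, NaN) carry no categories
    items = json.loads(cell) if isinstance(cell, str) else cell
    return [(item.get("label", "Unknown"), item.get("confidence")) for item in items]


cell = '[{"label": "Autre (positif)", "confidence": 0.8}]'
for label, confidence in parse_categories(cell):
    print(label, confidence)
```

After a CSV round trip the column holds JSON strings, so the `json.loads` branch is the one that usually runs.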

**Scripts Folder Logic (Now Integrated):**
- Enhanced prompt with RÈGLES DE CLASSIFICATION and POLITIQUE D'ABSTENTION
- Few-shot examples (samples covering mixed, digital, generic, and off-topic cases)
- Threshold filtering removes low-confidence predictions
- Fallback mechanism assigns "Autre (±)" based on sentiment
- Message extraction with fallback methods for robustness
- Temperature=0 for deterministic output
- CATEGORIES and CONF_THRESHOLD naming convention

**Data Flow (New Architecture):**
```
discover  →  collect     →  transform  →  classify
   ↓            ↓              ↓             ↓
0_raw/      0_interim/     0_interim/    0_processed/
discovery   collection     transform     classification
```

**This notebook processes:**
- **Input**: `data/0_interim/transform/reviews_normalized.parquet` (after transform)
 - Or: `data/0_interim/collection/reviews.csv` (if transform skipped)
- **Output**: `data/0_processed/classification/reviews_classified_final.csv`

**Backward Compatibility:**
- Works with old collection format (review_snippet, review_rating, _bank)
- Works with new collection format (text, rating, _business)
- Works with transformed data (includes region, created_at, normalized fields)
- Seamlessly integrates with enhanced discovery (canonical place IDs, OSM data)
- Old config names still work (REVIEW_CATEGORIES, CONFIDENCE_THRESHOLD)

**Next steps:**
1. You've discovered place_ids (`discover_placeids.ipynb`)
2. You've collected reviews (`collect_reviews.ipynb`)
3. Transform reviews: normalize fields, add regions (optional but recommended)
4. You've classified reviews (`classify_reviews.ipynb`, this notebook)
5. Next: analyze results, create reports, build dashboards

**For production runs:**
- Use `classify_batch(df)` for automatic column detection
- Convert to wide format with `convert_to_wide_format(df)` for analysis
- Monitor API usage and costs (temperature=0 helps with consistency)
- Validate results with sample reviews
- Trust the fallback mechanism - it prevents empty classifications

**Pipeline Command (Run all steps):**
```bash
python -m review_analyzer.main pipeline \
 --businesses "Attijariwafa Bank" \
 --cities "Casablanca" \
 --business-type "bank"
```

**Troubleshooting:**
- API errors? Check .env file has OPENAI_API_KEY set
- No categories? Check threshold - might be too high. Fallback should assign "Autre (±)"
- High API costs? Reduce number of reviews or batch size
- Column not found errors? Check your input CSV has either text/review_snippet and rating/review_rating columns
- Inconsistent results? Temperature is set to 0 for reproducibility - same input = same output
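
For the first troubleshooting item, a quick check that the key is visible to the running process might look like this. It inspects the process environment only; whether the `.env` file is loaded automatically depends on the project setup, which is not shown here. The helper name `api_key_configured` is hypothetical.

```python
import os


def api_key_configured(env=None):
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if api_key_configured():
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing or blank")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.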

Happy classifying! 
