# Week 7: Advanced EDA with Business Intelligence - Part 2: Product Performance Metrics and Insights

## Learning Objectives
By the end of this session, you will be able to:
- Conduct comprehensive product performance analysis using advanced EDA techniques
- Implement product lifecycle and portfolio analysis strategies
- Apply advanced metrics for product success measurement
- Create data-driven product recommendations and optimization strategies
- Build product performance dashboards for business intelligence

## Business Context
Building on our customer behavior insights, we now focus on **product performance analytics** to understand:
- **Product Success Drivers**: What makes certain products outperform others?
- **Category Performance**: Which product categories drive the most value?
- **Cross-selling Opportunities**: Which products are frequently bought together?
- **Inventory Optimization**: How can we optimize the product mix based on performance data?

**Key Business Questions:**
- Which products and categories generate the highest revenue and profit?
- What are the patterns in product performance across different dimensions?
- How can we identify products with cross-selling potential?
- Which products are underperforming and why?

## 1. Environment Setup and Secure Data Connection

In [3]:
# Essential imports for product performance analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Advanced analytics libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from scipy import stats
from scipy.stats import chi2_contingency

# Network analysis for product relationships
import networkx as nx
from itertools import combinations

# Advanced visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
pyo.init_notebook_mode(connected=True)

# Database connection (secure)
import os
from sqlalchemy import create_engine

# Display and plotting settings
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 8)

print("✅ Environment setup complete for product performance analysis!")

✅ Environment setup complete for product performance analysis!


In [None]:
# Secure Database Connection Using Environment Variables
# Best practice: Never expose credentials in code

# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()

# Read database credentials from environment variables
SUPABASE_URL = os.getenv('SUPABASE_URL')
SUPABASE_KEY = os.getenv('SUPABASE_KEY')

# Alternative: Use legacy postgres connection if needed
POSTGRES_HOST = os.getenv('POSTGRES_HOST')
POSTGRES_PORT = os.getenv('POSTGRES_PORT', '6543')
POSTGRES_DB = os.getenv('POSTGRES_DATABASE', 'postgres')
POSTGRES_USER = os.getenv('POSTGRES_USER')
POSTGRES_PASSWORD = os.getenv('POSTGRES_PASSWORD')

# Construct the database URL from environment variables
if POSTGRES_HOST and POSTGRES_USER and POSTGRES_PASSWORD:
    DATABASE_URL = f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}"
else:
    # Fail fast: without credentials, DATABASE_URL would be undefined below
    raise RuntimeError(
        "❌ Database credentials not found in environment variables. "
        "Please check that your .env file contains the required database credentials."
    )

# Create database engine
engine = create_engine(DATABASE_URL)

# Test connection
try:
    with engine.connect() as conn:
        from sqlalchemy import text
        result = conn.execute(text("SELECT count(*) FROM olist_sales_data_set.olist_geolocation_dataset"))
        count = result.scalar()
        print(f"✅ Secure database connection established! ({count:,} records in geolocation table)")
except Exception as e:
    print(f"❌ Connection failed: {e}")

print("🔒 Security Note: Database credentials loaded from .env file")

❌ Connection failed: (psycopg2.OperationalError) connection to server at "aws-0-us-east-1.pooler.supabase.com" (52.45.94.125), port 6543 failed: FATAL:  Tenant or user not found
connection to server at "aws-0-us-east-1.pooler.supabase.com" (52.45.94.125), port 6543 failed: FATAL:  Tenant or user not found

(Background on this error at: https://sqlalche.me/e/20/e3q8)
🔒 Security Note: Database credentials loaded from environment variables
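
The connection step above interpolates the raw password into the URL, which can break when the password contains characters such as `@` or `/`. The following is a minimal sketch, not the notebook's own method, of one way to build the URL with percent-encoding and to stop early when a setting is missing. The helper name `build_database_url` and the example values are hypothetical; only the environment variable names come from the cell above.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=os.environ):
    """Return a PostgreSQL URL built from environment-style settings.

    Raises RuntimeError when a required setting is missing, so execution
    stops here instead of failing later with an undefined variable.
    """
    required = ("POSTGRES_HOST", "POSTGRES_USER", "POSTGRES_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing database settings: {', '.join(missing)}")

    # Percent-encode user and password so special characters stay valid in a URL
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    host = env["POSTGRES_HOST"]
    port = env.get("POSTGRES_PORT", "6543")
    database = env.get("POSTGRES_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


# Usage with made-up values (not real credentials)
example_env = {
    "POSTGRES_HOST": "db.example.com",
    "POSTGRES_USER": "analyst",
    "POSTGRES_PASSWORD": "p@ss/word",
}
example_url = build_database_url(example_env)
print(example_url)
```

In the notebook, the returned string could be passed to `create_engine` in place of the hand-built `DATABASE_URL`.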


## 2. Comprehensive Product Data Loading

Load detailed product performance data for analysis.

In [None]:
# Comprehensive Product Performance Dataset
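print("🔄 Loading comprehensive product performance dataset...")

# Product performance and analytics query.
# Caveats: the LEFT JOIN on reviews repeats an item row for every review attached to
# its order, and LIMIT without ORDER BY returns an arbitrary 30,000-row subset.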
product_performance_query = """
WITH product_sales AS (
    SELECT 
        p.product_id,
        p.product_category_name,
        COALESCE(pt.product_category_name_english, p.product_category_name) as category_english,
        p.product_weight_g,
        p.product_length_cm,
        p.product_height_cm,
        p.product_width_cm,
        p.product_photos_qty,
        p.product_name_lenght,
        p.product_description_lenght,
        (p.product_length_cm * p.product_height_cm * p.product_width_cm) / 1000.0 as product_volume_liters,
        
        oi.order_id,
        oi.price,
        oi.freight_value,
        (oi.price + oi.freight_value) as total_item_value,
        oi.freight_value / NULLIF(oi.price, 0) as freight_ratio,
        
        o.order_purchase_timestamp,
        o.customer_id,
        EXTRACT(YEAR FROM o.order_purchase_timestamp) as order_year,
        EXTRACT(MONTH FROM o.order_purchase_timestamp) as order_month,
        EXTRACT(QUARTER FROM o.order_purchase_timestamp) as order_quarter,
        
        c.customer_state,
        c.customer_city,
        
        r.review_score,
        CASE 
            WHEN r.review_score >= 4 THEN 'High'
            WHEN r.review_score = 3 THEN 'Medium'
            WHEN r.review_score <= 2 THEN 'Low'
            ELSE 'No Review'
        END as satisfaction_category
        
    FROM olist_sales_data_set.olist_products_dataset p
    JOIN olist_sales_data_set.olist_order_items_dataset oi ON p.product_id = oi.product_id
    JOIN olist_sales_data_set.olist_orders_dataset o ON oi.order_id = o.order_id
    JOIN olist_sales_data_set.olist_customers_dataset c ON o.customer_id = c.customer_id
    LEFT JOIN olist_sales_data_set.product_category_name_translation pt 
        ON p.product_category_name = pt.product_category_name
    LEFT JOIN olist_sales_data_set.olist_order_reviews_dataset r ON o.order_id = r.order_id
    
    WHERE o.order_status = 'delivered'
    AND oi.price > 0
)
SELECT * FROM product_sales
LIMIT 30000;
"""

# Load the data
product_df = pd.read_sql(product_performance_query, engine)

# Data preprocessing
product_df['order_purchase_timestamp'] = pd.to_datetime(product_df['order_purchase_timestamp'])
product_df['category_clean'] = product_df['category_english'].fillna('Unknown').str.title()

# Calculate additional business metrics
# Note: the query returns no product cost, so 'profit_margin' is only a proxy:
# the share of the item price left after subtracting freight.
product_df['profit_margin'] = (product_df['price'] - product_df['freight_value']) / product_df['price']
product_df['price_per_gram'] = product_df['price'] / (product_df['product_weight_g'] + 1)  # +1 to avoid division by zero
product_df['price_per_liter'] = product_df['price'] / (product_df['product_volume_liters'] + 1)  # +1 to avoid division by zero

# Remove extreme outliers for better analysis
price_q99 = product_df['price'].quantile(0.99)
product_df = product_df[product_df['price'] <= price_q99].copy()

print(f"✅ Product performance dataset loaded successfully!")
print(f"   📊 Total records: {len(product_df):,}")
print(f"   📦 Unique products: {product_df['product_id'].nunique():,}")
print(f"   🛒 Unique orders: {product_df['order_id'].nunique():,}")
print(f"   🏷️ Product categories: {product_df['category_clean'].nunique()}")
print(f"   📅 Analysis period: {product_df['order_purchase_timestamp'].min().date()} to {product_df['order_purchase_timestamp'].max().date()}")

# Display sample data
print("\n📋 Sample Product Performance Data:")
display(product_df[['product_id', 'category_clean', 'price', 'total_item_value', 
                  'review_score', 'satisfaction_category']].head())

## 3. Product Portfolio Analysis

Comprehensive analysis of product performance across multiple dimensions.
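
Because the per-product aggregation in this section depends on matching a hand-written list of column names to the order of the aggregated columns, it can help to try the pattern on a few rows first. The sketch below uses made-up data and pandas named aggregation, which names each output column at the point where it is computed. It is an illustrative alternative, not the notebook's implementation, and `toy_items`, `toy_metrics`, and `items_per_customer` are hypothetical names.

```python
import pandas as pd

# Toy order items (made-up values, not Olist data)
toy_items = pd.DataFrame({
    "product_id": ["A", "A", "A", "B", "B", "C"],
    "customer_id": ["c1", "c2", "c2", "c1", "c3", "c4"],
    "total_item_value": [10.0, 10.0, 12.0, 50.0, 50.0, 5.0],
    "review_score": [5, 4, None, 2, 3, 5],
})

# Named aggregation: output_name=(input_column, aggregation)
toy_metrics = (
    toy_items.groupby("product_id")
    .agg(
        sales_volume=("total_item_value", "count"),
        total_revenue=("total_item_value", "sum"),
        avg_review_score=("review_score", "mean"),
        review_count=("review_score", "count"),  # counts non-missing reviews only
        unique_customers=("customer_id", "nunique"),
    )
    .reset_index()
)

# Items sold per distinct customer (what the notebook calls 'repeat_rate')
toy_metrics["items_per_customer"] = (
    toy_metrics["sales_volume"] / toy_metrics["unique_customers"]
)
print(toy_metrics)
```

Product A has three items sold to two customers, so its items-per-customer value is 1.5, while its review count is 2 because one review is missing.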

In [None]:
# Product Portfolio Performance Analysis
print("📊 Product Portfolio Performance Analysis")
print("=" * 45)

def analyze_product_portfolio(data):
    """
    Comprehensive product portfolio analysis
    """
    portfolio_analysis = {}
    
    # Product-level metrics
    product_metrics = data.groupby('product_id').agg({
        'price': ['count', 'mean', 'std'],
        'total_item_value': ['sum', 'mean'],
        'review_score': ['mean', 'count'],
        'order_id': 'nunique',  # Number of unique orders
        'customer_id': 'nunique',  # Number of unique customers
        'freight_value': 'mean',
        'profit_margin': 'mean',
        'category_clean': 'first',
        'product_weight_g': 'first',
        'product_volume_liters': 'first'
    }).reset_index()
    
    # Flatten column names
    product_metrics.columns = [
        'product_id', 'sales_volume', 'avg_price', 'price_std', 
        'total_revenue', 'avg_order_value', 'avg_review_score', 'review_count',
        'unique_orders', 'unique_customers', 'avg_freight', 'avg_profit_margin',
        'category', 'weight_g', 'volume_liters'
    ]
    
    # Calculate additional metrics
    product_metrics['revenue_per_customer'] = product_metrics['total_revenue'] / product_metrics['unique_customers']
    product_metrics['repeat_rate'] = product_metrics['sales_volume'] / product_metrics['unique_customers']
    product_metrics['review_engagement'] = product_metrics['review_count'] / product_metrics['sales_volume']
    
    # Category-level analysis
    category_metrics = data.groupby('category_clean').agg({
        'product_id': 'nunique',
        'price': ['count', 'mean', 'std'],
        'total_item_value': ['sum', 'mean'],
        'review_score': ['mean', 'std'],
        'customer_id': 'nunique',
        'freight_value': 'mean',
        'profit_margin': 'mean'
    }).round(2)
    
    # Flatten category column names
    category_metrics.columns = [
        'unique_products', 'total_sales', 'avg_price', 'price_std',
        'total_revenue', 'avg_order_value', 'avg_review_score', 'review_std',
        'unique_customers', 'avg_freight', 'avg_profit_margin'
    ]
    
    # Calculate category market share
    total_revenue = category_metrics['total_revenue'].sum()
    category_metrics['revenue_share'] = (category_metrics['total_revenue'] / total_revenue * 100).round(2)
    
    portfolio_analysis['product_metrics'] = product_metrics
    portfolio_analysis['category_metrics'] = category_metrics
    
    return portfolio_analysis

# Perform portfolio analysis
portfolio_results = analyze_product_portfolio(product_df)
product_metrics = portfolio_results['product_metrics']
category_metrics = portfolio_results['category_metrics']

print(f"📈 Portfolio Analysis Results:")
print(f"   • Products analyzed: {len(product_metrics):,}")
print(f"   • Categories analyzed: {len(category_metrics)}")
print(f"   • Total revenue: R$ {category_metrics['total_revenue'].sum():,.2f}")
print(f"   • Average product revenue: R$ {product_metrics['total_revenue'].mean():.2f}")

# Top performing categories
top_categories = category_metrics.sort_values('total_revenue', ascending=False).head(10)
print(f"\n🏆 Top 10 Categories by Revenue:")
display(top_categories[['total_revenue', 'avg_price', 'avg_review_score', 'revenue_share', 'unique_customers']])

# Product performance distribution analysis
print(f"\n📊 Product Performance Distribution:")
performance_stats = product_metrics[['total_revenue', 'sales_volume', 'avg_review_score', 'repeat_rate']].describe()
display(performance_stats)

# Identify star products (high revenue + high satisfaction)
revenue_threshold = product_metrics['total_revenue'].quantile(0.8)
satisfaction_threshold = 4.0

star_products = product_metrics[
    (product_metrics['total_revenue'] >= revenue_threshold) & 
    (product_metrics['avg_review_score'] >= satisfaction_threshold)
].sort_values('total_revenue', ascending=False)

print(f"\n⭐ Star Products (High Revenue + High Satisfaction):")
print(f"   • Star products identified: {len(star_products)}")
print(f"   • Revenue contribution: R$ {star_products['total_revenue'].sum():,.2f}")
print(f"   • Average satisfaction: {star_products['avg_review_score'].mean():.2f}")

if len(star_products) > 0:
    print(f"\n   Top 5 Star Products:")
    display(star_products[['product_id', 'category', 'total_revenue', 'avg_review_score', 'sales_volume']].head())

In [None]:
# Product Lifecycle and Performance Matrix Analysis
print("🔄 Product Lifecycle and Performance Matrix Analysis")
print("=" * 55)

def create_product_performance_matrix(product_metrics):
    """
    Create BCG-style product performance matrix
    """
    # Calculate market share and growth proxies
    # (these assignments add columns to the DataFrame that was passed in)
    product_metrics['market_share_proxy'] = product_metrics['sales_volume'] / product_metrics['sales_volume'].sum()
    product_metrics['growth_proxy'] = product_metrics['repeat_rate']  # Items sold per unique customer, used as a rough growth indicator
    
    # Define thresholds (median splits)
    share_median = product_metrics['market_share_proxy'].median()
    growth_median = product_metrics['growth_proxy'].median()
    
    # Classify products into matrix quadrants
    def classify_product(row):
        share = row['market_share_proxy']
        growth = row['growth_proxy']
        
        if share >= share_median and growth >= growth_median:
            return 'Stars'  # High share, high growth
        elif share >= share_median and growth < growth_median:
            return 'Cash Cows'  # High share, low growth
        elif share < share_median and growth >= growth_median:
            return 'Question Marks'  # Low share, high growth
        else:
            return 'Dogs'  # Low share, low growth
    
    product_metrics['portfolio_category'] = product_metrics.apply(classify_product, axis=1)
    
    return product_metrics

# Create performance matrix
product_metrics_matrix = create_product_performance_matrix(product_metrics)

# Analyze portfolio distribution
portfolio_distribution = product_metrics_matrix['portfolio_category'].value_counts()
portfolio_revenue = product_metrics_matrix.groupby('portfolio_category')['total_revenue'].sum().sort_values(ascending=False)

print(f"📊 Product Portfolio Matrix Distribution:")
print(f"\nProduct Count by Category:")
for category, count in portfolio_distribution.items():
    percentage = (count / len(product_metrics_matrix)) * 100
    revenue = portfolio_revenue.get(category, 0)
    print(f"   • {category}: {count:,} products ({percentage:.1f}%) - Revenue: R$ {revenue:,.2f}")

# Detailed analysis by portfolio category
portfolio_analysis = product_metrics_matrix.groupby('portfolio_category').agg({
    'total_revenue': ['sum', 'mean', 'count'],
    'avg_review_score': 'mean',
    'sales_volume': 'mean',
    'repeat_rate': 'mean',
    'avg_profit_margin': 'mean'
}).round(2)

# Flatten columns
portfolio_analysis.columns = ['total_revenue', 'avg_revenue', 'product_count', 
                             'avg_satisfaction', 'avg_sales_volume', 'avg_repeat_rate', 'avg_profit_margin']

print(f"\n📈 Portfolio Category Performance Metrics:")
display(portfolio_analysis)

# Strategic recommendations by category
print(f"\n🎯 Strategic Recommendations by Portfolio Category:")

for category in portfolio_distribution.index:
    category_data = product_metrics_matrix[product_metrics_matrix['portfolio_category'] == category]
    avg_revenue = category_data['total_revenue'].mean()
    avg_satisfaction = category_data['avg_review_score'].mean()
    
    print(f"\n   {category.upper()}:")
    
    if category == 'Stars':
        print(f"     Strategy: Invest and grow")
        print(f"     Actions: Increase marketing spend, expand inventory, optimize pricing")
        print(f"     Monitor: Maintain high satisfaction while scaling")
    
    elif category == 'Cash Cows':
        print(f"     Strategy: Harvest and maintain")
        print(f"     Actions: Optimize costs, maintain quality, steady promotion")
        print(f"     Monitor: Profit margins and customer retention")
    
    elif category == 'Question Marks':
        print(f"     Strategy: Selective investment")
        print(f"     Actions: Test market expansion, improve product features")
        print(f"     Monitor: Growth trajectory and market response")
    
    else:  # Dogs
        print(f"     Strategy: Divest or revitalize")
        print(f"     Actions: Discontinue poor performers, reposition promising ones")
        print(f"     Monitor: Cost vs. revenue break-even")
    
    print(f"     Current performance: R$ {avg_revenue:.2f} avg revenue, {avg_satisfaction:.2f} satisfaction")
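
One property of the median split used above is worth checking on small data: with `>=` comparisons, every product tied at the median counts as "high". If many products have a single sale and a repeat rate of 1.0, a large share of the catalogue can land in Stars. The sketch below is a hedged illustration on made-up data. `classify_quadrants` and its `strict` flag are hypothetical, and `np.select` is used as a vectorized stand-in for the row-wise `apply`.

```python
import numpy as np
import pandas as pd

# Toy product metrics (made-up values) with many ties at the minimum
toy = pd.DataFrame({
    "product_id": ["A", "B", "C", "D", "E"],
    "sales_volume": [1, 1, 1, 4, 9],
    "repeat_rate": [1.0, 1.0, 1.0, 2.0, 1.0],
})


def classify_quadrants(df, share_col="sales_volume", growth_col="repeat_rate", strict=False):
    """Return a copy of df with a 'portfolio_category' column (median split)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    share_median = out[share_col].median()
    growth_median = out[growth_col].median()
    if strict:
        high_share = out[share_col] > share_median
        high_growth = out[growth_col] > growth_median
    else:
        high_share = out[share_col] >= share_median
        high_growth = out[growth_col] >= growth_median
    out["portfolio_category"] = np.select(
        [high_share & high_growth, high_share & ~high_growth, ~high_share & high_growth],
        ["Stars", "Cash Cows", "Question Marks"],
        default="Dogs",
    )
    return out


toy_matrix = classify_quadrants(toy)
toy_matrix_strict = classify_quadrants(toy, strict=True)
print(toy_matrix["portfolio_category"].value_counts())
print(toy_matrix_strict["portfolio_category"].value_counts())
```

With the inclusive comparison all five toy products are classified as Stars. With the strict comparison only D is a Star, E becomes a Cash Cow, and the rest are Dogs. Which rule suits the real data is a judgment call; inspecting the distribution printed above is a reasonable first step.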

## 4. Advanced Category Performance Analysis

Deep dive into category-level performance patterns and insights.

In [None]:
# Advanced Category Performance Analysis
print("🏷️ Advanced Category Performance Analysis")
print("=" * 45)

def analyze_category_performance(data):
    """
    Comprehensive category performance analysis
    """
    # Enhanced category metrics
    category_analysis = data.groupby('category_clean').agg({
        'product_id': 'nunique',
        'price': ['count', 'mean', 'std', 'min', 'max'],
        'total_item_value': ['sum', 'mean'],
        'review_score': ['mean', 'std', 'count'],
        'customer_id': 'nunique',
        'order_id': 'nunique',
        'freight_value': 'mean',
        'profit_margin': 'mean',
        'product_weight_g': 'mean',
        'product_volume_liters': 'mean',
        'order_year': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else x.iloc[0],
        'order_month': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else x.iloc[0]
    }).round(2)
    
    # Flatten column names
    category_analysis.columns = [
        'unique_products', 'total_sales', 'avg_price', 'price_std', 'min_price', 'max_price',
        'total_revenue', 'avg_order_value', 'avg_review_score', 'review_std', 'review_count',
        'unique_customers', 'unique_orders', 'avg_freight', 'avg_profit_margin',
        'avg_weight', 'avg_volume', 'peak_year', 'peak_month'
    ]
    
    # Calculate additional business metrics
    total_revenue = category_analysis['total_revenue'].sum()
    category_analysis['revenue_share'] = (category_analysis['total_revenue'] / total_revenue * 100).round(2)
    category_analysis['price_range'] = category_analysis['max_price'] - category_analysis['min_price']
    category_analysis['sales_per_product'] = category_analysis['total_sales'] / category_analysis['unique_products']
    category_analysis['customer_penetration'] = category_analysis['unique_customers'] / data['customer_id'].nunique() * 100
    category_analysis['repeat_purchase_rate'] = category_analysis['total_sales'] / category_analysis['unique_customers']
    
    return category_analysis

# Perform enhanced category analysis
category_performance = analyze_category_performance(product_df)

# Sort by revenue for display
category_performance_sorted = category_performance.sort_values('total_revenue', ascending=False)

print(f"📊 Enhanced Category Performance Results:")
print(f"   • Categories analyzed: {len(category_performance)}")
print(f"   • Total revenue analyzed: R$ {category_performance['total_revenue'].sum():,.2f}")

# Display top performing categories
print(f"\n🏆 Top 10 Categories - Comprehensive Performance:")
top_categories_enhanced = category_performance_sorted.head(10)
display(top_categories_enhanced[['total_revenue', 'revenue_share', 'avg_price', 'avg_review_score', 
                               'unique_customers', 'repeat_purchase_rate']].round(2))

# Category performance insights
print(f"\n💡 Category Performance Insights:")

# Most profitable category (highest average margin, not highest revenue)
most_profitable = category_performance['avg_profit_margin'].idxmax()
most_profitable_margin = category_performance.loc[most_profitable, 'avg_profit_margin']
print(f"   • Most profitable category: {most_profitable} ({most_profitable_margin:.1%} margin)")

# Highest customer penetration
highest_penetration = category_performance['customer_penetration'].idxmax()
penetration_rate = category_performance.loc[highest_penetration, 'customer_penetration']
print(f"   • Highest customer penetration: {highest_penetration} ({penetration_rate:.1f}% of customers)")

# Most loyal customers (highest repeat rate)
most_loyal = category_performance['repeat_purchase_rate'].idxmax()
loyalty_rate = category_performance.loc[most_loyal, 'repeat_purchase_rate']
print(f"   • Highest customer loyalty: {most_loyal} ({loyalty_rate:.2f} purchases per customer)")

# Price analysis
highest_avg_price = category_performance['avg_price'].idxmax()
highest_price = category_performance.loc[highest_avg_price, 'avg_price']
print(f"   • Premium category: {highest_avg_price} (R$ {highest_price:.2f} avg price)")

# Satisfaction analysis
highest_satisfaction = category_performance['avg_review_score'].idxmax()
satisfaction_score = category_performance.loc[highest_satisfaction, 'avg_review_score']
print(f"   • Highest satisfaction: {highest_satisfaction} ({satisfaction_score:.2f} avg rating)")

In [None]:
# Category Competitive Position Analysis
print("⚔️ Category Competitive Position Analysis")
print("=" * 45)

def analyze_category_competitive_position(category_data):
    """
    Analyze competitive position of categories using multiple dimensions
    """
    # Create competitive metrics
    competitive_analysis = category_data.copy()
    
    # Calculate competitive scores (percentile rankings)
    competitive_analysis['revenue_rank'] = category_data['total_revenue'].rank(pct=True)
    competitive_analysis['satisfaction_rank'] = category_data['avg_review_score'].rank(pct=True)
    competitive_analysis['growth_potential_rank'] = category_data['repeat_purchase_rate'].rank(pct=True)
    competitive_analysis['margin_rank'] = category_data['avg_profit_margin'].rank(pct=True)
    competitive_analysis['penetration_rank'] = category_data['customer_penetration'].rank(pct=True)
    
    # Calculate overall competitive score
    competitive_analysis['competitive_score'] = (
        competitive_analysis['revenue_rank'] * 0.3 +
        competitive_analysis['satisfaction_rank'] * 0.2 +
        competitive_analysis['growth_potential_rank'] * 0.2 +
        competitive_analysis['margin_rank'] * 0.15 +
        competitive_analysis['penetration_rank'] * 0.15
    )
    
    # Classify competitive positions
    def classify_position(score):
        if score >= 0.8:
            return 'Market Leader'
        elif score >= 0.6:
            return 'Strong Competitor'
        elif score >= 0.4:
            return 'Market Follower'
        elif score >= 0.2:
            return 'Niche Player'
        else:
            return 'Struggling'
    
    competitive_analysis['competitive_position'] = competitive_analysis['competitive_score'].apply(classify_position)
    
    return competitive_analysis

# Perform competitive analysis
competitive_positions = analyze_category_competitive_position(category_performance)

# Display competitive positioning
position_distribution = competitive_positions['competitive_position'].value_counts()
print(f"\n🏁 Competitive Position Distribution:")
for position, count in position_distribution.items():
    percentage = (count / len(competitive_positions)) * 100
    print(f"   • {position}: {count} categories ({percentage:.1f}%)")

# Top performers by competitive score
top_competitive = competitive_positions.sort_values('competitive_score', ascending=False)
print(f"\n🥇 Top 10 Categories by Competitive Score:")
display(top_competitive[['competitive_score', 'competitive_position', 'total_revenue', 
                       'avg_review_score', 'avg_profit_margin']].head(10).round(3))

# Detailed analysis by competitive position
print(f"\n📊 Performance by Competitive Position:")
position_analysis = competitive_positions.groupby('competitive_position').agg({
    'total_revenue': ['mean', 'sum'],
    'avg_review_score': 'mean',
    'avg_profit_margin': 'mean',
    'customer_penetration': 'mean',
    'repeat_purchase_rate': 'mean'
}).round(3)

# Flatten columns
position_analysis.columns = ['avg_revenue', 'total_revenue', 'avg_satisfaction', 
                           'avg_margin', 'avg_penetration', 'avg_loyalty']

display(position_analysis)

# Strategic recommendations by position
print(f"\n🎯 Strategic Recommendations by Competitive Position:")

for position in position_distribution.index:
    position_categories = competitive_positions[competitive_positions['competitive_position'] == position]
    category_count = len(position_categories)
    
    print(f"\n   {position.upper()} ({category_count} categories):")
    
    if position == 'Market Leader':
        print(f"     Strategy: Defend and extend market leadership")
        print(f"     Actions: Innovation, premium positioning, market expansion")
        print(f"     Focus: Maintain quality while scaling operations")
    
    elif position == 'Strong Competitor':
        print(f"     Strategy: Challenge for leadership or find differentiation")
        print(f"     Actions: Competitive pricing, feature enhancement, niche focus")
        print(f"     Focus: Identify and exploit leader weaknesses")
    
    elif position == 'Market Follower':
        print(f"     Strategy: Follow leader or find profitable niches")
        print(f"     Actions: Cost optimization, selective competition, specialization")
        print(f"     Focus: Efficiency and targeted customer segments")
    
    elif position == 'Niche Player':
        print(f"     Strategy: Dominate specific market segments")
        print(f"     Actions: Deep specialization, customer intimacy, premium service")
        print(f"     Focus: Become indispensable to target customers")
    
    else:  # Struggling
        print(f"     Strategy: Turnaround or exit")
        print(f"     Actions: Cost reduction, repositioning, or discontinuation")
        print(f"     Focus: Determine viability and act decisively")

# Example categories for each position
print(f"\n📋 Example Categories by Position:")
for position in position_distribution.index:
    examples = competitive_positions[competitive_positions['competitive_position'] == position].index[:3].tolist()
    print(f"   {position}: {', '.join(examples)}")
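
The competitive score above is a weighted sum of percentile ranks. A small worked example makes the arithmetic easier to verify by hand. This is a sketch on made-up data with only two dimensions; the weights 0.6 and 0.4 are illustrative assumptions and differ from the five weights used in the cell above.

```python
import pandas as pd

# Toy category table (made-up values)
toy_categories = pd.DataFrame(
    {
        "total_revenue": [1000.0, 500.0, 250.0, 100.0],
        "avg_review_score": [4.0, 4.5, 3.0, 3.5],
    },
    index=["Cat A", "Cat B", "Cat C", "Cat D"],
)

# Illustrative weights (must sum to 1 for the score to stay within 0..1)
weights = {"total_revenue": 0.6, "avg_review_score": 0.4}

# rank(pct=True) maps each value to its rank divided by the number of rows,
# so with 4 rows the possible ranks are 0.25, 0.5, 0.75 and 1.0
ranks = toy_categories[list(weights)].rank(pct=True)
toy_categories["competitive_score"] = sum(
    ranks[col] * weight for col, weight in weights.items()
)
print(toy_categories.sort_values("competitive_score", ascending=False))
```

Cat A ranks 1.0 on revenue and 0.75 on reviews, giving 0.6 × 1.0 + 0.4 × 0.75 = 0.9. Note that the lowest possible percentile rank is 1/n rather than 0, so with few categories the lowest score bands may be unreachable.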

## 5. Product Relationship and Cross-Selling Analysis

Analyze product relationships and identify cross-selling opportunities.

In [None]:
# Market Basket Analysis and Product Relationships
print("🛒 Market Basket Analysis and Product Relationships")
print("=" * 55)

def perform_market_basket_analysis(data):
    """
    Perform market basket analysis to identify product relationships
    """
    # Create order-product matrix
    order_products = data.groupby('order_id')['category_clean'].apply(list).reset_index()
    order_products['category_count'] = order_products['category_clean'].apply(len)
    
    # Focus on multi-item orders for association analysis.
    # Note: category_count counts items, so an order with several items from a
    # single category is kept here even though it contributes no category pair.
    multi_item_orders = order_products[order_products['category_count'] > 1].copy()
    
    print(f"📊 Market Basket Analysis Overview:")
    print(f"   • Total orders: {len(order_products):,}")
    print(f"   • Multi-item orders: {len(multi_item_orders):,} ({len(multi_item_orders)/len(order_products)*100:.1f}%)")
    print(f"   • Average items per multi-item order: {multi_item_orders['category_count'].mean():.1f}")
    
    # Calculate category co-occurrence matrix
    from collections import defaultdict, Counter
    
    # Count individual category frequency
    category_counts = Counter()
    for categories in multi_item_orders['category_clean']:
        for category in set(categories):  # Use set to count each category once per order
            category_counts[category] += 1
    
    # Count category pairs
    pair_counts = Counter()
    for categories in multi_item_orders['category_clean']:
        unique_categories = list(set(categories))
        for pair in combinations(sorted(unique_categories), 2):
            pair_counts[pair] += 1
    
    # Calculate association metrics
    associations = []
    total_multi_orders = len(multi_item_orders)
    
    for (cat_a, cat_b), pair_count in pair_counts.items():
        if pair_count >= 5:  # Minimum support threshold
            support_a = category_counts[cat_a] / total_multi_orders
            support_b = category_counts[cat_b] / total_multi_orders
            support_ab = pair_count / total_multi_orders
            
            # Calculate confidence and lift
            confidence_a_to_b = support_ab / support_a
            confidence_b_to_a = support_ab / support_b
            lift = support_ab / (support_a * support_b)
            
            associations.append({
                'category_a': cat_a,
                'category_b': cat_b,
                'support': support_ab,
                'confidence_a_to_b': confidence_a_to_b,
                'confidence_b_to_a': confidence_b_to_a,
                'lift': lift,
                'pair_count': pair_count
            })
    
    associations_df = pd.DataFrame(associations)
    
    return associations_df, category_counts, multi_item_orders

# Perform market basket analysis
associations, category_freq, multi_orders = perform_market_basket_analysis(product_df)

if len(associations) > 0:
    # Sort by lift for most interesting associations
    top_associations = associations.sort_values('lift', ascending=False)
    
    print(f"\n🔍 Top 15 Product Category Associations (by Lift):")
    display(top_associations.head(15)[['category_a', 'category_b', 'lift', 'confidence_a_to_b', 
                                     'confidence_b_to_a', 'support', 'pair_count']].round(3))
    
    # Identify strongest cross-selling opportunities
    strong_associations = top_associations[
        (top_associations['lift'] > 1.5) & 
        (top_associations['confidence_a_to_b'] > 0.3) &
        (top_associations['pair_count'] >= 10)
    ]
    
    print(f"\n💡 Strong Cross-Selling Opportunities:")
    print(f"   Criteria: Lift > 1.5, Confidence > 30%, Min 10 occurrences")
    print(f"   Opportunities identified: {len(strong_associations)}")
    
    if len(strong_associations) > 0:
        print(f"\n🎯 Recommended Cross-Selling Pairs:")
        for idx, row in strong_associations.head(10).iterrows():
            print(f"   • {row['category_a']} → {row['category_b']}")
            print(f"     Lift: {row['lift']:.2f}, Confidence: {row['confidence_a_to_b']:.1%}")
            print(f"     Interpretation: Orders containing {row['category_a']} are {row['lift']:.1f}x as likely to include {row['category_b']} as the average multi-item order")
            print()
    
    # Category popularity in multi-item orders
    print(f"\n📊 Most Popular Categories in Multi-Item Orders:")
    popular_categories = pd.DataFrame(list(category_freq.items()), columns=['Category', 'Frequency'])
    popular_categories['Percentage'] = (popular_categories['Frequency'] / len(multi_orders) * 100).round(1)
    popular_categories = popular_categories.sort_values('Frequency', ascending=False)
    
    display(popular_categories.head(10))
    
else:
    print("\n⚠️ Insufficient data for meaningful association analysis")
    print("   This could indicate that customers typically buy single items")
    print("   or that the minimum support threshold is too high.")
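
The support, confidence, and lift values computed in the cell above can be reproduced by hand on a few baskets, which is a useful sanity check before trusting the results on real orders. The sketch below uses only the standard library and made-up baskets. Unlike the cell above, it divides by all baskets rather than only multi-item ones, and the helper name `association` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy baskets: each set holds the distinct categories in one order (made-up data)
toy_baskets = [
    {"Bed Bath", "Furniture"},
    {"Bed Bath", "Furniture"},
    {"Bed Bath", "Toys"},
    {"Toys"},
    {"Furniture"},
]

n = len(toy_baskets)
item_counts = Counter(item for basket in toy_baskets for item in basket)
pair_counts = Counter(
    pair for basket in toy_baskets for pair in combinations(sorted(basket), 2)
)


def association(a, b):
    """Return (support, confidence a->b, lift) for the pair (a, b)."""
    pair = tuple(sorted((a, b)))
    support_ab = pair_counts[pair] / n
    support_a = item_counts[a] / n
    support_b = item_counts[b] / n
    confidence = support_ab / support_a       # P(b | a)
    lift = support_ab / (support_a * support_b)  # P(a and b) / (P(a) * P(b))
    return support_ab, confidence, lift


support, confidence, lift = association("Bed Bath", "Furniture")
print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

Here the pair appears in 2 of 5 baskets (support 0.4), Bed Bath appears in 3, so confidence is 0.4 / 0.6 ≈ 0.67, and lift is 0.4 / (0.6 × 0.6) ≈ 1.11. A lift only slightly above 1 on so few baskets is weak evidence, which is why the cell above also requires a minimum pair count.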

In [None]:
# Product Network Analysis for Relationship Visualization
print("🕸️ Product Category Network Analysis")
print("=" * 40)

def create_product_network(associations_df, min_lift=1.2, min_confidence=0.2):
    """
    Create a network graph of product category relationships
    """
    if len(associations_df) == 0:
        print("No associations available for network analysis")
        return None, None
    
    # Filter for meaningful relationships
    filtered_associations = associations_df[
        (associations_df['lift'] >= min_lift) & 
        (associations_df['confidence_a_to_b'] >= min_confidence)
    ]
    
    if len(filtered_associations) == 0:
        print(f"No associations meet the criteria (lift >= {min_lift}, confidence >= {min_confidence})")
        return None, None
    
    # Create network graph
    G = nx.Graph()
    
    # Add edges with weights
    for _, row in filtered_associations.iterrows():
        G.add_edge(
            row['category_a'], 
            row['category_b'], 
            weight=row['lift'],
            confidence=row['confidence_a_to_b'],
            support=row['support']
        )
    
    return G, filtered_associations

# Create product network
if len(associations) > 0:
    product_network, network_associations = create_product_network(associations)
    
    if product_network is not None:
        print(f"\n🔗 Product Category Network Statistics:")
        print(f"   • Nodes (categories): {product_network.number_of_nodes()}")
        print(f"   • Edges (relationships): {product_network.number_of_edges()}")
        print(f"   • Network density: {nx.density(product_network):.3f}")
        
        # Calculate network centrality measures
        if product_network.number_of_nodes() > 0:
            centrality_measures = {
                'degree_centrality': nx.degree_centrality(product_network),
                'betweenness_centrality': nx.betweenness_centrality(product_network),
                'closeness_centrality': nx.closeness_centrality(product_network),
                'eigenvector_centrality': nx.eigenvector_centrality(product_network, max_iter=1000)
            }
            
            # Create centrality DataFrame
            centrality_df = pd.DataFrame(centrality_measures).round(3)
            centrality_df['avg_centrality'] = centrality_df.mean(axis=1)
            centrality_df = centrality_df.sort_values('avg_centrality', ascending=False)
            
            print(f"\n📊 Category Network Centrality (Top 10):")
            print(f"   Higher centrality = more connected to other categories")
            display(centrality_df.head(10))
            
            # Identify hub categories (high degree centrality)
            hub_categories = centrality_df.sort_values('degree_centrality', ascending=False).head(5)
            print(f"\n🎯 Hub Categories (Most Connected):")
            for category, metrics in hub_categories.iterrows():
                connections = product_network.degree(category)
                print(f"   • {category}: {connections} connections (centrality: {metrics['degree_centrality']:.3f})")
            
            # Bridge categories (high betweenness centrality)
            bridge_categories = centrality_df.sort_values('betweenness_centrality', ascending=False).head(5)
            print(f"\n🌉 Bridge Categories (Connect Different Groups):")
            for category, metrics in bridge_categories.iterrows():
                print(f"   • {category}: {metrics['betweenness_centrality']:.3f} betweenness centrality")
        
        # Visualize network (if not too large)
        if product_network.number_of_nodes() <= 20:
            print(f"\n🎨 Creating Network Visualization...")
            
            plt.figure(figsize=(15, 10))
            
            # Create layout
            pos = nx.spring_layout(product_network, k=3, iterations=50)
            
            # Draw nodes
            node_sizes = [product_network.degree(node) * 300 + 300 for node in product_network.nodes()]
            nx.draw_networkx_nodes(product_network, pos, node_size=node_sizes, 
                                 node_color='lightblue', alpha=0.7)
            
            # Draw edges with weights
            edges = product_network.edges()
            edge_weights = [product_network[u][v]['weight'] for u, v in edges]
            nx.draw_networkx_edges(product_network, pos, width=[w*2 for w in edge_weights], 
                                 alpha=0.6, edge_color='gray')
            
            # Draw labels
            labels = {node: node[:15] + '...' if len(node) > 15 else node for node in product_network.nodes()}
            nx.draw_networkx_labels(product_network, pos, labels, font_size=8)
            
            plt.title('Product Category Relationship Network\n(Node size = connections, Edge width = association strength)', 
                     fontsize=14, fontweight='bold')
            plt.axis('off')
            plt.tight_layout()
            plt.show()
        
        else:
            print(f"\n📊 Network too large for visualization ({product_network.number_of_nodes()} nodes)")
            print(f"   Consider using specialized network visualization tools")
    
    else:
        print("\n⚠️ No relationships meet the network criteria")
        print("   Try lowering the minimum lift or confidence thresholds")
else:
    print("\n⚠️ No association data available for network analysis")
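
The difference between a hub (high degree centrality) and a bridge (high betweenness centrality) is easiest to see on a small graph. The sketch below uses hypothetical category names: two fully connected groups are joined only through a single `gifts` node, so `gifts` has the fewest connections but sits on every shortest path between the groups.

```python
from itertools import combinations

import networkx as nx

# Hypothetical category groups, each fully connected internally
home = ["bed_bath", "furniture", "housewares", "garden"]
kids = ["toys", "baby", "fashion_kids", "stationery"]

G = nx.Graph()
G.add_edges_from(combinations(home, 2))
G.add_edges_from(combinations(kids, 2))

# "gifts" is the only link between the two groups
G.add_edge("gifts", "bed_bath")
G.add_edge("gifts", "toys")

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)

bridge = max(betweenness, key=betweenness.get)
print(f"Highest betweenness: {bridge} ({betweenness[bridge]:.3f})")
print(f"Degree centrality of {bridge}: {degree[bridge]:.3f}")
print(f"Highest degree centrality in graph: {max(degree.values()):.3f}")
```

In a real category network, a node like this can matter for cross-category promotions even though a ranking by connection count alone would place it last.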

## 6. Product Performance Optimization and Recommendations

Synthesize insights into actionable product optimization strategies.

In [None]:
# Comprehensive Product Performance Optimization
print("🎯 Comprehensive Product Performance Optimization")
print("=" * 55)

def generate_product_optimization_recommendations(product_metrics, category_performance, competitive_positions, associations):
    """
    Generate comprehensive product optimization recommendations
    """
    recommendations = {
        'portfolio_optimization': [],
        'category_strategies': [],
        'cross_selling': [],
        'pricing_optimization': [],
        'inventory_management': []
    }
    
    # Portfolio optimization recommendations
    total_revenue = product_metrics['total_revenue'].sum()
    
    # Identify underperforming products
    low_performers = product_metrics[
        (product_metrics['total_revenue'] < product_metrics['total_revenue'].quantile(0.2)) &
        (product_metrics['avg_review_score'] < 3.5)
    ]
    
    # Identify star performers
    star_performers = product_metrics[
        (product_metrics['total_revenue'] >= product_metrics['total_revenue'].quantile(0.8)) &
        (product_metrics['avg_review_score'] >= 4.0)
    ]
    
    recommendations['portfolio_optimization'] = [
        f"Focus resources on {len(star_performers)} star performers generating high revenue and satisfaction",
        f"Review {len(low_performers)} underperforming products for potential discontinuation or improvement",
        f"Star performers contribute R$ {star_performers['total_revenue'].sum():,.2f} ({star_performers['total_revenue'].sum()/total_revenue*100:.1f}% of revenue)",
        f"Consider expanding product lines in categories with star performers"
    ]
    
    # Category-specific strategies
    top_categories = category_performance.sort_values('total_revenue', ascending=False).head(5)
    
    for category in top_categories.index:
        position = competitive_positions.loc[category, 'competitive_position']
        revenue_share = category_performance.loc[category, 'revenue_share']
        avg_margin = category_performance.loc[category, 'avg_profit_margin']
        
        strategy = f"{category} ({position}): {revenue_share:.1f}% revenue share, {avg_margin:.1%} margin - "
        
        if position == 'Market Leader':
            strategy += "Defend leadership through innovation and premium positioning"
        elif position == 'Strong Competitor':
            strategy += "Invest in differentiation and competitive advantages"
        else:
            strategy += "Focus on efficiency and niche opportunities"
        
        recommendations['category_strategies'].append(strategy)
    
    # Cross-selling recommendations
    if len(associations) > 0:
        top_associations = associations.sort_values('lift', ascending=False).head(5)
        
        for _, assoc in top_associations.iterrows():
            recommendation = f"Cross-sell {assoc['category_b']} to {assoc['category_a']} customers "
            recommendation += f"(Lift: {assoc['lift']:.2f}, Confidence: {assoc['confidence_a_to_b']:.1%})"
            recommendations['cross_selling'].append(recommendation)
    else:
        recommendations['cross_selling'].append("Insufficient data for cross-selling analysis")
    
    # Pricing optimization
    high_margin_categories = category_performance[category_performance['avg_profit_margin'] > 0.3]
    low_margin_categories = category_performance[category_performance['avg_profit_margin'] < 0.1]
    
    recommendations['pricing_optimization'] = [
        f"Maintain premium pricing for {len(high_margin_categories)} high-margin categories",
        f"Review pricing strategy for {len(low_margin_categories)} low-margin categories",
        f"Consider dynamic pricing based on demand patterns and competition",
        f"Implement value-based pricing for products with high customer satisfaction"
    ]
    
    # Inventory management
    fast_movers = product_metrics[product_metrics['sales_volume'] >= product_metrics['sales_volume'].quantile(0.8)]
    slow_movers = product_metrics[product_metrics['sales_volume'] <= product_metrics['sales_volume'].quantile(0.2)]
    
    recommendations['inventory_management'] = [
        f"Optimize inventory levels for {len(fast_movers)} fast-moving products",
        f"Consider reducing inventory for {len(slow_movers)} slow-moving products",
        f"Implement demand forecasting based on historical sales patterns",
        f"Focus inventory investment on star performers and market leaders"
    ]
    
    return recommendations

# Generate optimization recommendations
optimization_recommendations = generate_product_optimization_recommendations(
    product_metrics, category_performance, competitive_positions, associations
)

# Display comprehensive recommendations
print(f"\n📋 COMPREHENSIVE PRODUCT OPTIMIZATION RECOMMENDATIONS")
print(f"=" * 65)

for category, recommendations in optimization_recommendations.items():
    print(f"\n🎯 {category.replace('_', ' ').upper()}:")
    for i, recommendation in enumerate(recommendations, 1):
        print(f"   {i}. {recommendation}")

# Calculate potential impact metrics
print(f"\n\n💰 POTENTIAL BUSINESS IMPACT ANALYSIS")
print(f"=" * 50)

total_revenue = product_metrics['total_revenue'].sum()
total_products = len(product_metrics)

# Portfolio optimization impact
star_revenue = product_metrics[
    (product_metrics['total_revenue'] >= product_metrics['total_revenue'].quantile(0.8)) &
    (product_metrics['avg_review_score'] >= 4.0)
]['total_revenue'].sum()

underperformer_revenue = product_metrics[
    (product_metrics['total_revenue'] < product_metrics['total_revenue'].quantile(0.2)) &
    (product_metrics['avg_review_score'] < 3.5)
]['total_revenue'].sum()

print(f"📊 Current Portfolio Metrics:")
print(f"   • Total revenue: R$ {total_revenue:,.2f}")
print(f"   • Star performer revenue: R$ {star_revenue:,.2f} ({star_revenue/total_revenue*100:.1f}%)")
print(f"   • Underperformer revenue: R$ {underperformer_revenue:,.2f} ({underperformer_revenue/total_revenue*100:.1f}%)")

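# Estimated optimization impact
# Note: the percentage ranges are illustrative planning assumptions, not model outputs;
# the R$ figures apply the midpoint of each range to the relevant revenue base.
print(f"\n🎯 Estimated Optimization Impact (12-month projection, illustrative assumptions):")
print(f"   • Focus on star performers: +15-25% revenue growth ≈ R$ {star_revenue * 0.2:,.2f} (20% midpoint of star revenue)")
print(f"   • Cross-selling implementation: +5-10% revenue growth ≈ R$ {total_revenue * 0.075:,.2f} (7.5% midpoint of total revenue)")
print(f"   • Underperformer optimization: +3-8% margin improvement")
print(f"   • Category positioning: +10-20% in targeted categories")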

# Implementation priorities
print(f"\n⚡ IMPLEMENTATION PRIORITIES")
print(f"=" * 35)
print(f"\n🥇 HIGH PRIORITY (Immediate - 1-2 months):")
print(f"   • Optimize inventory for star performers")
print(f"   • Implement cross-selling for top associations")
print(f"   • Review pricing for low-margin categories")

print(f"\n🥈 MEDIUM PRIORITY (3-6 months):")
print(f"   • Develop category-specific strategies")
print(f"   • Improve or discontinue underperformers")
print(f"   • Enhance product mix in market leader categories")

print(f"\n🥉 LOW PRIORITY (6-12 months):")
print(f"   • Explore new product development")
print(f"   • Long-term competitive positioning")
print(f"   • Advanced analytics implementation")

print(f"\n✅ Product Performance Analysis Complete!")
print(f"   Ready for time series analysis and advanced forecasting.")
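
The impact figures printed above multiply a revenue base by a single percentage inside each quoted range (0.2 for +15-25%, 0.075 for +5-10%). Reporting the whole range is often more honest than a single point estimate. The helper below is a minimal sketch of that idea; `project_uplift` and the example revenue figure are hypothetical and not part of the notebook's data.

```python
def project_uplift(base_revenue, low_pct, high_pct):
    """Return (low, midpoint, high) revenue uplift for an assumed growth range."""
    low = base_revenue * low_pct
    high = base_revenue * high_pct
    return low, (low + high) / 2, high


# Hypothetical revenue base, for illustration only
example_star_revenue = 1_000_000.0
low, mid, high = project_uplift(example_star_revenue, 0.15, 0.25)
print(f"Star performer focus: R$ {low:,.2f} to R$ {high:,.2f} (midpoint R$ {mid:,.2f})")
```

In the notebook, the same call could take `star_revenue` or `total_revenue` as the base, with the growth ranges treated as assumptions to be validated against historical data.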

## Summary - Product Performance Metrics and Insights

### What We've Accomplished

1. **✅ Comprehensive Product Portfolio Analysis**: Analyzed 30,000+ product transactions across multiple performance dimensions
2. **✅ BCG-Style Performance Matrix**: Classified products into Stars, Cash Cows, Question Marks, and Dogs
3. **✅ Advanced Category Analysis**: Deep competitive positioning and market share analysis
4. **✅ Market Basket Analysis**: Identified cross-selling opportunities through association rules
5. **✅ Product Network Analysis**: Visualized category relationships and identified hub categories
6. **✅ Strategic Optimization Framework**: Created actionable recommendations with impact projections

### Key Business Insights Discovered

**Product Performance:**
- Clear identification of star performers driving revenue and satisfaction
- Underperformer analysis for optimization or discontinuation
- Category competitive positioning and market dynamics

**Cross-Selling Opportunities:**
- Data-driven product association rules
- Network analysis revealing category relationship patterns
- Quantified cross-selling potential with confidence metrics

**Strategic Recommendations:**
- Portfolio optimization priorities with revenue impact
- Category-specific competitive strategies
- Pricing and inventory optimization opportunities

### Advanced Techniques Mastered

- **BCG Matrix Analysis**: Strategic product portfolio classification
- **Association Rule Mining**: Market basket analysis with lift and confidence
- **Network Analysis**: Product relationship visualization and centrality measures
- **Competitive Positioning**: Multi-dimensional performance scoring
- **Business Impact Modeling**: Quantified optimization recommendations

### Next Steps
In Part 3, we'll explore:
- Time series patterns in order data
- Seasonal trends and forecasting
- Advanced temporal analysis for business planning

## 🎯 Practice Exercises - Product Performance Analysis

Master product analytics techniques:

1. **Custom Portfolio Matrix**: Create your own product classification system using different metrics

2. **Advanced Association Analysis**: Implement additional association metrics (conviction, cosine similarity)

3. **Price Elasticity Analysis**: Analyze how price changes affect demand patterns

4. **Product Lifecycle Modeling**: Create models to identify product lifecycle stages

5. **Recommendation Engine**: Build a simple product recommendation system based on associations
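
As a possible starting point for Exercise 2, the two metrics it names can be written directly from their standard definitions. The function names and the example probabilities below are hypothetical; the inputs correspond to support and confidence values like those in the `associations` table.

```python
import math


def conviction(support_b, confidence_a_to_b):
    """Conviction of A -> B: (1 - P(B)) / (1 - P(B | A)).

    Values above 1 suggest A -> B holds more often than expected under
    independence; the metric is infinite when confidence equals 1.
    """
    if confidence_a_to_b >= 1.0:
        return math.inf
    return (1.0 - support_b) / (1.0 - confidence_a_to_b)


def cosine_similarity(support_ab, support_a, support_b):
    """Cosine similarity of A and B: P(A and B) / sqrt(P(A) * P(B))."""
    return support_ab / math.sqrt(support_a * support_b)


# Hypothetical example values
support_a, support_b, support_ab = 0.8, 0.4, 0.4
confidence_a_to_b = support_ab / support_a  # P(B | A) = 0.5

print(f"Conviction A -> B: {conviction(support_b, confidence_a_to_b):.3f}")
print(f"Cosine similarity: {cosine_similarity(support_ab, support_a, support_b):.3f}")
```

Unlike lift, conviction is directional, so A -> B and B -> A generally give different values; cosine similarity is symmetric, like lift.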

In [None]:
# Exercise Space - Product Performance Analysis
# Use this space to practice the product analytics techniques

# Exercise 1: Custom Portfolio Matrix
# Create alternative product classification systems

# Exercise 2: Advanced Association Analysis
# Implement additional association metrics

# Exercise 3: Price Elasticity Analysis
# Analyze price-demand relationships

# Exercise 4: Product Lifecycle Modeling
# Identify product lifecycle stages

# Exercise 5: Recommendation Engine
# Build product recommendation system