# PREDICTIVE ANALYTICS: "WHAT WILL HAPPEN?"
## E-commerce Customer Analytics - Part 3 of 4

OBJECTIVE: Build predictive models to forecast future outcomes
- Customer churn prediction
- Sales forecasting
- Customer lifetime value prediction
- Product demand forecasting

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from datetime import datetime, timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
import warnings
warnings.filterwarnings('ignore')

# Set styling
plt.style.use('seaborn-v0_8')
sns.set_palette("viridis")

### 1. DATA LOADING & FEATURE ENGINEERING

In [2]:
print("\n📊 STEP 1: Data Loading & Feature Engineering")
print("-" * 45)

# Load datasets
customers = pd.read_csv('dataset/customers.csv')
products = pd.read_csv('dataset/products.csv')
transactions = pd.read_csv('dataset/transactions.csv')
tickets = pd.read_csv('dataset/support_tickets.csv')

# Convert date columns
transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'])
customers['registration_date'] = pd.to_datetime(customers['registration_date'])

print(f"✅ Loaded data successfully")

# Feature engineering for customers
print("\n🔧 Engineering Customer Features...")

# Customer transaction features
customer_features = transactions.groupby('customer_id').agg({
    'total_amount': ['sum', 'mean', 'std', 'count'],
    'quantity': ['sum', 'mean'],
    'discount': 'mean',
    'transaction_date': ['min', 'max']
}).round(2)

# Flatten column names
customer_features.columns = ['total_spent', 'avg_order_value', 'order_value_std', 'transaction_count',
                           'total_quantity', 'avg_quantity', 'avg_discount', 'first_purchase', 'last_purchase']

# Calculate additional features
customer_features['days_active'] = (customer_features['last_purchase'] - customer_features['first_purchase']).dt.days
customer_features['days_since_last'] = (datetime.now() - customer_features['last_purchase']).dt.days
customer_features['purchase_frequency'] = customer_features['transaction_count'] / (customer_features['days_active'] + 1)
customer_features['order_value_std'] = customer_features['order_value_std'].fillna(0)

# Customer support features
support_features = tickets.groupby('customer_id').agg({
    'ticket_id': 'count',
    'resolution_time_hours': 'mean',
    'priority': lambda x: (x.isin(['High', 'Critical'])).sum()
}).round(2)

support_features.columns = ['support_tickets', 'avg_resolution_time', 'high_priority_tickets']

# Rebuild customers_ml step-by-step
customers_ml = customers[['customer_id', 'first_name', 'last_name', 'email', 'age', 'gender', 'city', 'state', 'registration_date', 'customer_segment', 'is_churned']].copy()

customers_ml = customers_ml.merge(customer_features, on='customer_id', how='left')
customers_ml = customers_ml.merge(support_features, on='customer_id', how='left')

# Fill missing values
customers_ml = customers_ml.fillna(0)

print(f"✅ Created {len(customers_ml.columns)} features for {len(customers_ml)} customers")
print(f"Columns after merge: {customers_ml.columns.tolist()}")


📊 STEP 1: Data Loading & Feature Engineering
---------------------------------------------
✅ Loaded data successfully

🔧 Engineering Customer Features...
✅ Created 26 features for 5000 customers
Columns after merge: ['customer_id', 'first_name', 'last_name', 'email', 'age', 'gender', 'city', 'state', 'registration_date', 'customer_segment', 'is_churned', 'total_spent', 'avg_order_value', 'order_value_std', 'transaction_count', 'total_quantity', 'avg_quantity', 'avg_discount', 'first_purchase', 'last_purchase', 'days_active', 'days_since_last', 'purchase_frequency', 'support_tickets', 'avg_resolution_time', 'high_priority_tickets']
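
The `days_since_last` feature above is measured against `datetime.now()`, so its values shift every time the notebook is rerun. A minimal sketch of one alternative, pinning recency to the latest transaction date in the data, is shown here on a made-up table; `toy_transactions` and `reference_date` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Toy transactions standing in for dataset/transactions.csv (values are made up).
toy_transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "transaction_date": pd.to_datetime(
        ["2024-01-05", "2024-03-10", "2024-02-20", "2024-03-31"]
    ),
})

# Assumption: the snapshot date is the latest transaction in the data,
# rather than the wall-clock time at which the notebook happens to run.
reference_date = toy_transactions["transaction_date"].max()

# Recency per customer, measured from the fixed snapshot date.
last_purchase = toy_transactions.groupby("customer_id")["transaction_date"].max()
days_since_last = (reference_date - last_purchase).dt.days

print(days_since_last.to_dict())
```

With a fixed snapshot date, rerunning the notebook on the same files should give the same recency values.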


### 2. CHURN PREDICTION MODEL

In [3]:
print("\n\n🎯 STEP 2: Customer Churn Prediction Model")
print("-" * 45)

# Prepare features for churn prediction
churn_features = ['age', 'total_spent', 'avg_order_value', 'transaction_count', 
                 'purchase_frequency', 'avg_discount', 'days_since_last',
                 'support_tickets', 'avg_resolution_time', 'high_priority_tickets']

# Encode categorical variables
le = LabelEncoder()
customers_ml['gender_encoded'] = le.fit_transform(customers_ml['gender'])
customers_ml['segment_encoded'] = le.fit_transform(customers_ml['customer_segment'])

churn_features.extend(['gender_encoded', 'segment_encoded'])

# Prepare data
X_churn = customers_ml[churn_features].fillna(0)
y_churn = customers_ml['is_churned']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X_churn, y_churn, test_size=0.2, random_state=42, stratify=y_churn)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"📊 Training Set: {len(X_train)} customers")
print(f"📊 Test Set: {len(X_test)} customers")
print(f"📊 Churn Rate in Training: {y_train.mean()*100:.1f}%")

# Train models
print("\n🤖 Training Churn Prediction Models...")

# Random Forest Model
rf_churn = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)
rf_churn.fit(X_train, y_train)

# Logistic Regression Model
lr_churn = LogisticRegression(random_state=42, max_iter=1000)
lr_churn.fit(X_train_scaled, y_train)

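# Predictions
# The logistic regression was fit on scaled features, so it is scored on the scaled test set
rf_pred = rf_churn.predict(X_test)
lr_pred = lr_churn.predict(X_test_scaled)

rf_proba = rf_churn.predict_proba(X_test)[:, 1]
lr_proba = lr_churn.predict_proba(X_test_scaled)[:, 1]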

# Evaluate models
print(f"\n📈 Churn Prediction Results:")
print(f"Random Forest Accuracy: {accuracy_score(y_test, rf_pred):.3f}")
print(f"Logistic Regression Accuracy: {accuracy_score(y_test, lr_pred):.3f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': churn_features,
    'importance': rf_churn.feature_importances_
}).sort_values('importance', ascending=False)

print(f"\n🔍 Top 5 Churn Prediction Features:")
print(feature_importance.head().round(3))

# Identify high-risk customers
customers_ml['churn_probability'] = rf_churn.predict_proba(X_churn)[:, 1]
high_risk_customers = customers_ml[customers_ml['churn_probability'] > 0.7].copy()

print(f"\n⚠️  High Risk Customers (>70% churn probability):")
print(f"   Count: {len(high_risk_customers)}")
print(f"   Avg Churn Probability: {high_risk_customers['churn_probability'].mean():.1%}")
print(f"   Total Value at Risk: ${high_risk_customers['total_spent'].sum():,.2f}")




🎯 STEP 2: Customer Churn Prediction Model (Skipped)
---------------------------------------------
📊 Training Set: 4000 customers
📊 Test Set: 1000 customers
📊 Churn Rate in Training: 29.8%

🤖 Training Churn Prediction Models...

📈 Churn Prediction Results:
Random Forest Accuracy: 0.702
Logistic Regression Accuracy: 0.703

🔍 Top 5 Churn Prediction Features:
              feature  importance
4  purchase_frequency       0.153
2     avg_order_value       0.142
1         total_spent       0.140
6     days_since_last       0.137
0                 age       0.113

⚠️  High Risk Customers (>70% churn probability):
   Count: 9
   Avg Churn Probability: 74.6%
   Total Value at Risk: $5,126.54
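
The accuracies above are easier to judge next to a majority-class baseline. With a training churn rate of 29.8%, always predicting "not churned" would already be right about 70% of the time, assuming the stratified test split has a similar rate. The sketch below computes that baseline on synthetic labels; `majority_baseline_accuracy` and `y_example` are hypothetical names introduced for illustration.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class."""
    y = np.asarray(y)
    _, counts = np.unique(y, return_counts=True)
    return counts.max() / len(y)

# Synthetic labels with roughly the churn rate reported above (about 30%).
y_example = np.array([1] * 298 + [0] * 702)

baseline = majority_baseline_accuracy(y_example)
print(f"Majority-class baseline accuracy: {baseline:.3f}")
```

In the notebook itself the same check could be run as `majority_baseline_accuracy(y_test)`. A ranking metric such as `sklearn.metrics.roc_auc_score(y_test, rf_proba)` may also say more about the models than accuracy does on imbalanced labels.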


### 3. CUSTOMER LIFETIME VALUE PREDICTION

In [4]:
print("\n\n💰 STEP 3: Customer Lifetime Value Prediction")
print("-" * 45)

# Calculate current CLV for active customers
active_customers = customers_ml[customers_ml['is_churned'] == 0].copy()

# Features for CLV prediction
clv_features = ['age', 'transaction_count', 'avg_order_value', 'purchase_frequency',
               'days_active', 'avg_discount', 'gender_encoded', 'segment_encoded']

# Prepare CLV data
X_clv = active_customers[clv_features].fillna(0)
y_clv = active_customers['total_spent']

# Split data
X_train_clv, X_test_clv, y_train_clv, y_test_clv = train_test_split(X_clv, y_clv, test_size=0.2, random_state=42)

# Scale features
scaler_clv = StandardScaler()
X_train_clv_scaled = scaler_clv.fit_transform(X_train_clv)
X_test_clv_scaled = scaler_clv.transform(X_test_clv)

print(f"📊 CLV Training Set: {len(X_train_clv)} customers")

# Train CLV models
print("\n🤖 Training CLV Prediction Models...")

# Random Forest Regressor
rf_clv = RandomForestRegressor(n_estimators=100, random_state=42, max_depth=10)
rf_clv.fit(X_train_clv, y_train_clv)

# Linear Regression
lr_clv = LinearRegression()
lr_clv.fit(X_train_clv_scaled, y_train_clv)

# Predictions
rf_clv_pred = rf_clv.predict(X_test_clv)
lr_clv_pred = lr_clv.predict(X_test_clv_scaled)

# Evaluate CLV models
rf_r2 = r2_score(y_test_clv, rf_clv_pred)
lr_r2 = r2_score(y_test_clv, lr_clv_pred)

rf_rmse = np.sqrt(mean_squared_error(y_test_clv, rf_clv_pred))
lr_rmse = np.sqrt(mean_squared_error(y_test_clv, lr_clv_pred))

print(f"\n📈 CLV Prediction Results:")
print(f"Random Forest R² Score: {rf_r2:.3f}, RMSE: ${rf_rmse:.2f}")
print(f"Linear Regression R² Score: {lr_r2:.3f}, RMSE: ${lr_rmse:.2f}")

# Predict future CLV for all active customers
active_customers['predicted_clv'] = rf_clv.predict(X_clv)
active_customers['clv_segment'] = pd.qcut(active_customers['predicted_clv'], 
                                         q=4, labels=['Low', 'Medium', 'High', 'Premium'])

print(f"\n💎 CLV Segmentation:")
clv_segments = active_customers.groupby('clv_segment').agg({
    'customer_id': 'count',
    'predicted_clv': 'mean',
    'total_spent': 'mean'
}).round(2)
clv_segments.columns = ['Customer Count', 'Predicted CLV', 'Current Spent']
print(clv_segments)



💰 STEP 3: Customer Lifetime Value Prediction
---------------------------------------------
📊 CLV Training Set: 2810 customers

🤖 Training CLV Prediction Models...

📈 CLV Prediction Results:
Random Forest R² Score: 0.986, RMSE: $26.26
Linear Regression R² Score: 0.930, RMSE: $59.40

💎 CLV Segmentation:
             Customer Count  Predicted CLV  Current Spent
clv_segment                                              
Low                     879         208.92         208.43
Medium                  878         356.76         356.33
High                    878         495.99         497.00
Premium                 878         764.26         766.94
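
One likely reason for the very high R² is that the target `total_spent` equals `transaction_count * avg_order_value` by construction (a sum is a count times a mean), and both appear in `clv_features`. The sketch below checks that identity on synthetic data rather than the notebook's dataset; `toy` and `agg` are illustrative names. In the notebook the aggregates are rounded to two decimals, so the identity would hold only approximately there.

```python
import numpy as np
import pandas as pd

# Synthetic order amounts for a handful of customers (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "customer_id": rng.integers(1, 50, size=500),
    "total_amount": rng.uniform(5, 200, size=500).round(2),
})

# Same aggregates the notebook builds: total spent, average order value, order count.
agg = toy.groupby("customer_id")["total_amount"].agg(["sum", "mean", "count"])

# Rebuild the "target" from two of the "features".
reconstructed = agg["count"] * agg["mean"]
max_gap = (agg["sum"] - reconstructed).abs().max()

print(f"Largest gap between sum and count * mean: {max_gap:.6f}")
```

If this reading is right, `predicted_clv` mostly reproduces current spend, which would match the near-identical "Predicted CLV" and "Current Spent" columns above. A forward-looking target, such as spend in a later holdout window, would be one way to make the prediction about the future; that is a suggestion, not something the notebook does.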


### 4. SALES FORECASTING

In [5]:
print("\n\n📈 STEP 4: Sales Forecasting")
print("-" * 30)

# Prepare time series data
daily_sales = transactions.groupby('transaction_date').agg({
    'total_amount': 'sum',
    'transaction_id': 'count',
    'customer_id': 'nunique'
}).round(2)

daily_sales.columns = ['daily_revenue', 'daily_orders', 'daily_customers']

# Create date features
daily_sales['day_of_week'] = daily_sales.index.dayofweek
daily_sales['month'] = daily_sales.index.month
daily_sales['quarter'] = daily_sales.index.quarter
daily_sales['day_of_year'] = daily_sales.index.dayofyear

# Create lag features
daily_sales['revenue_lag_1'] = daily_sales['daily_revenue'].shift(1)
daily_sales['revenue_lag_7'] = daily_sales['daily_revenue'].shift(7)
daily_sales['revenue_ma_7'] = daily_sales['daily_revenue'].rolling(window=7).mean()
daily_sales['revenue_ma_30'] = daily_sales['daily_revenue'].rolling(window=30).mean()

# Remove rows with NaN values
daily_sales_clean = daily_sales.dropna()

print(f"📊 Sales Forecasting Data: {len(daily_sales_clean)} days")

# Features for forecasting
forecast_features = ['day_of_week', 'month', 'quarter', 'day_of_year',
                    'revenue_lag_1', 'revenue_lag_7', 'revenue_ma_7', 'revenue_ma_30']

X_forecast = daily_sales_clean[forecast_features]
y_forecast = daily_sales_clean['daily_revenue']

# Split chronologically (last 30 days for testing)
split_date = daily_sales_clean.index[-30]
X_train_forecast = X_forecast[X_forecast.index < split_date]
X_test_forecast = X_forecast[X_forecast.index >= split_date]
y_train_forecast = y_forecast[y_forecast.index < split_date]
y_test_forecast = y_forecast[y_forecast.index >= split_date]

print(f"📊 Forecast Training: {len(X_train_forecast)} days")
print(f"📊 Forecast Testing: {len(X_test_forecast)} days")

# Train forecasting model
print("\n🤖 Training Sales Forecasting Model...")

rf_forecast = RandomForestRegressor(n_estimators=100, random_state=42)
rf_forecast.fit(X_train_forecast, y_train_forecast)

# Predict
forecast_pred = rf_forecast.predict(X_test_forecast)

# Evaluate forecasting
forecast_r2 = r2_score(y_test_forecast, forecast_pred)
forecast_rmse = np.sqrt(mean_squared_error(y_test_forecast, forecast_pred))

print(f"\n📈 Sales Forecasting Results:")
print(f"R² Score: {forecast_r2:.3f}")
print(f"RMSE: ${forecast_rmse:.2f}")
print(f"Mean Actual Daily Revenue: ${y_test_forecast.mean():.2f}")
print(f"Mean Predicted Daily Revenue: ${forecast_pred.mean():.2f}")

# Future forecasting (next 7 days)
last_known_data = daily_sales_clean.iloc[-1:].copy()
future_predictions = []

print(f"\n🔮 Next 7 Days Revenue Forecast:")
for i in range(7):
    # Use last known values for prediction
    future_date = last_known_data.index[0] + timedelta(days=i+1)
    
    # Create features for future day
    future_features = {
        'day_of_week': future_date.dayofweek,
        'month': future_date.month,
        'quarter': (future_date.month - 1) // 3 + 1,
        'day_of_year': future_date.dayofyear,
        'revenue_lag_1': last_known_data['daily_revenue'].iloc[0] if i == 0 else future_predictions[-1],
        'revenue_lag_7': last_known_data['daily_revenue'].iloc[0],
        'revenue_ma_7': last_known_data['revenue_ma_7'].iloc[0],
        'revenue_ma_30': last_known_data['revenue_ma_30'].iloc[0]
    }
    
    # Predict future revenue
    future_X = pd.DataFrame([future_features])
    predicted_revenue = rf_forecast.predict(future_X)[0]
    future_predictions.append(predicted_revenue)
    
    print(f"   {future_date.strftime('%Y-%m-%d')}: ${predicted_revenue:,.2f}")




📈 STEP 4: Sales Forecasting
------------------------------
📊 Sales Forecasting Data: 1066 days
📊 Forecast Training: 1036 days
📊 Forecast Testing: 30 days

🤖 Training Sales Forecasting Model...

📈 Sales Forecasting Results:
R² Score: -0.190
RMSE: $420.95
Mean Actual Daily Revenue: $2107.45
Mean Predicted Daily Revenue: $2122.92

🔮 Next 7 Days Revenue Forecast:
   2024-12-31: $2,039.53
   2025-01-01: $2,177.60
   2025-01-02: $2,188.39
   2025-01-03: $2,224.05
   2025-01-04: $2,285.61
   2025-01-05: $2,316.92
   2025-01-06: $2,204.26
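
In the 7-day loop above, only `revenue_lag_1` is updated with earlier predictions; `revenue_lag_7`, `revenue_ma_7` and `revenue_ma_30` stay fixed at the last observed row. A minimal sketch of a recursive forecast that refreshes all four from a rolling window follows. It leaves out the calendar features for brevity, and `recursive_forecast`, `toy_model` and `history` are hypothetical stand-ins, not the notebook's `rf_forecast`.

```python
from collections import deque

def recursive_forecast(history, predict_fn, steps=7):
    """Roll a forecast forward, rebuilding lag and moving-average features
    from a window that includes earlier predictions.

    `history` needs at least 7 values so that the 7-day lag exists.
    """
    window = deque(history, maxlen=30)  # last 30 observed or predicted values
    forecasts = []
    for _ in range(steps):
        values = list(window)
        features = {
            "revenue_lag_1": values[-1],
            "revenue_lag_7": values[-7],
            "revenue_ma_7": sum(values[-7:]) / 7,
            "revenue_ma_30": sum(values) / len(values),
        }
        prediction = predict_fn(features)
        forecasts.append(prediction)
        window.append(prediction)  # the prediction becomes part of the history
    return forecasts

# Hypothetical stand-in for a trained model: returns the 7-day moving average.
def toy_model(features):
    return features["revenue_ma_7"]

# Made-up revenue history with a weekly pattern.
history = [2000.0 + 10 * (day % 7) for day in range(30)]
result = recursive_forecast(history, toy_model)
print([round(value, 2) for value in result])
```

Errors still compound with each step in this scheme, so the later days of the horizon are less reliable than the first.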


### 5. PRODUCT DEMAND FORECASTING

In [6]:
print("\n\n📦 STEP 5: Product Demand Forecasting")
print("-" * 40)

# Analyze top products for demand forecasting
top_products = transactions.groupby('product_id')['quantity'].sum().nlargest(20)
print(f"📊 Forecasting demand for top 20 products by volume")

# Product demand features
product_demand_data = []

for product_id in top_products.index:
    product_transactions = transactions[transactions['product_id'] == product_id].copy()
    product_transactions = product_transactions.set_index('transaction_date')
    
    # Daily demand
    daily_demand = product_transactions.groupby('transaction_date')['quantity'].sum()
    
    if len(daily_demand) > 30:  # Only products with sufficient history
        # Create features
        demand_df = daily_demand.to_frame('demand')
        demand_df['day_of_week'] = demand_df.index.dayofweek
        demand_df['month'] = demand_df.index.month
        demand_df['demand_lag_1'] = demand_df['demand'].shift(1)
        demand_df['demand_ma_7'] = demand_df['demand'].rolling(window=7).mean()
        
        # Get product info
        product_info = products[products['product_id'] == product_id].iloc[0]
        demand_df['price'] = product_info['price']
        demand_df['category'] = product_info['category']
        
        product_demand_data.append({
            'product_id': product_id,
            'product_name': product_info['product_name'],
            'category': product_info['category'],
            'avg_daily_demand': daily_demand.mean(),
            'demand_std': daily_demand.std(),
            'total_demand': daily_demand.sum()
        })

demand_forecast_df = pd.DataFrame(product_demand_data)
print(f"\n📊 Product Demand Analysis:")
print(demand_forecast_df.head().round(2))

# Category demand patterns
category_demand = demand_forecast_df.groupby('category').agg({
    'avg_daily_demand': 'mean',
    'demand_std': 'mean',
    'total_demand': 'sum'
}).round(2)

print(f"\n🏷️ Category Demand Patterns:")
print(category_demand.sort_values('total_demand', ascending=False))



📦 STEP 5: Product Demand Forecasting
----------------------------------------
📊 Forecasting demand for top 20 products by volume

📊 Product Demand Analysis:
   product_id                                      product_name  \
0         173  Triple-buffered logistical functionalities Crime   
1         665               Implemented background matrices Way   
2         331               Customer-focused secondary core Man   
3         600          Persistent mobile Internet solution Meet   
4         889            Public-key tangible data-warehouse Cup   

        category  avg_daily_demand  demand_std  total_demand  
0    Electronics              1.87        1.24           129  
1           Toys              2.25        1.29           128  
2  Home & Garden              1.85        1.11           126  
3         Beauty              1.61        0.99           122  
4  Home & Garden              1.69        0.98           120  

🏷️ Category Demand Patterns:
               avg_daily_deman

### 6. CUSTOMER ACQUISITION PREDICTION

In [7]:


print("\n\n👥 STEP 6: Customer Acquisition Forecasting")
print("-" * 45)

# Analyze customer acquisition patterns
customers['reg_date'] = pd.to_datetime(customers['registration_date'])
monthly_acquisitions = customers.groupby(customers['reg_date'].dt.to_period('M')).size()

print(f"📊 Monthly Customer Acquisition Trend:")
print(monthly_acquisitions.tail(12))

# Simple trend analysis
recent_months = monthly_acquisitions.tail(6)
acquisition_trend = recent_months.pct_change().mean()

print(f"\n📈 Recent Acquisition Trend: {acquisition_trend*100:+.1f}% monthly change")

# Predict next 3 months acquisition
last_3_avg = recent_months.tail(3).mean()
predicted_acquisitions = []

for i in range(3):
    predicted = last_3_avg * (1 + acquisition_trend) ** (i + 1)
    predicted_acquisitions.append(int(predicted))

print(f"\n🔮 Next 3 Months Acquisition Forecast:")
for i, pred in enumerate(predicted_acquisitions, 1):
    print(f"   Month +{i}: {pred} new customers")



👥 STEP 6: Customer Acquisition Forecasting
---------------------------------------------
📊 Monthly Customer Acquisition Trend:
reg_date
2024-09    153
2024-10    140
2024-11    115
2024-12    128
2025-01    150
2025-02    122
2025-03    170
2025-04    142
2025-05    141
2025-06    137
2025-07    139
2025-08     70
Freq: M, dtype: int64

📈 Recent Acquisition Trend: -13.6% monthly change

🔮 Next 3 Months Acquisition Forecast:
   Month +1: 99 new customers
   Month +2: 86 new customers
   Month +3: 74 new customers
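
The trend above leans heavily on the final month: 2025-08 shows 70 registrations against roughly 140 in the preceding months. If that month is only partly covered by the data (an assumption; the source does not say), the -13.6% figure would overstate the decline. The sketch below recomputes the average monthly change with and without the last period, using the counts copied from the output above.

```python
import pandas as pd

# Monthly registration counts copied from the output above (last six months).
monthly = pd.Series(
    [170, 142, 141, 137, 139, 70],
    index=pd.period_range("2025-03", "2025-08", freq="M"),
)

# Average month-over-month change, as in the notebook.
trend_with_last = monthly.pct_change().mean()

# Same calculation after dropping the final, possibly incomplete, month.
trend_without_last = monthly.iloc[:-1].pct_change().mean()

print(f"Trend including 2025-08: {trend_with_last:+.1%}")
print(f"Trend excluding 2025-08: {trend_without_last:+.1%}")
```

Comparing the latest registration date with the end of its month is one way to decide whether the final period should be kept.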


### 7. RISK SCORING & SEGMENTATION

In [8]:
print("\n\n⚠️  STEP 7: Customer Risk Scoring & Segmentation")
print("-" * 50)

# Create comprehensive risk score
customers_ml['risk_score'] = (
    customers_ml['churn_probability'] * 0.4 +  # Churn risk
    (customers_ml['days_since_last'] / customers_ml['days_since_last'].max()) * 0.3 +  # Recency risk
    (1 - customers_ml['total_spent'] / customers_ml['total_spent'].max()) * 0.2 +  # Value risk
    (customers_ml['support_tickets'] / customers_ml['support_tickets'].max()) * 0.1  # Support risk
)

# Risk segments (include_lowest=True keeps a score of exactly 0 in 'Low Risk' instead of NaN)
customers_ml['risk_segment'] = pd.cut(customers_ml['risk_score'], 
                                     bins=[0, 0.3, 0.6, 1.0], 
                                     labels=['Low Risk', 'Medium Risk', 'High Risk'],
                                     include_lowest=True)

risk_analysis = customers_ml.groupby('risk_segment').agg({
    'customer_id': 'count',
    'total_spent': ['sum', 'mean'],
    'churn_probability': 'mean',
    'is_churned': 'mean'
}).round(3)

print(f"⚠️  Customer Risk Segmentation:")
print(risk_analysis)

# High-value at-risk customers
high_value_at_risk = customers_ml[
    (customers_ml['risk_segment'] == 'High Risk') & 
    (customers_ml['total_spent'] > customers_ml['total_spent'].quantile(0.75))
]

print(f"\n🚨 High-Value At-Risk Customers:")
print(f"   Count: {len(high_value_at_risk)}")
print(f"   Total Value: ${high_value_at_risk['total_spent'].sum():,.2f}")
print(f"   Avg Churn Probability: {high_value_at_risk['churn_probability'].mean():.1%}")



⚠️  STEP 7: Customer Risk Scoring & Segmentation
--------------------------------------------------
⚠️  Customer Risk Segmentation:
             customer_id total_spent          churn_probability is_churned
                   count         sum     mean              mean       mean
risk_segment                                                              
Low Risk             408   347002.18  850.496             0.248      0.064
Medium Risk         4577  1927150.91  421.051             0.300      0.316
High Risk             15     3110.49  207.366             0.602      0.867

🚨 High-Value At-Risk Customers:
   Count: 0
   Total Value: $0.00
   Avg Churn Probability: nan%
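
The `nan%` above comes from averaging an empty selection: no customer is both in the High Risk segment and above the 75th percentile of `total_spent`. That is plausible, since the risk score itself penalises low spend. A small sketch of a summary helper that tolerates an empty selection is given here; `describe_at_risk` and the two toy frames are hypothetical names and data.

```python
import pandas as pd

def describe_at_risk(df):
    """Summarise a selection of customers, tolerating an empty selection."""
    if df.empty:
        return "Count: 0 (no customers matched both filters)"
    return (
        f"Count: {len(df)}, "
        f"Total Value: ${df['total_spent'].sum():,.2f}, "
        f"Avg Churn Probability: {df['churn_probability'].mean():.1%}"
    )

# Made-up selections: one empty, one with two customers.
empty = pd.DataFrame({"total_spent": [], "churn_probability": []})
some = pd.DataFrame({
    "total_spent": [900.0, 1100.0],
    "churn_probability": [0.65, 0.75],
})

print(describe_at_risk(empty))
print(describe_at_risk(some))
```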


### 8. MODEL PERFORMANCE SUMMARY

In [9]:


print("\n\n📊 MODEL PERFORMANCE SUMMARY")
print("="*50)

model_performance = pd.DataFrame({
    'Model': ['Churn Prediction (RF)', 'Churn Prediction (LR)', 
              'CLV Prediction (RF)', 'CLV Prediction (LR)',
              'Sales Forecasting (RF)'],
    'Metric': ['Accuracy', 'Accuracy', 'R²', 'R²', 'R²'],
    'Score': [accuracy_score(y_test, rf_pred), 
              accuracy_score(y_test, lr_pred),
              rf_r2, lr_r2, forecast_r2],
    'Status': ['✅ Good' if accuracy_score(y_test, rf_pred) > 0.8 else '⚠️  Fair',
               '✅ Good' if accuracy_score(y_test, lr_pred) > 0.8 else '⚠️  Fair',
               '✅ Good' if rf_r2 > 0.7 else '⚠️  Fair',
               '✅ Good' if lr_r2 > 0.7 else '⚠️  Fair',
               '✅ Good' if forecast_r2 > 0.7 else '⚠️  Fair']
})

print(model_performance.round(3))




📊 MODEL PERFORMANCE SUMMARY
                    Model    Metric  Score    Status
0   Churn Prediction (RF)  Accuracy  0.702  ⚠️  Fair
1   Churn Prediction (LR)  Accuracy  0.703  ⚠️  Fair
2     CLV Prediction (RF)        R²  0.986    ✅ Good
3     CLV Prediction (LR)        R²  0.930    ✅ Good
4  Sales Forecasting (RF)        R² -0.190  ⚠️  Fair
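
The sales forecasting R² of -0.190 is easier to read against the definition R² = 1 - SS_res / SS_tot. A score of 0 corresponds to always predicting the mean of the test period, so a negative score means the model did worse than that constant forecast on the 30 test days. The sketch below illustrates this with made-up numbers; `r2`, `actual`, `mean_forecast` and `shuffled_forecast` are illustrative names.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up daily revenue for four days.
actual = np.array([2000.0, 2200.0, 1900.0, 2300.0])

# Forecast 1: the mean of the period on every day.
mean_forecast = np.full_like(actual, actual.mean())

# Forecast 2: the right overall level, but assigned to the wrong days.
shuffled_forecast = actual[::-1]

print(f"R² of constant mean forecast: {r2(actual, mean_forecast):.3f}")
print(f"R² of shuffled forecast:      {r2(actual, shuffled_forecast):.3f}")
```

This also shows how a forecast can match the average level, as the mean actual ($2107.45) and mean predicted ($2122.92) revenues above do, while still scoring below zero.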


### 9. BUSINESS IMPACT PREDICTIONS

In [10]:


print("\n\n💼 BUSINESS IMPACT PREDICTIONS")
print("="*40)

# Revenue at risk from churn
revenue_at_risk = high_risk_customers['total_spent'].sum()
monthly_revenue_loss = revenue_at_risk * 0.1  # Assume 10% monthly churn

print(f"💰 Financial Impact Predictions:")
print(f"   Revenue at Risk (High Churn Probability): ${revenue_at_risk:,.2f}")
print(f"   Estimated Monthly Loss: ${monthly_revenue_loss:,.2f}")

# CLV predictions
total_predicted_clv = active_customers['predicted_clv'].sum()
current_total_spent = active_customers['total_spent'].sum()
clv_growth_potential = total_predicted_clv - current_total_spent

print(f"\n📈 Customer Lifetime Value Insights:")
print(f"   Current Total Customer Value: ${current_total_spent:,.2f}")
print(f"   Predicted Total CLV: ${total_predicted_clv:,.2f}")
print(f"   Growth Potential: ${clv_growth_potential:,.2f}")

# Sales forecasting impact
weekly_forecast = sum(future_predictions)
print(f"\n📊 Short-term Revenue Forecast:")
print(f"   Next 7 Days Predicted Revenue: ${weekly_forecast:,.2f}")
print(f"   Daily Average: ${weekly_forecast/7:,.2f}")




💼 BUSINESS IMPACT PREDICTIONS
💰 Financial Impact Predictions:
   Revenue at Risk (High Churn Probability): $5,126.54
   Estimated Monthly Loss: $512.65

📈 Customer Lifetime Value Insights:
   Current Total Customer Value: $1,605,801.97
   Predicted Total CLV: $1,603,370.27
   Growth Potential: $-2,431.70

📊 Short-term Revenue Forecast:
   Next 7 Days Predicted Revenue: $15,436.36
   Daily Average: $2,205.19


### 10. ACTIONABLE INSIGHTS & RECOMMENDATIONS

In [11]:
print("\n\n💡 PREDICTIVE INSIGHTS & RECOMMENDATIONS")
print("="*50)

print("🔮 WHAT WILL HAPPEN - KEY PREDICTIONS:")
print(f"\n🎯 Customer Churn:")
print(f"   • {len(high_risk_customers)} customers likely to churn (>70% probability)")
print(f"   • ${revenue_at_risk:,.2f} in revenue at risk")
print(f"   • Primary churn drivers: {', '.join(feature_importance.head(3)['feature'].tolist())}")

print(f"\n💰 Customer Value:")
print(f"   • Premium CLV segment: {len(active_customers[active_customers['clv_segment'] == 'Premium'])} customers")
print(f"   • Growth potential: ${clv_growth_potential:,.2f}")

print(f"\n📈 Sales Forecast:")
print(f"   • Next week revenue: ${weekly_forecast:,.2f}")
print(f"   • Test-set R²: {forecast_r2:.3f}")  # R² measures fit on the held-out days; it is not a confidence level

print(f"\n📦 Product Demand:")
print(f"   • Top category by demand: {category_demand.index[0]}")
print(f"   • Demand variability highest in: {demand_forecast_df.nlargest(1, 'demand_std')['category'].iloc[0]}")

print(f"\n🎯 IMMEDIATE ACTIONS RECOMMENDED:")
print("   ✅ Launch retention campaign for high-risk customers")
print("   ✅ Focus on Premium CLV segment for upselling")
print("   ✅ Optimize inventory for forecasted demand")
print("   ✅ Implement early warning system for churn indicators")

print(f"\n⚡ NEXT: Prescriptive Analytics - What Should We Do?")
print("="*50)

# Add predicted CLV to main dataframe for all customers
customers_ml['predicted_clv'] = 0.0  # Default for churned customers
customers_ml.loc[customers_ml['is_churned'] == 0, 'predicted_clv'] = active_customers['predicted_clv'].values

# Save predictions for prescriptive analysis
predictions_summary = {
    'high_risk_customers': len(high_risk_customers),
    'revenue_at_risk': float(revenue_at_risk),
    'weekly_forecast': float(weekly_forecast),
    'clv_growth_potential': float(clv_growth_potential),
    'top_churn_features': feature_importance.head(5)['feature'].tolist(),
    'analysis_date': datetime.now().isoformat()
}

with open('predictions_summary.json', 'w') as f:
    json.dump(predictions_summary, f, default=str)

# Save customer predictions
customers_ml[['customer_id', 'churn_probability', 'predicted_clv', 'risk_segment']].to_csv('customer_predictions.csv', index=False)

print("\n✅ Predictive analysis complete! Predictions saved for prescriptive recommendations.")



💡 PREDICTIVE INSIGHTS & RECOMMENDATIONS
🔮 WHAT WILL HAPPEN - KEY PREDICTIONS:

🎯 Customer Churn:
   • 9 customers likely to churn (>70% probability)
   • $5,126.54 in revenue at risk
   • Primary churn drivers: purchase_frequency, avg_order_value, total_spent

💰 Customer Value:
   • Premium CLV segment: 878 customers
   • Growth potential: $-2,431.70

📈 Sales Forecast:
   • Next week revenue: $15,436.36
   • Model confidence: -19.0%

📦 Product Demand:
   • Top category by demand: Beauty
   • Demand variability highest in: Toys

🎯 IMMEDIATE ACTIONS RECOMMENDED:
   ✅ Launch retention campaign for high-risk customers
   ✅ Focus on Premium CLV segment for upselling
   ✅ Optimize inventory for forecasted demand

⚡ NEXT: Prescriptive Analytics - What Should We Do?

✅ Predictive analysis complete! Predictions saved for prescriptive recommendations.
