# User Behavior & Conversion Analysis

**Analysis of user behavior, conversion funnels, and upgrade patterns for an AI content generation web application.**

---

## Table of Contents
1. [Setup & Data Loading](#setup)
2. [Funnel Analysis](#funnel)
3. [Segmentation Analysis](#segmentation)
4. [Behavioral Signals](#behavioral)
5. [Key Insights & Recommendations](#insights)

<a id='setup'></a>
## 1. Setup & Data Loading

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add scripts to path
sys.path.insert(0, '../scripts')

from data_loader import load_all_data, validate_data
from funnel_analysis import build_funnel, calculate_retention, calculate_30day_upgrade_rate
from segmentation import segment_by_country, segment_by_device, segment_by_source
from behavioral_metrics import behavioral_metrics, high_intent_analysis, calculate_engagement_score
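# Import plotting helpers explicitly instead of using a wildcard import
from visualization import (
    plot_funnel, plot_retention_curve, plot_segment_comparison,
    plot_behavioral_comparison, plot_intent_signals,
)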

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries imported successfully")

### Load Data

In [None]:
# Load all datasets
users, events, payments = load_all_data(
    users_path='../data/raw/users.csv',
    events_path='../data/raw/events.csv',
    payments_path='../data/raw/payments.csv'
)

### Data Overview

In [None]:
print("Users Dataset:")
print(users.head())
print(f"\nShape: {users.shape}")
print(f"\nData Types:\n{users.dtypes}")

In [None]:
print("Events Dataset:")
print(events.head())
print(f"\nShape: {events.shape}")
print(f"\nEvent Types:\n{events['event_name'].value_counts()}")

In [None]:
print("Payments Dataset:")
print(payments.head())
print(f"\nShape: {payments.shape}")
print(f"\nPlan Types:\n{payments['plan_type'].value_counts()}")

### Validate Data Quality

In [None]:
validate_data(users, events, payments)

---
<a id='funnel'></a>
## 2. Funnel Analysis

### Build 4-Step Conversion Funnel

**Funnel Steps:**
1. **Signed Up** - All users in users.csv
2. **Viewed a Feature** - Users with at least one 'viewed_feature' event
3. **Returned within 7 days** - Users with any event within 7 days of signup
4. **Upgraded to Paid** - Users present in payments.csv
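
The four step definitions above can be sketched directly in pandas. This is an illustrative sketch, not the actual `build_funnel` implementation: the timestamp columns `signup_date` and `event_time` are assumed names, and each step is counted independently, exactly as the list defines it, since the notebook does not show whether the real function nests the steps.

```python
import pandas as pd

def sketch_funnel(users, events, payments):
    """Count users at each of the four funnel steps (illustrative only)."""
    signed_up = set(users["user_id"])

    # Step 2: at least one 'viewed_feature' event
    is_view = events["event_name"] == "viewed_feature"
    viewed = set(events.loc[is_view, "user_id"]) & signed_up

    # Step 3: any event within 7 days of signup
    # (assumed columns: users.signup_date, events.event_time)
    merged = events.merge(users[["user_id", "signup_date"]], on="user_id")
    delta = merged["event_time"] - merged["signup_date"]
    in_window = (delta >= pd.Timedelta(0)) & (delta <= pd.Timedelta(days=7))
    returned = set(merged.loc[in_window, "user_id"])

    # Step 4: present in the payments table
    upgraded = set(payments["user_id"]) & signed_up

    out = pd.DataFrame({
        "Step": ["Signed Up", "Viewed a Feature",
                 "Returned within 7 days", "Upgraded to Paid"],
        "Users": [len(signed_up), len(viewed), len(returned), len(upgraded)],
    })
    # Share of all signups reaching each step
    out["Pct_of_Signups"] = out["Users"] / len(signed_up) * 100
    # Step-to-step conversion; the first step is 100% by definition
    out["Conversion_Rate"] = (out["Users"] / out["Users"].shift(1) * 100).fillna(100.0)
    return out
```

Because Step 3 accepts any event in the window, an event logged at the moment of signup would count as a return under this definition. If that is not intended, the lower bound can be made strict.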

In [None]:
# Build funnel
funnel = build_funnel(users, events, payments)

# Display funnel metrics
display_funnel = funnel[['Step', 'Users', 'Conversion_Rate', 'Pct_of_Signups']].copy()
print("\n📊 CONVERSION FUNNEL\n")
print(display_funnel.to_string(index=False))

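# Save to CSV (create the output directory if it does not exist yet)
Path('../outputs/tables').mkdir(parents=True, exist_ok=True)
display_funnel.to_csv('../outputs/tables/funnel_metrics.csv', index=False)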

### Visualize Funnel

In [None]:
plot_funnel(funnel)

### Drop-off Analysis

In [None]:
# Calculate drop-offs between steps
print("\n📉 DROP-OFF ANALYSIS\n")
for i in range(1, len(funnel)):
    step_from = funnel.iloc[i-1]
    step_to = funnel.iloc[i]
    drop_off = step_from['Users'] - step_to['Users']
    drop_off_pct = (drop_off / step_from['Users']) * 100
    
    print(f"{step_from['Step']} → {step_to['Step']}")
    print(f"  Lost: {drop_off:,} users ({drop_off_pct:.1f}%)")
    print(f"  Conversion Rate: {step_to['Conversion_Rate']:.1f}%\n")

### Weekly Retention Analysis

In [None]:
# Calculate weekly retention
retention = calculate_retention(users, events, weeks=12)

print("\n📈 WEEKLY RETENTION RATES\n")
print(retention.head(8).to_string(index=False))

# Save to CSV
retention.to_csv('../outputs/tables/retention_metrics.csv', index=False)

In [None]:
# Visualize retention curve
plot_retention_curve(retention)
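
For reference, one common way to compute weekly retention is to bucket each event by the number of whole weeks since that user's signup and count distinct active users per bucket. The sketch below is an assumption about what `calculate_retention` might do, not its actual code. The columns `signup_date` and `event_time`, and the output columns `week`, `active_users`, and `retention_pct`, are hypothetical names.

```python
import pandas as pd

def sketch_weekly_retention(users, events, weeks=12):
    """Share of all users active in each week since their own signup."""
    merged = events.merge(users[["user_id", "signup_date"]], on="user_id")

    # Whole weeks elapsed between signup and each event (week 0 = first 7 days)
    days = (merged["event_time"] - merged["signup_date"]).dt.days
    merged = merged.assign(week=days // 7)
    merged = merged[(merged["week"] >= 0) & (merged["week"] <= weeks)]

    # Distinct active users per week; weeks with no activity become 0
    active = merged.groupby("week")["user_id"].nunique()
    active = active.reindex(range(weeks + 1), fill_value=0)

    total = users["user_id"].nunique()
    return pd.DataFrame({
        "week": list(active.index),
        "active_users": active.to_numpy(),
        "retention_pct": active.to_numpy() / total * 100,
    })
```

This version divides by all users for every week, so users who signed up too recently to have reached week *k* still count in the denominator. A cohort-aware variant would restrict the denominator to users whose signup is at least *k* weeks old.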

### 30-Day Upgrade Rate

In [None]:
# Calculate 30-day upgrade rate
upgrade_30d = calculate_30day_upgrade_rate(users, payments)

print("\n🎯 30-DAY UPGRADE RATE\n")
print(f"Total Users: {upgrade_30d['total_users']:,}")
print(f"Upgraded within 30 days: {upgrade_30d['upgraded_30d']:,}")
print(f"Upgrade Rate: {upgrade_30d['upgrade_rate_30d']:.2f}%")
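
A minimal sketch of how a 30-day upgrade rate could be derived is shown below. The dictionary keys match the ones printed above. The timestamp columns `signup_date` and `payment_date` are assumed names, and taking each user's first payment as the upgrade date is an assumption about `calculate_30day_upgrade_rate`, not something the notebook confirms.

```python
import pandas as pd

def sketch_30day_upgrade_rate(users, payments):
    """Share of users whose first payment falls within 30 days of signup."""
    # Earliest payment per user is treated as the upgrade date (assumption)
    first_payment = payments.groupby("user_id", as_index=False)["payment_date"].min()

    # Left join keeps users who never paid; their payment_date is NaT
    merged = users[["user_id", "signup_date"]].merge(
        first_payment, on="user_id", how="left"
    )
    days_to_upgrade = (merged["payment_date"] - merged["signup_date"]).dt.days

    # NaN (never paid) compares as False, so those users are not counted
    upgraded_30d = int(days_to_upgrade.between(0, 30).sum())
    total_users = len(merged)
    rate = upgraded_30d / total_users * 100 if total_users else 0.0
    return {
        "total_users": total_users,
        "upgraded_30d": upgraded_30d,
        "upgrade_rate_30d": rate,
    }
```

Note that this rate uses all signups as the denominator, so it can differ from the funnel's overall conversion, which counts upgrades at any time after signup.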

---
<a id='segmentation'></a>
## 3. Segmentation Analysis

### Segment by Country

In [None]:
# Segment by country
country_seg = segment_by_country(users, events, payments)
print("\n🌍 SEGMENTATION BY COUNTRY\n")
print(country_seg[['country', 'Signups', 'Viewed_Feature', 'Returned_7d', 'Upgraded', 'Upgrade_Rate']].to_string(index=False))

# Save to CSV
country_seg.to_csv('../outputs/tables/segment_country.csv', index=False)

In [None]:
# Visualize country segmentation
plot_segment_comparison(country_seg, 'country')

### Segment by Device

In [None]:
# Segment by device
device_seg = segment_by_device(users, events, payments)

print("\n📱 SEGMENTATION BY DEVICE\n")
print(device_seg[['device', 'Signups', 'Viewed_Feature', 'Returned_7d', 'Upgraded', 'Upgrade_Rate']].to_string(index=False))

# Save to CSV
device_seg.to_csv('../outputs/tables/segment_device.csv', index=False)

In [None]:
# Visualize device segmentation
plot_segment_comparison(device_seg, 'device')

### Segment by Acquisition Source

In [None]:
# Segment by source
source_seg = segment_by_source(users, events, payments)

print("\n🔍 SEGMENTATION BY SOURCE\n")
print(source_seg[['source', 'Signups', 'Viewed_Feature', 'Returned_7d', 'Upgraded', 'Upgrade_Rate']].to_string(index=False))

# Save to CSV
source_seg.to_csv('../outputs/tables/segment_source.csv', index=False)

In [None]:
# Visualize source segmentation
plot_segment_comparison(source_seg, 'source')

### Segment Comparison Summary

In [None]:
print("\n📊 BEST & WORST PERFORMING SEGMENTS\n")

# Sort explicitly so best/worst do not depend on the row order
# returned by the segment_* helpers
for label, seg, col in [('Country', country_seg, 'country'),
                        ('Device', device_seg, 'device'),
                        ('Source', source_seg, 'source')]:
    ranked = seg.sort_values('Upgrade_Rate', ascending=False)
    best, worst = ranked.iloc[0], ranked.iloc[-1]
    print(f"Best {label}: {best[col]} ({best['Upgrade_Rate']:.1f}%)")
    print(f"Worst {label}: {worst[col]} ({worst['Upgrade_Rate']:.1f}%)\n")

---
<a id='behavioral'></a>
## 4. Behavioral Signals

### Compare Upgraded vs Non-Upgraded Users

In [None]:
# Calculate behavioral metrics
behavior = behavioral_metrics(users, events, payments)

# Compare by upgrade status
comparison = behavior.groupby('is_upgraded').agg({
    'total_events': ['mean', 'median'],
    'distinct_events': ['mean', 'median'],
    'days_active': ['mean', 'median'],
    'days_to_feature': ['mean', 'median']
}).round(2)

print("\n🔍 BEHAVIORAL COMPARISON\n")
print(comparison)

# Save to CSV
behavior.to_csv('../outputs/tables/behavioral_metrics.csv', index=False)

In [None]:
# Visualize behavioral differences
plot_behavioral_comparison(behavior)

### High-Intent Behaviors

In [None]:
# Analyze high-intent behaviors
intent = high_intent_analysis(users, events, payments)

print("\n🎯 HIGH-INTENT BEHAVIORS & CONVERSION RATES\n")
print(intent.to_string(index=False))

# Save to CSV
intent.to_csv('../outputs/tables/high_intent_signals.csv', index=False)

In [None]:
# Visualize intent signals
baseline_rate = funnel.iloc[3]['Pct_of_Signups']
plot_intent_signals(intent, baseline_rate)

### Engagement Score Analysis

In [None]:
# Calculate engagement scores
engagement = calculate_engagement_score(users, events)

# Merge with upgrade status
engagement_with_status = engagement.merge(
    behavior[['user_id', 'is_upgraded']], 
    on='user_id'
)

# Compare engagement scores
print("\n📊 ENGAGEMENT SCORE COMPARISON\n")
score_comparison = engagement_with_status.groupby('is_upgraded')['engagement_score'].describe().round(2)
print(score_comparison)

# Save to CSV
engagement.to_csv('../outputs/tables/engagement_scores.csv', index=False)

---
<a id='insights'></a>
## 5. Key Insights & Recommendations

### Summary of Key Findings

In [None]:
print("\n" + "="*60)
print("KEY FINDINGS SUMMARY")
print("="*60)

# Funnel insights
print("\n1. FUNNEL PERFORMANCE")
overall_conversion = funnel.iloc[3]['Pct_of_Signups']

# Largest step-to-step loss, computed from the user counts
step_loss_pct = (1 - funnel['Users'] / funnel['Users'].shift(1)) * 100
worst_idx = step_loss_pct.idxmax()
worst_step = funnel.loc[worst_idx, 'Step']
worst_loss = step_loss_pct.loc[worst_idx]

print(f"   • Overall conversion rate: {overall_conversion:.1f}%")
print(f"   • Biggest drop-off: before '{worst_step}' ({worst_loss:.1f}% of users lost)")
print(f"   • 30-day upgrade rate: {upgrade_30d['upgrade_rate_30d']:.2f}%")

# Segmentation insights
print("\n2. TOP PERFORMING SEGMENTS")
# Sort explicitly so the top segment does not depend on helper row order
for label, seg, col in [('Country', country_seg, 'country'),
                        ('Device', device_seg, 'device'),
                        ('Source', source_seg, 'source')]:
    top = seg.sort_values('Upgrade_Rate', ascending=False).iloc[0]
    print(f"   • {label}: {top[col]} ({top['Upgrade_Rate']:.1f}%)")

# Behavioral insights
print("\n3. CONVERSION SIGNALS")
upgraded_avg_events = behavior[behavior['is_upgraded']==True]['total_events'].mean()
not_upgraded_avg_events = behavior[behavior['is_upgraded']==False]['total_events'].mean()
print(f"   • Avg events (upgraded): {upgraded_avg_events:.1f}")
print(f"   • Avg events (not upgraded): {not_upgraded_avg_events:.1f}")
# Report a ratio, not a difference, so the "x" multiplier is meaningful
if not_upgraded_avg_events > 0:
    print(f"   • Ratio: upgraded users are {upgraded_avg_events / not_upgraded_avg_events:.1f}x as active")

if len(intent) > 0:
    top_intent = intent.iloc[0]
    print(f"   • Top intent signal: {top_intent['Behavior']} ({top_intent['Conversion_Rate']:.1f}% conversion)")

print("\n" + "="*60)

### Actionable Recommendations

Based on the analysis, here are data-backed recommendations:

**1. Improve User Activation**
- Focus on getting users to view features early (Step 2 conversion)
- Implement onboarding flows for best-performing segments
- Create device-specific experiences

**2. Optimize Conversion**
- Target high-intent behaviors identified in the analysis
- Double down on best-performing acquisition channels
- Create segment-specific pricing strategies

**3. Improve Retention**
- Monitor weekly retention and intervene when users become inactive
- Encourage multiple distinct event types in the first 3 days
- Build features that increase days active

See the full report for detailed recommendations.

---

## Analysis Complete!

All results have been exported to:
- **Tables**: `outputs/tables/`
- **Figures**: `outputs/figures/`
- **Full Report**: `reports/User_Conversion_Analysis_Report.md`