# Defining suggestions for segmentations

As mentioned in the `02_ab_tesst_analysis.ipynb` file, I suggest a new test with multiple groups. Some of these groups require a deeper understanding of customer segmentation.

<br></br>
### RFM Segmentation 

**What is it?**

RFM segmentation (Recency, Frequency, Monetary) is a data-driven technique used in marketing and customer analytics to classify customers based on their purchasing behavior. It evaluates:

- Recency (R): How recently a customer made a purchase.
- Frequency (F): How often a customer makes purchases.
- Monetary (M): How much a customer spends.

By scoring customers on these three dimensions, businesses can identify high-value customers, predict churn, and tailor marketing strategies effectively.
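As a minimal sketch of what "scoring" usually means here, the example below assigns quartile scores from 1 to 4 on each dimension using `pd.qcut`. The toy data, the `score` helper, and the choice of four bins are assumptions made for illustration; they are not taken from the campaign dataset.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "customer_id": ["a", "b", "c", "d"],
    "recency_days": [2, 10, 20, 29],
    "frequency": [8, 4, 2, 1],
    "monetary": [400.0, 150.0, 90.0, 30.0],
})

def score(series, ascending):
    """Return quartile scores 1-4 (hypothetical helper)."""
    # Rank first so that ties cannot produce duplicate bin edges in qcut
    ranks = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranks, q=4, labels=[1, 2, 3, 4]).astype(int)

# Lower recency is better, so rank it in descending order:
# the most recent customer ends up with the highest score
toy["r_score"] = score(toy["recency_days"], ascending=False)
toy["f_score"] = score(toy["frequency"], ascending=True)
toy["m_score"] = score(toy["monetary"], ascending=True)

# Concatenate the three scores into a single RFM code such as "444"
toy["rfm_score"] = (
    toy["r_score"].astype(str) + toy["f_score"].astype(str) + toy["m_score"].astype(str)
)
print(toy[["customer_id", "rfm_score"]])
```

In this toy case customer `a` is the most recent, most frequent, and highest-spending customer, so it receives the top score on all three dimensions.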

I'll discuss how it works in more detail later on.

For now, I see that this segmentation can help with three of the proposed groups: **"Hybrid Segment-Based Group"**, **"Tiered AOV Group"** and **"Frequency Accelerator Group"**.

### Limitations

- My dataset is very limited (it only contains orders from the 30-day campaign period), so I will not be able to run a full RFM analysis. This segmentation assumes we can also observe customers whose last order was placed further back in time.
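
To make this limitation concrete, the sketch below measures the observation window of an orders table and checks whether a 90-day inactivity rule could ever be triggered. The dates are invented toy values and `window_days` is a hypothetical name; only the `order_created_at` column name comes from this notebook.

```python
import pandas as pd

# Toy orders with invented dates spanning roughly one month
toy_orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_created_at": pd.to_datetime(["2019-01-01", "2019-01-20", "2019-01-30"]),
})

# Length of the observation window in whole days
window_days = (
    toy_orders["order_created_at"].max() - toy_orders["order_created_at"].min()
).days

# Recency is measured from the last order date, so it can never exceed the window;
# a rule such as "inactive after 90+ days" cannot match anyone in a ~30-day window
can_flag_inactive = window_days > 90
print(window_days, can_flag_inactive)
```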

<br></br>

# Loading Data

In [2]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

In [3]:
df_orders = pd.read_parquet("../data/processed/orders_processed.parquet")
df_ab_test = pd.read_parquet("../data/processed/ab_test.parquet")
df_consumers = pd.read_parquet("../data/processed/consumers_processed.parquet")

## Calculating RFM Metrics

In [4]:
# First, calculate the three core metrics for each customer:
# TODO: change this code so that consumers in df_consumers with no orders here are labelled as "lost" or similar

# Set the analysis end date (most recent date in your dataset)
analysis_end_date = df_orders['order_created_at'].max()

# Calculate RFM metrics with named aggregations, so every output column is named
# explicitly and total spend is computed in the same pass (no index alignment needed)
rfm = df_orders.groupby('customer_id').agg(
    recency_days=('order_created_at', lambda x: (analysis_end_date - x.max()).days),  # Recency
    frequency=('order_id', 'count'),  # Frequency
    avg_order_value=('order_total_amount', 'mean'),  # Monetary (average order value)
    total_spend=('order_total_amount', 'sum'),  # Total spend, for additional context
).reset_index()
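
One possible way to address the TODO about consumers missing from the orders table is a left join from the full consumer list. This is only a sketch on toy frames: it assumes `df_consumers` has a `customer_id` column, which I have not verified, and `has_orders` is a hypothetical flag name.

```python
import pandas as pd

# Toy stand-ins for rfm and df_consumers; the real column names may differ
toy_rfm = pd.DataFrame({
    "customer_id": [1, 2],
    "recency_days": [3, 12],
    "frequency": [5, 1],
})
toy_consumers = pd.DataFrame({"customer_id": [1, 2, 3]})

# Left-join from the full consumer list so customers without orders are kept
rfm_all = toy_consumers.merge(toy_rfm, on="customer_id", how="left", indicator=True)

# Flag customers with no order in the window and give them zero frequency;
# recency stays NaN because it is unknown, not zero
rfm_all["has_orders"] = rfm_all["_merge"] == "both"
rfm_all["frequency"] = rfm_all["frequency"].fillna(0).astype(int)
rfm_all = rfm_all.drop(columns="_merge")
print(rfm_all)
```

Customers with `has_orders == False` could then be given their own label (for example "lost") before the segment rules are applied.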

In [None]:
# Step 2: Create Binary Segment Indicators
# Instead of quartile scoring, create binary indicators that will help form hybrid segments:

# Define thresholds based on business knowledge or distribution analysis
# These thresholds should be adjusted based on your specific data

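# Recency thresholds
# Note: the dataset only covers the 30-day campaign period, so recency_days stays within
# that window and is_inactive (90+ days) will be False until a longer history is available
recent_threshold = 30  # Consider active if ordered in last 30 days
rfm['is_active'] = rfm['recency_days'] <= recent_threshold
rfm['is_inactive'] = rfm['recency_days'] > 90  # Inactive if no orders in 90+ days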

# Frequency thresholds
high_freq_threshold = rfm['frequency'].quantile(0.75)  # Top 25% by frequency
low_freq_threshold = rfm['frequency'].quantile(0.25)   # Bottom 25% by frequency
rfm['is_high_frequency'] = rfm['frequency'] >= high_freq_threshold
rfm['is_low_frequency'] = rfm['frequency'] <= low_freq_threshold

# AOV thresholds
high_aov_threshold = rfm['avg_order_value'].quantile(0.75)  # Top 25% by AOV
low_aov_threshold = rfm['avg_order_value'].quantile(0.25)   # Bottom 25% by AOV
rfm['is_high_aov'] = rfm['avg_order_value'] >= high_aov_threshold
rfm['is_low_aov'] = rfm['avg_order_value'] <= low_aov_threshold

# Total value thresholds
high_value_threshold = rfm['total_spend'].quantile(0.75)  # Top 25% by total spend
rfm['is_high_value'] = rfm['total_spend'] >= high_value_threshold

# New customer indicator (low frequency but recent)
rfm['is_new'] = (rfm['frequency'] <= 3) & (rfm['recency_days'] <= 60)

In [None]:
# Step 3: Create Hybrid Segments
# Now, combine these indicators to create meaningful hybrid segments that align with different incentive strategies:

def create_hybrid_segment(row):
    # High-Frequency, Low-AOV Segment (needs AOV-focused incentives)
    if row['is_high_frequency'] and row['is_low_aov'] and row['is_active']:
        return 'Frequent Small Baskets'
    
    # Low-Frequency, High-AOV Segment (needs frequency-focused incentives)
    elif row['is_low_frequency'] and row['is_high_aov'] and row['is_active']:
        return 'Big Spenders, Rare Visits'
    
    # High-Value, Recently Inactive (needs win-back incentives)
    elif row['is_high_value'] and not row['is_active'] and not row['is_inactive']:
        return 'At-Risk High Value'
    
    # High-Value, Long Inactive (needs aggressive win-back)
    elif row['is_high_value'] and row['is_inactive']:
        return 'Churned High Value'
    
    # New Customers (needs onboarding incentives)
    elif row['is_new']:
        return 'New Explorers'
    
    # Consistent High-Value Customers (needs loyalty incentives)
    elif row['is_high_frequency'] and row['is_high_aov'] and row['is_active']:
        return 'VIP Customers'
    
    # Average Active Customers (needs general growth incentives)
    elif row['is_active'] and not row['is_low_frequency'] and not row['is_high_frequency']:
        return 'Core Customers'
    
    # Low-Value Inactive (needs reactivation but lower priority)
    elif row['is_inactive'] and not row['is_high_value']:
        return 'Dormant Low Value'
    
    # Catch-all for any remaining customers
    else:
        return 'Other Customers'

# Apply the segmentation function
rfm['hybrid_segment'] = rfm.apply(create_hybrid_segment, axis=1)
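
The row-wise `apply` above can be slow on large tables. A vectorized alternative, sketched here on toy data with only two of the rules, is `np.select`, which also picks the first matching condition and therefore keeps the same priority order. The toy frame and the reduced rule set are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy indicator columns, invented for illustration
toy = pd.DataFrame({
    "is_active": [True, True, False],
    "is_high_frequency": [True, False, False],
    "is_low_aov": [True, False, False],
    "is_new": [False, True, False],
})

# Conditions are evaluated in order; the first one that is True wins,
# which mirrors the if/elif chain in create_hybrid_segment
conditions = [
    toy["is_high_frequency"] & toy["is_low_aov"] & toy["is_active"],
    toy["is_new"],
]
labels = ["Frequent Small Baskets", "New Explorers"]

# Rows matching no condition fall back to the catch-all label
toy["hybrid_segment"] = np.select(conditions, labels, default="Other Customers")
print(toy["hybrid_segment"].tolist())
```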

In [None]:
# Step 4: Analyze Segment Characteristics
# Examine the characteristics of each segment to validate your grouping and refine incentive strategies:

# Calculate segment metrics
segment_profile = rfm.groupby('hybrid_segment').agg({
    'customer_id': 'count',
    'recency_days': 'mean',
    'frequency': 'mean',
    'avg_order_value': 'mean',
    'total_spend': 'mean'
}).reset_index()

# Calculate percentage of total customer base
segment_profile['pct_customers'] = segment_profile['customer_id'] / segment_profile['customer_id'].sum() * 100

# Sort by customer count to see largest segments first
segment_profile = segment_profile.sort_values('customer_id', ascending=False)

print(segment_profile)

In [None]:
# Step 5: Map Segments to Incentive Strategies
# Now, map each hybrid segment to a specific incentive strategy:

incentive_mapping = {
    'Frequent Small Baskets': {
        'incentive_type': 'AOV-focused',
        'strategy': 'Tiered discounts based on basket size',
        'example': '15% off orders over R$50, 25% off over R$70, 35% off over R$90',
        'primary_goal': 'Increase basket size',
        'secondary_goal': 'Maintain frequency'
    },
    'Big Spenders, Rare Visits': {
        'incentive_type': 'Frequency-focused',
        'strategy': 'Rewards for consistent ordering',
        'example': 'Order 3 times this month, get R$50 off your next order',
        'primary_goal': 'Increase ordering frequency',
        'secondary_goal': 'Maintain AOV'
    },
    'At-Risk High Value': {
        'incentive_type': 'Retention-focused',
        'strategy': 'Personalized win-back offers',
        'example': 'We miss you! 40% off your next order this week only',
        'primary_goal': 'Prevent churn',
        'secondary_goal': 'Reestablish regular ordering pattern'
    },
    'Churned High Value': {
        'incentive_type': 'Reactivation-focused',
        'strategy': 'Aggressive win-back with high value',
        'example': 'Come back and enjoy 50% off your next 2 orders',
        'primary_goal': 'Reactivate valuable customer',
        'secondary_goal': 'Rebuild loyalty'
    },
    'New Explorers': {
        'incentive_type': 'Exploration-focused',
        'strategy': 'Progressive engagement rewards',
        'example': 'Try 3 different restaurant categories, get increasing discounts with each',
        'primary_goal': 'Establish broad platform usage',
        'secondary_goal': 'Build ordering habits'
    },
    'VIP Customers': {
        'incentive_type': 'Loyalty-focused',
        'strategy': 'Premium benefits and exclusivity',
        'example': 'VIP-only offers and early access to new features',
        'primary_goal': 'Increase share of wallet',
        'secondary_goal': 'Maintain loyalty and prevent competitive switching'
    },
    'Core Customers': {
        'incentive_type': 'Growth-focused',
        'strategy': 'Balanced incentives for both frequency and AOV',
        'example': 'Order 4+ times this month AND spend R$60+ per order, get R$75 off',
        'primary_goal': 'Increase overall engagement',
        'secondary_goal': 'Move toward VIP segment'
    },
    'Dormant Low Value': {
        'incentive_type': 'Low-cost reactivation',
        'strategy': 'Simple, low-investment offers',
        'example': 'Free delivery on your next order',
        'primary_goal': 'Reactivate with minimal investment',
        'secondary_goal': 'Assess potential for growth'
    },
    'Other Customers': {
        'incentive_type': 'Testing-focused',
        'strategy': 'Experimental offers to determine response patterns',
        'example': 'Rotate through different incentive types',
        'primary_goal': 'Identify effective incentive type',
        'secondary_goal': 'Move to a more defined segment'
    }
}

In [None]:
# Step 6: Visualize Segment Distribution and Characteristics
# Create visualizations to better understand your segments:

# Segment size visualization
plt.figure(figsize=(12, 6))
sns.barplot(x='hybrid_segment', y='pct_customers', data=segment_profile)
plt.title('Customer Distribution by Hybrid Segment')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Percentage of Customers')
plt.tight_layout()
plt.show()

# Segment characteristics visualization
# plt.subplots creates the figure itself, so no separate plt.figure call is needed
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Recency by segment
sns.barplot(x='hybrid_segment', y='recency_days', data=segment_profile, ax=axes[0, 0])
axes[0, 0].set_title('Average Recency by Segment (Days)')
axes[0, 0].set_xticklabels(axes[0, 0].get_xticklabels(), rotation=45, ha='right')

# Frequency by segment
sns.barplot(x='hybrid_segment', y='frequency', data=segment_profile, ax=axes[0, 1])
axes[0, 1].set_title('Average Frequency by Segment (Orders)')
axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=45, ha='right')

# AOV by segment
sns.barplot(x='hybrid_segment', y='avg_order_value', data=segment_profile, ax=axes[1, 0])
axes[1, 0].set_title('Average Order Value by Segment (R$)')
axes[1, 0].set_xticklabels(axes[1, 0].get_xticklabels(), rotation=45, ha='right')

# Average total spend per customer by segment
sns.barplot(x='hybrid_segment', y='total_spend', data=segment_profile, ax=axes[1, 1])
axes[1, 1].set_title('Average Total Spend per Customer by Segment (R$)')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=45, ha='right')

plt.tight_layout()
plt.show()

In [None]:
# Step 7: Create a Segment Migration Strategy
# Define pathways for customers to move between segments:
# Define ideal segment progression paths
segment_progression = {
    'New Explorers': ['Core Customers', 'Frequent Small Baskets', 'VIP Customers'],
    'Frequent Small Baskets': ['Core Customers', 'VIP Customers'],
    'Big Spenders, Rare Visits': ['Core Customers', 'VIP Customers'],
    'Core Customers': ['VIP Customers'],
    'Dormant Low Value': ['Core Customers', 'Frequent Small Baskets'],
    'At-Risk High Value': ['VIP Customers', 'Core Customers'],
    'Churned High Value': ['At-Risk High Value', 'Core Customers', 'VIP Customers']
}

# Visualize the segment progression paths
# (This would require a network visualization library like networkx)
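
Before reaching for a plotting library, the progression mapping can be flattened into a directed edge list using plain Python. The sketch below uses a shortened copy of the mapping so that it runs on its own; `progression`, `edges`, and `incoming` are hypothetical names.

```python
from collections import Counter

# Shortened copy of the progression mapping, so the sketch is self-contained
progression = {
    "New Explorers": ["Core Customers", "Frequent Small Baskets", "VIP Customers"],
    "Frequent Small Baskets": ["Core Customers", "VIP Customers"],
    "Core Customers": ["VIP Customers"],
}

# Flatten the mapping into (source, target) pairs, one per allowed migration
edges = [(src, dst) for src, targets in progression.items() for dst in targets]

# Count how many segments feed into each target segment
incoming = Counter(dst for _, dst in edges)
print(edges)
print(incoming.most_common())
```

An edge list in this shape is, as far as I know, what graph libraries such as networkx accept when building a directed graph, so it should be a reasonable starting point for the visualization.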

In [None]:
# Step 8: Implement A/B Testing Framework
# Set up your A/B test to evaluate the effectiveness of the hybrid segment-based approach:

# For each segment, create test and control groups
segments = rfm['hybrid_segment'].unique()
test_assignment = {}

# Seed the generator once, outside the loop: reseeding inside the loop would
# reuse the same random sequence for every segment
rng = np.random.default_rng(42)  # For reproducibility

for segment in segments:
    # Get customers in this segment
    segment_customers = rfm[rfm['hybrid_segment'] == segment]['customer_id'].tolist()
    
    # Randomly assign to test or control (80% test, 20% control)
    test_mask = rng.choice([True, False], size=len(segment_customers), p=[0.8, 0.2])
    
    test_customers = [customer for i, customer in enumerate(segment_customers) if test_mask[i]]
    control_customers = [customer for i, customer in enumerate(segment_customers) if not test_mask[i]]
    
    test_assignment[segment] = {
        'test_customers': test_customers,
        'control_customers': control_customers,
        'incentive_strategy': incentive_mapping[segment]['strategy'],
        'incentive_example': incentive_mapping[segment]['example']
    }

# Create a dataframe with test assignments for implementation
test_assignment_df = []
for segment, data in test_assignment.items():
    for customer in data['test_customers']:
        test_assignment_df.append({
            'customer_id': customer,
            'segment': segment,
            'test_group': 'test',
            'incentive_strategy': data['incentive_strategy'],
            'incentive_example': data['incentive_example']
        })
    for customer in data['control_customers']:
        test_assignment_df.append({
            'customer_id': customer,
            'segment': segment,
            'test_group': 'control',
            'incentive_strategy': 'No incentive (control)',
            'incentive_example': 'No incentive (control)'
        })

test_assignment_df = pd.DataFrame(test_assignment_df)
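
Because each customer is assigned independently, the realized split in a small segment can drift from 80/20. A quick check, sketched here on a toy assignment table with invented rows, is a normalized crosstab of segment against group.

```python
import pandas as pd

# Toy assignment table, invented for illustration
toy_assign = pd.DataFrame({
    "segment": ["A", "A", "A", "A", "B", "B"],
    "test_group": ["test", "test", "test", "control", "test", "control"],
})

# Share of each group within each segment; every row sums to 1
split = pd.crosstab(toy_assign["segment"], toy_assign["test_group"], normalize="index")
print(split)
```

Applied to `test_assignment_df`, segments whose test share is far from 0.8 may be too small to evaluate on their own.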

In [None]:
# Step 9: Define Segment-Specific Success Metrics
# Create a measurement framework tailored to each segment's goals:

segment_metrics = {
    'Frequent Small Baskets': {
        'primary_metric': 'avg_order_value',
        'secondary_metrics': ['orders_per_month', 'retention_rate'],
        'success_threshold': 'AOV increase of 15%+'
    },
    'Big Spenders, Rare Visits': {
        'primary_metric': 'orders_per_month',
        'secondary_metrics': ['avg_order_value', 'days_between_orders'],
        'success_threshold': 'Frequency increase of 30%+'
    },
    'At-Risk High Value': {
        'primary_metric': 'retention_rate',
        'secondary_metrics': ['orders_per_month', 'avg_order_value'],
        'success_threshold': 'Retention increase of 40%+'
    },
    'Churned High Value': {
        'primary_metric': 'reactivation_rate',
        'secondary_metrics': ['orders_after_reactivation', 'avg_order_value'],
        'success_threshold': 'Reactivation of 25%+ of segment'
    },
    'New Explorers': {
        'primary_metric': 'category_diversity',
        'secondary_metrics': ['retention_rate', 'orders_per_month'],
        'success_threshold': 'Average of 3+ categories tried'
    },
    'VIP Customers': {
        'primary_metric': 'share_of_wallet',
        'secondary_metrics': ['avg_order_value', 'orders_per_month'],
        'success_threshold': 'AOV increase of 10%+'
    },
    'Core Customers': {
        'primary_metric': 'customer_value_growth',
        'secondary_metrics': ['orders_per_month', 'avg_order_value'],
        'success_threshold': 'Value growth of 20%+'
    },
    'Dormant Low Value': {
        'primary_metric': 'reactivation_rate',
        'secondary_metrics': ['orders_after_reactivation'],
        'success_threshold': 'Reactivation of 15%+ of segment'
    }
}
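
As a rough sketch of how a threshold such as "AOV increase of 15%+" could be checked after the test, the example below computes the relative lift of the test group over the control group. The result values are invented, and reading the threshold as a relative lift against control is my assumption.

```python
import pandas as pd

# Toy post-test results, invented for illustration
toy_results = pd.DataFrame({
    "test_group": ["test", "test", "control", "control"],
    "avg_order_value": [60.0, 60.0, 50.0, 50.0],
})

# Mean of the primary metric per group
group_means = toy_results.groupby("test_group")["avg_order_value"].mean()

# Relative lift of test over control, e.g. 0.2 means +20%
relative_lift = group_means["test"] / group_means["control"] - 1

# Compare against the segment's threshold (15% for 'Frequent Small Baskets')
meets_threshold = relative_lift >= 0.15
print(round(relative_lift, 3), meets_threshold)
```

A lift above the threshold is not enough on its own. A significance test on the two groups (for example with `scipy.stats`, imported at the top of this notebook) would still be needed before drawing conclusions.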