---
title: Limitations & Next Steps
---

# 12. Limitations & Next Steps

### Data Limitations

- This is **simulated data for educational purposes** - not real customer data
- Analysis covers only one year - seasonal patterns may not be fully captured
- We don't have product category detail, so we can't distinguish premium from budget items
- Demographic information is incomplete (some households have null values)

### Methodological Limitations

- We used simple quartile-based segmentation - more sophisticated clustering could reveal additional segments
- We didn't account for customer lifetime value or churn risk
- Correlation doesn't prove causation - we observe patterns but can't definitively say promotions _cause_ higher spending
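
The quartile rule mentioned in the first bullet can be sketched as follows. This is a minimal illustration on synthetic data, assuming a top-quartile cutoff on `total_sales` and `discount_share`; the thresholds actually used in the analysis may differ, and `demo` is a hypothetical stand-in for the `households` table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the household table; all values are illustrative only.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "total_sales": rng.gamma(shape=2.0, scale=900.0, size=1000),
    "discount_share": rng.beta(a=5.0, b=4.0, size=1000),
})

# Assumed rule: households above the 75th percentile get the "high"/"heavy" label.
spend_cut = demo["total_sales"].quantile(0.75)
promo_cut = demo["discount_share"].quantile(0.75)

demo["spend_segment"] = np.where(
    demo["total_sales"] > spend_cut, "high_spend", "low_spend"
)
demo["promo_segment"] = np.where(
    demo["discount_share"] > promo_cut, "promo_heavy", "promo_light"
)

# Combine the two labels into a single 2x2 shopper type.
demo["shopper_type"] = demo["promo_segment"] + " & " + demo["spend_segment"]

# Share of households in each of the four combined segments.
segment_shares = demo["shopper_type"].value_counts(normalize=True)
print(segment_shares.round(3))
```

A fixed cutoff like this is easy to explain, but it forces a hard boundary at one percentile on each metric. That is the limitation a clustering approach would try to relax.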

### Future Enhancements

**If this were a real project, we would:**

1. **Add category-level analysis**

   - Are smart shoppers buying premium products on sale or budget items?
   - Which product categories drive the spending difference?

2. **Incorporate time-series analysis**

   - Do smart shoppers shop more frequently during promotional periods?
   - Are there seasonal patterns in promotion sensitivity?

3. **Build predictive models**

   - Propensity modeling: Who is likely to become a smart shopper?
   - Customer lifetime value models: What's the long-term value of each segment?

4. **Test with A/B experiments**

   - Randomly assign households to different promotional intensities
   - Measure incremental impact of promotions on revenue

5. **Expand demographic analysis**
   - Bring in additional data (location, shopping channel, payment method)
   - Build more sophisticated customer personas
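
Item 4 above (randomized promotional intensity) could be analyzed along these lines. This is a minimal sketch on simulated revenue, assuming two equally sized arms and a permutation test on the difference in mean revenue. All names and numbers (`household_ids`, `control`, `treatment`, the gamma parameters) are hypothetical and are not results from this analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical household IDs, randomly split into two equal arms.
household_ids = np.arange(1000)
shuffled_ids = rng.permutation(household_ids)
treatment_ids, control_ids = shuffled_ids[:500], shuffled_ids[500:]

# Simulated revenue per household (made-up distributions, not real results).
control = rng.gamma(shape=2.0, scale=50.0, size=control_ids.size)
treatment = rng.gamma(shape=2.0, scale=55.0, size=treatment_ids.size)

# Incremental impact: difference in mean revenue between the two arms.
observed = treatment.mean() - control.mean()

# Permutation test: reshuffle the pooled revenue to see how large a
# difference arises by chance when arm labels carry no information.
pooled = np.concatenate([treatment, control])
n_treat = treatment.size
n_perm = 2000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    perm_diffs[i] = perm[:n_treat].mean() - perm[n_treat:].mean()

# Two-sided p-value: share of shuffles at least as extreme as the observed lift.
p_value = float((np.abs(perm_diffs) >= abs(observed)).mean())
print(f"Observed lift: {observed:.2f}  p-value: {p_value:.3f}")
```

Because households are assigned at random, a difference between arms can be attributed to the promotion itself, which the observational comparison in this analysis cannot support.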


# 13. Conclusion

This analysis set out to answer a critical business question: **Are promotion-heavy customers 
high-value "smart shoppers" or low-margin "cherry-pickers"?**

Through rigorous analysis of 1.47M transactions across 2,469 households, we found that:

[COMPLETE AFTER RUNNING ANALYSIS - summarize your main findings in 2-3 sentences]

For Regork, this means [strategic implication]. Rather than viewing promotions as a necessary 
evil that erodes margins, the data suggests [your conclusion based on findings].

The recommended path forward is to [action], with a focus on [specific segments], while 
[what to avoid or deprioritize].

By implementing these recommendations, Regork can [expected business outcome].


```python
# Final summary statistics.
# Assumes `households` and `transactions` were built in earlier cells.
import pandas as pd
from IPython.display import display

print("\n" + "=" * 80)
print("FINAL SUMMARY STATISTICS")

# --- Overall population ---
first_date = transactions['transaction_timestamp'].min().date()
last_date = transactions['transaction_timestamp'].max().date()
pop_data = {
    'Metric': [
        'Total Households Analyzed',
        'Total Transactions',
        'Date Range'
    ],
    'Value': [
        f'{len(households):,}',
        f'{len(transactions):,}',
        f'{first_date} to {last_date}'
    ]
}
print("\n📊 Overall Population:")
display(pd.DataFrame(pop_data).set_index('Metric'))

# --- Promotion usage ---
promo_heavy_hh = households[households['promo_segment'] == 'promo_heavy']
promo_light_hh = households[households['promo_segment'] == 'promo_light']
promo_heavy_count = len(promo_heavy_hh)
promo_heavy_pct = promo_heavy_count / len(households) * 100
promo_data = {
    'Metric': [
        'Promo-Heavy Households',
        'Avg Discount Rate (Promo-Heavy)',
        'Avg Discount Rate (Promo-Light)'
    ],
    'Value': [
        f'{promo_heavy_count:,} ({promo_heavy_pct:.1f}%)',
        f"{promo_heavy_hh['discount_share'].mean():.1%}",
        f"{promo_light_hh['discount_share'].mean():.1%}"
    ]
}
print("\n📊 Promotion Usage:")
display(pd.DataFrame(promo_data).set_index('Metric'))

# --- Spending levels ---
high_spend_hh = households[households['spend_segment'] == 'high_spend']
low_spend_hh = households[households['spend_segment'] == 'low_spend']
high_spend_count = len(high_spend_hh)
high_spend_pct = high_spend_count / len(households) * 100
spend_data = {
    'Metric': [
        'High-Spend Households',
        'Avg Annual Spending (High)',
        'Avg Annual Spending (Low)'
    ],
    'Value': [
        f'{high_spend_count:,} ({high_spend_pct:.1f}%)',
        f"${high_spend_hh['total_sales'].mean():.2f}",
        f"${low_spend_hh['total_sales'].mean():.2f}"
    ]
}
print("\n📊 Spending Levels:")
display(pd.DataFrame(spend_data).set_index('Metric'))

# --- Smart shoppers (promo-heavy AND high-spend) ---
smart_hh = households[households['shopper_type'] == 'promo_heavy & high_spend']
smart_count = len(smart_hh)
smart_pct = smart_count / len(households) * 100

# Skip the table when the segment is empty to avoid NaN averages.
if smart_count > 0:
    smart_data = {
        'Metric': [
            'Smart Shopper Households',
            'Avg Basket Value',
            'Avg Annual Spending',
            'Avg Discount Usage'
        ],
        'Value': [
            f'{smart_count:,} ({smart_pct:.1f}%)',
            f"${smart_hh['avg_basket_value'].mean():.2f}",
            f"${smart_hh['total_sales'].mean():.2f}",
            f"{smart_hh['discount_share'].mean():.1%}"
        ]
    }
    print("\n📊 Smart Shoppers:")
    display(pd.DataFrame(smart_data).set_index('Metric'))

# Closing marker, matching the last line of the recorded output.
print("\nANALYSIS COMPLETE")
```

FINAL SUMMARY STATISTICS

📊 Overall Population:

| Metric | Value |
| --- | --- |
| Total Households Analyzed | 2,469 |
| Total Transactions | 1,469,307 |
| Date Range | 2017-01-01 to 2018-01-01 |



📊 Promotion Usage:

| Metric | Value |
| --- | --- |
| Promo-Heavy Households | 767 (31.1%) |
| Avg Discount Rate (Promo-Heavy) | 60.9% |
| Avg Discount Rate (Promo-Light) | 45.1% |



📊 Spending Levels:

| Metric | Value |
| --- | --- |
| High-Spend Households | 618 (25.0%) |
| Avg Annual Spending (High) | $4609.36 |
| Avg Annual Spending (Low) | $944.06 |



📊 Smart Shoppers:

| Metric | Value |
| --- | --- |
| Smart Shopper Households | 238 (9.6%) |
| Avg Basket Value | $42.26 |
| Avg Annual Spending | $4574.30 |
| Avg Discount Usage | 57.7% |



ANALYSIS COMPLETE
