---
title: '"Promotion-Heavy" vs "Promotion-Light"'
---
# 6. Define "Promotion-Heavy" vs "Promotion-Light" Households

We need a transparent, business-friendly way to classify households by promotion intensity.

### Our Approach: Independent Thresholds

Instead of creating an arbitrary weighted score, we take thresholds from the observed distribution of each metric:

- **High discount**: `discount_share` (share of items bought on sale) at or above its 75th percentile
- **High coupon**: `coupons_per_100_items` (redemption rate) at or above its median
- **Extreme discount** or **extreme coupon**: the same metrics at or above their 90th percentile

A household is classified as **"promo_heavy"** if it meets at least one of these conditions:
- High discount and high coupon together
- Extreme discount
- Extreme coupon

Everyone else is **"promo_light"**.

This mirrors how retailers think about promotion engagement and is easy to explain to executives.


In [14]:

print("DEFINING PROMOTION SEGMENTS")


DEFINING PROMOTION SEGMENTS


In [15]:
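# "High" cutoffs come from the observed distribution of each metric
disc_threshold = households["discount_share"].quantile(0.75)
# Coupon usage is cut at the median, not at the 75th percentile
coupon_threshold = households["coupons_per_100_items"].quantile(0.50)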

print("\nThreshold values:")
threshold_df = pd.DataFrame({
    'Metric': ['Discount Share (75th percentile)', 'Coupon Usage (50th percentile)'],
    'Threshold': [f'{disc_threshold:.1%}', f'{coupon_threshold:.2f}']
})
display(threshold_df.set_index('Metric'))



Threshold values:


| Metric | Threshold |
|---|---|
| Discount Share (75th percentile) | 56.9% |
| Coupon Usage (50th percentile) | 0.00 |
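
A coupon threshold of 0.00 deserves a second look. When at least half of the households redeem no coupons, the median is zero and a `>=` comparison no longer separates coupon users from non-users. The sketch below illustrates this on made-up data (the `toy` frame and its values are hypothetical, not taken from the dataset) and shows one possible alternative, a strict `>` comparison.

```python
import pandas as pd

# Hypothetical toy data: most households redeem no coupons
toy = pd.DataFrame({"coupons_per_100_items": [0.0, 0.0, 0.0, 0.0, 1.5, 4.0]})

median = toy["coupons_per_100_items"].quantile(0.50)

# `>=` against a zero median flags every household
flag_ge = toy["coupons_per_100_items"] >= median
# A strict `>` keeps only households with non-zero coupon usage
flag_gt = toy["coupons_per_100_items"] > median

print(f"median = {median:.2f}")
print(f">= median: {flag_ge.sum()} of {len(toy)} flagged")
print(f">  median: {flag_gt.sum()} of {len(toy)} flagged")
```

Whether the strict comparison is the right choice depends on what "some coupons" is meant to capture; it is shown here only to make the difference visible.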


In [16]:
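# Top-decile cutoffs flag households with extreme promotion usage
extreme_disc_threshold = households["discount_share"].quantile(0.90)
extreme_coupon_threshold = households["coupons_per_100_items"].quantile(0.90)

# Boolean flags; `>=` includes households sitting exactly on a cutoff.
# Note: if coupon_threshold is exactly 0, every household with
# non-negative coupon usage satisfies `is_high_coupon`.
households["is_high_discount"] = households["discount_share"] >= disc_threshold
households["is_high_coupon"] = households["coupons_per_100_items"] >= coupon_threshold
households["is_extreme_disc"] = households["discount_share"] >= extreme_disc_threshold
households["is_extreme_coupon"] = (
    households["coupons_per_100_items"] >= extreme_coupon_threshold
)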


In [17]:
# promo_heavy: high on both metrics, or extreme on either one
households["promo_heavy_flag"] = (
    (households["is_high_discount"] & households["is_high_coupon"])
    | households["is_extreme_disc"]
    | households["is_extreme_coupon"]
)
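
To make the rule concrete, here is a minimal sketch on four made-up households. The frame `toy` and the cutoff values are hypothetical and chosen only for illustration; the real cutoffs come from the quantiles computed in the cells above.

```python
import pandas as pd

# Hypothetical households and cutoffs, for illustration only
toy = pd.DataFrame(
    {
        "discount_share": [0.60, 0.60, 0.80, 0.10],
        "coupons_per_100_items": [2.0, 0.0, 0.0, 9.0],
    },
    index=["A", "B", "C", "D"],
)
disc_cut, coupon_cut = 0.55, 1.0          # "high" cutoffs
ext_disc_cut, ext_coupon_cut = 0.75, 8.0  # "extreme" cutoffs

high_disc = toy["discount_share"] >= disc_cut
high_coupon = toy["coupons_per_100_items"] >= coupon_cut

# Heavy = high on both metrics, or extreme on either one
toy["promo_heavy_flag"] = (
    (high_disc & high_coupon)
    | (toy["discount_share"] >= ext_disc_cut)
    | (toy["coupons_per_100_items"] >= ext_coupon_cut)
)
print(toy)
```

Household A is high on both metrics, B is high on discounts alone and stays light, and C and D each qualify through a single extreme metric.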


In [18]:
households["promo_segment"] = np.where(
    households["promo_heavy_flag"], "promo_heavy", "promo_light"
)


In [19]:
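segment_counts = households["promo_segment"].value_counts()
for segment, count in segment_counts.items():
    pct = count / len(households) * 100
    # Report each segment's size and share of all households
    print(f"{segment}: {count:,} households ({pct:.1f}%)")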
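
As an alternative to computing percentages in a loop, `value_counts(normalize=True)` returns proportions directly. A minimal sketch on a hypothetical `toy` series of segment labels (not the real data):

```python
import pandas as pd

# Hypothetical segment labels, for illustration only
toy = pd.Series(["promo_heavy", "promo_light", "promo_light", "promo_light"])

# Counts and shares side by side; both Series align on the segment label
summary = pd.DataFrame({
    "count": toy.value_counts(),
    "share_pct": toy.value_counts(normalize=True) * 100,
})
print(summary)
```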


In [20]:
if "promo_heavy" in segment_counts.index:
    promo_heavy_hh = households[households["promo_segment"] == "promo_heavy"]

    both_high = (
        promo_heavy_hh["is_high_discount"] & promo_heavy_hh["is_high_coupon"]
    ).sum()
    extreme_disc = promo_heavy_hh["is_extreme_disc"].sum()
    extreme_coup = promo_heavy_hh["is_extreme_coupon"].sum()

    promo_qual_df = pd.DataFrame({
        'Qualification Type': [
            'High Discount + Some Coupons',
            'Extreme Discount Users (Top 10%)',
            'Extreme Coupon Users (Top 10%)'
        ],
        'Count': [f'{both_high:,}', f'{extreme_disc:,}', f'{extreme_coup:,}']
    })
    print("\nPromo-heavy qualification:")
    display(promo_qual_df.set_index('Qualification Type'))



Promo-heavy qualification:


| Qualification Type | Count |
|---|---|
| High Discount + Some Coupons | 618 |
| Extreme Discount Users (Top 10%) | 251 |
| Extreme Coupon Users (Top 10%) | 247 |
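
The three qualification routes are not mutually exclusive: a household can qualify through more than one, so the counts need not sum to the number of promo-heavy households. One possible way to see the overlap is to count each combination of flags. The sketch below uses a hypothetical `toy` frame that borrows the flag column names from the cells above; its values are made up.

```python
import pandas as pd

# Hypothetical promo-heavy households, for illustration only
toy = pd.DataFrame({
    "is_high_discount":  [True,  True,  False, True],
    "is_high_coupon":    [True,  True,  True,  False],
    "is_extreme_disc":   [False, True,  False, True],
    "is_extreme_coupon": [False, False, True,  False],
})

# One boolean column per qualification route
routes = pd.DataFrame({
    "both_high": toy["is_high_discount"] & toy["is_high_coupon"],
    "extreme_disc": toy["is_extreme_disc"],
    "extreme_coupon": toy["is_extreme_coupon"],
})

# Count households per combination of qualification routes
overlap = routes.value_counts().rename("households").reset_index()
print(overlap)

# Households that qualify through two or more routes
multi_route = int((routes.sum(axis=1) > 1).sum())
print("Households qualifying via 2+ routes:", multi_route)
```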
