---
title: Define Promotion Usage Metrics
---
## 3. Define Promotion Usage Metrics

We measure promotion intensity using two complementary metrics:

### 3.1 Discount Usage
From `transactions`, we calculate the share of purchased line items (transaction rows) that carried **any** discount, meaning at least one of the following holds:
- `retail_disc` > 0 (sale price)
- `coupon_disc` > 0 (manufacturer coupon)
- `coupon_match_disc` > 0 (store matches competitor coupon)
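
To make the "any discount" rule concrete, here is a minimal sketch on made-up data. The `toy_tx` frame and its values are hypothetical, and the sketch assumes discounts are stored as positive amounts; if a data source records them as negative values, the `> 0` test would need to change.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the list above.
toy_tx = pd.DataFrame({
    "household_id":      [1,    1,   2,   2,   2,   2],
    "retail_disc":       [0.50, 0.0, 0.0, 0.0, 0.0, 0.0],
    "coupon_disc":       [0.25, 0.0, 1.0, 0.0, 0.0, 0.0],
    "coupon_match_disc": [0.0,  0.0, 0.0, 0.0, 0.0, 0.0],
})

disc_cols = ["retail_disc", "coupon_disc", "coupon_match_disc"]

# A row is flagged once if at least one discount column is positive,
# even when several discounts apply to the same row.
toy_tx["has_discount"] = toy_tx[disc_cols].gt(0).any(axis=1)

# The mean of a boolean flag is the share of discounted rows per household.
toy_share = toy_tx.groupby("household_id")["has_discount"].mean()
print(toy_share)
```

The first row carries both a sale price and a coupon but is counted only once, which is the behavior we want for a share-of-items metric.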

### 3.2 Coupon Redemptions
From `coupon_redemptions`, we count how many coupons each household redeemed
and normalize by purchase volume to get "coupons per 100 items purchased".
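
The normalization can be sketched on toy data as follows. The frames `toy_items` and `toy_redemptions` and all their values are hypothetical; households with no redemptions are assumed to count as zero coupons.

```python
import pandas as pd

# Hypothetical per-household item counts and one row per redeemed coupon.
toy_items = pd.DataFrame({"household_id": [1, 2, 3], "total_items": [200, 50, 80]})
toy_redemptions = pd.DataFrame({"household_id": [1, 1, 1, 2]})

# Count redemptions per household.
toy_coupons = (
    toy_redemptions
    .groupby("household_id")
    .size()
    .rename("num_coupons_redeemed")
    .reset_index()
)

# Left join keeps households that never redeemed a coupon; treat them as 0.
toy_rates = toy_items.merge(toy_coupons, on="household_id", how="left")
toy_rates["num_coupons_redeemed"] = toy_rates["num_coupons_redeemed"].fillna(0)

# Coupons per 100 items purchased.
toy_rates["coupons_per_100_items"] = (
    toy_rates["num_coupons_redeemed"] / toy_rates["total_items"] * 100
)
print(toy_rates)
```

Normalizing this way lets us compare a household that buys 50 items with one that buys 200 on the same scale, instead of rewarding sheer purchase volume.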


In [None]:

discount_cols = ['retail_disc', 'coupon_disc', 'coupon_match_disc']

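# Keep only the discount columns that actually exist, and report any that are missing
missing_cols = [c for c in discount_cols if c not in transactions.columns]
if missing_cols:
    print(f"⚠️ Missing discount columns (skipped): {missing_cols}")
    discount_cols = [c for c in discount_cols if c in transactions.columns]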


if discount_cols:
    transactions['has_discount'] = (transactions[discount_cols] > 0).any(axis=1)
else:
    print("⚠️ No discount columns found - assuming no discounts")
    transactions['has_discount'] = False

total_transactions = len(transactions)
discounted_transactions = transactions['has_discount'].sum()
discount_rate = discounted_transactions / total_transactions

summary_df = pd.DataFrame({
    'Metric': ['Total Transactions', 'Transactions with Discount', 'Overall Discount Rate'],
    'Value': [f'{total_transactions:,}', f'{discounted_transactions:,}', f'{discount_rate:.1%}']
})
display(summary_df.set_index('Metric'))

print("\nSample transactions with discounts:")
# Use discount_cols (not a hard-coded list) so the preview works when some columns are missing
display(transactions.loc[
    transactions['has_discount'],
    ['household_id', 'sales_value'] + discount_cols
].head(10))



In [None]:

discount_usage = (
    transactions
    .groupby('household_id')
    .agg(
        total_items=('has_discount', 'size'),
        discounted_items=('has_discount', 'sum'),
        total_sales=('sales_value', 'sum')
    )
    .reset_index()
)

discount_usage['discount_share'] = (
    discount_usage['discounted_items'] / discount_usage['total_items']
)

print("\nDiscount share distribution:")
print(discount_usage['discount_share'].describe())

print("\nTop 10 households by discount usage:")
display(discount_usage.nlargest(10, 'discount_share')[
    ['household_id', 'total_items', 'discounted_items', 'discount_share']
])



In [None]:

coupon_usage = (
    coupon_redemptions
    .groupby('household_id')
    .size()
    .rename('num_coupons_redeemed')
    .reset_index()
)

coupon_summary = pd.DataFrame({
    'Metric': ['Households with Coupons', 'Total Coupons Redeemed', 'Average per Household'],
    'Value': [f'{len(coupon_usage):,}', f'{coupon_usage["num_coupons_redeemed"].sum():,}', f'{coupon_usage["num_coupons_redeemed"].mean():.1f}']
})
display(coupon_summary.set_index('Metric'))

print("\nTop 10 coupon users:")
display(coupon_usage.nlargest(10, 'num_coupons_redeemed'))



## 4. Define Spending Behavior Metrics

To assess whether promotion-heavy households are high-value or low-value, we calculate:

- **Total annual spending** (`total_sales`)
- **Number of shopping trips** (`num_trips` = distinct baskets)
- **Average basket value** (`total_sales / num_trips`)
- **Average price per unit** (`total_sales / total_quantity`), only if quantity data is available

These metrics help us distinguish between:
- High-value customers (large baskets, expensive items)
- Low-value customers (small baskets, cheap items)
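
As a quick illustration of how these metrics are derived, here is a sketch on toy data. The `toy_baskets` frame, its values, and the `quantity` column name are assumptions made for the example only.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions: household 1 makes two trips, household 2 makes two.
toy_baskets = pd.DataFrame({
    "household_id": [1, 1, 1, 2, 2],
    "basket_id":    [10, 10, 11, 20, 21],
    "sales_value":  [4.0, 6.0, 10.0, 3.0, 0.0],
    "quantity":     [2, 1, 2, 3, 0],
})

toy_spend = (
    toy_baskets
    .groupby("household_id")
    .agg(
        total_sales=("sales_value", "sum"),
        num_trips=("basket_id", "nunique"),   # distinct baskets = trips
        total_quantity=("quantity", "sum"),
    )
    .reset_index()
)

toy_spend["avg_basket_value"] = toy_spend["total_sales"] / toy_spend["num_trips"]

# Replace zero quantities with NaN so the ratio is NaN rather than infinite.
toy_spend["avg_price_per_unit"] = (
    toy_spend["total_sales"] / toy_spend["total_quantity"].replace(0, np.nan)
)
print(toy_spend)
```

In this toy case both households make two trips, but household 1 spends more per trip and per unit, which is the kind of contrast the high-value versus low-value split relies on.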


In [None]:

qty_col = None
for candidate in ['quantity', 'QUANTITY', 'purchase_quantity']:
    if candidate in transactions.columns:
        qty_col = candidate
        break

agg_dict = {
    'sales_value': 'sum',
    'basket_id': 'nunique',  # distinct baskets = shopping trips
}

if qty_col:
    agg_dict[qty_col] = 'sum'
else:
    print("⚠️ No quantity column found - cannot calculate avg price per unit")

household_spend = (
    transactions
    .groupby('household_id')
    .agg(agg_dict)
    .rename(columns={
        'sales_value': 'total_sales',
        'basket_id': 'num_trips',
    })
    .reset_index()
)

household_spend['avg_basket_value'] = (
    household_spend['total_sales'] / household_spend['num_trips']
)

if qty_col:
    household_spend.rename(columns={qty_col: 'total_quantity'}, inplace=True)
    household_spend['avg_price_per_unit'] = (
        household_spend['total_sales'] / household_spend['total_quantity'].replace(0, np.nan)
    )
    
    
    zero_qty_count = (household_spend['total_quantity'] == 0).sum()
    if zero_qty_count > 0:
        print(f"  ⚠️ Warning: {zero_qty_count} households have zero total quantity")


print("\nSpending behavior summary:")
print(household_spend[['total_sales', 'num_trips', 'avg_basket_value']].describe())

if 'avg_price_per_unit' in household_spend.columns:
    print("\nAverage price per unit:")
    print(household_spend['avg_price_per_unit'].describe())



In [None]:
print("\nTop 10 households by total spending:")
display(household_spend.nlargest(10, 'total_sales')[
    ['household_id', 'total_sales', 'num_trips', 'avg_basket_value']
])



## 5. Combine Data: Promotions + Spending + Demographics

Now we create a master household-level dataset that includes:
- Promotion usage (discount share + coupon redemptions)
- Spending behavior (total sales, basket value, item prices)
- Demographics (age, income, kids, household size)

This unified dataset enables us to answer our key questions about smart shoppers.
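
The combination step can be sketched on toy tables as a chain of left joins on `household_id`. Everything here is hypothetical (table names, values, and the income labels), and the `validate` and `indicator` arguments are an optional safeguard we are suggesting, not something the analysis depends on.

```python
import pandas as pd

# Hypothetical household-level tables, one row per household each.
toy_spend_tbl = pd.DataFrame({"household_id": [1, 2, 3], "total_sales": [120.0, 45.0, 80.0]})
toy_coupon_tbl = pd.DataFrame({"household_id": [1], "num_coupons_redeemed": [4]})
toy_demo_tbl = pd.DataFrame({"household_id": [1, 3], "income": ["50-74K", "35-49K"]})

toy_master = (
    toy_spend_tbl
    # validate raises if either side has duplicate household_id values,
    # which would silently multiply rows in the master table.
    .merge(toy_coupon_tbl, on="household_id", how="left", validate="one_to_one")
    # indicator adds a column recording whether each household found a match.
    .merge(toy_demo_tbl, on="household_id", how="left",
           validate="one_to_one", indicator="demo_match")
)

# Households without redemptions redeemed zero coupons.
toy_master["num_coupons_redeemed"] = toy_master["num_coupons_redeemed"].fillna(0).astype(int)
print(toy_master)
```

Using left joins from the spending table keeps every household that shopped, while the indicator column makes it easy to see how many of them lack demographic data.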


In [None]:

households = (
    household_spend
    .merge(discount_usage[['household_id', 'discount_share', 'discounted_items', 'total_items']], 
           on='household_id', how='left', suffixes=('', '_disc'))
    .merge(coupon_usage, on='household_id', how='left')
    .merge(demographics, on='household_id', how='left')
)

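# No redemptions means zero coupons; cast back to int after the left join introduced NaN
households['num_coupons_redeemed'] = households['num_coupons_redeemed'].fillna(0).astype(int)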
households['discount_share'] = households['discount_share'].fillna(0)


print("\nMerge quality check:")
household_data_df = pd.DataFrame({
    'Data Type': [
        'Households with Spending Data',
        'Households with Demographic Data',
        'Households with Discount Data',
        'Households with Coupon Redemptions'
    ],
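    'Count': [
        f"{household_spend['household_id'].nunique():,}",
        f"{households['age'].notna().sum():,}",
        # discount_share was filled with 0 above, so notna() would count every row;
        # check membership in discount_usage instead
        f"{households['household_id'].isin(discount_usage['household_id']).sum():,}",
        f"{(households['num_coupons_redeemed'] > 0).sum():,}"
    ]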
})
display(household_data_df.set_index('Data Type'))



In [None]:
print("\nSample of unified household data:")
display(households[[
    'household_id', 'total_sales', 'avg_basket_value', 'discount_share', 
    'num_coupons_redeemed', 'income', 'kids_count', 'household_comp'
]].head(15))



In [None]:

households['coupons_per_100_items'] = (
    households['num_coupons_redeemed'] / households['total_items'] * 100
)

print("\nDistribution:")
print(households['coupons_per_100_items'].describe())

