# Required Assignment 5.1: Will the Customer Accept the Coupon?

**UC Berkeley — Professional Certificate in Machine Learning & Artificial Intelligence**

**Author:** Saroj Nayak  
**Date:** February 2026

---

## Context

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house?

This project uses the **UCI In-Vehicle Coupon Recommendation** dataset to explore the factors that determine whether a driver accepts or rejects a coupon. Through visualizations, statistical summaries, and probability distributions, we aim to distinguish customers who accepted a driving coupon from those who did not.

## Data Source

The data comes from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios (destination, time, weather, passenger, etc.) and asks whether the person would accept the coupon.

- **Y = 1**: Accepted the coupon ("right away" or "later before expiration")
- **Y = 0**: Rejected the coupon ("no, I do not want the coupon")
- **Five coupon types**: Restaurants under $20, Coffee Houses, Carry out & Take away, Bars, and Restaurants $20–$50


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

sns.set_theme(style="whitegrid", palette="muted")
plt.rcParams.update({'figure.dpi': 120, 'font.size': 10, 'figure.facecolor': 'white'})

ACCEPT_COLOR = '#2ecc71'
REJECT_COLOR = '#e74c3c'
PALETTE = {1: ACCEPT_COLOR, 0: REJECT_COLOR}

## 1. Data Loading & Exploration

In [None]:
# Load dataset
df = pd.read_csv('coupons.csv')
print(f"Dataset shape: {df.shape}")
df.head()

In [None]:
df.info()

In [None]:
df.describe()

## 2. Data Cleaning

### Missing Values

In [None]:
# Check missing values
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(2)
missing_df = pd.DataFrame({'Count': missing, 'Percent': missing_pct})
print(missing_df[missing_df['Count'] > 0])

In [None]:
# Drop 'car' column (99% missing - not useful)
df.drop(columns=['car'], inplace=True)

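# Fill remaining missing values with each column's mode
# (assign back rather than calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to df)
for col in ['Bar', 'CoffeeHouse', 'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50']:
    df[col] = df[col].fillna(df[col].mode()[0])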

# Remove duplicates
dupes = df.duplicated().sum()
print(f"Duplicate rows found: {dupes}")
df.drop_duplicates(inplace=True)
print(f"Dataset after cleaning: {df.shape}")

# Create numeric age column for analysis
age_map = {'below21': 20, '21': 21, '26': 26, '31': 31, '36': 36, '41': 41, '46': 46, '50plus': 55}
df['age_numeric'] = df['age'].map(age_map)

## 3. Overall Coupon Acceptance Analysis

Let's first understand the overall proportion of coupon acceptance and how it varies by coupon type.

In [None]:
print(f"Overall acceptance rate: {df['Y'].mean():.1%}")
print("\nAcceptance rate by coupon type:")
print(df.groupby('coupon')['Y'].mean().sort_values(ascending=False).apply(lambda x: f"{x:.1%}"))

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Pie chart
labels_pie = ['Accepted (Y=1)', 'Rejected (Y=0)']
sizes = [df['Y'].sum(), (df['Y'] == 0).sum()]
colors_pie = [ACCEPT_COLOR, REJECT_COLOR]
axes[0].pie(sizes, labels=labels_pie, autopct='%1.1f%%', colors=colors_pie,
            startangle=90, textprops={'fontsize': 11})
axes[0].set_title('Overall Coupon Acceptance Rate', fontweight='bold', fontsize=13)

# Acceptance by coupon type
coupon_accept = df.groupby('coupon')['Y'].mean().sort_values(ascending=True)
bars = axes[1].barh(coupon_accept.index, coupon_accept.values, 
                     color=sns.color_palette("viridis", len(coupon_accept)))
axes[1].set_xlabel('Acceptance Rate')
axes[1].set_title('Acceptance Rate by Coupon Type', fontweight='bold', fontsize=13)
for bar, val in zip(bars, coupon_accept.values):
    axes[1].text(val + 0.005, bar.get_y() + bar.get_height()/2, f'{val:.1%}', va='center')
axes[1].set_xlim(0, 0.85)

# Volume by coupon type
coupon_counts = df.groupby(['coupon', 'Y']).size().unstack(fill_value=0)
coupon_counts.plot(kind='bar', stacked=True, ax=axes[2], color=[REJECT_COLOR, ACCEPT_COLOR])
axes[2].set_title('Coupon Volume: Accepted vs Rejected', fontweight='bold', fontsize=13)
axes[2].set_xlabel('Coupon Type')
axes[2].set_ylabel('Count')
axes[2].legend(['Rejected', 'Accepted'], loc='upper right')
axes[2].tick_params(axis='x', rotation=30)

plt.tight_layout()
plt.show()

### Key Findings — Overall Acceptance:
- **56.8%** of all coupons were accepted overall.
- **Carry out & Take away** coupons have the highest acceptance rate (~73.5%), followed by cheap restaurants (<$20) at ~70.7%.
- **Bar coupons** have the lowest acceptance rate (~41.0%), and **Coffee House** coupons are nearly a coin flip (~49.9%).
- **Expensive restaurant** coupons ($20–$50) are also relatively low at ~44.1%.

This suggests that **lower-cost, everyday dining coupons** are far more likely to be accepted than discretionary ones like bars or fine dining.
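
The rates above are point estimates from subsets of different sizes, so it can help to attach an interval to each one. The following is a minimal sketch of a Wilson score interval for a proportion, using only the standard library. The function name `wilson_interval` and the counts in the example are hypothetical and are not taken from `coupons.csv`.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, for illustration only (not from the dataset)
low, high = wilson_interval(successes=50, n=100)
print(f"Acceptance rate 50.0%, interval {low:.3f} to {high:.3f}")
```

In the notebook, the inputs could come from something like `df.groupby('coupon')['Y'].agg(['sum', 'count'])`, with one interval computed per coupon type.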


## 4. Bar Coupon Analysis (Deep Dive)

Let's investigate the bar coupon acceptance patterns in detail, as this is the lowest-acceptance coupon type.

In [None]:
bar_df = df[df['coupon'] == 'Bar'].copy()
print(f"Bar coupon records: {len(bar_df)}")
print(f"Bar coupon acceptance rate: {bar_df['Y'].mean():.1%}")

In [None]:
# Compare: went to bar <=3 times vs >3 times per month
bar_df['bar_freq'] = bar_df['Bar'].apply(lambda x: '>3' if x in ['4~8', 'gt8'] else '<=3')
bar_freq_rates = bar_df.groupby('bar_freq')['Y'].mean()
print("Bar visit frequency acceptance rates:")
print(bar_freq_rates.apply(lambda x: f"{x:.1%}"))

In [None]:
# Compare: Bar more than once a month & Age>25 vs all others
# ('1~3' is the lowest category above 'less1', so it is treated as "more than once")
bar_df['bar_gt1'] = bar_df['Bar'].isin(['1~3', '4~8', 'gt8'])
bar_df['age_gt25'] = bar_df['age_numeric'] > 25

g1 = bar_df[(bar_df['bar_gt1']) & (bar_df['age_gt25'])]
g2 = bar_df[~((bar_df['bar_gt1']) & (bar_df['age_gt25']))]
print(f"Bar>1 & Age>25 acceptance rate: {g1['Y'].mean():.1%}")
print(f"All others acceptance rate: {g2['Y'].mean():.1%}")

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Bar frequency
bar_freq_data = bar_df.groupby('bar_freq')['Y'].agg(['mean', 'count']).reset_index()
bars = axes[0, 0].bar(bar_freq_data['bar_freq'], bar_freq_data['mean'],
                       color=[ACCEPT_COLOR, '#27ae60'], edgecolor='white')
axes[0, 0].set_title('Bar Coupon: Visit Frequency', fontweight='bold')
axes[0, 0].set_xlabel('Bar Visits per Month')
axes[0, 0].set_ylabel('Acceptance Rate')
for bar, val in zip(bars, bar_freq_data['mean']):
    axes[0, 0].text(bar.get_x() + bar.get_width()/2, val + 0.01, f'{val:.1%}',
                    ha='center', fontweight='bold')
axes[0, 0].set_ylim(0, 1)

# Age
age_order = ['below21', '21', '26', '31', '36', '41', '46', '50plus']
age_bar_rate = bar_df.groupby('age')['Y'].mean().reindex(age_order)
axes[0, 1].bar(range(len(age_bar_rate)), age_bar_rate.values,
               color=sns.color_palette("RdYlGn", len(age_bar_rate)))
axes[0, 1].set_xticks(range(len(age_bar_rate)))
axes[0, 1].set_xticklabels(age_bar_rate.index, rotation=45)
axes[0, 1].set_title('Bar Coupon Acceptance by Age', fontweight='bold')
axes[0, 1].set_ylabel('Acceptance Rate')
axes[0, 1].set_ylim(0, 1)

# Passenger
pass_bar = bar_df.groupby('passanger')['Y'].mean().sort_values(ascending=True)
axes[1, 0].barh(pass_bar.index, pass_bar.values, color=sns.color_palette("coolwarm", len(pass_bar)))
axes[1, 0].set_title('Bar Coupon by Passenger Type', fontweight='bold')
axes[1, 0].set_xlabel('Acceptance Rate')
for i, (val, name) in enumerate(zip(pass_bar.values, pass_bar.index)):
    axes[1, 0].text(val + 0.01, i, f'{val:.1%}', va='center')

# Hypothesized groups
bar_df['not_widowed'] = bar_df['maritalStatus'] != 'Widowed'
bar_df['no_kid_passenger'] = ~bar_df['passanger'].isin(['Kid(s)'])
bar_df['under30'] = bar_df['age_numeric'] < 30
bar_df['cheap_rest_gt4'] = bar_df['RestaurantLessThan20'].isin(['4~8', 'gt8'])
bar_df['low_income'] = bar_df['income'].isin(
    ['Less than $12500', '$12500 - $24999', '$25000 - $37499', '$37500 - $49999'])

group_labels = ['Bar>1, Not Widowed,\nNo Kids', 'Bar>1, Age<30', 
                'CheapRest>4,\nIncome<$50K', 'Overall Bar Rate']
mask1 = (bar_df['bar_gt1']) & (bar_df['not_widowed']) & (bar_df['no_kid_passenger'])
mask2 = (bar_df['bar_gt1']) & (bar_df['under30'])
mask3 = (bar_df['cheap_rest_gt4']) & (bar_df['low_income'])

group_rates = [bar_df[mask1]['Y'].mean(), bar_df[mask2]['Y'].mean(),
               bar_df[mask3]['Y'].mean(), bar_df['Y'].mean()]

colors_bar = ['#3498db', '#9b59b6', '#e67e22', '#95a5a6']
b = axes[1, 1].bar(group_labels, group_rates, color=colors_bar)
axes[1, 1].set_title('Hypothesized Driver Groups', fontweight='bold')
axes[1, 1].set_ylabel('Acceptance Rate')
axes[1, 1].set_ylim(0, 1)
for bar, val in zip(b, group_rates):
    axes[1, 1].text(bar.get_x() + bar.get_width()/2, val + 0.01, f'{val:.1%}',
                    ha='center', fontweight='bold', fontsize=9)

plt.suptitle('Bar Coupon Deep Dive Analysis', fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

### Key Findings — Bar Coupons:

1. **Frequency matters most**: Drivers who visit bars more than 3 times/month accept at **76.9%** vs. **37.1%** for those who go ≤3 times. This is the largest gap among the bar comparisons examined here.

2. **Age + Frequency**: Drivers who go to bars more than once a month AND are over 25 accept at **69.5%**, more than double the **33.5%** of all others.

3. **Social context**: Drivers with **friends** as passengers have the highest bar coupon acceptance, while those with **kids** have the lowest.

4. **Lifestyle segments**:
   - Young bar-goers (age <30, bar >1/month): ~**72%** acceptance
   - Regular bar-goers who aren't widowed and don't have kid passengers: ~**71%** acceptance
   - Budget diners with lower income: ~**46%** acceptance

**Hypothesis**: Bar coupon acceptance is primarily driven by **existing bar-going behavior** and **social context** rather than purely economic factors.
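
Each comparison above follows the same pattern: build a boolean mask for a segment, then compare its acceptance rate with everyone else's. The sketch below wraps that pattern in a small helper that also reports group sizes. The name `segment_vs_rest` and the toy data are assumptions made for illustration; only the column names `Bar`, `age_numeric`, and `Y` come from the notebook.

```python
import pandas as pd


def segment_vs_rest(frame, mask, target="Y"):
    """Acceptance rate and row count for rows inside `mask` and for all other rows."""
    mask = mask.astype(bool)
    inside = frame.loc[mask, target]
    outside = frame.loc[~mask, target]
    return pd.DataFrame(
        {"rate": [inside.mean(), outside.mean()],
         "n": [len(inside), len(outside)]},
        index=["segment", "all others"],
    )


# Toy rows for illustration only (not from coupons.csv)
toy = pd.DataFrame({
    "Bar": ["never", "1~3", "4~8", "gt8", "less1", "1~3"],
    "age_numeric": [21, 31, 26, 41, 55, 21],
    "Y": [0, 1, 1, 1, 0, 0],
})
mask = toy["Bar"].isin(["1~3", "4~8", "gt8"]) & (toy["age_numeric"] > 25)
result = segment_vs_rest(toy, mask)
print(result)
```

Reporting `n` next to each rate makes it easier to spot segments that look striking but rest on only a few rows.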


## 5. Independent Investigation: Coffee House Coupons

Coffee House coupons are the most frequently offered coupon type yet have a near-50/50 acceptance rate. Let's explore what drives acceptance.

In [None]:
coffee_df = df[df['coupon'] == 'Coffee House'].copy()
print(f"Coffee House coupon records: {len(coffee_df)}")
print(f"Coffee House acceptance rate: {coffee_df['Y'].mean():.1%}")

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Visit frequency
coffee_order = ['never', 'less1', '1~3', '4~8', 'gt8']
coffee_freq = coffee_df.groupby('CoffeeHouse')['Y'].mean().reindex(coffee_order)
bars = axes[0, 0].bar(coffee_freq.index, coffee_freq.values, 
                       color=sns.color_palette("YlOrBr", len(coffee_freq)))
axes[0, 0].set_title('Acceptance by Visit Frequency', fontweight='bold')
axes[0, 0].set_xlabel('Coffee House Visits/Month')
axes[0, 0].set_ylabel('Acceptance Rate')
for bar, val in zip(bars, coffee_freq.values):
    axes[0, 0].text(bar.get_x() + bar.get_width()/2, val + 0.01, f'{val:.1%}', ha='center', fontsize=9)
axes[0, 0].set_ylim(0, 0.85)

# Time of day
time_order = ['7AM', '10AM', '2PM', '6PM', '10PM']
coffee_time = coffee_df.groupby('time')['Y'].mean().reindex(time_order)
axes[0, 1].plot(coffee_time.index, coffee_time.values, 'o-', color='#8B4513', linewidth=2, markersize=8)
axes[0, 1].fill_between(coffee_time.index, coffee_time.values, alpha=0.2, color='#D2691E')
axes[0, 1].set_title('Acceptance by Time of Day', fontweight='bold')
axes[0, 1].set_ylabel('Acceptance Rate')
axes[0, 1].set_ylim(0.3, 0.7)
for t, v in zip(coffee_time.index, coffee_time.values):
    axes[0, 1].annotate(f'{v:.1%}', (t, v), textcoords="offset points", xytext=(0, 10), ha='center')

# Weather
weather_coffee = coffee_df.groupby('weather')['Y'].mean().sort_values(ascending=True)
axes[0, 2].barh(weather_coffee.index, weather_coffee.values, 
                 color=['#3498db', '#f39c12', '#e74c3c'])
axes[0, 2].set_title('Acceptance by Weather', fontweight='bold')
axes[0, 2].set_xlabel('Acceptance Rate')
for bar, val in zip(axes[0, 2].patches, weather_coffee.values):
    axes[0, 2].text(val + 0.005, bar.get_y() + bar.get_height()/2, f'{val:.1%}', va='center')

# Gender
gender_coffee = coffee_df.groupby('gender')['Y'].mean()
axes[1, 0].bar(gender_coffee.index, gender_coffee.values, color=['#e91e63', '#2196f3'], width=0.5)
axes[1, 0].set_title('Acceptance by Gender', fontweight='bold')
axes[1, 0].set_ylabel('Acceptance Rate')
for i, (idx, val) in enumerate(gender_coffee.items()):
    axes[1, 0].text(i, val + 0.01, f'{val:.1%}', ha='center', fontweight='bold')
axes[1, 0].set_ylim(0, 0.7)

# Expiration
exp_coffee = coffee_df.groupby('expiration')['Y'].mean()
axes[1, 1].bar(exp_coffee.index, exp_coffee.values, color=['#e74c3c', '#2ecc71'], width=0.5)
axes[1, 1].set_title('Acceptance by Expiration', fontweight='bold')
axes[1, 1].set_ylabel('Acceptance Rate')
for i, (idx, val) in enumerate(exp_coffee.items()):
    axes[1, 1].text(i, val + 0.01, f'{val:.1%}', ha='center', fontweight='bold')
axes[1, 1].set_ylim(0, 0.75)

# Temperature
temp_coffee = coffee_df.groupby('temperature')['Y'].mean()
axes[1, 2].bar(temp_coffee.index.astype(str), temp_coffee.values,
               color=['#3498db', '#f1c40f', '#e74c3c'], width=0.5)
axes[1, 2].set_title('Acceptance by Temperature', fontweight='bold')
axes[1, 2].set_xlabel('Temperature (°F)')
axes[1, 2].set_ylabel('Acceptance Rate')
for i, (idx, val) in enumerate(temp_coffee.items()):
    axes[1, 2].text(i, val + 0.01, f'{val:.1%}', ha='center', fontweight='bold')
axes[1, 2].set_ylim(0, 0.7)

plt.suptitle('Coffee House Coupon Deep Dive', fontsize=15, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

### Key Findings — Coffee House Coupons:

1. **Visit frequency is again the strongest predictor**: Those who visit coffee houses 4+ times/month accept at ~67%, while those who never visit accept at only ~34%.

2. **Time of day matters**: Morning (10AM) shows the highest acceptance (~56%), aligning with typical coffee consumption patterns. Evening (6PM, 10PM) shows notably lower acceptance.

3. **Weather impact**: Sunny weather drives slightly higher acceptance. Snowy/rainy conditions reduce willingness to detour for coffee.

4. **Expiration**: Coupons with **1-day expiration** are accepted at much higher rates (~58%) than **2-hour coupons** (~43%), which suggests that giving drivers flexibility increases acceptance.

5. **Temperature**: Warmer weather (80°F) shows higher acceptance than cold weather (30°F), possibly because people are more willing to make a stop in pleasant conditions.

6. **Gender**: Females show a slightly higher acceptance rate for coffee house coupons than males.
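
Some of the gaps above are large (expiration) and some are described as slight (gender, weather). One way to check whether a gap exceeds sampling noise is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_ztest` and the counts are hypothetical, chosen to mirror a 58% vs. 43% split rather than taken from the dataset.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; both groups are all 0s or all 1s")
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration: 58 of 100 accepted vs. 43 of 100 accepted
z, p_value = two_proportion_ztest(58, 100, 43, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With the real subset sizes, which are much larger than 100, the same gap would give a far smaller p-value, so the test is most informative for the smaller differences.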


## 6. Demographic Analysis Across All Coupons

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Income × Coupon heatmap
income_order = ['Less than $12500', '$12500 - $24999', '$25000 - $37499',
                '$37500 - $49999', '$50000 - $62499', '$62500 - $74999',
                '$75000 - $87499', '$87500 - $99999', '$100000 or More']
income_coupon = df.groupby(['income', 'coupon'])['Y'].mean().unstack()
income_coupon = income_coupon.reindex(income_order)
sns.heatmap(income_coupon, annot=True, fmt='.0%', cmap='RdYlGn',
            ax=axes[0, 0], linewidths=0.5, vmin=0.3, vmax=0.8)
axes[0, 0].set_title('Acceptance Rate: Income × Coupon Type', fontweight='bold')
axes[0, 0].set_ylabel('Income Level')
axes[0, 0].tick_params(axis='y', rotation=0)
axes[0, 0].tick_params(axis='x', rotation=30)

# Age × Coupon heatmap
age_order = ['below21', '21', '26', '31', '36', '41', '46', '50plus']
age_coupon = df.groupby(['age', 'coupon'])['Y'].mean().unstack()
age_coupon = age_coupon.reindex(age_order)
sns.heatmap(age_coupon, annot=True, fmt='.0%', cmap='RdYlGn',
            ax=axes[0, 1], linewidths=0.5, vmin=0.3, vmax=0.8)
axes[0, 1].set_title('Acceptance Rate: Age × Coupon Type', fontweight='bold')
axes[0, 1].set_ylabel('Age Group')
axes[0, 1].tick_params(axis='x', rotation=30)

# Education
edu_accept = df.groupby('education')['Y'].mean().sort_values(ascending=True)
axes[1, 0].barh(edu_accept.index, edu_accept.values, 
                 color=sns.color_palette("viridis", len(edu_accept)))
axes[1, 0].set_title('Acceptance Rate by Education Level', fontweight='bold')
axes[1, 0].set_xlabel('Acceptance Rate')
for bar, val in zip(axes[1, 0].patches, edu_accept.values):
    axes[1, 0].text(val + 0.003, bar.get_y() + bar.get_height()/2, f'{val:.1%}', va='center', fontsize=9)

# Marital Status
marital_accept = df.groupby('maritalStatus')['Y'].mean().sort_values(ascending=True)
axes[1, 1].barh(marital_accept.index, marital_accept.values,
                 color=sns.color_palette("Set2", len(marital_accept)))
axes[1, 1].set_title('Acceptance Rate by Marital Status', fontweight='bold')
axes[1, 1].set_xlabel('Acceptance Rate')
for bar, val in zip(axes[1, 1].patches, marital_accept.values):
    axes[1, 1].text(val + 0.003, bar.get_y() + bar.get_height()/2, f'{val:.1%}', va='center', fontsize=9)

plt.suptitle('Demographic Factors and Coupon Acceptance', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

## 7. Contextual Factors

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Time of day × coupon type
time_order = ['7AM', '10AM', '2PM', '6PM', '10PM']
for coupon_type in df['coupon'].unique():
    subset = df[df['coupon'] == coupon_type]
    time_rates = subset.groupby('time')['Y'].mean().reindex(time_order)
    axes[0, 0].plot(time_rates.index, time_rates.values, 'o-', label=coupon_type, linewidth=2)
axes[0, 0].set_title('Acceptance Rate by Time of Day', fontweight='bold')
axes[0, 0].set_xlabel('Time of Day')
axes[0, 0].set_ylabel('Acceptance Rate')
axes[0, 0].legend(fontsize=8)
axes[0, 0].set_ylim(0.2, 0.85)

# Destination
dest_coupon = df.groupby(['destination', 'coupon'])['Y'].mean().unstack()
dest_coupon.plot(kind='bar', ax=axes[0, 1], colormap='Set2')
axes[0, 1].set_title('Acceptance by Destination × Coupon', fontweight='bold')
axes[0, 1].set_ylabel('Acceptance Rate')
axes[0, 1].tick_params(axis='x', rotation=20)
axes[0, 1].legend(fontsize=7, loc='upper right')

# Direction
dir_data = df.groupby(['direction_same', 'coupon'])['Y'].mean().unstack()
dir_data.index = ['Opposite Direction', 'Same Direction']
dir_data.plot(kind='bar', ax=axes[1, 0], colormap='Paired')
axes[1, 0].set_title('Acceptance by Direction', fontweight='bold')
axes[1, 0].set_ylabel('Acceptance Rate')
axes[1, 0].tick_params(axis='x', rotation=0)
axes[1, 0].legend(fontsize=7)

# Distance: the toCoupon_GEQ* columns are cumulative flags (drive time >= N minutes),
# not exclusive ranges, so each bar shows the share of coupons at or beyond a threshold
dist_cols = ['toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min']
dist_labels = ['≥5 min', '≥15 min', '≥25 min']
accept_dist = df.loc[df['Y'] == 1, dist_cols].mean().values
reject_dist = df.loc[df['Y'] == 0, dist_cols].mean().values
x = np.arange(len(dist_labels))
w = 0.35
axes[1, 1].bar(x - w/2, accept_dist, w, label='Accepted', color=ACCEPT_COLOR)
axes[1, 1].bar(x + w/2, reject_dist, w, label='Rejected', color=REJECT_COLOR)
axes[1, 1].set_title('Distance to Coupon Location', fontweight='bold')
axes[1, 1].set_ylabel('Share of Coupons')
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(dist_labels)
axes[1, 1].legend()

plt.suptitle('Contextual Factors Affecting Coupon Acceptance', fontsize=15, fontweight='bold', y=1.01)
plt.tight_layout()
plt.show()

## 8. Statistical Analysis: Feature Association Strength

We use **Chi-Square tests of independence** to quantify the association between each feature and coupon acceptance. **Cramér's V** provides an effect size measure (0 = no association, 1 = perfect association).
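
For reference, Cramér's V is computed as sqrt(χ² / (n · (min(rows, cols) − 1))). The sketch below works through that formula by hand with NumPy on two toy contingency tables. The function name `cramers_v` is hypothetical, and the χ² here is computed without a continuity correction, whereas `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, so results for binary features can differ slightly.

```python
import numpy as np


def cramers_v(observed):
    """Cramér's V from a contingency table, using the uncorrected chi-square statistic."""
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = observed.sum(axis=1, keepdims=True) @ observed.sum(axis=0, keepdims=True) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(observed.shape) - 1))))


# Toy tables for illustration only (rows = feature level, columns = Y=0 / Y=1)
perfect = [[10, 0], [0, 10]]        # feature fully determines the outcome
independent = [[10, 10], [20, 20]]  # same acceptance rate in both rows

v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
print(v_perfect, v_independent)
```

The two toy tables mark the ends of the scale: a feature that fully determines acceptance gives V = 1, and one with identical acceptance rates in every row gives V = 0.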

In [None]:
key_features = ['coupon', 'weather', 'temperature', 'time', 'destination',
                'passanger', 'gender', 'age', 'maritalStatus', 'education',
                'expiration', 'income', 'Bar', 'CoffeeHouse', 'CarryAway',
                'RestaurantLessThan20', 'Restaurant20To50']

chi2_results = []
for feat in key_features:
    ct = pd.crosstab(df[feat], df['Y'])
    chi2, p, dof, expected = stats.chi2_contingency(ct)
    cramers_v = np.sqrt(chi2 / (len(df) * (min(ct.shape) - 1)))
    chi2_results.append({'Feature': feat, 'Chi2': round(chi2, 2), 
                         'p-value': f'{p:.2e}', "Cramer's V": round(cramers_v, 4),
                         'Significant': 'Yes' if p < 0.05 else 'No'})

chi2_df = pd.DataFrame(chi2_results).sort_values("Cramer's V", ascending=False)
print(chi2_df.to_string(index=False))

In [None]:
fig, ax = plt.subplots(figsize=(10, 7))
chi2_sorted = chi2_df.sort_values("Cramer's V", ascending=True)
colors_chi = ['#2ecc71' if x == 'Yes' else '#95a5a6' for x in chi2_sorted['Significant']]
ax.barh(chi2_sorted['Feature'], chi2_sorted["Cramer's V"], color=colors_chi)
ax.set_xlabel("Cramér's V (Effect Size)")
ax.set_title("Feature Association with Coupon Acceptance\n(Chi-Square Test)", fontweight='bold', fontsize=14)
ax.axvline(x=0.1, color='red', linestyle='--', alpha=0.5, label='Small effect threshold')
ax.legend()
plt.tight_layout()
plt.show()

## 9. Probability Distributions

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for label, group in df.groupby('Y'):
    name = 'Accepted' if label == 1 else 'Rejected'
    color = ACCEPT_COLOR if label == 1 else REJECT_COLOR
    axes[0].hist(group['temperature'], bins=15, alpha=0.6, label=name, color=color, density=True)
axes[0].set_title('Temperature Distribution: Accepted vs Rejected', fontweight='bold')
axes[0].set_xlabel('Temperature (°F)')
axes[0].set_ylabel('Density')
axes[0].legend()

for label, group in df.groupby('Y'):
    name = 'Accepted' if label == 1 else 'Rejected'
    color = ACCEPT_COLOR if label == 1 else REJECT_COLOR
    axes[1].hist(group['age_numeric'], bins=15, alpha=0.6, label=name, color=color, density=True)
axes[1].set_title('Age Distribution: Accepted vs Rejected', fontweight='bold')
axes[1].set_xlabel('Age')
axes[1].set_ylabel('Density')
axes[1].legend()

plt.suptitle('Probability Distributions of Key Numeric Features', fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

## 10. Conclusions & Recommendations

### Summary of Findings

This analysis explored the UCI In-Vehicle Coupon Recommendation dataset to understand what factors distinguish coupon acceptors from rejectors. Key findings include:

**Strongest predictors of coupon acceptance (by Cramér's V):**

1. **Coupon type** (V=0.26): The type of coupon is the single most important factor. Carry out and cheap restaurant coupons enjoy ~70%+ acceptance, while bar and expensive restaurant coupons hover around 41–44%.

2. **Visit frequency** (V=0.15 for CoffeeHouse): Whether someone already frequents the establishment type is a powerful predictor. Bar-goers accept bar coupons at nearly twice the rate of non-bar-goers, and the same pattern holds for coffee houses.

3. **Passenger/social context** (V=0.13): Who is in the car matters — friends increase acceptance (especially for bars), while kids decrease it.

4. **Destination urgency** (V=0.13): Drivers heading to "No Urgent Place" are far more likely to accept coupons than those going to work or home.

5. **Expiration time** (V=0.13): Longer expiration (1 day) increases acceptance significantly compared to 2-hour coupons, giving drivers flexibility.

6. **Time of day** (V=0.12): Acceptance varies by time — coffee coupons peak in the morning, bar coupons in the evening, aligning with natural consumption patterns.

7. **Weather & temperature** (V=0.10, 0.06): Sunny, warm weather modestly increases acceptance.

### Actionable Recommendations for Coupon Targeting:

- **Target frequent visitors**: Someone who already goes to bars or coffee houses frequently is roughly twice as likely to accept a coupon for that venue.
- **Match timing to behavior**: Send coffee coupons in the morning, bar coupons in the evening.
- **Extend expiration**: 1-day coupons clearly outperform 2-hour coupons (~58% vs. ~43% for Coffee House coupons).
- **Consider social context**: Target bar/restaurant coupons when drivers have friends (not kids) as passengers.
- **Avoid targeting urgent trips**: Focus on leisure driving or "no urgent place" destinations.

---

*This analysis was completed as part of the UC Berkeley Professional Certificate in ML & AI program. Data source: [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/603/in+vehicle+coupon+recommendation).*
