### Required Assignment 5.1: Will the Customer Accept the Coupon?

### LvS Response

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver at all, but which factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks each respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and the answer ‘no, I do not want the coupon’ is labeled ‘Y = 0’. There are five types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\$20 - \$50).
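
As a rough illustration of the labeling rule described above, the sketch below maps the three survey answers to the binary target. The dictionary `ANSWER_TO_LABEL` and the function `encode_answer` are hypothetical names; `coupons.csv` already contains the encoded `Y` column, so this step is not needed for the analysis itself.

```python
# Hypothetical raw survey answers; the CSV already stores the encoded Y column.
ANSWER_TO_LABEL = {
    "right away": 1,
    "later before the coupon expires": 1,
    "no, I do not want the coupon": 0,
}


def encode_answer(answer: str) -> int:
    """Map a survey answer to the binary target Y."""
    return ANSWER_TO_LABEL[answer]


answers = [
    "right away",
    "no, I do not want the coupon",
    "later before the coupon expires",
]
labels = [encode_answer(a) for a in answers]
print(labels)  # [1, 0, 1]
```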

### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelor's degree, associate's degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 to \\$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set seaborn style
sns.set_style("whitegrid")
sns.set_palette("husl")

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [None]:
data = pd.read_csv('data/coupons.csv')

In [None]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [None]:
# Check the shape of the dataset
print("Dataset shape:", data.shape)
print("\n")

# Check for missing values
print("Missing values per column:")
print(data.isnull().sum())
print("\n")

# Check data types
print("Data types:")
print(data.dtypes)

3. Decide what to do about your missing data -- drop, replace, other...

In [None]:
# Drop the 'car' column as it has too many missing values and may not be useful
data = data.drop('car', axis=1)

# Drop rows with missing values in other columns
data = data.dropna()

# Check the new shape
print("New dataset shape after cleaning:", data.shape)
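
Dropping rows is only one of the options listed in the prompt. The sketch below, on a made-up toy frame, shows one possible alternative: measure the share of missing values per column, drop columns that are mostly empty, and fill the remaining gaps with each column's most frequent value. The 0.5 threshold and the toy values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy frame standing in for the coupon data; values are made up for illustration.
toy = pd.DataFrame({
    "car": [None, None, None, "Scooter"],
    "Bar": ["never", None, "1~3", "never"],
    "Y": [1, 0, 1, 0],
})

# Share of missing values per column, to decide between dropping and imputing.
missing_share = toy.isnull().mean()

# Drop columns that are mostly empty (the 0.5 threshold is an arbitrary choice).
mostly_empty = missing_share[missing_share > 0.5].index
cleaned = toy.drop(columns=mostly_empty)

# Fill the remaining gaps in each column with that column's most frequent value.
imputed = cleaned.fillna(cleaned.mode().iloc[0])
print(imputed)
```

Imputing keeps every row, at the cost of slightly inflating the most common category; whether that trade-off is acceptable depends on how many values are missing.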

4. What proportion of the total observations chose to accept the coupon?



In [None]:
# Visualize the proportion of accepted vs rejected coupons
acceptance_rate = data['Y'].sum() / len(data)

# Create a DataFrame for plotting
acceptance_df = pd.DataFrame({
    'Status': ['Rejected', 'Accepted'],
    'Count': data['Y'].value_counts().sort_index().values,
    'Percentage': data['Y'].value_counts(normalize=True).sort_index().values * 100
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Count plot
sns.barplot(data=acceptance_df, x='Status', y='Count', ax=axes[0], hue='Status', palette=['#ff9999', '#66b3ff'], legend=False)
axes[0].set_xlabel('Coupon Status')
axes[0].set_ylabel('Count')
axes[0].set_title('Overall Coupon Acceptance Count')
for i, row in acceptance_df.iterrows():
    axes[0].text(i, row['Count'], f'{int(row["Count"])}', ha='center', va='bottom')

# Percentage bar plot
sns.barplot(data=acceptance_df, x='Status', y='Percentage', ax=axes[1], hue='Status', palette=['#ff9999', '#66b3ff'], legend=False)
axes[1].set_ylabel('Percentage (%)')
axes[1].set_title('Coupon Acceptance Rate')
for i, row in acceptance_df.iterrows():
    axes[1].text(i, row['Percentage'], f'{row["Percentage"]:.1f}%', ha='center', va='bottom')

plt.tight_layout()
plt.show()

print(f"Acceptance Rate: {acceptance_rate:.4f} ({acceptance_rate * 100:.2f}%)")

5. Use a bar plot to visualize the `coupon` column.

In [None]:
# Create a bar plot for the coupon column using seaborn
coupon_counts = data['coupon'].value_counts().reset_index()
coupon_counts.columns = ['coupon', 'count']
coupon_counts = coupon_counts.sort_values('count', ascending=True)

plt.figure(figsize=(10, 6))
sns.barplot(data=coupon_counts, y='coupon', x='count', hue='coupon', palette='viridis', legend=False)
plt.title('Distribution of Coupon Types', fontsize=14)
plt.xlabel('Count')
plt.ylabel('Coupon Type')
plt.tight_layout()
plt.show()

6. Use a histogram to visualize the temperature column.

In [None]:
# Create a histogram for temperature using seaborn
plt.figure(figsize=(10, 6))
sns.histplot(data=data, x='temperature', bins=10, kde=True, color='skyblue', edgecolor='black')
plt.title('Distribution of Temperature', fontsize=14)
plt.xlabel('Temperature (F)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
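
Because the temperature attribute is described as taking only three values (30F, 55F, 80F), a plain count per value can be easier to read than binned bars with a density curve. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up temperatures for illustration only.
toy = pd.DataFrame({"temperature": [80, 55, 80, 30, 80, 55]})

# One count per distinct temperature, ordered from coldest to warmest.
temp_counts = toy["temperature"].value_counts().sort_index()
print(temp_counts)
```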

In [None]:
# Visualize acceptance rate by coupon type
coupon_stats = data.groupby('coupon').agg({'Y': ['mean', 'count']}).reset_index()
coupon_stats.columns = ['coupon', 'acceptance_rate', 'count']
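# Reset the index after sorting so it matches each bar's position for the text labels
coupon_stats = coupon_stats.sort_values('acceptance_rate', ascending=True).reset_index(drop=True)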

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Acceptance rate by coupon type
sns.barplot(data=coupon_stats, y='coupon', x='acceptance_rate', hue='coupon', ax=axes[0], palette='coolwarm', legend=False)
axes[0].set_xlabel('Acceptance Rate')
axes[0].set_ylabel('Coupon Type')
axes[0].set_title('Acceptance Rate by Coupon Type')
axes[0].set_xlim(0, 1)
for i, row in coupon_stats.iterrows():
    axes[0].text(row['acceptance_rate'], i, f' {row["acceptance_rate"]:.1%}', va='center')

# Count by coupon type
sns.barplot(data=coupon_stats, y='coupon', x='count', hue='coupon', ax=axes[1], palette='viridis', legend=False)
axes[1].set_xlabel('Number of Coupons')
axes[1].set_ylabel('Coupon Type')
axes[1].set_title('Coupon Count by Type')
for i, row in coupon_stats.iterrows():
    axes[1].text(row['count'], i, f' {int(row["count"])}', va='center')

plt.tight_layout()
plt.show()
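
When a sorted frame is annotated with `iterrows()`, the index doubles as the bar position only if it has been reset after the sort, because `sort_values` keeps the original index labels. The sketch below uses made-up rates to show the difference.

```python
import pandas as pd

# Made-up rates for illustration only.
stats = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Carry out & Take away"],
    "acceptance_rate": [0.4, 0.5, 0.7],
})

sorted_stats = stats.sort_values("acceptance_rate", ascending=False)
print(list(sorted_stats.index))  # original labels survive the sort: [2, 1, 0]

# Resetting the index makes each label equal to the row's position in the plot.
positional = sorted_stats.reset_index(drop=True)
label_positions = [(i, row["coupon"]) for i, row in positional.iterrows()]
print(label_positions)
```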

In [None]:
# Analyze acceptance by weather and passenger
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Weather analysis
weather_data = data.groupby('weather')['Y'].mean().reset_index()
weather_data.columns = ['weather', 'acceptance_rate']
sns.barplot(data=weather_data, x='weather', y='acceptance_rate', hue='weather', ax=axes[0], palette='Set2', legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('Weather')
axes[0].set_title('Acceptance Rate by Weather')
axes[0].set_ylim(0, 1)
for i, row in weather_data.iterrows():
    axes[0].text(i, row['acceptance_rate'], f'{row["acceptance_rate"]:.1%}', ha='center', va='bottom')

# Passenger analysis
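# Reset the index after sorting so it matches each bar's position for the text labels
passenger_data = data.groupby('passanger')['Y'].mean().reset_index().sort_values('Y', ascending=False).reset_index(drop=True)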
passenger_data.columns = ['passanger', 'acceptance_rate']
sns.barplot(data=passenger_data, y='passanger', x='acceptance_rate', hue='passanger', ax=axes[1], palette='Set3', legend=False)
axes[1].set_xlabel('Acceptance Rate')
axes[1].set_ylabel('Passenger Type')
axes[1].set_title('Acceptance Rate by Passenger Type')
axes[1].set_xlim(0, 1)
for i, row in passenger_data.iterrows():
    axes[1].text(row['acceptance_rate'], i, f' {row["acceptance_rate"]:.1%}', va='center')

plt.tight_layout()
plt.show()

In [None]:
# Visualize acceptance by age group
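# Reset the index after sorting so it matches each bar's position for the text labels
age_data = data.groupby('age')['Y'].mean().reset_index().sort_values('Y', ascending=True).reset_index(drop=True)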
age_data.columns = ['age', 'acceptance_rate']

plt.figure(figsize=(12, 6))
# 'coral' is a single color rather than a palette name, so pass it via color=
sns.barplot(data=age_data, y='age', x='acceptance_rate', color='coral')
plt.xlabel('Acceptance Rate')
plt.ylabel('Age Group')
plt.title('Coupon Acceptance Rate by Age Group')
plt.xlim(0, 1)
for i, row in age_data.iterrows():
    plt.text(row['acceptance_rate'], i, f' {row["acceptance_rate"]:.1%}', va='center')
plt.tight_layout()
plt.show()
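
Sorting by acceptance rate scrambles the natural order of the age groups. If a youngest-to-oldest axis is preferred, one option is an ordered categorical. The sketch below uses the age codes that appear in this notebook's filters and made-up `Y` values; `AGE_ORDER` is a hypothetical helper name.

```python
import pandas as pd

# Age codes as used in this notebook's filters, youngest to oldest.
AGE_ORDER = ["below21", "21", "26", "31", "36", "41", "46", "50plus"]

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": ["50plus", "21", "below21", "21", "36", "50plus"],
    "Y": [0, 1, 1, 0, 1, 1],
})

toy["age"] = pd.Categorical(toy["age"], categories=AGE_ORDER, ordered=True)

# observed=True keeps only the age groups present in the toy data.
rate_by_age = toy.groupby("age", observed=True)["Y"].mean()
print(rate_by_age)
```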

In [None]:
# Visualize acceptance by destination and expiration time
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Destination analysis
dest_data = data.groupby('destination')['Y'].mean().reset_index()
dest_data.columns = ['destination', 'acceptance_rate']
sns.barplot(data=dest_data, x='destination', y='acceptance_rate', hue='destination', ax=axes[0], palette='bright', legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('Destination')
axes[0].set_title('Acceptance Rate by Destination')
axes[0].set_ylim(0, 1)
axes[0].tick_params(axis='x', rotation=45)
for i, row in dest_data.iterrows():
    axes[0].text(i, row['acceptance_rate'], f'{row["acceptance_rate"]:.1%}', ha='center', va='bottom')

# Expiration time analysis
exp_data = data.groupby('expiration')['Y'].mean().reset_index()
exp_data.columns = ['expiration', 'acceptance_rate']
sns.barplot(data=exp_data, x='expiration', y='acceptance_rate', hue='expiration', ax=axes[1], palette='pastel', legend=False)
axes[1].set_ylabel('Acceptance Rate')
axes[1].set_xlabel('Expiration Time')
axes[1].set_title('Acceptance Rate by Expiration Time')
axes[1].set_ylim(0, 1)
for i, row in exp_data.iterrows():
    axes[1].text(i, row['acceptance_rate'], f'{row["acceptance_rate"]:.1%}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar-related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [None]:
# Create a new DataFrame with just bar coupons
bar_coupons = data[data['coupon'] == 'Bar']
print(f"Total bar coupons: {len(bar_coupons)}")

2. What proportion of bar coupons were accepted?


In [None]:
# Visualize bar coupon acceptance
bar_acceptance_rate = bar_coupons['Y'].sum() / len(bar_coupons)

# Create a DataFrame for plotting
bar_acceptance_df = pd.DataFrame({
    'Status': ['Rejected', 'Accepted'],
    'Count': bar_coupons['Y'].value_counts().sort_index().values,
    'Percentage': bar_coupons['Y'].value_counts(normalize=True).sort_index().values * 100
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Count plot
sns.barplot(data=bar_acceptance_df, x='Status', y='Count', ax=axes[0], hue='Status', palette=['#ffcc99', '#99ccff'], legend=False)
axes[0].set_xlabel('Coupon Status')
axes[0].set_ylabel('Count')
axes[0].set_title('Bar Coupon Acceptance Count')
for i, row in bar_acceptance_df.iterrows():
    axes[0].text(i, row['Count'], f'{int(row["Count"])}', ha='center', va='bottom')

# Percentage bar plot
sns.barplot(data=bar_acceptance_df, x='Status', y='Percentage', ax=axes[1], hue='Status', palette=['#ffcc99', '#99ccff'], legend=False)
axes[1].set_ylabel('Percentage (%)')
axes[1].set_title(f'Bar Coupon Acceptance Rate (Total: {len(bar_coupons)})')
for i, row in bar_acceptance_df.iterrows():
    axes[1].text(i, row['Percentage'], f'{row["Percentage"]:.1f}%', ha='center', va='bottom')

plt.tight_layout()
plt.show()

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [None]:
# Compare acceptance between those who went to bar 3 or fewer times vs more than 3
bar_3_or_less = bar_coupons[bar_coupons['Bar'].isin(['never', 'less1', '1~3'])]
bar_more_than_3 = bar_coupons[bar_coupons['Bar'].isin(['4~8', 'gt8'])]

acceptance_3_or_less = bar_3_or_less['Y'].sum() / len(bar_3_or_less)
acceptance_more_than_3 = bar_more_than_3['Y'].sum() / len(bar_more_than_3)

# Create visualization data
comparison_df = pd.DataFrame({
    'Group': ['≤3 times/month', '>3 times/month'],
    'Acceptance Rate': [acceptance_3_or_less, acceptance_more_than_3],
    'Count': [len(bar_3_or_less), len(bar_more_than_3)]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: Acceptance rates comparison
sns.barplot(data=comparison_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#ff9999', '#66b3ff'], legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Bar Coupon Acceptance by Visit Frequency')
axes[0].set_ylim(0, 1)
for i, row in comparison_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')

# Subplot 2: Group sizes
sns.barplot(data=comparison_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#ff9999', '#66b3ff'], legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Group')
for i, row in comparison_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')

plt.tight_layout()
plt.show()
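
Two acceptance rates computed from samples will differ somewhat by chance alone. One common way to gauge whether a gap is larger than sampling noise is a two-proportion z statistic. The sketch below is a supplementary check, not part of the assignment; the function name `two_proportion_z` is hypothetical, and the counts are invented rather than taken from the coupon data.

```python
import math


def two_proportion_z(accepted_a: int, n_a: int, accepted_b: int, n_b: int) -> float:
    """z statistic for the difference between two rates, using a pooled standard error."""
    p_a = accepted_a / n_a
    p_b = accepted_b / n_b
    pooled = (accepted_a + accepted_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Hypothetical counts, not taken from the coupon data.
z = two_proportion_z(accepted_a=60, n_a=100, accepted_b=40, n_b=100)
print(round(z, 2))  # 2.83
```

Under the usual normal approximation, absolute values well above about 2 suggest the gap is unlikely to be sampling noise alone.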

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others.  Is there a difference?


In [None]:
# Compare acceptance for drivers who go to bar more than once a month and are over 25
frequent_and_over_25 = bar_coupons[(bar_coupons['Bar'].isin(['1~3', '4~8', 'gt8'])) & 
                                    (bar_coupons['age'].isin(['26', '31', '36', '41', '46', '50plus']))]
all_others = bar_coupons[~((bar_coupons['Bar'].isin(['1~3', '4~8', 'gt8'])) & 
                            (bar_coupons['age'].isin(['26', '31', '36', '41', '46', '50plus'])))]

acceptance_frequent_over_25 = frequent_and_over_25['Y'].sum() / len(frequent_and_over_25)
acceptance_all_others = all_others['Y'].sum() / len(all_others)

# Create visualization data
age_comparison_df = pd.DataFrame({
    'Group': ['Frequent Bar-Goers\nOver 25', 'All Others'],
    'Acceptance Rate': [acceptance_frequent_over_25, acceptance_all_others],
    'Count': [len(frequent_and_over_25), len(all_others)]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: Acceptance comparison
sns.barplot(data=age_comparison_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#66b3ff', '#ffcc99'], legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Acceptance: Frequent Bar-Goers (>1/month) Over 25')
axes[0].set_ylim(0, 1)
for i, row in age_comparison_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')

# Subplot 2: Group sizes
sns.barplot(data=age_comparison_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#66b3ff', '#ffcc99'], legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Group')
for i, row in age_comparison_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')

plt.tight_layout()
plt.show()

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.


In [None]:
# Drivers who go to bars more than once a month, no kid passengers, not farming/fishing/forestry
specific_group = bar_coupons[(bar_coupons['Bar'].isin(['1~3', '4~8', 'gt8'])) & 
                             (bar_coupons['passanger'] != 'Kid(s)') & 
                             (bar_coupons['occupation'] != 'Farming Fishing & Forestry')]

all_bar_coupons = bar_coupons

acceptance_specific = specific_group['Y'].sum() / len(specific_group)
acceptance_all_bar = all_bar_coupons['Y'].sum() / len(all_bar_coupons)

# Create visualization data
passenger_occ_df = pd.DataFrame({
    'Group': ['Frequent + No Kids\n+ Not Farming', 'All Bar Coupons'],
    'Acceptance Rate': [acceptance_specific, acceptance_all_bar],
    'Count': [len(specific_group), len(all_bar_coupons)]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: Acceptance comparison
sns.barplot(data=passenger_occ_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#99ff99', '#ffcc99'], legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Acceptance: Specific Passenger/Occupation Criteria')
axes[0].set_ylim(0, 1)
for i, row in passenger_occ_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')

# Subplot 2: Group sizes
sns.barplot(data=passenger_occ_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#99ff99', '#ffcc99'], legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Group')
for i, row in passenger_occ_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')

plt.tight_layout()
plt.show()
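
The comparisons in this section all follow the same pattern: build a boolean mask, then compute the acceptance rate and size for the matching rows and for the rest. A small helper can capture that pattern. The function `compare_acceptance` below is a hypothetical sketch, demonstrated on made-up rows.

```python
import pandas as pd


def compare_acceptance(df: pd.DataFrame, mask: pd.Series, target: str = "Y") -> pd.DataFrame:
    """Return acceptance rate and size for rows matching `mask` and for all other rows."""
    groups = {"Matching": df[mask], "All others": df[~mask]}
    return pd.DataFrame({
        "Group": list(groups),
        "Acceptance Rate": [g[target].mean() for g in groups.values()],
        "Count": [len(g) for g in groups.values()],
    })


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Bar": ["never", "1~3", "4~8", "less1", "gt8", "never"],
    "passanger": ["Alone", "Friend(s)", "Kid(s)", "Alone", "Partner", "Kid(s)"],
    "Y": [0, 1, 0, 0, 1, 0],
})
mask = toy["Bar"].isin(["1~3", "4~8", "gt8"]) & (toy["passanger"] != "Kid(s)")
summary = compare_acceptance(toy, mask)
print(summary)
```

Comparing the matching rows against their complement, rather than against the full data, keeps the two groups disjoint, so the matching rows do not contribute to both rates.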

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K.



In [None]:
# Complex conditions: multiple OR groups
# Group 1: go to bars >1/month, passenger not kid, not widowed
group1 = bar_coupons[(bar_coupons['Bar'].isin(['1~3', '4~8', 'gt8'])) & 
                     (bar_coupons['passanger'] != 'Kid(s)') & 
                     (bar_coupons['maritalStatus'] != 'Widowed')]

# Group 2: go to bars >1/month and under 30
group2 = bar_coupons[(bar_coupons['Bar'].isin(['1~3', '4~8', 'gt8'])) & 
                     (bar_coupons['age'].isin(['below21', '21', '26']))]

# Group 3: go to cheap restaurants >4/month and income < 50K
group3 = bar_coupons[(bar_coupons['RestaurantLessThan20'].isin(['4~8', 'gt8'])) & 
                     (bar_coupons['income'].isin(['Less than $12500', '$12500 - $24999', 
                                                  '$25000 - $37499', '$37500 - $49999']))]

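# Combine all three groups (OR condition). Deduplicate on the row index rather than
# on row values, so distinct responses with identical values are not collapsed.
combined_group = pd.concat([group1, group2, group3])
combined_group = combined_group[~combined_group.index.duplicated()]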

# Calculate acceptance rates for each group
acceptance_group1 = group1['Y'].sum() / len(group1) if len(group1) > 0 else 0
acceptance_group2 = group2['Y'].sum() / len(group2) if len(group2) > 0 else 0
acceptance_group3 = group3['Y'].sum() / len(group3) if len(group3) > 0 else 0
acceptance_combined = combined_group['Y'].sum() / len(combined_group)
acceptance_all_bar = bar_coupons['Y'].sum() / len(bar_coupons)

# Create visualization data
complex_df = pd.DataFrame({
    'Group': ['Group 1\n(Frequent+\nNo Kids+\nNot Widowed)', 
              'Group 2\n(Frequent+\nUnder 30)', 
              'Group 3\n(Restaurants+\nLow Income)',
              'Combined\n(Any Group)',
              'All Bar\nCoupons'],
    'Acceptance Rate': [acceptance_group1, acceptance_group2, acceptance_group3, acceptance_combined, acceptance_all_bar],
    'Count': [len(group1), len(group2), len(group3), len(combined_group), len(bar_coupons)]
})

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Subplot 1: Individual group acceptance rates
sns.barplot(data=complex_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette='Set2', legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Complex Multi-Condition Group Acceptance')
axes[0].set_ylim(0, 1)
axes[0].tick_params(axis='x', rotation=0, labelsize=9)
for i, row in complex_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom', fontsize=9)

# Subplot 2: Group sizes
sns.barplot(data=complex_df, x='Group', y='Count', ax=axes[1], hue='Group', palette='Set2', legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Group')
axes[1].tick_params(axis='x', rotation=0, labelsize=9)
for i, row in complex_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom', fontsize=9)

plt.tight_layout()
plt.show()
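
A union of conditions can be built either by OR-ing boolean masks or by concatenating filtered frames and removing duplicates. The two approaches differ when the data contains distinct rows with identical values, because `drop_duplicates` compares values, not row identity. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows; rows 0 and 1 are separate responses that happen to be identical.
toy = pd.DataFrame({
    "Bar": ["1~3", "1~3", "never", "4~8"],
    "age": ["21", "21", "41", "46"],
    "Y": [1, 1, 0, 1],
})

frequent = toy["Bar"].isin(["1~3", "4~8", "gt8"])
under_30 = toy["age"].isin(["below21", "21", "26"])

# Union via boolean OR: each row is counted once and identical rows are kept.
by_mask = toy[frequent | under_30]

# Union via concat + drop_duplicates: identical rows collapse into one.
by_concat = pd.concat([toy[frequent], toy[under_30]]).drop_duplicates()

print(len(by_mask), len(by_concat))  # 3 2
```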

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [None]:
"""
Hypothesis about drivers who accepted bar coupons:

Based on the comparisons above, drivers who accept bar coupons tend to share these characteristics:

1. They visit bars frequently (more than 3 times a month) - these drivers show much higher acceptance rates
2. Age matters mainly in combination with bar-going: both the over-25 and the under-30 groups examined
   were limited to drivers who already go to bars more than once a month
3. They are not traveling with kids as passengers
4. They have social lifestyles - they go to bars and inexpensive restaurants frequently
5. They are not in farming/fishing/forestry occupations

The key pattern is that bar coupon acceptance is closely tied to existing bar-going behavior.
People who already visit bars are much more likely to accept bar coupons than those who rarely
or never go to bars. Age and passenger type also appear to play a role.
"""

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of drivers who accept the coupons.  

In [None]:
### Independent Investigation: Coffee House Coupons

# Create a DataFrame with just coffee house coupons
coffee_coupons = data[data['coupon'] == 'Coffee House']
coffee_acceptance_rate = coffee_coupons['Y'].sum() / len(coffee_coupons)

# Create a DataFrame for plotting
coffee_acceptance_df = pd.DataFrame({
    'Status': ['Rejected', 'Accepted'],
    'Count': coffee_coupons['Y'].value_counts().sort_index().values,
    'Percentage': coffee_coupons['Y'].value_counts(normalize=True).sort_index().values * 100
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Count plot
sns.barplot(data=coffee_acceptance_df, x='Status', y='Count', ax=axes[0], hue='Status', palette=['#ffcccc', '#ccffcc'], legend=False)
axes[0].set_xlabel('Coupon Status')
axes[0].set_ylabel('Count')
axes[0].set_title('Coffee House Coupon Acceptance Count')
for i, row in coffee_acceptance_df.iterrows():
    axes[0].text(i, row['Count'], f'{int(row["Count"])}', ha='center', va='bottom')

# Percentage bar plot
sns.barplot(data=coffee_acceptance_df, x='Status', y='Percentage', ax=axes[1], hue='Status', palette=['#ffcccc', '#ccffcc'], legend=False)
axes[1].set_ylabel('Percentage (%)')
axes[1].set_title(f'Coffee House Coupons (Total: {len(coffee_coupons)})')
for i, row in coffee_acceptance_df.iterrows():
    axes[1].text(i, row['Percentage'], f'{row["Percentage"]:.1f}%', ha='center', va='bottom')

plt.tight_layout()
plt.show()

In [None]:
# Compare acceptance by coffee house visit frequency
coffee_frequent = coffee_coupons[coffee_coupons['CoffeeHouse'].isin(['1~3', '4~8', 'gt8'])]
coffee_infrequent = coffee_coupons[coffee_coupons['CoffeeHouse'].isin(['never', 'less1'])]

acceptance_frequent = coffee_frequent['Y'].sum() / len(coffee_frequent)
acceptance_infrequent = coffee_infrequent['Y'].sum() / len(coffee_infrequent)

# Create visualization data
coffee_freq_df = pd.DataFrame({
    'Group': ['Frequent\n(≥1/month)', 'Infrequent\n(<1/month)'],
    'Acceptance Rate': [acceptance_frequent, acceptance_infrequent],
    'Count': [len(coffee_frequent), len(coffee_infrequent)]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: Acceptance rates comparison
sns.barplot(data=coffee_freq_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#66b3ff', '#ff9999'], legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Coffee House Coupon Acceptance by Visit Frequency')
axes[0].set_ylim(0, 1)
for i, row in coffee_freq_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')

# Subplot 2: Group sizes
sns.barplot(data=coffee_freq_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#66b3ff', '#ff9999'], legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Group')
for i, row in coffee_freq_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')

plt.tight_layout()
plt.show()

In [None]:
# Compare by time of day; rows whose time is not listed below are left out of both groups
coffee_morning = coffee_coupons[coffee_coupons['time'].isin(['7AM', '10AM'])]
coffee_afternoon = coffee_coupons[coffee_coupons['time'].isin(['2PM', '6PM'])]

if len(coffee_morning) > 0 and len(coffee_afternoon) > 0:
    acceptance_morning = coffee_morning['Y'].sum() / len(coffee_morning)
    acceptance_afternoon = coffee_afternoon['Y'].sum() / len(coffee_afternoon)
    
    # Create visualization data
    coffee_time_df = pd.DataFrame({
        'Group': ['Morning\n(7AM, 10AM)', 'Afternoon\n(2PM, 6PM)'],
        'Acceptance Rate': [acceptance_morning, acceptance_afternoon],
        'Count': [len(coffee_morning), len(coffee_afternoon)]
    })
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Subplot 1: Acceptance rates comparison
    sns.barplot(data=coffee_time_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#ffcc99', '#99ccff'], legend=False)
    axes[0].set_ylabel('Acceptance Rate')
    axes[0].set_xlabel('')
    axes[0].set_title('Coffee House Acceptance by Time of Day')
    axes[0].set_ylim(0, 1)
    for i, row in coffee_time_df.iterrows():
        axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')
    
    # Subplot 2: Group sizes
    sns.barplot(data=coffee_time_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#ffcc99', '#99ccff'], legend=False)
    axes[1].set_ylabel('Number of Coupons')
    axes[1].set_xlabel('')
    axes[1].set_title('Sample Size by Time Period')
    for i, row in coffee_time_df.iterrows():
        axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Compare by passenger type
coffee_alone = coffee_coupons[coffee_coupons['passanger'] == 'Alone']
coffee_friend = coffee_coupons[coffee_coupons['passanger'] == 'Friend(s)']

acceptance_alone = coffee_alone['Y'].sum() / len(coffee_alone)
acceptance_friend = coffee_friend['Y'].sum() / len(coffee_friend)

# Create visualization data
coffee_passenger_df = pd.DataFrame({
    'Group': ['Alone', 'With Friend(s)'],
    'Acceptance Rate': [acceptance_alone, acceptance_friend],
    'Count': [len(coffee_alone), len(coffee_friend)]
})

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Subplot 1: Acceptance rates comparison
sns.barplot(data=coffee_passenger_df, x='Group', y='Acceptance Rate', ax=axes[0], hue='Group', palette=['#ff99cc', '#99ff99'], legend=False)
axes[0].set_ylabel('Acceptance Rate')
axes[0].set_xlabel('')
axes[0].set_title('Coffee House Acceptance by Passenger Type')
axes[0].set_ylim(0, 1)
for i, row in coffee_passenger_df.iterrows():
    axes[0].text(i, row['Acceptance Rate'], f'{row["Acceptance Rate"]:.1%}', ha='center', va='bottom')

# Subplot 2: Group sizes
sns.barplot(data=coffee_passenger_df, x='Group', y='Count', ax=axes[1], hue='Group', palette=['#ff99cc', '#99ff99'], legend=False)
axes[1].set_ylabel('Number of Coupons')
axes[1].set_xlabel('')
axes[1].set_title('Sample Size by Passenger Type')
for i, row in coffee_passenger_df.iterrows():
    axes[1].text(i, row['Count'], str(int(row['Count'])), ha='center', va='bottom')

plt.tight_layout()
plt.show()

In [None]:
"""
Summary of Coffee House Coupon Analysis:

Key Findings:
1. Visit Frequency: People who already visit coffee houses regularly are more likely to accept coffee coupons.
   This mirrors the pattern we saw with bar coupons.

2. Time of Day: Morning scenarios show different acceptance rates than afternoon ones, possibly because
   coffee is more associated with morning routines.

3. Social Context: Acceptance rates differ between drivers who are alone and drivers with friends.
   This suggests coffee house visits have both social and solitary aspects.

Overall Pattern:
Similar to bar coupons, existing behavior appears to be the strongest predictor of coffee house coupon
acceptance among the factors examined. People who already have coffee house habits are more receptive
to these coupons. The contextual factors (time, passenger) refine the picture further.
"""