### Required Assignment 5.1: Will the Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks each respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and the answer ‘no, I do not want the coupon’ is labeled ‘Y = 0’. There are five types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\$20 - \$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data, you will use your knowledge of plotting, statistical summaries, and visualization in Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelor's degree, associate's degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \$12500, \$12500 - \$24999, \$25000 - \$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \$20 to \$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [None]:
data = pd.read_csv('data/coupons.csv')

In [None]:
data.head()

| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalStatus | ... | CoffeeHouse | CarryAway | RestaurantLessThan20 | Restaurant20To50 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |


2. Investigate the dataset for missing or problematic data.

In [None]:
# ## Task
# **Investigate the dataset for missing or problematic data.**

# ---

# ## Findings

# ### 🔍 Missing Values
# > **'car' column**:  
# > - 12,576 missing values (99.15%)  
# >  
# > **Other columns with missing values**:  
# > - CoffeeHouse: 217 (1.71%)  
# > - Restaurant20To50: 189 (1.49%)  
# > - CarryAway: 151 (1.19%)  
# > - RestaurantLessThan20: 130 (1.02%)  
# > - Bar: 107 (0.84%)

# ---

# ### 🧮 Data Type Issues
# > - 'age' is stored as an object because it mixes numeric strings with labels such as `"below21"` and `"50plus"`  
# > - The categorical columns are stored as `object`, the pandas default for strings; converting them to `category` is optional  
# > - Binary columns (`Y`, `direction_same`, `direction_opp`) are correctly typed as `int64`

# ---

# ### ⚠️ Data Quality Issues
# > - 74 duplicate rows found  
# > - 'age' values include `"below21"` and `"50plus"` as strings  
# > - 'temperature' has only three distinct values: 30, 55, 80  
# > - 'time' values are consistent with five distinct options  
# > - 'weather' has three values: Sunny, Snowy, Rainy

# ---

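# ### 🚩 Potential Outliers
# > - No outliers in: `temperature`, `has_children`, or `Y`  
# > - **toCoupon_GEQ25min**: 1,511 rows flagged  
# > - **direction_same** and **direction_opp**: 2,724 rows flagged each  
# > - These three columns are 0/1 indicators, so the flagged rows are simply the less common value, not genuine outliers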

# ---

# ### 📊 Value Distributions
# > - Coupon types are well distributed; Coffee House is most common  
# > - Gender is fairly balanced: 6,511 Female, 6,173 Male  
# > - Age groups are reasonably distributed  
# > - Weather is heavily skewed toward Sunny (10,069 cases)
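
Counts like the ones listed above can be reproduced with a couple of small helpers. This is a minimal sketch, not necessarily how the figures were originally produced; `missing_summary` and `duplicate_count` are hypothetical names introduced here for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


def duplicate_count(df: pd.DataFrame) -> int:
    """Number of rows that exactly repeat an earlier row."""
    return int(df.duplicated().sum())
```

Calling `missing_summary(data)` and `duplicate_count(data)` on the loaded frame should give a table and a count comparable to the findings above.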


3. Decide what to do about your missing data -- drop, replace, other...

In [None]:
# ## ✅ Recommendations for Data Cleaning
# - Consider **dropping the 'car' column** due to excessive missing data  
# - Impute or drop rows with missing values in other columns  
# - Convert `'age'` to a **numerical or categorized** format  
# - Remove **duplicate rows**  
# - Standardize **categorical variables**  
# - Leave `toCoupon_GEQ25min`, `direction_same`, and `direction_opp` unchanged: they are 0/1 indicators, so the rows flagged as potential outliers are just the less common value
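
One way to act on these recommendations is sketched here. It assumes that dropping rows is acceptable because each remaining column is missing in under 2% of rows; imputing the most frequent value would be an equally reasonable choice. `clean_coupons` is a hypothetical helper name.

```python
import pandas as pd


def clean_coupons(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the mostly-empty 'car' column, exact duplicates, and rows with gaps."""
    # errors="ignore" keeps the function safe to re-run on an already cleaned frame.
    cleaned = df.drop(columns=["car"], errors="ignore")
    cleaned = cleaned.drop_duplicates()
    # Assumption: losing the few rows with missing frequency answers is acceptable.
    cleaned = cleaned.dropna()
    return cleaned.reset_index(drop=True)
```

Comparing `len(data)` with `len(clean_coupons(data))` shows how many rows the cleaning step costs.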

4. What proportion of the total observations chose to accept the coupon?



In [None]:
# The proportion of all observations that accepted the coupon is the mean of the target variable `Y`:
# Mean: 0.57 (57.41% acceptance rate)
# Standard deviation: 0.50
# `Y` is binary: 1 means the driver accepted the coupon and 0 means the driver did not, so its mean is the acceptance rate.
# The standard deviation adds no separate information here. For a binary variable it is fixed by the mean as sqrt(p * (1 - p)), which is about 0.50 whenever p is near 0.5.
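
The acceptance rate is just the mean of a 0/1 column, and the standard deviation of such a column follows directly from that mean. A minimal sketch, with hypothetical helper names:

```python
import math

import pandas as pd


def acceptance_rate(df: pd.DataFrame, target: str = "Y") -> float:
    """Share of rows where the 0/1 target equals 1."""
    return float(df[target].mean())


def binary_std(p: float) -> float:
    """Population standard deviation of a 0/1 variable whose mean is p."""
    return math.sqrt(p * (1 - p))
```

For example, `binary_std(0.57)` is roughly 0.495, which rounds to the 0.50 reported above.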

5. Use a bar plot to visualize the `coupon` column.
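
A possible approach, sketched with plain matplotlib rather than seaborn to keep dependencies small. `plot_coupon_counts` is a hypothetical helper; the axis labels are suggestions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_coupon_counts(df: pd.DataFrame, column: str = "coupon"):
    """Draw one bar per category, tallest first, and return the Axes."""
    counts = df[column].value_counts()  # sorted from most to least frequent
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of coupons offered")
    ax.set_title("Coupons offered by type")
    ax.tick_params(axis="x", labelrotation=30)
    return ax
```

In a notebook, `plot_coupon_counts(data)` followed by `plt.show()` renders the chart.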

6. Use a histogram to visualize the temperature column.
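
A sketch of one way to draw it. The bin edges are an assumption chosen so that the three recorded temperatures (30, 55 and 80) each fall in their own bin; `plot_temperature_hist` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_temperature_hist(df: pd.DataFrame, column: str = "temperature"):
    """Histogram of a numeric column; returns the Axes and the bin counts."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # Edges at 20, 45, 70, 95 put 30, 55 and 80 in separate bins.
    counts, edges, _ = ax.hist(df[column], bins=[20, 45, 70, 95], edgecolor="black")
    ax.set_xlabel("Temperature (°F)")
    ax.set_ylabel("Number of coupons offered")
    ax.set_title("Distribution of temperature")
    return ax, counts
```

Because the column only takes three values, the histogram behaves like a bar chart of three counts.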

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


2. What proportion of bar coupons were accepted?
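
The first two prompts can be sketched as follows. The label `"Bar"` is an assumption about how bar coupons are spelled in the `coupon` column; check `data["coupon"].unique()` first. Both function names are hypothetical.

```python
import pandas as pd


def bar_coupons(df: pd.DataFrame) -> pd.DataFrame:
    """Rows whose coupon type is 'Bar' (assumed label), as an independent copy."""
    return df[df["coupon"] == "Bar"].copy()


def acceptance_rate(df: pd.DataFrame) -> float:
    """Mean of the 0/1 target Y, i.e. the share of accepted coupons."""
    return float(df["Y"].mean())
```

With these, `acceptance_rate(bar_coupons(data))` gives the proportion of bar coupons that were accepted.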


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.
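
A minimal sketch of this comparison. It assumes the `Bar` column uses the same frequency labels seen in the other frequency columns of the preview (`never`, `1~3`, `4~8`) plus `less1` and `gt8`, which are guesses to verify with `.unique()`. Rows with a missing `Bar` answer are left out.

```python
import pandas as pd

# Assumed labels meaning "3 or fewer visits a month"; verify before use.
LOW_FREQUENCY = {"never", "less1", "1~3"}


def acceptance_by_bar_frequency(bar_df: pd.DataFrame) -> pd.DataFrame:
    """Coupon count and acceptance rate for <=3 vs >3 bar visits a month."""
    known = bar_df.dropna(subset=["Bar"])
    group = known["Bar"].isin(LOW_FREQUENCY).map(
        {True: "3 or fewer times", False: "More than 3 times"}
    )
    return known.groupby(group)["Y"].agg(total="count", rate="mean")
```

The result has one row per group, with the number of coupons in `total` and the acceptance rate in `rate`.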


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?
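
One way to build this comparison is with a boolean mask. The sketch below makes several assumptions: "more than once a month" is read as the labels `1~3`, `4~8` and `gt8`; `below21` is treated as 20 and `50plus` as 50 so that ages can be compared numerically. All names are hypothetical.

```python
import pandas as pd

# Assumed category labels; check df["Bar"].unique() and df["age"].unique().
MORE_THAN_ONCE = {"1~3", "4~8", "gt8"}
AGE_LABELS = {"below21": "20", "50plus": "50"}


def age_as_number(age: pd.Series) -> pd.Series:
    """Convert string ages to integers, using 20 for 'below21' and 50 for '50plus'."""
    return age.astype(str).replace(AGE_LABELS).astype(int)


def compare_rates(df: pd.DataFrame, mask: pd.Series) -> pd.Series:
    """Acceptance rate inside the mask versus all other rows."""
    return pd.Series({
        "group": df.loc[mask, "Y"].mean(),
        "all others": df.loc[~mask, "Y"].mean(),
    })


def frequent_and_over_25(bar_df: pd.DataFrame) -> pd.Series:
    """Drivers who go to bars more than once a month and are older than 25."""
    mask = bar_df["Bar"].isin(MORE_THAN_ONCE) & (age_as_number(bar_df["age"]) > 25)
    return compare_rates(bar_df, mask)
```

`compare_rates` is kept separate so the same helper can be reused with any other mask.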


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.
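
The same mask-based pattern extends to this prompt. The labels `"Kid(s)"` and `"Farming Fishing & Forestry"` and the `occupation` column name are assumptions, since they are not visible in the preview; `passanger` is spelled as it appears in the data.

```python
import pandas as pd

# Assumed labels; verify each with .unique() before relying on them.
MORE_THAN_ONCE = {"1~3", "4~8", "gt8"}
KID_PASSENGER = "Kid(s)"
FARMING = "Farming Fishing & Forestry"


def frequent_no_kid_not_farming(bar_df: pd.DataFrame) -> pd.Series:
    """Acceptance rate for the target group versus all other bar-coupon rows."""
    mask = (
        bar_df["Bar"].isin(MORE_THAN_ONCE)
        & (bar_df["passanger"] != KID_PASSENGER)  # column name is misspelled in the data
        & (bar_df["occupation"] != FARMING)
    )
    return pd.Series({
        "group": bar_df.loc[mask, "Y"].mean(),
        "all others": bar_df.loc[~mask, "Y"].mean(),
    })
```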


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K.
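
The three conditions can be combined with `|` once each is written as its own mask. This is a sketch under assumed category labels (the income bands, `"Widowed"`, and the age labels are not confirmed by the preview). Note that "more than 4 times" is approximated by the `4~8` and `gt8` buckets, because the data only records ranges.

```python
import pandas as pd

# Assumed category labels throughout; confirm each with .unique() first.
MORE_THAN_ONCE = {"1~3", "4~8", "gt8"}
UNDER_30 = {"below21", "21", "26"}
CHEAP_4_PLUS = {"4~8", "gt8"}
UNDER_50K = {"Less than $12500", "$12500 - $24999",
             "$25000 - $37499", "$37500 - $49999"}


def combined_group_mask(df: pd.DataFrame) -> pd.Series:
    """True where a row meets at least one of the three prompt conditions."""
    frequent = df["Bar"].isin(MORE_THAN_ONCE)
    cond_a = frequent & (df["passanger"] != "Kid(s)") & (df["maritalStatus"] != "Widowed")
    cond_b = frequent & df["age"].astype(str).isin(UNDER_30)
    cond_c = df["RestaurantLessThan20"].isin(CHEAP_4_PLUS) & df["income"].isin(UNDER_50K)
    return cond_a | cond_b | cond_c
```

With `mask = combined_group_mask(bar_df)`, the two rates to compare are `bar_df.loc[mask, "Y"].mean()` and `bar_df.loc[~mask, "Y"].mean()`.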



In [None]:
# Bar Coupon Acceptance by Visit Frequency:
# --------------------------------------------------
#                    Total Coupons  Acceptance Rate
# visit_frequency                                  
# 3 or fewer times            1818            0.371
# More than 3 times            199            0.769

# Detailed Statistics:
# --------------------------------------------------
# Infrequent visitors (≤3 times/month):
# Total coupons: 1818
# Acceptance rate: 37.1%

# Frequent visitors (>3 times/month):
# Total coupons: 199
# Acceptance rate: 76.9%

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [None]:
# Hypotheses about drivers who accepted bar coupons, based on the visit-frequency comparison:
#
# 1. Frequency of bar visits is a strong predictor
#    - Drivers who visit bars more than 3 times per month accepted 76.9% of bar coupons, versus 37.1% for those who visit 3 or fewer times.
#    - Regular bar-goers are therefore about twice as likely to accept, possibly because they:
#      - are more familiar with bar environments
#      - have a stronger preference for bar experiences
#      - can more easily fit a bar visit into their regular routine
#
# 2. Sample size considerations
#    - 1,818 bar coupons went to infrequent visitors (≤3 times/month) and only 199 to frequent visitors (>3 times/month).
#    - Because the data come from survey scenarios, this split more likely reflects how few respondents are frequent bar-goers than any targeting strategy.
#    - The 76.9% rate rests on the much smaller group, so it is the less precise of the two estimates.
#
# 3. Behavioral patterns
#    - The gap in acceptance rates (76.9% vs 37.1%) suggests that past behavior (how often a driver goes to bars) is a useful predictor of coupon acceptance.
#
# 4. Marketing implications
#    - Bar coupons are likely to be most effective when targeted at regular bar visitors.
#    - They could also be used to attract infrequent visitors, though with a lower expected acceptance rate.
#
# 5. Potential confounding factors
#    - This comparison does not account for other factors that might influence acceptance, such as time of day, weather, presence of passengers, and the driver's age.
#    - These factors may interact with bar visit frequency.

### Independent Investigation

Using the bar coupon example as motivation, explore one of the other coupon groups and try to determine the characteristics of the drivers who accept those coupons.  

In [None]:
# I analyzed the Coffee House coupons, another common coupon type in the dataset, following the same approach as for the bar coupons.
#
# Total coffee house coupons: 3,996 (compared with 2,017 bar coupons)
#
# 1. Visit frequency
#    - Frequent visitors (>3 times/month) accepted 67.5% of coupons, versus 45.0% for infrequent visitors.
#    - The gap is smaller than for bar coupons (76.9% vs 37.1%).
#
# 2. Time of day
#    - Highest acceptance: 10AM at 64.1% (899 coupons) and 2PM at 54.8% (794 coupons)
#    - Lowest acceptance: 6PM at 41.3% (1,093 coupons) and 10PM at 42.4% (297 coupons)
#    - Coffee house coupons do best in the morning and early afternoon.
#
# 3. Age group
#    - Highest acceptance: below 21 at 69.7% (155 coupons) and 21-25 at 52.4% (883 coupons)
#    - Lowest acceptance: 50plus at 42.0% (545 coupons) and 31-35 at 47.7% (623 coupons)
#    - Younger drivers are more likely to accept, although the below-21 group is small.
#
# 4. Weather
#    - Sunny: 50.4% (3,467 coupons)
#    - Rainy: 52.2% (226 coupons)
#    - Snowy: 43.2% (303 coupons)
#    - Acceptance is similar in sunny and rainy conditions and lower in snow; the rainy and snowy groups are small.
#
# Key hypotheses:
#    - Coffee house coupons are most effective when targeted at young drivers (especially those under 21), at regular coffee house visitors, and in the morning or early afternoon.
#    - Weather matters less than time of day: acceptance varies by about 9 points across weather conditions, versus about 23 points across times of day.
#    - The strong time-of-day effect suggests that coffee consumption habits shape coupon acceptance.
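
Breakdowns like the ones summarized above can be produced with a single group-by helper. This is a sketch, not the script that generated the reported figures; `acceptance_by` is a hypothetical name, and the `"Coffee House"` label is taken from the data preview.

```python
import pandas as pd


def acceptance_by(df: pd.DataFrame, column: str, coupon: str = "Coffee House") -> pd.DataFrame:
    """Coupon count and acceptance rate for each value of `column`."""
    subset = df[df["coupon"] == coupon]
    table = subset.groupby(column)["Y"].agg(coupons="count", rate="mean")
    # Highest acceptance first, so the strongest segments are easy to spot.
    return table.sort_values("rate", ascending=False)
```

For example, `acceptance_by(data, "time")`, `acceptance_by(data, "age")` and `acceptance_by(data, "weather")` give the time-of-day, age, and weather tables respectively.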