### Required Assignment 5.1: Will the Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks each person whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and answers of ‘no, I do not want the coupon’ are labeled ‘Y = 0’. There are five types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\$20 - \$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data, you will use your knowledge of plotting, statistical summaries, and visualization in Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \$12500, \$12500 - \$24999, \$25000 - \$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [7]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np


### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [None]:
data = pd.read_csv('data/coupons.csv')

In [None]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [None]:
data.isnull().sum()  # Number of missing values in each column

3. Decide what to do about your missing data -- drop, replace, other...

In [None]:
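# Drop the 'car' column, which is mostly missing
data_cleaned = data.drop(columns=['car'])

# Label the remaining gaps explicitly; drop=True avoids adding a stray 'index' column
data_updated = data_cleaned.fillna('Unknown').reset_index(drop=True)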
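
One way to back the drop-versus-fill decision is to rank columns by their share of missing values. The sketch below runs on a small invented frame: `toy`, its values, and the 50% cutoff are assumptions for illustration, not figures from `coupons.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey data; all values are invented.
toy = pd.DataFrame({
    "car": [np.nan, np.nan, np.nan, "Scooter"],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# Share of missing values per column, largest first.
missing_share = toy.isnull().mean().sort_values(ascending=False)

# Assumed rule of thumb: drop columns that are more than half empty,
# then label the remaining gaps explicitly.
mostly_empty = missing_share[missing_share > 0.5].index.tolist()
toy_cleaned = toy.drop(columns=mostly_empty).fillna("Unknown")
```

Applying the same two steps to `data` would show whether `car` is the only column sparse enough to drop.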

4. What proportion of the total observations chose to accept the coupon?



In [1]:

total_observations = len(data_updated)  # Total number of rows in the DataFrame
coupon_accepted = data_updated['Y'].sum()  # Number of entries where Y = 1
proportion_accepted = (coupon_accepted / total_observations) * 100  # Share of accepted coupons, as a percentage
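
Because `Y` is coded as 0/1, its mean gives the same proportion directly. A minimal sketch on invented responses (the `toy` frame is hypothetical):

```python
import pandas as pd

# Hypothetical responses; 1 = accepted, 0 = declined.
toy = pd.DataFrame({"Y": [1, 0, 1, 1]})

# The mean of a 0/1 column equals the share of ones.
acceptance_pct = toy["Y"].mean() * 100

# Same quantity computed as count of ones over row count.
by_hand_pct = toy["Y"].sum() / len(toy) * 100
```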


5. Use a bar plot to visualize the `coupon` column.

In [None]:
coupon_counts = data_updated['coupon'].value_counts()

plt.figure(figsize=(10, 6))
coupon_counts.plot(kind='bar')
plt.title('Distribution of Coupons', fontsize=16)
plt.xlabel('Coupon Type', fontsize=14)
plt.ylabel('Coupon Count', fontsize=14)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

6. Use a histogram to visualize the temperature column.

In [None]:
plt.figure(figsize=(10, 6))
plt.hist(data_updated['temperature'], bins=20, edgecolor='black')
plt.title('Histogram of Temperature', fontsize=16)
plt.xlabel('Temperature', fontsize=14)
plt.ylabel('Frequency', fontsize=14)
plt.tight_layout()
plt.show()

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar-related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [None]:
Coupon_bar = data_updated[data_updated['coupon'] == 'Bar']

2. What proportion of bar coupons were accepted?


In [None]:
total_bar_coupons = len(Coupon_bar)  # Total number of bar coupon rows
accepted_bar_coupons = Coupon_bar['Y'].sum()  # Number of bar coupon entries where Y = 1

proportion_bar_coupons_accepted = (accepted_bar_coupons / total_bar_coupons) * 100  # Acceptance rate as a percentage


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [None]:

low_bar_visitors = Coupon_bar[Coupon_bar['Bar'].isin(['never', 'less1', '1~3'])]  # 3 or fewer bar visits a month
high_bar_visitors = Coupon_bar[Coupon_bar['Bar'].isin(['4~8', 'gt8'])]  # more than 3 bar visits a month


low_visitors_acceptance_rate = (low_bar_visitors['Y'].sum() / len(low_bar_visitors)) * 100  # percentage of the low-frequency group with Y = 1
high_visitors_acceptance_rate = (high_bar_visitors['Y'].sum() / len(high_bar_visitors)) * 100  # percentage of the high-frequency group with Y = 1
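
The same comparison can be written as a single `groupby`, which also reports how many rows sit behind each rate. This is a sketch on invented data: the `toy` frame and the `group_of` mapping are assumptions for illustration.

```python
import pandas as pd

# Hypothetical bar-coupon rows; values are invented.
toy = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", "4~8"],
    "Y":   [0, 0, 1, 1, 1, 0],
})

# Map each frequency label to a coarse group. Labels that are not listed
# (for example 'Unknown') map to NaN and are left out by groupby.
group_of = {
    "never": "3 or fewer", "less1": "3 or fewer", "1~3": "3 or fewer",
    "4~8": "more than 3", "gt8": "more than 3",
}

summary = (
    toy.assign(group=toy["Bar"].map(group_of))
       .groupby("group")["Y"]
       .agg(rate="mean", n="size")
)
summary["rate"] = summary["rate"] * 100  # express as a percentage
```

Keeping `n` next to `rate` makes it easier to judge whether a gap between groups rests on enough observations.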



4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others.  Is there a difference?


In [None]:
frequent_drivers_over_25 = Coupon_bar[
    (Coupon_bar['Bar'].isin(['4~8', 'gt8', '1~3'])) & 
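    # Pull the digits out of labels such as '50plus' or 'below21'; the raw string avoids an invalid-escape warning
    (Coupon_bar['age'].str.extract(r'(\d+)', expand=False).astype(int) > 25)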
]

# Filter all other drivers
other_drivers = Coupon_bar[~Coupon_bar.index.isin(frequent_drivers_over_25.index)]

# Calculate acceptance rates for each group
frequent_over_25_acceptance_rate = (frequent_drivers_over_25['Y'].sum() / len(frequent_drivers_over_25)) * 100
others_acceptance_rate = (other_drivers['Y'].sum() / len(other_drivers)) * 100


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.


In [None]:
frequent_bar_drivers_filtered = Coupon_bar[
    (Coupon_bar['Bar'].isin(['4~8', 'gt8', '1~3'])) & 
    (Coupon_bar['passanger'] != "Kid(s)") &
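    # Match on 'Farming': the occupation label in the data is written without commas, so the comma-separated spelling never matches
    (~Coupon_bar['occupation'].str.contains('Farming', na=False))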
]

# Filter all other drivers
other_drivers_filtered = Coupon_bar[~Coupon_bar.index.isin(frequent_bar_drivers_filtered.index)]

# Calculate acceptance rates for each group
frequent_filtered_acceptance_rate = (frequent_bar_drivers_filtered['Y'].sum() / len(frequent_bar_drivers_filtered)) * 100
others_filtered_acceptance_rate = (other_drivers_filtered['Y'].sum() / len(other_drivers_filtered)) * 100

frequent_filtered_acceptance_rate, others_filtered_acceptance_rate

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K.



In [None]:


# 1. Go to bars more than once a month, had passengers not a kid, and were not widowed
condition_1 = (
    (Coupon_bar['Bar'].isin(['4~8', 'gt8', '1~3'])) &
    (Coupon_bar['passanger'] != "Kid(s)") &
    (Coupon_bar['maritalStatus'] != "Widowed")
)

# 2. Go to bars more than once a month and are under the age of 30
condition_2 = (
    (Coupon_bar['Bar'].isin(['4~8', 'gt8', '1~3'])) &
    (Coupon_bar['age'].str.extract(r'(\d+)', expand=False).astype(int) < 30)
)

# 3. Go to cheap restaurants more than 4 times a month and income is less than 50K
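# income holds range labels rather than '<50K'; confirm the exact spellings with data['income'].unique()
low_income_labels = ['Less than $12500', '$12500 - $24999', '$25000 - $37499', '$37500 - $49999']
condition_3 = (
    (Coupon_bar['RestaurantLessThan20'].isin(['4~8', 'gt8'])) &
    (Coupon_bar['income'].isin(low_income_labels))
)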

#  OR logic
combined_conditions = condition_1 | condition_2 | condition_3

# Data set meeting one condition
filtered_drivers = Coupon_bar[combined_conditions]

# Filter for drivers not meeting the conditions
other_drivers = Coupon_bar[~combined_conditions]

# Calculate acceptance rates for each group
filtered_acceptance_rate = (filtered_drivers['Y'].sum() / len(filtered_drivers)) * 100
other_acceptance_rate = (other_drivers['Y'].sum() / len(other_drivers)) * 100
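
Each comparison in this section follows the same pattern: build a boolean mask, then compare the rows that match with the rows that do not. A small helper can capture that pattern. The function name `compare_acceptance` and the `toy` frame are hypothetical, introduced only for this sketch.

```python
import pandas as pd

def compare_acceptance(df, mask, target="Y"):
    """Return (acceptance %, row count) for rows matching mask and for all others."""
    mask = mask.astype(bool)
    inside, outside = df[mask], df[~mask]
    return {
        "matching": (inside[target].mean() * 100, len(inside)),
        "others": (outside[target].mean() * 100, len(outside)),
    }

# Hypothetical bar-coupon rows; values are invented.
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "gt8", "less1"],
    "passanger": ["Alone", "Kid(s)", "Friend(s)", "Alone"],
    "Y": [1, 0, 1, 0],
})

# Goes to bars at least once a month and has no kid passenger.
mask = toy["Bar"].isin(["1~3", "4~8", "gt8"]) & (toy["passanger"] != "Kid(s)")
result = compare_acceptance(toy, mask)
```

Returning the row counts alongside the rates guards against reading too much into a group with only a handful of rows.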


7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

Drivers who accepted bar coupons appear to be younger, socially active, frequent bar-goers. They seem more likely to accept when no kids are in the car. Frequent visitors to inexpensive restaurants who earn less than 50K also appear to accept at a higher rate. Taken together, this suggests that bar coupons appeal most to people whose lifestyle already includes social outings and who respond to cost-saving incentives.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

In [None]:
# Filter for Coffee House coupons 
Coupon_coffee = data_updated[data_updated['coupon'] == 'Coffee House']


# Proportion of Coffee House coupons accepted
total_coffee_coupons = len(Coupon_coffee)
accepted_coffee_coupons = Coupon_coffee['Y'].sum()
proportion_coffee_coupons_accepted = (accepted_coffee_coupons / total_coffee_coupons) * 100


# Low vs High Coffee House Visitors' Acceptance Rate
low_coffee_visitors = Coupon_coffee[Coupon_coffee['CoffeeHouse'].isin(['never', 'less1', '1~3'])]
high_coffee_visitors = Coupon_coffee[Coupon_coffee['CoffeeHouse'].isin(['4~8', 'gt8'])]
low_visitors_acceptance_rate = (low_coffee_visitors['Y'].sum() / len(low_coffee_visitors)) * 100
high_visitors_acceptance_rate = (high_coffee_visitors['Y'].sum() / len(high_coffee_visitors)) * 100

# Drivers over 25 who visit coffee houses frequently
frequent_drivers_over_25 = Coupon_coffee[
    (Coupon_coffee['CoffeeHouse'].isin(['4~8', 'gt8', '1~3'])) & 
    (Coupon_coffee['age'].str.extract(r'(\d+)', expand=False).astype(int) > 25)
]

other_drivers = Coupon_coffee[~Coupon_coffee.index.isin(frequent_drivers_over_25.index)]
frequent_over_25_acceptance_rate = (frequent_drivers_over_25['Y'].sum() / len(frequent_drivers_over_25)) * 100
others_acceptance_rate = (other_drivers['Y'].sum() / len(other_drivers)) * 100


# Drivers who visit coffee houses frequently and meet specific conditions
frequent_coffee_drivers_filtered = Coupon_coffee[
    (Coupon_coffee['CoffeeHouse'].isin(['4~8', 'gt8', '1~3'])) & 
    (Coupon_coffee['passanger'] != "Kid(s)") &
    # Match on 'Farming': the occupation label in the data is written without commas
    (~Coupon_coffee['occupation'].str.contains('Farming', na=False))
]

other_drivers_filtered = Coupon_coffee[~Coupon_coffee.index.isin(frequent_coffee_drivers_filtered.index)]
frequent_filtered_acceptance_rate = (frequent_coffee_drivers_filtered['Y'].sum() / len(frequent_coffee_drivers_filtered)) * 100
others_filtered_acceptance_rate = (other_drivers_filtered['Y'].sum() / len(other_drivers_filtered)) * 100


# Combined conditions for drivers
condition_1 = (
    (Coupon_coffee['CoffeeHouse'].isin(['4~8', 'gt8', '1~3'])) &
    (Coupon_coffee['passanger'] != "Kid(s)") &
    (Coupon_coffee['maritalStatus'] != "Widowed")
)
condition_2 = (
    (Coupon_coffee['CoffeeHouse'].isin(['4~8', 'gt8', '1~3'])) &
    (Coupon_coffee['age'].str.extract(r'(\d+)', expand=False).astype(int) < 30)
)
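# income holds range labels rather than '<50K'; confirm the exact spellings with data['income'].unique()
low_income_labels = ['Less than $12500', '$12500 - $24999', '$25000 - $37499', '$37500 - $49999']
condition_3 = (
    (Coupon_coffee['RestaurantLessThan20'].isin(['4~8', 'gt8'])) &
    (Coupon_coffee['income'].isin(low_income_labels))
)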

combined_conditions = condition_1 | condition_2 | condition_3
filtered_drivers = Coupon_coffee[combined_conditions]
other_drivers = Coupon_coffee[~combined_conditions]
filtered_acceptance_rate = (filtered_drivers['Y'].sum() / len(filtered_drivers)) * 100
other_acceptance_rate = (other_drivers['Y'].sum() / len(other_drivers)) * 100
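
To look at one attribute at a time, such as the passenger type, a normalized cross-tabulation gives the acceptance percentage per category. The sketch below uses invented rows; `toy` is a hypothetical stand-in for `Coupon_coffee`.

```python
import pandas as pd

# Hypothetical coffee-house coupon rows; values are invented.
toy = pd.DataFrame({
    "passanger": ["Alone", "Alone", "Friend(s)", "Friend(s)", "Kid(s)", "Kid(s)"],
    "Y": [1, 0, 1, 1, 0, 0],
})

# Row-normalized table: each row sums to 100, and column 1 is the acceptance rate.
rates = pd.crosstab(toy["passanger"], toy["Y"], normalize="index") * 100
acceptance_by_passenger = rates[1]
```

Swapping `passanger` for `age` or `income` would give the per-category rates needed to check a hypothesis about those attributes.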



Drivers who accepted Coffee House coupons appear to be younger and to visit coffee shops frequently. They seem more likely to accept when they are alone or with friends rather than with kids. Acceptance rates appear higher among those aged 21–26 and among individuals with incomes between \$12,500 and \$50,000. This suggests that Coffee House coupons appeal to socially active, budget-conscious individuals whose routine already includes regular coffee shop visits.