### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning Repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks the person whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled ‘Y = 1’, and answers of ‘no, I do not want the coupon’ are labeled ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bars, and more expensive restaurants (\$20 - \$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data, you will use your knowledge of plotting, statistical summaries, and visualization in Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \$12500, \$12500 - \$24999, \$25000 - \$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \$20 to \$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [222]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [2]:
data = pd.read_csv('data/coupons.csv')

In [3]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

### First, I want the full list of columns and their data types. Notably, age and income are stored as range labels (strings) rather than integers.

In [5]:
data.dtypes

destination             object
passanger               object
weather                 object
temperature              int64
time                    object
coupon                  object
expiration              object
gender                  object
age                     object
maritalStatus           object
has_children             int64
education               object
occupation              object
income                  object
car                     object
Bar                     object
CoffeeHouse             object
CarryAway               object
RestaurantLessThan20    object
Restaurant20To50        object
toCoupon_GEQ5min         int64
toCoupon_GEQ15min        int64
toCoupon_GEQ25min        int64
direction_same           int64
direction_opp            int64
Y                        int64
dtype: object

### Next, I want to see which columns have NaN values. The car column is clearly not useful, since 12,576 of its values are null. In addition, a few other fields have roughly 100 to 200 NaN values each.

In [6]:
data.isnull().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64
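
Raw counts are easier to judge as a share of all rows. The sketch below uses a small made-up frame (`toy`, not the real coupon data) to show that taking the column mean of `isnull()` gives the fraction missing per column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the coupon data (values are made up).
toy = pd.DataFrame({
    "car": [np.nan, np.nan, np.nan, "Scooter"],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# isnull() returns a boolean frame; the mean of each column is the
# fraction of rows where that column is missing.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

A column that is missing in most rows, like `car` here, is a candidate for dropping entirely rather than dropping its rows.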

3. Decide what to do about your missing data -- drop, replace, other...

### I won't be using the Car column given most of its values are NaN. For other columns that have NaN values, I'll drop those rows. Given I have 12k+ rows in the table, I'll still have plenty of data to make my assessment after dropping those rows.

In [7]:
data = data[['destination', 'passanger', 'weather', 'temperature', 'time', 'coupon',
       'expiration', 'gender', 'age', 'maritalStatus', 'has_children',
       'education', 'occupation', 'income', 'Bar', 'CoffeeHouse',
       'CarryAway', 'RestaurantLessThan20', 'Restaurant20To50',
       'toCoupon_GEQ5min', 'toCoupon_GEQ15min', 'toCoupon_GEQ25min',
       'direction_same', 'direction_opp', 'Y']]

In [8]:
data = data.dropna()
data

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
22,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,1
23,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,0
24,No Urgent Place,Friend(s),Sunny,80,10AM,Bar,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,1
25,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Male,21,Single,...,less1,4~8,4~8,less1,1,1,0,0,1,0
26,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Male,21,Single,...,less1,4~8,4~8,less1,1,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12679,Home,Partner,Rainy,55,6PM,Carry out & Take away,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,0,0,1,0,1
12680,Work,Alone,Rainy,55,7AM,Carry out & Take away,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,0,0,0,1,1
12681,Work,Alone,Snowy,30,7AM,Coffee House,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,0,0,1,0,0
12682,Work,Alone,Snowy,30,7AM,Bar,1d,Male,26,Single,...,never,1~3,4~8,1~3,1,1,1,0,1,0
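
The order of the two steps matters: if `dropna()` ran while the mostly-empty column was still present, nearly every row would be discarded. A minimal sketch on made-up data (the `toy` frame is an assumption, not the real file), using `drop(columns=...)` as a shorter alternative to listing every column to keep:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the coupon data (values are made up).
toy = pd.DataFrame({
    "car": [np.nan, np.nan, np.nan, "Scooter"],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# Drop the mostly-empty column first, then drop rows with any remaining NaN.
cleaned = toy.drop(columns=["car"]).dropna()
rows_lost = len(toy) - len(cleaned)

# For comparison: dropping rows without removing 'car' first.
rows_left_if_car_kept = len(toy.dropna())

print(cleaned.shape, rows_lost, rows_left_if_car_kept)
```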


4. What proportion of the total observations chose to accept the coupon? 



### Based on the calculation below, 57% of drivers accepted the coupon across all coupon types.


In [135]:
portion_accept = len(data.query('Y == 1'))/len(data['Y'])
print("Proportion of observations that chose to accept the coupon is " + str(portion_accept))

Proportion of observations that chose to accept the coupon is 0.5693352098683666
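
Because `Y` is coded as 0 or 1, its mean is the same proportion. A small sketch on made-up values (the `toy` frame is hypothetical) showing that both approaches agree:

```python
import pandas as pd

# Toy 0/1 outcome column (values are made up).
toy = pd.DataFrame({"Y": [1, 0, 1, 1]})

# Counting accepted rows and dividing by the total...
ratio_by_count = len(toy.query("Y == 1")) / len(toy)

# ...gives the same number as the mean of a 0/1 column.
ratio_by_mean = toy["Y"].mean()

print(f"{ratio_by_mean:.1%}")
```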


5. Use a bar plot to visualize the `coupon` column.

### I'll import Plotly Express to draw the bar chart

In [237]:
import plotly.express as px 
px.bar(data, x = 'coupon')

In [140]:
data['coupon'].value_counts()

Coffee House             3816
Restaurant(<20)          2653
Carry out & Take away    2280
Bar                      1913
Restaurant(20-50)        1417
Name: coupon, dtype: int64
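
For plotting, it can help to turn these counts into a tidy two-column table first. The sketch below does that on made-up data; the column name `count` is my own choice, not something in the dataset.

```python
import pandas as pd

# Toy coupon column (values are made up).
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Coffee House", "Bar", "Coffee House"],
})

# value_counts() sorts by frequency; rename_axis/reset_index turn the
# result into a regular frame with explicit 'coupon' and 'count' columns.
counts = (
    toy["coupon"]
    .value_counts()
    .rename_axis("coupon")
    .reset_index(name="count")
)
print(counts)
```

A table shaped like this can then be passed to a bar chart with one bar per coupon type, for example `px.bar(counts, x="coupon", y="count")`.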

6. Use a histogram to visualize the temperature column.

In [136]:
px.histogram(data['temperature'])

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [239]:
bar_data = data.query('coupon == "Bar"')

2. What proportion of bar coupons were accepted?


### based on the math below, the portion is 41%

In [142]:
portion_bar_accept = len(bar_data.query('Y == 1'))/len(bar_data)
print ("Portion of Bar Coupons accepted is " + str(portion_bar_accept))

Portion of Bar Coupons accepted is 0.41191845269210664


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


### Based on the calculation below, the acceptance rate is 76% for those who went to a bar more than 3 times a month and 37% for those who went 3 or fewer times.

In [37]:
data['Bar'].unique() ##first, I want to understand the different possible values

array(['never', 'less1', '1~3', 'gt8', '4~8'], dtype=object)

In [240]:
three_or_fewer = ['never', 'less1', '1~3']


bar_accept_3orless = len(bar_data.query('Bar in @three_or_fewer & Y == 1'))/len(bar_data.query('Bar in @three_or_fewer'))
bar_accept_more_three = len(bar_data.query('Bar not in @three_or_fewer  & Y == 1'))/len(bar_data.query('Bar not in @three_or_fewer'))

print("Acceptance rate of those who went to bar 3 or fewer per month is " + str(bar_accept_3orless))
print("Acceptance rate of those who went to bar more than 3 times per month is " + str(bar_accept_more_three))

Acceptance rate of those who went to bar 3 or fewer per month is 0.37267441860465117
Acceptance rate of those who went to bar more than 3 times per month is 0.7616580310880829
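
The same comparison can be written as a single group-by, which avoids repeating each query twice. This is only a sketch on made-up rows; the group labels are my own.

```python
import pandas as pd

# Toy bar-coupon rows (values are made up).
toy = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", "4~8"],
    "Y": [0, 1, 0, 1, 1, 0],
})

three_or_fewer = ["never", "less1", "1~3"]

# Label each row by visit frequency, then average the 0/1 outcome per label.
group = toy["Bar"].isin(three_or_fewer).map(
    {True: "3 or fewer", False: "more than 3"}
)
rates = toy.groupby(group)["Y"].mean()
print(rates)
```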


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


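### Based on the calculation below, the acceptance rate is 69% for those who go to a bar more than once a month and are over 25, compared to 39% for those who go less than once a month and are 25 or younger. Note that this second group is narrower than "all others": it leaves out drivers who meet only one of the two conditions.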

In [243]:
data['age'].unique()

array(['21', '46', '26', '31', '41', '50plus', '36', 'below21'],
      dtype=object)

In [244]:
more_than_once = ['1~3', 'gt8', '4~8']
over_25 = ['46', '26', '31', '41', '50plus', '36']

more_than_once_over_25 = len(bar_data.query('Bar in @more_than_once & age in @over_25 & Y == 1'))/len(bar_data.query('Bar in @more_than_once & age in @over_25'))

# Note: this selects drivers who meet neither condition, which is narrower than the full complement of the group above.
neither_condition = len(bar_data.query('Bar not in @more_than_once & age not in @over_25 & Y == 1'))/len(bar_data.query('Bar not in @more_than_once & age not in @over_25'))

print("The acceptance rate for those who go to a bar more than once a month and are over 25 is " + str(more_than_once_over_25))
print("The acceptance rate for those who meet neither condition is " + str(neither_condition))


The acceptance rate for those who go to a bar more than once a month and are over 25 is 0.6898263027295285
The acceptance rate for those who meet neither condition is 0.3883495145631068
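
One caveat about the comparison group: negating each condition separately (`not in ... & not in ...`) selects drivers who meet neither condition, which is not the same as "everyone not in the first group". A sketch on made-up rows showing the difference with boolean masks, where `~in_group` is the true complement:

```python
import pandas as pd

# Toy bar-coupon rows (values are made up).
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "4~8", "never", "less1"],
    "age": ["26", "31", "21", "21", "below21"],
    "Y": [1, 0, 1, 0, 1],
})

more_than_once = ["1~3", "4~8", "gt8"]
over_25 = ["26", "31", "36", "41", "46", "50plus"]

# Target group: frequent bar-goers who are also over 25.
in_group = toy["Bar"].isin(more_than_once) & toy["age"].isin(over_25)

# True complement: every row that is not in the target group.
group_rate = toy.loc[in_group, "Y"].mean()
others_rate = toy.loc[~in_group, "Y"].mean()

# Negating each condition separately keeps only rows failing both.
neither = ~toy["Bar"].isin(more_than_once) & ~toy["age"].isin(over_25)

print(group_rate, others_rate, int((~in_group).sum()), int(neither.sum()))
```

In this toy frame the complement has four rows while the "neither" selection has only two, so the two rates can differ.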


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


### The acceptance rate is 71% for those who go to a bar more than once a month, had passengers that were not kids, and had occupations other than farming, fishing, or forestry.

In [88]:
data['passanger'].unique()

array(['Alone', 'Friend(s)', 'Kid(s)', 'Partner'], dtype=object)

In [111]:
data['occupation'].value_counts()

Unemployed                                   1814
Student                                      1497
Computer & Mathematical                      1368
Sales & Related                              1072
Education&Training&Library                    855
Management                                    772
Office & Administrative Support               617
Arts Design Entertainment Sports & Media      564
Business & Financial                          516
Retired                                       473
Food Preparation & Serving Related            276
Healthcare Support                            242
Healthcare Practitioners & Technical          222
Legal                                         219
Community & Social Services                   219
Transportation & Material Moving              218
Protective Service                            175
Architecture & Engineering                    175
Life Physical Social Science                  170
Construction & Extraction                     154


In [145]:
nokid_nofarmer_acceptance = len (bar_data.query('Bar in @more_than_once & passanger in ["Friend(s)","Partner"] & occupation not in ["Farming Fishing & Forestry"] & Y == 1'))/len (bar_data.query('Bar in @more_than_once & passanger in ["Friend(s)","Partner"] & occupation not in ["Farming Fishing & Forestry"] '))
print ("Acceptance Rate for drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry is " + str(nokid_nofarmer_acceptance))


Acceptance Rate for drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry is 0.7142857142857143


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



a) go to bars more than once a month, had passengers that were not a kid, and were not widowed

### The acceptance rate is 71% for those who go to a bar more than once a month, had passengers that were not kids, and were not widowed.

In [131]:
data['maritalStatus'].unique()

array(['Single', 'Married partner', 'Unmarried partner', 'Divorced',
       'Widowed'], dtype=object)

In [146]:
nokid_notwidow_acceptance = len (bar_data.query('Bar in @more_than_once & passanger in ["Friend(s)","Partner"] & maritalStatus not in ["Widowed"] & Y == 1'))/len (bar_data.query('Bar in @more_than_once & passanger in ["Friend(s)","Partner"] & maritalStatus not in ["Widowed"] '))
print ("Acceptance rate for drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed is " + str(nokid_notwidow_acceptance))


Acceptance rate for drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed is 0.7142857142857143


b) go to bars more than once a month and are under the age of 30

### The acceptance rate is 72% for those who are under 30 and go to a bar more than once a month.

In [147]:
under_30 = ['21', '26', 'below21']

More_than_once_less30 = len (bar_data.query('Bar in @more_than_once & age in @under_30 & Y == 1'))/len (bar_data.query('Bar in @more_than_once & age in @under_30'))
print('Acceptance for drivers that go to bars more than once a month and are under the age of 30 is ' + str(More_than_once_less30))

Acceptance for drivers that go to bars more than once a month and are under the age of 30 is 0.7195121951219512


c) go to cheap restaurants more than 4 times a month and income is less than 50K

### The acceptance rate is only 46% among lower earners (making less than 50k) who frequently dine at lower-end restaurants.

In [120]:
cheap_more_than4 = ['4~8', 'gt8']

In [122]:
income_less50k = ['$12500 - $24999', '$37500 - $49999', '$25000 - $37499','Less than $12500']

In [148]:
cheap_grt4_income_less50k = len(bar_data.query('RestaurantLessThan20 in @cheap_more_than4 & income in @income_less50k & Y ==1'))/len(bar_data.query('RestaurantLessThan20 in @cheap_more_than4 & income in @income_less50k'))
print("Acceptance for drivers who go to cheap restaurants more than 4 times a month and income is less than 50K is " + str(cheap_grt4_income_less50k))

Acceptance for drivers who go to cheap restaurants more than 4 times a month and income is less than 50K is 0.45645645645645644
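
The prompt joins the three conditions with *OR*, so another way to read it is as one combined group compared against the rest. The sketch below builds each condition as a boolean mask on made-up rows and combines them with `|`; the frame and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy bar-coupon rows (values are made up).
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "4~8", "never"],
    "passanger": ["Friend(s)", "Kid(s)", "Alone", "Alone"],
    "maritalStatus": ["Single", "Widowed", "Single", "Married partner"],
    "age": ["31", "41", "21", "46"],
    "RestaurantLessThan20": ["1~3", "4~8", "less1", "never"],
    "income": ["$75000 - $87499", "$25000 - $37499",
               "$100000 or More", "$50000 - $62499"],
    "Y": [1, 0, 1, 0],
})

frequent_bar = toy["Bar"].isin(["1~3", "4~8", "gt8"])
income_less50k = ["Less than $12500", "$12500 - $24999",
                  "$25000 - $37499", "$37500 - $49999"]

# One mask per condition in the prompt.
cond_a = (frequent_bar
          & toy["passanger"].isin(["Friend(s)", "Partner"])
          & (toy["maritalStatus"] != "Widowed"))
cond_b = frequent_bar & toy["age"].isin(["below21", "21", "26"])
cond_c = (toy["RestaurantLessThan20"].isin(["4~8", "gt8"])
          & toy["income"].isin(income_less50k))

# A row belongs to the combined group if it satisfies any condition.
any_cond = cond_a | cond_b | cond_c
rate_any = toy.loc[any_cond, "Y"].mean()
rate_rest = toy.loc[~any_cond, "Y"].mean()
print(rate_any, rate_rest)
```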


7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

### Factors that appear to be associated with accepting the bar coupon are: 1) income, 2) whether they eat at cheap restaurants often (which is likely correlated with income), 3) whether they have a friend or partner in the car rather than a kid, and 4) how often they go to a bar.
### Surprisingly, age does not seem to matter much: among frequent bar-goers, those under 30 (72%) and those over 25 (69%) have very close acceptance rates.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

I will be looking at the Restaurant (20-50) coupons. I will refer to this category as high-end restaurants throughout this exercise. As in the previous problems, I'll pose specific questions, find the answers, and draw conclusions about the type of customer who is more likely to accept the coupons.

What proportion of the high-end restaurant coupons were accepted?

### The overall acceptance rate for 20-50 restaurants is 45%

In [161]:
highend_restaurant_data = data.query('coupon == "Restaurant(20-50)"')
hrd = highend_restaurant_data  # short alias for the data frame that only includes the 20-50 restaurant coupons


In [162]:
highend_restaurant_acceptance = len(hrd.query('Y==1'))/len(hrd)
print("Acceptance rate of high-end restaurant coupons is " + str(highend_restaurant_acceptance))

Acceptance rate of high-end restaurant coupons is 0.44601270289343686


How do the acceptance rates of single, widowed, divorced, and partnered drivers compare?

### The acceptance rate is higher among singles (47%) and those with a partner (44%) than among divorced (35%) and widowed (21%) drivers.

In [169]:
single_acceptance = len(hrd.query('maritalStatus == "Single" & Y == 1'))/len(hrd.query('maritalStatus == "Single"'))
widowed_acceptance = len(hrd.query('maritalStatus == "Widowed" & Y == 1'))/len(hrd.query('maritalStatus == "Widowed"'))
divorced_acceptance = len(hrd.query('maritalStatus == "Divorced" & Y == 1'))/len(hrd.query('maritalStatus == "Divorced"'))
partnered_acceptance = len(hrd.query('maritalStatus in ["Married partner", "Unmarried partner"] & Y == 1'))/len(hrd.query('maritalStatus in ["Married partner", "Unmarried partner"]'))
print("Acceptance rate for high-end restaurants among singles is " + str(single_acceptance))
print("Acceptance rate for high-end restaurants among widowed is " + str(widowed_acceptance))
print("Acceptance rate for high-end restaurants among divorced is " + str(divorced_acceptance))
print("Acceptance rate for high-end restaurants among those with a partner is " + str(partnered_acceptance))


Acceptance rate for high-end restaurants among singles is 0.47368421052631576
Acceptance rate for high-end restaurants among widowed is 0.21428571428571427
Acceptance rate for high-end restaurants among divorced is 0.3548387096774194
Acceptance rate for high-end restaurants among those with a partner is 0.4379746835443038
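
When comparing several categories at once, a group-by can return the rate together with the group size, which helps flag rates that rest on very few rows. A minimal sketch on made-up data; the column names `rate` and `n` are my own.

```python
import pandas as pd

# Toy high-end restaurant rows (values are made up).
toy = pd.DataFrame({
    "maritalStatus": ["Single", "Single", "Widowed",
                      "Divorced", "Married partner", "Single"],
    "Y": [1, 0, 0, 1, 1, 1],
})

# Named aggregation: acceptance rate and number of rows per category.
summary = (
    toy.groupby("maritalStatus")["Y"]
    .agg(rate="mean", n="count")
    .sort_values("rate", ascending=False)
)
print(summary)
```

A category with a small `n` can show an extreme rate by chance, so it is worth reading the two columns together.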


How do the acceptance rates of male and female drivers compare?

### The acceptance rate is higher among males (46%) than females (43%).

In [170]:
hrd['gender'].unique()

array(['Male', 'Female'], dtype=object)

In [171]:
male_acceptance = len(hrd.query('gender == "Male" & Y == 1'))/len(hrd.query('gender == "Male"'))

In [172]:
female_acceptance = len(hrd.query('gender == "Female" & Y == 1'))/len(hrd.query('gender == "Female"'))

In [173]:
print("Acceptance rate for high-end restaurants among Males is " + str(male_acceptance))
print("Acceptance rate for high-end restaurants among Females is " + str(female_acceptance))


Acceptance rate for high-end restaurants among Males is 0.46448863636363635
Acceptance rate for high-end restaurants among Females is 0.4277699859747546


How does the acceptance rate of those over 30 compare to everyone else?

### Acceptance is higher among drivers in the age groups up to 30 (47%) than among those over 30 (43%).

In [174]:
hrd['age'].unique()

array(['21', '46', '26', '31', '41', '50plus', '36', 'below21'],
      dtype=object)

In [175]:
below30 = ['21', '26', 'below21']
over30 = ['46', '31', '41', '50plus', '36']

In [177]:
below30_acceptance = len(hrd.query('age in @below30 & Y == 1'))/len(hrd.query('age in @below30'))
over30_acceptance = len(hrd.query('age in @over30 & Y == 1'))/len(hrd.query('age in @over30'))

In [178]:
print("Acceptance rate for high-end restaurants among those below 30 is " + str(below30_acceptance))
print("Acceptance rate for high-end restaurants among those above 30 is " + str(over30_acceptance))

Acceptance rate for high-end restaurants among those below 30 is 0.46895424836601307
Acceptance rate for high-end restaurants among those above 30 is 0.42857142857142855


Compare the acceptance rate of those above 50 to everyone else?

In [179]:
below50_acceptance = len(hrd.query('age != "50plus" & Y == 1'))/len(hrd.query('age != "50plus"'))
over50_acceptance = len(hrd.query('age == "50plus" & Y == 1'))/len(hrd.query('age == "50plus"'))

In [180]:
print("Acceptance rate for high-end restaurants among those below 50 is " + str(below50_acceptance))
print("Acceptance rate for high-end restaurants among those above 50 is " + str(over50_acceptance))

Acceptance rate for high-end restaurants among those below 50 is 0.46256239600665555
Acceptance rate for high-end restaurants among those above 50 is 0.35348837209302325


In [200]:
px.histogram(hrd['age'], color = hrd['Y'])
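
Since the age labels are strings, they do not sort in age order by default. One option, sketched here on made-up rows, is to convert the column to an ordered categorical so that group-bys and plots follow the natural order. The `age_order` list is taken from the labels shown by `unique()` above.

```python
import pandas as pd

# Age labels in their natural order.
age_order = ["below21", "21", "26", "31", "36", "41", "46", "50plus"]

# Toy rows (values are made up).
toy = pd.DataFrame({
    "age": ["50plus", "21", "below21", "26", "21"],
    "Y": [0, 1, 1, 0, 0],
})

# An ordered categorical makes sorting and grouping respect age_order.
toy["age"] = pd.Categorical(toy["age"], categories=age_order, ordered=True)

# observed=True keeps only the age groups that actually appear.
rate_by_age = toy.groupby("age", observed=True)["Y"].mean()
print(rate_by_age)
```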

Compare the acceptance rate of those making over 75k to those making less than 75k?

### The income data didn't follow a logical trend (46% for those making above 75k versus 44% for those below), so income isn't a good candidate for predicting acceptance.

In [181]:
hrd['income'].unique()

array(['$62500 - $74999', '$12500 - $24999', '$75000 - $87499',
       '$50000 - $62499', '$37500 - $49999', '$25000 - $37499',
       '$100000 or More', '$87500 - $99999', 'Less than $12500'],
      dtype=object)

In [197]:
above75k = ['$75000 - $87499', '$87500 - $99999', '$100000 or More']

In [198]:
below75k_acceptance = len(hrd.query('income not in @above75k & Y == 1'))/len(hrd.query('income not in @above75k'))
over75k_acceptance = len(hrd.query('income in @above75k & Y == 1'))/len(hrd.query('income in @above75k'))

In [199]:
print("Acceptance rate for high-end restaurants among those making below 75k is " + str(below75k_acceptance))
print("Acceptance rate for high-end restaurants among those making above 75k is " + str(over75k_acceptance))

Acceptance rate for high-end restaurants among those making below 75k is 0.4388560157790927
Acceptance rate for high-end restaurants among those making above 75k is 0.4640198511166253


In [196]:
px.histogram(hrd['income'], color = hrd['Y'])
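
Typing the bracket labels by hand is easy to get wrong. As an alternative, the lower bound of each bracket can be parsed from the label and compared against a threshold. This is a sketch that assumes the labels look exactly like the ones listed by `unique()` above; `lower_bound` is a hypothetical helper, and the rows are made up.

```python
import pandas as pd


def lower_bound(label: str) -> int:
    """Return the lower dollar bound encoded in an income label."""
    if label.startswith("Less than"):
        return 0
    # Labels such as '$62500 - $74999' or '$100000 or More' start
    # with the lower bound, so take the first token and strip the '$'.
    return int(label.split(" ")[0].replace("$", ""))


# Toy rows (values are made up).
toy = pd.DataFrame({
    "income": ["$62500 - $74999", "$75000 - $87499",
               "$100000 or More", "Less than $12500"],
    "Y": [0, 1, 1, 0],
})

# Flag rows whose bracket starts at 75k or higher.
above_75k = toy["income"].map(lower_bound) >= 75000
rate_above = toy.loc[above_75k, "Y"].mean()
rate_below = toy.loc[~above_75k, "Y"].mean()
print(rate_above, rate_below)
```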

How do the acceptance rates of those with and without kids compare?

### The acceptance rate is higher among those who don't have kids (48% versus 41%).

In [207]:
have_kids_acceptance = len(hrd.query('has_children == 1 and Y ==1'))/len(hrd.query('has_children == 1'))
no_kids_acceptance = len(hrd.query('has_children == 0 and Y ==1'))/len(hrd.query('has_children == 0'))
have_kids_acceptance

0.4078303425774878

In [209]:
print ("Acceptance rate of high-end restaurants for those with kids is " + str(have_kids_acceptance))
print ("Acceptance rate of high-end restaurants for those with no kids is " + str(no_kids_acceptance))


Acceptance rate of high-end restaurants for those with kids is 0.4078303425774878
Acceptance rate of high-end restaurants for those with no kids is 0.47512437810945274


How does the acceptance rate differ depending on how often the driver goes to high-end restaurants?

### As expected, those who go to high-end restaurants at least occasionally have a higher acceptance rate than those who never go.

In [211]:
px.histogram(hrd["Restaurant20To50"], color = hrd['Y'])

How does the acceptance rate differ depending on how often the driver goes to lower-end restaurants?

### There was no evident trend here, so this information is not well suited to predicting acceptance.

In [212]:
px.histogram(hrd["RestaurantLessThan20"], color = hrd['Y'])

What difference does education make in accepting the coupons?

### There doesn't appear to be a trend here, so education is not a good criterion for predicting acceptance.

In [214]:
px.histogram(hrd["education"], color = hrd['Y'])

### Based on all the criteria examined, customers who have no children, are not divorced or widowed, are male, are 30 or under, and go to high-end restaurants at least occasionally are the best candidates for this coupon. Applying these criteria raises the acceptance rate from 45% to about 51%.

Find the acceptance rate for people who have no children, are not divorced or widowed, are male, are 30 or under, and go to high-end restaurants at least occasionally (any answer other than "never")

In [None]:
not_divorced_or_widowed = ["Married partner", "Unmarried partner", "Single"]
below30 = ['21', '26', 'below21']

In [219]:
# Target segment: no children, not divorced or widowed, male, 30 or under, and not a "never" visitor of 20-50 restaurants
target = hrd.query('has_children == 0 & maritalStatus in @not_divorced_or_widowed and gender == "Male" and age in @below30 and Restaurant20To50 != "never"')
len(target.query('Y == 1'))/len(target)

0.5058823529411764