### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks each respondent whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 - \\$50 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [469]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [470]:
data = pd.read_csv('data/coupons.csv')

In [471]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 30)
data.head(10)

,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0


In [472]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

2. Investigate the dataset for missing or problematic data.

In [473]:
data.isna().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

In [474]:
data['car'].value_counts(dropna=False)

car
NaN                                         12576
Scooter and motorcycle                         22
Mazda5                                         22
do not drive                                   22
crossover                                      21
Car that is too old to install Onstar :D       21
Name: count, dtype: int64

3. Decide what to do about your missing data -- drop, replace, other...

In [475]:
# the car column is almost entirely NaN, it is useless
clean_data = data.drop(columns=['car'])
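
The decision to drop `car` rests on how much of the column is missing (12576 of 12684 rows, about 99%). As a rough sketch, the same check can be expressed as a fraction per column; the helper name `missing_share`, the toy data, and the 0.5 threshold are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)

# Toy stand-in for the survey data (made-up values).
toy = pd.DataFrame({
    'car': [np.nan, np.nan, np.nan, 'Mazda5'],
    'Bar': ['never', np.nan, '1~3', 'less1'],
    'Y': [1, 0, 1, 0],
})

share = missing_share(toy)
print(share)

# Columns above a chosen threshold are candidates for dropping.
to_drop = share[share > 0.5].index.tolist()
print(to_drop)
```

On the real frame this would be `missing_share(data)`. Columns with only a small share missing, such as `Bar`, are better handled by dropping rows per analysis than by dropping the column.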

In [476]:
clean_data['Bar'].value_counts()

Bar
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64

I'm seeing the value 'less than 1' in a lot of these columns, which at first glance looks the same as zero. But it is used consistently enough that I'm assuming the two are different, so I'm not going to combine them. The data itself doesn't say what period these counts cover. The prompts below talk about visits per month, so I'm going to assume a per-month average, which would mean 'less than 1' is something like once every two or three months.

In [477]:
clean_data['Bar'] = clean_data['Bar'].str.replace('gt8', '>8')
clean_data['Bar'] = clean_data['Bar'].str.replace('never', '0')
clean_data['Bar'] = clean_data['Bar'].str.replace('less1', '<1')
clean_data['Bar'] = clean_data['Bar'].str.replace('1~3', '1-3')
clean_data['Bar'] = clean_data['Bar'].str.replace('4~8', '4-8')
clean_data['Bar'].value_counts()

Bar
0      5197
<1     3482
1-3    2473
4-8    1076
>8      349
Name: count, dtype: int64

In [478]:
clean_data['CoffeeHouse'].value_counts()

CoffeeHouse
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: count, dtype: int64

In [479]:
ch = "CoffeeHouse"

clean_data[ch] = clean_data[ch].str.replace('gt8', '>8')
clean_data[ch] = clean_data[ch].str.replace('never', '0')
clean_data[ch] = clean_data[ch].str.replace('less1', '<1')
clean_data[ch] = clean_data[ch].str.replace('1~3', '1-3')
clean_data[ch] = clean_data[ch].str.replace('4~8', '4-8')
clean_data[ch].value_counts()

CoffeeHouse
<1     3385
1-3    3225
0      2962
4-8    1784
>8     1111
Name: count, dtype: int64
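
The same relabeling applies to all five visit-frequency columns, so it could be done in one pass. Below is a minimal sketch under some assumptions: the helper `relabel_frequencies` and the constants are hypothetical names, the function expects the raw survey codes (`never`, `less1`, ...), and it additionally stores the result as an ordered categorical so that sorting follows frequency rather than alphabetical order.

```python
import pandas as pd

# Raw survey code -> readable label, listed from least to most frequent.
FREQ_LABELS = {'never': '0', 'less1': '<1', '1~3': '1-3', '4~8': '4-8', 'gt8': '>8'}
FREQ_COLUMNS = ['Bar', 'CoffeeHouse', 'CarryAway',
                'RestaurantLessThan20', 'Restaurant20To50']

def relabel_frequencies(df: pd.DataFrame, columns=FREQ_COLUMNS) -> pd.DataFrame:
    """Return a copy whose frequency columns are relabeled, ordered categoricals.

    Expects the raw codes; missing values stay missing.
    """
    out = df.copy()
    dtype = pd.CategoricalDtype(list(FREQ_LABELS.values()), ordered=True)
    for col in columns:
        if col in out.columns:
            out[col] = out[col].map(FREQ_LABELS).astype(dtype)
    return out

# Toy data with made-up values.
toy = pd.DataFrame({
    'Bar': ['never', 'gt8', None, '1~3'],
    'CoffeeHouse': ['less1', '4~8', '1~3', 'never'],
    'Y': [0, 1, 1, 0],
})
clean = relabel_frequencies(toy)

print(clean['CoffeeHouse'].sort_values().tolist())
# Ordered categories also allow comparisons such as "at least 1-3".
print((clean['CoffeeHouse'] >= '1-3').sum())
```

With an ordered categorical, a later group-by on the column lists the groups in frequency order instead of the alphabetical order that plain strings produce.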

In [480]:
clean_data['age'].value_counts()

age
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: count, dtype: int64

In [481]:
# Map each raw age label to a readable range. Series.replace with a dict
# matches whole values, so 'below21' is not touched by the '21' rule and
# this cell is safe to re-run.
age_labels = {
    'below21': '<21',
    '21': '21-25',
    '26': '26-30',
    '31': '31-35',
    '36': '36-40',
    '41': '41-45',
    '46': '46-50',
    '50plus': '>50',
}
clean_data['age'] = clean_data['age'].replace(age_labels)
clean_data['age'].value_counts()

age
21-25    2653
26-30    2559
31-35    2039
>50      1788
36-40    1319
41-45    1093
46-50     686
<21       547
Name: count, dtype: int64

In [482]:
# correct the spelling of passenger
clean_data = clean_data.rename(columns = {'passanger' : 'passenger'})

4. What proportion of the total observations chose to accept the coupon? 



In [483]:
data.query('Y == 1')['Y'].count() / data['Y'].count()

0.5684326710816777
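
Because `Y` is coded 0/1, the proportion accepted is simply the column mean. A small sketch on made-up values showing that the count-based ratio, the mean, and a normalized value count all agree:

```python
import pandas as pd

y = pd.Series([1, 0, 1, 1, 0], name='Y')  # made-up 0/1 responses

by_count = (y == 1).sum() / len(y)        # accepted rows divided by all rows
by_mean = y.mean()                        # mean of a 0/1 column is the same ratio
shares = y.value_counts(normalize=True)   # share of each class at once

print(by_count, by_mean)
print(shares)
```

On the notebook's data this corresponds to `data['Y'].mean()`.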

5. Use a bar plot to visualize the `coupon` column.

In [502]:
coupon = data['coupon'].value_counts()
px.bar(coupon, y='count', title = 'Coupons Offered')

6. Use a histogram to visualize the temperature column.

In [504]:
px.histogram(clean_data, x='temperature', color='Y', title = 'Temperature at time of coupon offer')

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [486]:
barcoupon_data = clean_data.query('coupon == "Bar"')
barcoupon_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2017 entries, 9 to 12682
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           2017 non-null   object
 1   passenger             2017 non-null   object
 2   weather               2017 non-null   object
 3   temperature           2017 non-null   int64 
 4   time                  2017 non-null   object
 5   coupon                2017 non-null   object
 6   expiration            2017 non-null   object
 7   gender                2017 non-null   object
 8   age                   2017 non-null   object
 9   maritalStatus         2017 non-null   object
 10  has_children          2017 non-null   int64 
 11  education             2017 non-null   object
 12  occupation            2017 non-null   object
 13  income                2017 non-null   object
 14  Bar                   1996 non-null   object
 15  CoffeeHouse           1978 non-null   obje

2. What proportion of bar coupons were accepted?


In [487]:
print(barcoupon_data.query('Y == 1')['Y'].count() / barcoupon_data['Y'].count())

# same thing, double checking
barcoupon_data.groupby('coupon')[['Y']].mean()

0.41001487357461575


coupon,Y
Bar,0.410015


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [488]:
# first, drop nan values from bar column
df = barcoupon_data.dropna(subset = ['Bar'])

three_or_fewer = ['0', '<1', '1-3']
greater_than_three = ['4-8', '>8']

lte_3 = df.query('Bar in @three_or_fewer')[['Y']]
gt_3 = df.query('Bar in @greater_than_three')[['Y']]

print(lte_3.mean()['Y'])
print(gt_3.mean()['Y'])

0.37061769616026713
0.7688442211055276


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others. Is there a difference?


In [489]:
# drop nan from both Bar and age for this comparison
df = barcoupon_data.dropna(subset = ['Bar', 'age'])

more_than_once = ['1-3', '4-8', '>8']
over_25 = ['26-30', '31-35', '36-40', '41-45', '46-50', '>50']

more_than_once_and_over_25 = df.query('Bar in @more_than_once & age in @over_25')[['Y']]
others = df.query('Bar not in @more_than_once | age not in @over_25')[['Y']]

print(more_than_once_and_over_25.mean()['Y'])
print(others.mean()['Y'])

0.6952380952380952
0.3343908629441624


Yes, there is a difference. Roughly 70% acceptance versus 33%.
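
One optional way to ask whether a gap like this could be chance is a two-proportion z-test. The sketch below uses only the standard library; the function name `two_proportion_z` is hypothetical, and the counts in the example are made up for illustration, not taken from the survey.

```python
from math import erf, sqrt

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via erf; two-sided p-value is 2 * (1 - Phi(|z|)).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts, NOT from the survey: 70 of 100 versus 33 of 100.
z, p = two_proportion_z(70, 100, 33, 100)
print(z, p)
```

With the frames from the cell above, the inputs would be `int(more_than_once_and_over_25['Y'].sum())`, `len(more_than_once_and_over_25)`, and the same two quantities for `others`.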

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [490]:
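# NOTE: df (rows with missing values dropped) is built here, but the queries
# below run on barcoupon_data, so rows with a missing Bar value are counted
# in `others`.
df = barcoupon_data.dropna(subset = ['Bar', 'passenger'])

# "passengers that were not a kid" is read here as: has a passenger, and that
# passenger is not a kid, so drivers who are alone are excluded as well
kid_or_alone = ['Kid(s)', 'Alone']

condition1 = barcoupon_data.query('Bar in @more_than_once & passenger not in @kid_or_alone & occupation != "Farming Fishing & Forestry"')[['Y']]
others = barcoupon_data.query('Bar not in @more_than_once | passenger in @kid_or_alone | occupation == "Farming Fishing & Forestry"')[['Y']]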

print(condition1.mean()['Y'])
print(others.mean()['Y'])

0.717948717948718
0.3770581778265642


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [491]:
# I genuinely don't know what this question is asking me to do
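
One possible reading of this prompt is: build a mask for drivers who satisfy at least one of the three bullet conditions, then compare their acceptance rate with everyone else's. The sketch below is only that interpretation, with several assumptions: `any_of_three_groups` is a hypothetical helper; "under 30" is taken as the `<21`, `21-25` and `26-30` bands; "more than 4 times" is approximated by the raw codes `4~8` and `gt8` (this column was not relabeled above); the `Widowed` and income label strings are assumed spellings; and the toy rows are made up.

```python
import pandas as pd

def any_of_three_groups(df: pd.DataFrame) -> pd.Series:
    """Boolean mask: True where a row meets at least one of the three conditions."""
    bar_gt_once = df['Bar'].isin(['1-3', '4-8', '>8'])
    # Same reading as the previous problem: a passenger who is not a kid.
    non_kid_passenger = ~df['passenger'].isin(['Kid(s)', 'Alone'])
    not_widowed = df['maritalStatus'] != 'Widowed'
    under_30 = df['age'].isin(['<21', '21-25', '26-30'])
    cheap_gt_4 = df['RestaurantLessThan20'].isin(['4~8', 'gt8'])
    income_lt_50k = df['income'].isin([
        'Less than $12500', '$12500 - $24999',
        '$25000 - $37499', '$37500 - $49999',
    ])
    return ((bar_gt_once & non_kid_passenger & not_widowed)
            | (bar_gt_once & under_30)
            | (cheap_gt_4 & income_lt_50k))

# Made-up rows, one per interesting case.
toy = pd.DataFrame({
    'Bar': ['4-8', '1-3', '0', '0', '<1'],
    'passenger': ['Friend(s)', 'Alone', 'Kid(s)', 'Alone', 'Alone'],
    'maritalStatus': ['Single', 'Married partner', 'Married partner', 'Widowed', 'Single'],
    'age': ['31-35', '21-25', '41-45', '>50', '36-40'],
    'RestaurantLessThan20': ['less1', 'never', '4~8', '1~3', 'never'],
    'income': ['$100000 or More', '$62500 - $74999', '$25000 - $37499',
               '$87500 - $99999', '$37500 - $49999'],
    'Y': [1, 1, 0, 0, 0],
})

mask = any_of_three_groups(toy)
in_rate = toy.loc[mask, 'Y'].mean()
out_rate = toy.loc[~mask, 'Y'].mean()
print(in_rate, out_rate)
```

On the notebook's data the call would be `any_of_three_groups(barcoupon_data)`. Each of the three conditions could also be compared against the rest on its own if the prompt is read that way.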

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

Based on these few comparisons, it seems that drivers who go to bars more than once a month are far more likely to accept a bar coupon, particularly when they are over 25 or are travelling with a friend or partner rather than alone or with kids.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

I'm going to focus my analysis on the 'Coffee House' coupon group.

In [492]:
coffee_coupon_data = clean_data.query('coupon == "Coffee House"').dropna(subset=['CoffeeHouse'])
coffee_coupon_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3924 entries, 1 to 12681
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           3924 non-null   object
 1   passenger             3924 non-null   object
 2   weather               3924 non-null   object
 3   temperature           3924 non-null   int64 
 4   time                  3924 non-null   object
 5   coupon                3924 non-null   object
 6   expiration            3924 non-null   object
 7   gender                3924 non-null   object
 8   age                   3924 non-null   object
 9   maritalStatus         3924 non-null   object
 10  has_children          3924 non-null   int64 
 11  education             3924 non-null   object
 12  occupation            3924 non-null   object
 13  income                3924 non-null   object
 14  Bar                   3908 non-null   object
 15  CoffeeHouse           3924 non-null   obje

Let's start with the most obvious factor: the frequency with which the driver visits coffee houses.

In [505]:
fig = px.histogram(coffee_coupon_data, x=ch, color = 'Y', title = 'Histogram of frequency of Coffee House visits')
fig.show()
coffee_coupon_data.groupby(ch)[['Y']].mean()

CoffeeHouse,Y
0,0.188781
1-3,0.647793
4-8,0.685874
<1,0.48186
>8,0.657895


As expected, the frequency with which a driver visits a coffee house is correlated with accepting the coupon. It would seem drivers who go 1 or more times a month are more likely to accept.


In [495]:
once_or_more = ['1-3', '4-8', '>8']

once_or_more_df = coffee_coupon_data.query('CoffeeHouse in @once_or_more')[['Y']]
less_than_once_df = coffee_coupon_data.query('CoffeeHouse not in @once_or_more')[['Y']]

print(once_or_more_df.mean()['Y'])
print(less_than_once_df.mean()['Y'])

0.6602497398543185
0.34615384615384615


Indeed, drivers who go once or more a month are much more likely to accept the coupon.

I'm a bit skeptical that we will know this information for most drivers though, so I'm going to change gears a bit and focus exclusively on attributes that we are essentially guaranteed to know accurately at the time of sending the coupon:

- expiration
- weather
- time
- temperature

In [506]:
fig = px.histogram(coffee_coupon_data, x='expiration', color = 'Y', title = 'Histogram of coupon expiration dates')
fig.show()
coffee_coupon_data.groupby('expiration')[['Y']].mean()

expiration,Y
1d,0.584721
2h,0.432432


It would seem people are more likely to accept coupons that last longer, which makes sense. A coupon that stays valid longer is easier to use.

In [507]:
fig = px.histogram(coffee_coupon_data, x='weather', color = 'Y', title = 'Histogram of weather at time of coupon offer')
fig.show()
coffee_coupon_data.groupby('weather')[['Y']].mean()

weather,Y
Rainy,0.524664
Snowy,0.431973
Sunny,0.504256


This data is so skewed toward sunny, where acceptance is essentially 50%, that I'm not going to consider it any further. Hopefully temperature is more informative.
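
To judge how lopsided a grouping is, it helps to show the group size next to each acceptance rate. A minimal sketch on made-up rows, using named aggregation; the column names `acceptance_rate`, `n`, and `share_of_offers` are illustrative choices.

```python
import pandas as pd

# Made-up rows: most offers happen in sunny weather.
toy = pd.DataFrame({
    'weather': ['Sunny'] * 6 + ['Rainy'] * 2 + ['Snowy'] * 2,
    'Y':       [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
})

summary = (
    toy.groupby('weather')['Y']
       .agg(acceptance_rate='mean', n='count')
)
# Share of all offers that fall in each weather group.
summary['share_of_offers'] = summary['n'] / summary['n'].sum()
print(summary)
```

The same call on `coffee_coupon_data` would show whether the rainy and snowy rates rest on enough rows to be worth comparing.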

In [508]:
fig = px.histogram(coffee_coupon_data, x='temperature', color = 'Y', title = 'Histogram of temperature at time of coupon offer')
fig.show()
coffee_coupon_data.groupby('temperature')[['Y']].mean()

temperature,Y
30,0.444805
55,0.455706
80,0.530681


This is also a bit skewed, but it does seem that higher temperatures go with higher acceptance.

In [509]:
fig = px.histogram(coffee_coupon_data, x='time', color = 'Y', title = 'Histogram of times of coupon offer')
fig.show()
coffee_coupon_data.groupby('time')[['Y']].mean()

time,Y
10AM,0.639456
10PM,0.431507
2PM,0.549422
6PM,0.413471
7AM,0.445676


It would seem that the times around noon (10AM and 2PM) have the highest acceptance rates.

Let's start fine-tuning our queries here.

In [466]:
around_noon = ['10AM', '2PM']

around_noon_df = coffee_coupon_data.query('time in @around_noon')[['Y']]
others = coffee_coupon_data.query('time not in @around_noon')[['Y']]

print(around_noon_df.mean()['Y'])
print(others.mean()['Y'])

0.5972305839855508
0.42863455589924876


This is a pretty clear difference, so I'm going to stick with the around-noon condition moving forward.

Let's add in temperature.

In [467]:
around_noon_and_hot = coffee_coupon_data.query('time in @around_noon & temperature == 80')[['Y']]
around_noon_and_cold = coffee_coupon_data.query('time in @around_noon & temperature != 80')[['Y']]

print(around_noon_and_hot.mean()['Y'])
print(around_noon_and_cold.mean()['Y'])

0.5934618608549874
0.6068376068376068


Surprisingly, even though hot temperatures have better acceptance rates overall, when it's around noon, cold temperatures (30F or 55F) have slightly higher rates. The difference is basically negligible though, so I'm going to put this aside for now.

Let's look at expiration times now.

In [468]:
around_noon_and_1_day = coffee_coupon_data.query('time in @around_noon and expiration == "1d"')[['Y']]
around_noon_and_2_hours = coffee_coupon_data.query('time in @around_noon and expiration == "2h"')[['Y']]

print(around_noon_and_1_day.mean()['Y'])
print(around_noon_and_2_hours.mean()['Y'])

0.6202365308804205
0.5777777777777777


1-day expirations perform marginally better, as expected, but again, only by about 4 percentage points.

I'm going to look at the four combinations now:
- around noon and
    - hot and 1 day
    - hot and 2 hours
    - cold and 1 day
    - cold and 2 hours

In [428]:
around_noon_hot_1_day = coffee_coupon_data.query('time in @around_noon & temperature == 80 & expiration == "1d"')[['Y']]
around_noon_hot_2_hours = coffee_coupon_data.query('time in @around_noon & temperature == 80 & expiration == "2h"')[['Y']]
around_noon_cold_1_day = coffee_coupon_data.query('time in @around_noon & temperature != 80 & expiration == "1d"')[['Y']]
around_noon_cold_2_hours = coffee_coupon_data.query('time in @around_noon & temperature != 80 & expiration == "2h"')[['Y']]
                                                 
print("around noon + hot + 1 day: ", around_noon_hot_1_day.mean()['Y'])
print("around noon + hot + 2 hours: ", around_noon_hot_2_hours.mean()['Y'])
print("around noon + cold + 1 day: ", around_noon_cold_1_day.mean()['Y'])
print("around noon + cold + 2 hours: ", around_noon_cold_2_hours.mean()['Y'])

around noon + hot + 1 day:  0.5951903807615231
around noon + hot + 2 hours:  0.5857142857142857
around noon + cold + 1 day:  0.6628787878787878
around noon + cold + 2 hours:  0.5362318840579711


It is clear that the best performing scenario is when it is around noon, cold, and the coupon expires in 1 day. The difference between 66% and 59% or even 54% isn't a whole lot though. I wouldn't consider any of those numbers high; it's not even quite a 2/3 chance in the best scenario.

I'm going to combine these findings now with what I discovered earlier about customers who frequently visit coffee shops.
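
The four separate queries can also be written as a single group-by, which keeps the group sizes visible next to the rates. This is a sketch on made-up rows; the `temp_band` column and the aggregation names are hypothetical, and "hot" is taken to mean 80F as in the cell above.

```python
import pandas as pd

# Made-up rows standing in for coffee_coupon_data.
toy = pd.DataFrame({
    'time':        ['10AM', '2PM', '10AM', '2PM', '6PM', '10AM', '2PM', '10AM'],
    'temperature': [80, 80, 55, 30, 80, 80, 55, 30],
    'expiration':  ['1d', '2h', '1d', '2h', '1d', '1d', '1d', '2h'],
    'Y':           [1, 0, 1, 1, 0, 0, 1, 0],
})

around_noon = ['10AM', '2PM']
subset = toy[toy['time'].isin(around_noon)].copy()
# Label 80F as hot and everything else as cold.
subset['temp_band'] = subset['temperature'].eq(80).map({True: 'hot', False: 'cold'})

combos = (
    subset.groupby(['temp_band', 'expiration'])['Y']
          .agg(acceptance_rate='mean', n='count')
)
print(combos)
```

Seeing `n` for each cell matters here, since splitting on three attributes at once leaves far fewer rows per combination than any single split.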

In [501]:
once_or_more = ['1-3', '4-8', '>8']

once_or_more_df = coffee_coupon_data.query('CoffeeHouse in @once_or_more')

around_noon_hot_1_day = once_or_more_df.query('time in @around_noon & temperature == 80 & expiration == "1d"')[['Y']]
around_noon_hot_2_hours = once_or_more_df.query('time in @around_noon & temperature == 80 & expiration == "2h"')[['Y']]
around_noon_cold_1_day = once_or_more_df.query('time in @around_noon & temperature != 80 & expiration == "1d"')[['Y']]
around_noon_cold_2_hours = once_or_more_df.query('time in @around_noon & temperature != 80 & expiration == "2h"')[['Y']]
                                                 
print("once or more alone: ", once_or_more_df[['Y']].mean()['Y'])    
print("once or more: around noon + hot + 1 day: ", around_noon_hot_1_day.mean()['Y'])
print("once or more: around noon + hot + 2 hours: ", around_noon_hot_2_hours.mean()['Y'])
print("once or more: around noon + cold + 1 day: ", around_noon_cold_1_day.mean()['Y'])
print("once or more: around noon + cold + 2 hours: ", around_noon_cold_2_hours.mean()['Y'])

once or more alone:  0.6602497398543185
once or more: around noon + hot + 1 day:  0.7943548387096774
once or more: around noon + hot + 2 hours:  0.7604790419161677
once or more: around noon + cold + 1 day:  0.8536585365853658
once or more: around noon + cold + 2 hours:  0.6792452830188679


Adding in this data makes all of the acceptance rates jump quite a bit. Drivers who go to coffee shops once or more a month are fairly likely to accept a coupon (about 66%), and in three of the four around-noon combinations they are noticeably more likely still. The exception is cold with a 2-hour expiration, which at about 68% is barely above that baseline. The best case, at about 85%, is a 1-day coupon offered around noon when it is cold.

If we in fact know the frequency with which the driver visits coffee houses, we are very likely to have success offering 1 day coupons to drivers, and even more likely if we target specific times and temperatures. I would recommend this program if we can accurately know this information.

Absent the information about the frequency of coffee house visits, I would be much more hesitant to recommend the program. I'd have to look at the cost and benefits of the program very carefully before making a recommendation, but my initial position would be that offering coupons for coffee houses probably isn't worth it if we don't know anything about the customer's coffee house visiting habits.

As a final note, given how many different combinations there are on even a relatively small dataset such as this, the utility of machine learning algorithms that can quickly analyze large datasets is becoming overwhelmingly apparent.