### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaraunt near where you are driving. Would you accept that coupon and take a short detour to the restaraunt? Would you accept the coupon but use it on a sunbsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaraunt? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then ask the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’.  There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50). 

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons.  To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece. 





### Data Description

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [1112]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [1113]:
data = pd.read_csv('data/coupons.csv')

1.1 Analyze the table using head and info commands.

In [1114]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0


In [1115]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

2. Investigate the dataset for missing or problematic data.

In [1116]:
data.isna().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

3. Decide what to do about your missing data -- drop, replace, other...

The car column has a lot of missing data. Let us drop the column.

In [1117]:
data_cleaned =  data.drop('car',axis = 1)

The other columns with missing data do not have a 
lot of missing values. Let us fill missing values with the most common value.

In [1118]:
data_cleaned.Bar.value_counts()


never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: Bar, dtype: int64

In [1119]:
data_cleaned.Bar.fillna(value = 'never', inplace = True)

In [1120]:
data_cleaned.CarryAway.value_counts()

1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: CarryAway, dtype: int64

In [1121]:
data_cleaned.CarryAway.fillna(value = '1~3', inplace = True)

In [1122]:
data_cleaned.CoffeeHouse.value_counts()

less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: CoffeeHouse, dtype: int64

In [1123]:
data_cleaned.CoffeeHouse.fillna(value = 'less1', inplace = True)

In [1124]:
data_cleaned.RestaurantLessThan20.value_counts()

1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: RestaurantLessThan20, dtype: int64

In [1125]:
data_cleaned.RestaurantLessThan20.fillna(value = '1~3', inplace = True)

In [1126]:
data_cleaned.Restaurant20To50.value_counts()

less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: Restaurant20To50, dtype: int64

In [1127]:
data_cleaned.Restaurant20To50.fillna(value = 'less1', inplace = True)

Validate if the data no longer has any missing data.

In [1128]:
data_cleaned.isna().sum()

destination             0
passanger               0
weather                 0
temperature             0
time                    0
coupon                  0
expiration              0
gender                  0
age                     0
maritalStatus           0
has_children            0
education               0
occupation              0
income                  0
Bar                     0
CoffeeHouse             0
CarryAway               0
RestaurantLessThan20    0
Restaurant20To50        0
toCoupon_GEQ5min        0
toCoupon_GEQ15min       0
toCoupon_GEQ25min       0
direction_same          0
direction_opp           0
Y                       0
dtype: int64

4. What proportion of the total observations chose to accept the coupon? 



In [1129]:
total_acceptance = data_cleaned.loc[data_cleaned['Y'] == 1].shape[0] / data_cleaned.shape[0]
print('Total acceptance of coupons', "{0:.0%}".format(total_acceptance))

Total acceptance of coupons 57%


5. Use a bar plot to visualize the `coupon` column.

In [1130]:
fig = px.histogram(data_cleaned, x='coupon', color="Y",
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, title="Histogram of coupons colored by acceptance")
fig.show()

6. Use a histogram to visualize the temperature column.

6.1 Check unique temperature values to determine bin size

In [1131]:
data_cleaned.temperature.unique()

array([55, 80, 30])

In [1132]:
fig = px.histogram(data_cleaned, x='temperature', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, title="Histogram of temperatures colored by acceptance")
fig.show()

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [1133]:
data_bar_coupons =  data_cleaned.loc[data_cleaned['coupon'] == 'Bar']

2. What proportion of bar coupons were accepted?


In [1134]:
acceptance_rate_bar_coupons = data_bar_coupons.loc[data_bar_coupons['Y'] == 1].shape[0] / data_bar_coupons.shape[0]
print('Acceptance rate of bar coupons ', "{0:.0%}".format(acceptance_rate_bar_coupons))

Acceptance rate of bar coupons  41%


In [1135]:
px.pie(data_bar_coupons['Y'] == 1, names = "Y", 
       title='Acceptance percentage of bar coupons')

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [1136]:
data_accepted_bar_coupons = data_bar_coupons.loc[data_bar_coupons['Y'] == 1]
total_bar_coupons = data_bar_coupons.shape[0]
total_accepted_bar_coupons = data_bar_coupons.loc[data_bar_coupons['Y'] == 1].shape[0]

data_bar_coupons_lessthan4 = data_bar_coupons.loc[data_bar_coupons['Bar'].isin(['never','less1','1~3'])]

data_accepted_bar_coupons_lessthan4 = data_bar_coupons_lessthan4.loc[data_bar_coupons_lessthan4["Y"] == 1]

total_accepted_bar_coupons_lessthan4 = \
data_accepted_bar_coupons_lessthan4.shape[0] / data_bar_coupons_lessthan4.shape[0]

data_bar_coupons_4ormore = data_bar_coupons.loc[data_bar_coupons['Bar'].isin(['4~8','gt8'])]

data_accepted_bar_coupons_4ormore = data_bar_coupons_4ormore.loc[data_bar_coupons_4ormore['Y'] == 1]

total_accepted_bar_coupons_4ormore = \
data_accepted_bar_coupons_4ormore.shape[0] / data_bar_coupons_4ormore.shape[0]

print('Acceptance rate - went to bar three or fewer times ', "{0:.0%}".format(total_accepted_bar_coupons_lessthan4))
print('Acceptance rate - went to bar more than three times', "{0:.0%}".format(total_accepted_bar_coupons_4ormore))


Acceptance rate - went to bar three or fewer times  37%
Acceptance rate - went to bar more than three times 77%


In [1137]:
branches = ['Accepted' , 'Not Accepted']
ratio_bar_coupons_lessthan4 = data_accepted_bar_coupons_lessthan4.shape[0] / data_bar_coupons_lessthan4.shape[0]
trace1 = go.Bar(
   x = branches,
   y = [ratio_bar_coupons_lessthan4,1-ratio_bar_coupons_lessthan4],
   name = 'Went to bar three or fewer times'
)
ratio_bar_coupons_4ormore = data_accepted_bar_coupons_4ormore.shape[0] / data_bar_coupons_4ormore.shape[0]
trace2 = go.Bar(
   x = branches,
   y =  [ratio_bar_coupons_4ormore, 1-ratio_bar_coupons_4ormore],
   name = 'Went to bar more than three times'
)
data = [trace1, trace2]
layout = go.Layout(barmode = 'group', title = "Compare ratios of bar coupon acceptance by number of visits per month")
fig = go.Figure(data = data, layout = layout)
fig.show()

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [1138]:
data_bar_coupons_age25_morethanonce = \
data_bar_coupons.loc[data_bar_coupons['Bar'].isin(['gt8','4~8','1~3']) \
& data_bar_coupons['age'].isin(['46','26','31','50plus','41','36'])]

data_accepted_bar_coupons_age25_morethanonce= \
    data_bar_coupons_age25_morethanonce.loc[data_bar_coupons_age25_morethanonce['Y'] == 1]

total_accepted_bar_coupons_age25_morethanonce = \
data_accepted_bar_coupons_age25_morethanonce.shape[0] / data_bar_coupons_age25_morethanonce.shape[0]

data_not_bar_coupons_age25_morethanonce =  \
    data_bar_coupons.loc[data_bar_coupons.index.difference(data_bar_coupons_age25_morethanonce.index)]

data_accepted_not_bar_coupons_age25_morethanonce= \
    data_not_bar_coupons_age25_morethanonce.loc[data_not_bar_coupons_age25_morethanonce['Y'] == 1]

total_accepted_not_bar_coupons_age25_morethanonce = \
data_accepted_not_bar_coupons_age25_morethanonce.shape[0] / data_not_bar_coupons_age25_morethanonce.shape[0]

print('Acceptance rate - went to bar more than once & age over 25 ',
 "{0:.0%}".format(total_accepted_bar_coupons_age25_morethanonce))
print('Everyone Else ',
 "{0:.0%}".format(total_accepted_not_bar_coupons_age25_morethanonce))


Acceptance rate - went to bar more than once & age over 25  70%
Everyone Else  34%


In [1139]:
branches = ['Accepted' , 'Not Accepted']
ratio_bar_coupons_age25 = data_accepted_bar_coupons_age25_morethanonce.shape[0] / data_bar_coupons_age25_morethanonce.shape[0]
trace1 = go.Bar(
   x = branches,
   y = [ratio_bar_coupons_age25,1-ratio_bar_coupons_age25],
   name = 'Age 25 and visited more than once'
)
ratio_bar_coupons_not_age25 = data_accepted_not_bar_coupons_age25_morethanonce.shape[0] / data_not_bar_coupons_age25_morethanonce.shape[0]
trace2 = go.Bar(
   x = branches,
   y =  [ratio_bar_coupons_not_age25, 1-ratio_bar_coupons_not_age25],
   name = 'Everyone else'
)
data = [trace1, trace2]
layout = go.Layout(barmode = 'group', title = "Compare ratios of bar coupon acceptance by specific age and visit count")
fig = go.Figure(data = data, layout = layout)
fig.show()

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [1140]:
data_bar_coupons_notkid = \
data_accepted_bar_coupons.loc[data_bar_coupons['Bar'].isin(['gt8','4~8','1~3']) \
& ~(data_bar_coupons['passanger'].isin(['Kid(s)']) | data_bar_coupons['occupation'].isin(['Farming Fishing & Forestry']))]

data_accepted_bar_coupons_notkid= \
    data_bar_coupons_notkid.loc[data_bar_coupons_notkid['Y'] == 1]

total_accepted_bar_coupons_notkid = \
data_accepted_bar_coupons_notkid.shape[0] / data_bar_coupons_notkid.shape[0]

data_not_bar_coupons_notkid =  \
    data_bar_coupons.loc[data_bar_coupons.index.difference(data_bar_coupons_notkid.index)]

data_accepted_not_bar_coupons_notkid= \
    data_not_bar_coupons_notkid.loc[data_not_bar_coupons_notkid['Y'] == 1]

total_accepted_not_bar_coupons_notkid = \
data_accepted_not_bar_coupons_notkid.shape[0] / data_not_bar_coupons_notkid.shape[0]

print('Acceptance rate - Group of users who went to bar more than once, did not have a kid as a passenger and \
occupation was not farming/fishing/forestery ',
 "{0:.0%}".format(total_accepted_bar_coupons_notkid))
print('Everyone Else ',
 "{0:.0%}".format(total_accepted_not_bar_coupons_notkid))

Acceptance rate - Group of users who went to bar more than once, did not have a kid as a passenger and occupation was not farming/fishing/forestery  100%
Everyone Else  27%


In [1141]:
branches = ['Accepted' , 'Not Accepted']
ratio_bar_coupons_notkid = data_accepted_bar_coupons_notkid.shape[0] / data_bar_coupons_notkid.shape[0]
trace1 = go.Bar(
   x = branches,
   y = [ratio_bar_coupons_notkid,1-ratio_bar_coupons_notkid],
   name = 'Went to bar more than once, <br \> did not have a kid as a passenger and <br \>\
occupation was not farming/fishing/forestery'
)
ratio_not_bar_coupons_notkid = data_accepted_not_bar_coupons_notkid.shape[0] / data_not_bar_coupons_notkid.shape[0]
trace2 = go.Bar(
   x = branches,
   y =  [ratio_not_bar_coupons_notkid, 1-ratio_not_bar_coupons_notkid],
   name = 'Everyone else'
)
data = [trace1, trace2]
layout = go.Layout(barmode = 'group', title = "Compare ratios of bar coupon acceptance by specific age and visit count")
fig = go.Figure(data = data, layout = layout)
fig.show()

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [1142]:
data_bar_coupons_notkid_notwidowed = \
data_bar_coupons.loc[data_bar_coupons['Bar'].isin(['gt8','4~8','1~3']) \
& ~data_bar_coupons['passanger'].isin(['Kid(s)']) \
& ~data_bar_coupons['maritalStatus'].isin(['Widowed'])]

data_accepted_bar_coupons_notkid_notwidowed= \
    data_bar_coupons_notkid_notwidowed.loc[data_bar_coupons_notkid_notwidowed['Y'] == 1]

print('Acceptance rate - went to bar more than once & passenger not kid & not widowed ',
 "{0:.0%}".format(data_accepted_bar_coupons_notkid_notwidowed.shape[0]/data_bar_coupons_notkid_notwidowed.shape[0]))

data_bar_coupons_agelessthan30 = \
data_bar_coupons.loc[data_bar_coupons['Bar'].isin(['gt8','4~8','1~3']) \
& data_bar_coupons['age'].isin(['below21','21','26'])]

data_accepted_bar_coupons_agelessthan30= \
    data_bar_coupons_agelessthan30.loc[data_bar_coupons_agelessthan30['Y'] == 1]

print('Acceptance rate - went to bar more than once & age less than 30 ',
 "{0:.0%}".format(data_accepted_bar_coupons_agelessthan30.shape[0]/data_bar_coupons_agelessthan30.shape[0]))

data_bar_coupons_incomelessthan50 = \
data_bar_coupons.loc[data_bar_coupons['income'] \
    .isin(['Less than $12500','$12500 - $24999','$25000 - $37499','$37500 - $49999']) \
& data_bar_coupons['RestaurantLessThan20'].isin(['4~8','gt8'])]

data_accepted_bar_coupons_incomelessthan50= \
    data_bar_coupons_incomelessthan50.loc[data_bar_coupons_incomelessthan50['Y'] == 1]

print('Acceptance rate - went to cheap resturants and income less than 50k ',
 "{0:.0%}".format(data_accepted_bar_coupons_incomelessthan50.shape[0]/data_bar_coupons_incomelessthan50.shape[0]))

Acceptance rate - went to bar more than once & passenger not kid & not widowed  71%
Acceptance rate - went to bar more than once & age less than 30  72%
Acceptance rate - went to cheap resturants and income less than 50k  45%


In [1143]:
branches = ['Accepted' , 'Not Accepted']
ratio_notkid_notwidowed = data_accepted_bar_coupons_notkid_notwidowed.shape[0] / data_bar_coupons_notkid_notwidowed.shape[0]
trace1 = go.Bar(
   x = branches,
   y = [ratio_notkid_notwidowed,1-ratio_notkid_notwidowed],
   name = 'More than once, no kid and not widowed'
)
ratio_less30 = data_accepted_bar_coupons_agelessthan30.shape[0] / data_bar_coupons_agelessthan30.shape[0]
trace2 = go.Bar(
   x = branches,
   y =  [ratio_less30, 1-ratio_less30],
   name = 'More than once, age under 30'
)
ratio_incomeless50 = data_accepted_bar_coupons_incomelessthan50.shape[0] / data_bar_coupons_incomelessthan50.shape[0]
trace3 = go.Bar(
   x = branches,
   y =  [ratio_incomeless50, 1-ratio_incomeless50],
   name = 'Cheap res. more than 4, income under 50'
)
data = [trace1, trace2, trace3]
layout = go.Layout(barmode = 'group', title = "Compare ratios of three different coupon acceptance criteria")
fig = go.Figure(data = data, layout = layout)
fig.show()

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

Below are key observations for the bar coupon group
 - Bar coupons acceptance rate was at 41%
 - Bar coupon acceptance rate was higher for 
    those who went to bar three of few times than those who went more. ( 77% vs 37%)
 - Bar coupon acceptance rate was higher for group of users over 25 and who visited the bar more than once
   ( 70% vs 34% )
 - Bar coupon acceptance rate for group of users who went to bar more than once, did not have a kid as a    passenger  and occupation was not farming/fishing/forestery was 100% vs everyone else at 27%
 - Bar coupon acceptance rate for users who went to bar more than once & passenger not kid & not widowed was 71%
 - Bar coupon acceptance rate for users who went to bar more than once & less than 30 was 72%
 - Bar coupon acceptance rate for users who went  to cheap resturants and had income less than 50k was 45%

 Based on the observations, we hypothesize the following 

 Target user groups below for higher success 
 - visit bar more than once a month
 - are over 25 years of age ( also under 30 for more success)
 - do not have kids as a passenger
 - have income more than 50k



 


### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

Let's pick the coffee house coupons to dig further into. The dataset has higher data available for Coffee house than any other coupon group.

In [1144]:
data_cleaned.coupon.value_counts()

Coffee House             3996
Restaurant(<20)          2786
Carry out & Take away    2393
Bar                      2017
Restaurant(20-50)        1492
Name: coupon, dtype: int64

Create a new dataframe that has just Coffee House coupons

In [1145]:
data_coffee_house_coupons =  data_cleaned.loc[data_cleaned['coupon'] == 'Coffee House']

Let us check the acceptance rate of coffee house coupons

In [1146]:
acceptance_rate_coffee_house_coupons = data_coffee_house_coupons.loc[data_coffee_house_coupons['Y'] == 1].shape[0] / data_coffee_house_coupons.shape[0]
print('Acceptance rate of coffee house coupons ', "{0:.0%}".format(acceptance_rate_coffee_house_coupons))

Acceptance rate of coffee house coupons  50%


In [1147]:
px.pie(data_coffee_house_coupons['Y'] == 1, names = "Y", 
       title='Acceptance percentage of coffee house coupons')

Does temperature have an impact on Coffee house coupon acceptance? 

In [1148]:
fig = px.histogram(data_coffee_house_coupons, x='temperature', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, \
    title = "Histogram of temperature colored with coffee house coupon acceptance ")
fig.show()

Does gender have an impact on Coffee house coupon acceptance?

In [1149]:
fig = px.histogram(data_coffee_house_coupons, x='gender', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, 
title = "Histogram of gender colored with coffee house coupon acceptance ")
fig.show()

Check if martial status impacts acceptance of coffee house coupons.

In [1150]:
fig = px.histogram(data_coffee_house_coupons, x='maritalStatus', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, 
title = "Histogram of marital status colored with coffee house coupon acceptance ")
fig.show()

In [1151]:
data_coffee_house_coupons.CoffeeHouse.unique()

array(['never', 'less1', '4~8', '1~3', 'gt8'], dtype=object)

Compare coffee house coupon acceptance based on people who visit less than 4 times.

In [1152]:
data_accepted_coffee_coupons = data_coffee_house_coupons.loc[data_coffee_house_coupons['Y'] == 1]
total_coffee_coupons = data_accepted_coffee_coupons.shape[0]
total_accepted_bar_coupons = data_bar_coupons.loc[data_bar_coupons['Y'] == 1].shape[0]

data_cofee_coupons_lessthan4 = data_coffee_house_coupons.loc[data_coffee_house_coupons['CoffeeHouse'].isin(['never','less1','1~3'])]

data_accepted_coffee_coupons_lessthan4 = data_cofee_coupons_lessthan4.loc[data_cofee_coupons_lessthan4["Y"] == 1]

total_accepted_coffee_coupons_lessthan4 = \
data_accepted_coffee_coupons_lessthan4.shape[0] / data_cofee_coupons_lessthan4.shape[0]

data_coffee_coupons_4ormore = data_coffee_house_coupons.loc[data_coffee_house_coupons['CoffeeHouse'].isin(['4~8','gt8'])]

data_accepted_coffee_coupons_4ormore = data_coffee_coupons_4ormore.loc[data_coffee_coupons_4ormore['Y'] == 1]

total_accepted_coffee_coupons_4ormore = \
data_accepted_coffee_coupons_4ormore.shape[0] / data_coffee_coupons_4ormore.shape[0]

print('Acceptance rate - went to coffee house three or fewer times ', "{0:.0%}".format(total_accepted_coffee_coupons_lessthan4))
print('Acceptance rate - went to coffee house more than three times', "{0:.0%}".format(total_accepted_coffee_coupons_4ormore))

Acceptance rate - went to coffee house three or fewer times  45%
Acceptance rate - went to coffee house more than three times 68%


Plot graph to visually compare marital status / income group combination with coupon acceptance  

In [1153]:
data_coffee_house_coupons = data_coffee_house_coupons.copy()
data_coffee_house_coupons['maritalincomegroup'] = data_coffee_house_coupons['maritalStatus'] + ',' + data_coffee_house_coupons["income"]

In [1154]:
fig = px.histogram(data_coffee_house_coupons, x='maritalincomegroup', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No", "maritalincomegroup" : " Marital Status / Income  Combination"}, 
title = "Histogram of marital status / salary combinations colored with coffee house coupon acceptance ")
fig.show()

We can further group the income levels to get a consolidated view

In [1155]:
def consolidate_income_level(df):
  if df['income'] in ['Less than $12500','$12500 - $24999','$25000 - $37499','$37500 - $49999']:
    return '<50000'
  else:
    return '>50000'
  
data_coffee_house_coupons['maritalincomegroup2'] = data_coffee_house_coupons['maritalStatus'] \
    + ',' + data_coffee_house_coupons.apply(consolidate_income_level, axis=1)

In [1156]:
fig = px.histogram(data_coffee_house_coupons, x='maritalincomegroup2', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No",  "maritalincomegroup2" : " Consolidated Marital Status / Income  Combination"}, 
title = "Histogram of consolidated marital status / salary combinations colored with coffee house coupon acceptance ")
fig.show()

Check impact of Destination on coupon acceptance

In [1157]:
fig = px.histogram(data_coffee_house_coupons, x='destination', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No"}, 
title = "Histogram of marital status colored with coffee house coupon acceptance ")
fig.show()

Check impact of Direction / Destination on coupon acceptance 

In [1158]:
def group_destination_direction(df):
  if df['direction_same'] == 1:
    return 'Same Direction'

  if df['direction_opp'] == 1:
    return 'Opposite Direction'
  
data_coffee_house_coupons['destinationdirectiongroup'] = data_coffee_house_coupons['destination'] \
    + ',' + data_coffee_house_coupons.apply(group_destination_direction, axis=1)

In [1159]:
fig = px.histogram(data_coffee_house_coupons, x='destinationdirectiongroup', color="Y",nbins=50,
labels={ "Y": "Coupon Accepted: <br /> 1 == Yes, 0 == No",  "destinationdirectiongroup" : 
" Destination / Direction  Combination"}, 
title = "Histogram of destination / direction colored with coffee house coupon acceptance ")
fig.show()

Check impact of number of coffee house visits per month to acceptance rate

In [1163]:
data_accepted_coffee_house_coupons = data_coffee_house_coupons.loc[data_coffee_house_coupons['Y'] == 1]
total_accepted_coffee_house_coupons = data_coffee_house_coupons.loc[data_coffee_house_coupons['Y'] == 1].shape[0]

data_coffee_house_coupons_lessthan4 = data_coffee_house_coupons.loc \
[data_coffee_house_coupons['CoffeeHouse'].isin(['never','less1','1~3'])]

data_accepted_coffee_house_coupons_lessthan4 = data_coffee_house_coupons_lessthan4.loc[data_coffee_house_coupons_lessthan4["Y"] == 1]

total_accepted_coffee_house_coupons_lessthan4 = \
data_accepted_coffee_house_coupons_lessthan4.shape[0] / data_coffee_house_coupons_lessthan4.shape[0]

data_coffee_house_coupons_4ormore = data_coffee_house_coupons.loc[data_coffee_house_coupons['CoffeeHouse'].isin(['4~8','gt8'])]

data_accepted_coffee_house_coupons_4ormore = data_coffee_house_coupons_4ormore.loc[data_coffee_house_coupons_4ormore['Y'] == 1]

total_accepted_coffee_house_coupons_4ormore = \
data_accepted_coffee_house_coupons_4ormore.shape[0] / data_coffee_house_coupons_4ormore.shape[0]

print('Acceptance rate - went to bar three or fewer times ', "{0:.0%}".format(total_accepted_bar_coupons_lessthan4))
print('Acceptance rate - went to bar more than three times', "{0:.0%}".format(total_accepted_bar_coupons_4ormore))

Acceptance rate - went to bar three or fewer times  37%
Acceptance rate - went to bar more than three times 77%


In [1164]:
branches = ['Accepted' , 'Not Accepted']
ratio_coffee_house_coupons_lessthan4 = data_accepted_coffee_house_coupons_lessthan4.shape[0] / data_coffee_house_coupons_lessthan4.shape[0]
trace1 = go.Bar(
   x = branches,
   y = [ratio_coffee_house_coupons_lessthan4,1-ratio_coffee_house_coupons_lessthan4],
   name = 'Went to coffee house 3 or fewer times'
)
ratio_coffee_house_coupons_4ormore = data_accepted_coffee_house_coupons_4ormore.shape[0] / data_coffee_house_coupons_4ormore.shape[0]
trace2 = go.Bar(
   x = branches,
   y =  [ratio_coffee_house_coupons_4ormore, 1-ratio_coffee_house_coupons_4ormore],
   name = 'Went to coffee house more than three times'
)
data = [trace1, trace2]
layout = go.Layout(barmode = 'group', title = "Compare ratios of coffee house coupons acceptance by number of visits per month")
fig = go.Figure(data = data, layout = layout)
fig.show()

Below are key observations for the coffee house coupon group
 - Coffee house coupons acceptance rate was at 50.1%
 - Coffee house coupon acceptance rate was higher at higher temperatues but only marginally
 - Gender did not have any significant impact on coffee house coupon acceptance rate
 - Martial status did not have any significant impact on coffee house coupon acceptance rate
 - Direction of travel did not have any significant impact on coffee house coupon acceptance rate
 - Destination has a significant impact on coffee house coupon acceptance rate. No urgent place has a high correlation with acceptance.
 - Salary level did not have a significant impact on coffee house coupon acceptance rate
 - Number of visits per month to the coffee house has an impact on coffee house coupon acceptance rate

 Based on the observations, we hypothesize the following 

 Target user groups below for higher success 
 - visit the coffee house more than three times a month
 - do not have an urgent destination

