### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaraunt near where you are driving. Would you accept that coupon and take a short detour to the restaraunt? Would you accept the coupon but use it on a sunbsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaraunt? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then ask the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’.  There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50). 

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons.  To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece. 





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [85]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [86]:
data = pd.read_csv('data/coupons.csv')

In [87]:
pd.set_option('display.max_columns', None)

In [88]:
data.head(20)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [89]:
data.describe()

Unnamed: 0,temperature,has_children,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
count,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0,12684.0
mean,63.301798,0.414144,1.0,0.561495,0.119126,0.214759,0.785241,0.568433
std,19.154486,0.492593,0.0,0.496224,0.32395,0.410671,0.410671,0.495314
min,30.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,55.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0
50%,80.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0
75%,80.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0
max,80.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [90]:
data.isnull().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

Investigated the data and found that car, Bar, CoffeeHouse, CarrayAway, ResturantLessThan20 and Restaurant20To50 have some NaN data. Was tempted to drop those but the comprise of a very big subset of the data so dedciding to keep them. Instead, we will replace the NaN with text NA so that we can perform analysis. There are some cosmetic changes like typo fix for passanger, removal of () from string for passenger column which also we will be performing as part of cleaning. Also, we should convert the age, income and temperature features to numeric for comaprision based analysis which we will be having further down the road. 

3. Decide what to do about your missing data -- drop, replace, other...

In [91]:
data = data.replace(np.nan, 'NA')

In [92]:
data = data.rename(columns = {'passanger' : 'passenger'})

In [93]:
data.loc[data['passenger'] == 'Friend(s)', 'passenger'] = 'Friends'
data.loc[data['passenger'] == 'Kid(s)', 'passenger'] = 'Kids'

In [94]:
data.loc[data['income'] == 'Less than $12500', 'income'] = '12499'
data.loc[data['income'] == '$12500 - $24999', 'income'] = '12500'
data.loc[data['income'] == '$25000 - $37499', 'income'] = '25000'
data.loc[data['income'] == '$37500 - $49999', 'income'] = '37500'
data.loc[data['income'] == '$50000 - $62499', 'income'] = '50000'
data.loc[data['income'] == '$62500 - $74999', 'income'] = '62500'
data.loc[data['income'] == '$75000 - $87499', 'income'] = '75000'
data.loc[data['income'] == '$87500 - $99999', 'income'] = '87500'
data.loc[data['income'] == '$100000 or More', 'income'] = '100000'

data['income'] = pd.to_numeric(data['income'])

In [95]:
data.loc[data['age'] == '50plus', 'age'] = '51'
data.loc[data['age'] == 'below21', 'age'] = '20'

data['age'] = pd.to_numeric(data['age'])

In [96]:
data['temperature'] = pd.to_numeric(data['temperature'])

In [97]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passenger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  int64 
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  int64 
 14  car                   12684 non-null  object
 15  Bar                   12684 non-null

Fixed passenger column name, replaced passenger data, converted age, income, temperature and replaced all NaN with 'NA'

4. What proportion of the total observations chose to accept the coupon? 



In [98]:
accepted_ratio = data.query('Y==1')['Y'].count()/ data['Y'].count()
accepted_ratio

0.5684326710816777

So, from the above calculation it is visible that around 57% of the observations chose to accept the coupon.

5. Use a bar plot to visualize the `coupon` column.

In [99]:
fig = px.bar(data, x='coupon', title='Bar plot for coupons')
fig.update_traces(marker_color='blue')
fig.show()
fig.write_image("images/coupon_bar.png")

6. Use a histogram to visualize the temperature column.

In [100]:
fig = px.histogram(data[['temperature','Y']], color='Y', title = 'Histogram for temperature')
fig.show()
fig.write_image("images/temperature_hist.png")

We can see there are only 3 kinds of temperature available in the data (30F, 50F and 80F).

In [101]:
data.groupby('temperature')['temperature'].count()

temperature
30    2316
55    3840
80    6528
Name: temperature, dtype: int64

Above `groupby` proves that our histogram is correct

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [102]:
bar_coupons = data.query('coupon == "Bar"')
bar_coupons

Unnamed: 0,destination,passenger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
9,No Urgent Place,Kids,Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,1,0,0,1,0
13,Home,Alone,Sunny,55,6PM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,0,0,1,0,1
17,Work,Alone,Sunny,55,7AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,1,1,0,1,0
24,No Urgent Place,Friends,Sunny,80,10AM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,62500,,never,less1,4~8,4~8,less1,1,0,0,0,1,1
35,Home,Alone,Sunny,55,6PM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,62500,,never,less1,4~8,4~8,less1,1,0,0,1,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12663,No Urgent Place,Friends,Sunny,80,10PM,Bar,1d,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,1,0,0,1,0
12664,No Urgent Place,Friends,Sunny,55,10PM,Bar,2h,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,1,0,0,1,0
12667,No Urgent Place,Alone,Rainy,55,10AM,Bar,1d,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,1,0,0,1,0
12670,No Urgent Place,Partner,Rainy,55,6PM,Bar,2h,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,1,0,0,1,0


2. What proportion of bar coupons were accepted?


In [103]:
bar_coupons.query('Y == 1')['Y'].count()/ bar_coupons['Y'].count()

0.41001487357461575

41% of the population accepts bar coupons

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [104]:
bar_count = ['never','less1','1~3','4~8','gt8']
fig = px.histogram(bar_coupons.query('Bar in @bar_count')[['Bar','Y']], color='Y', title='Histogram for Bar')
fig.show()
fig.write_image("images/bar_coupon_hist1.png")

In [105]:
less_than_eq3 = ['never','less1','1~3']
people_who_went_leq3 = bar_coupons.query('Bar in @less_than_eq3')
acceptance_rate_for_leq3 = people_who_went_leq3.query('Y == 1')['Y'].count()  / bar_coupons.query('Y == 1')['Y'].count()
print(acceptance_rate_for_leq3)

more_than_3 = ['4~8','gt8']
people_who_went_gt3 = bar_coupons.query('Bar in @more_than_3')
acceptance_rate_for_gt3 = people_who_went_gt3.query('Y == 1')['Y'].count()  / bar_coupons.query('Y == 1')['Y'].count()
print(acceptance_rate_for_gt3)


0.8053204353083434
0.18500604594921402


80% of the population accepting coupon comprises of people who visited bar less than 3.

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [106]:
more_than_once = ['1~3','4~8','gt8']
people_above_25_more_than_once = bar_coupons.query('Bar in @more_than_once and age > 25')
people_above_25_more_than_acceptance_ratio = people_above_25_more_than_once.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(people_above_25_more_than_acceptance_ratio)




0.35308343409915355


We can see in the above analysis that people above 25 comprises of 35% of total population who has accepted the coupon.

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [107]:
more_than_once = ['1~3','4~8','gt8']
passengers = ['Friends','Partner']
occupation_to_exclude = ['Farming Fishing & Forestry']
people_above_25 = bar_coupons.query('Bar in @more_than_once and age > 25 and passenger in @passengers and occupation not in @occupation_to_exclude')
people_above_25_acceptance_ratio = people_above_25.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(people_above_25_acceptance_ratio)





0.1185006045949214


In [109]:
fig= px.violin(people_above_25, y='occupation', title='Violin plot of occupation')
fig.show()
fig.write_image("images/occupation_violin.png")

Only ~12% of the population who has accepted coupon is above 25, went with passengers without kids and not having occupation Farming Fishing & Forestry.

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [110]:
fig =px.histogram(bar_coupons['RestaurantLessThan20'], title='Histogram for cheap restaurants')
fig.show()
fig.write_image("images/bar_coupon_hist2.png")

In [111]:
more_than_once = ['1~3','4~8','gt8']
marital_status_to_be_excluded = ['Widowed']
passengers = ['Friends','Partner']
cheap_restaurant_trips = ['4~8','gt8']
people_visiting_bar = bar_coupons.query('(Bar in @more_than_once and passenger in @passengers and maritalStatus not in @marital_status_to_be_excluded) or (Bar in @more_than_once and age < 30) or (RestaurantLessThan20 in @cheap_restaurant_trips and income < 50000)')
people_visiting_bar_acceptance_ratio = people_visiting_bar.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(people_visiting_bar_acceptance_ratio)

0.4570737605804111


46% of population who accepted coupons comprises of people visiting bar more than once and had passengers other than kids and not widowed or visits bar more than once and age below 30 or vists more than 4 times in cheap restaurants and have less than 50K$ income.

In [112]:
more_than_once = ['1~3','4~8','gt8']
marital_status_to_be_excluded = ['Widowed']
passengers = ['Friends','Partner']
cheap_restaurant_trips = ['4~8','gt8']
group_one = bar_coupons.query('Bar in @more_than_once and passenger in @passengers and maritalStatus not in @marital_status_to_be_excluded')
group_one_acceptance_ratio = group_one.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(group_one_acceptance_ratio)

group_two = bar_coupons.query('Bar in @more_than_once and age < 30')
group_two_acceptance_ratio = group_two.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(group_two_acceptance_ratio)

group_three = bar_coupons.query('RestaurantLessThan20 in @cheap_restaurant_trips and income < 50000')
group_three_acceptance_ratio = group_three.query('Y == 1')['Y'].count() / bar_coupons.query('Y == 1')['Y'].count()
print(group_three_acceptance_ratio)

0.16928657799274485
0.3010882708585248
0.18863361547763


7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

In [119]:
fig = px.scatter(bar_coupons[['age', 'Bar', 'income', 'Y']], x='age', y='Bar', color='Y', title='Scatteplot for age x Bar')
fig.show()
fig.write_image("images/bar_coupon_scatter1.png")

In [120]:
fig = px.histogram(bar_coupons[['age', 'Bar', 'income', 'Y']], x='Bar', color='Y', title = 'Histogram for Bar categorized by Y')
fig.show()
fig.write_image("images/bar_coupon_hist3.png")

We could draw following hypothesis based on above analysis of data
* People of age < 25 generally tend to accept more bar coupons.
* People visitng bar more (>=4) tend to accept more coupons compared to people visiting less (never, 1~3, etc).
* Of course people visitng bar who accepts bar coupons generally don't travel with kids.
* People having income < 50K visiting cheap restaurants more than 4 times accepts bar coupon less.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

#### Investing Coffee House

As per the histogram, coffee house count is highest in the dataset. Lets what coffee house reveals with respect to the analysis.

First lets create a DF containing only coffee house data

In [121]:
coffee_house_coupons = data.query('coupon == "Coffee House"')
coffee_house_coupons

Unnamed: 0,destination,passenger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
1,No Urgent Place,Friends,Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,0,0,0,1,0
3,No Urgent Place,Friends,Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friends,Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,1,0,0,1,0
12,No Urgent Place,Kids,Sunny,55,6PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,1,0,0,1,1
15,Home,Alone,Sunny,80,6PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,37500,,never,never,,4~8,1~3,1,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12656,Home,Alone,Snowy,30,10PM,Coffee House,2h,Male,31,Married partner,1,Bachelors degree,Business & Financial,100000,,less1,never,4~8,gt8,less1,1,1,0,0,1,0
12659,Work,Alone,Snowy,30,7AM,Coffee House,1d,Male,31,Married partner,1,Bachelors degree,Business & Financial,100000,,less1,never,4~8,gt8,less1,1,0,0,1,0,0
12674,Home,Alone,Rainy,55,10PM,Coffee House,2h,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,0,0,1,0,0
12675,Home,Alone,Snowy,30,10PM,Coffee House,2h,Male,26,Single,0,Bachelors degree,Sales & Related,75000,,never,never,1~3,4~8,1~3,1,1,0,0,1,0


Ratio of coupons accepted in data

In [122]:
coffee_house_coupons.query('Y == 1')['Y'].count()/ coffee_house_coupons['Y'].count()

0.49924924924924924

~50% of the population accepts coffee house coupons

In [123]:
fig = px.histogram(coffee_house_coupons[['CoffeeHouse','Y']], color='Y', title = 'Histgoram for Coffee house categorized by Y')
fig.show()
fig.write_image("images/coffee_house_coupon_hist1.png")

In [124]:
less_than_eq3 = ['never','less1','1~3']
people_who_went_leq3 = coffee_house_coupons.query('CoffeeHouse in @less_than_eq3')
acceptance_rate_for_leq3 = people_who_went_leq3.query('Y == 1')['Y'].count()  / coffee_house_coupons.query('Y == 1')['Y'].count()
print(acceptance_rate_for_leq3)

more_than_3 = ['4~8','gt8']
people_who_went_gt3 = coffee_house_coupons.query('CoffeeHouse in @more_than_3')
acceptance_rate_for_gt3 = people_who_went_gt3.query('Y == 1')['Y'].count()  / coffee_house_coupons.query('Y == 1')['Y'].count()
print(acceptance_rate_for_gt3)

0.6857142857142857
0.29774436090225564


As we can see above, 68% of the population accepted coupons when visiting coffee house <= 3 times in a month. More than 3 times is 29%

Next we would like to see how the age demography classifies the data. Lets see for crowd less than 25 years old and more than 25.

In [125]:
people_more_25 = coffee_house_coupons.query('age >= 25')
people_more_25_acceptance_ratio = people_more_25.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(people_more_25_acceptance_ratio)

people_below_25 = coffee_house_coupons.query('age < 25')
people_below_25_acceptance_ratio = people_below_25.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(people_below_25_acceptance_ratio)

0.7137844611528822
0.2862155388471178


In [127]:
fig = px.violin(coffee_house_coupons, x='CoffeeHouse', y='age', color='Y', title ='Violin plot of CoffeeHouse')
fig.show()
fig.write_image("images/coffee_house_violin1.png")

From the analysis, we could see that people below 71% of the overall population accepting coupons are of people who are >= 25 years.

In [52]:
more_than_once = ['1~3','4~8','gt8']
people_more_than_once = coffee_house_coupons.query('CoffeeHouse in @more_than_once')
people_more_than_acceptance_ratio = people_more_than_once.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(people_more_than_acceptance_ratio)

0.6360902255639098


43% of the crowd who are above 25 and visit coffee house more than once.

In [59]:
fig = px.histogram(people_above_25_more_than_once[['CoffeeHouse','Y', 'age']], x='age', color='Y')
fig.show()
fig.write_image("images/coffee_house_hist2.png")

As we can see in the above histogram, age group below 25 generally tend to accept more coupons when visiting coffee houses more than once. Another interesting thing we can note that irrespective of age group, people tend to accept coupon more than not accepting when visiting coffee houses more than once.

In [84]:
more_than_once = ['1~3','4~8','gt8']
marital_status_to_be_excluded = ['Widowed']
passengers = ['Friends','Partner']
cheap_restaurant_trips = ['4~8','gt8']
people_visiting_bar = coffee_house_coupons.query('(CoffeeHouse in @more_than_once and passenger in @passengers and maritalStatus not in @marital_status_to_be_excluded) or (CoffeeHouse in @more_than_once and age < 30) or (RestaurantLessThan20 in @cheap_restaurant_trips and income < 50000)')
people_visiting_bar_acceptance_ratio = people_visiting_bar.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(people_visiting_bar_acceptance_ratio)

0.5463659147869674


54% of population who accepted coupons comprises of people visiting coffee house more than once and had passengers other than kids and not widowed or visits coffee house more than once and age below 30 or vists more than 4 times in cheap restaurants and have less than 50K$ income.

In [83]:
more_than_once = ['1~3','4~8','gt8']
marital_status_to_be_excluded = ['Widowed']
passengers = ['Friends','Partner']
cheap_restaurant_trips = ['4~8','gt8']
group_one = bar_coupons.query('CoffeeHouse in @more_than_once and passenger in @passengers and maritalStatus not in @marital_status_to_be_excluded')
group_one_acceptance_ratio = group_one.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(group_one_acceptance_ratio)

group_two = bar_coupons.query('CoffeeHouse in @more_than_once and age < 30')
group_two_acceptance_ratio = group_two.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(group_two_acceptance_ratio)

group_three = bar_coupons.query('RestaurantLessThan20 in @cheap_restaurant_trips and income < 50000')
group_three_acceptance_ratio = group_three.query('Y == 1')['Y'].count() / coffee_house_coupons.query('Y == 1')['Y'].count()
print(group_three_acceptance_ratio)

0.07719298245614035
0.12882205513784462
0.07819548872180451


Like above bar coupon analysis, the coffee house coupon is also revealing similar hypothesis:
* Age group below 25 tend to visit coffee houses more and accepts coffee house coupons.
* Acceptance of coupons is higher for visits more than once compared to less visits.
* Of course, share of people visiting with passengers other than kids is higher than people visiting with kids.
* People having income < 50K visiting cheap restaurants more than 4 times accepts bar coupon less.

### Conclusion

So from the above 2 feature analysis, we can say:
* People of age group below 25 tend to accept more coupons.
* Coffee house and bar almost generated same results. Acceptance rates between the 2 is not very far-off.
* Of course, people visiting more than once tend to accept coupon more than visiting less to a place (bar, coffee house, etc.)