### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon? 

### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \$12500, \$12500 - \$24999, \$25000 - \$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

#### Overview:
In this first practical application assignment of the program, you will seek to answer the question, “Will a customer accept the coupon?” The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not. Use the Practical Application 1 Jupyter Notebook to complete this assignment.

#### Data:
This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks people whether they will accept the coupon if they are the driver. Answers given that the users will drive there “right away” or “later before the coupon expires” are labeled as “Y = 1”, and answers “no, I do not want the coupon” are labeled as “Y = 0”. There are five different types of coupons—less expensive restaurants (under $20), coffee houses, carry out and take away, bars, and more expensive restaurants ($20–$50).

#### Deliverables:
Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data, you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing GitHub repository as your first portfolio piece.



In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio

# Set global color sequence
pio.templates.default = 'plotly'
pio.templates["plotly"].layout.colorway = px.colors.qualitative.G10

import pandas as pd
pd.set_option('display.max_columns', None)


### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [2]:
data = pd.read_csv('data/coupons.csv')

In [3]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [4]:
data.isnull().sum()

destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

3. Decide what to do about your missing data -- drop, replace, other...

- The `car` field is null in 12,576 of the 12,684 rows, so dropping the `car` column is the most sensible choice.

In [5]:
data['car'].value_counts()

car
Scooter and motorcycle                      22
Mazda5                                      22
do not drive                                22
crossover                                   21
Car that is too old to install Onstar :D    21
Name: count, dtype: int64

In [6]:
# drop the car column first, it has too many missing values to use dropna
data.drop(labels='car', axis=1, inplace=True)

### Categorical Values
Familiarizing myself with all of the categorical values for the categorical fields using `.value_counts()`

In [7]:
data['Bar'].value_counts()

Bar
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64

In [8]:
data['CoffeeHouse'].value_counts()

CoffeeHouse
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: count, dtype: int64

In [9]:
data['CarryAway'].value_counts()

CarryAway
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: count, dtype: int64

In [10]:
data['RestaurantLessThan20'].value_counts()

RestaurantLessThan20
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: count, dtype: int64

In [11]:
data['Restaurant20To50'].value_counts()

Restaurant20To50
less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: count, dtype: int64

### What to do with the missing data
From the counts above, every non-null value falls into one of five categorical bins. Replacing null values with a random value would skew the data, so it is best to drop the rows that contain nulls.

#### Car Column
Most of the values in the `car` column are null. With only 108 valid values in a dataset of more than 12,000 rows, dropping this column loses very little information.
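
As a rough check on that decision, the share of missing values per column can be computed directly. The sketch below uses a small made-up frame (`toy` is hypothetical, not the coupon data) to show the pattern; on the real data the equivalent call would be `data.isnull().mean()`.

```python
import pandas as pd

# Hypothetical toy frame standing in for the coupon data
toy = pd.DataFrame({
    "car": [None, None, None, "Mazda5"],
    "Bar": ["never", None, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# Fraction of missing values per column, largest first
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Columns where most values are missing are candidates for dropping
mostly_missing = missing_share[missing_share > 0.5].index.tolist()
print(mostly_missing)
```

The 0.5 cutoff is an arbitrary illustration; the right threshold depends on how much the column matters to the analysis.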

#### Other missing data
Bar                       107<br>
CoffeeHouse               217<br>
CarryAway                 151<br>
RestaurantLessThan20      130<br>
Restaurant20To50          189<br>

The rest of the null data is in categorical columns used for frequency of visits, so a None or Null type doesn't make sense. Drop these rows.

In [12]:
df = data.dropna().copy()
df.isnull().sum()

destination             0
passanger               0
weather                 0
temperature             0
time                    0
coupon                  0
expiration              0
gender                  0
age                     0
maritalStatus           0
has_children            0
education               0
occupation              0
income                  0
Bar                     0
CoffeeHouse             0
CarryAway               0
RestaurantLessThan20    0
Restaurant20To50        0
toCoupon_GEQ5min        0
toCoupon_GEQ15min       0
toCoupon_GEQ25min       0
direction_same          0
direction_opp           0
Y                       0
dtype: int64

4. What proportion of the total observations chose to accept the coupon? 



In [13]:
proportion_accepted = df.query('Y == 1').shape[0] / df.shape[0]
print(proportion_accepted)

0.5693352098683666
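
Because `Y` is coded as 0 or 1, the mean of the column is the same quantity as the proportion computed above. A minimal sketch on made-up outcomes (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical 0/1 outcomes; not the coupon data
toy = pd.DataFrame({"Y": [1, 0, 1, 1, 0]})

# Count of accepted rows divided by all rows
by_query = toy.query("Y == 1").shape[0] / toy.shape[0]

# Mean of a 0/1 column gives the same proportion
by_mean = toy["Y"].mean()
print(by_query, by_mean)
```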


5. Use a bar plot to visualize the `coupon` column.

In [14]:
color_map = {'Accepted': 'green', 'Rejected': 'red'}

df['YCategory'] = df['Y'].map({1: 'Accepted', 0: 'Rejected'})
fig = px.bar(df, x='coupon', color='YCategory', color_discrete_map=color_map, labels={'coupon': 'Coupon Type', 'YCategory': 'Acceptance'}, title='Coupon Acceptance by Type')
fig.update_traces(dict(marker_line_width=0)) # need on large datasets
fig.show()

6. Use a histogram to visualize the temperature column.

In [15]:
figHist = px.histogram(df, x='temperature', color='YCategory', color_discrete_map=color_map, labels={'temperature': 'Temperature (F)'}, title='Coupon Acceptance by Temperature')
figHist.show()

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [16]:
df_bar = df.query('coupon == "Bar"')

2. What proportion of bar coupons were accepted?


In [17]:
print(df_bar.query('Y == 1').shape[0] / df_bar.shape[0])

0.41191845269210664


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [18]:
lt_list = ['never', 'less1', '1~3']
# These filters run on the full `df` (all coupon types), not only on `df_bar`
fewer = df.query('Bar in @lt_list')
greater = df.query('Bar not in @lt_list')
accept_rate_fewer = fewer.query('Y == 1').shape[0] / fewer.shape[0]
accept_rate_greater = greater.query('Y == 1').shape[0] / greater.shape[0]
print("Acceptance rate for fewer: {:.2f}".format(accept_rate_fewer))
print("Acceptance rate for greater: {:.2f}".format(accept_rate_greater))

Acceptance rate for fewer: 0.56
Acceptance rate for greater: 0.62
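
The cell above filters the full `df`, so its rates cover every coupon type. One way to restrict the comparison to bar coupons and get both groups in a single pass is to group on a derived label. This is a minimal sketch on made-up rows; `toy`, `bar_only`, and `freq_group` are illustrative names, and the numbers are not from the coupon data.

```python
import pandas as pd

# Hypothetical rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "coupon": ["Bar", "Bar", "Bar", "Bar", "Coffee House", "Bar"],
    "Bar":    ["never", "4~8", "gt8", "1~3", "gt8", "less1"],
    "Y":      [0, 1, 1, 1, 1, 0],
})

# Keep only the bar coupons
bar_only = toy[toy["coupon"] == "Bar"]

# Label each row by how often the driver visits a bar
freq_group = bar_only["Bar"].isin(["4~8", "gt8"]).map(
    {True: "more than 3", False: "3 or fewer"}
)

# Mean of the 0/1 outcome per group is the acceptance rate
rates = bar_only.groupby(freq_group)["Y"].mean()
print(rates)
```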


In [19]:
df['age'].value_counts()

age
21         2537
26         2399
31         1925
50plus     1732
36         1253
41         1065
46          664
below21     504
Name: count, dtype: int64

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [20]:
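less_than_1_list = ['never', 'less1']
over_25_list = ['26', '31', '36', '41', '46', '50plus']
# Target group: bar more than once a month AND over 25 (filters run on the full `df`)
fil = df.query('Bar not in @less_than_1_list and age in @over_25_list')
# Comparison group: bar less than once a month or never AND age 25 or under.
# This is narrower than "all others", which would be every row outside `fil`.
neg_fil = df.query('Bar in @less_than_1_list and age not in @over_25_list')

accept_rate_fil = fil.query('Y == 1').shape[0] / fil.shape[0]
accept_rate_neg_fil = neg_fil.query('Y == 1').shape[0] / neg_fil.shape[0]
print("Acceptance rate for drivers who go to a bar more than once a month and are over the age of 25: {:.2f}".format(accept_rate_fil))
print("Acceptance rate for drivers who go to a bar less than once a month or never and are 25 or under: {:.2f}".format(accept_rate_neg_fil))

Acceptance rate for drivers who go to a bar more than once a month and are over the age of 25: 0.62
Acceptance rate for drivers who go to a bar less than once a month or never and are 25 or under: 0.59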
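
When the comparison group is meant to be everyone outside the target group, negating a single boolean mask gives it directly. Negating each condition separately and joining them with `and` selects a narrower set. A sketch with made-up rows (`toy` and the mask names are illustrative):

```python
import pandas as pd

# Hypothetical rows; 'age' is stored as strings, as in the dataset
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "4~8", "less1", "gt8", "never"],
    "age": ["26", "21", "21", "31", "50plus", "below21"],
    "Y":   [1, 0, 1, 0, 1, 1],
})

frequent = ~toy["Bar"].isin(["never", "less1"])
over_25 = toy["age"].isin(["26", "31", "36", "41", "46", "50plus"])

target = frequent & over_25   # bar more than once a month AND over 25
others = ~target              # every remaining row (the full complement)

rate_target = toy.loc[target, "Y"].mean()
rate_others = toy.loc[others, "Y"].mean()
print(f"target: {rate_target:.2f}, all others: {rate_others:.2f}")
```

With this construction the two groups always partition the frame, so no rows are silently left out of the comparison.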


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [21]:
print(df['passanger'].value_counts(), "\n")
print(df['occupation'].value_counts())

passanger
Alone        6969
Friend(s)    3148
Partner      1024
Kid(s)        938
Name: count, dtype: int64 

occupation
Unemployed                                   1814
Student                                      1497
Computer & Mathematical                      1368
Sales & Related                              1072
Education&Training&Library                    855
Management                                    772
Office & Administrative Support               617
Arts Design Entertainment Sports & Media      564
Business & Financial                          516
Retired                                       473
Food Preparation & Serving Related            276
Healthcare Support                            242
Healthcare Practitioners & Technical          222
Legal                                         219
Community & Social Services                   219
Transportation & Material Moving              218
Protective Service                            175
Architecture & Engineering   

In [22]:
# Passenger values to exclude: drivers who were alone or had kid(s) in the car
pass_not_kid_list = ['Alone', 'Kid(s)']
fil2 = df.query('Bar not in @less_than_1_list and passanger not in @pass_not_kid_list and occupation != "Farming Fishing & Forestry"')
# Comparison group: every condition negated at once, a small subset rather than the full complement of `fil2`
neg_fil2 = df.query('Bar in @less_than_1_list and passanger in @pass_not_kid_list and occupation == "Farming Fishing & Forestry"')

accept_rate_fil2 = fil2.query('Y == 1').shape[0] / fil2.shape[0]
accept_rate_neg_fil2 = neg_fil2.query('Y == 1').shape[0] / neg_fil2.shape[0]
print("Acceptance rate for drivers who go to a bar more than once a month, had passengers that were not a kid and had occupations other than farming, fishing and forestry: {:.2f}".format(accept_rate_fil2))
print("Acceptance rate for drivers who go to a bar less than once a month or never, were alone or with kid(s), and work in farming, fishing and forestry: {:.2f}".format(accept_rate_neg_fil2))

Acceptance rate for drivers who go to a bar more than once a month, had passengers that were not a kid and had occupations other than farming, fishing and forestry: 0.69
Acceptance rate for drivers who go to a bar less than once a month or never, were alone or with kid(s), and work in farming, fishing and forestry: 0.56


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [23]:
print(df['maritalStatus'].value_counts(), "\n")
print(df['RestaurantLessThan20'].value_counts(), "\n")
print(df['income'].value_counts())

maritalStatus
Married partner      4831
Single               4588
Unmarried partner    2048
Divorced              504
Widowed               108
Name: count, dtype: int64 

RestaurantLessThan20
1~3      5163
4~8      3450
less1    2005
gt8      1285
never     176
Name: count, dtype: int64 

income
$25000 - $37499     1919
$12500 - $24999     1728
$100000 or More     1692
$37500 - $49999     1689
$50000 - $62499     1565
Less than $12500    1014
$62500 - $74999      840
$87500 - $99999      818
$75000 - $87499      814
Name: count, dtype: int64


In [24]:
age_under_30_list = ['21', '26', 'below21']
rest_gt_4_list = ['4~8', 'gt8']
inc_less_50_list = ['$37500 - $49999', '$25000 - $37499', '$12500 - $24999', 'Less than $12500']

# Note: a62 is filtered from a61, and a63 from a62, so the conditions accumulate (AND) rather than forming three separate OR groups
a61 = df.query('Bar not in @less_than_1_list and passanger not in @pass_not_kid_list and maritalStatus != "Widowed"')
a62 = a61.query('Bar not in @less_than_1_list and age in @age_under_30_list')
a63 = a62.query('RestaurantLessThan20 in @rest_gt_4_list and income in @inc_less_50_list')

accept_rate_a61 = a61.query('Y == 1').shape[0] / a61.shape[0]
accept_rate_a62 = a62.query('Y == 1').shape[0] / a62.shape[0]
accept_rate_a63 = a63.query('Y == 1').shape[0] / a63.shape[0]

print("Acceptance rate for drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed: {:.2f}".format(accept_rate_a61))
print("Acceptance rate for those drivers who are also under the age of 30: {:.2f}".format(accept_rate_a62))
print("Acceptance rate for those under-30 drivers who also go to cheap restaurants more than 4 times a month and have income less than 50K: {:.2f}".format(accept_rate_a63))

Acceptance rate for drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed: 0.69
Acceptance rate for those drivers who are also under the age of 30: 0.70
Acceptance rate for those under-30 drivers who also go to cheap restaurants more than 4 times a month and have income less than 50K: 0.72
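
Because each frame in the cell above is filtered from the previous one, the conditions accumulate. If the three groups are instead meant to be alternatives joined by *OR*, separate masks on the full frame can be combined with `|`. A sketch on made-up rows (`toy`, `g1`, `g2`, `g3` are illustrative names, and the rates are not from the coupon data):

```python
import pandas as pd

# Hypothetical rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "Bar": ["1~3", "never", "4~8", "never"],
    "passanger": ["Friend(s)", "Alone", "Kid(s)", "Alone"],
    "maritalStatus": ["Single", "Widowed", "Single", "Single"],
    "age": ["41", "21", "21", "36"],
    "RestaurantLessThan20": ["1~3", "gt8", "never", "less1"],
    "income": ["$100000 or More", "$12500 - $24999",
               "$100000 or More", "$100000 or More"],
    "Y": [1, 1, 0, 0],
})

frequent = ~toy["Bar"].isin(["never", "less1"])

# Group 1: frequent bar visitor, passenger not a kid (and not alone), not widowed
g1 = frequent & ~toy["passanger"].isin(["Alone", "Kid(s)"]) & (toy["maritalStatus"] != "Widowed")
# Group 2: frequent bar visitor and under 30
g2 = frequent & toy["age"].isin(["below21", "21", "26"])
# Group 3: cheap restaurants 4+ times a month and income under 50K
g3 = toy["RestaurantLessThan20"].isin(["4~8", "gt8"]) & toy["income"].isin(
    ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"]
)

# A row qualifies if it belongs to at least one group
any_group = g1 | g2 | g3
rate_any = toy.loc[any_group, "Y"].mean()
rate_rest = toy.loc[~any_group, "Y"].mean()
print(rate_any, rate_rest)
```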


In [25]:
fig_barage = px.bar(df.query('coupon == "Bar"'), x='Bar', color='YCategory', color_discrete_map=color_map, labels={'Bar': 'Bar Frequency'}, title='Bar Coupon Acceptance by Frequency')
fig_barage.update_traces(dict(marker_line_width=0)) # need on large datasets
fig_barage.show()


In [26]:
fig_barage = px.bar(df.query('coupon == "Bar" and Bar != "never"'), x='age', color='YCategory', color_discrete_map=color_map, labels={'age': 'Age'}, title='Bar Coupon Acceptance by Age')
fig_barage.update_traces(dict(marker_line_width=0)) # need on large datasets
fig_barage.show()

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

My hypothesis about who accepts bar coupons rests on two factors. The first is `age`: the highest acceptance rates are at ages 21, 26, and 31, and the rate tapers off for older drivers, so I would target the 21-31 age groups. The second is the `Bar` frequency field, which records how often drivers go to bars. From 1-3 times a month and up, the acceptance rate rises above 50%. I would therefore recommend stopping any marketing to people who never go to a bar or go less than once a month.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

### Analysis
I've chosen 'Coffee House' as my coupon type. I'll begin by filtering the main DataFrame by the coupon type so that I'm working only with the data I care about.

Next, I'll use a bar plot with the `YCategory` field as the `color` argument to see a specific feature's acceptance and rejection counts. From there I'll choose a couple of the categories with the highest frequency and highest acceptance rates to see if they correlate with any of the other variables.
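
Since the plan weighs both how often a category occurs and how often it accepts, it can help to tabulate the two side by side. A minimal sketch, assuming a 0/1 `Y` column as in this dataset; `rate_and_count` is a hypothetical helper and the rows are made up.

```python
import pandas as pd

def rate_and_count(frame, column):
    """Acceptance rate and row count per category of `column`."""
    return (
        frame.groupby(column)["Y"]
        .agg(rate="mean", count="size")
        .sort_values("rate", ascending=False)
    )

# Hypothetical rows; not the coupon data
toy = pd.DataFrame({
    "time": ["10AM", "10AM", "2PM", "6PM", "6PM", "6PM"],
    "Y":    [1, 1, 1, 0, 0, 1],
})

summary = rate_and_count(toy, "time")
print(summary)
```

Seeing the count next to the rate makes it easier to spot categories whose high rate rests on only a handful of rows.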

In [27]:
df_cf = df.query('coupon == "Coffee House"')

In [28]:
df_cf.head(5)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y,YCategory
23,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,0,1,0,Rejected
26,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,0,1,0,Rejected
27,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,1,0,0,1,0,Rejected
28,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,1,0,0,1,0,Rejected
30,No Urgent Place,Friend(s),Sunny,80,6PM,Coffee House,2h,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,0,1,0,Rejected


In [29]:
arate_by_age = df_cf.groupby('age')['Y'].mean().sort_values(ascending=False)
arate_by_gender = df_cf.groupby('gender')['Y'].mean().sort_values(ascending=False)
arate_by_income = df_cf.groupby('income')['Y'].mean().sort_values(ascending=False)
arate_by_occupation = df_cf.groupby('occupation')['Y'].mean().sort_values(ascending=False)
arate_by_mstatus = df_cf.groupby('maritalStatus')['Y'].mean().sort_values(ascending=False)
arate_by_temp = df_cf.groupby('temperature')['Y'].mean().sort_values(ascending=False)
arate_by_time = df_cf.groupby('time')['Y'].mean().sort_values(ascending=False)
arate_by_destination = df_cf.groupby('destination')['Y'].mean().sort_values(ascending=False)
arate_by_passanger = df_cf.groupby('passanger')['Y'].mean().sort_values(ascending=False)
arate_by_exp = df_cf.groupby('expiration')['Y'].mean().sort_values(ascending=False)
arate_by_child = df_cf.groupby('has_children')['Y'].mean().sort_values(ascending=False)
arate_by_freq = df_cf.groupby('CoffeeHouse')['Y'].mean().sort_values(ascending=False)
arate_by_5min = df_cf.groupby('toCoupon_GEQ5min')['Y'].mean().sort_values(ascending=False)
arate_by_15min = df_cf.groupby('toCoupon_GEQ15min')['Y'].mean().sort_values(ascending=False)
arate_by_25min = df_cf.groupby('toCoupon_GEQ25min')['Y'].mean().sort_values(ascending=False)

print("Acceptance rate by age: \n", arate_by_age)
print("\nAcceptance rate by gender: \n", arate_by_gender)
print("\nAcceptance rate by income: \n", arate_by_income)
print("\nAcceptance rate by occupation: \n", arate_by_occupation)
print("\nAcceptance rate by marital status: \n", arate_by_mstatus)
print("\nAcceptance rate by temperature: \n", arate_by_temp)
print("\nAcceptance rate by time: \n", arate_by_time)
print("\nAcceptance rate by destination: \n", arate_by_destination)
print("\nAcceptance rate by passanger: \n", arate_by_passanger)
print("\nAcceptance rate by expiration: \n", arate_by_exp)
print("\nAcceptance rate by has children: \n", arate_by_child)
print("\nAcceptance rate by frequency: \n", arate_by_freq)
print("\nAcceptance rate by 5min: \n", arate_by_5min)
print("\nAcceptance rate by 15min: \n", arate_by_15min)
print("\nAcceptance rate by 25min: \n", arate_by_25min)

Acceptance rate by age: 
 age
below21    0.678322
21         0.517773
26         0.513174
46         0.506912
41         0.492114
31         0.483816
36         0.468586
50plus     0.419660
Name: Y, dtype: float64

Acceptance rate by gender: 
 gender
Male      0.501895
Female    0.491112
Name: Y, dtype: float64

Acceptance rate by income: 
 income
$12500 - $24999     0.552212
$37500 - $49999     0.547406
Less than $12500    0.540268
$87500 - $99999     0.539419
$50000 - $62499     0.498047
$100000 or More     0.489524
$25000 - $37499     0.465154
$62500 - $74999     0.435424
$75000 - $87499     0.298246
Name: Y, dtype: float64

Acceptance rate by occupation: 
 occupation
Healthcare Practitioners & Technical         0.760563
Building & Grounds Cleaning & Maintenance    0.727273
Transportation & Material Moving             0.618421
Healthcare Support                           0.615385
Student                                      0.614737
Installation Maintenance & Repair            0.568

In [30]:
# combine the highest acceptance rate fields to see if it increases the acceptance rate
more_than_1_list = ['1~3', '4~8', 'gt8']
over_25_list = ['26', '31', '36', '41', '46', '50plus']

occupation_list = ['Building & Grounds Cleaning & Maintenance', 'Healthcare Practitioners & Technical']
df1 = df_cf.query('occupation in @occupation_list and CoffeeHouse in @more_than_1_list and age in @over_25_list')

print("Acceptance rate: ", df1.query('Y == 1').shape[0] / df1.shape[0])
print("size: ", df1.shape[0])

Acceptance rate:  0.9
size:  50


### Coffee House Conclusion
Considering the entire dataset for `coupon='Coffee House'`, we see above that `occupation`, `age`, and frequency of visits are the three strongest indicators *individually* of whether a coupon will be accepted. I then combined those filters to see whether the acceptance rate increased, which it did, reaching 90%. Such a high rate may reflect how few rows match the top categories in every field.
As shown above, the combined set contains only 50 points. More data would need to be collected to see whether the same rate holds for a much larger dataset.
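
To put a rough number on that uncertainty, a confidence interval for a proportion can be computed from the rate and the sample size. The sketch below uses the Wilson score interval, which is one common choice and an assumption here (the notebook does not specify a method); `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# A 0.9 acceptance rate on 50 rows corresponds to 45 acceptances
low, high = wilson_interval(45, 50)
print(f"{low:.3f} to {high:.3f}")
```

With only 50 rows the interval is fairly wide, which is consistent with the caution above about collecting more data.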

Now we know where to look for people who go to coffee shops often and will use coupons: drivers with occupations in `Healthcare Practitioners & Technical` and `Building & Grounds Cleaning & Maintenance` who go to a coffee house more than once a month and are older than 25.

### Next Steps
What if we wanted to know whom to target in order to get new people (those who go to a coffee shop less than once a month, or never) to use a coupon?

#### Next Steps Hypothesis
To answer this question:
1. Omit the data points we used to find what is already working.
2. Rerun the acceptance rate calculations on each field.
3. Analyze the calculations to see which fields hold the most weight in the less effective set of data.
4. Choose the fields, along with the filter, that specifically target those indicators to increase coupon use.
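
The steps above can be sketched as a loop over the fields of interest rather than one statement per field. This is an illustrative outline on made-up rows; `fields`, `potential`, and `top_by_field` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows; column names follow the notebook's dataset
toy = pd.DataFrame({
    "age":         ["21", "21", "below21", "26", "below21", "21"],
    "CoffeeHouse": ["never", "less1", "less1", "4~8", "never", "1~3"],
    "occupation":  ["Student", "Student", "Unemployed", "Student", "Unemployed", "Student"],
    "gender":      ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Y":           [1, 1, 0, 1, 0, 1],
})

# Step 1: keep only rows outside the group that already responds well
over_25 = ["26", "31", "36", "41", "46", "50plus"]
frequent = ["1~3", "4~8", "gt8"]
potential = toy[~toy["age"].isin(over_25) & ~toy["CoffeeHouse"].isin(frequent)]

# Steps 2-3: acceptance rate per category for each field, then the top category
fields = ["occupation", "gender"]
top_by_field = {}
for field in fields:
    rates = potential.groupby(field)["Y"].mean()
    top_by_field[field] = (rates.idxmax(), rates.max())

# Step 4 would use these top categories to define the targeting filter
print(top_by_field)
```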

In [31]:
df_potential = df_cf.query('age not in @over_25_list and CoffeeHouse not in @more_than_1_list')

In [32]:
fig_barage = px.bar(df_potential, x='occupation', color='YCategory', color_discrete_map=color_map, labels={'occupation': 'Occupation'}, title='Coffee House Coupon Acceptance by Occupation')
fig_barage.update_traces(dict(marker_line_width=0)) # need on large datasets
fig_barage.show()

In [33]:
arate_potential_by_age = df_potential.groupby('age')['Y'].mean().sort_values(ascending=False)
arate_potential_by_gender = df_potential.groupby('gender')['Y'].mean().sort_values(ascending=False)
arate_potential_by_income = df_potential.groupby('income')['Y'].mean().sort_values(ascending=False)
arate_potential_by_occupation = df_potential.groupby('occupation')['Y'].mean().sort_values(ascending=False)
arate_potential_by_mstatus = df_potential.groupby('maritalStatus')['Y'].mean().sort_values(ascending=False)
arate_potential_by_temp = df_potential.groupby('temperature')['Y'].mean().sort_values(ascending=False)
arate_potential_by_time = df_potential.groupby('time')['Y'].mean().sort_values(ascending=False)
arate_potential_by_destination = df_potential.groupby('destination')['Y'].mean().sort_values(ascending=False)
arate_potential_by_passanger = df_potential.groupby('passanger')['Y'].mean().sort_values(ascending=False)
arate_potential_by_exp = df_potential.groupby('expiration')['Y'].mean().sort_values(ascending=False)
arate_potential_by_child = df_potential.groupby('has_children')['Y'].mean().sort_values(ascending=False)
arate_potential_by_freq = df_potential.groupby('CoffeeHouse')['Y'].mean().sort_values(ascending=False)
arate_potential_by_5min = df_potential.groupby('toCoupon_GEQ5min')['Y'].mean().sort_values(ascending=False)
arate_potential_by_15min = df_potential.groupby('toCoupon_GEQ15min')['Y'].mean().sort_values(ascending=False)
arate_potential_by_25min = df_potential.groupby('toCoupon_GEQ25min')['Y'].mean().sort_values(ascending=False)

print("Acceptance rate by age: \n", arate_potential_by_age)
print("\nAcceptance rate by gender: \n", arate_potential_by_gender)
print("\nAcceptance rate by income: \n", arate_potential_by_income)
print("\nAcceptance rate by occupation: \n", arate_potential_by_occupation)
print("\nAcceptance rate by marital status: \n", arate_potential_by_mstatus)
print("\nAcceptance rate by temperature: \n", arate_potential_by_temp)
print("\nAcceptance rate by time: \n", arate_potential_by_time)
print("\nAcceptance rate by destination: \n", arate_potential_by_destination)
print("\nAcceptance rate by passanger: \n", arate_potential_by_passanger)
print("\nAcceptance rate by expiration: \n", arate_potential_by_exp)
print("\nAcceptance rate by has children: \n", arate_potential_by_child)
print("\nAcceptance rate by frequency: \n", arate_potential_by_freq)
print("\nAcceptance rate by 5min: \n", arate_potential_by_5min)
print("\nAcceptance rate by 15min: \n", arate_potential_by_15min)
print("\nAcceptance rate by 25min: \n", arate_potential_by_25min)

Acceptance rate by age: 
 age
below21    0.454545
21         0.340376
Name: Y, dtype: float64

Acceptance rate by gender: 
 gender
Male      0.376471
Female    0.313725
Name: Y, dtype: float64

Acceptance rate by income: 
 income
$37500 - $49999     0.482143
$12500 - $24999     0.404494
$100000 or More     0.381818
$87500 - $99999     0.363636
$50000 - $62499     0.344262
$75000 - $87499     0.333333
Less than $12500    0.277778
$25000 - $37499     0.256757
$62500 - $74999     0.241379
Name: Y, dtype: float64

Acceptance rate by occupation: 
 occupation
Office & Administrative Support             0.666667
Student                                     0.487342
Architecture & Engineering                  0.392857
Unemployed                                  0.376812
Arts Design Entertainment Sports & Media    0.307692
Sales & Related                             0.296296
Management                                  0.280000
Computer & Mathematical                     0.250000
Life Physical So

### Next Steps (Conclusion)

The data was filtered to exclude anyone who is over the age of 25 or goes to a coffee shop more than once a month. The purpose, as mentioned above, was to explore whether any fields indicated coupon acceptance in this subset of data. Such indicators could be used to focus marketing and grow coupon use within the subset. The same approach can be applied to any subset of the data, which I believe gets the most out of it.

As we can see from the calculations above, `occupation` is again the strongest indicator of whether the coupon will be used. In this case the occupation with the highest acceptance rate is `Office & Administrative Support`, though with a lower volume. With this in mind I would also include the `Student` occupation, since its larger volume would reach a larger group of people. My recommendation to increase coupon use would be to target individuals whose occupation is `Office & Administrative Support` or `Student` and who are aged 25 or under.