### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks the person whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical location of the user, destination, and the venue, and we mark the distance between each two places with time of driving. The user can see whether the venue is in the same direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [152]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px  # used by the px.bar calls further down
import pandas as pd
import numpy as np

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [153]:
data = pd.read_csv('data/coupons.csv')

In [154]:
pd.set_option('display.max_columns',30)
data.head(20)

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,car,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0
5,No Urgent Place,Friend(s),Sunny,80,6PM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
6,No Urgent Place,Friend(s),Sunny,55,2PM,Carry out & Take away,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
7,No Urgent Place,Kid(s),Sunny,80,10AM,Restaurant(<20),2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
8,No Urgent Place,Kid(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,1
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,,never,never,,4~8,1~3,1,1,0,0,1,0


In [155]:
# Iterating over a DataFrame yields its column names, not its rows.
for col in data:
    print(data[col].value_counts())

No Urgent Place    6283
Home               3237
Work               3164
Name: destination, dtype: int64
Alone        7305
Friend(s)    3298
Partner      1075
Kid(s)       1006
Name: passanger, dtype: int64
Sunny    10069
Snowy     1405
Rainy     1210
Name: weather, dtype: int64
80    6528
55    3840
30    2316
Name: temperature, dtype: int64
6PM     3230
7AM     3164
10AM    2275
2PM     2009
10PM    2006
Name: time, dtype: int64
Coffee House             3996
Restaurant(<20)          2786
Carry out & Take away    2393
Bar                      2017
Restaurant(20-50)        1492
Name: coupon, dtype: int64
1d    7091
2h    5593
Name: expiration, dtype: int64
Female    6511
Male      6173
Name: gender, dtype: int64
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: age, dtype: int64
Married partner      5100
Single               4752
Unmarried partner    2186
Divorced              516
Widowed               1

In [156]:
data.isna().sum()









destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64

2. Investigate the dataset for missing or problematic data.

### Missing/Problematic data
1. From the `isna().sum()` output above, 12576 of the 12684 rows have no value in the `car` column, so fewer than 1% of rows carry any car data. We can confidently drop the `car` column.
2. Additionally, `direction_same` and `direction_opp` are redundant: they always take opposite values for a given row, so we only need one of the two fields.
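
Both observations can be checked numerically. The sketch below is a minimal illustration on a small made-up frame: the column names mirror the dataset, but the rows are assumptions chosen for the example, not values from `coupons.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; the rows are illustrative assumptions.
toy = pd.DataFrame({
    "car": [np.nan, np.nan, np.nan, "Scooter and motorcycle"],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "direction_same": [0, 1, 0, 1],
    "direction_opp": [1, 0, 1, 0],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# direction_opp is redundant if the two flags always sum to 1 in every row.
is_complement = (toy["direction_same"] + toy["direction_opp"] == 1).all()
print(is_complement)
```

Running the same two checks on the full `data` frame would show whether `car` is almost entirely empty and whether the two direction flags are exact complements before anything is dropped.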

3. Decide what to do about your missing data -- drop, replace, other...

In [157]:
data.drop(columns=['car','direction_opp'],inplace=True)
 



4. What proportion of the total observations chose to accept the coupon? 



In [173]:
df_accept = data.Y.value_counts()
# Dividing the counts of 1s and 0s in column Y by the number of rows gives the share of observations that accepted vs. did not accept the coupon.
print("Total number of users that chose to accept the coupon {} and it accounts to {} percent".format(df_accept[1],(df_accept[1] / data.shape[0]) * 100))
print("Total number of users that chose not to accept the coupon {} and it accounts to {} percent".format(df_accept[0],(df_accept[0] / data.shape[0]) * 100))

Total number of users that chose to accept the coupon 7210 and it accounts to 56.84326710816777 percent
Total number of users that chose not to accept the coupon 5474 and it accounts to 43.15673289183223 percent
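
Because `Y` is coded 0/1, the same proportion can be read directly from the column mean. The sketch below rebuilds a stand-in `Y` column from the two counts printed above (7210 and 5474) rather than from the real file, so it is only a check of the arithmetic.

```python
import pandas as pd

# Stand-in Y column rebuilt from the printed counts: 7210 accepted, 5474 not.
y = pd.Series([1] * 7210 + [0] * 5474, name="Y")

# For a 0/1 column the mean equals the share of ones.
acceptance_rate = y.mean()
print(f"Accepted: {acceptance_rate:.2%}")

# normalize=True returns proportions instead of raw counts.
shares = y.value_counts(normalize=True)
print(shares.round(4))
```

On the real frame the equivalent calls would be `data['Y'].mean()` and `data['Y'].value_counts(normalize=True)`.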


5. Use a bar plot to visualize the `coupon` column.

In [183]:
# A single groupby keeps each coupon label aligned with its own counts
# (unique() returns labels in order of appearance, while groupby output is sorted).
coupon_data = data.groupby('coupon').Y.agg(Accepted='sum', Total='count').reset_index()
coupon_data['Not_Accepted'] = coupon_data['Total'] - coupon_data['Accepted']
coupon_data['Accepted_Percentage'] = coupon_data['Accepted'] / coupon_data['Total'] * 100
coupon_data['Not_Accepted_Percentage'] = coupon_data['Not_Accepted'] / coupon_data['Total'] * 100
coupon_data.rename(columns={'coupon':"Coupon_Type"},inplace=True)
px.bar(coupon_data,x="Coupon_Type",y=["Accepted_Percentage","Not_Accepted_Percentage"])


6. Use a histogram to visualize the temperature column.

In [186]:
# A single groupby keeps each temperature aligned with its own counts
# (unique() returns values in order of appearance, while groupby output is sorted).
temp_data = data.groupby('temperature').Y.agg(Accepted='sum', Total='count').reset_index()
temp_data['Not_Accepted'] = temp_data['Total'] - temp_data['Accepted']
temp_data['Accepted_Percentage'] = temp_data['Accepted'] / temp_data['Total'] * 100
temp_data['Not_Accepted_Percentage'] = temp_data['Not_Accepted'] / temp_data['Total'] * 100
temp_data.rename(columns={'temperature':"Temperature"},inplace=True)
temp_data['Temperature'] = temp_data['Temperature'].astype(str)  # treat the three values as categories on the x-axis
px.bar(temp_data,x="Temperature",y=["Accepted_Percentage","Not_Accepted_Percentage"])
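
The prompt asks for a histogram, so a plain count-per-temperature view may also be useful. The following is a minimal matplotlib sketch, assuming a stand-in column rebuilt from the `value_counts()` output shown earlier (6528, 3840 and 2316 rows at 80, 55 and 30); the bin edges are a choice made for this example so that each bar is centred on one temperature.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in temperature column rebuilt from the value counts printed earlier.
temperature = pd.Series([80] * 6528 + [55] * 3840 + [30] * 2316, name="temperature")

fig, ax = plt.subplots()
# Bin edges placed halfway between the three recorded temperatures (an assumption of this sketch).
counts, edges, _ = ax.hist(temperature, bins=[17.5, 42.5, 67.5, 92.5], edgecolor="black")
ax.set_xticks([30, 55, 80])
ax.set_xlabel("Temperature (F)")
ax.set_ylabel("Number of observations")
ax.set_title("Distribution of temperature")
```

With the real frame, passing `data['temperature']` in place of the stand-in series should produce the same picture.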

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [218]:
bar_data = data.loc[data['coupon'] == "Bar"]
bar_data

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income,Bar,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,Y
9,No Urgent Place,Kid(s),Sunny,80,10AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,,4~8,1~3,1,1,0,0,0
13,Home,Alone,Sunny,55,6PM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,,4~8,1~3,1,0,0,1,1
17,Work,Alone,Sunny,55,7AM,Bar,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999,never,never,,4~8,1~3,1,1,1,0,0
24,No Urgent Place,Friend(s),Sunny,80,10AM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,0,1
35,Home,Alone,Sunny,55,6PM,Bar,1d,Male,21,Single,0,Bachelors degree,Architecture & Engineering,$62500 - $74999,never,less1,4~8,4~8,less1,1,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12663,No Urgent Place,Friend(s),Sunny,80,10PM,Bar,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499,never,never,1~3,4~8,1~3,1,1,0,0,0
12664,No Urgent Place,Friend(s),Sunny,55,10PM,Bar,2h,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499,never,never,1~3,4~8,1~3,1,1,0,0,0
12667,No Urgent Place,Alone,Rainy,55,10AM,Bar,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499,never,never,1~3,4~8,1~3,1,1,0,0,0
12670,No Urgent Place,Partner,Rainy,55,6PM,Bar,2h,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499,never,never,1~3,4~8,1~3,1,1,0,0,0


2. What proportion of bar coupons were accepted?


In [261]:
bar_data = data.loc[data['coupon'] == "Bar"]
bar_data1 = pd.DataFrame(bar_data['coupon'].unique())

bar_data1['Accepted'] = list(bar_data[bar_data.Y==1].groupby('coupon').Y.count())
bar_data1
bar_data1['Not_Accepted'] = list(bar_data[bar_data.Y==0].groupby('coupon').Y.count())
bar_data1['Accepted_Percentage'] = (bar_data1['Accepted'] / (bar_data1['Accepted'] + bar_data1['Not_Accepted'])) *100
bar_data1['Not_Accepted_Percentage'] = (bar_data1['Not_Accepted'] / (bar_data1['Accepted'] + bar_data1['Not_Accepted'])) *100
bar_data1.rename(columns={0:"coupon"},inplace=True)
px.bar(bar_data1,x="coupon",y=["Accepted_Percentage","Not_Accepted_Percentage"])
### Based on the plot, 41% of the bar coupons were accepted

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [264]:
bar_data = data.loc[data['coupon'] == "Bar"]
# A single groupby keeps each visit-frequency label aligned with its own counts;
# rows with a missing 'Bar' value are excluded by groupby.
bar_data1 = bar_data.groupby('Bar').Y.agg(Accepted='sum', Total='count').reset_index()
bar_data1['Not_Accepted'] = bar_data1['Total'] - bar_data1['Accepted']
bar_data1['Accepted_Percentage'] = bar_data1['Accepted'] / bar_data1['Total'] * 100
bar_data1['Not_Accepted_Percentage'] = bar_data1['Not_Accepted'] / bar_data1['Total'] * 100
bar_data1.rename(columns={'Bar':"Bar_Coupon"},inplace=True)
#bar_data1
px.bar(bar_data1,x="Bar_Coupon",y=["Accepted_Percentage","Not_Accepted_Percentage"])

### Based on the plot - Acceptance rate for people who went to a bar 3 or fewer times a month is lower than for those who went 4 or more times
### People who never go to bars have the lowest acceptance rate
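
To compare the two groups named in the prompt directly, the frequency labels can be collapsed into "3 or fewer" and "more than 3" before averaging `Y`. This is a minimal sketch on made-up rows: the label spellings (`never`, `less1`, `1~3`, `4~8`, `gt8`) are assumed to match the `Bar` column, and the `Y` values are invented for illustration.

```python
import pandas as pd

# Toy bar-coupon rows; values are illustrative assumptions, not the real data.
toy_bar = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", "never", "4~8", None],
    "Y":   [0,       0,       1,     1,     1,     0,       0,     1],
})

# Labels assumed to mean "3 or fewer visits a month".
three_or_fewer = ["never", "less1", "1~3"]

# Drop rows with no recorded frequency, then bucket the rest into two groups.
known = toy_bar.dropna(subset=["Bar"]).copy()
known["bar_group"] = known["Bar"].isin(three_or_fewer).map(
    {True: "3 or fewer", False: "more than 3"}
)

# The mean of the 0/1 target per group is that group's acceptance rate.
rates = known.groupby("bar_group")["Y"].mean()
print(rates)
```

Filtering with a mask built from the same frame (`known["Bar"]`, not a column of the unfiltered data) also avoids pandas having to reindex the boolean key.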



4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [283]:
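def bivariate_analysis(col1,col2,c1,c2,no_row):
    """Acceptance counts and percentages for every (col1, col2) value pair.

    c1 and c2 are the matching columns of `data`; the result holds the
    no_row highest and no_row lowest pairs ranked by %Accepted.
    """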
    a = list(data[col1].unique())
    b = list(data[col2].unique())
    df = pd.DataFrame([(i, j) for i in a for j in b], columns=[col1,col2])
    Total_Count=[]; Accepted=[]; Rejected=[]
    for i in range(df.shape[0]):
        Total_Count.append((data[(c1==df[col1][i]) & (c2==df[col2][i])]).shape[0])
        Accepted.append((data[(c1==df[col1][i]) & (c2==df[col2][i]) & (data.Y==1)]).shape[0])
        Rejected.append((data[(c1==df[col1][i]) & (c2==df[col2][i]) & (data.Y==0)]).shape[0])
    df['Total_Count'] = Total_Count
    df['Accepted'] = Accepted
    df['Rejected'] = Rejected
    df['%Accepted'] = round(df['Accepted']/df['Total_Count']*100,3)
    df['%Rejected'] = round(df['Rejected']/df['Total_Count']*100,3)
    df = df.dropna()
    df = df.sort_values(by='%Accepted', ascending=False)
    d = pd.concat([df.head(no_row), df.tail(no_row)], axis=0)
    return d

df = bivariate_analysis('Bar','age',data.Bar,data.age,13)
df.sort_values('age',inplace=True)
px.bar(df,y="Bar", x="age", color='%Accepted')
#df
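
The question itself asks for a two-way comparison: one target group against everyone else. A minimal sketch of that comparison with boolean masks follows. The rows are made up, `1~3`, `4~8` and `gt8` are taken as an approximation of "more than once a month", and "over 25" is assumed to mean every age label except `below21` and `21`.

```python
import pandas as pd

# Toy bar-coupon rows; values are illustrative assumptions, not the real data.
toy_bar = pd.DataFrame({
    "Bar": ["1~3", "4~8", "never", "gt8", "less1", "1~3"],
    "age": ["26", "50plus", "31", "21", "below21", "41"],
    "Y":   [1, 1, 0, 1, 0, 0],
})

# Assumed label groupings for the two conditions in the prompt.
more_than_once = toy_bar["Bar"].isin(["1~3", "4~8", "gt8"])
over_25 = ~toy_bar["age"].isin(["below21", "21"])

# Rows meeting both conditions form the target group; the rest are "all others".
target = more_than_once & over_25
rate_target = toy_bar.loc[target, "Y"].mean()
rate_others = toy_bar.loc[~target, "Y"].mean()
print(f"target group: {rate_target:.2f}, all others: {rate_others:.2f}")
```

Applied to the bar-coupon subset rather than the toy frame, the two means would answer the question with a single pair of numbers.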
 



5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 
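
The three alternatives above can each be written as a boolean mask and combined with `|`. The sketch below uses made-up rows; the label spellings, the income bands treated as "less than 50K", and the reading of "more than 4 times" as `4~8` or `gt8` are all assumptions of this example.

```python
import pandas as pd

# Toy rows; values are illustrative assumptions, not the real data.
toy_bar = pd.DataFrame({
    "Bar": ["1~3", "never", "gt8", "less1", "4~8"],
    "passanger": ["Friend(s)", "Alone", "Kid(s)", "Alone", "Alone"],
    "maritalStatus": ["Single", "Married partner", "Single", "Widowed", "Single"],
    "age": ["31", "26", "21", "50plus", "41"],
    "RestaurantLessThan20": ["1~3", "4~8", "never", "gt8", "less1"],
    "income": ["$62500 - $74999", "$37500 - $49999", "$12500 - $24999",
               "$87500 - $99999", "$25000 - $37499"],
    "Y": [1, 1, 0, 0, 1],
})

# Income bands assumed to fall under 50K.
low_income = ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"]

bar_gt1 = toy_bar["Bar"].isin(["1~3", "4~8", "gt8"])
cond_a = bar_gt1 & (toy_bar["passanger"] != "Kid(s)") & (toy_bar["maritalStatus"] != "Widowed")
cond_b = bar_gt1 & toy_bar["age"].isin(["below21", "21", "26"])
cond_c = toy_bar["RestaurantLessThan20"].isin(["4~8", "gt8"]) & toy_bar["income"].isin(low_income)

# A row belongs to the combined group if it satisfies at least one condition.
any_cond = cond_a | cond_b | cond_c
rate_any = toy_bar.loc[any_cond, "Y"].mean()
rate_rest = toy_bar.loc[~any_cond, "Y"].mean()
print(rate_any, rate_rest)
```

Keeping `cond_a`, `cond_b` and `cond_c` as separate masks also makes it easy to report each group's acceptance rate on its own.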



7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of drivers who accept the coupons.