### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver at all, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks respondents whether they would accept the coupon if they were the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, divorced, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she eats at a restaurant with average expense of \\$20 to \\$50 per person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 7AM, 10AM, 2PM, 6PM, or 10PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [3]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# This is just to better utilize the screen width with a high resolution monitor
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [4]:
data = pd.read_csv('data/coupons.csv')

In [102]:
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [103]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null
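
A per-column count of missing values is often easier to scan than the `info()` listing. The sketch below uses a small hypothetical DataFrame named `toy` with a few of the same column names; in the notebook the same calls would be applied to `data`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the coupon data (hypothetical values, real column names).
toy = pd.DataFrame({
    "car": [np.nan, np.nan, "Mazda5", np.nan],
    "Bar": ["never", np.nan, "1~3", "less1"],
    "Y": [1, 0, 1, 0],
})

# Number of missing entries per column, and the same as a share of all rows.
missing = toy.isna().sum()
missing_share = toy.isna().mean()

summary = pd.DataFrame({"missing": missing, "share": missing_share})
print(summary.sort_values("missing", ascending=False))
```

Columns with a very high share of missing values (such as `car` above) are candidates for dropping, while columns with only a few gaps can usually be repaired or filtered row by row.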

3. Decide what to do about your missing data -- drop, replace, other...

In [104]:
data.iloc[:,0:14]

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,has_children,education,occupation,income
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,1,Some college - no degree,Unemployed,$37500 - $49999
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12679,Home,Partner,Rainy,55,6PM,Carry out & Take away,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499
12680,Work,Alone,Rainy,55,7AM,Carry out & Take away,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499
12681,Work,Alone,Snowy,30,7AM,Coffee House,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499
12682,Work,Alone,Snowy,30,7AM,Bar,1d,Male,26,Single,0,Bachelors degree,Sales & Related,$75000 - $87499


In [23]:
# Evaluating and cleaning data

In [22]:
# Dictionary that records, for each recoded column, the mapping from new codes back to the original category labels
catalog = {}

In [105]:
data['destination'].value_counts()

destination
No Urgent Place    6283
Home               3237
Work               3164
Name: count, dtype: int64

In [175]:
data['destination'] = data['destination'].str.replace('No Urgent Place','None',regex=True)
data['destination'].value_counts()

destination
None    6283
Home    3237
Work    3164
Name: count, dtype: int64

In [107]:
data['passanger'].value_counts()

passanger
Alone        7305
Friend(s)    3298
Partner      1075
Kid(s)       1006
Name: count, dtype: int64

In [28]:
data['weather'].value_counts()

weather
Sunny    10069
Snowy     1405
Rainy     1210
Name: count, dtype: int64

In [29]:
data['time'].value_counts()

time
6PM     3230
7AM     3164
10AM    2275
2PM     2009
10PM    2006
Name: count, dtype: int64

In [185]:
# Convert 12-hour labels to 24-hour strings (assumes no 12AM/12PM values, which holds for this data)
data['time'] = data['time'].apply(lambda x: x.replace('AM','') if x.endswith('AM') else\
                                  (str(int(x.replace('PM','')) + 12) if x.endswith('PM') else x))
data['time'].value_counts()

time
18    3230
7     3164
10    2275
14    2009
22    2006
Name: count, dtype: int64

In [212]:
catalog['time'] = {'7':'7:00','10':'10:00','14':'14:00','18':'18:00','22':'22:00'}
print(catalog['time'])

{'7': '7:00', '10': '10:00', '14': '14:00', '18': '18:00', '22': '22:00'}


In [173]:
data['coupon'].value_counts()

coupon
Coffee      3996
Rest        2786
ToGo        2393
Bar         2017
RestPlus    1492
Name: count, dtype: int64

In [206]:
# Shorten the coupon labels; regex=False treats the parentheses as literal characters
data['coupon'] = data['coupon'].str.replace(' House','',regex=False).str.replace('Restaurant(<20)','Rest',regex=False)\
                .str.replace('Carry out & Take away','ToGo',regex=False).str.replace('Restaurant(20-50)','RestPlus',regex=False)
data['coupon'].value_counts()

coupon
Coffee      3996
Rest        2786
ToGo        2393
Bar         2017
RestPlus    1492
Name: count, dtype: int64

In [211]:
catalog['coupon'] = {'Coffee':'Coffee House','Rest':'Restaurant(<20)','ToGo':'Carry out & Take away','Bar':'Bar','RestPlus':'Restaurant(20-50)'}
print(catalog['coupon'])

{'Coffee': 'Coffee House', 'Rest': 'Restaurant(<20)', 'ToGo': 'Carry out & Take away', 'Bar': 'Bar', 'RestPlus': 'Restaurant(20-50)'}
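
The catalog makes it possible to restore the original labels when they read better, for example on plot axes or in the final report. A minimal sketch, assuming a hypothetical `toy` DataFrame that holds the shortened codes:

```python
import pandas as pd

# Same mapping as catalog['coupon'] above: short code -> original label.
catalog = {
    "coupon": {
        "Coffee": "Coffee House",
        "Rest": "Restaurant(<20)",
        "ToGo": "Carry out & Take away",
        "Bar": "Bar",
        "RestPlus": "Restaurant(20-50)",
    }
}

# Hypothetical rows that already use the shortened codes.
toy = pd.DataFrame({"coupon": ["Coffee", "Bar", "RestPlus"]})

# Series.map looks each code up in the dictionary; unknown codes become NaN.
labels = toy["coupon"].map(catalog["coupon"])
print(labels.tolist())
```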


In [112]:
data['expiration'].value_counts()

expiration
1d    7091
2h    5593
Name: count, dtype: int64

In [116]:
data['expiration'] = data['expiration'].str.replace('1d','24h',regex=True)
data['expiration'].value_counts()

expiration
24h    7091
2h     5593
Name: count, dtype: int64

In [117]:
data['gender'].value_counts()

gender
Female    6511
Male      6173
Name: count, dtype: int64

In [118]:
data['age'].value_counts()

age
21         2653
26         2559
31         2039
50plus     1788
36         1319
41         1093
46          686
below21     547
Name: count, dtype: int64

In [120]:
data['age'] = data['age'].str.replace('plus','',regex=True).str.replace('below21','19',regex=True)
data['age'].value_counts()

age
21    2653
26    2559
31    2039
50    1788
36    1319
41    1093
46     686
19     547
Name: count, dtype: int64

In [210]:
catalog['age'] = {'19':'below21','21':'21','26':'26','31':'31','36':'36','41':'41','46':'46','50':'50plus'}
print(catalog['age'])

{'19': 'below21', '21': '21', '26': '26', '31': '31', '36': '36', '41': '41', '46': '46', '50': '50plus'}


In [121]:
data['maritalStatus'].value_counts()

maritalStatus
Married partner      5100
Single               4752
Unmarried partner    2186
Divorced              516
Widowed               130
Name: count, dtype: int64

In [123]:
data['maritalStatus'] = data['maritalStatus'].str.replace(' partner','',regex=True).str.replace('Unmarried','Partner')
data['maritalStatus'].value_counts()

maritalStatus
Married     5100
Single      4752
Partner     2186
Divorced     516
Widowed      130
Name: count, dtype: int64

In [214]:
catalog['maritalStatus'] = {'Married':'Married partner','Partner':'Unmarried partner','Divorced':'Divorced','Single':'Single','Widowed':'Widowed'}
print(catalog['maritalStatus'])

{'Married': 'Married partner', 'Single': 'Single', 'Partner': 'Unmarried partner', 'Divorced': 'Divorced', 'Widowed': 'Widowed'}


In [124]:
data['has_children'].value_counts()

has_children
0    7431
1    5253
Name: count, dtype: int64

In [125]:
data['education'].value_counts()

education
Some college - no degree                  4351
Bachelors degree                          4335
Graduate degree (Masters or Doctorate)    1852
Associates degree                         1153
High School Graduate                       905
Some High School                            88
Name: count, dtype: int64

In [134]:
# Shorten the education labels; regex=False treats the parentheses as literal characters
data['education'] = data['education'].str.replace(' degree','',regex=False).str.replace('Some college - no','Undergrad',regex=False)\
                                        .str.replace(' (Masters or Doctorate)','',regex=False).str.replace('High School Graduate','High School',regex=False)\
                                        .str.replace('Some High School','Elementary',regex=False)
data['education'].value_counts()

education
Undergrad      4351
Bachelors      4335
Graduate       1852
Associates     1153
High School     905
Elementary       88
Name: count, dtype: int64

In [215]:
# Store the education mapping under its own key so it does not overwrite catalog['age']
catalog['education'] = {'Graduate':'Graduate degree (Masters or Doctorate)','Bachelors':'Bachelors degree','Associates':'Associates degree','Undergrad':'Some college - no degree',\
                        'High School':'High School Graduate','Elementary':'Some High School'}
print(catalog['education'])

{'Graduate': 'Graduate degree (Masters or Doctorate)', 'Bachelors': 'Bachelors degree', 'Associates': 'Associates degree', 'Undergrad': 'Some college - no degree', 'High School': 'High School Graduate', 'Elementary': 'Some High School'}


In [129]:
data['occupation'].value_counts()

occupation
Unemployed                                   1870
Student                                      1584
Computer & Mathematical                      1408
Sales & Related                              1093
Education&Training&Library                    943
Management                                    838
Office & Administrative Support               639
Arts Design Entertainment Sports & Media      629
Business & Financial                          544
Retired                                       495
Food Preparation & Serving Related            298
Healthcare Practitioners & Technical          244
Healthcare Support                            242
Community & Social Services                   241
Legal                                         219
Transportation & Material Moving              218
Architecture & Engineering                    175
Personal Care & Service                       175
Protective Service                            175
Life Physical Social Science           

In [14]:
data['income'].value_counts()

income
$25000 - $37499     2013
$12500 - $24999     1831
$37500 - $49999     1805
$100000 or More     1736
$50000 - $62499     1659
Less than $12500    1042
$87500 - $99999      895
$75000 - $87499      857
$62500 - $74999      846
Name: count, dtype: int64

In [19]:
# Recode the income brackets as ordinal codes '1'-'9'; exact-value replacement avoids treating '$' as a regex anchor
income_codes = {'Less than $12500':'1','$12500 - $24999':'2','$25000 - $37499':'3',\
                '$37500 - $49999':'4','$50000 - $62499':'5','$62500 - $74999':'6',\
                '$75000 - $87499':'7','$87500 - $99999':'8','$100000 or More':'9'}
data['income'] = data['income'].replace(income_codes)
data['income'].value_counts()

income
3    2013
2    1831
4    1805
9    1736
5    1659
1    1042
8     895
7     857
6     846
Name: count, dtype: int64

In [24]:
catalog['income'] = {'1':'Less than $12500','2':'$12500 - $24999','3':'$25000 - $37499','4':'$37500 - $49999','5':'$50000 - $62499','6':'$62500 - $74999',\
                    '7':'$75000 - $87499','8':'$87500 - $99999','9':'$100000 or More'}
print(catalog['income'])

{'1': 'Less than $12500', '2': '$12500 - $24999', '3': '$25000 - $37499', '4': '$37500 - $49999', '5': '$50000 - $62499', '6': '$62500 - $74999', '7': '$75000 - $87499', '8': '$87500 - $99999', '9': '$100000 or More'}


In [None]:
# Looking at the data set, there are a few columns with NaN values. 

In [8]:
data['car'].value_counts()

car
Scooter and motorcycle                      22
Mazda5                                      22
do not drive                                22
crossover                                   21
Car that is too old to install Onstar :D    21
Name: count, dtype: int64

In [9]:
data['Bar'].value_counts()

Bar
never    5197
less1    3482
1~3      2473
4~8      1076
gt8       349
Name: count, dtype: int64

In [10]:
data['CoffeeHouse'].value_counts()

CoffeeHouse
less1    3385
1~3      3225
never    2962
4~8      1784
gt8      1111
Name: count, dtype: int64

In [11]:
data['CarryAway'].value_counts()

CarryAway
1~3      4672
4~8      4258
less1    1856
gt8      1594
never     153
Name: count, dtype: int64

In [12]:
data['RestaurantLessThan20'].value_counts()

RestaurantLessThan20
1~3      5376
4~8      3580
less1    2093
gt8      1285
never     220
Name: count, dtype: int64

In [13]:
data['Restaurant20To50'].value_counts()

Restaurant20To50
less1    6077
1~3      3290
never    2136
4~8       728
gt8       264
Name: count, dtype: int64
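
One possible way to act on these findings for step 3 is sketched below; it is not the only reasonable choice. The `car` column is nearly empty (108 non-null values out of 12684), so it is dropped, and the remaining gaps in the frequency columns are either removed row by row or filled with the most common answer. The `toy` DataFrame is hypothetical and stands in for `data`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the coupon data (hypothetical values, real column names).
toy = pd.DataFrame({
    "car": [np.nan, np.nan, "Mazda5", np.nan, np.nan],
    "Bar": ["never", np.nan, "1~3", "never", "less1"],
    "CoffeeHouse": ["less1", "1~3", np.nan, "1~3", "never"],
    "Y": [1, 0, 1, 0, 1],
})
freq_cols = ["Bar", "CoffeeHouse"]

# 1. Drop a column that is almost entirely empty.
cleaned = toy.drop(columns=["car"])

# 2a. Option A: drop the rows that still have a missing frequency answer.
dropped = cleaned.dropna(subset=freq_cols)

# 2b. Option B: fill each gap with the column's most common answer (mode).
filled = cleaned.copy()
for col in freq_cols:
    filled[col] = filled[col].fillna(filled[col].mode()[0])

print(len(cleaned), len(dropped), int(filled.isna().sum().sum()))
```

Dropping rows keeps only answers that respondents actually gave, at the cost of fewer observations; filling with the mode keeps every row but slightly inflates the most common category.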

4. What proportion of the total observations chose to accept the coupon? 
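
Because `Y` is coded as 0 or 1, its mean is the share of accepted coupons. A minimal sketch on a hypothetical `toy` DataFrame (use `data` in the notebook):

```python
import pandas as pd

# Hypothetical outcomes: 1 = accepted, 0 = declined.
toy = pd.DataFrame({"Y": [1, 0, 1, 1, 0]})

# The mean of a 0/1 column equals the proportion of ones.
acceptance_rate = toy["Y"].mean()

# value_counts(normalize=True) reports the share of each outcome.
split = toy["Y"].value_counts(normalize=True)

print(f"Accepted: {acceptance_rate:.1%}")
```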



5. Use a bar plot to visualize the `coupon` column.
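
A bar plot of a categorical column can be built from its `value_counts()`. The sketch below uses a hypothetical `toy` DataFrame with the shortened coupon codes; the `Agg` backend line is only there so the example runs outside a notebook and can be omitted in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical coupon codes standing in for data['coupon'].
toy = pd.DataFrame(
    {"coupon": ["Coffee", "Bar", "Coffee", "Rest", "ToGo", "Coffee", "Bar"]}
)

# Count the observations per coupon type, most frequent first.
counts = toy["coupon"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Coupon type")
ax.set_ylabel("Number of observations")
ax.set_title("Observations per coupon type")
fig.tight_layout()
```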

6. Use a histogram to visualize the temperature column.
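
The `temperature` column only takes the values 30, 55, and 80, so default bin edges can place the bars in misleading positions. The sketch below, on a hypothetical `toy` DataFrame, assumes bin edges chosen so that each value sits at the center of its own bin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical temperatures standing in for data['temperature'].
toy = pd.DataFrame({"temperature": [30, 55, 80, 80, 55, 80]})

# Edges halfway between the three possible values (30, 55, 80).
bin_edges = [17.5, 42.5, 67.5, 92.5]

fig, ax = plt.subplots(figsize=(6, 4))
n, edges, patches = ax.hist(toy["temperature"], bins=bin_edges, edgecolor="black")
ax.set_xticks([30, 55, 80])
ax.set_xlabel("Temperature (F)")
ax.set_ylabel("Number of observations")
ax.set_title("Distribution of temperature")
fig.tight_layout()
```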

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


2. What proportion of bar coupons were accepted?
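
Steps 1 and 2 can be sketched together: filter the rows on the `coupon` column, then take the mean of `Y` within that subset. The `toy` DataFrame is hypothetical; the label `'Bar'` matches the coupon values shown earlier.

```python
import pandas as pd

# Hypothetical rows standing in for the full data set.
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee", "Bar", "Rest", "Bar", "Bar"],
    "Y": [1, 1, 0, 0, 0, 1],
})

# Step 1: keep only the bar coupons; .copy() gives an independent DataFrame.
bar_coupons = toy[toy["coupon"] == "Bar"].copy()

# Step 2: share of bar coupons that were accepted.
bar_rate = bar_coupons["Y"].mean()
print(f"Bar coupons accepted: {bar_rate:.1%}")
```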


3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.
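
One way to make this comparison is to group the `Bar` categories into two explicit sets and compute the acceptance rate in each. The sketch assumes that `never`, `less1`, and `1~3` mean "3 or fewer times" and that `4~8` and `gt8` mean "more"; rows with a missing `Bar` answer fall into neither group. The `toy` DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical bar-coupon rows.
bar_coupons = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", "4~8", "never", "1~3"],
    "Y":   [0,       0,       1,     1,     1,     0,     1,       0],
})

three_or_fewer = {"never", "less1", "1~3"}
more_than_three = {"4~8", "gt8"}

# Acceptance rate within each frequency group.
rate_low = bar_coupons.loc[bar_coupons["Bar"].isin(three_or_fewer), "Y"].mean()
rate_high = bar_coupons.loc[bar_coupons["Bar"].isin(more_than_three), "Y"].mean()

print(f"3 or fewer visits: {rate_low:.1%}, more than 3 visits: {rate_high:.1%}")
```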


4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to all others.  Is there a difference?
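
A boolean mask that combines both conditions splits the bar coupons into the target group and everyone else. The sketch assumes that `1~3`, `4~8`, and `gt8` approximate "more than once a month" (the `1~3` bucket also contains exactly one visit) and that `age` holds the numeric strings produced by the cleaning step above (`'19'`, `'21'`, `'26'`, ...). The `toy` rows are hypothetical.

```python
import pandas as pd

# Hypothetical bar-coupon rows; age uses the recoded numeric strings.
bar_coupons = pd.DataFrame({
    "Bar": ["1~3", "never", "gt8", "4~8", "less1", "1~3"],
    "age": ["26",  "31",    "21",  "50",  "19",    "41"],
    "Y":   [1,     0,       1,     1,     0,       0],
})

goes_often = bar_coupons["Bar"].isin({"1~3", "4~8", "gt8"})
over_25 = bar_coupons["age"].astype(int) > 25

# Target group: frequent bar visitors over 25; everyone else is the complement.
target = goes_often & over_25
rate_target = bar_coupons.loc[target, "Y"].mean()
rate_others = bar_coupons.loc[~target, "Y"].mean()

print(f"Target: {rate_target:.1%}, all others: {rate_others:.1%}")
```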


5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 
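
The same masking approach extends to three conditions. The sketch assumes the passenger column keeps its original spelling `passanger` with the value `Kid(s)`, and that the occupation label is `'Farming Fishing & Forestry'`; that label does not appear in the output shown above, so check it against `data['occupation'].unique()` before relying on it. The `toy` rows are hypothetical.

```python
import pandas as pd

# Hypothetical bar-coupon rows.
bar_coupons = pd.DataFrame({
    "Bar": ["1~3", "4~8", "never", "gt8", "1~3"],
    "passanger": ["Alone", "Kid(s)", "Friend(s)", "Partner", "Friend(s)"],
    "occupation": ["Student", "Management", "Retired",
                   "Farming Fishing & Forestry", "Legal"],
    "Y": [1, 0, 0, 1, 1],
})

goes_often = bar_coupons["Bar"].isin({"1~3", "4~8", "gt8"})
no_kid = bar_coupons["passanger"] != "Kid(s)"
# Assumed label; verify it against the actual occupation values.
not_farming = bar_coupons["occupation"] != "Farming Fishing & Forestry"

target = goes_often & no_kid & not_farming
rate_target = bar_coupons.loc[target, "Y"].mean()
rate_others = bar_coupons.loc[~target, "Y"].mean()

print(f"Target: {rate_target:.1%}, all others: {rate_others:.1%}")
```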


6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 
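
The three groups can be written as separate masks and combined with `|`. The sketch relies on the recoded columns from the cleaning step and on several labeled assumptions: `age` holds numeric strings, `income` holds the codes `'1'` to `'9'` (codes 1 to 4 are the brackets below \\$50000), `maritalStatus` uses `'Widowed'`, and `4~8` or `gt8` approximates "more than 4 times". The `toy` rows are hypothetical.

```python
import pandas as pd

# Hypothetical bar-coupon rows using the recoded columns.
bar_coupons = pd.DataFrame({
    "Bar": ["1~3", "gt8", "never", "never", "1~3", "less1"],
    "passanger": ["Alone", "Kid(s)", "Alone", "Kid(s)", "Alone", "Friend(s)"],
    "maritalStatus": ["Single", "Married", "Single", "Married", "Widowed", "Single"],
    "age": ["41", "21", "36", "50", "50", "26"],
    "RestaurantLessThan20": ["1~3", "never", "4~8", "less1", "1~3", "gt8"],
    "income": ["9", "7", "2", "9", "5", "6"],
    "Y": [1, 1, 0, 0, 0, 1],
})

goes_often = bar_coupons["Bar"].isin({"1~3", "4~8", "gt8"})

# Group A: frequent bar visitors, no kid passenger, not widowed.
group_a = (goes_often
           & (bar_coupons["passanger"] != "Kid(s)")
           & (bar_coupons["maritalStatus"] != "Widowed"))

# Group B: frequent bar visitors under 30.
group_b = goes_often & (bar_coupons["age"].astype(int) < 30)

# Group C: frequent cheap-restaurant visitors with income below 50K (codes 1-4).
group_c = (bar_coupons["RestaurantLessThan20"].isin({"4~8", "gt8"})
           & (bar_coupons["income"].astype(int) < 5))

in_any = group_a | group_b | group_c
rate_any = bar_coupons.loc[in_any, "Y"].mean()
rate_none = bar_coupons.loc[~in_any, "Y"].mean()

print(f"In any group: {rate_any:.1%}, in none: {rate_none:.1%}")
```

Reporting the rate for each mask separately (for example `bar_coupons.loc[group_a, "Y"].mean()`) shows which of the three groups drives the combined result.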



7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of drivers who accept the coupons.  