### Will a Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaraunt near where you are driving. Would you accept that coupon and take a short detour to the restaraunt? Would you accept the coupon but use it on a sunbsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaraunt? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor on whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then ask the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’.  There are five different types of coupons -- less expensive restaurants (under \\$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\\$20 - \\$50). 

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons.  To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece. 





### Data Description

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import plotly.express as px
import cufflinks as cf
cf.go_offline()
%matplotlib inline

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [2]:
data = pd.read_csv('data/coupons.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'data/coupons.csv'

In [None]:
data.head()

# First checking number of rows and columns, the dimension, and summary of the data

In [None]:
data.index

In [None]:
data.columns 

In [None]:
data.info()

In [None]:
data.shape

Summary:From the above information, the dataset has 12684 entries and 26 columns. Each entries has a 1 step.

# Geting some descriptive statistics of the dataset

In [None]:
data.describe()

The describe function only provided statitcal information of columns of the dataset that has only numerical value.

Note from descriptive statistics:

From the above statistics we have seen that min,max,mean, std, and quartils of the dataset that has a numeric value column.


# 2. Investigate the dataset for missing or problematic data.

Suggestion: Befor we start to add and delete the missing data, it is good idea to have a copy of the original data

In [None]:
# this is the copy of the original dataset 
data_copy=data.copy()

In [None]:
# Now you can add and drop the missing data
pd.options.display.max_columns=None
data.head(3)

In [None]:
data.dtypes

# To check the missing values from the dataset, isnull() function is a good tool to perform it

In [None]:
data.isnull().sum()

summary: The results from the above .isnull() function, the columns car,Bar,coffeehouse,carryAway,RestaurantLessThan20,and
          Restaurant20To50 have a missing values. 

In [None]:
# get percentage of missing value
data.isnull().sum()/data.shape[0]*100

Summary: Car column has more 99% missing values and Bar,coffeehouse,carryAway,RestaurantLessThan20,and Restaurant20To50 have under 2% missing values.

# Since most columns has no missing value, the ratio is 0.00. Lets grap only columns that has a missing value, greater than 0

In [None]:
# All missing values greater than 0
for columns in data.columns:
    if data[columns].isnull().sum()>0:
        print(columns,':missing data in percentage is {values_gt0:.2f}%'.format(values_gt0=data[columns].isnull().sum()/data[columns].shape[0]*100))

summary:
From the above result car column has huge number of value missing. It has more that 99% missing values.
 So, since this column has a huge number of missing values, it is quite had to fill it and give to machine. probablly,the result will affect by this decision. The good idea is its better to drop the column.
 
 The other columns such as Bar,coffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50 each column has below 2% missing values. Better to keep and fill them. But, important question think what to fill.  

3. # Decide what to do about your missing data -- drop, replace, other...

In [None]:
# graping datasets than contains onlt either 'never','less1','1~3','4~8', or 'gt8' for bar column
Bar_never=data.loc[data['Bar'].str.contains('never',na=False)]
Bar_less1=data.loc[data['Bar'].str.contains('less1',na=False)]
Bar_b13=data.loc[data['Bar'].str.contains('1~3',na=False)]
Bar_b48=data.loc[data['Bar'].str.contains('4~8',na=False)]
Bar_gt8=data.loc[data['Bar'].str.contains('gt8',na=False)]

In [None]:
# lets start to fill the Bar 
# lets see first the unique of the data
print('Number of unique of a Bar:\n',data['Bar'].unique()) 
print('\n')
print('Total values for each unique: \n',data['Bar'].value_counts())
print('\n')
print('the number of missing data of a Bar:\n', data['Bar'].isnull().sum())

#this process can go to all others that has a missing value

In [None]:
#Look at the Bar data set what to fill the missing value. 
#since in the of case "Bar", 'never' has the highest value, I will fill the missing with 'never'
data['Bar'].fillna(data['Bar'].value_counts().idxmax(), inplace=True)

In [None]:
# for the CoffeeHouse,CarryAway,RestaurantLessThan20, and Restaurant20To50 
data['CoffeeHouse'].fillna(data['CoffeeHouse'].value_counts().idxmax(), inplace=True) 
data['CarryAway'].fillna(data['CarryAway'].value_counts().idxmax(), inplace=True) 
data['RestaurantLessThan20'].fillna(data['RestaurantLessThan20'].value_counts().idxmax(), inplace=True) 
data['Restaurant20To50'].fillna(data['Restaurant20To50'].value_counts().idxmax(), inplace=True) 

In [None]:
data.isnull().sum()

In [None]:
# Last the Car column, 

# The Car column has a huge number of missing values. Which 12576 or 99.15% values are missing. I highly sujjested that to drop
#it than to fill it.

data.drop('car', axis=1, inplace=True)

In [None]:
data.isnull().sum()

Finally, we have a clean dataset after I applied a fillna and drop function, to fill if the missing values of the column are
small pecentage and drop if large percentage

# Extra exploratory data analysis and Data Visualization

In [None]:
sns.pairplot(data)

The pairplot gives us some dataset relationship that has only a numeric value.

In [None]:
fig =plt.figure(figsize=(10,8))
sns.heatmap(data.corr(), annot=True, cmap='Greens', linewidths=5, linecolor='white')
#sns.color_palette("flare", as_cmap=True)

From the seaborn heatmap, there is no strong relationship between the columns that have a numeric values. 

In [None]:
data.columns

In [None]:
sns.jointplot(data=data, x ="toCoupon_GEQ5min",y='Y', kind='hex')

In [None]:
px.scatter(data_frame=data, x ='temperature', y= 'toCoupon_GEQ5min', marginal_x='histogram', marginal_y='histogram')

# Count each unique of the column using 

value_counts() function 

Lets discore what are the total number of coupons are accepted or 
             what are the total number of coupons are rejected from the total amount provided from Amazon

In [None]:
# to know the total number of customers has accepted or rejected

data['Y'].value_counts()

In [None]:
# lets see this on graph
fig=plt.figure()
plt.bar(data['Y'].unique(),data['Y'].value_counts(), color='blue', width=0.5, edgecolor='black')
plt.xlabel('Y-coupon')
plt.ylabel('Y-value counts')
plt.title('Total number of customers who has accepted or rejected coupon', color='purple')
fig.show()

In [None]:
#Calulating tota numbers of customers whop has accepted and rejected in percentage
print('The total percentage of customers who accepted the coupn is %.2f%%' %(data['Y'].value_counts()[1]/data.shape[0]*100))
print('The total percentage of customers who rejected the coupn is %.2f%%' %(data['Y'].value_counts()[0]/data.shape[0]*100))


So, from the above bar graph and total percentage ratio calculation, I have concluded the from the total number of coupons amazon had sent, more 56% were accepted by the customers and under 43% are rejected. 

Conclusion: Amazon still have known that a higher numbers of coupons had accepted

In [None]:
data.head()

4. # What proportion of the total observations chose to accept the coupon? 



In [None]:
data_only_acceted_coupon=data.loc[data['Y']==1]

In [None]:
print(data_only_acceted_coupon.shape)
print('\n')
# from the total observations, the proportion of accepted coupons 
print(data['Y'].value_counts()[1]/data.shape[0])
print('\n')
# Total percentage accepted coupons
print(f'The total percentage of customers who accepted the coupn is %.2f%%' %(data['Y'].value_counts()[1]/data.shape[0]*100))

Summary: from the total survey, 7210 coupons were accpted. This is 56.84% from the total coupons are accepted.

5. # Use a bar plot to visualize the `coupon` column.

In [None]:
#showing total number of coupons provided by amazon
print('Total number of coupons provided by Amazon are:', data['coupon'].nunique())
print('\n')
print('list of all coupons \n',data['coupon'].unique())

In [None]:
#total values of each coupons
data['coupon'].value_counts()

In [None]:
#a plotly for each coupons 
arranged_coupons=['Coffee House','Restaurant(<20)', 'Carry out & Take away', 'Bar', 'Restaurant(20-50)']
px.bar(x=arranged_coupons, y=data['coupon'].value_counts(),title='bar plot to visualize the coupon column')

In [None]:
#matplotlib of each coupons
arranged_coupons=['Coffee House','Restaurant(<20)', 'Carry out & Take away', 'Bar', 'Restaurant(20-50)']
plt.bar(arranged_coupons,data['coupon'].value_counts())
plt.xlabel('Coupons')
plt.ylabel('Its values')
plt.title('bar plot to visualize the coupon column')
plt.xticks(rotation=45)

In [None]:
#percentage of acceptance for each coupons 
print('The percentage of coupons accepted from coffee house is %.2f%%' %(data['coupon'].value_counts()[0]/data.shape[0]*100),)
print('The percentage of coupons accepted from Restaurant(<20) is %.2f%%' %(data['coupon'].value_counts()[1]/data.shape[0]*100),)
print('The percentage of coupons accepted from Carry out & Take away is %.2f%%' %(data['coupon'].value_counts()[2]/data.shape[0]*100),)
print('The percentage of coupons accepted from bar is %.2f%%' %(data['coupon'].value_counts()[3]/data.shape[0]*100),)
print('The percentage coupons accepted from Restaurant(20-50) is %.2f%%' %(data['coupon'].value_counts()[4]/data.shape[0]*100))

Summary: 

From the given survey data, the coffee house get higher number of coupon acceptance compare to the others, 
with 31.50% acceptance and Restaurant(20-50) get low number of coupon acceptance.

So, amazon can aware that most number of customers love coupons from the coffee house. It is good to send more a coffee house coupon than the others.

6. # Use a histogram to visualize the temperature column.

In [None]:
#number of uniques and its corresponding values for the temeperature column
data['temperature'].unique()
data['temperature'].value_counts()

In [None]:
#a seaborn of temperature column
sns.histplot(data=data, x ='temperature')
plt.xlabel('Temperature')
plt.ylabel("It's Count")
plt.title('Temperature Histogram')

In [None]:
data[data['temperature']==80]['Y'].value_counts()
data[data['temperature']==80]['Y'].value_counts()/data['temperature'].value_counts()[80]*100

In [None]:
data[data['temperature']==55]['Y'].value_counts()
data[data['temperature']==55]['Y'].value_counts()/data['temperature'].value_counts()[55]*100

Summary: From the total coupon have sent at 55 temperature, more 53 percent have accepted and rest rejected

In [None]:
data[data['temperature']==30]['Y'].value_counts()
data[data['temperature']==30]['Y'].value_counts()/data['temperature'].value_counts()[30]*100

Summary: From the total coupon have sent at30 temperature, more 53 percent have accepted and rest rejected

# Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

# 1. Create a new `DataFrame` that contains just the bar coupons.

In [None]:
data.sample(10)

In [None]:
# graping coupon uniques
data['coupon'].unique()

In [None]:
# graping all rows that contains "Bar" by using .str.contains function
bar_coupon=data.loc[data['coupon'].str.contains('Bar',na=False)]
bar_coupon.head()

In [None]:
# Total bar coupons sent to the drivers from the entire survey given
bar_coupon.shape

In [None]:
# accepted bar coupon
bar_coupon.loc[bar_coupon['Y']==1].shape

# 2. What proportion of bar coupons were accepted?


In [None]:
#creating a finction that calculate the ratio of bar coupons to the total bar coupon offers
def proportion_bar_coupon(data1,data2):
    
    return data2.shape[0]/data1.shape[0]

In [None]:
proportion_bar_coupon(bar_coupon,bar_coupon.loc[bar_coupon['Y']==1])

Summary: 0.41 fron the total survey of a Bar coupon where accepted. We conculed that more 59% of the bar coupons were rejected. 

# 3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [None]:
# graping bar uniques
data['Bar'].unique()

In [None]:
#bar total number of rows and columns
data['Bar'].shape

In [None]:
#counting a value for each uniques of bar coupons
data['Bar'].value_counts()

In [None]:
#creating a list that contains drivers went to the bar 3 or fewer times a month and who went more repectively
driver_go_bar_lthan3_list=['never', 'less1', '1~3']
driver_go_bar_mthan4_list=['4~8','gt8']

In [None]:
#use a query to get the darFrame of drivers went to the bar 3 or fewer times a month and who went more repectively
query_driver_go_bar_lthan3=bar_coupon.query('Bar in @driver_go_bar_lthan3_list')
query_driver_go_bar_mthan4=bar_coupon.query('Bar in @driver_go_bar_mthan4_list')

In [None]:
#total number of coupons accepted by those who went to bar less than 3 or fever
# and those who went to bar more times
print('Total number of bar coupons accepted by those who went to bar less than 3 or fever is: {}'.format(query_driver_go_bar_lthan3.loc[query_driver_go_bar_lthan3['Y']==1].shape[0]))
print('Total number of bar coupons accepted by those who went to bar more times is: {}'.format(query_driver_go_bar_mthan4.loc[query_driver_go_bar_mthan4['Y']==1].shape[0]))

In [None]:
#total number of coupons rejected by those who went to bar less than 3 or fever
# and those who went to bar more times
print('Total number of bar coupons rejected by those who went to bar less than 3 or fever is: {}'.format(query_driver_go_bar_lthan3.loc[query_driver_go_bar_lthan3['Y']==0].shape[0]))
print('Total number of bar coupons rejected by those who went to bar more times is: {}'.format(query_driver_go_bar_mthan4.loc[query_driver_go_bar_mthan4['Y']==0].shape[0]))

In [None]:
# acceptance rate of bar coupons by those who went to bar less than 3 or fever to
# 'Total number of bar coupons accepted by those who went to bar more times
def compare_acceptance_rate(data1,data2):
    return data2.shape[0]/data1.shape[0]

In [None]:
dataquery1=query_driver_go_bar_lthan3.loc[query_driver_go_bar_lthan3['Y']==1]
dataquery2=query_driver_go_bar_mthan4.loc[query_driver_go_bar_mthan4['Y']==1]
compare_acceptance_rate(dataquery1,dataquery2)

Summary:Total number of bar coupons accepted by those who went to bar less than 3 or fewer is 6323 and Total number of bar coupons accepted by those who went to bar more times is 887. So, more 12% coupons where accepted by  those who went to bar less than 3 or fever than those who went to bar more times

# 4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [None]:
#creating a list that contains drivers went to the bar less than one and more than one
drivers_go_bar_lthan_once_list=['never','less1']
drivers_go_bar_mthan_once_list=['1~3','4~8','gt8']

In [None]:
#use .isin() function to get the darFrame of drivers who went more once and are over the age of 25 #

#drivers who went more once
drivers_go_bar_mthan_once= bar_coupon.loc[bar_coupon['Bar'].isin(drivers_go_bar_mthan_once_list)]
drivers_go_bar_mthan_once=drivers_go_bar_mthan_once[drivers_go_bar_mthan_once['Y']==1]

#over the age of 25
drivers_go_bar_mthan_once_and_gt25=drivers_go_bar_mthan_once[drivers_go_bar_mthan_once['age']>'25']
drivers_go_bar_mthan_once_and_gt25.shape[0]

In [None]:
#use .isin() function to get the darFrame of drivers went to the bar less than once
drivers_go_bar_lthan_once=bar_coupon.loc[bar_coupon['Bar'].isin(drivers_go_bar_lthan_once_list)]
drivers_go_bar_lthan_once=drivers_go_bar_lthan_once[drivers_go_bar_lthan_once['Y']==1]
drivers_go_bar_lthan_once.shape[0]

In [None]:
#acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 
#to the all others is
drivers_go_bar_mthan_once_and_gt25.shape[0]/drivers_go_bar_lthan_once.shape[0]*100

Summary: More 70% of the bar coupons are accepted by those who go to the bar less than one than drivers who go to a bar more than once a month and are over the age of 25. 

# 5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry. 


In [None]:
#compare_passenger_withoutkids=
drivers_go_bar_mthan_once=['1~3','4~8','gt8']
drivers_go_bar_mthan_once=bar_coupon.loc[bar_coupon['Bar'].isin(drivers_go_bar_mthan_once)]

In [None]:
#list of passengers went to bar with out kids
list_passenger_withoutkids=['Alone','Friend(s)','Partner']

#list occupations other than farming, fishing, or forestry.
occupation_list=['Unemployed', 'Architecture & Engineering', 'Student',
       'Education&Training&Library', 'Healthcare Support',
       'Healthcare Practitioners & Technical', 'Sales & Related',
       'Management', 'Arts Design Entertainment Sports & Media',
       'Computer & Mathematical', 'Life Physical Social Science',
       'Personal Care & Service', 'Community & Social Services',
       'Office & Administrative Support', 'Construction & Extraction',
       'Legal', 'Retired', 'Installation Maintenance & Repair',
       'Transportation & Material Moving', 'Business & Financial',
       'Protective Service', 'Food Preparation & Serving Related',
       'Production Occupations',
       'Building & Grounds Cleaning & Maintenance']

In [None]:
bar_passenger_withoutkids=drivers_go_bar_mthan_once.loc[drivers_go_bar_mthan_once['passanger'].isin(list_passenger_withoutkids)]
bar_passenger_withoutkids.shape[0]

In [None]:
bar_occupations=bar_coupon.loc[bar_coupon['occupation'].isin(occupation_list)]
bar_occupations.shape[0]

In [None]:
#comparsion ratio between drivers who go to bars more than once a month and had passengers that were not a kid 
# and had occupations other than farming, fishing, or forestry

bar_passenger_withoutkids.shape[0]/bar_occupations.shape[0]*100

 Summary: More 27% drivers who had an occupations other than farming, fishing, or forestry than drivers who go to bars more than once a month and had passengers that were not a kid than
          

# 6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K. 



In [None]:
# drivers go to bars more than once a month, had passengers that were not a kid, and were not widowed 

#drivers go to bars more than once a month
drivers_go_bar_mthan_once_list=['1~3','4~8','gt8']
drivers_go_bar_mthan_once= bar_coupon.loc[bar_coupon['Bar'].isin(drivers_go_bar_mthan_once_list)]
drivers_go_bar_mthan_once=drivers_go_bar_mthan_once[drivers_go_bar_mthan_once['Y']==1]

#drivers had passengers that were not a kid
list_passenger_withoutkids=['Alone','Friend(s)','Partner']
passenger_withoutkids_and_go_bar_mthan_once=drivers_go_bar_mthan_once.loc[drivers_go_bar_mthan_once['passanger'].isin(list_passenger_withoutkids)]

#drivers were not widowed 
passenger_withoutkids_and_go_bar_mthan_once_maritalStatus_notwidowed=passenger_withoutkids_and_go_bar_mthan_once['maritalStatus']
passenger_withoutkids_and_go_bar_mthan_once_maritalStatus_notwidowed.shape[0]

In [None]:
#drivers go to bars more than once a month and are under the age of 30 

#drivers go to bars more than once a month
drivers_go_bar_mthan_once_list=['1~3','4~8','gt8']
drivers_go_bar_mthan_once= bar_coupon.loc[bar_coupon['Bar'].isin(drivers_go_bar_mthan_once_list)]
drivers_go_bar_mthan_once=drivers_go_bar_mthan_once[drivers_go_bar_mthan_once['Y']==1]

#under the age of 30
drivers_go_bar_mthan_once_under_30=drivers_go_bar_mthan_once.loc[drivers_go_bar_mthan_once['age']<'30']
drivers_go_bar_mthan_once_under_30.shape[0]


In [None]:
#drivers go to cheap restaurants more than 4 times a month and income is less than 50K

#drivers go to cheap restaurants more than 4 times a month
restaurant_mthan4_list=['4~8','gt8']
drivers_go_restaurant_mthan4=bar_coupon.loc[bar_coupon['RestaurantLessThan20'].isin(restaurant_mthan4_list)]

#income is less than 50K
drivers_go_restaurant_mthan4_and_lthan50=drivers_go_restaurant_mthan4.loc[drivers_go_restaurant_mthan4['income']<'50000']
drivers_go_restaurant_mthan4_and_lthan50.shape[0]

Summary: A total number of drivers go to bars more than once a month, had passengers that were not a kid, and were not widowed are 393, total number of drivers go to bars more than once a month and are under the age of 30 are 245, and drivers go to cheap restaurants more than 4 times a month and income is less than 50K 681.

Summary question 6:

From the total survey, 2017 bar coupons were sent to the drivers. Only 827 or 41% from the bar coupons were accpted 
and more than 59% were rejected. From the total accepted bar coupons, 674 were accpted by drivers who went to bar less than 3 or fever and rest (153) accepted by drivers who went to bar more.

drivers who went to bar more once and are over the age of 25 is 296 and drivers went to the bar less than once is 417. 

The total number acceptance drivers who go to bars more than once a month and had passengers that were not a kid is 551 and drivers who had occupations other than farming, fishing, or forestry 2008.

The total number acceptance drivers who go to bars more than once a month, had passengers that were not a kid, and were not widowed is 393, who go to bars more than once a month and are under the age of 30 is 245, and who go to cheap restaurants more than 4 times a month and income is less than 50K 681.


# 7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

Hypothesis:From the survey, more coupons were accpeted by drivers who went to the bar less than 3 or fewer.

The total number acceptance drivers who go to bars more than once a month and had passengers that were not a kid is higher than same drivers who went to bar more once a month and are over the age of 25. Even more drivers with passenger no kid and widowed. 



# Independent Investigation

Doea the weather affects the acceptance coupons?, Doea the gender affects acceptance coupons?,Does it a matter the direction of the driver or how far he/she? what about the temperature?

In [None]:
#grabing bar coupon data
data.head()

In [None]:
#lets choose one coupon and see if the above mentioned questions are affected to the acceptance rate or not

#I choose restaurant Less than 20. Since I guess most drivers can accept the offer.But, lets see if they accpetd or not

In [None]:
data['coupon'].unique()
data['coupon'].value_counts()

In [None]:
RestaurantLessThan20_coupon=data.loc[data['coupon']=='Restaurant(<20)']
RestaurantLessThan20_coupon

In [None]:
sns.barplot(data=RestaurantLessThan20_coupon, x ='gender', y='Y', hue='gender')

In [None]:
RestaurantLessThan20_coupon_only_accepted=RestaurantLessThan20_coupon[RestaurantLessThan20_coupon['Y']==1]
RestaurantLessThan20_coupon_only_accepted.shape[0]

In [None]:
plt.bar(RestaurantLessThan20_coupon['Y'].unique(),RestaurantLessThan20_coupon['Y'].value_counts(), color='green')
plt.xlabel("RestaurantLessThan20_coupon['Y'].unique")
plt.ylabel('counts')
plt.title('accpeted coupons by RestaurantLessThan20')

Summary: From the total number of RestaurantLessThan20 coupons (2786), 1970 where accepted. 

In [None]:
# toCoupon_GEQ5min a 5 min is affected or not
RestaurantLessThan20_coupon['toCoupon_GEQ25min'].unique()
RestaurantLessThan20_coupon['toCoupon_GEQ25min'].value_counts()

In [None]:
px.bar(x=RestaurantLessThan20_coupon['toCoupon_GEQ25min'].unique(),y=RestaurantLessThan20_coupon['toCoupon_GEQ25min'].value_counts())

In [None]:
#weather affected or not
RestaurantLessThan20_coupon['weather'].unique()
RestaurantLessThan20_coupon['weather'].value_counts()

In [None]:
RestaurantLessThan20_weather_sunny=RestaurantLessThan20_coupon[RestaurantLessThan20_coupon['weather']=='Sunny']
RestaurantLessThan20_weather_sunny_accepted=RestaurantLessThan20_weather_sunny[RestaurantLessThan20_weather_sunny['Y']==1]
RestaurantLessThan20_weather_sunny_accepted


A 2240 RestaurantLessThan20 coupons where sent during the sunny time. From the total  1721 where accepted.

In [None]:
data.iplot(kind='')