<a href="https://colab.research.google.com/github/mkulk2025/Driver_Coupon_acceptance/blob/main/5_1AssignmentSubmission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Required Assignment 5.1: Will the Customer Accept the Coupon?

**Context**

Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon was for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What about if it was just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?

Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver or not, but what are the factors that determine whether a driver accepts the coupon once it is delivered to them? How would you determine whether a driver is likely to accept a coupon?

**Overview**

The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon versus those that did not.

**Data**

This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios including the destination, current time, weather, passenger, etc., and then asks the person whether he will accept the coupon if he is the driver. Answers that the user will drive there ‘right away’ or ‘later before the coupon expires’ are labeled as ‘Y = 1’ and answers ‘no, I do not want the coupon’ are labeled as ‘Y = 0’. There are five different types of coupons: less expensive restaurants (under \$20), coffee houses, carry out & take away, bar, and more expensive restaurants (\$20 - \$50).

**Deliverables**

Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will use your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public-facing GitHub repository as your first portfolio piece.





### Data Description
Keep in mind that these values mentioned below are average values.

The attributes of this data set include:
1. User attributes
    -  Gender: male, female
    -  Age: below 21, 21 to 25, 26 to 30, etc.
    -  Marital Status: single, married partner, unmarried partner, or widowed
    -  Number of children: 0, 1, or more than 1
    -  Education: high school, bachelors degree, associates degree, or graduate degree
    -  Occupation: architecture & engineering, business & financial, etc.
    -  Annual income: less than \\$12500, \\$12500 - \\$24999, \\$25000 - \\$37499, etc.
    -  Number of times that he/she goes to a bar: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    -  Number of times that he/she buys takeaway food: 0, less than 1, 1 to 3, 4 to 8 or greater
    than 8
    -  Number of times that he/she goes to a coffee house: 0, less than 1, 1 to 3, 4 to 8 or
    greater than 8
    -  Number of times that he/she eats at a restaurant with average expense less than \\$20 per
    person: 0, less than 1, 1 to 3, 4 to 8 or greater than 8
    

2. Contextual attributes
    - Driving destination: home, work, or no urgent destination
    - Location of user, coupon and destination: we provide a map to show the geographical
    location of the user, destination, and the venue, and we mark the distance between each
    two places with time of driving. The user can see whether the venue is in the same
    direction as the destination.
    - Weather: sunny, rainy, or snowy
    - Temperature: 30F, 55F, or 80F
    - Time: 10AM, 2PM, or 6PM
    - Passenger: alone, partner, kid(s), or friend(s)


3. Coupon attributes
    - time before it expires: 2 hours or one day

In [3]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

### Problems

Use the prompts below to get started with your data analysis.  

1. Read in the `coupons.csv` file.




In [5]:
data = pd.read_csv('/content/sample_data/coupons.csv')

In [6]:
data = pd.read_csv('/content/sample_data/coupons.csv')
data.head()

Unnamed: 0,destination,passanger,weather,temperature,time,coupon,expiration,gender,age,maritalStatus,...,CoffeeHouse,CarryAway,RestaurantLessThan20,Restaurant20To50,toCoupon_GEQ5min,toCoupon_GEQ15min,toCoupon_GEQ25min,direction_same,direction_opp,Y
0,No Urgent Place,Alone,Sunny,55,2PM,Restaurant(<20),1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,1
1,No Urgent Place,Friend(s),Sunny,80,10AM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,0,0,0,1,0
2,No Urgent Place,Friend(s),Sunny,80,10AM,Carry out & Take away,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,1
3,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,2h,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0
4,No Urgent Place,Friend(s),Sunny,80,2PM,Coffee House,1d,Female,21,Unmarried partner,...,never,,4~8,1~3,1,1,0,0,1,0


2. Investigate the dataset for missing or problematic data.

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# import and understand the coupons data

df = pd.read_csv('/content/sample_data/coupons.csv')

# count the non-null values in each column
df.count()

destination             12684
passanger               12684
weather                 12684
temperature             12684
time                    12684
coupon                  12684
expiration              12684
gender                  12684
age                     12684
maritalStatus           12684
has_children            12684
education               12684
occupation              12684
income                  12684
car                       108
Bar                     12577
CoffeeHouse             12467
CarryAway               12533
RestaurantLessThan20    12554
Restaurant20To50        12495
toCoupon_GEQ5min        12684
toCoupon_GEQ15min       12684
toCoupon_GEQ25min       12684
direction_same          12684
direction_opp           12684
Y                       12684
dtype: int64


In [13]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# import and understand the coupons data shape

df = pd.read_csv('/content/sample_data/coupons.csv')

df.shape

(12684, 26)

In [14]:
#The objective of this code is to check for null values.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# finding null values

df = pd.read_csv('/content/sample_data/coupons.csv')


null_counts = df.isna().sum()
print("\nNull counts per column:\n", null_counts)


Null counts per column:
 destination                 0
passanger                   0
weather                     0
temperature                 0
time                        0
coupon                      0
expiration                  0
gender                      0
age                         0
maritalStatus               0
has_children                0
education                   0
occupation                  0
income                      0
car                     12576
Bar                       107
CoffeeHouse               217
CarryAway                 151
RestaurantLessThan20      130
Restaurant20To50          189
toCoupon_GEQ5min            0
toCoupon_GEQ15min           0
toCoupon_GEQ25min           0
direction_same              0
direction_opp               0
Y                           0
dtype: int64


In [15]:
#find duplicate rows

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

df = pd.read_csv('/content/sample_data/coupons.csv')

# Find duplicate rows
all_duplicates = df[df.duplicated(keep=False)]
#print('\nAll duplicate rows including first occurrence:\n', all_duplicates)

#remove all the duplicates
df_no_duplicates = df.drop_duplicates()

df_no_duplicates.count()
print(df_no_duplicates)
#print("\nDataFrame without duplicates:\n", df_no_duplicates)

           destination  passanger weather  temperature  time  \
0      No Urgent Place      Alone   Sunny           55   2PM   
1      No Urgent Place  Friend(s)   Sunny           80  10AM   
2      No Urgent Place  Friend(s)   Sunny           80  10AM   
3      No Urgent Place  Friend(s)   Sunny           80   2PM   
4      No Urgent Place  Friend(s)   Sunny           80   2PM   
...                ...        ...     ...          ...   ...   
12679             Home    Partner   Rainy           55   6PM   
12680             Work      Alone   Rainy           55   7AM   
12681             Work      Alone   Snowy           30   7AM   
12682             Work      Alone   Snowy           30   7AM   
12683             Work      Alone   Sunny           80   7AM   

                      coupon expiration  gender age      maritalStatus  ...  \
0            Restaurant(<20)         1d  Female  21  Unmarried partner  ...   
1               Coffee House         2h  Female  21  Unmarried partner  .

3. Decide what to do about your missing data -- drop, replace, other...

In [None]:
# The car column is missing in 12576 of the 12684 rows, so it can be discarded from the analysis.
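
The decision above can be turned into an explicit cleaning step. The following is a minimal sketch on a hypothetical five-column miniature of the data (only the column names come from the notebook; the rows and the `clean_coupons` helper are made up for illustration). It removes duplicates, drops the sparse `car` column, and then drops rows that are missing any of the listed frequency columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of coupons.csv; the rows are invented for illustration.
toy = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Bar", "Bar"],
    "car": [np.nan, np.nan, "Scooter and motorcycle", np.nan],
    "Bar": ["never", np.nan, "1~3", "never"],
    "CoffeeHouse": ["less1", "4~8", "never", "less1"],
    "Y": [0, 1, 1, 0],
})

def clean_coupons(frame, sparse_cols=("car",), required_cols=("Bar", "CoffeeHouse")):
    """Drop duplicate rows, drop mostly-empty columns, then drop rows missing required values."""
    out = frame.drop_duplicates()                 # same order as the notebook: dedupe first
    out = out.drop(columns=list(sparse_cols))     # discard the sparse column(s)
    out = out.dropna(subset=list(required_cols))  # keep rows with all required values present
    return out.reset_index(drop=True)

cleaned = clean_coupons(toy)
print(cleaned)
```

On the real data, `required_cols` would presumably list all five frequency columns (`Bar`, `CoffeeHouse`, `CarryAway`, `RestaurantLessThan20`, `Restaurant20To50`), which matches the `notnull()` filter used in the later cells.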

4. What proportion of the total observations chose to accept the coupon?



In [17]:
#The objective of this code is to calculate the % of accepted coupons after removing duplicates
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

df = pd.read_csv('/content/sample_data/coupons.csv')

# Find duplicate rows
all_duplicates = df[df.duplicated(keep=False)]
#print('\nAll duplicate rows including first occurrence:\n', all_duplicates)

#remove all the duplicates

df = df.drop_duplicates()

#calculate the % of coupon accepted after removing the duplicates

(df[df["Y"] == 1].shape[0] / df.shape[0]) * 100


56.75654242664552
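
Because `Y` is coded as 0 or 1, the same proportion can also be read off as the column mean. A minimal sketch on a hypothetical five-row stand-in for the `Y` column:

```python
import pandas as pd

# Hypothetical stand-in for the Y column (1 = accepted, 0 = declined).
toy = pd.DataFrame({"Y": [1, 0, 1, 1, 0]})

# The mean of a 0/1 column is the fraction of ones.
acceptance_pct = toy["Y"].mean() * 100

# The same quantity computed the long way, as in the cell above.
long_form_pct = (toy[toy["Y"] == 1].shape[0] / toy.shape[0]) * 100

print(acceptance_pct, long_form_pct)
```

Both expressions give the same percentage; the mean form is shorter and is reused in the sketches further down.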

5. Use a bar plot to visualize the `coupon` column.

In [22]:
#the objective of this code is to plot the different types of coupons offered to the customers to understand the data. Column = coupon

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

df = pd.read_csv('/content/sample_data/coupons.csv')

# Find duplicate rows
all_duplicates = df[df.duplicated(keep=False)]

#remove all the duplicates

df = df.drop_duplicates()

#remove all the null

df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

coupon_column_barplot = px.bar(df_nonull["coupon"].value_counts()).update_layout(xaxis_title = "Plot#1: Offered Coupon Type", yaxis_title = "Coupon Counts",
                                                                                  title = "Total Number of offered coupon types",
                                                                                  title_x = 0.5, showlegend = False)

#save the Plotly figure under /content/sample_data (write_image requires the kaleido package)
coupon_column_barplot.write_image('/content/sample_data/Coupon_distribution_bar_graph.png')
coupon_column_barplot

6. Use a histogram to visualize the temperature column.

In [23]:
# histogram to plot temp column
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')


all_duplicates = df[df.duplicated(keep=False)]

df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]



temp_histogram = px.histogram(df_nonull["temperature"], nbins = 10).update_layout(xaxis_title = "Temperature (F)", yaxis_title = "Count",
                                                                                  title = " Temperatures count distribution ",
                                                                                  title_x = 0.5, showlegend = False)

# save the Plotly figure (write_image requires the kaleido package)
temp_histogram.write_image('/content/sample_data/temp_histogram_graph.png')
temp_histogram

**Investigating the Bar Coupons**

Now, we will lead you through an exploration of just the bar related coupons.  

1. Create a new `DataFrame` that contains just the bar coupons.


In [18]:
# Creating a new dataframe from the original csv file filtering on bar coupon
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')


all_duplicates = df[df.duplicated(keep=False)]

df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

df_bar.shape

(1906, 26)

2. What proportion of bar coupons were accepted?


In [16]:
# Calculate the percentage of bar coupons that were accepted
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')


all_duplicates = df[df.duplicated(keep=False)]

df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

(df_bar[df_bar["Y"] == 1].shape[0] / df_bar.shape[0]) * 100

41.185729275970616

3. Compare the acceptance rate between those who went to a bar 3 or fewer times a month to those who went more.


In [22]:
# Part#1: first determine the acceptance rate for the "3 or fewer" category, which includes 3 segments: never (788), less1 (546), and 1~3 (379).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

# based on df_bar["Bar"].value_counts(), "3 or fewer" covers the never, less1, and 1~3 segments
# create that subset by combining the 3 segments with OR

lessthan3_bar_acceptance = df_bar[(df_bar["Bar"] == "never") | (df_bar["Bar"] == "less1") | (df_bar["Bar"] == "1~3")]

(lessthan3_bar_acceptance[lessthan3_bar_acceptance["Y"] == 1].shape[0] / lessthan3_bar_acceptance.shape[0]) * 100






37.244600116754235

In [25]:
# Part#2: next, calculate the acceptance rate for drivers who went more than 3 times, which includes 4~8 (147) and gt8 (46)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]


# based on df_bar["Bar"].value_counts(), "more than 3" includes 4~8 (147) and gt8 (46)


morethan3_acceptance = df_bar[(df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8")]

(morethan3_acceptance[morethan3_acceptance["Y"] == 1].shape[0] / morethan3_acceptance.shape[0]) * 100



76.16580310880829
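
The two cells above can be condensed into one comparison by building a single boolean mask and negating it for the second group. A minimal sketch on hypothetical rows (the `Bar` labels are the ones used in the notebook; the data and the `acceptance_pct` helper are invented for illustration):

```python
import pandas as pd

# Hypothetical bar-coupon rows; only the Bar labels come from the notebook.
toy_bar = pd.DataFrame({
    "Bar": ["never", "less1", "1~3", "4~8", "gt8", "never", "4~8", "1~3"],
    "Y":   [0,       0,       1,     1,     1,     1,       0,     0],
})

def acceptance_pct(frame):
    """Percentage of rows with Y == 1."""
    return frame["Y"].mean() * 100

# True for drivers who go to a bar 3 or fewer times a month.
three_or_fewer = toy_bar["Bar"].isin(["never", "less1", "1~3"])

rates = pd.Series({
    "3 or fewer": acceptance_pct(toy_bar[three_or_fewer]),
    "more than 3": acceptance_pct(toy_bar[~three_or_fewer]),
})
print(rates)
```

Note that `~three_or_fewer` would also pick up rows where `Bar` is missing, so this shortcut assumes the nulls have already been dropped, as they are in `df_nonull`.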

4. Compare the acceptance rate between drivers who go to a bar more than once a month and are over the age of 25 to the all others.  Is there a difference?


In [13]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

# create a dataset of drivers who go to a bar more than once a month

# version with an age filter (currently disabled):
#morethan1_acceptance_25Plus = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") ) & (df_bar["age"] != "21") & (df_bar["age"] != "below21")  ]

# active version: filters on bar frequency only, so all age groups are included
morethan1_acceptance_25Plus = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") )   ]

#morethan1_acceptance_25Plus["age"].value_counts()  # use this to check which age groups are present


(morethan1_acceptance_25Plus[morethan1_acceptance_25Plus["Y"] == 1].shape[0] / morethan1_acceptance_25Plus.shape[0]) * 100


68.53146853146853
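
The active filter in the cell above uses bar frequency only. One way to apply the "over the age of 25" condition as well is sketched below on hypothetical rows. It assumes that `age` is stored as category labels and that the labels other than `"below21"`, `"21"`, and `"26"` (which appear in this notebook) are `"31"`, `"36"`, `"41"`, `"46"`, and `"50plus"`; check `df_bar["age"].unique()` before relying on that list.

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy_bar = pd.DataFrame({
    "Bar": ["1~3", "4~8", "gt8", "1~3", "never", "less1"],
    "age": ["21", "26", "50plus", "below21", "31", "26"],
    "Y":   [1, 1, 1, 0, 0, 1],
})

# Assumed age labels for drivers older than 25 (every label except "below21" and "21").
over_25_labels = ["26", "31", "36", "41", "46", "50plus"]

goes_more_than_once = toy_bar["Bar"].isin(["1~3", "4~8", "gt8"])
over_25 = toy_bar["age"].isin(over_25_labels)

in_group = goes_more_than_once & over_25
target_pct = toy_bar.loc[in_group, "Y"].mean() * 100   # frequent bar-goers over 25
others_pct = toy_bar.loc[~in_group, "Y"].mean() * 100  # everyone else

print(target_pct, others_pct)
```

Defining "all others" as the exact negation of the mask keeps the two groups disjoint and makes them add up to the full set of bar coupons.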

In [1]:
# Objective: determine the acceptance rate of "all others", i.e. drivers outside the "more than once a month and over the age of 25" group. Here that is approximated by the Bar values ['never', 'less1']; nan values are the blanks that were already removed.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]



# keep drivers who go to a bar less than once a month or never, then exclude the "21" age group (21 to 25)


allother_bar = df_bar[((df_bar["Bar"] == "never") | (df_bar["Bar"] == "less1")  )]

allother_bar_no25plus = allother_bar[(allother_bar["age"] != "21") ]


allother_bar_no25plus["age"].value_counts()

(allother_bar_no25plus[allother_bar_no25plus["Y"] == 1].shape[0] / allother_bar_no25plus.shape[0]) * 100

27.645985401459853

5. Use the same process to compare the acceptance rate between drivers who go to bars more than once a month and had passengers that were not a kid and had occupations other than farming, fishing, or forestry.


In [7]:
#Note: I think this question is confusing. It is not worded clearly what we are comparing.
#part1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]


# dataset of drivers who go to a bar more than once a month, excluding the below21 age group


morethan1 = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") ) & (df_bar["age"] != "below21")  ]


# morethan1 has 562 rows

# create a dataset of drivers who go to a bar more than once a month and have a passenger who is not a kid (drivers who are alone are excluded as well)

morethan1_nokids_noalone = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") ) & (df_bar["passanger"] != "Kid(s)")  & (df_bar["passanger"] != "Alone") ]


 #removing farm, fish and forestry occupation

morethan1_nokids_nofarmfishfore = morethan1_nokids_noalone[morethan1_nokids_noalone["occupation"] != "Farming Fishing & Forestry"]

 #acceptance
(morethan1_nokids_nofarmfishfore[morethan1_nokids_nofarmfishfore["Y"] == 1].shape[0] / morethan1_nokids_nofarmfishfore.shape[0]) * 100



71.42857142857143

In [8]:
# part2: calculating the acceptance rate for all drivers other than "morethan1_nokids_nofarmfishfore"
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]


# dataset of drivers who go to a bar more than once a month, excluding the below21 age group


morethan1 = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") ) & (df_bar["age"] != "below21")  ]


# morethan1 has 562 rows

# create a dataset of drivers who go to a bar more than once a month and have a passenger who is not a kid (drivers who are alone are excluded as well)

morethan1_nokids_noalone = df_bar[((df_bar["Bar"] == "1~3") | (df_bar["Bar"] == "4~8") | (df_bar["Bar"] == "gt8") ) & (df_bar["passanger"] != "Kid(s)")  & (df_bar["passanger"] != "Alone") ]


 #removing farm, fish and forestry occupation

morethan1_nokids_nofarmfishfore = morethan1_nokids_noalone[morethan1_nokids_noalone["occupation"] != "Farming Fishing & Forestry"]

# for simplicity, call the above dataset (morethan1_nokids_nofarmfishfore) "part1" in the calculation below
# "all others" = rows of df_bar whose index labels are not in part1, so the difference is taken against part1's index

allother_nopart1 = df_bar.loc[df_bar.index.difference(morethan1_nokids_nofarmfishfore.index, sort = False)]

(allother_nopart1[allother_nopart1["Y"] == 1].shape[0] / allother_nopart1.shape[0]) * 100
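
An "all others" group can also be built without any index arithmetic by negating the boolean mask that defines the first group. A minimal sketch on hypothetical rows (the column names and the `"Farming Fishing & Forestry"` label come from the notebook; the other values are invented):

```python
import pandas as pd

# Hypothetical bar-coupon rows with a non-default index, as left behind by filtering.
toy_bar = pd.DataFrame({
    "Bar": ["1~3", "4~8", "never", "gt8", "less1"],
    "passanger": ["Friend(s)", "Kid(s)", "Alone", "Partner", "Friend(s)"],
    "occupation": ["Student", "Sales & Related", "Farming Fishing & Forestry", "Student", "Retired"],
    "Y": [1, 0, 0, 1, 0],
}, index=[10, 11, 12, 13, 14])

# Same interpretation as the cell above: frequent bar-goer, adult passenger, not in farming/fishing/forestry.
in_group = (
    toy_bar["Bar"].isin(["1~3", "4~8", "gt8"])
    & toy_bar["passanger"].isin(["Friend(s)", "Partner"])
    & (toy_bar["occupation"] != "Farming Fishing & Forestry")
)

group = toy_bar[in_group]
everyone_else = toy_bar[~in_group]  # exact complement of the group

group_pct = group["Y"].mean() * 100
others_pct = everyone_else["Y"].mean() * 100
print(len(group), len(everyone_else), group_pct, others_pct)
```

A quick sanity check for any such split is that the two group sizes add up to `len(df_bar)` and that the "others" rate differs from the overall rate whenever the first group's rate does.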

6. Compare the acceptance rates between those drivers who:

- go to bars more than once a month, had passengers that were not a kid, and were not widowed *OR*
- go to bars more than once a month and are under the age of 30 *OR*
- go to cheap restaurants more than 4 times a month and income is less than 50K.



In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

# part1: creating a subset as defined in the question with the 3 conditions.
part1 = df_bar[((df_bar["Bar"].isin(["1~3", "4~8", "gt8"])) &
                                               (df_bar["passanger"].isin(["Friend(s)", "Partner"])) &
                                               (df_bar["maritalStatus"] != "Widowed")) |

                                              ((df_bar["Bar"].isin(["1~3", "4~8", "gt8"])) &
                                               (df_bar["age"].isin(["below21", "21", "26"]))) |

                                              ((df_bar["RestaurantLessThan20"].isin(["4~8", "gt8"])) &
                                               (df_bar["income"].isin(["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"])))]

#calculate the acceptance

(part1[part1["Y"] == 1].shape[0] / part1.shape[0]) * 100


56.69291338582677

In [12]:
#part2: calculating the acceptance for remaining ( other than the dataset for the above 3 conditions)

#part1 code from the above
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_bar = df_nonull[df_nonull["coupon"] == "Bar"]

# part1: creating a subset as defined in the question with the 3 conditions.
part1 = df_bar[((df_bar["Bar"].isin(["1~3", "4~8", "gt8"])) &
                                               (df_bar["passanger"].isin(["Friend(s)", "Partner"])) &
                                               (df_bar["maritalStatus"] != "Widowed")) |

                                              ((df_bar["Bar"].isin(["1~3", "4~8", "gt8"])) &
                                               (df_bar["age"].isin(["below21", "21", "26"]))) |

                                              ((df_bar["RestaurantLessThan20"].isin(["4~8", "gt8"])) &
                                               (df_bar["income"].isin(["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"])))]


# rows of df_bar whose index labels are not in part1
part2 = df_bar.loc[df_bar.index.difference(part1.index, sort = False)]

(part2[part2["Y"] == 1].shape[0] / part2.shape[0]) * 100
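
Since the question lists three alternative conditions, it can help to give each one its own named mask, look at them separately, and only then combine them with OR. The sketch below does this on hypothetical rows; the labels inside the `isin` lists are the ones used in the cells above, while the data, the dictionary keys, and the `summary` table are made up for illustration.

```python
import pandas as pd

# Hypothetical bar-coupon rows for illustration only.
toy_bar = pd.DataFrame({
    "Bar": ["1~3", "gt8", "never", "less1", "4~8"],
    "passanger": ["Friend(s)", "Alone", "Alone", "Kid(s)", "Partner"],
    "maritalStatus": ["Single", "Single", "Married partner", "Widowed", "Married partner"],
    "age": ["31", "21", "46", "50plus", "41"],
    "RestaurantLessThan20": ["1~3", "less1", "gt8", "4~8", "never"],
    "income": ["$50000 - $62499", "$25000 - $37499", "$12500 - $24999",
               "$100000 or More", "$75000 - $87499"],
    "Y": [1, 1, 0, 0, 0],
})

frequent = toy_bar["Bar"].isin(["1~3", "4~8", "gt8"])
low_income = toy_bar["income"].isin(
    ["Less than $12500", "$12500 - $24999", "$25000 - $37499", "$37500 - $49999"]
)

# One named mask per condition in the question.
conditions = {
    "adult passenger, not widowed": frequent
        & toy_bar["passanger"].isin(["Friend(s)", "Partner"])
        & (toy_bar["maritalStatus"] != "Widowed"),
    "under 30": frequent & toy_bar["age"].isin(["below21", "21", "26"]),
    "cheap restaurants, income < 50K": toy_bar["RestaurantLessThan20"].isin(["4~8", "gt8"]) & low_income,
}

# Size and acceptance rate of each condition on its own.
summary = pd.DataFrame(
    [{"condition": name,
      "rows": int(mask.sum()),
      "acceptance_pct": toy_bar.loc[mask, "Y"].mean() * 100}
     for name, mask in conditions.items()]
).set_index("condition")

# A row belongs to the combined group if any single condition holds.
any_condition = pd.DataFrame(conditions).any(axis=1)
combined_pct = toy_bar.loc[any_condition, "Y"].mean() * 100
print(summary)
print(combined_pct)
```

The per-condition table shows which branch drives the combined rate, which a single OR-ed filter hides.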

7.  Based on these observations, what do you hypothesize about drivers who accepted the bar coupons?

Overall, drivers accept bar coupons less than half the time (41.2%). However, acceptance rises sharply with how often a driver already visits bars. Drivers who visit bars more than three times a month accept 76.2% of the time, more than double the rate of less frequent visitors (37.2%). Similarly, drivers who visit bars more than once a month accept 68.5% of the time (the over-25 age filter is commented out in that cell, so this figure covers all age groups), compared with 27.6% for drivers who go less than once a month or never, excluding the 21 to 25 age group. Drivers who go more than once a month, ride with a friend or partner, and work outside farming, fishing, or forestry accept 71.4% of the time. These findings suggest that targeting frequent bar-goers is likely to increase bar coupon acceptance.

### Independent Investigation

Using the bar coupon example as motivation, you are to explore one of the other coupon groups and try to determine the characteristics of passengers who accept the coupons.  

In [10]:
#investigate coffee house coupon acceptance by income

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_CH = df_nonull[df_nonull["coupon"] == "Coffee House"]


df_CH_accepted = df_CH[df_CH["Y"] == 1]


CH_acceptance_perincome = px.histogram(df_CH_accepted["income"], nbins = 10).update_layout(xaxis_title = "income", yaxis_title = "Count",
                                                                                  title = " CoffeeHouse coupon acceptance distribution by income ",
                                                                                  title_x = 0.5, showlegend = False)


# save the Plotly figure (write_image requires the kaleido package)
CH_acceptance_perincome.write_image('/content/sample_data/CH_acceptance_perincome_perincome_graph.png')
CH_acceptance_perincome

In [21]:
#Compare the acceptance rate between drivers who go to a coffee house more than once a month and are not below21 to all others. Is there a difference?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null


df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_CH = df_nonull[df_nonull["coupon"] == "Coffee House"]


# create a dataset of drivers who go to a coffee house more than once a month & age != below21

morethan1_acceptance_below21 = df_CH[((df_CH["CoffeeHouse"] == "1~3") | (df_CH["CoffeeHouse"] == "4~8") | (df_CH["CoffeeHouse"] == "gt8") ) &  (df_CH["age"] != "below21") ]

morethan1_acceptance_below21.value_counts()

(morethan1_acceptance_below21[morethan1_acceptance_below21["Y"] == 1].shape[0] / morethan1_acceptance_below21.shape[0]) * 100


65.33637400228051

In [24]:
# Objective: determine the acceptance rate of "all others", approximated here by the CoffeeHouse values ['never', 'less1'] for drivers who are not below21; nan values are the blanks that were already removed.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
df = pd.read_csv('/content/sample_data/coupons.csv')

#remove duplicates

all_duplicates = df[df.duplicated(keep=False)]
df = df.drop_duplicates()

#remove all the null
df_nonull = df[df["Bar"].notnull() & df["CoffeeHouse"].notnull() &
                  df["CarryAway"].notnull() & df["RestaurantLessThan20"].notnull() &
                 df["Restaurant20To50"].notnull()]

df_CH = df_nonull[df_nonull["coupon"] == "Coffee House"]

# keep drivers who go to a coffee house less than once a month or never, then exclude the below21 age group


allother_CH = df_CH[((df_CH["CoffeeHouse"] == "never") | (df_CH["CoffeeHouse"] == "less1")  )]

allother_CH_no21below = allother_CH[(allother_CH["age"] != "below21") ]

(allother_CH_no21below[allother_CH_no21below["Y"] == 1].shape[0] / allother_CH_no21below.shape[0]) * 100

33.73430962343097
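
The income histogram earlier in this section counts accepted coffee house coupons per bracket, so a bracket with many respondents can look strong even if its acceptance rate is low. A per-bracket rate avoids that. The sketch below uses hypothetical rows; the three income labels are ones that appear in the notebook's filters, and `income_order` is an assumed display order for just those three brackets.

```python
import pandas as pd

# Hypothetical coffee house rows for illustration only.
toy_ch = pd.DataFrame({
    "income": ["Less than $12500", "Less than $12500",
               "$12500 - $24999", "$12500 - $24999", "$12500 - $24999",
               "$25000 - $37499"],
    "Y": [1, 0, 1, 1, 0, 0],
})

grouped = toy_ch.groupby("income")["Y"]

# Offers, acceptances, and acceptance rate for each income bracket.
rate_by_income = pd.DataFrame({
    "offers": grouped.size(),
    "accepted": grouped.sum(),
    "acceptance_pct": grouped.mean() * 100,
})

# groupby sorts the labels alphabetically, so restore the bracket order explicitly.
income_order = ["Less than $12500", "$12500 - $24999", "$25000 - $37499"]
rate_by_income = rate_by_income.reindex(income_order)
print(rate_by_income)
```

On the real data the same table could be built from `df_CH`, with `income_order` extended to all brackets, and the `acceptance_pct` column could then be plotted instead of the raw counts.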