# UCB Driver Coupon Analysis

## Assignment Instructions

For detailed assignment instructions, data description, and analysis problems, please refer to:
**[Assignment Instructions](docs/assignment-instructions.md)**

This notebook contains the implementation and analysis code for the coupon acceptance study.

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import src.helpers as helpers

## Data Loading and Initial Exploration

Refer to [Assignment Instructions](docs/assignment-instructions.md) for detailed problem descriptions.

### 1. Read in the data

In [2]:
data = pd.read_csv('data/coupons.csv')
data.head(5)

| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalStatus | ... | CoffeeHouse | CarryAway | RestaurantLessThan20 | Restaurant20To50 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |


### 2. Investigate the dataset for missing or problematic data

In [3]:
data.describe()

| | temperature | has_children | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|
| count | 12684.0 | 12684.0 | 12684.0 | 12684.0 | 12684.0 | 12684.0 | 12684.0 | 12684.0 |
| mean | 63.301798 | 0.414144 | 1.0 | 0.561495 | 0.119126 | 0.214759 | 0.785241 | 0.568433 |
| std | 19.154486 | 0.492593 | 0.0 | 0.496224 | 0.32395 | 0.410671 | 0.410671 | 0.495314 |
| min | 30.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 55.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 50% | 80.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 75% | 80.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| max | 80.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
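
By default `describe()` summarises only numeric columns, so text columns such as `weather` or `coupon` do not appear in that summary. A minimal sketch of the complementary summary, run on a small made-up frame rather than on `data/coupons.csv`:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (not taken from data/coupons.csv).
sample = pd.DataFrame({
    "weather": ["Sunny", "Sunny", "Rainy", "Snowy"],
    "coupon": ["Bar", "Coffee House", "Bar", "Bar"],
    "temperature": [80, 55, 55, 30],
})

# Excluding numeric columns yields count / unique / top / freq for the text columns.
categorical_summary = sample.describe(exclude=[np.number])
print(categorical_summary)
```

On the real frame the equivalent call would be `data.describe(exclude=[np.number])`.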


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar                   12577 non-null

In [5]:
# Record the dataset dimensions (12684 rows and 26 columns, per data.info() above).
rows, columns = data.shape

# Plot the null values per column with the project helper and save the figure.
helpers.draw_bar_plot_column_null_values(data_frame=data, save_path='images/missing_values.png')
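
`helpers.draw_bar_plot_column_null_values` lives in `src/helpers.py`, which is not shown in this notebook. As a rough sketch of the per-column null counts that such a plot presumably visualises (the sample frame and all variable names below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for data/coupons.csv (assumption).
sample = pd.DataFrame({
    "coupon": ["Bar", "Coffee House", "Bar", "Coffee House"],
    "car": [np.nan, np.nan, "Mazda5", np.nan],
    "Bar": ["never", np.nan, "1~3", "never"],
})

# Count nulls per column and express them as a share of all rows.
null_counts = sample.isnull().sum()
null_share = null_counts / len(sample)

# Keep only the columns that actually contain nulls, largest first.
null_report = (
    pd.DataFrame({"nulls": null_counts, "share": null_share})
    .query("nulls > 0")
    .sort_values("nulls", ascending=False)
)
print(null_report)
```

A table like `null_report` makes it easy to see which columns are mostly empty before deciding whether to drop or fill them.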

In [6]:
data.car.value_counts()

car
Scooter and motorcycle                      22
Mazda5                                      22
do not drive                                22
crossover                                   21
Car that is too old to install Onstar :D    21
Name: count, dtype: int64

### 3. Explore the data for null records and operate on them

In [7]:
# 'car' has only 108 non-null values out of 12684 rows, so drop the column rather than impute it.
data.drop(columns=['car'], inplace=True)

helpers.draw_bar_plot_column_null_values(data_frame=data, save_path='images/missing_values_after_dropping_car.png')
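
The `data.info()` output shows that at least `Bar` still has some nulls (12577 non-null of 12684 rows) after `car` is dropped. Two common ways to handle such columns are sketched below on a made-up frame. Which one suits this assignment is a judgment call, and neither is claimed to be the approach this notebook settles on:

```python
import pandas as pd

# Made-up sample with a few missing frequency answers (assumption).
sample = pd.DataFrame({
    "Bar": ["never", None, "1~3", "never"],
    "CoffeeHouse": ["less1", "less1", None, "4~8"],
})

# Option 1: drop every row that has at least one null.
dropped = sample.dropna()

# Option 2: fill each column's nulls with that column's most frequent value.
most_frequent = sample.mode().iloc[0]
filled = sample.fillna(most_frequent)

print(len(sample), len(dropped))
print(filled)
```

Dropping rows is simplest when few rows are affected, while filling with the mode keeps every observation at the cost of slightly inflating the most common category.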

### 4. Proportion of observations that accepted coupons

Of the 12,684 observations, 7,210 (about 56.8%) accepted the coupon (`Y = 1`) and 5,474 (about 43.2%) did not (`Y = 0`). The pie chart below shows the same split.

In [8]:
accept_summary = data.groupby('Y').size().reset_index(name='counts')
fig = px.pie(accept_summary, names='Y', values='counts', title='How many coupons were accepted vs. not accepted')
fig.show()
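
The proportion itself can also be computed directly instead of being read off the chart. The sketch below rebuilds the `Y` column from the group counts reported in this notebook's output (5474 rows with `Y = 0`, 7210 rows with `Y = 1`) so that it runs without the CSV file:

```python
import numpy as np
import pandas as pd

# Rebuild the target column from the counts reported in this notebook's output.
y = pd.Series(np.repeat([0, 1], [5474, 7210]), name="Y")

# Share of each outcome; normalize=True divides the counts by the total.
proportions = y.value_counts(normalize=True)

# Because Y is coded 0/1, its mean equals the acceptance rate.
acceptance_rate = y.mean()
print(proportions)
print(f"Acceptance rate: {acceptance_rate:.2%}")
```

On the real frame the same result comes from `data['Y'].mean()`, which matches the `mean` row for `Y` in the `describe()` output.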



In [9]:
# Count the records for each coupon type.
coupon_summary = data.groupby('coupon').size().reset_index(name='counts')
# Polar bar chart of the counts (a standard bar chart of the same summary appears in section 5).
fig = px.bar_polar(coupon_summary, r='counts', theta='coupon', title='Coupon Distribution by Type', labels={'coupon': 'Coupon Type', 'counts': 'Number of Records'}, color='counts', color_continuous_scale=px.colors.sequential.Tealgrn)
# Static image export requires the kaleido package.
fig.write_image('images/coupon_counts_by_type_bar_polar.png')
fig.show()

In [10]:
df = data[['temperature', 'has_children', 'direction_same', 'direction_opp', 'Y']]
df.corr()

| | temperature | has_children | direction_same | direction_opp | Y |
|---|---|---|---|---|---|
| temperature | 1.0 | -0.019716 | 0.097085 | -0.097085 | 0.06124 |
| has_children | -0.019716 | 1.0 | -0.03162 | 0.03162 | -0.045557 |
| direction_same | 0.097085 | -0.03162 | 1.0 | -1.0 | 0.01457 |
| direction_opp | -0.097085 | 0.03162 | -1.0 | 1.0 | -0.01457 |
| Y | 0.06124 | -0.045557 | 0.01457 | -0.01457 | 1.0 |
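
The correlation of exactly -1.0 between `direction_same` and `direction_opp` indicates that each column is the complement of the other, so one of them carries no extra information. A small sketch of how this could be verified and the redundant column removed, using made-up rows:

```python
import pandas as pd

# Made-up rows in which the two direction flags are complements (assumption).
sample = pd.DataFrame({
    "direction_same": [0, 1, 0, 0, 1],
    "direction_opp": [1, 0, 1, 1, 0],
    "Y": [1, 0, 1, 0, 1],
})

corr = sample.corr()

# The flags are complements if they sum to 1 in every row.
is_complement = ((sample["direction_same"] + sample["direction_opp"]) == 1).all()

# Keep only one of the two columns when they are redundant.
reduced = sample.drop(columns=["direction_opp"]) if is_complement else sample
print(corr.loc["direction_same", "direction_opp"], list(reduced.columns))
```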


### 4.a Convert categorical (text) values to numeric codes for a correlation matrix

In [11]:
numeric_df, mappings = helpers.convert_categorical_to_numeric(data)
mappings

{'destination': {'No Urgent Place': 1, 'Home': 2, 'Work': 3},
 'passanger': {'Alone': 1, 'Friend(s)': 2, 'Partner': 3, 'Kid(s)': 4},
 'weather': {'Sunny': 1, 'Snowy': 2, 'Rainy': 3},
 'time': {'6PM': 1, '7AM': 2, '10AM': 3, '2PM': 4, '10PM': 5},
 'coupon': {'Coffee House': 1,
  'Restaurant(<20)': 2,
  'Carry out & Take away': 3,
  'Bar': 4,
  'Restaurant(20-50)': 5},
 'expiration': {'1d': 1, '2h': 2},
 'gender': {'Female': 1, 'Male': 2},
 'age': {'21': 1,
  '26': 2,
  '31': 3,
  '50plus': 4,
  '36': 5,
  '41': 6,
  '46': 7,
  'below21': 8},
 'maritalStatus': {'Married partner': 1,
  'Single': 2,
  'Unmarried partner': 3,
  'Divorced': 4,
  'Widowed': 5},
 'education': {'Some college - no degree': 1,
  'Bachelors degree': 2,
  'Graduate degree (Masters or Doctorate)': 3,
  'Associates degree': 4,
  'High School Graduate': 5,
  'Some High School': 6},
 'occupation': {'Unemployed': 1,
  'Student': 2,
  'Computer & Mathematical': 3,
  'Sales & Related': 4,
  'Education&Training&Library': 5
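
The source of `helpers.convert_categorical_to_numeric` is not shown here. The printed mappings look as if codes are assigned in order of first appearance (for example `'50plus'` receives 4, between `'31'` and `'36'`), which would mean the codes are labels rather than an ordered scale. A hypothetical function that reproduces that behaviour, under this assumption:

```python
import pandas as pd


def encode_by_first_appearance(frame):
    """Hypothetical stand-in: map each text value to 1, 2, 3, ... in order of first appearance."""
    encoded = frame.copy()
    mappings = {}
    for column in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[column]):
            continue  # numeric columns are left untouched
        # factorize lists the unique non-null values in order of first appearance.
        _, uniques = pd.factorize(frame[column])
        mappings[column] = {value: code for code, value in enumerate(uniques, start=1)}
        # Values missing from the mapping (nulls) stay null after map().
        encoded[column] = frame[column].map(mappings[column])
    return encoded, mappings


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "destination": ["No Urgent Place", "Home", "Work", "Home"],
    "age": ["21", "50plus", "26", "21"],
    "temperature": [55, 80, 80, 30],
})
encoded, mappings = encode_by_first_appearance(sample)
print(mappings)
```

If the real helper works this way, correlation coefficients involving columns such as `age` or `occupation` depend on row order and should be read with caution.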

In [12]:
numeric_df.corr()

| | destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalStatus | ... | CoffeeHouse | CarryAway | RestaurantLessThan20 | Restaurant20To50 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| destination | 1.0 | -0.558556 | 0.106435 | -0.134751 | -0.381406 | 0.117994 | -0.067697 | -0.002871 | -0.012043 | 0.024601 | ... | 0.015181 | 0.000721 | -0.005708 | 0.006771 | NaN | 0.026947 | 0.414993 | 0.433947 | -0.433947 | -0.119311 |
| passanger | -0.558556 | 1.0 | -0.0783 | 0.06604 | 0.226892 | 0.014848 | 0.072109 | -0.038311 | 0.049969 | -0.131156 | ... | -0.026098 | -0.021091 | -0.01533 | -0.026533 | NaN | 0.121877 | -0.217041 | -0.286984 | 0.286984 | 0.036345 |
| weather | 0.106435 | -0.0783 | 1.0 | -0.434497 | 0.068345 | 0.111445 | -0.017702 | 0.027003 | 0.015781 | 0.010187 | ... | 0.016374 | 0.035439 | -0.007897 | -0.006447 | NaN | 0.121698 | 0.202572 | -0.017712 | 0.017712 | -0.0988 |
| temperature | -0.134751 | 0.06604 | -0.434497 | 1.0 | -0.106854 | -0.142489 | 0.12409 | -0.025504 | -0.025559 | 0.018055 | ... | -0.013964 | -0.031635 | 0.013813 | 0.014617 | NaN | -0.155332 | -0.216254 | 0.097085 | -0.097085 | 0.06124 |
| time | -0.381406 | 0.226892 | 0.068345 | -0.106854 | 1.0 | 0.040806 | -0.004884 | 0.001065 | 0.009641 | -0.000423 | ... | 0.006031 | 0.008757 | -0.002418 | -0.001936 | NaN | 0.075805 | -0.108282 | -0.21651 | 0.21651 | 0.005892 |
| coupon | 0.117994 | 0.014848 | 0.111445 | -0.142489 | 0.040806 | 1.0 | -0.207902 | 0.000518 | 0.021064 | -0.015986 | ... | 0.008744 | 0.016056 | -0.005951 | -0.013563 | NaN | 0.124398 | 0.088273 | 0.024315 | -0.024315 | -0.061538 |
| expiration | -0.067697 | 0.072109 | -0.017702 | 0.12409 | -0.004884 | -0.207902 | 1.0 | -0.001264 | 0.006603 | -0.021294 | ... | -0.007321 | -0.001318 | -0.003818 | -0.003579 | NaN | 0.04274 | -0.032977 | 0.033584 | -0.033584 | -0.12992 |
| gender | -0.002871 | -0.038311 | 0.027003 | -0.025504 | 0.001065 | 0.000518 | -0.001264 | 1.0 | -0.020596 | -0.040501 | ... | 0.053615 | -0.014544 | 0.031177 | 0.013428 | NaN | -0.007028 | 0.002743 | -0.004496 | 0.004496 | 0.043969 |
| age | -0.012043 | 0.049969 | 0.015781 | -0.025559 | 0.009641 | 0.021064 | 0.006603 | -0.020596 | 1.0 | -0.070822 | ... | 0.005342 | 0.028921 | -0.006353 | -0.045907 | NaN | 0.02729 | -0.004622 | -0.007767 | 0.007767 | -0.016235 |
| maritalStatus | 0.024601 | -0.131156 | 0.010187 | 0.018055 | -0.000423 | -0.015986 | -0.021294 | -0.040501 | -0.070822 | 1.0 | ... | 0.032908 | -0.031722 | 0.010198 | 0.045011 | NaN | -0.034704 | 0.009361 | 0.022836 | -0.022836 | 0.006716 |
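
The `toCoupon_GEQ5min` column has no correlation values because it is constant: `describe()` reports a mean of 1.0 and a standard deviation of 0.0 for it, and a correlation with a zero-variance column is undefined. A sketch of how constant columns might be detected and left out before calling `corr()`, using made-up rows:

```python
import pandas as pd

# Made-up rows; toCoupon_GEQ5min is constant, as in the describe() output.
sample = pd.DataFrame({
    "toCoupon_GEQ5min": [1, 1, 1, 1],
    "toCoupon_GEQ15min": [0, 1, 1, 0],
    "Y": [1, 0, 1, 1],
})

# A column with a single distinct value has zero variance.
constant_columns = [
    column for column in sample.columns
    if sample[column].nunique(dropna=False) <= 1
]

# Correlate only the columns that actually vary.
corr = sample.drop(columns=constant_columns).corr()
print(constant_columns)
print(corr)
```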


In [42]:
# Summary statistics of the encoded passanger and destination codes, split by acceptance (Y).
numeric_df.groupby('Y')[['passanger', 'destination']].describe()


`passanger` (encoded), grouped by `Y`:

| Y | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 5474.0 | 1.628608 | 0.96872 | 1.0 | 1.0 | 1.0 | 2.0 | 4.0 |
| 1 | 7210.0 | 1.696949 | 0.900889 | 1.0 | 1.0 | 1.0 | 2.0 | 4.0 |

`destination` (encoded), grouped by `Y`:

| Y | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 0 | 5474.0 | 1.867373 | 0.83103 | 1.0 | 1.0 | 2.0 | 3.0 | 3.0 |
| 1 | 7210.0 | 1.6681 | 0.814009 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 |
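
Means and quartiles of the encoded values are hard to interpret because the codes are arbitrary labels. An alternative that may be easier to read is the acceptance rate per original category. A minimal sketch on made-up rows, keeping the dataset's `passanger` spelling:

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "passanger": ["Alone", "Alone", "Friend(s)", "Friend(s)", "Kid(s)", "Alone"],
    "Y": [1, 0, 1, 1, 0, 1],
})

# Mean of the 0/1 target per category is the acceptance rate for that category.
rates = (
    sample.groupby("passanger")["Y"]
    .agg(acceptance_rate="mean", observations="size")
    .sort_values("acceptance_rate", ascending=False)
)
print(rates)
```

The `observations` column is kept alongside the rate so that categories with very few rows are not over-interpreted.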


### 5. Bar plot visualization of coupon column

The bar chart below plots the number of records for each coupon type, reusing the `coupon_summary` frame computed earlier.

In [None]:
px.bar(coupon_summary, x='coupon', y='counts', title='Coupon Distribution', labels={'coupon': 'Coupon Type', 'counts': 'Number of Records'})


### 6. Histogram of temperature column

*Implementation needed - see assignment instructions for details*
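
One possible starting point for this step is sketched below with matplotlib. The `describe()` output suggests that `temperature` takes only a few distinct values (30, 55 and 80 appear among its quantiles), so the bin edges are an assumption chosen to centre one bar on each of those values. The sample rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # keeps the sketch runnable without a display; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up temperatures for illustration only.
sample = pd.DataFrame({"temperature": [55, 80, 80, 80, 80, 30, 55, 80, 30, 80]})

fig, ax = plt.subplots(figsize=(6, 4))
# Bin edges placed halfway between the assumed values 30, 55 and 80.
counts, bin_edges, _ = ax.hist(
    sample["temperature"], bins=[17.5, 42.5, 67.5, 92.5], edgecolor="black"
)
ax.set_xticks([30, 55, 80])
ax.set_xlabel("Temperature")
ax.set_ylabel("Number of Records")
ax.set_title("Temperature Distribution")
```

With the real data, `sample["temperature"]` would be replaced by `data["temperature"]`.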

## Bar Coupons Investigation

*Refer to [Assignment Instructions](docs/assignment-instructions.md) for detailed analysis steps*

### Analysis Steps 1-7

*Implementation needed for each step outlined in the assignment instructions*
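
The individual steps are defined in the assignment instructions, which are not reproduced in this notebook. As a hedged starting point only, the sketch below filters the rows where the offered coupon is a bar coupon and compares acceptance between drivers who do and do not visit bars. The frequency labels used for the `Bar` column (`never`, `1~3`, `4~8`) are assumed to match those seen in the other frequency columns, and the rows are made up.

```python
import pandas as pd

# Made-up rows; 'Bar' holds the driver's assumed bar-visit frequency labels.
sample = pd.DataFrame({
    "coupon": ["Bar", "Bar", "Coffee House", "Bar", "Bar", "Restaurant(<20)"],
    "Bar": ["never", "1~3", "never", "4~8", "never", "1~3"],
    "Y": [0, 1, 1, 1, 0, 1],
})

# Keep only the observations where a bar coupon was offered.
bar_coupons = sample[sample["coupon"] == "Bar"]

# Overall acceptance rate for bar coupons.
overall_rate = bar_coupons["Y"].mean()

# Flag drivers who report visiting bars at all (assumed label set).
visits_bars = bar_coupons["Bar"].isin(["1~3", "4~8"]).rename("visits_bars")

# Acceptance rate for each group.
rate_by_group = bar_coupons.groupby(visits_bars)["Y"].mean()
print(overall_rate)
print(rate_by_group)
```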

## Independent Investigation

*Refer to [Assignment Instructions](docs/assignment-instructions.md) for guidance on exploring other coupon groups*
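
When choosing another coupon group to explore, a table of acceptance rates by coupon type and expiration may help narrow the options. The sketch below uses coupon and expiration labels that appear in this notebook's output, but the rows themselves are made up:

```python
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "coupon": ["Coffee House", "Coffee House", "Coffee House", "Bar", "Bar", "Bar"],
    "expiration": ["1d", "2h", "1d", "2h", "2h", "1d"],
    "Y": [1, 0, 1, 0, 1, 1],
})

# Rows: coupon type; columns: expiration; cells: mean of Y (acceptance rate).
rate_table = sample.pivot_table(
    index="coupon", columns="expiration", values="Y", aggfunc="mean"
)
print(rate_table)
```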