# Data Exploration

This notebook performs exploratory data analysis to understand which users match which offer types. It is built on the cleaned data sets produced by the Data Wrangling notebooks.

In this analysis, I treat the two remaining offer types, bogo and discount, separately (informational offers are excluded; see the note at the end of this section). At this point, we do not know whether users who are responsive to bogo offers are also responsive to discount offers, so each category is analyzed on its own.

For each offer type, I compare the Respond Yes group to the Respond No group using these guiding questions and common assumptions:
1. Do users who respond yes to offers also spend more on the app in general?
2. How is income correlated with total spending? Do users with higher spending also have higher income? 
3. How is age related to response?
4. How is the number of transactions related to response? (Assumption: a user who is already more active on the app would be more likely to respond yes to an offer)
5. Do users who respond yes also have more reward points? (Assumption: users who are engaged and motivated to earn more points would be more likely to respond yes to offers)
6. How does a user's membership length affect their response?
7. Does a user's gender affect response?
8. Does a user's average spending amount per transaction affect their response?
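
Most of the questions above reduce to comparing one numeric column between the Respond Yes and Respond No groups. A minimal sketch of such a comparison follows, assuming that `completed_offer` (coded 1/0) is the response indicator. The toy values and the helper name `compare_groups` are hypothetical and only illustrate the idea. Welch's t-test is one possible choice of test here, not necessarily the one used later in this analysis.

```python
import pandas as pd
from scipy import stats

# Toy stand-in for the rows of one offer type; all values are made up.
toy = pd.DataFrame({
    "completed_offer": [1, 1, 1, 1, 0, 0, 0, 0],
    "total_spending": [210.0, 180.5, 250.0, 199.0, 60.0, 95.5, 40.0, 120.0],
    "total_transactions": [14, 11, 16, 12, 5, 8, 4, 9],
})

def compare_groups(frame, column, response_col="completed_offer"):
    """Return group means and a Welch t-test for `column`, split by response."""
    yes = frame.loc[frame[response_col] == 1, column]
    no = frame.loc[frame[response_col] == 0, column]
    # equal_var=False requests Welch's t-test (no equal-variance assumption)
    t_stat, p_value = stats.ttest_ind(yes, no, equal_var=False)
    return {
        "mean_yes": yes.mean(),
        "mean_no": no.mean(),
        "t": t_stat,
        "p": p_value,
    }

result = compare_groups(toy, "total_spending")
print(result)
```

The same helper could be called with `total_transactions`, `age`, or any other numeric column to work through the remaining questions.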

Note: Although the "informational" offer type is included in the dataset, we exclude it from this analysis. "Informational" offers have a difficulty value of 0, which means that they do not cost Starbucks anything to send out, so we assume that Starbucks should send informational offers to _all_ of its reward users to inform them of new products.

## Import libraries and data sets

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

# read in the csv files
offer_response = pd.read_csv('data/offer_response.csv')
user = pd.read_csv('data/profile_cleaned.csv')

In [2]:
# Merge profile and number of offers completed
df = pd.merge(offer_response,
              user,
              how='left',
              on=['person'])
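
A left merge silently keeps offer rows that have no matching profile and can duplicate rows if the right-hand table repeats a key. The sketch below shows one way to check for both using the `validate` and `indicator` parameters of `pd.merge`. The toy frames `offers` and `profiles` are hypothetical stand-ins; whether `person` is unique in the real profile table is an assumption that `validate` would test.

```python
import pandas as pd

# Hypothetical stand-ins for offer_response and the cleaned profile table.
offers = pd.DataFrame({
    "person": ["a", "a", "b", "c"],
    "offer_type": ["bogo", "discount", "bogo", "discount"],
})
profiles = pd.DataFrame({"person": ["a", "b"], "age": [34, 58]})

checked = pd.merge(
    offers,
    profiles,
    how="left",
    on="person",
    validate="many_to_one",  # raises MergeError if a person repeats in profiles
    indicator=True,          # adds a "_merge" column describing each row's origin
)

# Rows marked "left_only" had no matching profile and carry NaNs after the merge.
unmatched = checked[checked["_merge"] == "left_only"]
n_unmatched = len(unmatched)
print(n_unmatched)
```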

In [3]:
# TODO: explore NaNs
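
One possible way to approach the NaN exploration noted above is to count missing values per column and inspect the affected rows before dropping them. This is a sketch on a hypothetical toy frame, not the result of running it on the merged data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the real merged frame has more columns.
toy = pd.DataFrame({
    "person": ["a", "b", "c", "d"],
    "age": [34, np.nan, 58, 41],
    "income": [52000.0, np.nan, 83000.0, np.nan],
})

# Count and share of missing values per column.
nan_summary = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "share_missing": toy.isna().mean(),
})
print(nan_summary)

# Rows with at least one missing value, i.e. the rows dropna() would remove.
rows_with_nan = toy[toy.isna().any(axis=1)]
print(len(rows_with_nan))
```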

In [4]:
# Drop rows with any missing value, then encode booleans as 1/0
df = df.dropna().replace({True:1,False:0})
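
An alternative to a frame-wide `replace` is to convert only the boolean columns, which leaves every other column untouched. The sketch below assumes that `completed_offer` is stored as a boolean before this step; the toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one boolean column and one missing income.
toy = pd.DataFrame({
    "person": ["a", "b", "c"],
    "completed_offer": [True, False, True],
    "income": [52000.0, np.nan, 83000.0],
})

cleaned = toy.dropna()

# Find boolean columns and cast just those to integers (True -> 1, False -> 0).
bool_cols = cleaned.select_dtypes(include="bool").columns
cleaned = cleaned.astype({col: int for col in bool_cols})
print(cleaned.dtypes)
```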

## BOGO - Data Exploration

In [5]:
bogo_df = df[df.offer_type == 'bogo']

In [6]:
bogo_df.describe()

| | completed_offer | age | income | membership_length | total_spending | response_rate | total_transactions | avg_spent_per_transaction | total_rewards |
|---|---|---|---|---|---|---|---|---|---|
| count | 14941.0 | 14941.0 | 14941.0 | 14941.0 | 14941.0 | 14941.0 | 14941.0 | 14941.0 | 14941.0 |
| mean | 0.605314 | 55.052005 | 67525.332976 | 1161.338799 | 146.347012 | 0.590831 | 9.645004 | 17.288406 | 16.351315 |
| std | 0.488799 | 16.96914 | 21416.72591 | 417.575322 | 137.748989 | 0.24072 | 5.27972 | 16.212098 | 9.480823 |
| min | 0.0 | 18.0 | 30000.0 | 587.0 | 5.28 | 0.166667 | 1.0 | 1.311667 | 2.0 |
| 25% | 0.0 | 44.0 | 52000.0 | 833.0 | 60.78 | 0.4 | 6.0 | 8.74375 | 9.0 |
| 50% | 1.0 | 56.0 | 66000.0 | 1061.0 | 118.42 | 0.6 | 9.0 | 16.26 | 15.0 |
| 75% | 1.0 | 67.0 | 83000.0 | 1438.0 | 187.52 | 0.8 | 13.0 | 22.408571 | 23.0 |
| max | 1.0 | 101.0 | 120000.0 | 2410.0 | 1608.69 | 1.0 | 36.0 | 301.31 | 55.0 |
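
Summary statistics do not show how two columns move together, which is what guiding question 2 (income versus total spending) asks. A minimal sketch of that check follows, using made-up values in place of the real bogo subset. Pearson measures linear association and Spearman measures monotonic association; given how far the maximum `total_spending` lies above the 75th percentile, Spearman may be the more robust choice, though that is a judgment call rather than a finding.

```python
import pandas as pd
from scipy import stats

# Made-up values for illustration; not taken from the data set.
toy = pd.DataFrame({
    "income": [30000.0, 45000.0, 52000.0, 66000.0, 83000.0, 120000.0],
    "total_spending": [20.5, 60.0, 75.25, 118.0, 190.0, 310.0],
})

# Pearson correlation coefficient with its two-sided p-value.
r, p_value = stats.pearsonr(toy["income"], toy["total_spending"])

# Rank-based Spearman correlation, less sensitive to extreme spenders.
rho = toy["income"].corr(toy["total_spending"], method="spearman")

print(r, p_value, rho)
```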


## Discount - Data Exploration

In [None]:
discount_df = df[df.offer_type == 'discount']