# Starbucks Capstone Challenge

### Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer. 

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

### Example

To give an example, a user could receive a "buy 10 dollars, get 2 dollars off" discount offer on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars, get 2 dollars off" offer but never open it during the 10-day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.

### Cleaning

This makes data cleaning especially important and tricky.
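One way to approach this is to flag a completion as influenced only when a view event precedes it. The following is a minimal sketch on a toy event log, not the method used later in this notebook; the data values and the `was_influenced` helper are hypothetical, and the column names simply mirror the transcript fields.

```python
import pandas as pd

# Toy event log (hypothetical values). Times are hours since the start
# of the test, as in transcript.json.
events = pd.DataFrame({
    "person": ["u1", "u1", "u2", "u2", "u2"],
    "event": ["offer received", "offer completed",
              "offer received", "offer viewed", "offer completed"],
    "offer_id": ["A"] * 5,
    "time": [0, 48, 0, 12, 60],
})

def was_influenced(group: pd.DataFrame) -> bool:
    """Return True if the offer was viewed at or before its completion."""
    viewed = group.loc[group["event"] == "offer viewed", "time"]
    completed = group.loc[group["event"] == "offer completed", "time"]
    if viewed.empty or completed.empty:
        return False
    return bool(viewed.min() <= completed.min())

# One flag per (person, offer) pair.
influenced = {
    key: was_influenced(group)
    for key, group in events.groupby(["person", "offer_id"])
}
```

This sketch assumes each customer receives a given offer at most once; repeated deliveries of the same offer would need to be separated by their receipt time first.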

You'll also want to take into account that some demographic groups will make purchases even if they don't receive an offer. From a business perspective, if a customer is going to make a 10 dollar purchase without an offer anyway, you wouldn't want to send a buy 10 dollars get 2 dollars off offer. You'll want to try to assess what a certain demographic group will buy when not receiving any offers.
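A rough way to estimate this baseline is to keep only the transactions that fall outside every offer validity window of the same customer. The sketch below uses toy data; the `start`, `end` and `duration_days` columns are assumptions introduced for illustration and do not exist under those names in the raw files.

```python
import pandas as pd

# Toy data (hypothetical): one offer window and three transactions.
offers = pd.DataFrame({
    "person": ["u1"],
    "start": [0],          # hour the offer was received
    "duration_days": [7],  # validity period of the offer
})
offers["end"] = offers["start"] + offers["duration_days"] * 24

transactions = pd.DataFrame({
    "person": ["u1", "u1", "u2"],
    "time": [24, 300, 50],
    "amount": [12.0, 8.0, 5.0],
})

# Pair every transaction with every offer window of the same customer.
paired = transactions.reset_index().merge(offers, on="person", how="left")
paired["in_window"] = paired["time"].between(paired["start"], paired["end"])

# A transaction counts as baseline spend if it falls inside no window.
covered = paired.groupby("index")["in_window"].any()
baseline = transactions[~covered.reindex(transactions.index, fill_value=False)]
baseline_total = baseline.groupby("person")["amount"].sum()
```

Grouping `baseline` by demographic columns instead of `person` would give a per-group estimate of spending without offers.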

### Final Advice

Because this is a capstone project, you are free to analyze the data any way you see fit. For example, you could build a machine learning model that predicts how much someone will spend based on demographics and offer type. Or you could build a model that predicts whether or not someone will respond to an offer. Or, you don't need to build a machine learning model at all. You could develop a set of heuristics that determine what offer you should send to each customer (e.g., 75 percent of women customers who were 35 years old responded to offer A vs 40 percent from the same demographic to offer B, so send offer A).
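The heuristic approach can be sketched as a lookup of the best-performing offer per group. The example below reuses the illustrative 75 and 40 percent figures from the paragraph above; the group label `women_35` and the table layout are hypothetical.

```python
import pandas as pd

# Illustrative response rates per demographic group and offer.
rates = pd.DataFrame({
    "group": ["women_35", "women_35"],
    "offer": ["A", "B"],
    "response_rate": [0.75, 0.40],
})

# For each group, keep the row with the highest response rate.
best = rates.loc[rates.groupby("group")["response_rate"].idxmax()]
best_offer = dict(zip(best["group"], best["offer"]))
```

With more groups in `rates`, `best_offer` would hold one recommended offer per group.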

# Data Sets

The data is contained in three files:

* portfolio.json - containing offer ids and meta data about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed




## 1. Import and Load

In this section we import the needed libraries and load the three datasets: portfolio, profile and transcript.

In [1]:
import pandas as pd
import numpy as np
import math
import json
%matplotlib inline

# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

## 2. Exploring Data

In this section we explore the three datasets and analyze their characteristics.

**2.1 portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e., BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings)


In [2]:
portfolio

Unnamed: 0,channels,difficulty,duration,id,offer_type,reward
0,"[email, mobile, social]",10,7,ae264e3637204a6fb9bb56bc8210ddfd,bogo,10
1,"[web, email, mobile, social]",10,5,4d5c57ea9a6940dd891ad53e9dbe8da0,bogo,10
2,"[web, email, mobile]",0,4,3f207df678b143eea3cee63160fa8bed,informational,0
3,"[web, email, mobile]",5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5
4,"[web, email]",20,10,0b1e1539f2cc45b7b9fa7c272da2e1d7,discount,5
5,"[web, email, mobile, social]",7,7,2298d6c36e964ae4a3e7e9706d1fb8c2,discount,3
6,"[web, email, mobile, social]",10,10,fafdcd668e3743c1bb461111dcafc2a4,discount,2
7,"[email, mobile, social]",0,3,5a8bc65990b245e5a138643cd4eb9837,informational,0
8,"[web, email, mobile, social]",5,5,f19421c1d4aa40978ebb69ca19b0e20d,bogo,5
9,"[web, email, mobile]",10,7,2906b810c7d4411798c6938adc9daaa5,discount,2


In [3]:
portfolio.shape

(10, 6)

In [4]:
portfolio.describe()

Unnamed: 0,difficulty,duration,reward
count,10.0,10.0,10.0
mean,7.7,6.5,4.2
std,5.831905,2.321398,3.583915
min,0.0,3.0,0.0
25%,5.0,5.0,2.0
50%,8.5,7.0,4.0
75%,10.0,7.0,5.0
max,20.0,10.0,10.0


In [5]:
portfolio.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
channels      10 non-null object
difficulty    10 non-null int64
duration      10 non-null int64
id            10 non-null object
offer_type    10 non-null object
reward        10 non-null int64
dtypes: int64(3), object(3)
memory usage: 560.0+ bytes


There are no null values in this dataset.

In [6]:
portfolio.groupby('offer_type').count()

Unnamed: 0_level_0,channels,difficulty,duration,id,reward
offer_type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
bogo,4,4,4,4,4
discount,4,4,4,4,4
informational,2,2,2,2,2


**2.2 profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income

In [7]:
profile.head()

Unnamed: 0,age,became_member_on,gender,id,income
0,118,20170212,,68be06ca386d4c31939f3a4f0e3dd783,
1,55,20170715,F,0610b486422d4921ae7d2bf64640c50b,112000.0
2,118,20180712,,38fe809add3b4fcf9315a9694bb96ff5,
3,75,20170509,F,78afa995795e4d85b5d9ceeca43f5fef,100000.0
4,118,20170804,,a03223e636434f42ac4c3df47e8bac43,


In [8]:
profile.shape

(17000, 5)

In [9]:
profile.describe()

Unnamed: 0,age,became_member_on,income
count,17000.0,17000.0,14825.0
mean,62.531412,20167030.0,65404.991568
std,26.73858,11677.5,21598.29941
min,18.0,20130730.0,30000.0
25%,45.0,20160530.0,49000.0
50%,58.0,20170800.0,64000.0
75%,73.0,20171230.0,80000.0
max,118.0,20180730.0,120000.0


In [10]:
profile.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17000 entries, 0 to 16999
Data columns (total 5 columns):
age                 17000 non-null int64
became_member_on    17000 non-null int64
gender              14825 non-null object
id                  17000 non-null object
income              14825 non-null float64
dtypes: float64(1), int64(2), object(2)
memory usage: 664.1+ KB


There are null values in the gender and income variables: 2175 in each (17000 entries minus 14825 non-null values).
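In the five rows shown by `profile.head()`, an age of 118 appears exactly where gender and income are missing, which suggests that 118 may be a placeholder rather than a real age. The sketch below checks this on a toy frame copied from those five rows; whether it holds for the full dataset is an assumption that would need to be verified on `profile` itself.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the first rows of profile.head().
sample = pd.DataFrame({
    "age": [118, 55, 118, 75, 118],
    "gender": [None, "F", None, "F", None],
    "income": [np.nan, 112000.0, np.nan, 100000.0, np.nan],
})

# Number of missing values per column.
null_counts = sample.isnull().sum()

# Rows where gender and income are both missing.
both_missing = sample["gender"].isnull() & sample["income"].isnull()

# True if age 118 occurs on exactly those rows and nowhere else.
age_is_placeholder = bool((sample["age"].eq(118) == both_missing).all())
```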

In [11]:
profile.groupby('gender').count()

Unnamed: 0_level_0,age,became_member_on,id,income
gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
F,6129,6129,6129,6129
M,8484,8484,8484,8484
O,212,212,212,212


**2.3 transcript.json**
* event (str) - record description (e.g., transaction, offer received, offer viewed, etc.)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value - (dict of strings) - either an offer id or transaction amount depending on the record


In [12]:
transcript.head()

Unnamed: 0,event,person,time,value
0,offer received,78afa995795e4d85b5d9ceeca43f5fef,0,{'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'}
1,offer received,a03223e636434f42ac4c3df47e8bac43,0,{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}
2,offer received,e2127556f4f64592b11af22de27a7932,0,{'offer id': '2906b810c7d4411798c6938adc9daaa5'}
3,offer received,8ec6ce2a7e7949b1bf142def7d0e0586,0,{'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'}
4,offer received,68617ca6246f4fbc85e91a2a49552598,0,{'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'}


In [13]:
transcript.shape

(306534, 4)

In [14]:
transcript.describe()

Unnamed: 0,time
count,306534.0
mean,366.38294
std,200.326314
min,0.0
25%,186.0
50%,408.0
75%,528.0
max,714.0


In [15]:
transcript.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 306534 entries, 0 to 306533
Data columns (total 4 columns):
event     306534 non-null object
person    306534 non-null object
time      306534 non-null int64
value     306534 non-null object
dtypes: int64(1), object(3)
memory usage: 9.4+ MB


There are no null values in this dataset.

In [16]:
transcript.groupby('event').count()

Unnamed: 0_level_0,person,time,value
event,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
offer completed,33579,33579,33579
offer received,76277,76277,76277
offer viewed,57725,57725,57725
transaction,138953,138953,138953


## 3. Data Cleaning and Preparing

In this section we clean and prepare the data 

**3.1 Portfolio**

This section transforms the portfolio DataFrame, renaming variables and creating new dummy variables.

In [17]:
portfolio.head()

Unnamed: 0,channels,difficulty,duration,id,offer_type,reward
0,"[email, mobile, social]",10,7,ae264e3637204a6fb9bb56bc8210ddfd,bogo,10
1,"[web, email, mobile, social]",10,5,4d5c57ea9a6940dd891ad53e9dbe8da0,bogo,10
2,"[web, email, mobile]",0,4,3f207df678b143eea3cee63160fa8bed,informational,0
3,"[web, email, mobile]",5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5
4,"[web, email]",20,10,0b1e1539f2cc45b7b9fa7c272da2e1d7,discount,5


In [18]:
def clean_portfolio(df = portfolio):
    
    '''
    clean_portfolio creates a new DataFrame from the original portfolio where we have a dummy variable 
    for each possible channel
    
    INPUT
    df: in this case the original portfolio DataFrame
    
    OUTPUT
    dfportfolio: the new DataFrame with the newly created dummy variables
    
    '''

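    # One 0/1 column per channel; groupby(level = 0) replaces sum(level = 0), which was removed in pandas 2.0
    dummies = pd.get_dummies(df.channels.apply(pd.Series).stack()).groupby(level = 0).sum()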
    dfportfolio = pd.concat([df, dummies], axis = 1)
    dfportfolio = dfportfolio.drop('channels', axis = 1)
    dfportfolio.rename(columns={'id':'offer_id'}, inplace = True)
    
    return dfportfolio

In [19]:
dfportfolio = clean_portfolio(portfolio)
dfportfolio.head()

Unnamed: 0,difficulty,duration,offer_id,offer_type,reward,email,mobile,social,web
0,10,7,ae264e3637204a6fb9bb56bc8210ddfd,bogo,10,1,1,1,0
1,10,5,4d5c57ea9a6940dd891ad53e9dbe8da0,bogo,10,1,1,1,1
2,0,4,3f207df678b143eea3cee63160fa8bed,informational,0,1,1,0,1
3,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,0,1
4,20,10,0b1e1539f2cc45b7b9fa7c272da2e1d7,discount,5,1,0,0,1


**3.2 Profile**

This section transforms the profile DataFrame, renaming and fixing variables

In [20]:
profile.head()

Unnamed: 0,age,became_member_on,gender,id,income
0,118,20170212,,68be06ca386d4c31939f3a4f0e3dd783,
1,55,20170715,F,0610b486422d4921ae7d2bf64640c50b,112000.0
2,118,20180712,,38fe809add3b4fcf9315a9694bb96ff5,
3,75,20170509,F,78afa995795e4d85b5d9ceeca43f5fef,100000.0
4,118,20170804,,a03223e636434f42ac4c3df47e8bac43,


In [21]:
def clean_profile(df = profile):
    
    '''
    clean_profile creates a new DataFrame from the original profile where some variables are fixed and 
    renamed for merging purposes 
    
    INPUT
    df: in this case the original profile DataFrame
    
    OUTPUT
    dfprofile: the new DataFrame with the fixed and renamed variables
    
    '''
    
    # Work on a copy of the argument so the original DataFrame is left untouched
    dfprofile = df.copy()
    dfprofile['became_member_on'] = pd.to_datetime(dfprofile['became_member_on'].astype(str), format='%Y%m%d')
    dfprofile.rename(columns={'id':'person'}, inplace = True)
    
    return dfprofile

In [22]:
dfprofile = clean_profile(profile)
dfprofile.head()

Unnamed: 0,age,became_member_on,gender,person,income
0,118,2017-02-12,,68be06ca386d4c31939f3a4f0e3dd783,
1,55,2017-07-15,F,0610b486422d4921ae7d2bf64640c50b,112000.0
2,118,2018-07-12,,38fe809add3b4fcf9315a9694bb96ff5,
3,75,2017-05-09,F,78afa995795e4d85b5d9ceeca43f5fef,100000.0
4,118,2017-08-04,,a03223e636434f42ac4c3df47e8bac43,


In [23]:
dfprofile.shape

(17000, 5)

**3.3 Transcript**

This section transforms the transcript DataFrame

In [24]:
transcript.head()

Unnamed: 0,event,person,time,value
0,offer received,78afa995795e4d85b5d9ceeca43f5fef,0,{'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'}
1,offer received,a03223e636434f42ac4c3df47e8bac43,0,{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}
2,offer received,e2127556f4f64592b11af22de27a7932,0,{'offer id': '2906b810c7d4411798c6938adc9daaa5'}
3,offer received,8ec6ce2a7e7949b1bf142def7d0e0586,0,{'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'}
4,offer received,68617ca6246f4fbc85e91a2a49552598,0,{'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'}


In [25]:
value = transcript['value'].apply(pd.Series)
value.head()

Unnamed: 0,offer id,amount,offer_id,reward
0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,
1,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,
2,2906b810c7d4411798c6938adc9daaa5,,,
3,fafdcd668e3743c1bb461111dcafc2a4,,,
4,4d5c57ea9a6940dd891ad53e9dbe8da0,,,


In [26]:
value['o_id'] = np.where(value['offer id'].isnull() & value['offer_id'].notnull(),value['offer_id'],value['offer id'])
value.head()

Unnamed: 0,offer id,amount,offer_id,reward,o_id
0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,,9b98b8c7a33c4b65b9aebfe6a799e6d9
1,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,,0b1e1539f2cc45b7b9fa7c272da2e1d7
2,2906b810c7d4411798c6938adc9daaa5,,,,2906b810c7d4411798c6938adc9daaa5
3,fafdcd668e3743c1bb461111dcafc2a4,,,,fafdcd668e3743c1bb461111dcafc2a4
4,4d5c57ea9a6940dd891ad53e9dbe8da0,,,,4d5c57ea9a6940dd891ad53e9dbe8da0


In [27]:
df1 = pd.concat([transcript, value['o_id']], axis=1)
df2 = pd.concat([df1, value['amount']], axis=1)
dftranscript = pd.concat([df2, value['reward']], axis=1)
dftranscript.rename(columns={'o_id':'offer_id'}, inplace = True)
dftranscript = dftranscript.drop('value', axis = 1)

dftranscript.head()

Unnamed: 0,event,person,time,offer_id,amount,reward
0,offer received,78afa995795e4d85b5d9ceeca43f5fef,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,
1,offer received,a03223e636434f42ac4c3df47e8bac43,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,
2,offer received,e2127556f4f64592b11af22de27a7932,0,2906b810c7d4411798c6938adc9daaa5,,
3,offer received,8ec6ce2a7e7949b1bf142def7d0e0586,0,fafdcd668e3743c1bb461111dcafc2a4,,
4,offer received,68617ca6246f4fbc85e91a2a49552598,0,4d5c57ea9a6940dd891ad53e9dbe8da0,,
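The flattening steps above could also be written more compactly with `pd.json_normalize`. This is a sketch on toy records, not a replacement tested against the full transcript; the key names `offer id`, `offer_id`, `amount` and `reward` match the columns seen above, while the values are made up.

```python
import pandas as pd

# Toy records shaped like the `value` column of the transcript.
toy = pd.DataFrame({
    "event": ["offer received", "transaction", "offer completed"],
    "value": [
        {"offer id": "abc"},
        {"amount": 19.89},
        {"offer_id": "abc", "reward": 5},
    ],
})

# Expand the dictionaries into one column per key.
flat = pd.json_normalize(toy["value"].tolist())

# Merge the two spellings of the offer key into a single column.
flat["offer_id"] = flat["offer_id"].fillna(flat["offer id"])
flat = flat.drop(columns="offer id")
```

`fillna` with another Series fills each missing `offer_id` from the matching row of `offer id`, which plays the same role as the `np.where` call in cell 26.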


## 4. Merging Data

In this section we merge the three DataFrames that resulted from the cleaning and preparing process:
* dfportfolio
* dfprofile
* dftranscript



In [28]:
t1 = dftranscript.merge(dfprofile, on = ['person'])

final_table = t1.merge(dfportfolio, how = 'left', on = ['offer_id'])

final_table

Unnamed: 0,event,person,time,offer_id,amount,reward_x,age,became_member_on,gender,income,difficulty,duration,offer_type,reward_y,email,mobile,social,web
0,offer received,78afa995795e4d85b5d9ceeca43f5fef,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,75,2017-05-09,F,100000.0,5.0,7.0,bogo,5.0,1.0,1.0,0.0,1.0
1,offer viewed,78afa995795e4d85b5d9ceeca43f5fef,6,9b98b8c7a33c4b65b9aebfe6a799e6d9,,,75,2017-05-09,F,100000.0,5.0,7.0,bogo,5.0,1.0,1.0,0.0,1.0
2,transaction,78afa995795e4d85b5d9ceeca43f5fef,132,,19.89,,75,2017-05-09,F,100000.0,,,,,,,,
3,offer completed,78afa995795e4d85b5d9ceeca43f5fef,132,9b98b8c7a33c4b65b9aebfe6a799e6d9,,5.0,75,2017-05-09,F,100000.0,5.0,7.0,bogo,5.0,1.0,1.0,0.0,1.0
4,transaction,78afa995795e4d85b5d9ceeca43f5fef,144,,17.78,,75,2017-05-09,F,100000.0,,,,,,,,
5,offer received,78afa995795e4d85b5d9ceeca43f5fef,168,5a8bc65990b245e5a138643cd4eb9837,,,75,2017-05-09,F,100000.0,0.0,3.0,informational,0.0,1.0,1.0,1.0,0.0
6,offer viewed,78afa995795e4d85b5d9ceeca43f5fef,216,5a8bc65990b245e5a138643cd4eb9837,,,75,2017-05-09,F,100000.0,0.0,3.0,informational,0.0,1.0,1.0,1.0,0.0
7,transaction,78afa995795e4d85b5d9ceeca43f5fef,222,,19.67,,75,2017-05-09,F,100000.0,,,,,,,,
8,transaction,78afa995795e4d85b5d9ceeca43f5fef,240,,29.72,,75,2017-05-09,F,100000.0,,,,,,,,
9,transaction,78afa995795e4d85b5d9ceeca43f5fef,378,,23.93,,75,2017-05-09,F,100000.0,,,,,,,,
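The `reward_x` and `reward_y` columns appear because both merged tables contain a `reward` column: `reward_x` comes from the transcript and `reward_y` from the portfolio. A possible refinement, sketched here on toy stand-ins for the two tables, is to name the suffixes explicitly and let pandas validate that each offer id is unique in the portfolio. The suffix names `_earned` and `_offered` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for dftranscript and dfportfolio (hypothetical values).
events = pd.DataFrame({
    "offer_id": ["A", None, "A"],
    "event": ["offer received", "transaction", "offer completed"],
    "reward": [None, None, 5.0],
})
offers = pd.DataFrame({
    "offer_id": ["A"],
    "reward": [5],
    "offer_type": ["bogo"],
})

merged = events.merge(
    offers,
    how="left",
    on="offer_id",
    suffixes=("_earned", "_offered"),  # instead of the default _x / _y
    validate="many_to_one",            # raises if offer_id repeats in offers
)
```

Transactions have no offer id, so their portfolio columns stay empty after the left merge, as in the table above.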


## 5. Data Analysis

In this section we analyze the resulting data and develop heuristics in order to determine which demographic groups respond best to which offer type

**5.1 Gender**

In [29]:
final_table.groupby('gender')['event'].count()

gender
F    113101
M    155690
O      3971
Name: event, dtype: int64

Women were involved in 113101 events, men were involved in 155690 events and others were involved in 3971 events 

In [30]:
# Look up each gender by label rather than by position
gender_counts = final_table.groupby('gender')['event'].count()
F_proportion = gender_counts['F'] / final_table.shape[0]*100
M_proportion = gender_counts['M'] / final_table.shape[0]*100
O_proportion = gender_counts['O'] / final_table.shape[0]*100

F_proportion ,M_proportion ,O_proportion

(36.896722712651773, 50.790450651477492, 1.2954517280301696)

That means men account for 50.79% of events, women for 36.90% and others for 1.30%. The remaining 11.02% of events belong to customers with no recorded gender.
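The three shares do not add up to 100% because `groupby` drops rows whose gender is missing. The sketch below makes that remainder explicit, using the counts printed by cell 29 and a total of 306534 rows, which is the size implied by the proportions in cell 30; the `missing` label is introduced here for illustration.

```python
import pandas as pd

# Event counts per gender, copied from the output of cell 29.
counts = pd.Series({"F": 113101, "M": 155690, "O": 3971})
total_rows = 306534

# Rows not attributed to any gender (derived, not printed in the notebook).
counts["missing"] = total_rows - counts.sum()

# Share of all events per group, in percent.
shares = (counts / total_rows * 100).round(2)
```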

In [31]:
final_table.groupby(['gender', 'event'])['event'].count()

gender  event          
F       offer completed    15477
        offer received     27456
        offer viewed       20786
        transaction        49382
M       offer completed    16466
        offer received     38129
        offer viewed       28301
        transaction        72794
O       offer completed      501
        offer received       916
        offer viewed         773
        transaction         1781
Name: event, dtype: int64

In [32]:
# We want to know the number of offers received, viewed and completed for each gender, as well as the number of transactions
# The counts are computed once and looked up by label instead of by position
event_counts = final_table.groupby(['gender', 'event'])['event'].count()

# Females
F_offers_received = event_counts.loc['F', 'offer received']
F_offers_viewed = event_counts.loc['F', 'offer viewed']
F_offers_completed = event_counts.loc['F', 'offer completed']
F_transactions = event_counts.loc['F', 'transaction']

# Males
M_offers_received = event_counts.loc['M', 'offer received']
M_offers_viewed = event_counts.loc['M', 'offer viewed']
M_offers_completed = event_counts.loc['M', 'offer completed']
M_transactions = event_counts.loc['M', 'transaction']

# Others
O_offers_received = event_counts.loc['O', 'offer received']
O_offers_viewed = event_counts.loc['O', 'offer viewed']
O_offers_completed = event_counts.loc['O', 'offer completed']
O_transactions = event_counts.loc['O', 'transaction']

In [33]:
# With the previous information we can create rates that describe the behavior of different genders regarding offers

F_offers_C_V = F_offers_completed / F_offers_viewed
F_offers_C_R = F_offers_completed / F_offers_received
F_offers_V_R = F_offers_viewed / F_offers_received

M_offers_C_V = M_offers_completed / M_offers_viewed
M_offers_C_R = M_offers_completed / M_offers_received
M_offers_V_R = M_offers_viewed / M_offers_received

O_offers_C_V = O_offers_completed / O_offers_viewed
O_offers_C_R = O_offers_completed / O_offers_received
O_offers_V_R = O_offers_viewed / O_offers_received

F_offers_C_V, F_offers_C_R, F_offers_V_R, M_offers_C_V, M_offers_C_R, M_offers_V_R, O_offers_C_V, O_offers_C_R, O_offers_V_R

(0.74458770326181078,
 0.56370192307692313,
 0.75706585081585076,
 0.58181689692943717,
 0.43184977313855594,
 0.74224343675417659,
 0.64812419146183697,
 0.54694323144104806,
 0.84388646288209612)

Women complete the highest proportion of offers, relative to both the offers viewed and the offers received. Women complete 74.46% of the offers viewed and 56.37% of the offers received, compared to 58.18% and 43.18% respectively for men and 64.81% and 54.69% respectively for others. Keep in mind that a completion can be recorded without a view, so the completed-to-viewed ratio is not a strict conversion rate.
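The same three rates can be computed for all genders at once from a table of counts. The sketch below rebuilds that table from the numbers printed by cell 31; on the real data, something like `final_table.groupby(['gender', 'event']).size().unstack('event')` should produce an equivalent table, though that call is not run here. The rate column names are hypothetical.

```python
import pandas as pd

# Counts copied from the output of cell 31.
counts = pd.DataFrame(
    {
        "offer completed": [15477, 16466, 501],
        "offer received": [27456, 38129, 916],
        "offer viewed": [20786, 28301, 773],
    },
    index=pd.Index(["F", "M", "O"], name="gender"),
)

# One row per gender, one column per rate.
rates = pd.DataFrame({
    "completed_per_viewed": counts["offer completed"] / counts["offer viewed"],
    "completed_per_received": counts["offer completed"] / counts["offer received"],
    "viewed_per_received": counts["offer viewed"] / counts["offer received"],
})
```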

In [34]:
final_table.groupby('gender')['amount'].sum()

gender
F    863695.00
M    844890.86
O     26356.54
Name: amount, dtype: float64

Even though more events involve men and there are over 2000 more men than women in the sample, women spend the most. Overall, women spent $863695.00$, compared to $844890.86$ for men and $26356.54$ for others.

In [35]:
final_table.groupby('gender')['reward_x'].count()

gender
F    15477
M    16466
O      501
Name: reward_x, dtype: int64

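Interestingly enough, even though women spend more and complete a higher proportion of the offers they receive, men complete more offers in absolute terms and therefore collect more rewards (16466 compared to 15477 for women). Note that `count()` counts the reward records; it does not sum their values.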

In [36]:
final_table.groupby(['gender', 'event', 'offer_type'])['event'].count()

gender  event            offer_type   
F       offer completed  bogo              7501
                         discount          7976
        offer received   bogo             10975
                         discount         10943
                         informational     5538
        offer viewed     bogo              9143
                         discount          7733
                         informational     3910
M       offer completed  bogo              7512
                         discount          8954
        offer received   bogo             15208
                         discount         15354
                         informational     7567
        offer viewed     bogo             12581
                         discount         10431
                         informational     5289
O       offer completed  bogo               245
                         discount           256
        offer received   bogo               354
                         discount           367
 

In [37]:
# To expand the previous analysis, we want to know how many offers were received, viewed and completed for each offer type
# The counts are computed once and looked up by label instead of by position
type_counts = final_table.groupby(['gender', 'event', 'offer_type'])['event'].count()

# Females
F_bogo_offers_received = type_counts.loc['F', 'offer received', 'bogo']
F_bogo_offers_viewed = type_counts.loc['F', 'offer viewed', 'bogo']
F_bogo_offers_completed = type_counts.loc['F', 'offer completed', 'bogo']

F_discount_offers_received = type_counts.loc['F', 'offer received', 'discount']
F_discount_offers_viewed = type_counts.loc['F', 'offer viewed', 'discount']
F_discount_offers_completed = type_counts.loc['F', 'offer completed', 'discount']

# Males
M_bogo_offers_received = type_counts.loc['M', 'offer received', 'bogo']
M_bogo_offers_viewed = type_counts.loc['M', 'offer viewed', 'bogo']
M_bogo_offers_completed = type_counts.loc['M', 'offer completed', 'bogo']

M_discount_offers_received = type_counts.loc['M', 'offer received', 'discount']
M_discount_offers_viewed = type_counts.loc['M', 'offer viewed', 'discount']
M_discount_offers_completed = type_counts.loc['M', 'offer completed', 'discount']

# Others
O_bogo_offers_received = type_counts.loc['O', 'offer received', 'bogo']
O_bogo_offers_viewed = type_counts.loc['O', 'offer viewed', 'bogo']
O_bogo_offers_completed = type_counts.loc['O', 'offer completed', 'bogo']

O_discount_offers_received = type_counts.loc['O', 'offer received', 'discount']
O_discount_offers_viewed = type_counts.loc['O', 'offer viewed', 'discount']
O_discount_offers_completed = type_counts.loc['O', 'offer completed', 'discount']


F_bogo_offers_received, F_bogo_offers_viewed, F_bogo_offers_completed, F_discount_offers_received, F_discount_offers_viewed, F_discount_offers_completed, M_bogo_offers_received, M_bogo_offers_viewed, M_bogo_offers_completed, M_discount_offers_received, M_discount_offers_viewed, M_discount_offers_completed, O_bogo_offers_received, O_bogo_offers_viewed, O_bogo_offers_completed, O_discount_offers_received, O_discount_offers_viewed, O_discount_offers_completed

(10975,
 9143,
 7501,
 10943,
 7733,
 7976,
 15208,
 12581,
 7512,
 15354,
 10431,
 8954,
 354,
 315,
 245,
 367,
 297,
 256)

In [38]:
F_bogo_offers_C_V = F_bogo_offers_completed / F_bogo_offers_viewed
F_bogo_offers_C_R = F_bogo_offers_completed / F_bogo_offers_received
F_bogo_offers_V_R = F_bogo_offers_viewed / F_bogo_offers_received

F_discount_offers_C_V = F_discount_offers_completed / F_discount_offers_viewed
F_discount_offers_C_R = F_discount_offers_completed / F_discount_offers_received
F_discount_offers_V_R = F_discount_offers_viewed / F_discount_offers_received

M_bogo_offers_C_V = M_bogo_offers_completed / M_bogo_offers_viewed
M_bogo_offers_C_R = M_bogo_offers_completed / M_bogo_offers_received
M_bogo_offers_V_R = M_bogo_offers_viewed / M_bogo_offers_received

M_discount_offers_C_V = M_discount_offers_completed / M_discount_offers_viewed
M_discount_offers_C_R = M_discount_offers_completed / M_discount_offers_received
M_discount_offers_V_R = M_discount_offers_viewed / M_discount_offers_received

O_bogo_offers_C_V = O_bogo_offers_completed / O_bogo_offers_viewed
O_bogo_offers_C_R = O_bogo_offers_completed / O_bogo_offers_received
O_bogo_offers_V_R = O_bogo_offers_viewed / O_bogo_offers_received

O_discount_offers_C_V = O_discount_offers_completed / O_discount_offers_viewed
O_discount_offers_C_R = O_discount_offers_completed / O_discount_offers_received
O_discount_offers_V_R = O_discount_offers_viewed / O_discount_offers_received

F_bogo_offers_C_V, F_bogo_offers_C_R, F_bogo_offers_V_R, F_discount_offers_C_V, F_discount_offers_C_R, F_discount_offers_V_R, M_bogo_offers_C_V, M_bogo_offers_C_R, M_bogo_offers_V_R, M_discount_offers_C_V, M_discount_offers_C_R, M_discount_offers_V_R, O_bogo_offers_C_V, O_bogo_offers_C_R, O_bogo_offers_V_R, O_discount_offers_C_V, O_discount_offers_C_R, O_discount_offers_V_R

(0.82040905610849835,
 0.6834624145785877,
 0.83307517084282456,
 1.0314237682658736,
 0.72886776935026953,
 0.70666179292698528,
 0.59709085128368178,
 0.49395055234087321,
 0.82726196738558655,
 0.85840283769533121,
 0.58317050931353398,
 0.67936694021101995,
 0.77777777777777779,
 0.69209039548022599,
 0.88983050847457623,
 0.86195286195286192,
 0.6975476839237057,
 0.80926430517711168)

Women complete a higher proportion of the discount offers they receive (72.89%) than of the bogo offers they receive (68.35%). Nevertheless, they complete more discount offers than they view (a completed-to-viewed ratio of 103.14%). This means that some completions happen without the offer being seen, so these customers do not need the offer to spend the required amounts.

Men complete proportionally far more discount offers than bogo offers. Their completions amount to 85.84% of the discount offers they view, compared to only 59.71% of the bogo offers they view.

Others complete 69.21% of the bogo offers and 69.75% of the discount offers they receive, which are very similar rates.
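A completed-to-viewed ratio above 1 is a useful warning sign, because it implies that at least some completions were recorded without a matching view. The sketch below recomputes the ratios from the counts printed by cell 37 and lists the groups where this happens; the column names are hypothetical.

```python
import pandas as pd

# Counts copied from the output of cell 37.
counts = pd.DataFrame(
    [
        ("F", "bogo", 10975, 9143, 7501),
        ("F", "discount", 10943, 7733, 7976),
        ("M", "bogo", 15208, 12581, 7512),
        ("M", "discount", 15354, 10431, 8954),
        ("O", "bogo", 354, 315, 245),
        ("O", "discount", 367, 297, 256),
    ],
    columns=["gender", "offer_type", "received", "viewed", "completed"],
).set_index(["gender", "offer_type"])

counts["completed_per_viewed"] = counts["completed"] / counts["viewed"]
counts["completed_per_received"] = counts["completed"] / counts["received"]

# Groups with more completions than views.
more_completed_than_viewed = counts.index[
    counts["completed_per_viewed"] > 1
].tolist()
```

Only the women and discount combination exceeds 1 in these counts, which matches the observation above.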

## 6. Conclusions

In this section we present conclusions of the results

Based on this analysis, we can conclude that discount offers are more effective for men than bogo offers. Meanwhile, bogo offers would be more adequate for women, since women complete more discount offers than they view, which means that they do not need the incentive of a discount to spend. Finally, for other genders there is no clear answer, because their completion rates for discount and bogo offers are very similar. Analyzing the case of other genders in more depth is a possible improvement for this project.