# Solutions for chapter 2 exercises

## Set up

In [1]:
# Common libraries
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Load the data
dat_df = pd.read_csv("AirCnC_MnM_exercises_data.csv")

# 0. The problem as seen by the PM

This preliminary section shows the calculations that the PM ran, for reference.

In [4]:
# The booking rate is lower for customers who have seen the ad
dat_df.groupby('ad').agg(bkg_rate=('bkd', 'mean'))

ad,bkg_rate
0,0.464139
1,0.448417


0.b. What is the booking rate for customers who have seen the ad, restricting to customers considering an M&M property? Customers who haven’t seen the ad, with the same restriction?

In [7]:
# This remains true even when restricting to customers considering an M&M property
dat_df[dat_df['mm'] == 1].groupby('ad').agg(bkg_rate=('bkd', 'mean'))

ad,bkg_rate
0,0.932051
1,0.911111


# 1. Understanding the behaviors

1.a. What are the behavioral categories for the variables in the data (Income, Ad, MM, Bkd)?

Income is a personal characteristic.
Ad is a business behavior.
MM is a customer behavior. 
Bkd is a customer behavior.

1.b. What is (are) the goal(s) of the ad?

The goals of the ad are:
- to increase the percentage of customers who consider an M&M property
- to increase the percentage of customers who book an M&M property

In [8]:
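# The ad coefficient is positive, i.e. seeing the ad is associated with a higher probability
# of considering an M&M property, but without controlling for income it is not significant (p = 0.564)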
mod_mm = smf.logit('mm ~ ad', data = dat_df)
res_mm = mod_mm.fit()
res_mm.summary()

Optimization terminated successfully.
         Current function value: 0.295525
         Iterations 6


Dep. Variable:,mm,No. Observations:,10000.0
Model:,Logit,Df Residuals:,9998.0
Method:,MLE,Df Model:,1.0
Date:,"Fri, 30 Apr 2021",Pseudo R-squ.:,5.536e-05
Time:,07:55:18,Log-Likelihood:,-2955.3
converged:,True,LL-Null:,-2955.4
Covariance Type:,nonrobust,LLR p-value:,0.5673

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.3576,0.037,-62.933,0.000,-2.431,-2.284
ad,0.0673,0.117,0.576,0.564,-0.162,0.296
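
To read the coefficients above on the probability scale, the log-odds can be passed through the inverse logit function. The following is a minimal sketch that hard-codes the rounded coefficients from the summary table instead of reading them from `res_mm.params`, so the results are approximate; `inv_logit` is a helper name introduced here for illustration only.

```python
import math

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Rounded coefficients copied from the summary table above
intercept = -2.3576
coef_ad = 0.0673

p_no_ad = inv_logit(intercept)         # predicted P(mm = 1) when ad = 0
p_ad = inv_logit(intercept + coef_ad)  # predicted P(mm = 1) when ad = 1

print(f"No ad: {p_no_ad:.4f}, ad: {p_ad:.4f}, difference: {p_ad - p_no_ad:.4f}")
```

The predicted probabilities come out at roughly 8.6% without the ad and 9.2% with it, a small gap that is consistent with the wide confidence interval reported for `ad`.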


In [9]:
# The ad coefficient is again positive (higher probability of booking an M&M property),
# but without controlling for income it is not significant (p = 0.730)
dat_df['bkd_mm'] = dat_df['bkd'] * dat_df['mm'] # Equal to 1 if and only if a customer books an M&M property

mod_bkd_mm = smf.logit('bkd_mm ~ ad', data = dat_df)
res_bkd_mm = mod_bkd_mm.fit()
res_bkd_mm.summary()

Optimization terminated successfully.
         Current function value: 0.280956
         Iterations 6


Dep. Variable:,bkd_mm,No. Observations:,10000.0
Model:,Logit,Df Residuals:,9998.0
Method:,MLE,Df Model:,1.0
Date:,"Fri, 30 Apr 2021",Pseudo R-squ.:,2.103e-05
Time:,07:57:20,Log-Likelihood:,-2809.6
converged:,True,LL-Null:,-2809.6
Covariance Type:,nonrobust,LLR p-value:,0.731

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-2.4344,0.039,-62.937,0.000,-2.510,-2.359
ad,0.0420,0.122,0.345,0.730,-0.196,0.281
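
A logit coefficient is often easier to interpret as an odds ratio. The sketch below exponentiates the `ad` coefficient and its 95% confidence bounds, copied by hand from the table above (rounded values, so the result is approximate), and checks whether the interval contains 1, the value that corresponds to no effect.

```python
import math

# Rounded values for `ad` copied from the summary table above
coef_ad = 0.0420
ci_low, ci_high = -0.196, 0.281

# Exponentiating log-odds gives odds ratios
odds_ratio = math.exp(coef_ad)
or_low, or_high = math.exp(ci_low), math.exp(ci_high)

# An interval that contains 1 is compatible with the ad having no effect
includes_one = or_low < 1.0 < or_high

print(f"Odds ratio: {odds_ratio:.3f} (95% CI {or_low:.3f} to {or_high:.3f})")
print(f"Interval includes 1: {includes_one}")
```

The interval includes 1, which matches the p-value of 0.730: taken alone, this model cannot distinguish the effect of the ad from zero.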


# 2. Resolving the mystery 

2.a. How does income affect these behaviors?

In [10]:
# Income increases the probability that a customer will consider an M&M property
mod_mm = smf.logit('mm ~ income + ad', data = dat_df)
res_mm = mod_mm.fit()
res_mm.summary()

Optimization terminated successfully.
         Current function value: 0.092007
         Iterations 9


Dep. Variable:,mm,No. Observations:,8489.0
Model:,Logit,Df Residuals:,8486.0
Method:,MLE,Df Model:,2.0
Date:,"Fri, 30 Apr 2021",Pseudo R-squ.:,0.6023
Time:,07:58:52,Log-Likelihood:,-781.04
converged:,True,LL-Null:,-1964.1
Covariance Type:,nonrobust,LLR p-value:,0.0

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-5.0019,0.119,-41.925,0.000,-5.236,-4.768
income,9.768e-06,3.47e-07,28.160,0.000,9.09e-06,1.04e-05
ad,0.4934,0.464,1.063,0.288,-0.417,1.403


In [11]:
# Income increases the probability that a customer will book an M&M property
mod_bkd_mm = smf.logit('bkd_mm ~ income + ad', data = dat_df)
res_bkd_mm = mod_bkd_mm.fit()
res_bkd_mm.summary()

Optimization terminated successfully.
         Current function value: 0.064053
         Iterations 9


Dep. Variable:,bkd_mm,No. Observations:,8489.0
Model:,Logit,Df Residuals:,8486.0
Method:,MLE,Df Model:,2.0
Date:,"Fri, 30 Apr 2021",Pseudo R-squ.:,0.7007
Time:,07:59:37,Log-Likelihood:,-543.74
converged:,True,LL-Null:,-1816.8
Covariance Type:,nonrobust,LLR p-value:,0.0

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-5.7731,0.161,-35.783,0.000,-6.089,-5.457
income,1.127e-05,4.23e-07,26.658,0.000,1.04e-05,1.21e-05
ad,0.2978,0.725,0.411,0.681,-1.124,1.719
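
The income coefficients in the two models above look tiny because they are expressed per single unit of income. A rough way to gauge their size is to rescale them to a larger increment. The sketch below uses an increment of 10,000, which is an arbitrary choice made here for illustration, and hard-codes the rounded coefficients from the two summary tables.

```python
import math

# Hypothetical increment, expressed in the same units as the `income` column
income_step = 10_000

# Rounded income coefficients copied from the two summary tables above
coef_income = {"mm": 9.768e-06, "bkd_mm": 1.127e-05}

# Odds ratio associated with an income increase of `income_step`
odds_ratios = {
    outcome: math.exp(coef * income_step)
    for outcome, coef in coef_income.items()
}

for outcome, ratio in odds_ratios.items():
    print(f"{outcome}: odds multiplied by {ratio:.3f} per {income_step:,} of income")
```

Under these assumptions, an extra 10,000 of income multiplies the odds of considering an M&M property by about 1.10 and the odds of booking one by about 1.12, holding `ad` constant.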


2.b. What is the average income of customers considering an M&M property after seeing the ad? Without seeing the ad? 

In [13]:
# Customers considering an M&M property after seeing the ad have a lower income than customers
# considering an M&M property without having seen the ad
dat_df.groupby(['ad', 'mm']).agg(avg_income=('income', 'mean'))

ad,mm,avg_income
0,0,65419.63
0,1,1098118.0
1,0,22362.17
1,1,19363.8


2.c. Can you now explain to the PM what happened?

The ad drove more customers to consider an M&M property across the board (i.e., irrespective of income). However, because there are more lower-income customers than higher-income customers, the ad added proportionately more lower-income customers to the pool of customers considering an M&M property. These lower-income customers are less likely to book a property, so the average booking rate among customers considering an M&M property decreased. In other words, the mix of customers considering an M&M property changed, even though the individual probability that a customer would consider and book an M&M property increased. This ad is a resounding success!
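
The mix effect described above can be illustrated with a small numerical sketch. All numbers below are hypothetical and chosen only to make the mechanism visible; they are not taken from the exercise data. The sketch also holds booking rates within each income segment constant, which is a simplification.

```python
# Hypothetical booking rates among customers considering an M&M property
booking_rate = {"high_income": 0.95, "low_income": 0.80}

# Hypothetical numbers of customers considering an M&M property
considering_no_ad = {"high_income": 100, "low_income": 20}
considering_ad = {"high_income": 110, "low_income": 60}  # the ad adds mostly lower-income customers

def total_bookings(considering):
    """Expected number of bookings across both income segments."""
    return sum(n * booking_rate[segment] for segment, n in considering.items())

def blended_rate(considering):
    """Average booking rate across everyone considering an M&M property."""
    return total_bookings(considering) / sum(considering.values())

rate_no_ad = blended_rate(considering_no_ad)
rate_ad = blended_rate(considering_ad)

print(f"Blended booking rate without ad: {rate_no_ad:.3f}")
print(f"Blended booking rate with ad:    {rate_ad:.3f}")
print(f"Expected bookings without ad: {total_bookings(considering_no_ad):.1f}")
print(f"Expected bookings with ad:    {total_bookings(considering_ad):.1f}")
```

In this sketch the blended booking rate falls from 0.925 to about 0.897 even though no segment's booking rate changes and the expected number of bookings rises from 111 to 152.5. This is the same pattern the PM observed: a lower average rate caused by a change in customer mix.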