# Do sales have an impact on revenue during the Xmas season?
(or do customers buy no matter what?)

In [3]:
# import libraries
import pandas as pd 
import numpy as np 
import statsmodels.formula.api as smf

In [6]:
data = pd.read_csv("./Causal inference/Quasi-experiement/xmas_sales.csv")

In [7]:
data.head()

Unnamed: 0,store,weeks_to_xmas,avg_week_sales,is_on_sale,weekly_amount_sold
0,1,3,12.98,1,219.6
1,1,2,12.98,1,184.7
2,1,1,12.98,1,145.75
3,1,0,12.98,0,102.45
4,2,3,19.92,0,103.22


## Spot the confounders
### Causal problem review
treatment = is_on_sale (1 means treated, 0 means control)
outcome = weekly_amount_sold

The causal problem to solve: Do sales increase the amount sold during the Xmas season?

### Confounders review
Large businesses tend to run more promotions (because they can absorb the cost) and also tend to have higher revenue. Store size is proxied here by avg_week_sales.

Another confounder to consider: the timing of the sale (weeks_to_xmas).
If a sale happens the week before Xmas, customers might ramp up their purchases whether or not the store has a sale.
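
One way to sanity-check these suspicions is to compare the covariates across the treated and control groups. The sketch below does not read xmas_sales.csv; it builds a small synthetic table (`toy`, with made-up coefficients and an assumed true effect of 50) that reuses the notebook's column names, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only (NOT the real xmas_sales.csv).
rng = np.random.default_rng(0)
n = 2000
avg_week_sales = rng.gamma(shape=4.0, scale=5.0, size=n)  # proxy for store size
weeks_to_xmas = rng.integers(0, 4, size=n)                # 0, 1, 2 or 3

# Assumption baked into the toy data: larger stores, and weeks further
# from Xmas, are more likely to be on sale.
logit = 0.1 * (avg_week_sales - 20) + 0.5 * (weeks_to_xmas - 1.5)
is_on_sale = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Assumed true effect of a sale is 50; the covariates also drive the outcome.
weekly_amount_sold = (50 * is_on_sale + 4 * avg_week_sales
                      + 35 * weeks_to_xmas + rng.normal(0, 20, size=n))

toy = pd.DataFrame({
    "avg_week_sales": avg_week_sales,
    "weeks_to_xmas": weeks_to_xmas,
    "is_on_sale": is_on_sale,
    "weekly_amount_sold": weekly_amount_sold,
})

# Covariate means by treatment group: large gaps hint at confounding.
balance = toy.groupby("is_on_sale")[["avg_week_sales", "weeks_to_xmas"]].mean()

# Naive difference in means: mixes the effect of the sale with the
# effect of the covariates that differ between the two groups.
group_means = toy.groupby("is_on_sale")["weekly_amount_sold"].mean()
naive_diff = group_means.loc[1] - group_means.loc[0]

print(balance)
print(f"naive difference in means: {naive_diff:.1f}")
```

If the treated rows show clearly higher avg_week_sales or a different weeks_to_xmas mix, the naive comparison cannot be read as a causal effect, which is why the covariates need to be adjusted for.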

________________________________________________
## Pick appropriate statistical methods 
I have store-level data (one row per store per week), split into two groups: treated and control. So I can use: <br>
- Regression (seriously!) <br>
    Holding everything else (the confounders) fixed, if we change the treatment from 0 to 1, how much does the outcome change? <br>
    <br>
    weekly_amount_sold = beta0 + beta1 * treatment + beta2 * covariates, where beta1 is the treatment effect of interest
    <br><br>
- Propensity Score Matching <br>
    Similar to matching on confounders, but match on the probability that a store has a sale. If two stores have the same probability of having a sale given the covariates, they are comparable on those covariates (on average).
    <br><br>
Of course, there are other methods, but these are the ones I use most because they are intuitive (easier for business partners to consume).
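
For reference, here is a minimal sketch of the propensity score matching idea described above. It is not part of this analysis: it runs on a synthetic table (`toy`, hypothetical numbers with an assumed true effect of 50), estimates the score with a logistic regression via `smf.logit`, and uses a simple 1-nearest-neighbor match with replacement. That matching rule is my choice for illustration; it yields an effect on the treated stores rather than a full ATE.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data for illustration only (NOT the real xmas_sales.csv).
rng = np.random.default_rng(0)
n = 2000
avg_week_sales = rng.gamma(shape=4.0, scale=5.0, size=n)
weeks_to_xmas = rng.integers(0, 4, size=n)
logit = 0.1 * (avg_week_sales - 20) + 0.5 * (weeks_to_xmas - 1.5)
is_on_sale = rng.binomial(1, 1 / (1 + np.exp(-logit)))
weekly_amount_sold = (50 * is_on_sale + 4 * avg_week_sales
                      + 35 * weeks_to_xmas + rng.normal(0, 20, size=n))
toy = pd.DataFrame({
    "avg_week_sales": avg_week_sales,
    "weeks_to_xmas": weeks_to_xmas,
    "is_on_sale": is_on_sale,
    "weekly_amount_sold": weekly_amount_sold,
})

# Step 1: propensity score = P(is_on_sale = 1 | covariates).
ps_model = smf.logit("is_on_sale ~ avg_week_sales + weeks_to_xmas",
                     data=toy).fit(disp=0)
toy["pscore"] = ps_model.predict(toy)

# Step 2: for each treated row, find the control row with the closest score.
treated = toy[toy["is_on_sale"] == 1]
control = toy[toy["is_on_sale"] == 0].sort_values("pscore")
c_score = control["pscore"].to_numpy()
t_score = treated["pscore"].to_numpy()

idx = np.clip(np.searchsorted(c_score, t_score), 1, len(c_score) - 1)
left_is_closer = (t_score - c_score[idx - 1]) <= (c_score[idx] - t_score)
nearest = np.where(left_is_closer, idx - 1, idx)

# Step 3: average outcome gap between treated rows and their matches.
matched_outcome = control["weekly_amount_sold"].to_numpy()[nearest]
att = (treated["weekly_amount_sold"].to_numpy() - matched_outcome).mean()
print(f"matched estimate (effect on the treated): {att:.1f}")
```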

In this analysis, I'll go with the regression method.


In [10]:
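# Regress the outcome on the treatment plus the two confounders
model = smf.ols('weekly_amount_sold ~ is_on_sale + avg_week_sales + weeks_to_xmas', data=data).fit()

# Show only the coefficient table
model.summary().tables[1]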

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-56.2071,5.026,-11.184,0.000,-66.063,-46.351
is_on_sale,52.0264,2.282,22.797,0.000,47.551,56.502
avg_week_sales,3.7947,0.242,15.671,0.000,3.320,4.270
weeks_to_xmas,37.2359,0.972,38.294,0.000,35.329,39.143


## Result
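The regression shows that having a sale does increase store revenue: the estimated average treatment effect is about 52 (in units of weekly_amount_sold), with a 95% confidence interval of 47.6 to 56.5, so the effect is statistically significant at the 5% level.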

ATE = 52
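
To see why the covariates matter for this number, the sketch below compares a naive regression (treatment only) with the adjusted one and pulls the estimate and its confidence interval from the fitted model. It uses a synthetic table (`toy`, hypothetical numbers with an assumed true effect of 50), not the notebook's data, so only the pattern carries over, not the values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data for illustration only (NOT the real xmas_sales.csv).
rng = np.random.default_rng(0)
n = 2000
avg_week_sales = rng.gamma(shape=4.0, scale=5.0, size=n)
weeks_to_xmas = rng.integers(0, 4, size=n)
logit = 0.1 * (avg_week_sales - 20) + 0.5 * (weeks_to_xmas - 1.5)
is_on_sale = rng.binomial(1, 1 / (1 + np.exp(-logit)))
weekly_amount_sold = (50 * is_on_sale + 4 * avg_week_sales
                      + 35 * weeks_to_xmas + rng.normal(0, 20, size=n))
toy = pd.DataFrame({
    "avg_week_sales": avg_week_sales,
    "weeks_to_xmas": weeks_to_xmas,
    "is_on_sale": is_on_sale,
    "weekly_amount_sold": weekly_amount_sold,
})

# Naive model: ignores the confounders.
naive = smf.ols("weekly_amount_sold ~ is_on_sale", data=toy).fit()

# Adjusted model: same specification as the notebook's regression.
adjusted = smf.ols(
    "weekly_amount_sold ~ is_on_sale + avg_week_sales + weeks_to_xmas",
    data=toy,
).fit()

# Point estimate and 95% confidence interval for the treatment coefficient.
ate = adjusted.params["is_on_sale"]
ci_low, ci_high = adjusted.conf_int().loc["is_on_sale"]

print(f"naive estimate:    {naive.params['is_on_sale']:.1f}")
print(f"adjusted estimate: {ate:.1f}  (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

The same two accessors, `params` and `conf_int()`, can be applied to `model` above to read the is_on_sale row without parsing the summary table.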

## Regression with debiasing and denoising (Frisch-Waugh-Lovell Theorem)


In [12]:
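# Debiasing: regress the treatment on the confounders and keep the residual,
# i.e. the part of is_on_sale that the confounders do not explain
debiasing_model = smf.ols('is_on_sale ~ avg_week_sales + weeks_to_xmas', data=data).fit()
data_debiased = data.assign(sale_residual=debiasing_model.resid)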

In [13]:
data_debiased.head()

Unnamed: 0,store,weeks_to_xmas,avg_week_sales,is_on_sale,weekly_amount_sold,sale_residual
0,1,3,12.98,1,219.6,0.601591
1,1,2,12.98,1,184.7,0.683591
2,1,1,12.98,1,145.75,0.765591
3,1,0,12.98,0,102.45,-0.152409
4,2,3,19.92,0,103.22,-0.625373


In [14]:
# Denoising: regress the outcome on the same confounders and keep the residual
denoising_model = smf.ols('weekly_amount_sold ~ avg_week_sales + weeks_to_xmas', data=data_debiased).fit()

data_denoised = data_debiased.assign(revenue_residual=denoising_model.resid)

In [15]:
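# Final step: regress the outcome residual on the treatment residual;
# the slope should match the is_on_sale coefficient of the full regression
fwl_model = smf.ols('revenue_residual ~ sale_residual', data=data_denoised).fit()

fwl_model.summary().tables[1]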

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,-2.751e-13,1.066,-2.58e-13,1.000,-2.091,2.091
sale_residual,52.0264,2.281,22.808,0.000,47.553,56.500
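
The sale_residual coefficient (52.0264) equals the is_on_sale coefficient from the full regression, which is what the Frisch-Waugh-Lovell theorem predicts. The sketch below checks that equality with plain NumPy least squares on synthetic data (hypothetical numbers, two generic covariates). It also shows why the standard error differs slightly (2.281 versus 2.282 above): both regressions leave the same residual sum of squares, but the residual-on-residual fit divides by n - 2 degrees of freedom while the full model divides by n - 4.

```python
import numpy as np

# Hypothetical data for illustration only.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])          # intercept + 2 covariates
t = (x1 + rng.normal(size=n) > 0).astype(float)    # treatment depends on x1
y = 50 * t + 4 * x1 + 35 * x2 + rng.normal(scale=20, size=n)


def ols(A, b):
    """Least-squares coefficients of b regressed on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Full regression: treatment coefficient is the last entry.
beta_full = ols(np.column_stack([X, t]), y)[-1]

# FWL: residualize treatment and outcome on the covariates, then regress
# residual on residual (both residuals have mean zero, so no intercept needed).
t_res = t - X @ ols(X, t)
y_res = y - X @ ols(X, y)
beta_fwl = (t_res @ y_res) / (t_res @ t_res)

# Same residual sum of squares, different degrees of freedom.
rss = np.sum((y_res - beta_fwl * t_res) ** 2)
se_full = np.sqrt(rss / (n - 4) / (t_res @ t_res))  # 4 parameters estimated
se_fwl = np.sqrt(rss / (n - 2) / (t_res @ t_res))   # slope + intercept only

print(beta_full, beta_fwl)
print(se_full, se_fwl)
```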


## Conclusion
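Sales during the Xmas season help businesses increase revenue, with an estimated average treatment effect of about $52,000 per store per week (assuming weekly_amount_sold is recorded in thousands of dollars).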