In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder as SklearnOneHotEncoder
from statsmodels.regression.linear_model import OLS

### Using linear regression with dummy variables for promo testing



Dependent variable - $Y$ - our prediction target - the final check (order) value

Independent variable - $X$ - the feature - our promo

Promo is a categorical variable, which we will transform into a dummy variable.

${\alpha}$ represents the change (delta) in the average check when the promo is applied.

${\beta}$ is the average check for all orders without a promo.


The main formula for linear regression is:
$$
Y = {\alpha} * X + {\beta}
$$

With a dummy variable we rewrite the linear regression so that it represents the expected value of $Y$ at a given value of $X$:

$$
E[Y | X] = {\alpha} * X + {\beta}
$$

Because the promo is encoded as a dummy variable, $X$ is binary: 1 for a purchase with the promo and 0 for a purchase without it.

When an order uses the promo, $X=1$, so ${\alpha}$ is multiplied by 1 and the expected check is ${\alpha} + {\beta}$. When $X=0$, the change in check value ${\alpha}$ is multiplied by 0, and only ${\beta}$ remains.

In these two cases the formula above becomes:

$$
E[Y | X=1] = {\alpha} + {\beta}
$$

$$
E[Y | X=0] = {\beta}
$$


${\beta}$ will essentially be the average of all checks without a promo. 
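
The two conditional expectations above can be checked numerically. The following is a minimal sketch on made-up data (the sample size, promo share, and effect size are assumptions chosen for illustration, not values from the datasets used here). It fits the regression with NumPy's least-squares solver and compares the coefficients with the plain group means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 orders, roughly 30% placed with a promo.
x = (rng.random(1000) < 0.3).astype(float)            # dummy: 1 = promo, 0 = no promo
y = 30.0 - 2.0 * x + rng.normal(0.0, 5.0, size=1000)  # simulated order values

# Design matrix: the dummy column plus a column of ones for the intercept.
design = np.column_stack([x, np.ones_like(x)])
(alpha, beta), *_ = np.linalg.lstsq(design, y, rcond=None)

mean_no_promo = y[x == 0].mean()
mean_promo = y[x == 1].mean()

print(f"beta  = {beta:.4f}, mean without promo  = {mean_no_promo:.4f}")
print(f"alpha = {alpha:.4f}, difference in means = {mean_promo - mean_no_promo:.4f}")
```

With a single dummy and an intercept, the fitted ${\beta}$ matches the mean of the no-promo group and ${\alpha}$ matches the difference between the two group means, up to floating-point error.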

In [56]:
#Class that transforms a categorical feature into dummy (binary) variables.
#The standard OneHotEncoder in sklearn.preprocessing returns an array without column names,
#so we subclass it to return a DataFrame with named columns
class OneHotEncoder(SklearnOneHotEncoder):
    def __init__(self, **kwargs):
        super(OneHotEncoder, self).__init__(**kwargs)
        self.fit_flag = False

    def fit(self, X, **kwargs):
        out = super().fit(X)
        self.fit_flag = True
        return out

    def transform(self, X, **kwargs):
        sparse_matrix = super(OneHotEncoder, self).transform(X)
        new_columns = self.get_new_columns(X=X)
        d_out = pd.DataFrame(sparse_matrix.toarray(), columns=new_columns, index=X.index)
        return d_out

    def fit_transform(self, X, **kwargs):
        self.fit(X)
        return self.transform(X)

    def get_new_columns(self, X):
        new_columns = []
        for i, column in enumerate(X.columns):
            j = 0
            while j < len(self.categories_[i]):
                new_columns.append(f'{column}_<{self.categories_[i][j]}>')
                j += 1
        return new_columns

In [57]:
df = pd.read_csv('one_promo_df.csv', index_col=[0])
df.head(20)

Unnamed: 0,order_id,order_value,promo_type
0,89014417,22,no_promo
1,89027235,37,no_promo
2,88979766,27,no_promo
3,89065392,30,no_promo
4,88992397,32,no_promo
5,89054226,25,no_promo
6,89019462,30,no_promo
7,89004871,25,no_promo
8,89040172,21,no_promo
9,89040144,31,no_promo


In [58]:
df.promo_type.unique()

array(['no_promo', 'SALE15'], dtype=object)

In [59]:
encoder = OneHotEncoder()


In [60]:
encoder.fit_transform(df[['promo_type']]) #getting new columns from the original promo_type column unique values

Unnamed: 0,promo_type_<SALE15>,promo_type_<no_promo>
0,0.0,1.0
1,0.0,1.0
2,0.0,1.0
3,0.0,1.0
4,0.0,1.0
...,...,...
99995,0.0,1.0
99996,0.0,1.0
99997,0.0,1.0
99998,0.0,1.0


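We should drop the promo_type_<no_promo> column because it would add a third coefficient for a feature that carries no new information: it always equals 1 minus promo_type_<SALE15>. If we keep it, we get alpha, beta (the intercept) and a third unknown whose column is perfectly collinear with the other two, so the regression coefficients are no longer uniquely determined.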
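
The redundancy can be seen directly from the rank of the design matrix. This is a small sketch on six made-up orders (the labels are hypothetical): with both dummy columns and a column of ones, the matrix has three columns but only rank 2.

```python
import numpy as np

# Hypothetical promo labels for six orders.
promo = np.array(["no_promo", "SALE15", "no_promo", "SALE15", "no_promo", "no_promo"])

sale15 = (promo == "SALE15").astype(float)
no_promo = (promo == "no_promo").astype(float)
intercept = np.ones(len(promo))

full = np.column_stack([sale15, no_promo, intercept])  # both dummies + intercept
reduced = np.column_stack([sale15, intercept])         # baseline dummy dropped

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print(f"full:    {full.shape[1]} columns, rank {rank_full}")
print(f"reduced: {reduced.shape[1]} columns, rank {rank_reduced}")
```

The two dummy columns always sum to the intercept column, which is why one of them has to go. Dropping the no-promo column makes "no promo" the baseline that every other coefficient is measured against.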

In [61]:
#now we can assign our variables and drop no_promo column
X = encoder.fit_transform(
    df[['promo_type']])\
    .drop('promo_type_<no_promo>', axis=1)\
    .assign(aov=1)  #aov is a constant column of ones that acts as the intercept (free term).
                    #statsmodels OLS does not add an intercept on its own
                    #(unlike sklearn's fit_intercept=True), so we add it explicitly

Y = df['order_value']

In [62]:
estimator = OLS(Y, X).fit()

For each coefficient, the regression summary tests the following:

H0: the coefficient is equal to 0

H1: the coefficient is not equal to 0
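
To make the test concrete, here is a rough sketch of how the t-statistic and p-value for each coefficient can be computed by hand. It runs on simulated data in which the promo has no true effect (all numbers are assumptions). The p-value uses a normal approximation to the t-distribution, which is reasonable for large samples but is not exactly what statsmodels reports.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data in which the promo has no true effect on order value.
n = 5000
x = (rng.random(n) < 0.2).astype(float)
y = 28.0 + rng.normal(0.0, 8.0, size=n)

design = np.column_stack([x, np.ones(n)])  # columns: promo dummy, intercept
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residual variance and the classical (nonrobust) covariance of the estimates.
residuals = y - design @ coef
dof = n - design.shape[1]
sigma2 = residuals @ residuals / dof
cov = sigma2 * np.linalg.inv(design.T @ design)
std_err = np.sqrt(np.diag(cov))

# t-statistic for H0: coefficient = 0, and a two-sided p-value
# from the normal approximation (an assumption; exact tests use Student's t).
t_stat = coef / std_err
p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stat])

for name, c, se, t, p in zip(["promo", "intercept"], coef, std_err, t_stat, p_values):
    print(f"{name:>9}: coef={c:.4f}  std err={se:.4f}  t={t:.2f}  p={p:.3f}")
```

A small p-value for the promo coefficient would lead us to reject H0. The intercept is tested in the same way, against the (uninteresting) hypothesis that the average no-promo check is 0.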

In [63]:
print(estimator.summary())

                            OLS Regression Results                            
Dep. Variable:            order_value   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                  0.000
Method:                 Least Squares   F-statistic:                     1.166
Date:                Sun, 02 Oct 2022   Prob (F-statistic):              0.280
Time:                        13:46:47   Log-Likelihood:            -3.5255e+05
No. Observations:               99969   AIC:                         7.051e+05
Df Residuals:                   99967   BIC:                         7.051e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
promo_type_<SALE15>    -0.0938    

A good thing about this use of linear regression is that we don't need to look at MAPE or maximize R^2. We only need to look at the statistical significance of our coefficients:

The p-value for promo_type_<SALE15> is 0.28, which is well above a conventional significance level such as 0.05, so we cannot reject H0. In other words, we have no evidence that this coefficient differs from 0. The standard error is large relative to the estimate, and the 95% confidence interval (-0.264 to 0.076) includes 0. This suggests that the promo has at most a very small effect on the mean check value (which is good for us!). If the total number of orders increases while we are offering this promo, we can say that it is a good promo, because it increases the number of orders without noticeably decreasing our mean order value. However, it is also possible that the promo is simply used on a small share of orders.

The aov coefficient is statistically significant, meaning the average check without a promo (the intercept) is reliably different from 0.
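
As a sanity check, the reported p-value can be roughly recovered from the coefficient and the confidence interval quoted above. This sketch assumes the interval is coef ± 1.96 standard errors and uses a normal approximation for the p-value. Both are assumptions that hold closely for a sample of this size.

```python
import math

coef = -0.0938                   # SALE15 coefficient from the summary
ci_low, ci_high = -0.264, 0.076  # 95% confidence interval quoted above

# Back out the standard error from the interval width (assumes a 1.96 multiplier).
std_err = (ci_high - ci_low) / (2 * 1.96)
t_stat = coef / std_err

# Two-sided p-value under a normal approximation.
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"std err ~ {std_err:.4f}, t ~ {t_stat:.2f}, p ~ {p_value:.2f}")
```

The result lands close to the reported 0.28, and it illustrates why an interval that contains 0 and a p-value above 0.05 always go together.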

-----------
### What if we have multiple promos?

In [64]:
df_multi = pd.read_csv('multiple_promo_df.csv', index_col=[0])
df_multi.head()

Unnamed: 0,gmv,title,delivery_discount,surge_increment,order_id
0,22,SALE15,0,0,768977643
1,44,LUCKY,1,0,768977644
2,26,SUMMER,0,0,768977645
3,26,no_promo,0,0,768977646
4,39,no_promo,0,0,768977647


gmv - order value

title - promo code

delivery_discount - whether or not there is a discount on delivery

surge_increment - whether our delivery cost was increased for the order

order_id - self-explanatory

In [65]:
df_multi.query('title == "no_promo"').shape[0] / df_multi.shape[0]

0.5904545516019529

59% of orders have no promo, so the remaining 41% were placed with one.
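
To see the share of each promo code at once rather than only the no-promo share, `value_counts(normalize=True)` is a convenient alternative. The sketch below uses a tiny made-up frame (the rows are hypothetical). On the real data the same call would be applied to `df_multi['title']`.

```python
import pandas as pd

# Hypothetical stand-in for the 'title' column of the multi-promo dataset.
toy = pd.DataFrame({"title": ["no_promo", "SALE15", "no_promo", "LUCKY", "no_promo"]})

# Fraction of orders per promo code, largest first.
shares = toy["title"].value_counts(normalize=True)
print(shares)
```

This also shows how many orders stand behind each coefficient, which is useful when judging how much weight an individual promo estimate deserves.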

In [66]:
encoder.fit_transform(df_multi[['title']]).head(3)

Unnamed: 0,title_<LUCKY>,title_<SALE15>,title_<SORRY>,title_<SUMMER>,title_<TAKE30>,title_<WINTER>,title_<no_promo>
0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0


In [67]:
#again we don't need the title_<no_promo> column because it doesn't add anything new (you can derive it from all other columns)

In [68]:
Ym = df_multi['gmv']

Xm = encoder.fit_transform(df_multi[['title']])\
    .drop('title_<no_promo>', axis=1)\
    .assign(aov=1)

Xm['delivery_discount'] = df_multi['delivery_discount']
Xm['surge_increment'] = df_multi['surge_increment']

In [69]:
Xm.head(5)

Unnamed: 0,title_<LUCKY>,title_<SALE15>,title_<SORRY>,title_<SUMMER>,title_<TAKE30>,title_<WINTER>,aov,delivery_discount,surge_increment
0,0.0,1.0,0.0,0.0,0.0,0.0,1,0,0
1,1.0,0.0,0.0,0.0,0.0,0.0,1,1,0
2,0.0,0.0,0.0,1.0,0.0,0.0,1,0,0
3,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0
4,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0


In [70]:
Ym.head(5)

0    22
1    44
2    26
3    26
4    39
Name: gmv, dtype: int64

In [71]:
estimator_multi = OLS(Ym, Xm).fit()

In [72]:
print(estimator_multi.summary())

                            OLS Regression Results                            
Dep. Variable:                    gmv   R-squared:                       0.058
Model:                            OLS   Adj. R-squared:                  0.058
Method:                 Least Squares   F-statistic:                     2844.
Date:                Sun, 02 Oct 2022   Prob (F-statistic):               0.00
Time:                        13:46:58   Log-Likelihood:            -1.3137e+06
No. Observations:              369705   AIC:                         2.627e+06
Df Residuals:                  369696   BIC:                         2.628e+06
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
title_<LUCKY>        -4.7214      0.05

Here we see that the p-value for every feature is 0.00, so all coefficients are statistically significant.

We can see that all promos lower the average check value. Sometimes that's OK. For example, with the SALE15 code our average order value dropped by USD3.39, which is less than the expected USD4.465 (see below). So it's a decent promo, meaning we are gaining USD1.075 from this promo on each order.
On average we also give a USD2 discount for delivery, and our delivery price surges by USD0.83.

In [76]:
#Our average 15% discount should be around $4.47
Ym.mean() * 0.15

4.465749178398994
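
The SALE15 comparison can be written out as plain arithmetic. This sketch simply reuses the two numbers from this section: the expected discount computed in the cell above and the estimated drop of USD3.39. It treats a 15% cut of the overall mean order value as the benchmark, which is a simplifying assumption.

```python
expected_discount = 4.465749178398994  # Ym.mean() * 0.15, from the output above
observed_drop = 3.39                   # size of the SALE15 coefficient quoted in the text

# How much smaller the observed drop is than a full 15% discount on an average order.
difference = expected_discount - observed_drop
print(f"expected: {expected_discount:.3f}, observed: {observed_drop:.2f}, difference: {difference:.3f}")
```

A positive difference means the average check fell by less than the nominal discount, which is the sense in which the promo is called decent here.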