# Data generation process
## Overview

This notebook demonstrates how to generate synthetic data for causal inference problems.

The diagram below shows how the profit from a customer is related to other factors.
- Profit is the outcome we are interested in.
- Promotion is believed to have a direct impact on the profit. The impact of a promotion also varies based on the average amount of previous orders.
- Profit is also associated with three factors.
  - The income of a customer.
  - The average amount of previous orders.
  - The number of years since registration.

  
Below is a more formal mathematical representation.
- $Y = TE \cdot T + a \cdot X_1 + b \cdot X_2 + c \cdot X_3$
- $TE = d \cdot X_2^2 / 1000$
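
To make the heterogeneity concrete, the short sketch below evaluates the treatment-effect formula for two customers. It assumes the value `d = 2.68` and the division by 1000 used in the data-generation code of this notebook; the helper name `treatment_effect` is hypothetical.

```python
def treatment_effect(avg_order, d=2.68, scale=1000):
    """Profit uplift from a promotion for a given average order amount."""
    return d * avg_order**2 / scale


te_low = treatment_effect(40)   # customer with smaller previous orders
te_high = treatment_effect(60)  # customer with larger previous orders

print(f"TE(avg_order=40) = {te_low:.3f}")
print(f"TE(avg_order=60) = {te_high:.3f}")
```

Under these assumptions, the customer with the larger average order gets more than twice the uplift, because the effect grows with the square of the order amount.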


```mermaid
%%{init: {'theme':'default'}}%%

flowchart
subgraph  
direction BT
x1((x1 \n income))
x2((x2 \n avg \n order))
x3((x3 \n yrs since \n registration))
y((Y \n profit))
t((T \n promotion))
x1 --a--> y
t --TE--> y
x2 --b--> y
x3 --c--> y
end
```

In [89]:
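import pandas as pd
import numpy as np
import plotly.express as px

num_samples = 10000
np.random.seed(42)  # fix the seed so the numbers below are reproducible

# Covariates; abs() keeps every value non-negative
x1 = abs(np.random.normal(loc=3000, scale=1500, size=num_samples))  # income
x2 = abs(np.random.normal(loc=50, scale=10, size=num_samples))  # average amount of previous orders
x3 = abs(np.random.randn(num_samples) + 3) + 0.01  # years since registration
# Treatment: promotion assigned at random (0 = no promotion, 1 = promotion)
T = np.random.randint(low=0, high=2, size=num_samples)

# Coefficients of the data-generating process
a = 0.01
b = 0.3
c = 3
d = 2.68

# Heterogeneous treatment effect: grows with the square of the average order amount
TE = d * x2**2 / 1000
# Outcome: profit, with no noise term
Y = TE * T + a * x1 + b * x2 + c * x3

df = pd.DataFrame({'income': x1, 'avg_order': x2, 'yrs': x3, 'promotion': T, 'TE': TE, 'profit': Y})
# Summary statistics rounded to two decimals
df.describe().transpose().drop(columns='count').round(2)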

| | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|
| income | 3022.98 | 1451.88 | 1.67 | 1992.69 | 2996.11 | 4006.62 | 8889.36 |
| avg_order | 50.14 | 10.01 | 11.44 | 43.38 | 50.16 | 56.94 | 94.79 |
| yrs | 3.0 | 0.99 | 0.02 | 2.31 | 3.0 | 3.67 | 6.7 |
| promotion | 0.49 | 0.5 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| TE | 7.0 | 2.72 | 0.35 | 5.04 | 6.74 | 8.69 | 24.08 |
| profit | 57.73 | 15.93 | 15.44 | 46.43 | 57.45 | 68.44 | 128.91 |


In [90]:
df.head()

| | income | avg_order | yrs | promotion | TE | profit |
|---|---|---|---|---|---|---|
| 0 | 3745.07123 | 43.215053 | 3.358286 | 1 | 5.005009 | 65.495096 |
| 1 | 2792.603548 | 46.945005 | 3.293324 | 1 | 5.906274 | 57.795782 |
| 2 | 3971.532807 | 44.026189 | 2.07348 | 1 | 5.194658 | 64.338284 |
| 3 | 5284.544785 | 51.10418 | 3.589584 | 1 | 6.999188 | 85.944643 |
| 4 | 2648.769938 | 61.971785 | 1.519917 | 0 | 10.292546 | 49.638987 |


In [94]:
px.box(df, x='profit', color='promotion', width=800, height=300, title='The distribution of profit split by promotion')

## Treatment effect estimation

In many scenarios, we want to know how much of the increase (uplift) in a customer's profit is driven by a promotion. This is equivalent to estimating the treatment effect of running a promotion on that customer.
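
Because promotions in the synthetic data are assigned at random, the average treatment effect can be approximated by a simple difference in mean profit between the two groups. The sketch below regenerates the data with the same assumed parameters as the data-generation cell so that it runs on its own; the names `ate_naive` and `ate_true` are illustrative only.

```python
import numpy as np

# Recreate the synthetic data (same assumptions as the data-generation cell)
num_samples = 10000
np.random.seed(42)
x1 = abs(np.random.normal(loc=3000, scale=1500, size=num_samples))
x2 = abs(np.random.normal(loc=50, scale=10, size=num_samples))
x3 = abs(np.random.randn(num_samples) + 3) + 0.01
T = np.random.randint(low=0, high=2, size=num_samples)
TE = 2.68 * x2**2 / 1000
Y = TE * T + 0.01 * x1 + 0.3 * x2 + 3 * x3

# Difference in mean profit between promoted and non-promoted customers
ate_naive = Y[T == 1].mean() - Y[T == 0].mean()

# Ground truth, available only because the data are simulated
ate_true = TE.mean()

print(f"difference in means: {ate_naive:.2f}")
print(f"true average effect: {ate_true:.2f}")
```

The two numbers should be close but not identical, since the difference in means also picks up chance imbalances in income and the other covariates between the two groups.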

Because the treatment effect varies with another attribute (here, the average amount of previous orders), it is heterogeneous, so we want to model the conditional average treatment effect (CATE). Doing so lets us identify the groups of customers who respond most strongly to promotions and run more targeted campaigns to maximise the return.
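
One rough way to see the heterogeneity before fitting any model is to compare treated and control customers within bins of the average order amount. The sketch below is an illustration under the same assumed data-generating parameters as above; the quartile binning and the names `order_bin`, `cate_by_bin` and `true_by_bin` are choices made for this example.

```python
import numpy as np
import pandas as pd

# Recreate the synthetic data (same assumptions as the data-generation cell)
num_samples = 10000
np.random.seed(42)
x1 = abs(np.random.normal(loc=3000, scale=1500, size=num_samples))
x2 = abs(np.random.normal(loc=50, scale=10, size=num_samples))
x3 = abs(np.random.randn(num_samples) + 3) + 0.01
T = np.random.randint(low=0, high=2, size=num_samples)
TE = 2.68 * x2**2 / 1000
Y = TE * T + 0.01 * x1 + 0.3 * x2 + 3 * x3
df = pd.DataFrame({"avg_order": x2, "promotion": T, "TE": TE, "profit": Y})

# Split customers into quartiles of average order amount
df["order_bin"] = pd.qcut(df["avg_order"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Mean profit per (bin, promotion) cell, then treated minus control within each bin
means = (
    df.groupby(["order_bin", "promotion"], observed=True)["profit"]
    .mean()
    .unstack("promotion")
)
cate_by_bin = (means[1] - means[0]).rename("estimated_cate")

# True average effect per bin, known only because the data are simulated
true_by_bin = df.groupby("order_bin", observed=True)["TE"].mean().rename("true_cate")

print(pd.concat([cate_by_bin, true_by_bin], axis=1).round(2))
```

The per-bin estimates are noisy because income accounts for most of the variation in profit, but the upward trend from Q1 to Q4 should still be visible.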

This process is commonly known as **uplift modelling**. Several algorithms are available; I will start with meta learners.

### Meta learner

A meta learner is a technique that combines the predictions of one or more machine learning models into an estimate of the causal effect. Common variants are the S-learner, T-learner and X-learner.

#### S-learner
- Train a single model M() on both treated and control samples.
- The model takes both the treatment T and the features X as input to predict the outcome.
- Scoring: the estimated treatment effect is M(X, T=1) - M(X, T=0).
  
<!-- slearner -->
```mermaid
%%{init: {'theme':'default'}}%%
flowchart
subgraph S_learners
direction LR
subgraph training
direction TB
subgraph all_samples
direction TB
X
t
y
end
input_train[input]
input_train-->m0
target_train[target]
X-->input_train
t-->input_train
y-->target_train
target_train-->m0
m0[(model M)]
end

subgraph predicting
direction TB
input[X]
model[(model M)]
t0[X, t=0]
t1[X, t=1]
t0-->model
t1-->model
input-->t0
input-->t1
y0[y0]
y1[y1]
model --predicts-->y0
model --predicts-->y1
cate[treatment effect/uplift = y1 - y0]

y0-->cate
y1-->cate
end
training --> predicting
end
```
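
The S-learner steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it uses ordinary least squares via `np.linalg.lstsq` as the single model M, and the hand-picked features (including interactions of the treatment with `x2` and `x2**2`) are an assumption made so that a linear model can express a heterogeneous effect. In practice, any regressor that accepts X and T together could play the role of M. The names `design_matrix` and `cate_hat` are hypothetical.

```python
import numpy as np

# Recreate the synthetic data (same assumptions as the data-generation cell)
num_samples = 10000
np.random.seed(42)
x1 = abs(np.random.normal(loc=3000, scale=1500, size=num_samples))
x2 = abs(np.random.normal(loc=50, scale=10, size=num_samples))
x3 = abs(np.random.randn(num_samples) + 3) + 0.01
T = np.random.randint(low=0, high=2, size=num_samples)
TE = 2.68 * x2**2 / 1000
Y = TE * T + 0.01 * x1 + 0.3 * x2 + 3 * x3


def design_matrix(x1, x2, x3, t):
    """Inputs of the single model M: intercept, features, treatment, interactions."""
    return np.column_stack([np.ones_like(x1), x1, x2, x3, t, t * x2, t * x2**2])


# Training: fit one model on all samples, treated and control together
coef, *_ = np.linalg.lstsq(design_matrix(x1, x2, x3, T), Y, rcond=None)

# Scoring: predict every customer twice, once with t=1 and once with t=0
y1 = design_matrix(x1, x2, x3, np.ones(num_samples)) @ coef
y0 = design_matrix(x1, x2, x3, np.zeros(num_samples)) @ coef
cate_hat = y1 - y0  # estimated uplift per customer

print(f"mean estimated uplift: {cate_hat.mean():.2f}")
print(f"mean true uplift:      {TE.mean():.2f}")
```

Because the simulated profit has no noise term and the feature set contains the true functional form, this sketch recovers the treatment effect almost exactly. With real data, the quality of the estimate depends on how well M captures the interaction between the treatment and the features.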
