# Almost Perfect: A Discussion on Quasi-Experiment Techniques

Quasi-experiments are experiments that borrow principles from randomized tests but are not equivalent to them, because the treatment is not assigned at random.

Any technique that can estimate causal effects from observational data can also be used to extract the causal effect from a quasi-experiment. In quasi-experiments, these causal inference techniques serve to reduce the variance and bias of the estimated average treatment effect on the treated (ATT) or average treatment effect (ATE), much as they do in randomized experiments.

However, one of the biggest problems with causal inference techniques is that they inevitably rely on assumptions about the causal links between variables. Despite advances in causal discovery, in practice one never considers all possible configurations of confounding, treatment, and target variables. Instead, we almost always draw a Directed Acyclic Graph (DAG) that lays out the causal relationships in a way that satisfies the scientists behind the analysis, their peers, and their clients.



"CUPED is just linear regression using a pre-experimental covariate."[2]

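To make the CUPED remark above concrete, here is a minimal sketch on synthetic data. It assumes a single pre-experiment covariate `x` and uses the standard adjustment `y - theta * (x - mean(x))` with `theta = cov(y, x) / var(x)`, which is the slope of a linear regression of `y` on `x`. The function name `cuped_adjust`, the simulated data, and the true effect of 2.0 are illustrative assumptions, not part of this benchmark.

```python
import numpy as np


def cuped_adjust(y, x):
    """Return the CUPED-adjusted outcome y - theta * (x - mean(x))."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta is the OLS slope of y on x: cov(y, x) / var(x)
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())


# Synthetic data (assumption): the outcome depends strongly on the covariate
rng = np.random.default_rng(0)
n = 2_000
x = rng.normal(100.0, 20.0, size=n)                # pre-experiment covariate
treated = rng.integers(0, 2, size=n).astype(bool)  # random assignment
y = 0.8 * x + rng.normal(0.0, 5.0, size=n) + 2.0 * treated

y_adj = cuped_adjust(y, x)

# Difference in means before and after the adjustment
naive_effect = y[treated].mean() - y[~treated].mean()
cuped_effect = y_adj[treated].mean() - y_adj[~treated].mean()

print(f"variance of raw outcome:      {y.var():.2f}")
print(f"variance of adjusted outcome: {y_adj.var():.2f}")
print(f"naive effect estimate: {naive_effect:.3f}")
print(f"CUPED effect estimate: {cuped_effect:.3f}")
```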
Below, we give a quick overview of the methods covered in this benchmark. For a more in-depth understanding of each method, we link to several resources where you can learn more.

TL;DR:
- the best technique is XXXXXXX
- but it is still worse than when using an ensemble of (XXXXXXXXX) by XXXXXXX
- backtest with historical data to assess the accuracy of the model that estimates the ATT
- you can use previous randomized tests to calibrate the hyperparameters (and possibly even the parameters themselves) of your models

# Techniques Overview

## Matching + Difference-in-Differences (CausalPy)

### Propensity Score

### Mahalanobis Distance

## (Augmented) Synthetic Control (CausalPy & GeoLift)

## Meta-Learners (CausalML)
    
## Double ML (EconML)

## Uplift-Trees (CausalML)

## Do Method (DoWhy)

# Comparisons
## Methodology

## Datasets
- [Iowa Liquor Sales](https://www.kaggle.com/datasets/residentmario/iowa-liquor-sales)
- [Walmart Dataset](https://www.kaggle.com/datasets/yasserh/walmart-dataset)
- [Supermarket Sales](https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales)
- [Superstore Sales Dataset](https://www.kaggle.com/datasets/rohitsahoo/sales-forecasting)
- [Lifetime Value](https://www.kaggle.com/datasets/baetulo/lifetime-value)

## Example: Iowa Liquor Sales


# Hacks: Improving your models

## Backtest using historic data
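One common way to backtest is a placebo test: run the estimator on historical windows in which no treatment took place, so the true effect is known to be zero, and treat the spread of the resulting estimates as a measure of the model's error. The sketch below illustrates this with a plain difference-in-differences estimator on synthetic daily series. The helper `did_estimate`, the simulated data, and the window lengths are assumptions chosen for illustration, not the benchmark's actual setup.

```python
import numpy as np


def did_estimate(treated, control, start, end):
    """Difference-in-differences: change in treated minus change in control.

    The pre-period is [0, start) and the post-period is [start, end).
    """
    pre, post = slice(0, start), slice(start, end)
    treated_change = treated[post].mean() - treated[pre].mean()
    control_change = control[post].mean() - control[pre].mean()
    return treated_change - control_change


# Synthetic history with NO real treatment (assumption): both units share a
# trend, the "treated" unit sits at a constant offset above the control.
rng = np.random.default_rng(7)
n_days = 200
trend = np.linspace(100.0, 120.0, n_days)
treated = trend + 5.0 + rng.normal(0.0, 2.0, size=n_days)
control = trend + rng.normal(0.0, 2.0, size=n_days)

# Pretend a 14-day treatment started at several historical dates
window = 14
placebo_starts = range(60, n_days - window, 10)
placebo_effects = np.array(
    [did_estimate(treated, control, s, s + window) for s in placebo_starts]
)

# The true effect is zero, so every estimate is pure estimation error
backtest_rmse = float(np.sqrt(np.mean(placebo_effects ** 2)))
print(f"placebo windows: {len(placebo_effects)}")
print(f"backtest RMSE:   {backtest_rmse:.3f}")
```

The same loop can wrap any of the estimators discussed here, which makes the backtest RMSE a common yardstick for comparing them on a given dataset.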

## Calibrate using previous randomized tests

## Don't limit yourself with just one model
Just as the winning entry in a typical machine-learning contest is usually an ensemble of distinct methodologies (e.g. neural networks and tree-based models), we also reduce the error of the ATT estimate when combining multiple models. Below is a comparison between using either XXXXXXX or XXXXX alone and using both.
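
Independently of the benchmark results, the intuition can be sketched with a small simulation. It assumes two unbiased ATT estimators whose errors are independent, which is an idealisation: real models fitted to the same data usually have correlated errors, so the gain is smaller in practice. All numbers below are synthetic and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
true_att = 0.10   # hypothetical true effect
n_sims = 5_000    # number of simulated experiments

# Two unbiased estimators with independent noise (assumption)
est_a = true_att + rng.normal(0.0, 0.03, size=n_sims)
est_b = true_att + rng.normal(0.0, 0.04, size=n_sims)

# Simplest possible ensemble: the unweighted average of both estimates
ensemble = (est_a + est_b) / 2


def rmse(estimates):
    """Root mean squared error against the known true effect."""
    return float(np.sqrt(np.mean((estimates - true_att) ** 2)))


print(f"RMSE model A:  {rmse(est_a):.4f}")
print(f"RMSE model B:  {rmse(est_b):.4f}")
print(f"RMSE ensemble: {rmse(ensemble):.4f}")
```

Under these assumptions the averaged estimate has a lower RMSE than either model alone, because independent errors partly cancel out.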

# References
1) [Causal Inference, The Mixtape](https://mixtape.scunning.com)
2) [Causality, Judea Pearl](https://www.amazon.co.uk/Causality-Judea-Pearl/dp/052189560X/ref=sr_1_1?crid=1KVB0KSO1OWMO&keywords=causality+judea&qid=1705423557&sprefix=causality+judea%2Caps%2C78&sr=8-1)
3) [Causal Inference in Statistics, Judea Pearl, Madelyn Glymour, Nicholas P. Jewell](https://www.amazon.co.uk/Causal-Inference-Statistics-Judea-Pearl/dp/1119186846/ref=sr_1_1?crid=1SP7ANTNKW60K&keywords=causal+inference+in+statistics&qid=1705423576&sprefix=causal+inference+in+%2Caps%2C81&sr=8-1)
4) [Variance reduction in experiments using covariate adjustment techniques](https://medium.com/glovo-engineering/variance-reduction-in-experiments-using-covariate-adjustment-techniques-717b1e450185)
5) [How Booking.com increases the power of online experiments with CUPED](https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d)
6) [CausalML](https://causalml.readthedocs.io/en/latest/index.html)
7) [EconML](https://econml.azurewebsites.net/index.html)
8) [CausalPy](https://causalpy.readthedocs.io/en/latest/)
9) [DoWhy](https://www.pywhy.org/dowhy/v0.11.1/#)

In [1]:
from src.data.load import DataLoader

# Fetch every benchmark dataset into the local 'data' directory
loader = DataLoader('data')
loader.get_data()

Getting iowa_licor_sales dataset
Dataset iowa_licor_sales already present

Getting wallmart_sales dataset
Dataset wallmart_sales already present

Getting supermarket_sales dataset
Dataset supermarket_sales already present

Getting superstore_sales dataset
Dataset superstore_sales already present

Getting lifetime_value dataset
Dataset lifetime_value already present



In [2]:
from src.data.data_formatter import (
    IowaLicorSalesFormatter,
    LifetimeValueFormatter,
    SupermarketSalesFormatter,
    SuperstoreSalesFormatter,
    WallmartSalesFormatter,
)

# One formatter per dataset, keyed by the dataset name used by the loader
dataset_formatters = {
    'supermarket_sales': SupermarketSalesFormatter(),
    'iowa_licor_sales': IowaLicorSalesFormatter(),
    'wallmart_sales': WallmartSalesFormatter(),
    'superstore_sales': SuperstoreSalesFormatter(),
    'lifetime_value': LifetimeValueFormatter(),
}

# Smoke test: every formatter should run on the first rows of its dataset
for dataset_name, formatter in dataset_formatters.items():
    formatter.fit_transform(loader.load_dataset(dataset_name).head())

In [3]:
from src.data.experiment_setup import ExperimentSetup

formatter = SupermarketSalesFormatter()


def treatment_effect(city):
    """Simulated treatment effect: 0.1 for Yangon, 0 for every other city."""
    return 0.1 if city == 'Yangon' else 0


setup = ExperimentSetup(
    formatter.date_col,
    formatter.target_col,
    "2019-01-01",
    "2019-02-01",
    formatter.treatment_col,
    ["Yangon"]
    )

treated_data = setup.apply_treatment(loader.load_dataset('supermarket_sales'), treatment_effect)
treated_data



| Invoice ID | Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total | Date | Time | Payment | cogs | gross margin percentage | gross income | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | str | f64 | i64 | f64 | f64 | str | str | str | f64 | f64 | f64 | f64 |
| 750-67-8428 | A | Yangon | Member | Female | Health and bea… | 74.69 | 7 | 26.1415 | 603.86865 | 1/5/2019 | 13:08 | Ewallet | 522.83 | 4.761905 | 26.1415 | 9.1 |
| 226-31-3081 | C | Naypyitaw | Normal | Female | Electronic acc… | 15.28 | 5 | 3.82 | 80.22 | 3/8/2019 | 10:29 | Cash | 76.4 | 4.761905 | 3.82 | 9.6 |
| 631-41-3108 | A | Yangon | Normal | Male | Home and lifes… | 46.33 | 7 | 16.2155 | 374.57805 | 3/3/2019 | 13:23 | Credit card | 324.31 | 4.761905 | 16.2155 | 7.4 |
| 123-19-1176 | A | Yangon | Member | Male | Health and bea… | 58.22 | 8 | 23.288 | 537.9528 | 1/27/2019 | 20:33 | Ewallet | 465.76 | 4.761905 | 23.288 | 8.4 |
| 373-73-7910 | A | Yangon | Normal | Male | Sports and tra… | 86.31 | 7 | 30.2085 | 697.81635 | 2/8/2019 | 10:37 | Ewallet | 604.17 | 4.761905 | 30.2085 | 5.3 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 233-67-5758 | C | Naypyitaw | Normal | Male | Health and bea… | 40.35 | 1 | 2.0175 | 42.3675 | 1/29/2019 | 13:46 | Ewallet | 40.35 | 4.761905 | 2.0175 | 6.2 |
| 303-96-2227 | B | Mandalay | Normal | Female | Home and lifes… | 97.38 | 10 | 48.69 | 1022.49 | 3/2/2019 | 17:16 | Ewallet | 973.8 | 4.761905 | 48.69 | 4.4 |
| 727-02-1313 | A | Yangon | Member | Male | Food and bever… | 31.84 | 1 | 1.592 | 36.7752 | 2/9/2019 | 13:22 | Cash | 31.84 | 4.761905 | 1.592 | 7.7 |
| 347-56-2442 | A | Yangon | Normal | Male | Home and lifes… | 65.82 | 1 | 3.291 | 76.0221 | 2/22/2019 | 15:33 | Cash | 65.82 | 4.761905 | 3.291 | 4.1 |


In [4]:
formatted_data = SupermarketSalesFormatter().fit_transform(treated_data)
formatted_data

| Product line | Date | Invoice ID | Payment | Gender | Branch | Total | Customer type | City |
|---|---|---|---|---|---|---|---|---|
| str | datetime[μs] | str | str | str | str | f64 | str | str |
| Health and bea… | 2019-01-05 00:00:00 | 750-67-8428 | Ewallet | Female | A | 603.86865 | Member | Yangon |
| Electronic acc… | 2019-03-08 00:00:00 | 226-31-3081 | Cash | Female | C | 80.22 | Normal | Naypyitaw |
| Home and lifes… | 2019-03-03 00:00:00 | 631-41-3108 | Credit card | Male | A | 374.57805 | Normal | Yangon |
| Health and bea… | 2019-01-27 00:00:00 | 123-19-1176 | Ewallet | Male | A | 537.9528 | Member | Yangon |
| Sports and tra… | 2019-02-08 00:00:00 | 373-73-7910 | Ewallet | Male | A | 697.81635 | Normal | Yangon |
| … | … | … | … | … | … | … | … | … |
| Health and bea… | 2019-01-29 00:00:00 | 233-67-5758 | Ewallet | Male | C | 42.3675 | Normal | Naypyitaw |
| Home and lifes… | 2019-03-02 00:00:00 | 303-96-2227 | Ewallet | Female | B | 1022.49 | Normal | Mandalay |
| Food and bever… | 2019-02-09 00:00:00 | 727-02-1313 | Cash | Male | A | 36.7752 | Member | Yangon |
| Home and lifes… | 2019-02-22 00:00:00 | 347-56-2442 | Cash | Male | A | 76.0221 | Normal | Yangon |


In [6]:

from src.models.preprocessing import SyntheticControlPreProcessing

synth_preprocessing = SyntheticControlPreProcessing(
    SupermarketSalesFormatter().treatment_col,
    SupermarketSalesFormatter().date_col,
    SupermarketSalesFormatter().target_col,
)

synth_preprocessing.fit_transform(formatted_data)

| id | date | value |
|---|---|---|
| str | datetime[μs] | f64 |
| Yangon | 2019-02-18 00:00:00 | -0.35404 |
| Yangon | 2019-03-19 00:00:00 | 0.946138 |
| Naypyitaw | 2019-01-09 00:00:00 | 1.827146 |
| Mandalay | 2019-02-17 00:00:00 | -0.427871 |
| Naypyitaw | 2019-02-18 00:00:00 | -0.947871 |
| … | … | … |
| Naypyitaw | 2019-03-16 00:00:00 | 0.018371 |
| Mandalay | 2019-01-28 00:00:00 | 2.113695 |
| Mandalay | 2019-02-02 00:00:00 | 0.417523 |
| Mandalay | 2019-01-24 00:00:00 | 1.532236 |
