casual_inference

casual_inference is a Python package that provides a simple interface for doing causal inference. Causal analysis is complicated, and we have to pay attention to many things to do it properly. casual_inference is developed with the aim of reducing that effort.

Installation

pip install casual-inference

Overview

This package provides several types of evaluators. Each evaluator has an evaluate() method and some summary_xxx() methods. The evaluate() method evaluates the treatment impact by calculating several statistics, and the summary_xxx() methods summarize those statistics in various forms (e.g., a table, a bar chart, ...).

The evaluate() method expects data with a schema like the following:

unit | variant | metric_A | metric_B | ...
---- | ------- | -------- | -------- | ---
1    | 1       | 0        | 0.01     | ...
2    | 1       | 1        | 0.05     | ...
3    | 2       | 0        | 0.02     | ...
...  | ...     | ...      | ...      | ...

The table must already be aggregated by the unit column (i.e., the unit column should be the primary key).

Columns

  • unit: The unit you want to conduct the analysis on. Typically it will be user_id, session_id, etc. in the web service domain.
  • variant: The intervention group. This package always assumes variant 1 is the control group.
  • metrics: The metrics you want to evaluate, e.g., the number of purchases, conversion rate, ...
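
For concreteness, here is a minimal sketch of building an input table with this schema using pandas (the metric column names are arbitrary placeholders):

import pandas as pd

# One row per unit: "unit" is the primary key.
data = pd.DataFrame(
    {
        "unit": [1, 2, 3, 4],
        "variant": [1, 1, 2, 2],  # variant 1 is the control group
        "metric_A": [0, 1, 0, 1],  # e.g., a binary conversion flag
        "metric_B": [0.01, 0.05, 0.02, 0.03],  # e.g., a continuous metric
    }
)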

Quick Start

casual_inference supports not only the evaluation of standard A/B tests and A/A tests, but also more advanced causal inference techniques.

A/B test evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import ABTestEvaluator

# Simulated A/B test result with 3 variants (variant 1 is the control)
data = create_sample_ab_result(n_variant=3, sample_size=1000000, simulated_lift=[-0.01, 0.01])

evaluator = ABTestEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    variant_col="variant",
    metrics=["metric_bin", "metric_cont"]
)

# Visualize the calculated statistics
evaluator.summary_plot()

[eval_result: summary plot output]

It diagnoses Sample Ratio Mismatch (SRM) automatically. When it detects an SRM, it displays a warning so that the analyst can interpret the result carefully.
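
An SRM check of this kind is typically a chi-squared goodness-of-fit test on the assignment counts. The following is a minimal sketch of the idea (not the package's internal code) using scipy:

import numpy as np
from scipy import stats

# Observed number of units per variant (e.g., from data["variant"].value_counts())
observed = np.array([500_000, 498_000, 499_500])

# Under a healthy 1:1:1 split, each variant should receive an equal share
expected = np.full_like(observed, observed.sum() / len(observed), dtype=float)

chi2, p_value = stats.chisquare(observed, f_exp=expected)
if p_value < 0.001:  # a common, conservative threshold for flagging SRM
    print(f"Possible SRM (p = {p_value:.2e}); interpret results carefully.")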

See the example notebook for a more detailed example.

A/A test evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import AATestEvaluator

# Simulated A/A test: two variants with no lift between them
data = create_sample_ab_result(n_variant=2, sample_size=1000000, simulated_lift=[0.0])

evaluator = AATestEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()

[eval_result: summary plot output]

See the example notebook for a more detailed example.

Sample Size evaluation

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import SampleSizeEvaluator

# Sample data without any simulated lift
data = create_sample_ab_result(n_variant=2, sample_size=1000000)

evaluator = SampleSizeEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()

[eval_result: summary plot output]

See the example notebook for a more detailed example.
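
As background, the required sample size per variant for a binary metric follows the standard two-sample proportion formula. Here is a minimal sketch (not the package's API; the function name and defaults are illustrative):

from scipy.stats import norm

def required_sample_size(p_base: float, mde_rel: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Units needed per variant to detect a relative lift `mde_rel` on a base rate `p_base`."""
    p_treat = p_base * (1 + mde_rel)
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided significance level
    z_beta = norm.ppf(power)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return int((z_alpha + z_beta) ** 2 * variance / (p_base - p_treat) ** 2) + 1

# e.g., detecting a +1% relative lift on a 10% conversion rate
print(required_sample_size(p_base=0.10, mde_rel=0.01))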

Advanced causal inference techniques

The following techniques are currently supported:

  • Linear Regression

Other evaluation methods, such as Propensity Score Matching, are planned for future implementation.
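
As a sketch of how linear regression evaluation could be invoked, assuming it follows the same interface as the evaluators above (the class name LinearRegressionEvaluator and its signature are assumptions; check the package for the actual API):

from casual_inference.dataset import create_sample_ab_result
from casual_inference.evaluator import LinearRegressionEvaluator  # class name assumed

data = create_sample_ab_result(n_variant=2, sample_size=1000000, simulated_lift=[0.01])

# Assumed to estimate the treatment effect by regressing each metric on the variant
evaluator = LinearRegressionEvaluator()
evaluator.evaluate(
    data=data,
    unit_col="rand_unit",
    variant_col="variant",
    metrics=["metric_bin", "metric_cont"]
)

evaluator.summary_plot()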

References

  • Kohavi, Ron, Diane Tang, and Ya Xu. 2020. Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. https://experimentguide.com/
    • A great book comprehensively covering practical topics around A/B testing. I recommend it to everyone who works on A/B testing.
  • Alex Deng, Ulf Knoblich, and Jiannan Lu. 2018. Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). Association for Computing Machinery, New York, NY, USA, 233–242. https://doi.org/10.1145/3219819.3219919
    • Describes how to approximate the variance of a relative difference, and how to analyze metrics whose analysis unit is more granular than the randomization unit.
  • Lucile Lu. 2016. Power, minimal detectable effect, and bucket size estimation in A/B tests. Twitter Engineering Blog.
    • Describes the concepts around Type I and Type II errors and power analysis (sample size calculation).
  • Aleksander Fabijan, Jayant Gupchup, Somit Gupta, Jeff Omhover, Wen Qin, Lukas Vermeer, and Pavel Dmitriev. 2019. Diagnosing Sample Ratio Mismatch in Online Controlled Experiments: A Taxonomy and Rules of Thumb for Practitioners. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 2156–2164. https://doi.org/10.1145/3292500.3330722
    • Introduces Sample Ratio Mismatch (SRM), describes various examples of how SRM can occur, and provides a taxonomy that helps with debugging when an SRM happens.
  • Shota Yasui. 2020. 効果検証入門 (An Introduction to Effect Verification). 技術評論社 (Gijutsu-Hyohron). https://gihyo.jp/book/2020/978-4-297-11117-5
    • A great introductory book on practical causal inference techniques, written in Japanese.