trials

Tiny Bayesian A/B testing library

Installation

pip install git+https://github.com/bogdan-kulynych/trials.git@master

pip might not install all the system packages needed for scipy. To install them on Debian:

sudo apt-get install libatlas-dev libatlas-base-dev liblapack-dev gfortran

Run the tests:

nosetests trials/tests

Usage

Import package

from trials import Trials

Start a split test with Bernoulli (binary) observations

test = Trials(['A', 'B', 'C'])

Observe successes and failures

test.update({
    'A': (50, 10), # 50 successes, 10 failures, total 60
    'B': (75, 15), # 75 successes, 15 failures, total 90
    'C': (20, 15)  # 20 successes, 15 failures, total 35
})
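For intuition, the update step can be pictured as a conjugate Beta-Bernoulli update. The sketch below is not the library's code: it assumes a uniform Beta(1, 1) prior, and the helpers update_beta and posterior_mean are hypothetical names introduced only for illustration.

```python
# Minimal sketch of a Beta-Bernoulli update (assumption: uniform Beta(1, 1) prior;
# this is an illustration, not the trials implementation).

def update_beta(prior, successes, failures):
    """Return posterior Beta parameters after observing the given counts."""
    alpha, beta = prior
    return (alpha + successes, beta + failures)

def posterior_mean(params):
    """Mean of a Beta(alpha, beta) distribution."""
    alpha, beta = params
    return alpha / (alpha + beta)

observations = {'A': (50, 10), 'B': (75, 15), 'C': (20, 15)}

# One posterior per variation, each starting from the assumed uniform prior
posteriors = {
    name: update_beta((1, 1), successes, failures)
    for name, (successes, failures) in observations.items()
}

for name, params in posteriors.items():
    print('{name}: Beta{params}, mean {mean:.3f}'.format(
        name=name, params=params, mean=posterior_mean(params)))
```

Because the update only adds counts to the parameters, observing data in several batches gives the same posterior as observing it all at once.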

Evaluate some statistics

dominances = test.evaluate('dominance', control='A')         # Dominance probabilities P(X > A)
lifts = test.evaluate('expected lift', control='A')          # Expected lifts E[(X-A)/A]
intervals = test.evaluate('lift CI', control='A', level=95)  # Lifts' 95%-credible intervals

Available statistics for Bernoulli observation variations: expected posterior, posterior CI, expected lift, lift CI, empirical lift, dominance, z-test dominance.
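To make the definitions of dominance, expected lift and lift CI concrete, here is a rough Monte Carlo sketch using only the standard library. It assumes a uniform Beta(1, 1) prior and an equal-tailed interval; summarize and sample_beta are hypothetical helpers, and the library may compute these statistics differently, so the numbers need not match its output exactly.

```python
import random

def sample_beta(successes, failures, n, rng):
    """Draw n samples from the Beta(1 + s, 1 + f) posterior (assumed uniform prior)."""
    return [rng.betavariate(1 + successes, 1 + failures) for _ in range(n)]

def summarize(variation, control, n=50000, seed=0, level=95):
    """Estimate P(X > A), E[(X - A) / A] and an equal-tailed credible interval for the lift."""
    rng = random.Random(seed)
    x = sample_beta(variation[0], variation[1], n, rng)
    a = sample_beta(control[0], control[1], n, rng)

    # Dominance: share of joint samples in which the variation beats the control
    dominance = sum(xi > ai for xi, ai in zip(x, a)) / n

    # Relative lift for each joint sample, sorted so quantiles can be read off
    lifts = sorted((xi - ai) / ai for xi, ai in zip(x, a))
    expected_lift = sum(lifts) / n

    tail = (100 - level) / 200  # probability mass left out on each side
    lower = lifts[int(tail * n)]
    upper = lifts[int((1 - tail) * n) - 1]
    return dominance, expected_lift, (lower, upper)

control = (50, 10)  # variation A
for name, counts in [('B', (75, 15)), ('C', (20, 15))]:
    dominance, lift, (lower, upper) = summarize(counts, control)
    print('{name}: P(X > A) = {d:.2%}, E[lift] = {l:.2%}, CI = ({lo:.2%}, {hi:.2%})'.format(
        name=name, d=dominance, l=lift, lo=lower, hi=upper))
```

Sampling keeps the sketch short; more samples reduce the Monte Carlo noise in the estimates.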

Print or visualize results

for variation in ['B', 'C']:
    print('Variation {name}:'.format(name=variation))
    print('* E[lift] = {value:.2%}'.format(value=lifts[variation]))
    print('* P({lower:.2%} < lift < {upper:.2%}) = 95%' \
        .format(lower=intervals[variation][0], upper=intervals[variation][2]))
    print('* P({name} > {control}) = {value:.2%}' \
        .format(name=variation, control='A', value=dominances[variation]))

Examine the output:

Variation B:
* E[lift] = 0.22%                       # expected lift
* P(-13.47% < lift < 17.31%) = 95%      # lift CI
* P(B > A) = 49.27%                     # dominance
Variation C:
* E[lift] = -31.22%
* P(-51.33% < lift < -9.21%) = 95%
* P(C > A) = 0.25%

Interpreting and analyzing results

According to the output above, there is about a 50% chance that variation B is better than A (dominance). On average it is better by about 0.2% (expected lift), but there is a 95% chance that the real lift is anywhere between -13% and 17% (lift CI). You need more data to tell whether B is better or worse.

There is a 100% - 0.25% = 99.75% chance that variation C is worse than A. On average it is worse by about 31%, and there is a 95% chance that the real lift falls between -51% and -9%. The data was sufficient to tell that this variation is almost certainly inferior to both A and B. However, if this 99.75% chance still doesn't convince you, you need more data.

Theory

An explanation of the underlying mathematics and a usage guide are coming soon as a blog post.

Meanwhile, see the notebook for a comparison of Bayesian lift (blue) and empirical lift (green) errors in a theoretical benchmark with equal sample sizes. The Bayesian approach is a little better at predicting the lift, but there are no miracles here. Bayesian p-values and frequentist (z-test) p-values yield almost identical results.
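For a feel of the frequentist side of that comparison, a one-sided two-proportion z-test can be sketched as follows. This is an illustration under assumptions, not the library's z-test dominance statistic: it uses the unpooled standard error, and z_test_dominance is a hypothetical name.

```python
import math

def z_test_dominance(variation, control):
    """One-sided z-test value Phi(z) for 'variation rate > control rate'.

    Both arguments are (successes, failures) tuples. The unpooled standard
    error is an assumption of this sketch.
    """
    s_x, f_x = variation
    s_a, f_a = control
    n_x, n_a = s_x + f_x, s_a + f_a
    p_x, p_a = s_x / n_x, s_a / n_a

    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_x * (1 - p_x) / n_x + p_a * (1 - p_a) / n_a)
    z = (p_x - p_a) / se

    # Standard normal CDF expressed through the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# B and A have the same observed rate (75/90 = 50/60), so the value is close to 0.5
print('B vs A: {:.2%}'.format(z_test_dominance((75, 15), (50, 10))))
print('C vs A: {:.2%}'.format(z_test_dominance((20, 15), (50, 10))))
```

On the example data these values land near the Bayesian dominance probabilities shown earlier, which is in line with the remark that the two approaches yield almost identical results.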
