# Getting Started with `cibin`

The `cibin` package constructs exact confidence intervals using the method in Section 3 of ["Exact confidence intervals for the average causal effect on a binary outcome" by Li and Ding](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.6764). This method inverts a series of randomization tests. With sample size $n$, the approach requires performing $O(n^4)$ randomization tests, which justifies computationally efficient methods of finding exact confidence intervals.

Theorem 2(1) in Li and Ding provides ordering information that reduces the number of randomization tests, so that at most $O(n^2)$ randomization tests need to be performed.

The paper provides technical details and R code; the `cibin` code base is a Python implementation of the method.

## Installation
First, install the project from source, via:

`pip install .`

or, as a developer:

`pip install -e .`

Check that you have installed the latest version.
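
One way to see which version is installed is to query the package metadata from Python. This is a minimal sketch using only the standard library; it assumes the distribution is registered under the name `cibin`, and the helper `installed_version` is hypothetical, not part of the package. It reports the installed version only, so comparing it against the latest release is still a manual step.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name is "cibin".
print(installed_version("cibin"))
```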

## Calculating Confidence Intervals with `cibin`

In [1]:
# import relevant packages
%load_ext autoreload
%autoreload 2
import numpy as np
from cibin import *

The two-sided $1-\alpha$ confidence bounds for the average treatment effect can be calculated with the `tau_twosided_ci` function in the `cibin` package. This function assumes a randomized experiment with binary outcomes and two treatments (active and control), and finds the confidence bounds for $\tau$, the average treatment effect.

The user may specify the number of subjects assigned to each treatment and outcome, the desired type 1 error level, whether the calculation should be exact or approximate, the maximum number of combinations to sample (`max_combinations`), and the number of repetitions (`reps`).

The function returns a tuple of 3 items:
* `[lb, ub]`: lower and upper bounds of the confidence interval,
* `[allocation that gives lb, allocation that gives ub]`,
* `[# tables examined, total reps across simulations]`

In [2]:
# Find 2-sided 1-alpha confidence bounds
n11 = 1 # number of subjects assigned to treatment 1 who had outcome 1
n10 = 1 # number of subjects assigned to treatment 1 who had outcome 0
n01 = 1 # number of subjects assigned to treatment 0 who had outcome 1
n00 = 13 # number of subjects assigned to treatment 0 who had outcome 0
alpha = 0.05 # The desired type 1 error level.
ci, alloc, iters = tau_twosided_ci(n11, n10, n01, n00, alpha, exact=True, max_combinations=10**5, reps=10**3)

In [3]:
print('confidence interval:',ci)
print('allocation that gives confidence interval:',alloc)
print('# tables examined:',iters[0])
print('total # reps across simulations:',iters[1])

confidence interval: [-0.0625, 0.875]
allocation that gives confidence interval: [[14, 0, 1, 1], [1, 14, 0, 1]]
# tables examined: 71
total # reps across simulations: 8520
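
As a sanity check on the interval, the usual point estimate of $\tau$ (the difference in observed success proportions) should fall inside it. The sketch below does not use `cibin`; the helper `tau_hat` is hypothetical, and it assumes "treatment 1" is the active arm and "treatment 0" is control.

```python
def tau_hat(n11, n10, n01, n00):
    """Observed success proportion under treatment minus that under control."""
    treated = n11 + n10  # subjects assigned to treatment 1
    control = n01 + n00  # subjects assigned to treatment 0
    return n11 / treated - n01 / control


# Same counts as the example above.
estimate = tau_hat(n11=1, n10=1, n01=1, n00=13)

# Bounds copied from the output printed above.
lower, upper = -0.0625, 0.875

print('point estimate:', estimate)
print('inside interval:', lower <= estimate <= upper)
```

For these counts the estimate is 1/2 - 1/14, which lies between the two reported bounds.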


`tau_twosided_ci` can find an exact or simulated solution. For either solution, the function calls `N_generator` to generate tables algebraically consistent with the provided data. 

`N_generator` yields lists of 4 integers:
* `N00`: subjects with potential outcome 0 under control and treatment
* `N01`: subjects with potential outcome 0 under control and 1 under treatment
* `N10`: subjects with potential outcome 1 under control and 0 under treatment
* `N11`: subjects with potential outcome 1 under control and treatment

In [4]:
N_generator?

In [16]:
N = n00 + n01 + n10 + n11
Nt_gen = N_generator(N,n00,n01,n10,n11)
for Nt in Nt_gen:
    print(Nt)

[0, 13, 1, 2]
[0, 13, 2, 1]
[0, 14, 1, 1]
[1, 12, 1, 2]
[1, 12, 2, 1]
[1, 13, 0, 2]
[1, 13, 1, 1]
[1, 14, 0, 1]
[2, 11, 1, 2]
[2, 11, 2, 1]
[2, 12, 0, 2]
[2, 12, 1, 1]
[2, 13, 0, 1]
[3, 10, 1, 2]
[3, 10, 2, 1]
[3, 11, 0, 2]
[3, 11, 1, 1]
[3, 12, 0, 1]
[4, 9, 1, 2]
[4, 9, 2, 1]
[4, 10, 0, 2]
[4, 10, 1, 1]
[4, 11, 0, 1]
[5, 8, 1, 2]
[5, 8, 2, 1]
[5, 9, 0, 2]
[5, 9, 1, 1]
[5, 10, 0, 1]
[6, 7, 1, 2]
[6, 7, 2, 1]
[6, 8, 0, 2]
[6, 8, 1, 1]
[6, 9, 0, 1]
[7, 6, 1, 2]
[7, 6, 2, 1]
[7, 7, 0, 2]
[7, 7, 1, 1]
[7, 8, 0, 1]
[8, 5, 1, 2]
[8, 5, 2, 1]
[8, 6, 0, 2]
[8, 6, 1, 1]
[8, 7, 0, 1]
[9, 4, 1, 2]
[9, 4, 2, 1]
[9, 5, 0, 2]
[9, 5, 1, 1]
[9, 6, 0, 1]
[10, 3, 1, 2]
[10, 3, 2, 1]
[10, 4, 0, 2]
[10, 4, 1, 1]
[10, 5, 0, 1]
[11, 2, 1, 2]
[11, 2, 2, 1]
[11, 3, 0, 2]
[11, 3, 1, 1]
[11, 4, 0, 1]
[12, 1, 1, 2]
[12, 1, 2, 1]
[12, 2, 0, 2]
[12, 2, 1, 1]
[12, 3, 0, 1]
[13, 0, 1, 2]
[13, 0, 2, 1]
[13, 1, 0, 2]
[13, 1, 1, 1]
[13, 2, 0, 1]
[14, 0, 0, 2]
[14, 0, 1, 1]
[14, 1, 0, 1]


Each list produced by `N_generator` is used to generate a potential outcomes table with the `potential_outcomes` function. The `potential_outcomes` function makes an N×2 table of potential outcomes (one row per subject) from the 2×2 summary table, given as the list of four counts from `N_generator`.

In [26]:
Nt_example = [0, 13, 1, 2]
table = potential_outcomes(Nt_example)

In [31]:
print('shape:',table.shape)
print(table)

shape: (16, 2)
[[0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [0 1]
 [1 0]
 [1 1]
 [1 1]]
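
To make the expansion concrete, here is a minimal sketch that reproduces the table above with plain NumPy. The helper `build_table` is hypothetical and is not the package's `potential_outcomes` implementation; it assumes the counts are ordered `[N00, N01, N10, N11]` as described earlier.

```python
import numpy as np


def build_table(counts):
    """Expand four counts into an N x 2 array with one row per subject."""
    # Row patterns in the assumed order N00, N01, N10, N11.
    patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    # Repeat each pattern as many times as its count.
    return np.repeat(patterns, counts, axis=0)


sketch_table = build_table([0, 13, 1, 2])
print('shape:', sketch_table.shape)
print(sketch_table)
```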


$\tau$ and the test statistic $\tau^* - \tau$ are computed for each of the generated potential outcome tables.
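
The following sketch illustrates these two quantities for the `[0, 13, 1, 2]` table and a single treatment assignment. It is not the package's code: the helpers `true_tau` and `estimated_tau` are hypothetical, the assignment is made up for illustration, and it assumes (following the description of `N_generator` above) that the first column holds the outcome under control and the second the outcome under treatment.

```python
import numpy as np

# Potential outcomes table for [0, 13, 1, 2], one row per subject.
# Assumption: column 0 = outcome under control, column 1 = outcome under treatment.
table_sketch = np.array([[0, 1]] * 13 + [[1, 0]] + [[1, 1]] * 2)


def true_tau(table):
    """Average treatment effect implied by the full potential outcomes table."""
    return table[:, 1].mean() - table[:, 0].mean()


def estimated_tau(table, treated):
    """Difference in means for one assignment; `treated` is a boolean mask."""
    return table[treated, 1].mean() - table[~treated, 0].mean()


# Hypothetical assignment: the first two subjects receive the treatment.
treated = np.zeros(len(table_sketch), dtype=bool)
treated[:2] = True

tau = true_tau(table_sketch)
tau_star = estimated_tau(table_sketch, treated)
print('tau:', tau)
print('tau*:', tau_star)
print('tau* - tau:', tau_star - tau)
```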

If the solution is exact, all possible samples are generated and a test statistic is calculated for each sample. If the solution is not exact, samples are generated only up to the `reps` value provided by the user.