# rerand example

This notebook shows how to use the `rerand` package to rerandomise treatment assignments until the groups are balanced on a set of covariates.

First, let's import the modules we need, including `Randomisation`, the main class in the `rerand` package.

In [58]:
from rerand.Randomisation import Randomisation
import numpy as np
import pandas as pd

## Simple treatment vs control case

Suppose we want to run a 50/50 experiment on 1,000 observations, with a single treatment group and a control group. We observe three covariates and want to rerandomise until the two groups are balanced across all three.

First, simulate some covariate data:

In [59]:
n = 1000

x = pd.DataFrame({
    'x1': np.random.normal(0, 1, n),
    'x2': np.random.normal(0, 1, n),
    'x3': np.random.normal(0, 1, n),
})

Next, configure the remaining inputs to the `Randomisation` class, starting with the variants and their randomisation probabilities.

In [60]:
variants = {
    'treatment': 0.5,
    'control': 0.5
}
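The `variants` dictionary maps each group label to its assignment probability, so the probabilities should sum to one. To illustrate what a single randomisation might look like, the sketch below turns such a dictionary into an assignment vector with fixed group sizes by shuffling a list of labels. This is an assumption about the general technique (complete randomisation), not `rerand`'s internal code, and the helper `draw_assignment` is hypothetical.

```python
import numpy as np

def draw_assignment(variants, n, rng):
    """Hypothetical helper: shuffle labels so each group gets about p * n units."""
    labels = list(variants)
    probs = np.array([variants[label] for label in labels])
    assert np.isclose(probs.sum(), 1.0), "probabilities should sum to 1"
    counts = np.floor(probs * n).astype(int)
    # Hand any units lost to rounding down to the first groups, one each.
    counts[: n - counts.sum()] += 1
    assignment = np.repeat(labels, counts)
    # A random permutation of the labels is one candidate randomisation.
    return rng.permutation(assignment)

rng = np.random.default_rng(0)
assignment = draw_assignment({'treatment': 0.5, 'control': 0.5}, 1000, rng)
print((assignment == 'treatment').sum())  # 500
```

`rerand` might instead assign each unit independently with the given probabilities, in which case group sizes would vary slightly from one randomisation to the next.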

Next, let's choose a distance metric:

In [61]:
distance_metric = 'Euclidean'
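The notebook does not spell out how the Euclidean distance between groups is computed. One common choice, assumed here purely for illustration, is the Euclidean norm of the difference between the two groups' covariate mean vectors; `rerand` may standardise the covariates or define the distance differently. The function name `euclidean_mean_distance` is hypothetical.

```python
import numpy as np
import pandas as pd

def euclidean_mean_distance(covariates, assignment, group_a, group_b):
    """Assumed metric: Euclidean norm of the difference in covariate means."""
    mask_a = assignment == group_a
    mask_b = assignment == group_b
    # Column-wise means for each group, differenced covariate by covariate.
    mean_diff = covariates[mask_a].mean() - covariates[mask_b].mean()
    return float(np.sqrt((mean_diff ** 2).sum()))

x = pd.DataFrame({'x1': [1.0, 2.0, 3.0, 4.0], 'x2': [0.0, 0.0, 1.0, 1.0]})
assignment = np.array(['treatment', 'control', 'treatment', 'control'])
print(euclidean_mean_distance(x, assignment, 'treatment', 'control'))  # 1.0
```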

Now choose a tolerance: the largest distance between groups that we consider acceptable for balance.

In [62]:
tol = 0.05

Then set the maximum number of randomisations to attempt while trying to reach this tolerance.

In [63]:
max_reps = 100

Now let's pass these settings to the `Randomisation` class.

In [64]:
rand = Randomisation(covariates=x,
                     variants=variants,
                     distance_metric=distance_metric,
                     tol=tol,
                     max_reps=max_reps)

INFO:root:Initialising Randomisation class


We can now use the `randomise` method to obtain treatment and control groups that are balanced across all three covariates.

In [65]:
assignment_vector = rand.randomise()

INFO:root:Randomisation: 1, Distance = 0.15
INFO:root:Randomisation: 2, Distance = 0.08
INFO:root:Randomisation: 3, Distance = 0.05
INFO:root:Randomisation: 4, Distance = 0.07
INFO:root:Randomisation: 5, Distance = 0.12
INFO:root:Randomisation: 6, Distance = 0.12
INFO:root:Randomisation: 7, Distance = 0.06
INFO:root:Randomisation: 8, Distance = 0.12
INFO:root:Randomisation: 9, Distance = 0.15
INFO:root:Randomisation: 10, Distance = 0.13
INFO:root:Randomisation: 11, Distance = 0.14
INFO:root:Randomisation: 12, Distance = 0.17
INFO:root:Randomisation: 13, Distance = 0.11
INFO:root:Randomisation: 14, Distance = 0.17
INFO:root:Randomisation: 15, Distance = 0.11
INFO:root:Randomisation: 16, Distance = 0.07
INFO:root:Randomisation: 17, Distance = 0.19
INFO:root:Randomisation: 18, Distance = 0.1
INFO:root:Randomisation: 19, Distance = 0.07
INFO:root:Randomisation: 20, Distance = 0.13
INFO:root:Randomisation: 21, Distance = 0.16
INFO:root:Randomisation: 22, Distance = 0.1
...

The log reports the distance achieved by each randomisation, which shows how many attempts were needed before the selected tolerance was reached.
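The log is consistent with a simple loop: draw an assignment, compute its distance, and stop as soon as the distance is within `tol`, giving up after `max_reps` attempts. The sketch below illustrates that rerandomisation idea under stated assumptions. It is not `rerand`'s implementation: the names `rerandomise` and `mean_distance` are hypothetical, the distance is assumed to be the Euclidean norm of the difference in group means, and the acceptance test is assumed to be `distance <= tol`.

```python
import numpy as np
import pandas as pd

def mean_distance(covariates, assignment, a, b):
    # Assumed metric: Euclidean norm of the difference in covariate means.
    diff = covariates[assignment == a].mean() - covariates[assignment == b].mean()
    return float(np.sqrt((diff ** 2).sum()))

def rerandomise(covariates, labels, tol, max_reps, rng):
    """Hypothetical rerandomisation loop for two equally sized groups."""
    base = np.resize(labels, len(covariates))  # cycle the labels to fill n slots
    best, best_dist = None, np.inf
    for rep in range(1, max_reps + 1):
        assignment = rng.permutation(base)
        dist = mean_distance(covariates, assignment, labels[0], labels[1])
        if dist < best_dist:
            best, best_dist = assignment, dist
        if dist <= tol:
            return assignment, dist, rep
    # No draw met the tolerance: fall back to the best one seen (a sketch choice).
    return best, best_dist, max_reps

rng = np.random.default_rng(42)
n = 1000
x = pd.DataFrame({f'x{i}': rng.normal(0, 1, n) for i in (1, 2, 3)})
assignment, dist, reps = rerandomise(x, ['treatment', 'control'], 0.05, 100, rng)
print(f'{reps} randomisations, distance = {dist:.3f}')
```

Falling back to the best draw when no attempt meets `tol` is a choice made for this sketch only; the package may handle that case differently.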

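The `randomise` method returns the assignments, one label per observation. Judging by the printed output, this is a NumPy array rather than a Python list.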

In [66]:
print(assignment_vector[0:9])

['treatment' 'control' 'control' 'treatment' 'control' 'control' 'control'
 'treatment' 'control']
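Whatever the exact distance definition, a quick way to inspect balance is to group the covariates by the assignment and compare group means with pandas. To keep this sketch self-contained, the covariates are re-simulated and a randomly shuffled vector stands in for the `assignment_vector` returned by `randomise`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
x = pd.DataFrame({
    'x1': rng.normal(0, 1, n),
    'x2': rng.normal(0, 1, n),
    'x3': rng.normal(0, 1, n),
})
# Stand-in for the array returned by rand.randomise().
assignment_vector = rng.permutation(np.repeat(['treatment', 'control'], n // 2))

# Mean of each covariate within each group; balanced groups have similar rows.
group_means = x.groupby(assignment_vector).mean()
print(group_means)
```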


## Example with seeds and multiple variants

Let's now move on to a more complex example. 

First, a key concern when randomising is reproducibility. The `Randomisation` class accepts a list of seeds through its `seeds` argument, with each seed corresponding to one potential rerandomisation. The list must be at least as long as the maximum number of randomisations, set by `max_reps`. Identical lists of seeds produce identical randomisations.
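Identical seeds giving identical randomisations is what you get when each attempt draws from a generator seeded with the corresponding entry of the seed list. The sketch below demonstrates the idea with NumPy's `default_rng`. How `rerand` consumes its `seeds` internally is an assumption here, and `seeded_draws` is a hypothetical helper.

```python
import numpy as np

def seeded_draws(labels, seeds, max_reps):
    """Hypothetical helper: one shuffled assignment per attempt, each from its own seed."""
    if len(seeds) < max_reps:
        raise ValueError('need at least max_reps seeds')
    base = np.asarray(labels)
    # Attempt number rep always uses seeds[rep], so reruns repeat the same draws.
    return [np.random.default_rng(seeds[rep]).permutation(base) for rep in range(max_reps)]

labels = ['a'] * 5 + ['b'] * 3 + ['c'] * 2
first = seeded_draws(labels, list(range(300)), max_reps=300)
second = seeded_draws(labels, list(range(300)), max_reps=300)
# Same seed list, same sequence of candidate randomisations.
print(all(np.array_equal(f, s) for f, s in zip(first, second)))  # True
```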

Second, many experiments have more than two groups, for example several treatment groups (or 'variants'). `rerand` supports this. With more than two groups, the relevant distance is the maximum across all pairwise comparisons, so balance is reached only when no pair of groups is further apart than the chosen `tol`.
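With more than two variants, this rule amounts to taking the largest distance over every pair of groups and comparing that maximum with `tol`. Here is a sketch using `itertools.combinations`, again assuming the pairwise distance is the Euclidean norm of the difference in covariate means (the package's exact metric may differ; `max_pairwise_distance` is a hypothetical name).

```python
from itertools import combinations

import numpy as np
import pandas as pd

def max_pairwise_distance(covariates, assignment):
    """Assumed metric: largest Euclidean distance between group mean vectors."""
    means = covariates.groupby(assignment).mean()
    distances = [
        float(np.linalg.norm(means.loc[g] - means.loc[h]))
        for g, h in combinations(means.index, 2)
    ]
    return max(distances)

x = pd.DataFrame({'x1': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
assignment = np.array(['a', 'a', 'b', 'b', 'c', 'c'])
print(max_pairwise_distance(x, assignment))  # 4.0
```

Under this rule, an assignment would be accepted only if the returned maximum is within `tol`.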

Let's run through the example again, with some new settings.

In [71]:
seeds = range(300)
variants = {
    'a': 0.5,
    'b': 0.3,
    'c': 0.2
}
distance_metric = 'Euclidean'
max_reps = 300
tol = 0.1

In [72]:
rand = Randomisation(covariates=x,
                     variants=variants,
                     distance_metric=distance_metric,
                     tol=tol,
                     max_reps=max_reps,
                     seeds=seeds)

INFO:root:Initialising Randomisation class


In [73]:
assignment_vector = rand.randomise()

INFO:root:Randomisation: 1, Distance = 0.17
INFO:root:Randomisation: 2, Distance = 0.2
INFO:root:Randomisation: 3, Distance = 0.08
INFO:root:3 randomisations needed to achieve balance with tolerance 0.1


In [74]:
print(assignment_vector[0:9])

['a' 'a' 'b' 'a' 'a' 'a' 'a' 'b' 'a']
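To confirm that the realised group sizes match the requested probabilities (0.5, 0.3 and 0.2), you can compare label shares using pandas' `value_counts(normalize=True)`. In this self-contained sketch, a shuffled vector with fixed group sizes stands in for the `assignment_vector` returned by `randomise`.

```python
import numpy as np
import pandas as pd

variants = {'a': 0.5, 'b': 0.3, 'c': 0.2}
rng = np.random.default_rng(3)
# Stand-in for the assignment_vector returned by rand.randomise().
assignment_vector = rng.permutation(np.repeat(list(variants), [500, 300, 200]))

# Share of observations assigned to each label.
shares = pd.Series(assignment_vector).value_counts(normalize=True)
for label, p in variants.items():
    print(f'{label}: target {p:.2f}, realised {shares[label]:.2f}')
```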
