# BinaryTwoStageDesigns - Quickstart

This is a [Jupyter](http://jupyter.org) notebook using the [Julia](https://julialang.org/) kernel [IJulia.jl](https://github.com/JuliaLang/IJulia.jl) that demonstrates the use of the Julia package [BinaryTwoStageDesigns](https://github.com/imbi-heidelberg/BinaryTwoStageDesigns).

To run this notebook, a working installation of the [Gurobi](http://www.gurobi.com/index) solver and the [corresponding Julia interface](https://github.com/JuliaOpt/Gurobi.jl) for [JuMP](https://jump.readthedocs.io/en/latest/) is required. 

In [1]:
using BinaryTwoStageDesigns
using Gurobi


## Setting

Assume that a new anti-cancer agent is to be tested against a historical response rate of $p_0=0.2$ in a phase-II trial and a response rate of $p_1=0.4$ is expected.
The maximal tolerable type-I-error rate for testing $\mathcal{H}_0:p\leq p_0$ is 5% and a type-II-error rate of 20% is deemed acceptable at $p_1=0.4$.
The corresponding single-stage design would require $n=47$ patients in this situation.

In [2]:
p0   = 0.2
p1   = 0.4
tter = 0.2
toer = 0.05
nfix = 47 # (required sample size for one-stage design)


47

## Adaptive Design

Alternatively, a two-stage adaptive design could be used which minimizes the expected sample size under $p_1=0.4$ subject to the same constraints. 
Additionally, for operational reasons a potential second stage must enroll at least 5 patients. Also, upon rejection of the null hypothesis, at least 25 patients must be enrolled to ensure a sufficiently precise effect estimate for subsequent phase-III planning.
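
The objective mentioned above, the expected sample size, is a binomial-weighted average of the overall sample sizes $n(x_1)$ over all possible stage-one outcomes $x_1$. The following is a minimal sketch of that computation using only Base Julia. The names `binompmf`, `expected_sample_size` and `toy_n` are hypothetical helpers introduced for illustration and are not part of BinaryTwoStageDesigns; the toy design is not the optimal design derived in this notebook.

```julia
# Binomial probability mass function, written out with Base only.
binompmf(x, n, p) = binomial(n, x) * p^x * (1 - p)^(n - x)

# Expected overall sample size under response rate p.
# `n` is a vector with n[x1 + 1] = overall sample size after x1 stage-one responses
# (hypothetical representation, chosen for this sketch).
function expected_sample_size(n1, n, p)
    length(n) == n1 + 1 || throw(ArgumentError("n must have n1 + 1 entries"))
    return sum(binompmf(x1, n1, p) * n[x1 + 1] for x1 in 0:n1)
end

# Toy design (an assumption for illustration): stop after n1 = 10 patients
# if x1 <= 2, otherwise continue to 30 patients overall.
toy_n = [x1 <= 2 ? 10 : 30 for x1 in 0:10]
expected_sample_size(10, toy_n, 0.4)
```

A design that never stops early has an expected sample size equal to its fixed size, which makes for a quick sanity check of the helper.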

### Sample Space

First, a sample space object is defined. It simply holds information about the allowable search space for the optimization algorithm. Here, the range of possible stage-one sample sizes is limited to 10 to 25 and the maximal overall sample size to 100; the above-mentioned constraints are passed as optional arguments.

In [3]:
splspc = SimpleSampleSpace(
    10:25,        # n1 range
    100,          # nmax
    n2min = 5,    # minimum second stage 
    nmincont = 25 # minimum overall sample size on continuation (incl. stopping for efficacy)
)

BinaryTwoStageDesigns.SimpleSampleSpace{Int64}([10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25],100,5,10.0,25,500000,false,Int64[])

### Parameters

Next, the design parameters are stored in an object as well. Besides the sample space object created earlier, a `SimpleMinimalExpectedSampleSize` object only requires $p_0$, $p_1$, the maximal type-I and type-II error rates, and the parameter value at which the expected sample size is to be minimized.

In [4]:
params = SimpleMinimalExpectedSampleSize(
    splspc,     # sample space
    p0, p1,     # null and planning alternative
    toer, tter, # max. type one and two error rates
    p1          # alternative on which to minimize expected sample size
)

BinaryTwoStageDesigns.SimpleMinimalExpectedSampleSize{BinaryTwoStageDesigns.SimpleSampleSpace{Int64}}(BinaryTwoStageDesigns.SimpleSampleSpace{Int64}([10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25],100,5,10.0,25,500000,false,Int64[]),0.2,0.4,0.05,0.2,0.4,0.0,true,0.0)
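
In symbols, these parameters describe roughly the following problem. This is a sketch in notation introduced here, not the package's internal formulation; in particular, the rejection rule $X_1 + X_2 > c(X_1)$ is an assumption inferred from the design tables shown in this notebook. $X_1$ denotes the number of stage-one responses among $n_1$ patients and $X_2$ the number of stage-two responses among $n(X_1) - n_1$ patients.

$$
\begin{aligned}
\min_{n_1,\, n(\cdot),\, c(\cdot)} \quad & \mathbb{E}_{p_1}\big[\,n(X_1)\,\big] \\
\text{subject to} \quad & \Pr\nolimits_{p_0}\big[\,X_1 + X_2 > c(X_1)\,\big] \leq 0.05, \\
& \Pr\nolimits_{p_1}\big[\,X_1 + X_2 > c(X_1)\,\big] \geq 1 - 0.2 = 0.8 .
\end{aligned}
$$

The sample space object additionally restricts the admissible $n_1$ and $n(\cdot)$ through the range of $n_1$, the maximal overall sample size, and the optional constraints `n2min` and `nmincont`.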

### Optimization

Finally, a solver is defined and the optimization process is started. Note that both the optimal design and all designs found while exhaustively exploring the $n_1$-space are returned. The basic technique via integer programming has been described in [Kunzmann & Kieser 2016](https://arxiv.org/abs/1605.00249).

In [5]:
solver = GurobiSolver(OutputFlag = 0)

Gurobi.GurobiSolver(nothing,Any[(:OutputFlag,0)])

In [6]:
design, res = getoptimaldesign(params, solver, VERBOSE = 0)
using DataFrames
convert(DataFrame, design)

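| row | x1 | n  | c    |
|-----|----|----|------|
| 1   | 0  | 25 | inf  |
| 2   | 1  | 25 | inf  |
| 3   | 2  | 25 | inf  |
| 4   | 3  | 25 | inf  |
| 5   | 4  | 25 | inf  |
| 6   | 5  | 25 | inf  |
| 7   | 6  | 25 | inf  |
| 8   | 7  | 33 | 10.0 |
| 9   | 8  | 31 | 10.0 |
| 10  | 9  | 30 | 9.0  |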

Here, the combination of the operational constraint ($n_2\geq 5$) and the requirement to have at least 25 subjects upon rejection leads to a design with a relatively large first stage (25) and an almost constant second-stage sample size.
We can explore how the omission of these two constraints would alter the optimal design:

In [7]:
splspc2 = SimpleSampleSpace(
    10:25, # n1 range
    100    # nmax
)
params2 = SimpleMinimalExpectedSampleSize(
    splspc2,    # sample space
    p0, p1,     # null and planning alternative
    toer, tter, # max. type one and two error rates
    p1          # alternative on which to minimize expected sample size
)
design2, res2 = getoptimaldesign(params2, solver, VERBOSE = 0)
convert(DataFrame, design2)

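| row | x1 | n  | c    |
|-----|----|----|------|
| 1   | 0  | 16 | inf  |
| 2   | 1  | 16 | inf  |
| 3   | 2  | 16 | inf  |
| 4   | 3  | 16 | inf  |
| 5   | 4  | 44 | 14.0 |
| 6   | 5  | 34 | 11.0 |
| 7   | 6  | 29 | 9.0  |
| 8   | 7  | 16 | -inf |
| 9   | 8  | 16 | -inf |
| 10  | 9  | 16 | -inf |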


This optimal design has a smaller stage-one sample size (16) and a more variable second stage. However, we continue with the initial design.

## Inference

After completing the trial with 8/25 responses in stage one and 3/6 in stage two, a point estimate and a confidence interval are required. Point estimates were discussed in [Kunzmann & Kieser 2016](http://onlinelibrary.wiley.com/doi/10.1002/sim.7200/abstract), and different estimators are implemented. Here, we use a compatible minimum expected mean squared error estimator with several favorable properties.

In [8]:
est = CompatibleEstimator(design, solver)
estimate(est, 8, 3)

0.3585797612104045

In this case, the maximum likelihood estimator (MLE) would have been $11/n(8) = 11/31 \approx 0.3548$, which is relatively close: the design is not very flexible, and thus the bias of the MLE is limited.
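
The comparison can be checked with plain arithmetic. This small sketch uses only the numbers reported in this notebook; the variable names are chosen here for illustration.

```julia
x1, x2 = 8, 3    # responses observed in stage one and stage two
n1, n2 = 25, 6   # stage-wise sample sizes, so n(8) = 31

# Maximum likelihood estimate: pooled response rate over both stages.
mle = (x1 + x2) / (n1 + n2)

# Value returned by estimate(est, 8, 3) in the cell above.
compatible = 0.3585797612104045

# Difference between the two estimates (about 0.0037).
compatible - mle
```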

This estimator induces an ordering on the sample space which in turn implies p values. The major advantage of the novel estimators in [Kunzmann & Kieser 2016](http://onlinelibrary.wiley.com/doi/10.1002/sim.7200/abstract) is the fact that their implied p values are *always* compatible with the design's test decision.

In [9]:
p(est, 8, 3, p0)

0.032385459554107265

The very same ordering/p values can then be used to derive a Clopper-Pearson type confidence interval (paper under review):

In [10]:
ci = ClopperPearsonConfidenceInterval(est, confidence = .9)
limits(ci, 8, 3)

2-element Array{Float64,1}:
 0.214
 0.499