boostsa - BOOtSTrap SAmpling - is a tool for computing bootstrap sampling significance tests, even within the pipeline of a complex experimental design...
- Free software: MIT license
- Documentation: https://boostsa.readthedocs.io.
Name | Link |
---|---|
You can try boostsa here: |
pip install -U boostsa
First, import boostsa:
from boostsa import Bootstrap
Then, create a bootstrap instance. You will use it to store your experiments' results and to compute the bootstrap sampling significance test:
boot = Bootstrap()
The assumption is that you ran at least two classification experiments that you want to compare.
One is your baseline, or control, or hypothesis 0 (h0).
The other one is the experimental condition that hopefully beats the baseline, or treatment, or hypothesis 1 (h1).
You compare the h0 and h1 predictions against the same targets.
Therefore, the h0 predictions, the h1 predictions and the targets are the data inputs of your Bootstrap instance.
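To make the three inputs concrete, here is a minimal sketch with made-up labels. It does not use boostsa: the toy data and the `accuracy` helper are hypothetical and only illustrate that h0 and h1 are scored against the same targets.

```python
# Hypothetical toy data: label indexes for six data points.
targs = [0, 1, 1, 0, 1, 0]      # targets (gold standard)
h0_preds = [0, 1, 0, 0, 0, 0]   # baseline (h0) predictions
h1_preds = [0, 1, 1, 0, 1, 1]   # treatment (h1) predictions


def accuracy(targets, preds):
    """Percentage of predictions that match the targets."""
    if len(targets) != len(preds):
        raise ValueError("targets and predictions must have the same length")
    hits = sum(t == p for t, p in zip(targets, preds))
    return 100 * hits / len(targets)


# Both conditions are measured against the same targets.
h0_acc = accuracy(targs, h0_preds)
h1_acc = accuracy(targs, h1_preds)
diff_acc = h1_acc - h0_acc

print(f"h0 accuracy {h0_acc:.2f}  h1 accuracy {h1_acc:.2f}  diff {diff_acc:.2f}")
```

The significance test then asks whether a difference like `diff_acc` is likely to be more than chance.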
By default, boostsa produces two output files:
- `results.tsv`, which contains the experiments' performance and the (possible) significance levels;
- `outcomes.json`, which contains targets and predictions for all the experimental conditions.
You can define the outputs when you create the instance, using the following parameters:
- `save_results`, type: `bool`, default: `True`. This determines if you want to save the results.
- `save_outcomes`, type: `bool`, default: `True`. This determines if you want to save the experiments' outcomes.
- `dir_out`, type: `str`, default: `''`, that is, your working directory. This indicates the directory where to save the results.
For example, if you want to save only the results in a particular folder, you will create an instance like this:
boot = Bootstrap(save_outcomes=False, dir_out='my/favourite/directory/')
In the simplest case, you run the bootstrap sampling significance test with the `test` function.
It takes the following inputs:
- `targs`, type: `list` or `str`. The targets, or gold standard, that you use as a benchmark to measure the performance of the h0 and h1 predictions. They can be a list of integers, representing the label index of each data point, or a string. In the latter case, the string is interpreted as the path to a text file containing a single integer in each row, with the same meaning as for the list input.
- `h0_preds`, type: `list` or `str`. The h0 predictions, in the same formats as `targs`.
- `h1_preds`, type: `list` or `str`. The h1 predictions, in the same formats as above.
- `h0_name`, type: `str`, default: `h0`. Expression to describe the h0 condition.
- `h1_name`, type: `str`, default: `h1`. Expression to describe the h1 condition.
- `n_loops`, type: `int`, default: `100`. Number of iterations for computing the bootstrap sampling.
- `sample_size`, type: `float`, default: `.1`. Proportion of data points sampled, with respect to the whole set. The admitted values range between 0.05 (5%) and 0.5 (50%).
- `verbose`, type: `bool`, default: `False`. If true, the experiments' performance is shown.
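When `targs`, `h0_preds` or `h1_preds` are passed as paths, each file holds one integer per row. The sketch below shows one way to produce and read back such a file using only the standard library. The helper names `write_labels` and `read_labels` are hypothetical and not part of boostsa, and the trailing newline is an assumption about what the parser tolerates.

```python
import tempfile
from pathlib import Path


def write_labels(path, labels):
    """Write one integer label per row (hypothetical helper)."""
    text = "\n".join(str(int(label)) for label in labels) + "\n"
    Path(path).write_text(text)


def read_labels(path):
    """Read the labels back as a list of integers, skipping blank rows."""
    rows = Path(path).read_text().splitlines()
    return [int(row) for row in rows if row.strip()]


with tempfile.TemporaryDirectory() as tmp:
    targs_path = Path(tmp) / "targs.txt"
    write_labels(targs_path, [0, 1, 1, 0])
    raw = targs_path.read_text()        # file content, one label per row
    round_trip = read_labels(targs_path)

print(round_trip)
```

A path written this way could then be passed as a string wherever a list of labels is accepted.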
For example:
boot.test(targs='../test_boot/h0.0/targs.txt', h0_preds='../test_boot/h0.0/preds.txt', h1_preds='../test_boot/h1.0/preds.txt', n_loops=1000, sample_size=.2, verbose=True)
The output will be:
    total size............... 1000
    sample size.............. 200
    targs count:    ['class 0 freq 465 perc 46.50%', 'class 1 freq 535 perc 53.50%']
    h0 preds count: ['class 0 freq 339 perc 33.90%', 'class 1 freq 661 perc 66.10%']
    h1 preds count: ['class 0 freq 500 perc 50.00%', 'class 1 freq 500 perc 50.00%']
    h0 F-measure............. 67.76   h1 F-measure............. 74.07   diff... 6.31
    h0 accuracy.............. 69.0    h1 accuracy.............. 74.1    diff... 5.1
    h0 precision............. 69.94   h1 precision............. 74.1    diff... 4.16
    h0 recall................ 67.96   h1 recall................ 74.22   diff... 6.26
    bootstrap: 100%|███████████████████████████| 1000/1000 [00:07<00:00, 139.84it/s]
    count sample diff f1   is twice tot diff f1....... 37   / 1000    p < 0.037  *
    count sample diff acc  is twice tot diff acc...... 73   / 1000    p < 0.073
    count sample diff prec is twice tot diff prec..... 111  / 1000    p < 0.111
    count sample diff rec  is twice tot diff rec ..... 27   / 1000    p < 0.027  *
    Out[3]:
           f1  diff_f1  sign_f1   acc  diff_acc  sign_acc   prec  diff_prec  sign_prec    rec  diff_rec  sign_rec
    h0  67.76                    69.0                      69.94                        67.96
    h1  74.07     6.31        *  74.1       5.1            74.10       4.16             74.22      6.26         *
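The labels in this output ("count sample diff ... is twice tot diff ...", followed by a count over `n_loops` and a p-value) suggest the following procedure, sketched here for accuracy only. This is an illustration inferred from the output, not boostsa's actual code: the function `bootstrap_sketch`, the sampling with replacement, the rounding of the sample size and the synthetic data are all assumptions.

```python
import random


def accuracy(targets, preds):
    """Percentage of predictions that match the targets."""
    return 100 * sum(t == p for t, p in zip(targets, preds)) / len(targets)


def bootstrap_sketch(targs, h0_preds, h1_preds, n_loops=100, sample_size=0.1, seed=0):
    """Count how often a sample's h1-h0 difference exceeds twice the overall one."""
    if not 0.05 <= sample_size <= 0.5:
        raise ValueError("sample_size must be between 0.05 and 0.5")
    rng = random.Random(seed)
    n = len(targs)
    k = max(1, round(n * sample_size))  # e.g. 1000 points at 0.2 gives 200
    tot_diff = accuracy(targs, h1_preds) - accuracy(targs, h0_preds)
    count = 0
    for _ in range(n_loops):
        # Assumption: indexes are drawn with replacement.
        idx = [rng.randrange(n) for _ in range(k)]
        sample_targs = [targs[i] for i in idx]
        sample_h0 = [h0_preds[i] for i in idx]
        sample_h1 = [h1_preds[i] for i in idx]
        sample_diff = accuracy(sample_targs, sample_h1) - accuracy(sample_targs, sample_h0)
        if sample_diff > 2 * tot_diff:
            count += 1
    return tot_diff, count, count / n_loops


# Synthetic data: h0 is right about 70% of the time, h1 about 75%.
data_rng = random.Random(42)
targs = [data_rng.randint(0, 1) for _ in range(1000)]
h0_preds = [t if data_rng.random() < 0.70 else 1 - t for t in targs]
h1_preds = [t if data_rng.random() < 0.75 else 1 - t for t in targs]

tot_diff, count, p_value = bootstrap_sketch(
    targs, h0_preds, h1_preds, n_loops=1000, sample_size=0.2
)
print(f"tot diff acc {tot_diff:.2f}  count {count} / 1000  p < {p_value}")
```

Under this reading, the p-value is simply the count divided by `n_loops`, which matches the figures above (37 / 1000 gives p < 0.037).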
That's it!
For more complex experimental designs and technical/ethical considerations, please refer to the documentation page.