# Model Comparison and Confidence Intervals Tutorial

## Outline
1. Block-Bootstrap
2. Model Comparison
3. Evaluate Confidence Intervals

-------

## 1. Block-Bootstrap

Bootstrap is a procedure for obtaining datasets that have characteristics similar to the original data but are not identical to it.

**Bootstrap** - random sampling with replacement.
<img src="pictures/bootstrap_illustration.png" width="900" align="left"/>

**Block-bootstrap** is a bootstrap performed over regions of linkage (e.g. genes). Usually 100 bootstrapped datasets are generated.

We will use a custom script that takes the recombination rate as input and performs the bootstrap over regions 0.5 Morgans long. Be careful: this process is very time-consuming!
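
To make the idea concrete, here is a minimal sketch of block resampling in plain Python. It is not the code of `scripts/perform_block_bootstrap.py`: the function name `block_bootstrap` and the toy SNP identifiers are hypothetical, and the blocks are assumed to be already defined (the real script derives them from the data).

```python
import random


def block_bootstrap(blocks, n_replicates, seed=None):
    """Resample whole blocks with replacement (illustrative sketch only).

    blocks: list of blocks, each block being a list of SNP records.
    Returns a list of replicates; each replicate is a flat list of SNPs.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw as many blocks as the original data has, with replacement,
        # so SNPs inside one linkage block always stay together.
        sampled = rng.choices(blocks, k=len(blocks))
        # Flatten the sampled blocks into one list of SNPs.
        replicates.append([snp for block in sampled for snp in block])
    return replicates


# Hypothetical toy data: three linkage blocks of SNP identifiers.
toy_blocks = [["snp1", "snp2"], ["snp3"], ["snp4", "snp5", "snp6"]]
toy_replicates = block_bootstrap(toy_blocks, n_replicates=100, seed=0)
```

Each replicate would then be converted into its own SFS, which is what the `.sfs` files used in the rest of this tutorial contain.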

In [None]:
%%bash
python scripts/perform_block_bootstrap.py

In [None]:
%%bash
#mkdir data/boots
#python scripts/perform_block_bootstrap.py data/clouded_leopard_data.vcf data/popmap data/boots 100 ../2_easySFS_tutorial/easySFS/easySFS.py "-a --unfolded --proj 10"

We have already run this script for you; the resulting `.sfs` files are in the `../3_GADMA_tutorial/data/boots` directory.

In [None]:
%%bash
ls ../3_GADMA_tutorial/data/boots

We can draw several of those SFS:

In [None]:
from scripts.draw_sfs import draw_1d_sfs
draw_1d_sfs("../3_GADMA_tutorial/data/boots/1.sfs")

In [None]:
from scripts.draw_sfs import draw_1d_sfs
draw_1d_sfs("../3_GADMA_tutorial/data/boots/2.sfs")

## 2. Model Comparison

We can compare models with different numbers of parameters (e.g. one-epoch history, two-epoch history, etc.) using AIC or CLAIC. More information is available [here](https://gadma.readthedocs.io/en/latest/user_manual/input_data/input_data.html#unlinked-snps-aic-and-claic).
- AIC can be used if our SNPs are **unlinked**, i.e. independent. This is usually the case for RAD-like data. Remember that `easySFS` offered to choose one SNP per RAD locus? That is exactly how to get a set of unlinked SNPs.
- CLAIC [\[Coffman 2016\]](https://doi.org/10.1093/molbev/msv255) is applied when our SNPs are linked (the general case). To evaluate CLAIC we must provide additional **block-bootstrapped** data. The bootstrap should be performed over regions of linkage (usually genes), as described in the Block-Bootstrap section. Here we simply use the directory with bootstrapped data.
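
As a small illustration of how such a comparison works, the sketch below computes the standard AIC, `2k - 2 ln L`, for two candidate models and picks the one with the lowest score. The log-likelihoods and parameter counts are hypothetical numbers chosen for the example, not GADMA output, and CLAIC itself is not computed here because it additionally needs the bootstrapped data.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits, for illustration only.
candidates = {
    "one_epoch": {"log_likelihood": -1250.0, "n_params": 1},
    "two_epoch": {"log_likelihood": -1200.0, "n_params": 3},
}

scores = {
    name: aic(fit["log_likelihood"], fit["n_params"])
    for name, fit in candidates.items()
}

# The preferred model is the one with the smallest AIC.
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

The extra parameters of the richer model are only worth it when the gain in log-likelihood outweighs the `2k` penalty.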

In [None]:
%%bash
cat gadma_params_files/params_model_comparison

In [None]:
%%bash
rm -rf outputs/gadma_outputs/gadma_model_comparison
gadma -p gadma_params_files/params_model_comparison

[comment]: <Change type of this cell from Raw to Markdown to show the picture> 
Final demographic history (`outputs/gadma_outputs/gadma_model_comparison/best_claic_model.png` file):
<img src="outputs/gadma_outputs/gadma_model_comparison/best_claic_model.png" width="900" align="left"/>

----
## 3. Evaluate Confidence Intervals (CI) for the Final Model

For our final model we want to get confidence intervals for its parameters. We will again use the block-bootstrapped data.

In [None]:
%%bash
# If you want to start from scratch remove the output directory
rm -rf confidence_intervals
gadma-run_ls_on_boot_data -b ../3_GADMA_tutorial/data/boots -d outputs/gadma_outputs/gadma_model_comparison/best_claic_model_moments_code.py\
    -o confidence_intervals --opt log -e moments

In [None]:
%%bash
python scripts/translate_units.py confidence_intervals/result_table.csv 7.33488
cat confidence_intervals/result_table_translated.csv

In [None]:
%%bash
python scripts/translate_units.py

In [None]:
%%bash
gadma-get_confidence_intervals confidence_intervals/result_table_translated.csv
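
To show how a confidence interval can be read off a set of bootstrap estimates, here is a minimal percentile-interval sketch. This is a generic technique and an assumption on our side: it is not necessarily what `gadma-get_confidence_intervals` computes internally, and the function name `percentile_interval` and the toy estimates are hypothetical.

```python
def percentile_interval(values, level=0.95):
    """Percentile confidence interval from bootstrap estimates (sketch)."""
    ordered = sorted(values)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0

    def quantile(q):
        # Linear interpolation between the two closest order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1 - frac) + ordered[hi] * frac

    return quantile(alpha), quantile(1.0 - alpha)


# Hypothetical estimates of one parameter, one value per bootstrap replicate.
toy_estimates = [1000 + 10 * i for i in range(101)]
lower, upper = percentile_interval(toy_estimates)
print(lower, upper)
```

In practice each column of the result table (one parameter, one row per bootstrap replicate) would play the role of `toy_estimates`.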

### Another way to evaluate CIs

There is a method from [\[Coffman 2016\]](https://doi.org/10.1093/molbev/msv255) that also allows us to estimate confidence intervals for the parameters. This method:
- Estimates uncertainty using the Godambe Information Matrix.
- Requires bootstrap data.
- Does NOT require the whole machinery we just used.
- However, has no universal script to run it.
- Provides confidence intervals that can differ from the ones obtained above.

An example script for a specific model and dataset can be found [here](https://github.com/pblischak/inbreeding-sfs/blob/master/data/cabbage/run_cabbage_godambe_3epoch_noF.py). You can adapt it for your model.
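
For reference, the idea behind the Godambe-based estimate can be summarised as follows. This is a sketch of the general composite-likelihood result, with notation chosen here for illustration: $H$ is the expected negative Hessian of the composite log-likelihood (sensitivity matrix) and $J$ is the covariance of its score (variability matrix), which is estimated from the bootstrap data.

$$
G(\theta) = H(\theta)\, J(\theta)^{-1} H(\theta), \qquad
\operatorname{Var}(\hat{\theta}) \approx G(\hat{\theta})^{-1}
$$

The standard error of the $i$-th parameter is then $\sqrt{\left[G(\hat{\theta})^{-1}\right]_{ii}}$, and an approximate 95% confidence interval is $\hat{\theta}_i \pm 1.96$ standard errors. When the SNPs are truly independent, $J = H$ and $G$ reduces to the usual Fisher information.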