# SusieR benchmark

A first, fairly comprehensive set of simulations to study the properties of the new fine-mapping method.

As a first pass, we run `susie` on simulations of 100 genes, with 

- real genotypes, with 1000 variants per gene for starters
- randomly-chosen SNPs as eQTLs
- 1-5 eQTLs per gene
- explaining 5-40% of the variance in Y
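The simulation design above can be sketched for a single gene as follows. This is a hypothetical sketch, not the `simple_lm` DSC module itself: the helper `simulate_y`, the standard normal effect sizes and the Gaussian noise are assumptions. It picks `n_signal` eQTLs at random and sets the residual variance so that the genetic component explains a fraction `pve` of the variance in Y.

```r
# Hypothetical sketch: simulate one gene's expression Y from genotypes X.
# Assumptions (not taken from the DSC module): eQTL effects ~ N(0, 1), Gaussian noise.
simulate_y = function(X, n_signal, pve) {
  p = ncol(X)
  causal = sample(p, n_signal)           # eQTLs chosen uniformly at random
  beta = numeric(p)
  beta[causal] = rnorm(n_signal)
  g = as.vector(X %*% beta)              # genetic component of Y
  # residual variance chosen so that var(g) / (var(g) + sigma2) = pve
  sigma2 = var(g) * (1 - pve) / pve
  y = g + rnorm(nrow(X), sd = sqrt(sigma2))
  list(y = y, beta = beta, causal = causal, sigma2 = sigma2)
}

# Usage with toy 0/1/2 genotypes standing in for the real ones
set.seed(1)
X = matrix(rbinom(500 * 1000, 2, 0.3), nrow = 500)
sim = simulate_y(X, n_signal = 3, pve = 0.2)
```

The realized PVE in a given sample will deviate from `pve` because of noise; only the ratio of variances is fixed by construction.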

We characterize the 95% confidence sets (CS) it produces in terms of

- size: number of variants in the CS
- purity: minimum and mean absolute LD among variants in the CS
- significance: local false sign rate (lfsr) of the CS
- whether or not the CS captures a true signal
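The size, purity and capture criteria can be computed from a CS's variant indices and the genotype matrix. The helper `cs_metrics` below is hypothetical, and the convention that a single-variant CS has purity 1 is an assumption; lfsr is taken from the `susie` fit and is not recomputed here.

```r
# Hypothetical helper: summarize one CS given its variant indices `cs`,
# the genotype matrix X, and the indices of the true causal variants.
cs_metrics = function(cs, X, causal) {
  r = abs(cor(X[, cs, drop = FALSE]))   # pairwise |LD| within the set
  off = r[upper.tri(r)]                 # drop the diagonal (self-LD = 1)
  list(size = length(cs),
       # assumption: a single-variant CS is perfectly pure
       min_ld = if (length(off)) min(off) else 1,
       mean_ld = if (length(off)) mean(off) else 1,
       captures = any(cs %in% causal))  # does the CS contain a true eQTL?
}

# Toy usage: columns 1-3 are in perfect (positive or negative) LD
x1 = c(0, 1, 2, 0, 1, 2, 1, 0)
Xt = cbind(x1, x1, 2 - x1, c(1, 0, 2, 2, 0, 1, 1, 0))
m = cs_metrics(c(1, 2, 3), Xt, causal = 3)
```

Constant genotype columns would make `cor()` return `NA`, so such variants would need to be filtered out beforehand.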

We run `susie` with

- `estimate_residual_variance`: TRUE and FALSE
- `prior_variance`: 0.05, 0.1, 0.2 and 0.4 (5-40%)

We expect the results to be reasonably robust to these choices.
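The settings above form a 2 x 4 grid, i.e. eight `susie` fits per simulated data set. A minimal sketch with `expand.grid`, which varies its first factor fastest; the resulting row order appears consistent with the `fit_susie_1`, `fit_susie_2`, ... suffixes in the `head(out)` output further down, but that correspondence is our inference rather than something DSC guarantees.

```r
# The 2 x 4 grid of susie settings; each row is one fit per data set
settings = expand.grid(estimate_residual_variance = c(FALSE, TRUE),
                       prior_var = c(0.05, 0.1, 0.2, 0.4))
settings
```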

```
%cd ~/GIT/github/mvarbvs/dsc
```

```
/home/gaow/GIT/github/mvarbvs/dsc
```

```bash
dsc susie.dsc --target run_susie
```

## Utility function

```r
# Combine per-gene summaries of the 95% CSs from the fit_susie and plot_susie outputs.
#   sub:         rows of the dscquery table for one simulation scenario
#   dirname:     DSC output directory (with trailing slash)
#   ld_col:      column of the purity table to use (1 = LD_Min)
#   lfsr_cutoff: CSs with lfsr below this cutoff are counted as detections
get_combined = function(sub, dirname, ld_col, lfsr_cutoff = 0.05) {
    out_files = sub[, c("fit_susie.output.file", "plot_susie.output.file")]
    combined = list(purity = NULL, lfsr = NULL, size = NULL, captures = NULL)
    for (i in seq_len(nrow(out_files))) {
        fit = readRDS(paste0(dirname, out_files[i, 1], '.rds'))$posterior
        purity = readRDS(paste0(dirname, out_files[i, 2], '.rds'))
        # one column per data set; cbind(NULL, x) starts a new matrix
        combined$purity = cbind(combined$purity, purity$purity$V1[, ld_col])
        combined$size = cbind(combined$size, fit$n_in_CI[, 1])
        combined$lfsr = cbind(combined$lfsr, fit$set_lfsr[, 1])
        # for each true signal, count the significant CSs that contain it
        sig = which(fit$set_lfsr[, 1] < lfsr_cutoff)
        detected = colSums(purity$signal$V1[sig, , drop = FALSE])
        if (is.null(combined$captures)) combined$captures = detected
        else combined$captures = combined$captures + detected
    }
    return(combined)
}
```
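The `captures` tally in `get_combined` can be illustrated on toy inputs. We assume here, from how the code indexes it, that `purity$signal$V1` is a 0/1 matrix with one row per CS and one column per true signal; the values below are made up. Only CSs with lfsr below the cutoff contribute, and within one gene a count above 1 means several significant CSs contain the same signal.

```r
# Toy version of purity$signal$V1: rows = CSs, columns = true signals (assumed layout)
signal = rbind(c(1, 0),   # CS 1 contains signal 1
               c(0, 1),   # CS 2 contains signal 2
               c(1, 0))   # CS 3 also contains signal 1
set_lfsr = c(0.01, 0.30, 0.02)   # CS 2 is not significant
lfsr_cutoff = 0.05
detected = colSums(signal[set_lfsr < lfsr_cutoff, , drop = FALSE])
detected
```

Here CSs 1 and 3 pass the cutoff, so signal 1 is counted twice and signal 2 not at all.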

## Results

```r
out = dscrutils::dscquery('benchmark',
                          target = "liter_data.data_file simple_lm.pve simple_lm.n_signal fit_susie.estimate_residual_variance fit_susie.prior_var fit_susie plot_susie")
```

```
Loading dsc-query output from CSV file.
```


```r
head(out)
```

| DSC | liter_data.data_file | simple_lm.pve | simple_lm.n_signal | fit_susie.estimate_residual_variance | fit_susie.prior_var | fit_susie.output.file | plot_susie.output.file |
|---|---|---|---|---|---|---|---|
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | False | 0.05 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_1` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_1_plot_susie_1` |
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | False | 0.1 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_3` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_3_plot_susie_1` |
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | False | 0.2 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_5` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_5_plot_susie_1` |
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | False | 0.4 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_7` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_7_plot_susie_1` |
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | True | 0.05 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_2` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_2_plot_susie_1` |
| 1 | `~/Documents/GTExV8/Thyroid.Lung.FMO2.filled.rds` | 0.05 | 1 | True | 0.1 | `fit_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_4` | `plot_susie/liter_data_1_summarize_ld_1_simple_lm_1_fit_susie_4_plot_susie_1` |


That is a lot of results to look at. Here we focus on a PVE of 20% and 40% with 2, 3 or 5 signals, with `estimate_residual_variance` set to `FALSE` and the prior variance set to `0.2`. We measure purity as the minimum absolute LD, min(abs(LD)), among variants in a CS.
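The six focus scenarios (two PVE values times three signal counts) can be selected from the `dscquery` table in one loop. The helper `select_scenario` and the toy table `out_toy` below are hypothetical; wrapping the residual-variance column in `as.logical()` is a precaution in case it is read as the strings `"True"`/`"False"`, which `as.logical()` converts but which would not compare equal to `FALSE` directly.

```r
# Hypothetical helper: rows of a dscquery-style table for one scenario
select_scenario = function(out, pve, n, est_res = FALSE, prior = 0.2) {
  keep = out$simple_lm.pve == pve &
    out$simple_lm.n_signal == n &
    as.logical(out$fit_susie.estimate_residual_variance) == est_res &
    out$fit_susie.prior_var == prior
  out[which(keep), , drop = FALSE]
}

# The focus grid: PVE 20% and 40%, with 2, 3 or 5 signals
scenarios = expand.grid(pve = c(0.2, 0.4), n_signal = c(2, 3, 5))

# Toy table standing in for the real query output
out_toy = data.frame(simple_lm.pve = c(0.2, 0.2, 0.4),
                     simple_lm.n_signal = c(2, 2, 3),
                     fit_susie.estimate_residual_variance = c("False", "True", "False"),
                     fit_susie.prior_var = c(0.2, 0.2, 0.2))
counts = sapply(seq_len(nrow(scenarios)), function(k)
  nrow(select_scenario(out_toy, scenarios$pve[k], scenarios$n_signal[k])))
```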

### 2 signals + 20% PVE

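```r
pve = 0.2          # proportion of variance in Y explained by the eQTLs
n = 2              # number of simulated eQTLs per gene
est_res = FALSE    # estimate_residual_variance setting
prior = 0.2        # prior variance setting
ld_col = 1         # LD_Min: purity measured as min(abs(LD))
lfsr_cutoff = 0.05
dirname = 'benchmark/'
# as.logical() also handles "True"/"False" strings, in case the column is read as text
sub = out[which(out$simple_lm.pve == pve & out$simple_lm.n_signal == n &
                as.logical(out$fit_susie.estimate_residual_variance) == est_res &
                out$fit_susie.prior_var == prior), ]
```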

```r
combined = get_combined(sub, dirname, ld_col)
combined
```
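A possible next step is to reduce `combined` to a few numbers per scenario. The sketch below assumes that `size`, `lfsr` and `purity` are matrices of the same shape (one row per CS slot, one column per data set), as built by `get_combined`; the helper `summarize_combined` and the toy values are hypothetical.

```r
# Hypothetical summary of get_combined() output, restricted to significant CSs
summarize_combined = function(combined, lfsr_cutoff = 0.05) {
  sig = combined$lfsr < lfsr_cutoff   # same shape as size and purity
  list(n_cs = sum(sig, na.rm = TRUE),
       median_size = median(combined$size[sig], na.rm = TRUE),
       median_purity = median(combined$purity[sig], na.rm = TRUE),
       captures = combined$captures)
}

# Toy input: 2 CS slots x 2 data sets
toy = list(size = cbind(c(3, 10), c(1, 50)),
           lfsr = cbind(c(0.01, 0.5), c(0.02, 0.9)),
           purity = cbind(c(0.9, 0.1), c(1, 0.05)),
           captures = c(2, 0))
s = summarize_combined(toy)
```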