# M&M ASH benchmark V

This is a continuation of Part V, where total PVE is set to 0.15 and I assume 2 causal variables per region. Here, however, the two SNPs have the same effects, sampled from the multivariate distribution. I also use $R = 5$ conditions and run the benchmark with $J=1000$ and 150 genes.

The most important difference from previous simulations is that here I mix and match: data simulated under each prior assumption are analyzed with each of the different priors. I expect to observe that:

1. The "oracle" prior (the prior the data were simulated under) is always better than any other prior, in all scenarios.
2. The mixture prior generally performs well in all scenarios; that is, it is robust to simulation assumptions.

## Conclusion

...

The benchmark was executed on UChicago Midway:

```
./finemap.dsc --host mnm_R5.yml --R 5
```

This executes the `default` pipeline in the `finemap.dsc` file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc

/scratch/midway2/gaow/GIT/github/mnm-twas/dsc

In [None]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap", omit.file.columns = TRUE, verbose = FALSE)
end_time <- Sys.time()

In [None]:
saveRDS(out, 'finemap_output/benchmark_v.rds')

In [4]:
# out = readRDS('finemap_output/benchmark_v.rds')
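
The save and reload cells above amount to a simple cache: run the slow query once, then read the RDS file on later runs. A minimal sketch of that pattern follows. `cached_rds` is a hypothetical helper (it is not part of `dscrutils`), and the demo uses a temporary file and a stand-in computation rather than the real `dscquery` call.

```r
# Hypothetical helper: compute once, then reuse the saved RDS file.
cached_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: skip the slow computation
  }
  value <- compute()        # cache miss: run the (slow) query
  saveRDS(value, path)
  value
}

# Demo with a stand-in computation and a temporary file.
calls <- 0
slow_query <- function() {
  calls <<- calls + 1       # count how many times the query really runs
  data.frame(x = 1:3)
}
tmp <- tempfile(fileext = ".rds")
first  <- cached_rds(tmp, slow_query)  # runs slow_query and writes tmp
second <- cached_rds(tmp, slow_query)  # reads tmp; slow_query is not called again
```

In this notebook, `compute` would wrap the `dscquery` call and `path` would be `'finemap_output/benchmark_v.rds'`.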

In [3]:
end_time - start_time

Time difference of 2.014167 mins

In [5]:
head(out)

DSC,sharing_pattern,mnm,mnm.eff_mode,susie_scores.total,susie_scores.valid,susie_scores.size,susie_scores.purity,susie_scores.top,susie_scores.n_causal,susie_scores.included_causal,susie_scores.overlap
1,identity,mnm_identity,identity,1,1,1.0,1.0,1,1,1,0
1,identity,mnm_identity,identity,2,2,3.5,0.9813743,1,2,2,0
1,identity,mnm_identity,identity,1,1,2.0,1.0,1,1,1,0
1,identity,mnm_identity,identity,1,1,6.0,1.0,0,1,1,0
1,identity,mnm_identity,identity,1,1,17.0,0.9615369,0,1,1,0
1,identity,mnm_identity,identity,3,2,3.0,0.9806151,1,2,2,0


In [6]:
res = out[,c(2,4,5,6,7,8,9,10,11,12)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap')
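
Selecting columns by position, as in the cell above, silently breaks if the query returns columns in a different order. One possible alternative is to select and rename by name. The sketch below uses a toy stand-in for `out` (two rows taken from the `head(out)` output, with only a few columns); `out_demo`, `rename_map`, and `res_demo` are names introduced here for illustration.

```r
# Toy stand-in for `out`: a few of its columns, first two rows of head(out).
out_demo <- data.frame(
  DSC = c(1, 1),
  sharing_pattern = c("identity", "identity"),
  mnm.eff_mode = c("identity", "identity"),
  susie_scores.size = c(1.0, 3.5),
  stringsAsFactors = FALSE
)

# Map from original column name (names) to short name (values).
rename_map <- c(
  sharing_pattern   = "pattern",
  mnm.eff_mode      = "method",
  susie_scores.size = "size"
)

res_demo <- out_demo[, names(rename_map)]   # select by name, in map order
colnames(res_demo) <- unname(rename_map)    # apply the short names
```

The full map would list all ten columns kept in `res`, in the same old-name to new-name form.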

### Purity of CS

In [7]:
aggregate(purity~pattern + method, res, mean)

pattern,method,purity
high_het,high_het,0.9861015
identity,high_het,0.9872195
low_het,high_het,0.9836635
mid_het,high_het,0.9868067
mixture01,high_het,0.953427
shared,high_het,0.9825523
singleton,high_het,0.842797
high_het,identity,0.9847402
identity,identity,0.9870663
low_het,identity,0.9827262


### Size of CS

In [8]:
aggregate(size~pattern+method, res, median)

pattern,method,size
high_het,high_het,3.5
identity,high_het,3.0
low_het,high_het,4.0
mid_het,high_het,3.0
mixture01,high_het,4.5
shared,high_het,4.0
singleton,high_het,9.0
high_het,identity,3.5
identity,identity,3.0
low_het,identity,4.0


### Power

In [15]:
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power

pattern,method,total_true_included,total_true,overlap,power
high_het,high_het,249,272,1.46,0.9154412
high_het,identity,250,272,1.4933333,0.9191176
high_het,low_het,248,272,0.0,0.9117647
high_het,mid_het,249,272,1.4733333,0.9154412
high_het,mixture_1,241,272,0.0,0.8860294
high_het,shared,206,272,9.4,0.7573529
high_het,singleton,235,272,209.5333333,0.8639706
identity,high_het,228,244,0.0,0.9344262
identity,identity,228,244,0.0,0.9344262
identity,low_het,226,244,0.0,0.9262295
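
To check the first expectation (the oracle prior is best), it helps to flag the row where the analysis prior matches the simulation pattern and compare every other prior against it. The sketch below does this for the seven `high_het` rows shown above, with counts copied from that table; `power_demo` and the added columns are names introduced here, and the single-pattern shortcut would need a per-pattern merge to cover the full `power` table.

```r
# Rows copied from the power table above (pattern == "high_het" only).
power_demo <- data.frame(
  pattern = "high_het",
  method  = c("high_het", "identity", "low_het", "mid_het",
              "mixture_1", "shared", "singleton"),
  total_true_included = c(249, 250, 248, 249, 241, 206, 235),
  total_true = 272,
  stringsAsFactors = FALSE
)
power_demo$power <- power_demo$total_true_included / power_demo$total_true

# The "oracle" row is the one analyzed with the prior used to simulate it.
power_demo$oracle <- power_demo$pattern == power_demo$method

# Difference in power relative to the oracle prior (one pattern, so one value).
oracle_power <- power_demo$power[power_demo$oracle]
power_demo$diff_vs_oracle <- power_demo$power - oracle_power

# Show priors from highest to lowest power.
power_demo[order(-power_demo$power), c("method", "power", "diff_vs_oracle")]
```

In these rows the `identity` prior captures one more signal than the oracle `high_het` prior (250 versus 249 of 272), so "always better" does not hold strictly on this slice, though the gap is a single signal.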


### FDR

In [16]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr

pattern,method,valid,total,fdr
high_het,high_het,250,270,0.07407407
high_het,identity,251,271,0.07380074
high_het,low_het,248,265,0.06415094
high_het,mid_het,250,268,0.06716418
high_het,mixture_1,240,258,0.06976744
high_het,shared,220,227,0.030837
high_het,singleton,616,668,0.07784431
identity,high_het,227,242,0.06198347
identity,identity,227,242,0.06198347
identity,low_het,225,239,0.05857741
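
For reference, the quantities computed in the Power and FDR cells can be written as below, with sums taken over all rows $i$ of `res` in a given (pattern, method) group. The notation is shorthand introduced here and simply mirrors the column names.

$$
\text{power} = \frac{\sum_i \text{total\_true\_included}_i}{\sum_i \text{total\_true}_i},
\qquad
\text{FDR} = \frac{\sum_i \left(\text{total}_i - \text{valid}_i\right)}{\sum_i \text{total}_i}
$$

For example, the first FDR row (`high_het`, `high_het`) gives $(270 - 250)/270 \approx 0.0741$, matching the table.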


### Top-hit rate (how often the strongest SNP is causal)

In [17]:
top_hit = aggregate(top_hit ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
top_rate = merge(top_hit, total_true, by = c("pattern", "method"))
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate

pattern,method,top_hit,total_true,top_rate
high_het,high_het,139,272,0.51102941
high_het,identity,138,272,0.50735294
high_het,low_het,138,272,0.50735294
high_het,mid_het,140,272,0.51470588
high_het,mixture_1,129,272,0.47426471
high_het,shared,112,272,0.41176471
high_het,singleton,262,272,0.96323529
identity,high_het,135,244,0.55327869
identity,identity,135,244,0.55327869
identity,low_het,135,244,0.55327869
