# M&M ASH benchmark VI

This is a continuation of Part V, where I set total PVE to 0.1 and assume 2 causal variables per region. I have added an evaluation of lfsr.

The most important difference from previous simulations is that here I mix and match: data simulated under each prior assumption are analyzed with each of the different priors. I expect to observe that:

1. The "oracle" prior is always better than using other priors, for all scenarios.
2. Mixture prior generally performs well in all scenarios -- it is robust to simulation assumptions.

## Conclusion

1. The expected observations above are both true, with some interesting exceptions:
    - The "oracle" mixture prior is not better than using the mixture prior on some other scenarios
    - The singleton oracle is bad
2. Power table: model mis-specification results in overlaps, but there is no overlap issue with the mixture model
3. Overlaps in singleton results are prevalent, as expected
4. The mixture prior has great FDR control on CS
5. The mixture prior has the best lfsr control on effect estimates; the shared and singleton priors have poor control; even low_het is better than shared.

The benchmark was executed on UChicago midway:

```
./finemap.dsc --host mnm_R5.yml --R 5
```

This executes the `default` pipeline in the `finemap.dsc` file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc

/scratch/midway2/gaow/GIT/github/mnm-twas/dsc

In [2]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.included_causal susie_scores.overlap susie_scores.false_discoveries susie_scores.total_discoveries", omit.file.columns = TRUE, verbose = FALSE)
end_time <- Sys.time()

In [3]:
end_time - start_time

Time difference of 3.255534 mins

In [4]:
head(out)

DSC,sharing_pattern,mnm,mnm.eff_mode,susie_scores.total,susie_scores.valid,susie_scores.size,susie_scores.purity,susie_scores.top,susie_scores.n_causal,susie_scores.included_causal,susie_scores.overlap,susie_scores.false_discoveries,susie_scores.total_discoveries
1,identity,mnm_identity,identity,1,1,7,0.9998415,0,1,1,0,0,0
1,identity,mnm_identity,identity,2,1,9,0.7843589,1,1,1,0,0,5
1,identity,mnm_identity,identity,1,1,6,0.9809961,0,1,1,0,0,0
1,identity,mnm_identity,identity,1,1,1,1.0,1,1,1,0,0,5
1,identity,mnm_identity,identity,1,1,1,1.0,1,1,1,0,0,5
1,identity,mnm_identity,identity,2,2,7,0.9851348,1,2,2,0,0,0


In [5]:
dim(out)

In [6]:
res = out[,c(2,4,5,6,7,8,9,10,11,12,13,14)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'total_true_included', 'overlap', 'false_discoveries', 'total_discoveries')
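
The cell above picks columns by position, which silently breaks if the query's column order changes. A minimal sketch of selecting and renaming by name instead is given here; `out_demo`, `keep`, and `res_demo` are hypothetical names, and the toy data frame only mimics a few of the column names shown in the `head(out)` output.

```r
# Toy stand-in for the dscquery() result; only the column names matter here.
out_demo <- data.frame(
  DSC = 1,
  sharing_pattern = "identity",
  mnm = "mnm_identity",
  mnm.eff_mode = "identity",
  susie_scores.total = 1,
  susie_scores.purity = 0.99
)

# Map short names (vector names) to the source column names (vector values).
keep <- c(pattern = "sharing_pattern",
          method  = "mnm.eff_mode",
          total   = "susie_scores.total",
          purity  = "susie_scores.purity")

# Select by name, then rename; an unknown name raises an error instead of
# quietly picking the wrong column.
res_demo <- out_demo[, unname(keep)]
colnames(res_demo) <- names(keep)
res_demo
```

The same mapping could be extended to the remaining `susie_scores.*` columns.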

### Purity of CS

In [7]:
purity = aggregate(purity~pattern + method, res, mean)
purity

pattern,method,purity
high_het,high_het,0.9859257
identity,high_het,0.9854762
low_het,high_het,0.982859
mid_het,high_het,0.9835136
mixture01,high_het,0.9431239
shared,high_het,0.9847228
singleton,high_het,0.8170618
high_het,identity,0.9855643
identity,identity,0.9853064
low_het,identity,0.9828271


In [8]:
aggregate(purity~method, purity, mean)

method,purity
high_het,0.954669
identity,0.955127
low_het,0.9469669
mid_het,0.953836
mixture_1,0.9535649
shared,0.8136527
singleton,0.8848792


### Size of CS

In [9]:
size = aggregate(size~pattern+method, res, median)
size

pattern,method,size
high_het,high_het,3.0
identity,high_het,3.5
low_het,high_het,3.0
mid_het,high_het,3.25
mixture01,high_het,4.0
shared,high_het,2.0
singleton,high_het,10.0
high_het,identity,3.0
identity,identity,3.5
low_het,identity,3.0


In [10]:
aggregate(size~method, size, mean)

method,size
high_het,4.107143
identity,4.0
low_het,3.785714
mid_het,4.142857
mixture_1,4.071429
shared,3.071429
singleton,9.357143


### Power of CS

In [11]:
total_true_included = aggregate(total_true_included ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(total_true_included, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$total_true_included/power$total_true
power = power[order(power$method),]
power

Unnamed: 0,pattern,method,total_true_included,total_true,overlap,power
1,high_het,high_het,251,272,0.32,0.9227941
8,identity,high_het,232,249,0.0,0.9317269
15,low_het,high_het,235,265,0.0,0.8867925
22,mid_het,high_het,254,274,0.05333333,0.9270073
29,mixture01,high_het,212,252,0.0,0.8412698
36,shared,high_het,224,247,0.0,0.9068826
43,singleton,high_het,161,266,0.0,0.6052632
2,high_het,identity,252,272,0.31333333,0.9264706
9,identity,identity,231,249,1.43333333,0.9277108
16,low_het,identity,235,265,0.0,0.8867925


In [12]:
aggregate(power~method, power, mean)

method,power
high_het,0.8602481
identity,0.8618861
low_het,0.8581447
mid_het,0.8607923
mixture_1,0.850561
shared,0.740021
singleton,0.8160266
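
Note that the per-method summary averages the per-scenario power values, which is not the same as pooling the counts across scenarios. The sketch below contrasts the two for the `high_het` method, using the seven rows displayed in the power table above; the variable names are my own.

```r
# Counts copied from the seven rows shown above for method == "high_het".
high_het <- data.frame(
  pattern = c("high_het", "identity", "low_het", "mid_het",
              "mixture01", "shared", "singleton"),
  total_true_included = c(251, 232, 235, 254, 212, 224, 161),
  total_true          = c(272, 249, 265, 274, 252, 247, 266)
)

# Mean of per-scenario ratios: what aggregate(power ~ method, power, mean) reports.
mean_of_ratios <- mean(high_het$total_true_included / high_het$total_true)

# Pooled ratio: all captured causal variables over all causal variables.
pooled <- sum(high_het$total_true_included) / sum(high_het$total_true)

round(c(mean_of_ratios = mean_of_ratios, pooled = pooled), 4)
```

The mean of ratios reproduces the 0.8602 reported for `high_het`, while the pooled ratio is 1569/1825, about 0.8597. The two are close here because the scenarios have similar numbers of causal variables.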


### FDR of CS

In [13]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr = fdr[order(fdr$method),]
fdr

Unnamed: 0,pattern,method,valid,total,fdr
1,high_het,high_het,252,274,0.08029197
8,identity,high_het,232,249,0.06827309
15,low_het,high_het,231,260,0.11153846
22,mid_het,high_het,252,273,0.07692308
29,mixture01,high_het,211,238,0.11344538
36,shared,high_het,221,247,0.10526316
43,singleton,high_het,153,166,0.07831325
2,high_het,identity,253,273,0.07326007
9,identity,identity,232,250,0.072
16,low_het,identity,231,260,0.11153846


In [14]:
aggregate(fdr~method, fdr, mean)

method,fdr
high_het,0.09057834
identity,0.09238118
low_het,0.0777091
mid_het,0.0851832
mixture_1,0.05784747
shared,0.05375328
singleton,0.05852513


### lfsr for effect size estimates

In [21]:
invalid = aggregate(false_discoveries ~ pattern + method, res, sum)
total = aggregate(total_discoveries ~ pattern + method, res, sum)
lfsr = merge(invalid, total, by = c("pattern", "method"))
lfsr$lfsr = lfsr$false_discoveries/lfsr$total_discoveries
lfsr = lfsr[order(lfsr$method),]
lfsr = lfsr[which(lfsr$pattern != 'singleton'),]
lfsr

Unnamed: 0,pattern,method,false_discoveries,total_discoveries,lfsr
1,high_het,high_het,35,490,0.07142857
8,identity,high_het,20,385,0.05194805
15,low_het,high_het,24,434,0.05529954
22,mid_het,high_het,10,400,0.025
29,mixture01,high_het,31,329,0.09422492
36,shared,high_het,15,450,0.03333333
2,high_het,identity,30,485,0.06185567
9,identity,identity,20,385,0.05194805
16,low_het,identity,24,429,0.05594406
23,mid_het,identity,10,400,0.025


In [22]:
aggregate(lfsr~method, lfsr, mean)

method,lfsr
high_het,0.05520574
identity,0.05396002
low_het,0.04839321
mid_het,0.05504637
mixture_1,0.03955569
shared,0.08009451
singleton,0.17266583
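
The power, FDR, and lfsr cells above all follow the same pattern: sum two count columns within each (pattern, method) group, then take a ratio. A minimal sketch of factoring out the summation step follows; `sum_by_group` is a hypothetical helper, not part of the notebook or of `dscrutils`, and the `toy` rows are made up for illustration.

```r
# Hypothetical helper (not part of the notebook): sum the requested count
# columns within each group defined by the columns named in `by`.
sum_by_group <- function(df, cols, by = c("pattern", "method")) {
  aggregate(df[, cols, drop = FALSE], by = df[, by, drop = FALSE], FUN = sum)
}

# Made-up per-replicate rows with the same column names as `res`.
toy <- data.frame(
  pattern = c("a", "a", "b", "b"),
  method  = c("m1", "m1", "m1", "m1"),
  valid   = c(1, 2, 1, 0),
  total   = c(1, 2, 2, 1)
)

# FDR of CS: fraction of reported CS that are not valid.
fdr_toy <- sum_by_group(toy, c("valid", "total"))
fdr_toy$fdr <- (fdr_toy$total - fdr_toy$valid) / fdr_toy$total
fdr_toy
```

A single `aggregate` call over both columns avoids the separate `merge` step, since the sums already share one row per group.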


### Power for effect size estimates

Open question: should this be defined as the total number of true discoveries over the total number of signals to detect?