# M&M ASH benchmark V

This is a continuation of Part V, where total PVE is set to 0.15 and each region is assumed to have 2 causal variables. Here, however, the two causal SNPs have the same effects, sampled from the multivariate distribution. I also use $R = 5$ conditions and run the benchmark with $J=1000$ and 150 genes.
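To make the setup concrete, here is a minimal sketch of simulating one region under these settings. It is not the code in `finemap.dsc`. The sample size `N`, the independent genotypes, the identity effect covariance `U`, and the reading of "total PVE" as a per-condition target are all assumptions made for illustration.

```r
set.seed(1)
N = 200; J = 1000; R = 5; pve = 0.15   # N is an assumed sample size
# Independent genotypes (no LD): a simplification of a real region
X = matrix(rbinom(N * J, 2, 0.3), N, J)
U = diag(R)                            # assumed effect covariance across conditions
b = drop(t(chol(U)) %*% rnorm(R))      # one draw from N(0, U)
causal = sample(J, 2)                  # 2 causal SNPs per region
B = matrix(0, J, R)
B[causal, ] = rbind(b, b)              # both causal SNPs share the same effects
G = X %*% B                            # genetic component, N x R
# Per-condition residual variance chosen so that var(G) / (var(G) + sigma2) = pve
sigma2 = apply(G, 2, var) * (1 - pve) / pve
E = sapply(sigma2, function(s) rnorm(N, sd = sqrt(s)))
Y = G + E                              # simulated phenotypes, N x R
```

Swapping `U` for a different covariance structure would be one way to mimic the different sharing patterns compared below.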

The most important difference from previous simulations is that here I mix and match: data simulated under different prior assumptions are analyzed with different priors. I expect to observe that:

1. Using the "oracle" prior (the one the data were simulated under) is always better than using other priors, in all scenarios.
2. The mixture prior generally performs well in all scenarios; that is, it is robust to simulation assumptions.

## Conclusion

...

The benchmark was executed on UChicago Midway:

```
./finemap.dsc --host mnm_R5.yml --R 5
```

This executes the `default` pipeline in the `finemap.dsc` file, as of today (2019.02.04).

In [1]:
%cd ~/GIT/github/mnm-twas/dsc

/scratch/midway2/gaow/GIT/github/mnm-twas/dsc

In [2]:
start_time <- Sys.time()
library('dscrutils')
out = dscquery('finemap_output', "sharing_pattern mnm.eff_mode susie_scores.total susie_scores.valid susie_scores.size susie_scores.purity susie_scores.top susie_scores.n_causal susie_scores.overlap", omit.file.columns = TRUE)
end_time <- Sys.time()

Loading dsc-query output from CSV file.
Reading DSC outputs:
 - susie_scores.total: extracted atomic values
 - susie_scores.valid: extracted atomic values
 - susie_scores.size: extracted atomic values
 - susie_scores.purity: extracted atomic values
 - susie_scores.top: extracted atomic values
 - susie_scores.n_causal: extracted atomic values
 - susie_scores.overlap: extracted atomic values


In [3]:
end_time - start_time

Time difference of 2.014167 mins

In [4]:
head(out)

DSC,sharing_pattern,mnm,mnm.eff_mode,susie_scores.total,susie_scores.valid,susie_scores.size,susie_scores.purity,susie_scores.top,susie_scores.n_causal,susie_scores.overlap
1,identity,mnm_identity,identity,2,2,3.5,1.0,1,2,0
1,identity,mnm_identity,identity,1,1,1.0,1.0,1,1,0
1,identity,mnm_identity,identity,1,1,9.0,0.9886828,0,1,0
1,identity,mnm_identity,identity,1,1,10.0,1.0,0,1,0
1,identity,mnm_identity,identity,1,1,13.0,0.9875783,0,1,0
1,identity,mnm_identity,identity,1,1,4.0,0.9960325,0,1,0


In [6]:
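# Keep the simulated sharing pattern (column 2), the prior used in the analysis (column 4),
# and the SuSiE score columns (5 to 11), then give them shorter names.
res = out[,c(2,4,5,6,7,8,9,10,11)]
colnames(res) = c('pattern', 'method', 'total', 'valid', 'size', 'purity', 'top_hit', 'total_true', 'overlap')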

### Purity of credible sets (CS)

In [7]:
aggregate(purity~pattern + method, res, mean)

pattern,method,purity
high_het,high_het,0.9916862
identity,high_het,0.9917146
low_het,high_het,0.9884244
mid_het,high_het,0.9923806
mixture01,high_het,0.9757964
shared,high_het,0.9904957
singleton,high_het,0.9241387
high_het,identity,0.9913949
identity,identity,0.9917
low_het,identity,0.9886923


### Size of CS

In [8]:
aggregate(size~pattern+method, res, median)

pattern,method,size
high_het,high_het,2.25
identity,high_het,3.0
low_het,high_het,2.0
mid_het,high_het,3.0
mixture01,high_het,4.0
shared,high_het,2.5
singleton,high_het,6.25
high_het,identity,2.25
identity,identity,3.0
low_het,identity,2.0


### Power

In [11]:
valid = aggregate(valid ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
overlap = aggregate(overlap ~ pattern + method, res, mean)
power = merge(valid, total_true, by = c("pattern", "method"))
power = merge(power, overlap,  by = c("pattern", "method"))
power$power = power$valid/power$total_true
power

pattern,method,valid,total_true,overlap,power
high_het,high_het,238,250,7.04,0.952
high_het,identity,240,250,6.88,0.96
high_het,low_het,241,250,3.12666667,0.964
high_het,mid_het,241,250,2.65333333,0.964
high_het,mixture_1,235,250,0.0,0.94
high_het,shared,264,250,60.69333333,1.056
high_het,singleton,683,250,209.06,2.732
identity,high_het,254,266,0.19333333,0.9548872
identity,identity,255,266,0.19333333,0.9586466
identity,low_het,255,266,0.36666667,0.9586466
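
Written out, the quantity computed in the cell above for each (pattern, method) pair is

$$
\text{power} = \frac{\sum_{g} v_g}{\sum_{g} t_g},
$$

where $g$ indexes the rows of `res` in that group, $v_g$ is the row's `valid` count and $t_g$ its `total_true` count. For the first row of the table this gives $238 / 250 = 0.952$. As computed, the ratio is not bounded above by 1: it exceeds 1 whenever the summed `valid` count is larger than the summed `total_true` count, as in the row with pattern `high_het` and method `singleton` ($683 / 250 = 2.732$).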


### FDR

In [12]:
valid = aggregate(valid ~ pattern + method, res, sum)
total = aggregate(total ~ pattern + method, res, sum)
fdr = merge(valid, total, by = c("pattern", "method"))
fdr$fdr = (fdr$total - fdr$valid)/fdr$total
fdr

pattern,method,valid,total,fdr
high_het,high_het,238,259,0.08108108
high_het,identity,240,260,0.07692308
high_het,low_het,241,257,0.06225681
high_het,mid_het,241,259,0.06949807
high_het,mixture_1,235,245,0.04081633
high_het,shared,264,267,0.01123596
high_het,singleton,683,726,0.05922865
identity,high_het,254,271,0.06273063
identity,identity,255,272,0.0625
identity,low_het,255,269,0.05204461
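
The power and FDR cells follow the same aggregate-then-merge pattern. As a sketch of an alternative, the sums can be taken in a single `aggregate` call with `cbind` on the left-hand side of the formula. The toy data frame below is invented for illustration and is not benchmark output.

```r
# Toy per-row results; the values are made up for illustration only.
toy = data.frame(
  pattern    = c("A", "A", "B", "B"),
  method     = c("m1", "m1", "m1", "m1"),
  total      = c(2, 1, 3, 1),
  valid      = c(2, 1, 2, 1),
  total_true = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)
# Sum all three count columns per (pattern, method) in one call.
agg = aggregate(cbind(total, valid, total_true) ~ pattern + method, toy, sum)
# Same definitions as in the cells above.
agg$fdr   = (agg$total - agg$valid) / agg$total
agg$power = agg$valid / agg$total_true
agg
```

One difference to keep in mind: with the formula interface, a row that has `NA` in any of the three columns is dropped from all three sums, whereas separate `aggregate` calls drop missing values column by column.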


### Top-hit rate (how often the strongest SNP is causal)

In [20]:
top_hit = aggregate(top_hit ~ pattern + method, res, sum)
total_true = aggregate(total_true ~ pattern + method, res, sum)
top_rate = merge(top_hit, total_true, by = c("pattern", "method"))
top_rate$top_rate = top_rate$top_hit/top_rate$total_true
top_rate

pattern,method,top_hit,total_true,top_rate
high_het,high_het,174,300,0.58
high_het,identity,174,300,0.58
high_het,low_het,171,300,0.57
high_het,mid_het,172,300,0.57333333
high_het,mixture_1,164,300,0.54666667
high_het,shared,99,300,0.33
high_het,singleton,404,300,1.34666667
identity,high_het,174,300,0.58
identity,identity,174,300,0.58
identity,low_het,174,300,0.58
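
To check the first expectation stated at the top (the oracle prior does best), one option is to compare each row of a summary table against the row where `method` equals `pattern`. The sketch below uses an invented toy table and a hypothetical column name `diff_vs_oracle`. In the tables above the mixture case appears as `mixture01` in `pattern` but `mixture_1` in `method`, so a plain equality test would likely need a name mapping for that case.

```r
# Toy summary table; the numbers are invented for illustration only.
toy = data.frame(
  pattern = c("identity", "identity", "shared", "shared"),
  method  = c("identity", "shared", "identity", "shared"),
  power   = c(0.95, 0.90, 0.88, 0.93),
  stringsAsFactors = FALSE
)
# A row is "oracle" when the analysis prior matches the simulation pattern.
toy$oracle = toy$pattern == toy$method
# Look up the oracle power for each pattern by name.
oracle_power = setNames(toy$power[toy$oracle], toy$pattern[toy$oracle])
# Negative values mean the prior did worse than the oracle for that pattern.
toy$diff_vs_oracle = unname(toy$power - oracle_power[toy$pattern])
toy
```

The same comparison could be applied to the `fdr` or `top_rate` columns in place of `power`.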
