/
function.Rmd
208 lines (151 loc) · 9.16 KB
/
function.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
title: "Function Reference"
author: Xiang Zhu
output: workflowr::wflow_html
editor_options:
chunk_output_type: console
---
[Zhu and Stephens (2017)]: https://projecteuclid.org/euclid.aoas/1507168840
[Zhu and Stephens (2018)]: https://www.nature.com/articles/s41467-018-06805-x
[`rss_bvsr.m`]: https://github.com/stephenslab/rss/blob/master/src/rss_bvsr.m
[(Guan and Stephens, 2011)]: https://projecteuclid.org/euclid.aoas/1318514285
[Guan and Stephens (2011)]: https://projecteuclid.org/euclid.aoas/1318514285
[`rss_bslmm.m`]: https://github.com/stephenslab/rss/blob/master/src/rss_bslmm.m
[(Zhou et al, 2013)]: https://doi.org/10.1371/journal.pgen.1003264
[Zhou et al (2013)]: https://doi.org/10.1371/journal.pgen.1003264
[`rss_ash.m`]: https://github.com/stephenslab/rss/blob/master/src/rss_ash.m
[(Stephens, 2017)]: https://doi.org/10.1093/biostatistics/kxw041
[Carbonetto and Stephens (2012)]: https://projecteuclid.org/euclid.ba/1339616726
[`rss_varbvsr.m`]: https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr.m
[`rss_varbvsr_squarem.m`]: https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_squarem.m
[(Varadhan and Roland, 2008)]: https://doi.org/10.1111/j.1467-9469.2007.00585.x
[`rss_varbvsr_bigmem_squarem.m`]: https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_bigmem_squarem.m
[Parallel Computing Toolbox]: https://www.mathworks.com/help/distcomp/index.html
## Markov chain Monte Carlo (MCMC)
Details of MCMC algorithms for [`rss_bvsr.m`][] and [`rss_bslmm.m`][] are
available in Supplementary Appendix B of [Zhu and Stephens (2017)][]
Details of MCMC algorithms for [`rss_ash.m`][] are available in this
[unpublished note](https://www.stat.uchicago.edu/~xiangzhu/rss_mcmc.pdf)<sup>1</sup>.
Note that only [`rss_bvsr.m`][] and [`rss_bslmm.m`][] were used to
generate results in [Zhu and Stephens (2017)][].
### [`rss_bvsr.m`][]
Fit the RSS-BVSR model that consists of RSS likelihood
and BVSR prior [(Guan and Stephens, 2011)][]:
$$
\begin{aligned}
\widehat{\boldsymbol\beta} &\sim {\cal N}({\bf SRS}^{-1}{\boldsymbol\beta},{\bf SRS}),\\
\beta_j &\sim \pi\cdot{\cal N}(0,\sigma_B^2) + (1-\pi)\cdot\delta_0,
\end{aligned}
$$
using a Metropolis-Hastings algorithm.
### [`rss_bslmm.m`][]
Fit the RSS-BSLMM model that consists of RSS likelihood
and BSLMM prior [(Zhou et al, 2013)][]:
$$
\begin{aligned}
\widehat{\boldsymbol\beta} &\sim {\cal N}({\bf SRS}^{-1}{\boldsymbol\beta},{\bf SRS}),\\
\beta_j &\sim \pi\cdot{\cal N}(0,\sigma_B^2+\sigma_P^2) + (1-\pi)\cdot{\cal N}(0,\sigma_P^2),
\end{aligned}
$$
using a component-wise MCMC algorithm.
### [`rss_ash.m`][]
Fit the RSS-ASH model that consists of RSS likelihood
and ASH prior [(Stephens, 2017)][]:
$$
\begin{aligned}
\widehat{\boldsymbol\beta} &\sim {\cal N}({\bf SRS}^{-1}{\boldsymbol\beta},{\bf SRS}),\\
\beta_j &\sim \pi_0 \cdot \delta_0 + {\textstyle\sum}_{k=1}^K \pi_k \cdot {\cal N}(0,\sigma_k^2),
\end{aligned}
$$
using a component-wise MCMC algorithm.
## Variational Bayes (VB)
Details of VB algorithms, SQUAREM accelerator and parallel implementation
are available in Supplementary Notes of [Zhu and Stephens (2018)][].
Note that only [`rss_varbvsr_squarem.m`][] and [`rss_varbvsr_bigmem_squarem.m`][]
were used to generate results in [Zhu and Stephens (2018)][].
The other functions were developed merely for testing and benchmarking.
### [`rss_varbvsr.m`][]
Fit the following extended RSS-BVSR model
$$
\begin{aligned}
\widehat{\boldsymbol\beta} &\sim {\cal N}({\bf SRS}^{-1}{\boldsymbol\beta},{\bf SRS}),\\
\beta_j &\sim \pi_j\cdot{\cal N}(0,\sigma_j^2) + (1-\pi_j)\cdot\delta_0,
\end{aligned}
$$
using a mean-field VB algorithm.
The VB algorithm largely follows [Carbonetto and Stephens (2012)][].
This is an extended RSS-BVSR model because each SNP $j$ can have
its own hyper-parameters $\{\pi_j,\sigma_j^2\}$,
whereas the standard RSS-BVSR model assumes that all SNPs share
the same hyper-parameters $\{\pi,\sigma_B^2\}$.
### [`rss_varbvsr_squarem.m`][]
This is a variant of [`rss_varbvsr.m`][] with the SQUAREM
accelerator [(Varadhan and Roland, 2008)][] added.
### [`rss_varbvsr_parallel.m`](https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_parallel.m)
This is a parallel implementation of [`rss_varbvsr.m`][]
based on MATLAB [Parallel Computing Toolbox][].
### [`rss_varbvsr_pasquarem.m`](https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_pasquarem.m)
This is a parallel implementation of [`rss_varbvsr_squarem.m`][]
based on MATLAB [Parallel Computing Toolbox][].
### [`rss_varbvsr_bigmem.m`](https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_bigmem.m)
This is a memory-efficient implementation of
[`rss_varbvsr_parallel.m`](https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_parallel.m).
### [`rss_varbvsr_bigmem_squarem.m`][]
This is a memory-efficient implementation of
[`rss_varbvsr_pasquarem.m`](https://github.com/stephenslab/rss/blob/master/src_vb/rss_varbvsr_pasquarem.m).
## Miscellaneous
### [`import_1000g_vcf.sh`](https://github.com/stephenslab/rss/blob/master/misc/import_1000g_vcf.sh)
Output [1000 Genomes](https://www.internationalgenome.org/data)
phased haplotypes of a given list of SNPs in
[IMPUTE reference-panel format](https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#input_options).
### [`compute_pve.m`](https://github.com/stephenslab/rss/blob/master/src/compute_pve.m)
Use GWAS summary data to estimate PVE (or SNP heritability),
a quantity defined by Equation 2.10 in [Guan and Stephens (2011)][].
This function corresponds to Equation 3.7 in [Zhu and Stephens (2017)][].
### [`band_storage.m`](https://github.com/stephenslab/rss/blob/master/misc/band_storage.m)
Convert a symmetric, banded matrix to a compact matrix in such a way
that only the main diagonal and the nonzero super-diagonals are stored.
This function is used to reduce the file size of a large LD matrix.
### [`find_bandwidth.m`](https://github.com/stephenslab/rss/blob/master/misc/find_bandwidth.m)
Find the bandwidth of a symmetric, banded matrix.
### [`get_corr.m`](https://github.com/stephenslab/rss/blob/master/misc/get_corr.m)
Compute linkage disequilibrium (LD) matrix using the shrinkage estimator proposed in
[Wen and Stephens (2010)](https://www.ncbi.nlm.nih.gov/pubmed/21479081).
This function is also implemented in an `R` package
[`ldshrink`](https://github.com/stephenslab/ldshrink).
### [`data_maker.m`](https://github.com/stephenslab/rss/blob/master/misc/data_maker.m)
Simulate phenotype data from the genome-wide multiple-SNP model
described in [Zhou et al (2013)][],
and then compute the single-SNP summary statistics for each SNP.
This function was used in some simulation studies of [Zhu and Stephens (2017)][].
### [`enrich_datamaker.m`](https://github.com/stephenslab/rss/blob/master/misc/enrich_datamaker.m)
Simulate phenotype data from the genetic association enrichment model described in
[Carbonetto and Stephens (2013)](https://doi.org/10.1371/journal.pgen.1003770),
and then compute the single-SNP summary statistics for each SNP.
This function was used in some simulation studies of [Zhu and Stephens (2018)][].
### [`null_single.m`](https://github.com/stephenslab/rss/blob/master/src_vb/null_single.m) & [`null_template.m`](https://github.com/stephenslab/rss/blob/master/src_vb/null_template.m)
Fit genome-wide multiple-SNP "baseline model" to single-SNP summary data, using
[`rss/src_vb` functions](https://github.com/stephenslab/rss/tree/master/src_vb).
These scripts were used in data analyses of [Zhu and Stephens (2018)][].
### [`gsea_wrapper.m`](https://github.com/stephenslab/rss/blob/master/src_vb/gsea_wrapper.m) & [`gsea_template.m`](https://github.com/stephenslab/rss/blob/master/src_vb/gsea_template.m)
Fit genome-wide multiple-SNP "enrichment model" to single-SNP summary data, using
[`rss/src_vb` functions](https://github.com/stephenslab/rss/tree/master/src_vb).
These scripts were used in data analyses of [Zhu and Stephens (2018)][].
### [`null_wrapper_fixsb.m`](https://github.com/stephenslab/rss/blob/master/src_vb/null_wrapper_fixsb.m) & [`gsea_wrapper_fixsb.m`](https://github.com/stephenslab/rss/blob/master/src_vb/gsea_wrapper_fixsb.m)
Fit genome-wide multiple-SNP "baseline model" and "enrichment model" to single-SNP summary data,
using a fixed prior variance of causal genetic effects ($\sigma_B^2$) in
[`rss/src_vb` functions](https://github.com/stephenslab/rss/tree/master/src_vb).
These scripts were used in simulation studies of [Zhu and Stephens (2018)][].
### [`ash_lrt_31traits.R`](https://github.com/stephenslab/rss/blob/master/misc/ash_lrt_31traits.R)
Compute a simple likelihood ratio as a sanity check for the more
complicated enrichment analysis method developed in [Zhu and Stephens (2018)][].
This likelihood ratio calculation is based on an `R` package
[`ashr`](https://cran.r-project.org/web/packages/ashr/index.html).
--------
**Footnotes:**
1. There is a missing multiplier term ${\omega_k}/{\omega_1}$
on the right-hand side of Equation 5.1 in this note (page 4).
However, this missing term was correctly reflected in the source codes
of [`rss_ash.m`][], in particular, Lines 80-81 of
[`update_bz.m`](https://github.com/stephenslab/rss/blob/master/src/update_bz.m).
We thank Geyu Zhou <geyu.zhou@yale.edu> for pointing out this typo on 01/14/2019.