---
title: "Models of nucleotide substitution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Models of nucleotide substitution}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---
```{r setup, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(jackalope)
set.seed(654612)
```

## Introduction

This vignette outlines the nucleotide substitution models available in jackalope.
Each model is defined by a substitution-rate matrix, $\mathbf{Q}$,
whose entries are ordered as follows:
$$
\begin{bmatrix}
\cdot & T\rightarrow C & T\rightarrow A & T\rightarrow G \\
C\rightarrow T & \cdot & C\rightarrow A & C\rightarrow G \\
A\rightarrow T & A\rightarrow C & \cdot & A\rightarrow G \\
G\rightarrow T & G\rightarrow C & G\rightarrow A & \cdot
\end{bmatrix}
$$

Rows give the starting nucleotide and columns the resulting one;
for example, the cell labeled $C \rightarrow T$ holds the rate of substitution from $C$ to $T$.
Each diagonal entry is set so that its row sums to zero,
$q_{XX} = -\sum_{Y \neq X} q_{XY}$ (Yang 2006).
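
The diagonal rule can be sketched in a few lines of base R.
The helper `fill_diagonal()` is hypothetical and is not a jackalope function;
it assumes a 4-by-4 matrix whose rows and columns follow the T, C, A, G order above,
and the rate of 0.1 is an arbitrary illustration value.

```r
# Hypothetical helper, not a jackalope function: set each diagonal entry
# so that its row of the rate matrix sums to zero.
fill_diagonal <- function(Q) {
  diag(Q) <- 0             # discard any existing diagonal values
  diag(Q) <- -rowSums(Q)   # q_ii = -(sum of the off-diagonal rates in row i)
  Q
}

nt <- c("T", "C", "A", "G")  # row/column order used throughout this vignette
Q <- matrix(0.1, nrow = 4, ncol = 4, dimnames = list(nt, nt))
Q <- fill_diagonal(Q)
Q["T", "T"]  # -0.3
rowSums(Q)   # zero, up to floating-point rounding
```
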

Below each rate matrix is the list of `create_mevo` parameters that model requires.
The key below gives the symbol for each parameter, in order of first appearance:

* `lambda`: $\lambda$
* `alpha`: $\alpha$
* `beta`: $\beta$
* `pi_tcag`: vector of $\pi_T$, $\pi_C$, $\pi_A$, and $\pi_G$, in that order
* `alpha_1`: $\alpha_1$
* `alpha_2`: $\alpha_2$
* `kappa`: $\kappa$, which governs the transition/transversion rate ratio
* `abcdef`: vector of $a$, $b$, $c$, $d$, $e$, and $f$, in that order
* `Q`: matrix of all rates $q_{XY}$; its diagonal entries are ignored

## JC69

The JC69 model (Jukes and Cantor 1969) uses a single rate, $\lambda$, for every substitution.
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & \lambda & \lambda & \lambda \\
\lambda & \cdot & \lambda & \lambda \\
\lambda & \lambda & \cdot & \lambda \\
\lambda & \lambda & \lambda & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `lambda`
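
Because every off-diagonal entry of the JC69 matrix is $\lambda$, each diagonal entry is $-3\lambda$.
A minimal base-R sketch (not jackalope's internal code), with an arbitrary value of `lambda`:

```r
# Sketch of the JC69 rate matrix in T, C, A, G order.
# lambda = 0.1 is an arbitrary value chosen for illustration.
lambda <- 0.1
nt <- c("T", "C", "A", "G")
Q <- matrix(lambda, nrow = 4, ncol = 4, dimnames = list(nt, nt))
diag(Q) <- 0
diag(Q) <- -rowSums(Q)  # every diagonal entry equals -3 * lambda
Q
```
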

## K80

The K80 model (Kimura 1980) uses separate rates for transitions
($\alpha$; $T \leftrightarrow C$ and $A \leftrightarrow G$)
and transversions ($\beta$; all other changes).
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & \alpha & \beta & \beta \\
\alpha & \cdot & \beta & \beta \\
\beta & \beta & \cdot & \alpha \\
\beta & \beta & \alpha & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `alpha`
* `beta`
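
A transition is a change within the purines ($A$, $G$) or within the pyrimidines ($T$, $C$),
so the K80 pattern can be generated by comparing purine status of the two bases.
A base-R sketch under that reading, with arbitrary illustration values for `alpha` and `beta`
(the variable names `purine` and `is_transition` are illustrative only):

```r
# Sketch of the K80 rate matrix in T, C, A, G order.
# alpha and beta are arbitrary values chosen for illustration.
alpha <- 0.2   # transitions (T <-> C, A <-> G)
beta  <- 0.05  # transversions
nt <- c("T", "C", "A", "G")
purine <- c(FALSE, FALSE, TRUE, TRUE)  # A and G are purines

# A change is a transition when both bases are purines or both are pyrimidines
is_transition <- outer(purine, purine, "==")
Q <- ifelse(is_transition, alpha, beta)  # keeps the 4 x 4 shape of the test
dimnames(Q) <- list(nt, nt)
diag(Q) <- 0
diag(Q) <- -rowSums(Q)  # each diagonal entry is -(alpha + 2 * beta)
Q
```
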

## F81

The F81 model (Felsenstein 1981) allows unequal equilibrium frequencies
for the four nucleotides ($\pi_X$ for nucleotide $X$);
the rate of substitution to nucleotide $X$ is $\pi_X$, regardless of the starting nucleotide.
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & \pi_C & \pi_A & \pi_G \\
\pi_T & \cdot & \pi_A & \pi_G \\
\pi_T & \pi_C & \cdot & \pi_G \\
\pi_T & \pi_C & \pi_A & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `pi_tcag`
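
In the F81 matrix every row has the same off-diagonal entries, and $\pi$ is the equilibrium
distribution, so $\pi \mathbf{Q} = 0$. A base-R sketch that builds the matrix and checks
that property, using arbitrary base frequencies:

```r
# Sketch of the F81 rate matrix in T, C, A, G order.
# The base frequencies are arbitrary values chosen for illustration.
pi_tcag <- c(T = 0.1, C = 0.2, A = 0.3, G = 0.4)
nt <- names(pi_tcag)

# Every row has the same off-diagonal entries: q_ij = pi_j
Q <- matrix(pi_tcag, nrow = 4, ncol = 4, byrow = TRUE,
            dimnames = list(nt, nt))
diag(Q) <- 0
diag(Q) <- -rowSums(Q)  # q_ii = -(1 - pi_i)

# pi_tcag is the equilibrium distribution: pi %*% Q is (numerically) zero
drop(pi_tcag %*% Q)
```
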

## HKY85

The HKY85 model (Hasegawa et al. 1984, 1985) combines the unequal equilibrium frequencies
of F81 with the separate transition and transversion rates of K80.
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & \alpha \pi_C & \beta \pi_A & \beta \pi_G \\
\alpha \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\
\beta \pi_T & \beta \pi_C & \cdot & \alpha \pi_G \\
\beta \pi_T & \beta \pi_C & \alpha \pi_A & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `alpha`
* `beta`
* `pi_tcag`
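
Each HKY85 entry is a K80-style rate multiplied by the frequency of the target nucleotide,
so one way to build it is to scale column $j$ of the rate pattern by $\pi_j$.
A base-R sketch under that approach (not jackalope's internal code), with arbitrary values:

```r
# Sketch of the HKY85 rate matrix in T, C, A, G order.
# All parameter values are arbitrary, chosen for illustration.
alpha <- 0.2   # transition rate
beta  <- 0.05  # transversion rate
pi_tcag <- c(T = 0.1, C = 0.2, A = 0.3, G = 0.4)
nt <- names(pi_tcag)

purine <- c(FALSE, FALSE, TRUE, TRUE)  # T, C are pyrimidines; A, G are purines
rate <- ifelse(outer(purine, purine, "=="), alpha, beta)

# q_ij = rate_ij * pi_j, so scale column j by pi_j
Q <- sweep(rate, 2, pi_tcag, "*")
dimnames(Q) <- list(nt, nt)
diag(Q) <- 0
diag(Q) <- -rowSums(Q)

# pi_tcag is the equilibrium distribution: pi %*% Q is (numerically) zero
drop(pi_tcag %*% Q)
```
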

## TN93

The TN93 model (Tamura and Nei 1993) extends the HKY85 model by distinguishing
between the two types of transitions:
those between pyrimidines ($\alpha_1$; $T \leftrightarrow C$) and
those between purines ($\alpha_2$; $A \leftrightarrow G$).
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & \alpha_1 \pi_C & \beta \pi_A & \beta \pi_G \\
\alpha_1 \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\
\beta \pi_T & \beta \pi_C & \cdot & \alpha_2 \pi_G \\
\beta \pi_T & \beta \pi_C & \alpha_2 \pi_A & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `alpha_1`
* `alpha_2`
* `beta`
* `pi_tcag`
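
The TN93 matrix differs from HKY85 only in the two transition pairs, so a sketch can start
from $\beta$ everywhere and overwrite those cells. The values below are arbitrary, and the
intermediate `rate` matrix is an illustrative name, not a jackalope object:

```r
# Sketch of the TN93 rate matrix in T, C, A, G order.
# All parameter values are arbitrary, chosen for illustration.
alpha_1 <- 0.3   # pyrimidine transitions (T <-> C)
alpha_2 <- 0.2   # purine transitions (A <-> G)
beta    <- 0.05  # transversions
pi_tcag <- c(T = 0.1, C = 0.2, A = 0.3, G = 0.4)
nt <- names(pi_tcag)

# Start with beta everywhere, then overwrite the two transition pairs
rate <- matrix(beta, nrow = 4, ncol = 4, dimnames = list(nt, nt))
rate["T", "C"] <- rate["C", "T"] <- alpha_1
rate["A", "G"] <- rate["G", "A"] <- alpha_2

Q <- sweep(rate, 2, pi_tcag, "*")  # q_ij = rate_ij * pi_j
diag(Q) <- 0
diag(Q) <- -rowSums(Q)
Q
```
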

## F84

The F84 model (Kishino and Hasegawa 1989) is a special case of TN93
in which $\alpha_1 = (1 + \kappa/\pi_Y) \beta$ and $\alpha_2 = (1 + \kappa/\pi_R) \beta$,
where $\pi_Y = \pi_T + \pi_C$ and $\pi_R = \pi_A + \pi_G$
are the total frequencies of pyrimidines and purines, respectively.
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & (1 + \kappa/\pi_Y) \beta \pi_C &
\beta \pi_A & \beta \pi_G \\
(1 + \kappa/\pi_Y) \beta \pi_T & \cdot &
\beta \pi_A & \beta \pi_G \\
\beta \pi_T & \beta \pi_C &
\cdot & (1 + \kappa/\pi_R) \beta \pi_G \\
\beta \pi_T & \beta \pi_C &
(1 + \kappa/\pi_R) \beta \pi_A & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `beta`
* `kappa`
* `pi_tcag`
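
Since F84 is a special case of TN93, one way to build its matrix is to compute
$\alpha_1$ and $\alpha_2$ from $\kappa$, $\beta$, and the base frequencies, then fill in
the TN93 pattern. A base-R sketch of that route, with arbitrary parameter values:

```r
# Sketch: build the F84 rate matrix as a TN93 matrix whose transition
# rates are derived from kappa. All parameter values are arbitrary.
beta  <- 0.05
kappa <- 2
pi_tcag <- c(T = 0.1, C = 0.2, A = 0.3, G = 0.4)
nt <- names(pi_tcag)

pi_Y <- sum(pi_tcag[c("T", "C")])  # total pyrimidine frequency
pi_R <- sum(pi_tcag[c("A", "G")])  # total purine frequency
alpha_1 <- (1 + kappa / pi_Y) * beta
alpha_2 <- (1 + kappa / pi_R) * beta

# TN93 pattern: beta for transversions, alpha_1 / alpha_2 for transitions
rate <- matrix(beta, nrow = 4, ncol = 4, dimnames = list(nt, nt))
rate["T", "C"] <- rate["C", "T"] <- alpha_1
rate["A", "G"] <- rate["G", "A"] <- alpha_2

Q <- sweep(rate, 2, pi_tcag, "*")  # q_ij = rate_ij * pi_j
diag(Q) <- 0
diag(Q) <- -rowSums(Q)
Q
```
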
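
## GTR

The GTR model (Tavaré 1986) is the least restrictive model that is still time-reversible;
that is, $\pi_X q_{XY} = \pi_Y q_{YX}$ for every pair of nucleotides $X$ and $Y$.
This holds because a substitution and its reverse share the same parameter
(e.g., $q_{TC} = a \pi_C$ and $q_{CT} = a \pi_T$).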
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & a \pi_C & b \pi_A & c \pi_G \\
a \pi_T & \cdot & d \pi_A & e \pi_G \\
b \pi_T & d \pi_C & \cdot & f \pi_G \\
c \pi_T & e \pi_C & f \pi_A & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `pi_tcag`
* `abcdef`
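
The GTR matrix can be viewed as a symmetric matrix of the parameters $a$ through $f$
with column $j$ scaled by $\pi_j$; time reversibility then means that the matrix of
$\pi_X q_{XY}$ is symmetric. A base-R sketch that builds the matrix this way and checks
that property (parameter values are arbitrary; `S` and `flux` are illustrative names):

```r
# Sketch of the GTR rate matrix in T, C, A, G order.
# pi_tcag and abcdef hold arbitrary values chosen for illustration.
pi_tcag <- c(T = 0.1, C = 0.2, A = 0.3, G = 0.4)
abcdef  <- c(a = 1, b = 2, c = 0.5, d = 0.8, e = 1.5, f = 3)
nt <- names(pi_tcag)

# Symmetric parameter matrix. Filling the lower triangle column by column
# places a = T<->C, b = T<->A, c = T<->G, d = C<->A, e = C<->G, f = A<->G,
# matching the rate matrix above.
S <- matrix(0, nrow = 4, ncol = 4, dimnames = list(nt, nt))
S[lower.tri(S)] <- abcdef
S <- S + t(S)

Q <- sweep(S, 2, pi_tcag, "*")  # q_ij = s_ij * pi_j
diag(Q) <- -rowSums(Q)          # diagonal of S is 0, so this is the row-sum rule

# Time reversibility: pi_i * q_ij == pi_j * q_ji, so diag(pi) %*% Q is symmetric
flux <- diag(unname(pi_tcag)) %*% Q
isSymmetric(unname(flux))
```
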
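
## UNREST

The UNREST model (Yang 1994) places no restrictions on the rates:
each of the 12 off-diagonal rates is a separate parameter,
and the model need not be time-reversible.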
$$
\mathbf{Q} =
\begin{bmatrix}
\cdot & q_{TC} & q_{TA} & q_{TG} \\
q_{CT} & \cdot & q_{CA} & q_{CG} \\
q_{AT} & q_{AC} & \cdot & q_{AG} \\
q_{GT} & q_{GC} & q_{GA} & \cdot
\end{bmatrix}
$$

__Parameters\:__

* `Q`
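
Because only `Q` is supplied for UNREST, the equilibrium frequencies are implied by the rates:
they form the vector $\pi$ with $\pi \mathbf{Q} = 0$ and entries summing to one, i.e. a left
eigenvector of $\mathbf{Q}$ for eigenvalue 0. A base-R sketch of that calculation, using
arbitrary rates; this is an illustration of the mathematics, not a description of how
jackalope computes anything internally:

```r
# Sketch: UNREST rate matrix in T, C, A, G order (row = from, column = to).
# The off-diagonal rates are arbitrary; the NA diagonal entries are ignored
# and recomputed from the row-sum rule.
nt <- c("T", "C", "A", "G")
Q <- matrix(c(NA,  0.3, 0.1, 0.2,
              0.4, NA,  0.2, 0.1,
              0.1, 0.3, NA,  0.5,
              0.2, 0.1, 0.6, NA),
            nrow = 4, byrow = TRUE, dimnames = list(nt, nt))
diag(Q) <- 0
diag(Q) <- -rowSums(Q)

# Equilibrium frequencies solve pi %*% Q = 0 with sum(pi) = 1:
# take the eigenvector of t(Q) whose eigenvalue is (numerically) zero.
e <- eigen(t(Q))
v <- Re(e$vectors[, which.min(Mod(e$values))])
pi_eq <- setNames(v / sum(v), nt)
round(pi_eq, 4)
```
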

## References

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood
approach. Journal of Molecular Evolution 17:368–376.

Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a
molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160–174.

Hasegawa, M., T. Yano, and H. Kishino. 1984. A new molecular clock of mitochondrial
DNA and the evolution of hominoids. Proceedings of the Japan Academy, Series B
60:95–98.

Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pages 21–131 in
H. N. Munro, editor. Mammalian protein metabolism. Academic Press, New York.

Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions
through comparative studies of nucleotide sequences. Journal of Molecular Evolution
16:111–120.

Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the
evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea.
Journal of Molecular Evolution 29:170–179.

Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the
control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and
Evolution 10:512–526.

Tavaré, S. 1986. Some probabilistic and statistical problems in the analysis of DNA
sequences. Lectures on Mathematics in the Life Sciences 17:57–86.

Yang, Z. B. 1994. Estimating the pattern of nucleotide substitution. Journal of
Molecular Evolution 39:105–111.

Yang, Z. 2006. *Computational molecular evolution*. (P. H. Harvey and R. M. May, Eds.).
Oxford University Press, New York, NY, USA.