-
Notifications
You must be signed in to change notification settings - Fork 2
/
complete_tree.Rmd
260 lines (221 loc) · 9.45 KB
/
complete_tree.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
---
title: "Using secsse with complete phylogenies (with extinction)"
author: "Pedro Santos Neves"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Using secsse with complete phylogenies (with extinction)}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Introduction
Most current studies of evolutionary dynamics make use of molecular phylogenies,
which, for most groups, contain only information on extant species. However,
when data on extinct species is available, usually through the presence of
fossil data, we can use complete trees. Thus, we can leverage the data
from extinct lineages for maximum-likelihood estimation with secsse.
Note that here "complete tree" should not be taken as a complete sampling
fraction, that is, all known species being present in the phylogeny and there
being no missing data, but rather the assumption that all currently extinct
species are included. This follows the nomenclature of Nee et al. (1994),
who also coined the term "reconstructed tree" for phylogenies for which there
is no information on extinct lineages.
## Set-up
Like all ML analyses with secsse, we first need a few things to start with,
starting with a dated phylogeny. For the purpose of this vignette, we are going
to simulate phylogenies with `secsse_sim()`. We will simulate a reconstructed
and a complete version of the same tree under the CR model.
In order to simulate the trees, we need to specify the model and set starting
parameters. Here we simulate a similar constant rate (CR) example to that of
the _Simulating with secsse_ vignette. For more details on this model and full
details of the functionality of `secsse_sim()`, see
`vignette("sim_with_secsse", package = "secsse")`.
```{r sim_plot_tree}
library(secsse)
spec_matrix <- c()
spec_matrix <- rbind(spec_matrix, c(0, 0, 0, 1))
spec_matrix <- rbind(spec_matrix, c(1, 1, 1, 1))
lambda_list <- secsse::create_lambda_list(state_names = c(0, 1),
num_concealed_states = 2,
transition_matrix = spec_matrix,
model = "CR")
mu_vector <- secsse::create_mu_vector(state_names = c(0, 1),
num_concealed_states = 2,
model = "CR",
lambda_list = lambda_list)
shift_matrix <- c()
shift_matrix <- rbind(shift_matrix, c(0, 1, 3))
shift_matrix <- rbind(shift_matrix, c(1, 0, 4))
q_matrix <- secsse::create_q_matrix(state_names = c(0, 1),
num_concealed_states = 2,
shift_matrix = shift_matrix,
diff.conceal = FALSE)
# Set-up starting parameters
speciation_rate <- 0.8
extinction_rate <- 0.2
q_01 <- 0.1
q_10 <- 0.1
used_params <- c(speciation_rate, extinction_rate, q_01, q_10)
sim_lambda_list <- secsse::fill_in(lambda_list, used_params)
sim_mu_vector <- secsse::fill_in(mu_vector, used_params)
sim_q_matrix <- secsse::fill_in(q_matrix, used_params)
# Simulate and plot the tree
sim_tree_complete <- secsse::secsse_sim(lambdas = sim_lambda_list,
mus = sim_mu_vector,
qs = sim_q_matrix,
crown_age = 5,
num_concealed_states = 2,
seed = 40,
drop_extinct = FALSE)
if (requireNamespace("diversitree")) {
traits_for_plot_complete <- data.frame(
trait = as.numeric(sim_tree_complete$obs_traits),
row.names = sim_tree_complete$phy$tip.label
)
diversitree::trait.plot(tree = sim_tree_complete$phy,
dat = traits_for_plot_complete,
cols = list("trait" = c("blue", "red")),
type = "p")
} else {
plot(sim_tree_complete$phy)
}
```
## Fitting the model
Finally, we run `secsse_ml()` on our complete tree, much in the same way as we
would for one with extant species. However, this time we make sure to set the
`is_complete_tree` argument to `TRUE` (defaults to `FALSE` if omitted). This
enables secsse to use the information present in extinct lineages.
```{r fitting_model_complete_tree}
idparsopt <- 1:4 # our maximum rate parameter was 4 -> We are keeping
# concealed and examined traits the same for the MLE.
idparsfix <- c(0) # we want to keep all zeros at zero
initparsopt <- rep(0.1, 4)
initparsfix <- c(0.0) # all zeros remain at zero.
sampling_fraction <- c(1, 1)
idparslist <- list()
idparslist[[1]] <- lambda_list
idparslist[[2]] <- mu_vector
idparslist[[3]] <- q_matrix
complete_tree_ml_CR <- secsse_ml(phy = sim_tree_complete$phy,
traits = sim_tree_complete$obs_traits,
num_concealed_states = 2,
idparslist = idparslist,
idparsopt = idparsopt,
initparsopt = initparsopt,
idparsfix = idparsfix,
parsfix = initparsfix,
sampling_fraction = sampling_fraction,
verbose = FALSE)
```
Now we can see what our results look like.
```{r complete_tree_res}
CR_par_complete <- secsse::extract_par_vals(idparslist, complete_tree_ml_CR$MLpars)
complete_tree_ml_CR
CR_par_complete
spec_rates_complete <- CR_par_complete[1]
ext_rates_complete <- CR_par_complete[2]
Q_01_complete <- CR_par_complete[3]
Q_10_complete <- CR_par_complete[4]
spec_rates_complete
ext_rates_complete
Q_01_complete
Q_10_complete
```
## Comparing with reconstructed trees
It would be interesting to see how they compare with the same tree without any
extant species. Let's follow the standard procedure using a similar phylogeny -
the same tree we used before - but where all the extinct lineages have been
removed. We'll keep all other model specification the same.
```{r fitting_ml_reconstructed_tree}
sim_tree_reconstructed <- secsse::secsse_sim(lambdas = sim_lambda_list,
mus = sim_mu_vector,
qs = sim_q_matrix,
crown_age = 5,
num_concealed_states = 2,
seed = 40,
drop_extinct = TRUE)
if (requireNamespace("diversitree")) {
traits_for_plot_reconstructed <- data.frame(
trait = as.numeric(sim_tree_reconstructed$obs_traits),
row.names = sim_tree_reconstructed$phy$tip.label
)
diversitree::trait.plot(tree = sim_tree_reconstructed$phy,
dat = traits_for_plot_reconstructed,
cols = list("trait" = c("blue", "red")),
type = "p")
} else {
plot(sim_tree_reconstructed$phy)
}
reconstructed_tree_ml <- secsse_ml(phy = sim_tree_reconstructed$phy,
traits = sim_tree_reconstructed$obs_traits,
num_concealed_states = 2,
idparslist = idparslist,
idparsopt = idparsopt,
initparsopt = initparsopt,
idparsfix = idparsfix,
parsfix = initparsfix,
sampling_fraction = sampling_fraction,
verbose = FALSE,
is_complete_tree = FALSE)
```
```{r reconstructed_tree_res_comparison}
reconstructed_tree_ml_CR <- reconstructed_tree_ml$ML
CR_par_reconstructed <- secsse::extract_par_vals(
idparslist,
reconstructed_tree_ml$MLpars
)
reconstructed_tree_ml
CR_par_reconstructed
spec_rates_reconstructed <- CR_par_reconstructed[1]
ext_rates_reconstructed <- CR_par_reconstructed[2]
Q_01_reconstructed <- CR_par_reconstructed[3]
Q_10_reconstructed <- CR_par_reconstructed[4]
knitr::kable(
data.frame(
Reconstructed_tree = c(
spec_rates_reconstructed,
ext_rates_reconstructed,
Q_01_reconstructed,
Q_10_reconstructed
),
Complete_tree = c(
spec_rates_complete,
ext_rates_complete,
Q_01_complete,
Q_10_complete
),
Generating_parameters = c(
speciation_rate,
extinction_rate,
q_01,
q_10
),
row.names = c(
"Speciation rate",
"Extinction rate",
"Transition rate 01",
"Transition rate 10"
)
)
)
```
We see that including extinct species results in a better esimation
particularly of the extinction rate. This effect is especially noticeable if
there are many extinct species present in the tree.
Additionally, we see that the estimation of the transition rate from state 1 to
0 also improved.
As a final note, do note that this is just a simple simulation example and
care should be exercised with model selection and specification when fitting
secsse to empirical datasets to make predictions about evolutionary patterns.
## References
Nee S, May RM, Harvey PH. The reconstructed evolutionary process. Philos Trans
R Soc Lond B Biol Sci. 1994 May 28;344(1309):305-11.
https://doi.org/10.1098/rstb.1994.0068.