From 0bd97759de5d6d39759f6419e6fc1e51b24f4cf6 Mon Sep 17 00:00:00 2001 From: Michael Matschiner Date: Sun, 12 Aug 2018 15:37:11 +0200 Subject: [PATCH] Text polishing --- README.md | 108 ++++++++++++++++++++++++++---------------------------- 1 file changed, 52 insertions(+), 56 deletions(-) diff --git a/README.md b/README.md index 4536289..871eb0d 100644 --- a/README.md +++ b/README.md @@ -9,14 +9,12 @@ beastversion: 2.5.0 # Background -In Bayesian divergence-time estimation, phylogenies are commonly time calibrated through the specification of calibration densities on nodes representing clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown and they are therefore often chosen arbitrarily, which directly impacts the reliability of the resulting age estimates. CladeAge overcomes this limitation by calculating optimal calibration densities for clades with fossil records, based on estimates for diversification rates and the sampling rate for fossils. CladeAge thus shares similarities with the Fossilized Birth-Death (FBD) process; however, while the FBD model assumes that the fossil record is either completely or randomly sampled, the CladeAge model assumes that only information about the oldest fossil of each clade is available. CladeAge allows uncertainty in the diversification- and sampling-rate estimates, but unlike with the FBD model, these parameters must be known _a priori_ and can not be estimated as part of the analysis. +In Bayesian divergence-time estimation, phylogenies are commonly time calibrated through the specification of calibration densities on nodes that represent clades with known fossil occurrences. Unfortunately, the optimal shape of these calibration densities is usually unknown and they are therefore often chosen arbitrarily, which directly impacts the reliability of the resulting age estimates. 
CladeAge overcomes this limitation by calculating optimal calibration densities for clades with fossil records, based on estimates for diversification rates and the sampling rate for fossils. CladeAge thus shares similarities with the Fossilized Birth-Death (FBD) process; however, while the FBD model assumes that the fossil record is either completely or randomly sampled, the CladeAge model assumes that only information about the oldest fossil of each clade is available and can provide unbiased age estimates in such cases. CladeAge allows uncertainty in the diversification- and sampling-rate estimates, but unlike with the FBD model, these parameters must be known _a priori_ and cannot be estimated as part of the analysis. - - ---- -# Programs used in this Exercise +# Programs Used in This Tutorial ### BEAST2 - Bayesian Evolutionary Analysis Sampling Trees 2 @@ -34,16 +32,16 @@ Tracer ([http://tree.bio.ed.ac.uk/software/tracer](http://tree.bio.ed.ac.uk/soft ### TreeAnnotator -TreeAnnotator is used to summarize the posterior sample of trees to produce a maximum clade credibility tree. It can also be used to summarize and visualize the posterior estimates of other tree parameters (e.g. node height). TreeAnnotator is provided as a part of the BEAST2 package so you do not need to install it separately. +TreeAnnotator is used to summarize the posterior sample of trees to produce a maximum-clade-credibility tree. It can also be used to summarize and visualize the posterior estimates of other tree parameters (e.g. node height). TreeAnnotator is provided as part of the BEAST2 package, so you do not need to install it separately. ### FigTree -FigTree ([http://tree.bio.ed.ac.uk/software/figtree](http://tree.bio.ed.ac.uk/software/figtree)) is a program for viewing trees and producing publication-quality figures. It can interpret the node-annotations created on the summary trees by TreeAnnotator, allowing the user to display node-based statistics (e.g.
posterior probabilities). We will be using FigTree v1.4.2. +FigTree ([http://tree.bio.ed.ac.uk/software/figtree](http://tree.bio.ed.ac.uk/software/figtree)) is a program for viewing trees and producing publication-quality figures. It can interpret the node annotations created on the summary trees by TreeAnnotator, allowing the user to display node-based statistics (e.g. posterior probabilities). We will be using FigTree v1.4.2. ---- -# Practical: Fossil-based divergence-time estimation with CladeAge +# Tutorial: Fossil-Based Divergence-Time Estimation With CladeAge In this tutorial, we are going to use a multi-marker sequence dataset in combination with fossil information to estimate the divergence times of cichlid fishes with the CladeAge model. @@ -53,13 +51,13 @@ The aim of this tutorial is to: - Learn how to estimate divergence times with CladeAge. -## The Data +## The data -The analyses in this tutorial will be based on the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %}, comprising alignments for ten nuclear genes. In their study, Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} used this dataset to estimate divergence times of spiny-rayed fishes (=Acanthomorphata), and to identify shifts in diversification rates among different groups of these fishes. While the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} did not focus on cichlid diversification, it included nine cichlid species among the 520 species sampled for the extensive phylogeny. Thus, we can here use part of this dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} to estimate early divergences among cichlid fishes. To facilitate the analyses in this tutorial, we will reduce the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} to sequences of 24 selected species. 
These species represent divergent cichlid lineages as well as the most ancestral groups of spiny-rayed fishes so that the fossil record of these lineages can be employed for calibration. +The analyses in this tutorial will be based on the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %}, comprising alignments for ten nuclear genes. In their study, Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} used this dataset to estimate divergence times of spiny-rayed fishes (=Acanthomorphata), and to identify shifts in diversification rates among different groups of these fishes. While the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} did not focus on cichlid diversification, it included nine cichlid species among the 608 species sampled for the extensive phylogeny. Thus, we can here use part of this dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} to estimate early divergences among cichlid fishes. To facilitate the analyses in this tutorial, we will reduce the dataset of Near et al. {% cite Near2013 -A --file CladeAge-Tutorial/master-refs.bib %} to sequences of 24 selected species. These species represent divergent cichlid lineages as well as the most ancestral groups of spiny-rayed fishes so that the fossil record of these lineages can be employed for calibration. ## The CladeAge model -The approach of CladeAge is similar to the more traditional node-dating approach in the sense that prior densities are defined for the ages of different clades, and the minimum ages of these prior densities are provided by the oldest fossils of these clades. However, important differences exist between the CladeAge approach and node dating: First, the shape of age-prior densities is informed by a model of diversification and fossil sampling in the CladeAge approach, whereas in node dating, parametric distributions (e.g. 
lognormal or gamma) with more or less arbitrarily chosen parameters were usually applied. Second, because of the quantitative model used in the CladeAge approach, the clades used for calibration should also not be chosen at will. Instead, strictly all clades included in the phylogeny that (i) have a fossil record, (ii) are morphologically recognizable, and (iii) have their sister lineage also included in the phylogeny should be constrained according to the age of their oldest fossil. A consequence of this is that clades are constrained even when their known sister lineage has an older fossil record, and that the same fossil may be used to constrain not just one clade, but multiple nested clades, if the more inclusive clades do not have an even older fossil record. More details about these criteria can be found in our paper on CladeAge {% cite Matschiner2017 --file CladeAge-Tutorial/master-refs.bib %}, and further information on using CladeAge is given in our [Rough Guide to CladeAge](http://evoinformatics.eu/cladeage.pdf). +The approach of CladeAge is similar to the more traditional node-dating approach in the sense that prior densities are defined for the ages of different clades, and the minimum ages of these prior densities are provided by the oldest fossils of these clades. However, important differences exist between the CladeAge approach and node dating: First, the shape of age-prior densities is informed by a model of diversification and fossil sampling in the CladeAge approach, whereas in node dating, parametric distributions (e.g. lognormal or gamma) with more or less arbitrarily chosen parameters are usually applied. Second, because of the quantitative model used in the CladeAge approach, the clades used for calibration should also not be chosen at will. 
Instead, strictly all clades included in the phylogeny that (i) have a fossil record, (ii) are morphologically recognizable, and (iii) have their sister lineage also included in the phylogeny should be constrained according to the age of their oldest fossil. A consequence of this is that clades are constrained even when their known sister lineage has an older fossil record, and that the same fossil may be used to constrain not just one clade, but multiple nested clades, if the more inclusive clades do not have an even older fossil record. More details about these criteria can be found in our paper on CladeAge {% cite Matschiner2017 --file CladeAge-Tutorial/master-refs.bib %}, and further information on using CladeAge is given in our [Rough Guide to CladeAge](http://evoinformatics.eu/cladeage.pdf). ## Divergence-time estimation with CladeAge @@ -132,7 +130,7 @@ This will open a window for the BEAST2 Package Manager. In this window, select "
BEAUti -
Figure 1: Install the CladeAge package.
+
Figure 1: Installing the CladeAge package.
> Close and reopen BEAUti. @@ -150,7 +148,7 @@ You should then see that an additional tab has been added named "Clade Ages", as > Click on "Import Alignment" in BEAUti's "File" menu, and select the alignment file `Near_et_al_red.nex`. -BEAUti should then recognize 30 different partitions, one for each codon position of each of the ten markers. The BEAUti window should then look as shown in the screenshot below. +BEAUti should recognize 30 different partitions, one for each codon position of each of the ten markers. The BEAUti window should then look as shown in the screenshot below.
@@ -168,7 +166,7 @@ BEAUti should then recognize 30 different partitions, one for each codon positio > Move on to the "Site Model" tab to select the site model for all partitions. -Instead of selecting a model such as HKY or GTR, I highly recommend the use of the model averaging implemented in the bModelTest package {% cite Bouckaert2017 --file CladeAge-Tutorial/master-refs.bib %}. If you did not already install this package, you can do so with the BEAST2 Package Manager from BEAUti as described above for the installation of the CladeAge package (don't forget to close and reopen BEAUti after installation to see changes to the interface). More information on model averaging with the bModelTest package is provided in the [Substitution Model Averaging](https://taming-the-beast.org/tutorials/Substitution-model-averaging/) tutorial. While recommended, model averaging with bModelTest is not required for this CladeAge tutorial, and you could also pick a model such as HKY or GTR instead. The description given here, however, assumes that the bModelTest package has been installed. +Instead of selecting a model such as HKY or GTR, I highly recommend the use of the model-averaging approach implemented in the bModelTest package {% cite Bouckaert2017 --file CladeAge-Tutorial/master-refs.bib %}. If you did not already install this package, you can do so with the BEAST2 Package Manager from BEAUti as described above for the installation of the CladeAge package (don't forget to close and reopen BEAUti after installation to see changes to the interface). More information on model averaging with the bModelTest package is provided in the [Substitution Model Averaging](https://taming-the-beast.org/tutorials/Substitution-model-averaging/) tutorial. While recommended, model averaging with bModelTest is not required for this CladeAge tutorial, and you could also pick a model such as HKY or GTR instead. 
The description given here, however, assumes that the bModelTest package has been installed. > Select "BEAST Model Test" from the drop-down menu at the top of the window, as shown below. @@ -236,7 +234,7 @@ The "Clock Model" tab should then look as shown in the screenshot below.
Figure 11: Selecting the tree prior.
-Instead of specifying constraints on monophyly and divergence times in the "Priors" tab, this is done in the separate tab named "Clade Ages" when the CladeAge package is installed. +Instead of specifying constraints on monophyly and divergence times in the "Priors" tab (as is usually the case), this is done in the separate tab named "Clade Ages" when the CladeAge package is installed. > Open the "Clade Ages" tab and click on the "+ Add Prior" button @@ -246,7 +244,7 @@ Instead of specifying constraints on monophyly and divergence times in the "Prio
Figure 12: Selecting a first taxon set for calibration.
-We will now have to specify a rather long list of constraints to make the best possible use of the information provided by the fossil record and to obtain divergence time estimates that are as reliable as possible given our dataset. We'll start simple by specifying that the origin of African cichlid tribe *Heterochromini*, represented in our dataset by *Heterochromis multidens*, must have occurred at least 15.97-33.9 million years ago (Ma), since this is the age of the oldest fossil of Heterochromi, which was reported from the Baid Formation of Saudi Arabia by Lippitsch and Micklich {% cite Lippitsch1998 -A --file CladeAge-Tutorial/master-refs.bib %}. +We will now have to specify a rather long list of constraints to make the best possible use of the information provided by the fossil record and to obtain divergence-time estimates that are as reliable as possible given our dataset. We'll start simple by specifying that the origin of the African cichlid tribe Heterochromini, represented in our dataset by _Heterochromis multidens_, must have occurred at least 15.97-33.9 million years ago (Ma), since this is the age of the oldest fossil of Heterochromini, which was reported from the Baid Formation of Saudi Arabia by Lippitsch and Micklich {% cite Lippitsch1998 -A --file CladeAge-Tutorial/master-refs.bib %}. > Specify "Heterochromini" in the field next to "Taxon set label", and select only "Heterochromis\_multidensA" as the ingroup of this taxon set. @@ -306,32 +304,32 @@ From this plot, you can see that under the assumption that all specified model p > > Add further fossil constraints for each of the clades listed below (see the Supplementary Material of Matschiner et al. {% cite Matschiner2017 -A --file CladeAge-Tutorial/master-refs.bib %} if interested in details and references): -* **"Other African cichlid tribes"**
Ingroup: *Oreochromis niloticus* ("Oreochromis\_niloticus")
Oldest fossil species: *Mahengechromis* spp.
First occurrence age: 45.0-46.0 Ma -* **"African cichlids"**
Ingroup: *Heterochromis multidens* ("Heterochromis\_multidensA"), *Oreochromis niloticus* ("Oreochromis\_niloticus")
Oldest fossil species: *Mahengechromis* spp.
First occurrence age: 45.0-46.0 Ma -* **"Retroculini and Cichlini"**
Ingroup: *Cichla temensis* ("Cichla\_temensisA")
Oldest fossil species: *Palaeocichla longirostrum*
First occurrence age: 5.332-23.03 Ma -* **"Other Neotropical cichlid tribes"**
Ingroup: *Heros appendictulatus* ("Heros\_appendictulatusA")
Oldest fossil species: *Plesioheros chauliodus*
First occurrence age: 39.9-45.0 Ma -* **"Neotropical cichlids"**
Ingroup: *Cichla temensis* ("Cichla\_temensisA"), *Heros appendictulatus* ("Heros\_appendictulatusA")
Oldest fossil species: *Plesioheros chauliodus*
First occurrence age: 39.9-45.0 Ma -* **"Afro-American cichlids"**
Ingroup: *Cichla temensis* ("Cichla\_temensisA"), *Heros appendictulatus* ("Heros\_appendictulatusA"), *Heterochromis multidens* ("Heterochromis\_multidensA"), *Oreochromis niloticus* ("Oreochromis\_niloticus")
Oldest fossil species: *Mahengechromis* spp.
First occurrence age: 45.0-46.0 Ma -* **"Cichlids"**
*Cichla temensis* ("Cichla\_temensisA"), *Etroplus maculatus* ("Etroplus\_maculatusA"), *Heros appendictulatus* ("Heros\_appendictulatusA"), *Heterochromis multidens* ("Heterochromis\_multidensA"), *Oreochromis niloticus* ("Oreochromis\_niloticus")
Oldest fossil species: *Mahengechromis* spp.
First occurrence age: 45.0-46.0 Ma -* **"Atherinomorphae"**
Ingroup: *Oryzias latipes* ("Oryzias\_latipes")
Oldest fossil species: *Rhamphexocoetus volans*
First occurrence age: 49.1-49.4 Ma -* **"Ovalentaria"**
Ingroup: *Cichla temensis* ("Cichla\_temensisA"), *Etroplus maculatus* ("Etroplus\_maculatusA"), *Heros appendictulatus* ("Heros\_appendictulatusA"), *Heterochromis multidens* ("Heterochromis\_multidensA"), *Oreochromis niloticus* ("Oreochromis\_niloticus"), *Oryzias latipes* ("Oryzias\_latipes")
Oldest fossil species: *Rhamphexocoetus volans*
First occurrence age: 49.1-49.4 Ma -* **"Carangaria"**
Ingroup: *Trachinotus carolinus* ("Trachinotus\_carolinusA")
Oldest fossil species: *Trachicaranx tersus*
First occurrence age: 55.8-57.23 Ma -* **"Anabantiformes"**
Ingroup: *Channa striata* ("Channa\_striataA")
Oldest fossil species: *Osphronemus goramy*
First occurrence age: 45.5-50.7 Ma -* **"Anabantaria"**
Ingroup: *Channa striata* ("Channa\_striataA"), *Monopterus albus* ("Monopterus\_albusA")
Oldest fossil species: *Osphronemus goramy*
First occurrence age: 45.5-50.7 Ma -* **"Eupercaria"**
Ingroup: *Gasterosteus aculeatus* ("Gasterosteus_acuC")
Oldest fossil species: *Cretatriacanthus guidottii*
First occurrence age: 83.5-99.6 Ma -* **"Gobiaria"**
Ingroup: *Astrapogon tellatus* ("Astrapogon\_stellatusA")
Oldest fossil species: *"Gobius" gracilis*
First occurrence age: 30.7-33.9 Ma -* **"Syngnatharia"**
Ingroup: *Aulostomus chinensis* ("Aulostomus\_chinensisA")
Oldest fossil species: *Prosolenostomus lessinii*
First occurrence age: 49.1-49.4 Ma -* **"Pelagaria"**
Ingroup: *Thunnus albacares* ("Thunnus\_albacaresA")
Oldest fossil species: *Eutrichiurides opiensis*
First occurrence age: 56.6-66.043 Ma -* **"Batrachoidiaria"**
Ingroup: *Porichthys notatus* ("Porichthys\_notatusA")
Oldest fossil species: *Louckaichthys novosadi*
First occurrence age: 27.82-33.9 Ma -* **"Ophidiaria"**
Ingroup: *Diplacanthopoma brunnea* ("Diplacanthopoma\_brunneaA")
Oldest fossil species: *Eolamprogrammus senectus*
First occurrence age: 55.8-57.23 Ma -* **"Percomorphaceae"**
Ingroup: *Astrapogon tellatus* ("Astrapogon\_stellatusA"), *Aulostomus chinensis* ("Aulostomus\_chinensisA"), *Channa striata* ("Channa\_striataA"), *Cichla temensis* ("Cichla\_temensisA"), *Diplacanthopoma brunnea* ("Diplacanthopoma\_brunneaA"), *Etroplus maculatus* ("Etroplus\_maculatusA"), *Gasterosteus aculeatus* ("Gasterosteus_acuC"), *Heros appendictulatus* ("Heros\_appendictulatusA"), *Heterochromis multidens* ("Heterochromis\_multidensA"), *Monopterus albus* ("Monopterus\_albusA"), *Oreochromis niloticus* ("Oreochromis\_niloticus"), *Oryzias latipes* ("Oryzias\_latipes"), *Porichthys notatus* ("Porichthys\_notatusA"), *Thunnus albacares* ("Thunnus\_albacaresA"), *Trachinotus carolinus* ("Trachinotus\_carolinusA")
Oldest fossil species: *Cretatriacanthus guidottii*
First occurrence age: 83.5-99.6 Ma -* **"Holocentrimorphaceae"**
Ingroup: *Sargocentron cornutum* (Sargocentron\_cornutumA)
Oldest fossil species: *Caproberyx pharsus*
First occurrence age: 97.8-99.1 Ma -* **"Acanthopterygii"**
Ingroup: *Astrapogon tellatus* ("Astrapogon\_stellatusA"), *Aulostomus chinensis* ("Aulostomus\_chinensisA"), *Channa striata* ("Channa\_striataA"), *Cichla temensis* ("Cichla\_temensisA"), *Diplacanthopoma brunnea* ("Diplacanthopoma\_brunneaA"), *Etroplus maculatus* ("Etroplus\_maculatusA"), *Gasterosteus aculeatus* ("Gasterosteus_acuC"), *Heros appendictulatus* ("Heros\_appendictulatusA"), *Heterochromis multidens* ("Heterochromis\_multidensA"), *Monocentris japonica* ("Monocentris\_japonicaA"), *Monopterus albus* ("Monopterus\_albusA"), *Oreochromis niloticus* ("Oreochromis\_niloticus"), *Oryzias latipes* ("Oryzias\_latipes"), *Porichthys notatus* ("Porichthys\_notatusA"), *Rondeletia loricata* ("Rondeletia\_loricataA"), *Thunnus albacares* ("Thunnus\_albacaresA"), *Trachinotus carolinus* ("Trachinotus\_carolinusA"), *Sargocentron cornutum* (Sargocentron\_cornutumA)
Oldest fossil species: *Caproberyx pharsus*
First occurrence age: 97.8-99.1 Ma -* **"Polymixiipterygii"**
Ingroup: *Polymixia japonica* ("Polymixia\_japonicaA")
Oldest fossil species: *Homonotichthys rotundus*
First occurrence age: 93.5-96.0 Ma -* **"Percopsaria"**
Ingroup: *Percopsis omiscomaycus* ("Percopsis\_omiscomaycusA")
Oldest fossil species: *Mcconichthys longipinnis*
First occurrence age: 61.1-66.043 Ma -* **"Zeiariae"**
Ingroup: *Zenopsis conchifera* ("Zenopsis\_conchiferaB")
Oldest fossil species: *Cretazeus rinaldii*
First occurrence age: 69.2-76.4 Ma -* **"Gadiformes"**
Ingroup: *Gadus morhua* ("Gadus\_morhua")
Oldest fossil species: *Protacodus* sp.
First occurrence age: 59.7-62.8 Ma -* **"Paracanthopterygii"**
Ingroup: *Gadus morhua* ("Gadus\_morhua"), *Percopsis omiscomaycus* ("Percopsis\_omiscomaycusA"), *Stylephorus chordatus* ("Stylephorus\_chordatusB"), *Zenopsis conchifera* ("Zenopsis\_conchiferaB")
Oldest fossil species: *Cretazeus rinaldii*
First occurrence age: 69.2-76.4 Ma +- **"Other African cichlid tribes"**
Ingroup: _Oreochromis niloticus_ ("Oreochromis\_niloticus")
Oldest fossil species: _Mahengechromis_ spp.
First occurrence age: 45.0-46.0 Ma +- **"African cichlids"**
Ingroup: _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus")
Oldest fossil species: _Mahengechromis_ spp.
First occurrence age: 45.0-46.0 Ma +- **"Retroculini and Cichlini"**
Ingroup: _Cichla temensis_ ("Cichla\_temensisA")
Oldest fossil species: _Palaeocichla longirostrum_
First occurrence age: 5.332-23.03 Ma +- **"Other Neotropical cichlid tribes"**
Ingroup: _Heros appendictulatus_ ("Heros\_appendictulatusA")
Oldest fossil species: _Plesioheros chauliodus_
First occurrence age: 39.9-45.0 Ma +- **"Neotropical cichlids"**
Ingroup: _Cichla temensis_ ("Cichla\_temensisA"), _Heros appendictulatus_ ("Heros\_appendictulatusA")
Oldest fossil species: _Plesioheros chauliodus_
First occurrence age: 39.9-45.0 Ma +- **"Afro-American cichlids"**
Ingroup: _Cichla temensis_ ("Cichla\_temensisA"), _Heros appendictulatus_ ("Heros\_appendictulatusA"), _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus")
Oldest fossil species: _Mahengechromis_ spp.
First occurrence age: 45.0-46.0 Ma +- **"Cichlids"**
Ingroup: _Cichla temensis_ ("Cichla\_temensisA"), _Etroplus maculatus_ ("Etroplus\_maculatusA"), _Heros appendictulatus_ ("Heros\_appendictulatusA"), _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus")
Oldest fossil species: _Mahengechromis_ spp.
First occurrence age: 45.0-46.0 Ma +- **"Atherinomorphae"**
Ingroup: _Oryzias latipes_ ("Oryzias\_latipes")
Oldest fossil species: _Rhamphexocoetus volans_
First occurrence age: 49.1-49.4 Ma +- **"Ovalentaria"**
Ingroup: _Cichla temensis_ ("Cichla\_temensisA"), _Etroplus maculatus_ ("Etroplus\_maculatusA"), _Heros appendictulatus_ ("Heros\_appendictulatusA"), _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus"), _Oryzias latipes_ ("Oryzias\_latipes")
Oldest fossil species: _Rhamphexocoetus volans_
First occurrence age: 49.1-49.4 Ma +- **"Carangaria"**
Ingroup: _Trachinotus carolinus_ ("Trachinotus\_carolinusA")
Oldest fossil species: _Trachicaranx tersus_
First occurrence age: 55.8-57.23 Ma +- **"Anabantiformes"**
Ingroup: _Channa striata_ ("Channa\_striataA")
Oldest fossil species: _Osphronemus goramy_
First occurrence age: 45.5-50.7 Ma +- **"Anabantaria"**
Ingroup: _Channa striata_ ("Channa\_striataA"), _Monopterus albus_ ("Monopterus\_albusA")
Oldest fossil species: _Osphronemus goramy_
First occurrence age: 45.5-50.7 Ma +- **"Eupercaria"**
Ingroup: _Gasterosteus aculeatus_ ("Gasterosteus\_acuC")
Oldest fossil species: _Cretatriacanthus guidottii_
First occurrence age: 83.5-99.6 Ma +- **"Gobiaria"**
Ingroup: _Astrapogon stellatus_ ("Astrapogon\_stellatusA")
Oldest fossil species: _"Gobius" gracilis_
First occurrence age: 30.7-33.9 Ma +- **"Syngnatharia"**
Ingroup: _Aulostomus chinensis_ ("Aulostomus\_chinensisA")
Oldest fossil species: _Prosolenostomus lessinii_
First occurrence age: 49.1-49.4 Ma +- **"Pelagaria"**
Ingroup: _Thunnus albacares_ ("Thunnus\_albacaresA")
Oldest fossil species: _Eutrichiurides opiensis_
First occurrence age: 56.6-66.043 Ma +- **"Batrachoidiaria"**
Ingroup: _Porichthys notatus_ ("Porichthys\_notatusA")
Oldest fossil species: _Louckaichthys novosadi_
First occurrence age: 27.82-33.9 Ma +- **"Ophidiaria"**
Ingroup: _Diplacanthopoma brunnea_ ("Diplacanthopoma\_brunneaA")
Oldest fossil species: _Eolamprogrammus senectus_
First occurrence age: 55.8-57.23 Ma +- **"Percomorphaceae"**
Ingroup: _Astrapogon stellatus_ ("Astrapogon\_stellatusA"), _Aulostomus chinensis_ ("Aulostomus\_chinensisA"), _Channa striata_ ("Channa\_striataA"), _Cichla temensis_ ("Cichla\_temensisA"), _Diplacanthopoma brunnea_ ("Diplacanthopoma\_brunneaA"), _Etroplus maculatus_ ("Etroplus\_maculatusA"), _Gasterosteus aculeatus_ ("Gasterosteus\_acuC"), _Heros appendictulatus_ ("Heros\_appendictulatusA"), _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Monopterus albus_ ("Monopterus\_albusA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus"), _Oryzias latipes_ ("Oryzias\_latipes"), _Porichthys notatus_ ("Porichthys\_notatusA"), _Thunnus albacares_ ("Thunnus\_albacaresA"), _Trachinotus carolinus_ ("Trachinotus\_carolinusA")
Oldest fossil species: _Cretatriacanthus guidottii_
First occurrence age: 83.5-99.6 Ma +- **"Holocentrimorphaceae"**
Ingroup: _Sargocentron cornutum_ ("Sargocentron\_cornutumA")
Oldest fossil species: _Caproberyx pharsus_
First occurrence age: 97.8-99.1 Ma +- **"Acanthopterygii"**
Ingroup: _Astrapogon stellatus_ ("Astrapogon\_stellatusA"), _Aulostomus chinensis_ ("Aulostomus\_chinensisA"), _Channa striata_ ("Channa\_striataA"), _Cichla temensis_ ("Cichla\_temensisA"), _Diplacanthopoma brunnea_ ("Diplacanthopoma\_brunneaA"), _Etroplus maculatus_ ("Etroplus\_maculatusA"), _Gasterosteus aculeatus_ ("Gasterosteus\_acuC"), _Heros appendictulatus_ ("Heros\_appendictulatusA"), _Heterochromis multidens_ ("Heterochromis\_multidensA"), _Monocentris japonica_ ("Monocentris\_japonicaA"), _Monopterus albus_ ("Monopterus\_albusA"), _Oreochromis niloticus_ ("Oreochromis\_niloticus"), _Oryzias latipes_ ("Oryzias\_latipes"), _Porichthys notatus_ ("Porichthys\_notatusA"), _Rondeletia loricata_ ("Rondeletia\_loricataA"), _Thunnus albacares_ ("Thunnus\_albacaresA"), _Trachinotus carolinus_ ("Trachinotus\_carolinusA"), _Sargocentron cornutum_ ("Sargocentron\_cornutumA")
Oldest fossil species: _Caproberyx pharsus_
First occurrence age: 97.8-99.1 Ma +- **"Polymixiipterygii"**
Ingroup: _Polymixia japonica_ ("Polymixia\_japonicaA")
Oldest fossil species: _Homonotichthys rotundus_
First occurrence age: 93.5-96.0 Ma +- **"Percopsaria"**
Ingroup: _Percopsis omiscomaycus_ ("Percopsis\_omiscomaycusA")
Oldest fossil species: _Mcconichthys longipinnis_
First occurrence age: 61.1-66.043 Ma +- **"Zeiariae"**
Ingroup: _Zenopsis conchifera_ ("Zenopsis\_conchiferaB")
Oldest fossil species: _Cretazeus rinaldii_
First occurrence age: 69.2-76.4 Ma +- **"Gadiformes"**
Ingroup: _Gadus morhua_ ("Gadus\_morhua")
Oldest fossil species: _Protacodus_ sp.
First occurrence age: 59.7-62.8 Ma +- **"Paracanthopterygii"**
Ingroup: _Gadus morhua_ ("Gadus\_morhua"), _Percopsis omiscomaycus_ ("Percopsis\_omiscomaycusA"), _Stylephorus chordatus_ ("Stylephorus\_chordatusB"), _Zenopsis conchifera_ ("Zenopsis\_conchiferaB")
Oldest fossil species: _Cretazeus rinaldii_
First occurrence age: 69.2-76.4 Ma Once all these constraints are added, the BEAUti window should look as shown below. @@ -372,7 +370,7 @@ Most likely, the MCMC analysis is going to crash right at the start with an erro
Figure 21: Error message due to failure to initialize the MCMC chain.
-This is a common problem when several fossil constraints are specified: According to the error message, BEAST2 could not find a proper state to initialize. This means that even after several attempts, no starting state of the MCMC chain could be found that had a non-zero probability. Most often, the issue is that the tree that BEAST2 randomly generates to start the chain is in conflict with one or more fossil constraints. Unfortunately, the only way to fix this issue is to manually edit the XML file and specify a starting tree that is in agreement with the specified fossil constraints. In particular, because all fossil constraints imposed hard minimum ages on the origin of the respective clades, this clades must at least be as old as this minimum age in the starting tree. In case of doubt, it is usually safer to make the starting tree too old rather than too young, the course of the MCMC chain should, at least after the burnin, not be influenced by the starting state anymore anyway. Some helpful advice on how to specify starting trees is provided on the [BEAST2](https://www.beast2.org/fix-starting-tree/) webpage. With trees of hundreds of taxa, generating a suitable starting tree can be a tricky task in itself, but with the small number of 24 species used here, writing a starting tree by hand is feasible. +This is a common problem when several fossil constraints are specified: According to the error message, BEAST2 could not find a proper state to initialize. This means that even after several attempts, no starting state of the MCMC chain could be found that had a non-zero probability. Most often, the issue is that the tree that BEAST2 randomly generates to start the chain is in conflict with one or more fossil constraints. Unfortunately, the only way to fix this issue is to manually edit the XML file and specify a starting tree that is in agreement with the specified fossil constraints. 
In particular, because all fossil constraints impose hard minimum ages on the origin of the respective clades, these clades must be at least as old as their minimum ages in the starting tree. In case of doubt, it is usually safer to make the starting tree too old rather than too young, since the course of the MCMC chain should, at least after the burn-in, no longer be influenced by the starting state anyway. Some helpful advice on how to specify starting trees is provided on the [BEAST2](https://www.beast2.org/fix-starting-tree/) webpage. With trees of hundreds of taxa, generating a suitable starting tree can be a tricky task in itself, but with the small number of 24 species used here, writing a starting tree by hand is feasible. > Copy and paste the below starting tree string into a new FigTree window. @@ -412,7 +410,7 @@ Depending on the speed of your computer, this analysis will take half a day or l We are now going to use the program Tracer to assess stationarity of the MCMC produced by the analyses with CladeAge. -> Open Tracer and the log file `Near_et_al_red.log` resulting from the analysis with CladeAge. +> Open the analysis log file `Near_et_al_red.log` in Tracer. The Tracer window should then look as shown in the next screenshot. @@ -422,7 +420,7 @@ The Tracer window should then look as shown in the next screenshot.
Figure 22: Analyzing results with Tracer.
-> Quickly browse through the long list of parameters to see if any have particularly low ESS values. +> Quickly browse through the long list of traces in the bottom left part of the window to see if any have particularly low ESS values. We'll ignore those parameters of the bModelTest model named "hasEqualFreqs...". Besides these, the lowest ESS values are probably around 80, indicating that the chain is approaching stationarity, but that it should be run for more iterations for a complete and publishable analysis. Nevertheless, the degree of stationarity appears to be sufficient for our interpretation here. @@ -438,7 +436,7 @@ You should see that after a steep increase at the very beginning of the MCMC, th This suggest that considering the first 10% of the chain as "burn-in" is appropriate for this analysis. -> To see a better example of a "hairy caterpillar" trace pattern indicating good stationarity, click on "prior" in the list of parameters and on the tab for "Trace" in the top right of the window. +> To see a better example of a "hairy caterpillar" trace pattern indicating good stationarity, click on "prior" in the list of traces. You should see a trace as shown below. @@ -450,11 +448,11 @@ You should see a trace as shown below. Note that in principle all traces should look similar to this pattern, with ESS values greater than 200, once the chain is fully stationary. -> Select the "TreeHeight" parameter indicating the root age in the list on the left. +> Select the "TreeHeight" trace indicating the root age in the list on the left. **Question 1:** What is the mean estimate and its confidence interval for the age of the first split in the phylogeny? [(see answer)](#q1) -> Next, find the estimated divergence time between African and Neotropical cichlid fishes. To do so, scroll to the bottom of the list on the left and select "mrcatime(Afro-American cichlids)". +> Next, find the estimated divergence time between African and Neotropical cichlid fishes. 
To do so, scroll to the bottom of the traces list and select "mrcatime(Afro-American cichlids)". You'll see that this divergence event was estimated around 65 Ma, with a range of uncertainty between around 55 Ma and 75 Ma, as shown in the next screenshot. @@ -464,7 +462,7 @@ You'll see that this divergence event was estimated around 65 Ma, with a range o
Figure 25: Analyzing results with Tracer.
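
The ESS values and the 10% burn-in discussed above can also be checked outside Tracer. The following is only a rough sketch, not the estimator that Tracer itself uses: it assumes that the log file is tab-separated with a header row and `#` comment lines, and it truncates the autocorrelation sum at the first non-positive lag, so the numbers will differ somewhat from those shown in Tracer. The function names `read_trace` and `effective_sample_size` are hypothetical and chosen for illustration.

```python
import csv


def read_trace(handle, column, burnin_fraction=0.1):
    """Return the post-burn-in values of one column of a tab-separated log.

    Assumption: lines starting with '#' are comments and the first
    remaining line is the header row.
    """
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    values = [float(row[column]) for row in reader]
    n_burnin = int(len(values) * burnin_fraction)
    return values[n_burnin:]


def effective_sample_size(values):
    """Approximate ESS as n / (1 + 2 * sum of positive autocorrelations).

    Simplification: the sum stops at the first lag whose autocorrelation
    is not positive. This is not Tracer's exact algorithm.
    """
    n = len(values)
    if n < 2:
        return float(n)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

With the log file of this tutorial, a call could look like `effective_sample_size(read_trace(open("Near_et_al_red.log"), "TreeHeight"))`. The pure-Python loop is slow for long chains and is meant to illustrate the idea rather than to replace Tracer.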
-> Finally, select the speciation-rate parameter named "BDBirthRate" from the list on the left to see the summary statistics for this parameter. +> Finally, select the speciation-rate parameter named "BDBirthRate" from the traces list to see the summary statistics for this parameter. These should look similar to those shown in the screenshot below. @@ -474,14 +472,14 @@ These should look similar to those shown in the screenshot below.
Figure 26: Analyzing results with Tracer.
-**Question 2:** How do these estimates compare to those that we used to define prior densities for CladeAge calibrations? [(see answer)](#q2) +**Question 2:** How does this speciation-rate estimate compare to the estimate for net diversification that we used to define prior densities for CladeAge calibrations? [(see answer)](#q2) ### Summarizing the posterior tree distribution -While analysis with Tracer has been sufficient to inspect the run stationarity and parameter estimates, we might still want to analyze and visualize the inferred tree itself. One useful method to visualize the entire tree distribution is Densitree {% cite BouckaertHeled2014 --file CladeAge-Tutorial/master-refs.bib %}; this software is distributed together with BEAST2. Alternatively, a single summary tree can be generated with TreeAnnotator as described below. +While the analysis with Tracer has been sufficient to assess the stationarity of the run and to check parameter estimates, we might still want to inspect and visualize the inferred tree itself. One useful tool to visualize the posterior tree distribution is [DensiTree](https://www.cs.auckland.ac.nz/~remco/DensiTree/) {% cite BouckaertHeled2014 --file CladeAge-Tutorial/master-refs.bib %}; this software is distributed together with BEAST2. Alternatively, a single summary tree can be generated with TreeAnnotator as described below. -> Open TreeAnnotator. Set the burn-in percentage to 10% (recall that this was determined to be appropriate based on the MCMC trace for the posterior). As the target tree type, choose "Maximum clade credibility tree" from the drop-down menu, and select "Mean heights" as node heights. Select the `Near_et_al_red.trees` output file of BEAST2 analysis as input tree file, and name the output file `Near_et_al_red.tre`. +> Open TreeAnnotator. Set the burn-in percentage to 10% (recall that this was determined to be appropriate based on the MCMC trace for the posterior). 
As the target tree type, choose "Maximum clade credibility tree" from the drop-down menu, and select "Mean heights" as node heights. Select the `Near_et_al_red.trees` file written by BEAST2 as input tree file, and name the output file `Near_et_al_red.tre`. The TreeAnnotator window should then look as shown below. @@ -491,7 +489,7 @@ The TreeAnnotator window should then look as shown below.
Figure 27: Generating a summary tree with TreeAnnotator.
-After clicking "OK", TreeAnnotator will write the maximum-clade-credibility tree to file `Near_et_al_red.tre`. This tree can now be visualized with FigTree. +After clicking "OK", TreeAnnotator will write the maximum-clade-credibility tree to file `Near_et_al_red.tre`. This tree can then be visualized with FigTree. > Open file `Near_et_al_red.tre` in FigTree. To add a time scale, set a tick next to "Scale Axis" in the menu on the left, then click the triangle next to it. Remove the tick next to "Show grid" and set a tick next to "Reverse axis". Then, set a tick next to "Node bars", click the triangle to open this panel, and select "height_95%_HPD" from the drop-down menu to illustrate the uncertainty in divergence-time estimates. @@ -520,7 +518,7 @@ The tree should then be shown as in the screenshot below. -- **Question 2:** The estimated speciation rate is far lower than the values that we assumed when we specified prior densities for clade ages with CladeAge. Recall that we had used the estimates for the net diversification rate from Santini et al. {% cite Santini2009 -A --file CladeAge-Tutorial/master-refs.bib %}, which were 0.041-0.081 (per millon year). Thus, the estimated speciation of around 0.0009 is almost an order of magnitude lower than the values that we had assumed for the net diversification rate. This is remarkable because the speciation rate should always be higher than the net diversification rate, given that the latter is defined as the difference between the speciation and extinction rates. The explanation for this difference is that BEAST2 estimated the speciation rate under the assumption that the species that we included in the phylogeny are in fact all the extant species that descended from the root of the phylogeny. This means that BEAST2 assumed that no spiny-rayed fish species besides those 24 included in the phylogeny exist. 
On the other hand the estimates for the net diversification rate obtained by {% cite Santini2009 -A --file CladeAge-Tutorial/master-refs.bib %} accounted for the fact that only a subset of the living species were included in their phylogeny. Thus, the speciation-rate estimate resulting from our analysis is most certainly a severe underestimate. This bias, however, should not lead to strong bias in the timeline inferred in the analysis with CladeAge, because it did not influence the prior densities placed on clade ages. +- **Question 2:** The estimated speciation rate is far lower than the values that we assumed when we specified prior densities for clade ages with CladeAge. Recall that we had used the estimates for the net diversification rate from Santini et al. {% cite Santini2009 -A --file CladeAge-Tutorial/master-refs.bib %}, which were 0.041-0.081 (per million years). Thus, the estimated speciation rate of around 0.0009 is almost an order of magnitude lower than the values that we had assumed for the net diversification rate. This is remarkable because the speciation rate should always be higher than the net diversification rate, given that the latter is defined as the difference between the speciation and extinction rates. The explanation for this difference is that BEAST2 estimated the speciation rate under the assumption that the species that we included in the phylogeny are in fact all the extant species that descended from the root of the phylogeny. This means that BEAST2 assumed that no spiny-rayed fish species besides those 24 included in the phylogeny exist. On the other hand, the estimates for the net diversification rate obtained by Santini et al. {% cite Santini2009 -A --file CladeAge-Tutorial/master-refs.bib %} accounted for the fact that only a subset of the living species were included in their phylogeny. Thus, the speciation-rate estimate resulting from our analysis is almost certainly a severe underestimate. 
This underestimate, however, should not strongly bias the timeline inferred in the analysis with CladeAge, because it did not influence the prior densities placed on clade ages. ------- @@ -529,9 +527,7 @@ The tree should then be shown as in the screenshot below. - [Rough Guide to CladeAge](http://evoinformatics.eu/cladeage.pdf) - [Bayesian Evolutionary Analysis with BEAST 2](http://www.beast2.org/book.html) {% cite BEAST2book2014 --file CladeAge-Tutorial/master-refs.bib %} -- BEAST 2 website and documentation: [http://www.beast2.org/](http://www.beast2.org/) -- BEAST 1 website and documentation: [http://beast.bio.ed.ac.uk](http://beast.bio.ed.ac.uk) -- Join the BEAST user discussion: [http://groups.google.com/group/beast-users](http://groups.google.com/group/beast-users) +- Ask questions at the BEAST user discussion: [http://groups.google.com/group/beast-users](http://groups.google.com/group/beast-users) ----