fixes error in citations

nicfel committed Mar 25, 2019
1 parent 47f8a36 commit f91378358131d45090f1f1aa9b6816ef23c7da61
Showing with 100 additions and 40 deletions.
  1. +5 −9 README.md
  2. +95 −31 master-refs.bib
@@ -47,10 +47,6 @@ TreeAnnotator is provided as a part of the BEAST2 package so you do not need to

# Practical: Setting up an analysis with coupled MCMC

{% cite Mueller348391 --file CoupledMCMC-Tutorial/master-refs.bib %}



In this tutorial, we will describe the two different ways to set up a BEAST2 analysis to run with coupled MCMC.
To do so, we will set up a Bayesian Skyline plot analysis analogous to the one in the tutorial on [skyline plots](https://taming-the-beast.org/tutorials/Skyline-plots/).

@@ -59,14 +55,14 @@ All other analyses have to be setup by editing one line in the `*xml` file.


## The Data
-The dataset consists of an alignment of 63 Hepatitis C sequences sampled in 1993 in Egypt {% cite Ray2000 --file Skyline-plots/master-refs %}. This dataset has been used previously to test the performance of skyline methods {% cite Pybus2003, Drummond2005, Stadler2013 --file Skyline-plots/master-refs %}.
+The dataset consists of an alignment of 63 Hepatitis C sequences sampled in 1993 in Egypt {% cite Ray2000 --file CoupledMCMC-Tutorial/master-refs %}. This dataset has been used previously to test the performance of skyline methods {% cite Pybus2003, Drummond2005, Stadler2013 --file CoupledMCMC-Tutorial/master-refs %}.

With an estimated prevalence of 15-25%, Egypt has the highest Hepatitis C prevalence in the world. In the mid-20th century, the prevalence of Hepatitis C increased drastically (see [Figure 1](#fig:prevalence) for estimates). We will try to infer this increase from sequence data.

<figure>
<a id="fig:prevalence"></a>
<img style="width:50%;" src="figures/Estimated_number_hcv.png" alt="">
-<figcaption>Figure 1: The estimated number of Hepatitis C cases in Egypt {% cite Pybus2003 --file Skyline-plots/master-refs.bib %}.</figcaption>
+<figcaption>Figure 1: The estimated number of Hepatitis C cases in Egypt {% cite Pybus2003 --file CoupledMCMC-Tutorial/master-refs.bib %}.</figcaption>
</figure>
<br>

@@ -112,7 +108,7 @@ After we have loaded the sequences into BEAUti, we have to specify the evolution
</figure>
<br>

-As we use sequences that were sampled at the same point in time, we need to fix the clock rate (for more information on this please refer to the tutorial on molecular clocks). We will use an estimate inferred in {% cite Pybus2001 --file Skyline-plots/master-refs %} to fix the clock rate. In this case all the samples were contemporaneous (at the same time) and the clock rate works as a mapping of the estimated tree branch lengths into calendar time.
+As we use sequences that were sampled at the same point in time, we need to fix the clock rate (for more information on this please refer to the tutorial on molecular clocks). We will use an estimate inferred in {% cite Pybus2001 --file CoupledMCMC-Tutorial/master-refs %} to fix the clock rate. In this case all the samples were contemporaneous (at the same time) and the clock rate works as a mapping of the estimated tree branch lengths into calendar time.

We will keep the strict clock model and will set `Clock.rate` to 0.00079.
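
To make the role of the fixed clock rate concrete, here is a minimal Python sketch of how a rate converts genetic distance into time. It assumes that the value 0.00079 is expressed in substitutions per site per year and reuses the 1993 sampling date from the data description. The function names are hypothetical and not part of BEAST2.

```python
# Assumed units: substitutions per site per year (not stated in the tutorial text).
CLOCK_RATE = 0.00079


def branch_length_to_years(subs_per_site, clock_rate=CLOCK_RATE):
    """Convert a branch length in substitutions per site into years."""
    return subs_per_site / clock_rate


def node_height_to_calendar_year(height_subs_per_site, sampling_year=1993,
                                 clock_rate=CLOCK_RATE):
    """Calendar year of a node, given its height above contemporaneous tips."""
    return sampling_year - branch_length_to_years(height_subs_per_site, clock_rate)


if __name__ == "__main__":
    # A node 0.079 substitutions per site above the tips.
    print(branch_length_to_years(0.079))
    print(node_height_to_calendar_year(0.079))
```

Because all tips share one sampling date, the data contain no information to estimate the rate itself, which is why it is fixed rather than sampled.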

@@ -137,7 +133,7 @@ For this analysis we will set the number of dimensions to 4 (the default value i
</figure>
<br>

-Choosing the dimension for the Bayesian Coalescent Skyline can be rather arbitrary. If the dimension is chosen too low, not all population changes are captured, if it is chosen too large, there might be too little information in an interval to support an estimate of a population size. There are implementations in BEAST of the coalescent skyline that either sample dimensions (Extended Bayesian Skyline {% cite Heled2008 --file Skyline-plots/master-refs %}) or do not require dimensions to be specified (Skyride {% cite Minin2008 --file Skyline-plots/master-refs %}).
+Choosing the dimension for the Bayesian Coalescent Skyline can be rather arbitrary. If the dimension is chosen too low, not all population changes are captured, if it is chosen too large, there might be too little information in an interval to support an estimate of a population size. There are implementations in BEAST of the coalescent skyline that either sample dimensions (Extended Bayesian Skyline {% cite Heled2008 --file CoupledMCMC-Tutorial/master-refs %}) or do not require dimensions to be specified (Skyride {% cite Minin2008 --file CoupledMCMC-Tutorial/master-refs %}).

We can leave the rest of the priors as they are and go to the `Coupled MCMC panel`.
In contrast to regular MCMC, we have to define a few more things.
@@ -152,7 +148,7 @@ The next parameter we have to set is the `deltaTemperature`, the higher this val
Hotter chains, on the other hand, can more easily cross unlikely intermediate states and can therefore help chains move out of local optima.
Here, we use a value of 0.05.
The appropriate value depends on the dataset, the analysis and the number of chains.
-Overall, it should be chosen such that the acceptance probability of an exchange of states between chains is between 0.25 and 0.6 {% cite altekar2004parallel --file Skyline-plots/master-refs %}.
+Overall, it should be chosen such that the acceptance probability of an exchange of states between chains is between 0.25 and 0.6 {% cite altekar2004parallel --file CoupledMCMC-Tutorial/master-refs %}.
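
The sketch below illustrates how `deltaTemperature` can influence the exchange acceptance probability. It assumes the incremental heating scheme commonly described for Metropolis-coupled MCMC, in which chain `i` samples from the posterior raised to the power `1 / (1 + i * deltaTemperature)`. Whether the BEAST2 package uses exactly this form is an assumption here, and the function names are hypothetical.

```python
import math


def chain_betas(n_chains, delta_temperature):
    """Inverse temperatures; chain 0 is the cold chain (beta = 1).

    Assumed heating scheme: beta_i = 1 / (1 + i * delta_temperature).
    """
    return [1.0 / (1.0 + i * delta_temperature) for i in range(n_chains)]


def swap_acceptance(log_post_i, log_post_j, beta_i, beta_j):
    """Metropolis acceptance probability for exchanging the states of two chains."""
    log_ratio = (beta_i - beta_j) * (log_post_j - log_post_i)
    if log_ratio >= 0.0:
        return 1.0
    return math.exp(log_ratio)


if __name__ == "__main__":
    betas = chain_betas(n_chains=4, delta_temperature=0.05)
    # The cold chain holds the better state (log posterior -990 vs. -1000),
    # so the proposed exchange is accepted only some of the time.
    print(swap_acceptance(-990.0, -1000.0, betas[0], betas[1]))
```

Under this scheme, a larger `deltaTemperature` widens the gap between neighbouring inverse temperatures, which lowers the acceptance probability of exchanges that move a worse state into a colder chain.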

<figure>
<a id="fig:dimensions"></a>
@@ -128,44 +128,108 @@ @article{Bouckaert2014
volume = {10},
year = {2014}
}

@article{Mueller2017,
title={The Structured Coalescent and its Approximations},
author={M{\"u}ller, Nicola F and Rasmussen, David A and Stadler, Tanja},
journal={Molecular Biology and Evolution},
pages={msx186},
year={2017},
publisher={Oxford University Press}
@article{Ray2000,
author = {Ray, Stuart C. and Arthur, Ray R. and Carella, Anthony and Bukh, Jens and Thomas, David L.},
doi = {10.1086/315786},
file = {:Users/nicmuell/Library/Application Support/Mendeley Desktop/Downloaded/Ray et al. - 2000 - Genetic Epidemiology of Hepatitis C Virus throughout Egypt.pdf:pdf},
issn = {0022-1899},
journal = {The Journal of Infectious Diseases},
mendeley-groups = {Diseases and Healthcare,SkylineTutorial},
month = {sep},
number = {3},
pages = {698--707},
publisher = {Oxford University Press},
title = {Genetic Epidemiology of Hepatitis C Virus throughout Egypt},
url = {http://jid.oxfordjournals.org/lookup/doi/10.1086/315786},
volume = {182},
year = {2000}
}
@article{Pybus2003,
author = {Pybus, O. G. and Drummond, A. J. and Nakano, T. and Robertson, B. H. and Rambaut, A.},
doi = {10.1093/molbev/msg043},
file = {:Users/nicmuell/Library/Application Support/Mendeley Desktop/Downloaded/Pybus et al. - 2003 - The Epidemiology and Iatrogenic Transmission of Hepatitis C Virus in Egypt A Bayesian Coalescent Approach(3).pdf:pdf},
issn = {07374038},
journal = {Molecular Biology and Evolution},
mendeley-groups = {Skyline,Diseases and Healthcare,SkylineTutorial},
month = {mar},
number = {3},
pages = {381--387},
publisher = {Oxford University Press},
title = {The Epidemiology and Iatrogenic Transmission of Hepatitis C Virus in Egypt: A Bayesian Coalescent Approach},
url = {http://mbe.oupjournals.org/cgi/doi/10.1093/molbev/msg043},
volume = {20},
year = {2003}
}
@article{Drummond2005,
abstract = {We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.},
author = {Drummond, A J and Rambaut, A and Shapiro, B and Pybus, O G},
doi = {10.1093/molbev/msi103},
issn = {0737-4038},
journal = {Molecular biology and evolution},
keywords = {Algorithms,Animals,Bayes Theorem,Bison,Bison: genetics,DNA, Mitochondrial,DNA, Mitochondrial: genetics,Egypt,Egypt: epidemiology,Evolution, Molecular,Genetics, Population,Hepacivirus,Hepacivirus: genetics,Hepacivirus: pathogenicity,Hepatitis C,Hepatitis C: epidemiology,Hepatitis C: transmission,Humans,Markov Chains,Models, Genetic,Monte Carlo Method,Population Density,Population Dynamics,Time Factors},
mendeley-groups = {Skyline,SkylineTutorial},
month = {may},
number = {5},
pages = {1185--92},
pmid = {15703244},
title = {Bayesian coalescent inference of past population dynamics from molecular sequences.},
url = {http://mbe.oxfordjournals.org/content/22/5/1185.abstract},
volume = {22},
year = {2005}
}
@article{Stadler2013,
author = {Stadler, T. and Kuhnert, D. and Bonhoeffer, S. and Drummond, A. J.},
doi = {10.1073/pnas.1207965110},
file = {:Users/nicmuell/Library/Application Support/Mendeley Desktop/Downloaded/Stadler et al. - 2013 - Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV).pdf:pdf},
issn = {0027-8424},
journal = {Proceedings of the National Academy of Sciences},
mendeley-groups = {Skyline,SkylineTutorial},
month = {jan},
number = {1},
pages = {228--233},
publisher = {National Acad Sciences},
title = {Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV)},
url = {http://www.pnas.org/cgi/doi/10.1073/pnas.1207965110},
volume = {110},
year = {2013}
}
@article{Heled2008,
author = {Heled, Joseph and Drummond, Alexei J},
doi = {10.1186/1471-2148-8-289},
issn = {1471-2148},
journal = {BMC Evolutionary Biology},
mendeley-groups = {Skyline,SkylineTutorial},
number = {1},
pages = {289},
publisher = {BioMed Central},
title = {Bayesian inference of population size history from multiple loci},
url = {http://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2148-8-289},
volume = {8},
year = {2008}
}
@article{Minin2008,
abstract = {Kingman's coalescent process opens the door for estimation of population genetics model parameters from molecular sequences. One paramount parameter of interest is the effective population size. Temporal variation of this quantity characterizes the demographic history of a population. Because researchers are rarely able to choose a priori a deterministic model describing effective population size dynamics for data at hand, nonparametric curve-fitting methods based on multiple change-point (MCP) models have been developed. We propose an alternative to change-point modeling that exploits Gaussian Markov random fields to achieve temporal smoothing of the effective population size in a Bayesian framework. The main advantage of our approach is that, in contrast to MCP models, the explicit temporal smoothing does not require strong prior decisions. To approximate the posterior distribution of the population dynamics, we use efficient, fast mixing Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. In a simulation study, we demonstrate that the proposed temporal smoothing method, named Bayesian skyride, successfully recovers "true" population size trajectories in all simulation scenarios and competes well with the MCP approaches without evoking strong prior assumptions. We apply our Bayesian skyride method to 2 real data sets. We analyze sequences of hepatitis C virus contemporaneously sampled in Egypt, reproducing all key known aspects of the viral population dynamics. Next, we estimate the demographic histories of human influenza A hemagglutinin sequences, serially sampled throughout 3 flu seasons.},
author = {Minin, Vladimir N and Bloomquist, Erik W and Suchard, Marc A},
doi = {10.1093/molbev/msn090},
file = {:Users/nicmuell/Library/Application Support/Mendeley Desktop/Downloaded/Minin, Bloomquist, Suchard - 2008 - Smooth skyride through a rough skyline Bayesian coalescent-based inference of population dynamics.pdf:pdf},
issn = {1537-1719},
journal = {Molecular biology and evolution},
mendeley-groups = {Skyline,SkylineTutorial},
month = {jul},
number = {7},
pages = {1459--71},
pmid = {18408232},
title = {Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics.},
url = {http://www.ncbi.nlm.nih.gov/pubmed/18408232 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3302198},
volume = {25},
year = {2008}
}

@BOOK{BEAST2book2014,
title = {Bayesian evolutionary analysis with {BEAST} 2},
publisher = {Cambridge University Press},
year = {2014},
author = {Alexei J. Drummond and Remco R. Bouckaert}
}
@article {Mueller348391,
author = {Mueller, Nicola Felix and Ogilvie, Huw and Zhang, Chi and Drummond, Alexei and Stadler, Tanja},
title = {Inference of species histories in the presence of gene flow},
year = {2018},
doi = {10.1101/348391},
publisher = {Cold Spring Harbor Laboratory},
abstract = {When populations become isolated, members of these populations can diverge genetically over time. This leads to genetic differences between individuals of these populations that increase over time if the isolation persists. This process can be counteracted when genes are exchanged between populations. In order to study the speciation processes when gene flow is present, isolation-with-migration methods have been developed. These methods typically assume that the ranked topology of the species history is already known. However, this is often not the case and the species tree is therefore of interest itself. To infer it is currently only possible when assuming no gene flow. This assumption can lead to wrongly inferred speciation times and species tree topologies. Building on a recently introduced structured coalescent approach, we introduce a new method that allows inference of the species tree while explicitly modelling the flow of genes between coexisting species. By using Markov chain Monte Carlo sampling, we co-infer the species tree alongside evolutionary parameters of interest. By using simulations, we show that our newly introduced approach is able to reliably infer the species trees and parameters of the isolation-with-migration model from genetic sequence data. We then infer the species history of six great ape species including gene flow after population isolation. By using this dataset, we are able to show that our new methods is able to infer the correct species tree not only on simulated but also on a real data set where the species history has already been well studied. In line with previous results, we find some support for some gene flow between bonobos and common chimpanzees.},
URL = {https://www.biorxiv.org/content/early/2018/06/17/348391},
eprint = {https://www.biorxiv.org/content/early/2018/06/17/348391.full.pdf},
journal = {bioRxiv}
}

@article{fontaine2015extensive,
title={Extensive introgression in a malaria vector species complex revealed by phylogenomics},
author={Fontaine, Michael C and Pease, James B and Steele, Aaron and Waterhouse, Robert M and Neafsey, Daniel E and Sharakhov, Igor V and Jiang, Xiaofang and Hall, Andrew B and Catteruccia, Flaminia and Kakani, Evdoxia and others},
journal={Science},
volume={347},
number={6217},
pages={1258524},
year={2015},
publisher={American Association for the Advancement of Science}
}
@article{altekar2004parallel,
title={Parallel metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference},
author={Altekar, Gautam and Dwarkadas, Sandhya and Huelsenbeck, John P and Ronquist, Fredrik},
