Skip to content

Structured Tree Likelihood

Igor Siveroni edited this page Feb 16, 2018 · 51 revisions

The Structured Tree Likelihood component

The STreeLikelihoodODE class computes the log-likelihood of a genealogy given a population model. The STreeLikelihoodODE component takes as input a structured tree set of intervals and an instantiation of a PopModelODE. A typical STreeLikelihoodODE entry looks like this:

<distribution spec='STreeLikelihoodODE' id = 'stlikelihood'
      useStateName='true' popmodel='@sirmodel'             	 
      solvePL='true' method='classicrk' stepSize='0.01'>                
      <treeIntervals spec="STreeIntervals" tree='@tree'> </treeIntervals>                
</distribution>

where @popmodel is reference to a PopModelODE element and '@tree' is a reference to a Tree element. [2]

It is important to take into consideration the following:

  • How to introduce population structure.
  • How to date the tree.
  • Equation types and ODE solvers.

Population Structure

Population structure is introduced by mapping tip nodes to deme names. This can be done by annotating the corresponding tree taxa ids with deme information or by introducing an extra type trait. With the former, taxa ids must be suffixed with an underscore followed by the deme information. For example, a tree tip annotated with taxon

  "EBOV|G4973.1|KR105286|SLE|x|2014-08-12_I"  

will be associated to deme I. Deme names must correspond to demes defined in the population model used as input. If the flag useStateName is set to False, deme indices are used instead i.e. the order (starting from zero) in which they appear in the model. The default is useStateName='True'.

Deme information can also be introduced by adding a type trait, that is, a separate trait defined for the same taxon set mapping taxa to deme information: deme names or indices, depending on the value of the useStateName flag. This is done by including a typeTrait parameter.

Tree Dating

Date information can be introduced via a Date trait or by extracting time information from the population model (trajectory parameters). The following must be taken into consideration:

  • If the tree has a Date trait then the date (time) information associated with the trait is used.
  • If no Date Trait is found, then PhyDyn looks into the PopModelODE element:
    • If t1 is defined then the time of the most recent sample (tip) is set to the value of t0.
    • If t1 is not defined then the time of the most recent sample (tip) is set to the height of the tree.

The 'forgiveT0' and 'Ne' parameters

The population trajectory given as input runs from time of origin t0 (a mandatory parameter of PopModelODE) and t1, set by the rules specified above. It is possible to be in a situation where the root of the tree precedes t0 e.g. if t0 is sampled. This means that part of the tree, from the root to t0, will have no population history associated to it. PhyDyn reacts to this situation depending on the values of two parameters, forgiveT0 and Ne:

  • If forgiveT0 is true (default) then PhyDyn switches to the coalescent with constan population size to model the source population. The value of Ne is taken from the Ne parameter, with default n*(n-1)/2*lambda.
  • If forgiveT0 is false, the log-likelihood is set to -Inf.

Equations and ODE Solvers

PhyDyn calculates state probabilities and tree log-likelihoods by solving a set of ODEs along tree intervals. The user can choose between two methods (type of sets of equations):

  • solvePL=true, which solves a set of m-by-n equations per tree interval, where m is the number of demes and n the number of extant lineages. Each equation reasons about the probability of lineages being in a particular state (deme) at any given time. This method is usually more precise but computationally expensive for large number of demes or lineages. It may also require, for complex examples (stiff systems), fine-tuning with the solver parameters and methods.
  • solvePL=false performs an approximation that reduces the order of the number of equations to n-by-n. Accuracy depends of the complexity of the population models.

The user can choose from a range of ODE solvers:

   method='ODE_SOLVER'
   ODE_SOLVER :== FIXED | ADAPTIVE
   FIXED :== 'midpoint' | 'classicrk' | 'gill'
   ADAPTIVE :== 'adams-bashforth' | 'adams=moulton' | 'higham-hall'

Fixed methods require a stepSize parameter. Adaptive methods use the order, atol and rtol parameters. The defaults are:

  • method = 'classicrk' stepSize='0.01'
  • For fixed methods: order='3' rTol='0.0001' aTol='0.00001'

Clone this wiki locally