Skip to content

Structured Tree Likelihood

Igor Siveroni edited this page Jul 9, 2021 · 51 revisions

The Structured Tree Likelihood component

The STreeLikelihoodODE class calculates the log-likelihood of a genealogy given a population model. The STreeLikelihoodODE component takes as input a structured tree set of intervals and an instantiation of a PopModelODE. A typical STreeLikelihoodODE entry looks like this:

<distribution spec='STreeLikelihoodODE' id = 'stlikelihood'
      popmodel='@sirmodel'             	 
      equations='PL' method='classicrk' stepSize='0.01'>                
      <treeIntervals spec="STreeIntervals" tree='@tree'> </treeIntervals>                
</distribution>

where @popmodel is reference to a PopModelODE element and '@tree' is a reference to a Tree element.

It is important to take into consideration the following:

  • How to introduce population structure.
  • How to date the tree.
  • Equation types and ODE solvers.

Population Structure

Population structure is introduced by mapping tip nodes, or the taxa IDs used to tag them, to deme names. This can be done by explicitly mapping taxa IDs to deme names via a Type Trait, or by annotating with deme information the corresponding tree taxa ids (taxon set) associated to the tree node tips. Both options are explained below.

Deme names must correspond to demes defined in the population model used as input. Invalid deme names will raise an Exception and stop the program.

Note: The use of the useStateName flag has been deprecated. PhyDyn will always look for deme names in type traits or taxa IDs - deme indices are no longer considered. PhyDyn will allow the use of useStateName but will ignore its value.

Type trait (typeTrait argument)

A type Trait maps taxa IDs to deme names. PhyDyn will use a Type trait if:

  • the typeTrait argument is set, referencing to or including a Type Trait object, or (in this order)
  • the tree used by the likelihood object contains a Type trait.

A Type trait must have the value of its name argument set to Types or Type, and must be defined for the same taxon set used by the tree used in the analysis.

Extracting population structure from Taxa IDs (splitSymbol and splitIndex).

If no type trait is provided, PhyDyn extracts annotated deme information from taxa by splitting taxa IDs using the character splitSymbol and accessing the deme name with splitIndex. A splitIndex value of 1 corresponds to the left-most substring of the splitted taxa ID whereas -1 corresponds to the last splitted substring. For example, given the following taxa ID

  "EBOV|G4973.1|KR105286|SLE|x|I|2014-08-12"  

and splitSymbol = '|', the strings corresponding to splitIndex values 1,2,-2 and -1 are "EBOV", "G4973.1", "I" and "2014-08-12", respectively.

Default: The default PhyDyn behaviour corresponds to splitSymbol= '_' and splitIndex='-1'. Thus, when using the default, taxa ids must be suffixed with an underscore followed by the deme information. For example, a tree tip annotated with taxon

  "EBOV|G4973.1|KR105286|SLE|x|2014-08-12_I"

will be associated to deme I. Most examples presented in this Wiki follow this approach.

Tree Dating

Date information can be introduced via a Date trait or by extracting time information from the population model (trajectory parameters). The following must be taken into consideration:

  • If the tree has a Date trait then the date (time) information associated with the trait is used.
  • If no Date Trait is found, then PhyDyn looks into the PopModelODE element:
    • If t1 is defined then the time of the most recent sample (tip) is set to the value of t1.
    • If t1 is not defined then the time of the most recent sample (tip) is set to the height of the tree.

't(root) < t0': the 'forgiveT0' and 'Ne' parameters

The population trajectory given as input runs from time of origin t0 (a mandatory parameter of PopModelODE) to t1, set by the rules specified above. It is possible to be in a situation where the root of the tree precedes t0 e.g. if t0 is sampled. This means that part of the tree, from the root to t0, will have no population history associated to it. PhyDyn reacts to this situation depending on the values of two parameters, forgiveT0 and Ne:

  • If forgiveT0 is true (default) then PhyDyn switches to the coalescent with constant population size to model the source population. The value of Ne is taken from the Ne parameter, with default n*(n-1)/2*lambda.
  • If forgiveT0 is false, the log-likelihood is set to -Inf.

Equations and ODE Solvers (equations and method parameters)

PhyDyn calculates state probabilities and tree log-likelihoods by solving a set of ODEs along tree intervals. The equations parameter (replaces solvePL, now deprecated) determines the type of equations to use. There are three options:

  • equations=PL (previously solvePL=true), which solves a set of m-by-n equations per tree interval, where m is the number of demes and n the number of extant lineages. Each equation reasons about the probability of lineages being in a particular state (deme) at any given time. This method - introduced in [1] - is usually more precise but computationally expensive for large number of demes or lineages. It may also require, for complex examples (stiff systems), fine-tuning with the solver parameters and methods. These equations correspond to a variation of the original model presented in [1], denoted by com12 in [2].

  • equations=QL (previously solvePL=false) correspond to an approximation on comP that reduces the order of the number of equations to n-by-n per tree interval. Accuracy depends of the complexity of the population models. This set of equations correspond to the model denoted by comQ in [2].

  • equations=logQL performs a logarithmic tranformation on comQ such that the state probabilities p_ik(t) are re-written as u_ik(t) = log(p_ik(t)/(1-p_ik(t)). The solution of the transformed ODEs is more precise (and stable) but take twice as long when compared to comQ.

The user can choose from a range of ODE solvers:

   method='ODE_SOLVER'
   ODE_SOLVER :== FIXED | ADAPTIVE
   FIXED :== 'midpoint' | 'classicrk' | 'gill'
   ADAPTIVE :== 'adams-bashforth' | 'adams=moulton' | 'higham-hall'

Fixed methods require a stepSize parameter. Adaptive methods use the order, atol and rtol parameters. The defaults are:

  • method = 'classicrk' stepSize='0.01'
  • For adaptive methods methods: order='3' rTol='0.0001' aTol='0.00001'

Other arguments and functionality

Minimum state probability (minP)

PhyDyn computes the probability (state probabilities) of lineages being in a particular state (demes) along all points of the tree's intervals. Zero probability values, or values near to zero, may cause numeric problems (-Inf or NaN results) either due to numeric precision difficulties or model-related issues such as the shape of the birth and migration matrices for a particular point along the.

PhyDyn prevents numerical errors of this kind by setting the minimum value of state probabilities to the value of minP. The default value of minP is 0.01.

Sampling from Prior: the isConstantLh flag.

The isConstantLh flag instructs PhyDyn to skip the calculation of the STreeLikelihood and return 0 instead. This is useful if we want to compute the combined popmodel parameters likelihood solely from their pior distributions i.e. effectively "sampling from priors".

Clone this wiki locally