-
Notifications
You must be signed in to change notification settings - Fork 8
Structured Tree Likelihood
The STreeLikelihoodODE class calculates the log-likelihood of a genealogy given a population model. The STreeLikelihoodODE component takes as input a structured tree set of intervals and an instantiation of a PopModelODE. A typical STreeLikelihoodODE entry looks like this:
<distribution spec='STreeLikelihoodODE' id = 'stlikelihood'
useStateName='true' popmodel='@sirmodel'
equations='PL' method='classicrk' stepSize='0.01'>
<treeIntervals spec="STreeIntervals" tree='@tree'> </treeIntervals>
</distribution>where @popmodel is reference to a PopModelODE element and '@tree' is a reference to a Tree element.
It is important to take into consideration the following:
- How to introduce population structure.
- How to date the tree.
- Equation types and ODE solvers.
Population structure is introduced by mapping tip nodes to deme names. This can be done by annotating with deme information the corresponding tree taxa ids associated to the tree node tips, or by introducing an extra type trait.
Deme names must correspond to demes defined in the population model used as input. If the flag useStateName is set to False, deme indices are used instead i.e. the order (starting from zero) in which they appear in the model. The default is useStateName='True'.
Deme information can also be introduced by adding a type trait, that is, a separate trait defined for the same taxon set mapping taxa to deme information: deme names or indices, depending on the value of the useStateName flag. This is done by including a typeTrait parameter.
PhyDyn extracts annotated deme information from taxa by splitting taxa IDs using the character splitSymbol and accessing the deme name with splitIndex. A `splitIndex' value of 1 corresponds to the left-most substring of the splitted taxa ID whereas -1 corresponds to the last splitted substring. For example, given the following taxa ID
"EBOV|G4973.1|KR105286|SLE|x|I|2014-08-12" and splitSymbol = '_', the strings corresponding to splitIndex values 1,2,-2 and -1 are "EBOV", "G4973.1", "I" and "2014-08-12", respectively.
Default: The default PhyDyn behaviour corresponds to splitSymbol= '_' and splitIndex='-1'. Thus, when using the default, taxa ids must be suffixed with an underscore followed by the deme information. For example, a tree tip annotated with taxon
"EBOV|G4973.1|KR105286|SLE|x|2014-08-12_I"
```
will be associated to deme `I`. Most examples presented in this Wiki follow the approach.
## Tree Dating
Date information can be introduced via a Date trait or by extracting time information from the population model (trajectory parameters). The following must be taken into consideration:
* If the tree has a Date trait then the date (time) information associated with the trait is used.
* If no Date Trait is found, then PhyDyn looks into the `PopModelODE` element:
* If `t1` is defined then the time of the most recent sample (tip) is set to the value of `t1`.
* If `t1` is not defined then the time of the most recent sample (tip) is set to the height of the tree.
## 't(root) < t0': the 'forgiveT0' and 'Ne' parameters
The population trajectory given as input runs from time of origin t0 (a mandatory parameter of PopModelODE) to t1, set by the rules specified above. It is possible to be in a situation where the root of the tree precedes t0 e.g. if t0 is sampled. This means that part of the tree, from the root to t0, will have no population history associated to it. PhyDyn reacts to this situation depending on the values of two parameters, `forgiveT0` and `Ne`:
* If `forgiveT0` is true (default) then PhyDyn switches to the coalescent with constant population size to model the source population. The value of Ne is taken from the `Ne` parameter, with default n*(n-1)/2*lambda.
* If `forgiveT0` is false, the log-likelihood is set to `-Inf`.
## Equations and ODE Solvers (`equations` and `method` parameters)
PhyDyn calculates state probabilities and tree log-likelihoods by solving a set of ODEs along tree intervals.
The `equations` parameter (replaces `solvePL`, now deprecated) determines the type of equations to use. There are three options:
* `equations=PL` (previously `solvePL=true`), which solves a set of m-by-n equations per tree interval, where m is the number of demes and n the number of extant lineages. Each equation reasons about the probability of lineages being in a particular state (deme) at any given time. This method - introduced in [[1](Home#volz12)] - is usually more precise but computationally expensive for large number of demes or lineages. It may also require, for complex examples (stiff systems), fine-tuning with the solver parameters and methods. These equations correspond to a variation of the original model presented in [[1](Home#volz12)], denoted by _com12_ in [[2](Home#volzsiveroni)].
* `equations=QL` (previously `solvePL=false`) correspond to an approximation on _comP_ that reduces the order of the number of equations to `n-by-n` per tree interval. Accuracy depends of the complexity of the population models. This set of equations correspond to the model denoted by _comQ_ in [[2](Home#volzsiveroni)].
* `equations=logQL` performs a logarithmic tranformation on _comQ_ such that the state probabilities _p_ik(t)_ are re-written as _u_ik(t) = log(p_ik(t)/(1-p_ik(t))_. The solution of the transformed ODEs is more precise (and stable) but take twice as long when compared to _comQ_.
The user can choose from a range of ODE solvers:method='ODE_SOLVER' ODE_SOLVER :== FIXED | ADAPTIVE FIXED :== 'midpoint' | 'classicrk' | 'gill' ADAPTIVE :== 'adams-bashforth' | 'adams=moulton' | 'higham-hall'
Fixed methods require a `stepSize` parameter. Adaptive methods use the `order`, `atol` and `rtol` parameters. The defaults are:
* `method = 'classicrk' stepSize='0.01'`
* For adaptive methods methods: ` order='3' rTol='0.0001' aTol='0.00001' `
## Other arguments and functionality
### `minP`
### Sampling from Prior