Merge pull request #1455: Update FAQ
victorlin committed May 1, 2024
2 parents c8ef202 + be37dbb commit 03ed408
Showing 7 changed files with 146 additions and 143 deletions.
4 changes: 2 additions & 2 deletions docs/faq/clades.md
@@ -1,6 +1,6 @@
- # Labeling `clades`
+ # How do I label `clades`?

- Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
+ Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/seasonal-flu).
Augur has a command to determine the position of such clade labels and assign sequences to clades.
The definitions of these clades are provided in a tab-delimited file (tsv) using the following format:
```
4 changes: 2 additions & 2 deletions docs/faq/faq.rst
@@ -13,5 +13,5 @@ common questions and problems users run into.
what-is-a-build
metadata
clades
- Specifying `refine` rates <refine>
- Creating a tree using your own tree builder <skip_augur_tree>
+ refine
+ skip_augur_tree
4 changes: 2 additions & 2 deletions docs/faq/metadata.rst
@@ -1,5 +1,5 @@
- Preparing Your Metadata
- =======================
+ How do I prepare metadata?
+ ==========================

Analyses are vastly more interesting if the sequences or samples
analyzed have rich 'meta data' wherever possible. This metadata could
1 change: 0 additions & 1 deletion docs/faq/refine.rst

This file was deleted.

133 changes: 133 additions & 0 deletions docs/faq/refine.rst
@@ -0,0 +1,133 @@
==================================
How do I specify ``refine`` rates?
==================================

How we use refine in the Zika tutorial
======================================

In the Zika tutorial we used the following basic rule to run the :doc:`../usage/cli/refine` command:

.. code-block:: python

  rule refine:
      input:
          tree = rules.tree.output.tree,
          alignment = rules.align.output,
          metadata = "data/metadata.tsv"
      output:
          tree = "results/tree.nwk",
          node_data = "results/branch_lengths.json"
      shell:
          """
          augur refine \
              --tree {input.tree} \
              --alignment {input.alignment} \
              --metadata {input.metadata} \
              --timetree \
              --output-tree {output.tree} \
              --output-node-data {output.node_data}
          """

This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
The paragraphs below detail how to exert more control over each of these steps through additional options of the refine command.


Specify the evolutionary rate
=============================

By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
This can be done via the flag ``--clock-rate <value>``, where the implied units are substitutions per site per year.
In our Zika example, this would look as follows:

.. code-block:: diff

   rule refine:
       input:
           tree = rules.tree.output.tree,
           alignment = rules.align.output,
           metadata = "data/metadata.tsv"
       output:
           tree = "results/tree.nwk",
           node_data = "results/branch_lengths.json"
  +    params:
  +        clock_rate = 0.0008
       shell:
           """
           augur refine \
               --tree {input.tree} \
               --alignment {input.alignment} \
               --metadata {input.metadata} \
               --timetree \
  +            --clock-rate {params.clock_rate} \
               --output-tree {output.tree} \
               --output-node-data {output.node_data}
           """

Confidence intervals for divergence times
=========================================

Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.

.. code-block:: diff

   rule refine:
       input:
           tree = rules.tree.output.tree,
           alignment = rules.align.output,
           metadata = "data/metadata.tsv"
       output:
           tree = "results/tree.nwk",
           node_data = "results/branch_lengths.json"
       params:
           clock_rate = 0.0008,
  +        clock_std_dev = 0.0002
       shell:
           """
           augur refine \
               --tree {input.tree} \
               --alignment {input.alignment} \
               --metadata {input.metadata} \
               --timetree \
  +            --date-confidence \
               --clock-rate {params.clock_rate} \
  +            --clock-std-dev {params.clock_std_dev} \
               --output-tree {output.tree} \
               --output-node-data {output.node_data}
           """

If run with these parameters, augur will save a confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
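
The interval bounds are decimal years. As a minimal sketch of how such a value maps to a calendar date, the snippet below assumes the fractional part is a linear fraction of the year's length. The helper name ``decimal_year_to_date`` is hypothetical and not part of augur or TreeTime, whose exact conversion convention may differ.

```python
from datetime import date, timedelta


def decimal_year_to_date(decimal_year):
    """Convert a decimal year such as 2014.5 to a calendar date.

    Assumption: the fractional part is a linear fraction of the year's
    length in days, truncated to whole days.
    """
    year = int(decimal_year)
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days
    offset_days = int((decimal_year - year) * days_in_year)
    return start + timedelta(days=offset_days)


# Bounds of the example interval quoted in the text above.
lower = decimal_year_to_date(2014.5)
upper = decimal_year_to_date(2014.7)
print(lower.isoformat(), upper.isoformat())
```

Under this convention the example interval spans roughly early July to mid September 2014.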

By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
This, however, is sometimes unstable when the temporal signal is low and can be switched off with the flag ``--no-covariance``.
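
To make the idea of a root-to-tip regression concrete, here is a minimal sketch of the plain variant that ignores covariance between tips, using ordinary least squares. The function name and the toy data are made up for illustration; this is not TreeTime's implementation, which additionally accounts for shared ancestry.

```python
def root_to_tip_regression(dates, divergences):
    """Ordinary least-squares fit of divergence against sampling date.

    Returns (rate, root_date): the slope in substitutions per site per
    year and the date at which the fitted line crosses zero divergence.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    var = sum((t - mean_t) ** 2 for t in dates)
    rate = cov / var
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate
    return rate, root_date


# Toy data (hypothetical): four tips sampled one year apart.
dates = [2013.0, 2014.0, 2015.0, 2016.0]
divergences = [0.0008, 0.0016, 0.0024, 0.0032]

rate, root_date = root_to_tip_regression(dates, divergences)
print(f"rate = {rate:.4f} subs/site/year, root date = {root_date:.1f}")
```

With real data the points scatter around the line, and when that scatter is large relative to the slope, the estimated rate becomes unreliable. That is the situation in which fixing the rate with ``--clock-rate`` is recommended above.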


Specifying the root of the tree
===============================

By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This works well when the temporal signal is strong.
In other situations, you might want to specify the root explicitly, specify a rerooting mechanism, or keep the root of the input tree.
The last of these can be achieved by passing the argument ``--keep-root``.
To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the strain(s) like so:

.. code-block:: bash

  --root strain1 [strain2 strain3 ...]

Other available rooting mechanisms are:

* ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
* ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
* ``oldest``: use the oldest strain as outgroup
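
As a rough illustration of the ``min-dev`` criterion described above, the sketch below picks, from a set of candidate root positions, the one whose root-to-tip distances have the smallest variance. The candidate names, the distances, and the function ``pick_min_dev_root`` are all hypothetical. TreeTime searches over positions along branches rather than a fixed list, so this only conveys the idea.

```python
from statistics import pvariance


def pick_min_dev_root(root_to_tip_by_candidate):
    """Return the candidate whose root-to-tip distances vary the least.

    `root_to_tip_by_candidate` maps a candidate root name to the list of
    distances from that candidate to every tip in the tree.
    """
    return min(
        root_to_tip_by_candidate,
        key=lambda name: pvariance(root_to_tip_by_candidate[name]),
    )


# Hypothetical distances for two candidate root positions.
candidates = {
    "node_A": [0.010, 0.012, 0.011],  # tips roughly equidistant
    "node_B": [0.002, 0.020, 0.011],  # tips at very different distances
}

best = pick_min_dev_root(candidates)
print(best)
```

Unlike ``least-squares``, this criterion does not use sampling dates at all, which is why it resembles midpoint rooting.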


Polytomy resolution
===================

If the data set contains many very similar sequences, their evolutionary relationships sometimes remain ambiguous, resulting in zero-length branches or polytomies (that is, internal nodes with more than two children).
Augur partially resolves those polytomies if such resolution helps the tree fit the temporal structure in the data.
If this is undesired, it can be switched off using ``--keep-polytomies``.
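
To check how many polytomies a tree contains before deciding whether to keep them, a small sketch like the following can help. It assumes the tree is represented as nested dictionaries with an optional ``children`` list, a layout chosen here purely for illustration and not something augur requires.

```python
def count_polytomies(node):
    """Count internal nodes with more than two children in a nested-dict tree."""
    children = node.get("children", [])
    here = 1 if len(children) > 2 else 0
    return here + sum(count_polytomies(child) for child in children)


# Hypothetical tree: the node "internal" has three children, so it is a polytomy.
tree = {
    "name": "root",
    "children": [
        {"name": "A"},
        {
            "name": "internal",
            "children": [{"name": "B"}, {"name": "C"}, {"name": "D"}],
        },
    ],
}

n_polytomies = count_polytomies(tree)
print(n_polytomies)
```
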
6 changes: 3 additions & 3 deletions docs/faq/skip_augur_tree.rst
@@ -1,6 +1,6 @@
- ===========================================
- Creating a tree using your own tree builder
- ===========================================
+ =================================
+ How do I use my own tree builder?
+ =================================

The `augur tree` command is a light wrapper around tree building programs such as IQ-TREE, RAxML and FastTree.
It's possible that the functionality you want isn't available in those programs, or that it is available but that `augur tree` doesn't expose the functionality you need.
2 changes: 1 addition & 1 deletion docs/faq/what-is-a-build.md
@@ -1,4 +1,4 @@
- # The concept of a 'build'
+ # What is a "build"?

Nextstrain's focus on providing a _real-time_ snapshot of evolving pathogen populations necessitates a reproducible analysis that can be rerun when new sequences are available.
The individual steps necessary to repeat analysis together comprise a "build".
135 changes: 3 additions & 132 deletions docs/usage/cli/refine.rst
@@ -14,136 +14,7 @@ augur refine
:prog: augur
:path: refine

Guides
======


How we use refine in the Zika tutorial
======================================

In the Zika tutorial we used the following basic rule to run the `refine` command:

.. code-block:: python

  rule refine:
      input:
          tree = rules.tree.output.tree,
          alignment = rules.align.output,
          metadata = "data/metadata.tsv"
      output:
          tree = "results/tree.nwk",
          node_data = "results/branch_lengths.json"
      shell:
          """
          augur refine \
              --tree {input.tree} \
              --alignment {input.alignment} \
              --metadata {input.metadata} \
              --timetree \
              --output-tree {output.tree} \
              --output-node-data {output.node_data}
          """

This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
The paragraphs below detail how to exert more control over each of these steps through additional options of the refine command.


Specify the evolutionary rate
=============================

By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
This can be done via the flag ``--clock-rate <value>``, where the implied units are substitutions per site per year.
In our Zika example, this would look as follows:

.. code-block:: diff

   rule refine:
       input:
           tree = rules.tree.output.tree,
           alignment = rules.align.output,
           metadata = "data/metadata.tsv"
       output:
           tree = "results/tree.nwk",
           node_data = "results/branch_lengths.json"
  +    params:
  +        clock_rate = 0.0008
       shell:
           """
           augur refine \
               --tree {input.tree} \
               --alignment {input.alignment} \
               --metadata {input.metadata} \
               --timetree \
  +            --clock-rate {params.clock_rate} \
               --output-tree {output.tree} \
               --output-node-data {output.node_data}
           """

Confidence intervals for divergence times
=========================================

Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.

.. code-block:: diff

   rule refine:
       input:
           tree = rules.tree.output.tree,
           alignment = rules.align.output,
           metadata = "data/metadata.tsv"
       output:
           tree = "results/tree.nwk",
           node_data = "results/branch_lengths.json"
       params:
           clock_rate = 0.0008,
  +        clock_std_dev = 0.0002
       shell:
           """
           augur refine \
               --tree {input.tree} \
               --alignment {input.alignment} \
               --metadata {input.metadata} \
               --timetree \
  +            --date-confidence \
               --clock-rate {params.clock_rate} \
  +            --clock-std-dev {params.clock_std_dev} \
               --output-tree {output.tree} \
               --output-node-data {output.node_data}
           """

If run with these parameters, augur will save a confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.

By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
This, however, is sometimes unstable when the temporal signal is low and can be switched off with the flag ``--no-covariance``.


Specifying the root of the tree
===============================

By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This works well when the temporal signal is strong.
In other situations, you might want to specify the root explicitly, specify a rerooting mechanism, or keep the root of the input tree.
The last of these can be achieved by passing the argument ``--keep-root``.
To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the strain(s) like so:

.. code-block:: bash

  --root strain1 [strain2 strain3 ...]

Other available rooting mechanisms are:

* ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
* ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
* ``oldest``: use the oldest strain as outgroup


Polytomy resolution
===================

If the data set contains many very similar sequences, their evolutionary relationships sometimes remain ambiguous, resulting in zero-length branches or polytomies (that is, internal nodes with more than two children).
Augur partially resolves those polytomies if such resolution helps the tree fit the temporal structure in the data.
If this is undesired, it can be switched off using ``--keep-polytomies``.


See :doc:`../../faq/refine`.
