Merge pull request #1455: Update FAQ

nextstrain · May 1, 2024 · 03ed408 · 03ed408
2 parents c8ef202 + be37dbb
commit 03ed408
Showing 7 changed files with 146 additions and 143 deletions.
diff --git a/docs/faq/clades.md b/docs/faq/clades.md
@@ -1,6 +1,6 @@
-# Labeling `clades`
+# How do I label `clades`?
 
-Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/flu).
+Clades in phylogenetic trees are often named to facilitate discussion of genetic diversity, see for example [seasonal influenza on nextstrain](https://nextstrain.org/seasonal-flu).
 Augur has a command to determine the position of such clade labels and assign sequences to clades.
 The definition of these clades are provided in a tab-delimited file (tsv) using the following format:
 ```

diff --git a/docs/faq/faq.rst b/docs/faq/faq.rst
@@ -13,5 +13,5 @@ common questions and problems users run into.
    what-is-a-build
    metadata
    clades
-   Specifying `refine` rates <refine>
-   Creating a tree using your own tree builder <skip_augur_tree>
+   refine
+   skip_augur_tree
diff --git a/docs/faq/metadata.rst b/docs/faq/metadata.rst
@@ -1,5 +1,5 @@
-Preparing Your Metadata
-=======================
+How do I prepare metadata?
+==========================
 
 Analyses are vastly more interesting if the sequences or samples
 analyzed have rich 'meta data' wherever possible. This metadata could

diff --git a/docs/faq/refine.rst b/docs/faq/refine.rst
@@ -0,0 +1,133 @@
+==================================
+How do I specify ``refine`` rates?
+==================================
+
+How we use refine in the Zika tutorial
+======================================
+
+In the Zika tutorial we used the following basic rule to run the :doc:`../usage/cli/refine` command:
+
+.. code-block:: python
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+
+This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
+The paragraphs below detail how to exert more control over each of these steps through additional options of the refine command.
+
+
+Specify the evolutionary rate
+=============================
+
+By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
+In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
+This can be done via the flag ``--clock-rate <value>``, where the implied units are substitutions per site per year.
+In our Zika example, this would look like this:
+
+.. code-block:: diff
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+    +   params:
+    +       clock_rate = 0.0008
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+    +           --clock-rate {params.clock_rate} \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+
+
+Confidence intervals for divergence times
+=========================================
+
+Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
+Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
+
+.. code-block:: diff
+
+    rule refine:
+        input:
+            tree = rules.tree.output.tree,
+            alignment = rules.align.output,
+            metadata = "data/metadata.tsv"
+        output:
+            tree = "results/tree.nwk",
+            node_data = "results/branch_lengths.json"
+        params:
+            clock_rate = 0.0008,
+    +       clock_std_dev = 0.0002
+        shell:
+            """
+            augur refine \
+                --tree {input.tree} \
+                --alignment {input.alignment} \
+                --metadata {input.metadata} \
+                --timetree \
+    +           --date-confidence \
+                --clock-rate {params.clock_rate} \
+    +           --clock-std-dev {params.clock_std_dev} \
+                --output-tree {output.tree} \
+                --output-node-data {output.node_data}
+            """
+
+If run with these parameters, augur will save a confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
+
+By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
+This mode, however, is sometimes unstable when the temporal signal is low and can be switched off with the flag ``--no-covariance``.
+
+
+Specifying the root of the tree
+===============================
+
+By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This works well when the temporal signal is strong.
+In other situations, you might want to specify the root explicitly, specify a rerooting mechanism, or keep the root of the input tree.
+The latter can be achieved by passing the argument ``--keep-root``.
+To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
+
+.. code-block:: bash
+
+    --root strain1 [strain2 strain3 ...]
+
+Other available rooting mechanisms are:
+
+  * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
+  * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
+  * ``oldest``: use the oldest strain as outgroup
+
+
+Polytomy resolution
+===================
+
+If the data set contains many very similar sequences, their evolutionary relationships sometimes remain ambiguous, resulting in zero-length branches or polytomies (that is, internal nodes with more than two children).
+Augur partially resolves those polytomies if such resolution helps make the tree fit the temporal structure in the data.
+If this is undesired, it can be switched off using ``--keep-polytomies``.
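
Note (not part of this pull request): the options described in the new ``refine`` FAQ page can be combined in one invocation. The following is a minimal sketch of that, assuming a hypothetical helper named ``build_refine_command``; the function, its parameter names, and the example file paths are illustrative assumptions, and only flags that appear in the FAQ text above are used. Treating ``--keep-root`` and ``--root`` as alternatives follows the wording of the FAQ and is an assumption of this sketch, not a statement about how augur validates its arguments.

```python
import shlex


def build_refine_command(tree, alignment, metadata, output_tree, output_node_data,
                         clock_rate=None, clock_std_dev=None, date_confidence=False,
                         no_covariance=False, keep_root=False, root=None,
                         keep_polytomies=False):
    """Assemble an `augur refine` argument list (hypothetical helper, for illustration)."""
    # Assumption of this sketch: keeping the input root and naming a root are alternatives.
    if keep_root and root:
        raise ValueError("choose either keep_root or an explicit root, not both")

    cmd = [
        "augur", "refine",
        "--tree", tree,
        "--alignment", alignment,
        "--metadata", metadata,
        "--timetree",
    ]
    if clock_rate is not None:
        # Fixed clock rate, in substitutions per site per year.
        cmd += ["--clock-rate", str(clock_rate)]
    if clock_std_dev is not None:
        # Standard deviation of the rate estimate.
        cmd += ["--clock-std-dev", str(clock_std_dev)]
    if date_confidence:
        cmd.append("--date-confidence")
    if no_covariance:
        cmd.append("--no-covariance")
    if keep_root:
        cmd.append("--keep-root")
    elif root:
        # One or more strain names, or the name of a rooting mechanism.
        cmd += ["--root", *root]
    if keep_polytomies:
        cmd.append("--keep-polytomies")
    cmd += ["--output-tree", output_tree, "--output-node-data", output_node_data]
    return cmd


# Example paths are placeholders, not files from the tutorial.
example = build_refine_command(
    "results/tree_raw.nwk", "results/aligned.fasta", "data/metadata.tsv",
    "results/tree.nwk", "results/branch_lengths.json",
    clock_rate=0.0008, clock_std_dev=0.0002, date_confidence=True,
)
print(shlex.join(example))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed to ``subprocess.run`` without shell quoting concerns or joined with ``shlex.join`` for display.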
diff --git a/docs/faq/skip_augur_tree.rst b/docs/faq/skip_augur_tree.rst
@@ -1,6 +1,6 @@
-===========================================
-Creating a tree using your own tree builder
-===========================================
+=================================
+How do I use my own tree builder?
+=================================
 
 The `augur tree` command is a light wrapper around tree building programs such as IQ-TREE, RAxML and FastTree.
 It's possible that the functionality you want isn't available in those programs, or that it is available but that `augur tree` doesn't expose the functionality you need.

diff --git a/docs/faq/what-is-a-build.md b/docs/faq/what-is-a-build.md
@@ -1,4 +1,4 @@
-# The concept of a 'build'
+# What is a "build"?
 
 Nextstrain's focus on providing a _real-time_ snapshot of evolving pathogen populations necessitates a reproducible analysis that can be rerun when new sequences are available.
 The individual steps necessary to repeat analysis together comprise a "build".

diff --git a/docs/usage/cli/refine.rst b/docs/usage/cli/refine.rst
@@ -14,136 +14,7 @@ augur refine
     :prog: augur
     :path: refine
 
+Guides
+======
 
-
-How we use refine in the zika tutorial
-======================================
-
-In the Zika tutorial we used the following basic rule to run the `refine` command:
-
-.. code-block:: python
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-
-This rule will estimate the rate of the molecular clock, reroot the tree, and estimate a time tree.
-The paragraphs below will detail how to exert more control on each of these steps through additional options the refine command.
-
-
-Specify the evolutionary rate
-=============================
-
-By default ``augur`` (through ``treetime``) will estimate the rate of evolution from the data by regressing divergence vs sampling date.
-In some scenarios, however, there is insufficient temporal signal to reliably estimate the rate and the analysis will be more robust and reproducible if one fixes this rate explicitly.
-This can be done via the flag ``--clock-rate <value>`` where the implied units are substitutions per site and year.
-In our zika example, this would look like this
-
-.. code-block:: diff
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-    +    params:
-    +    	clock_rate = 0.0008
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-    +           --clock-rate {params.clock_rate} \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-
-
-Confidence intervals for divergence times
-=========================================
-
-Divergence time estimates are probabilistic and uncertain for multiple reasons, primarily because the accumulation of mutations is a probabilistic process and the rate estimate itself is not precise.
-Augur/TreeTime will account for this uncertainty if the refine command is run with the flag ``--date-confidence`` and the standard deviation of the rate estimate is specified.
-
-.. code-block:: diff
-
-    rule refine:
-        input:
-            tree = rules.tree.output.tree,
-            alignment = rules.align.output,
-            metadata = "data/metadata.tsv"
-        output:
-            tree = "results/tree.nwk",
-            node_data = "results/branch_lengths.json"
-        params:
-            clock_rate = 0.0008,
-    +    	clock_std_dev = 0.0002
-        shell:
-            """
-            augur refine \
-                --tree {input.tree} \
-                --alignment {input.alignment} \
-                --metadata {input.metadata} \
-                --timetree \
-                --date-confidence \
-    +            --clock-rate {params.clock_rate} \
-    +            --clock-std-dev {params.clock_std_dev} \
-                --output-tree {output.tree} \
-                --output-node-data {output.node_data}
-            """
-
-If run with these parameters, augur will save an confidence interval (e.g. ``[2014.5,2014.7]``) for each node in the tree.
-
-By default, augur runs TreeTime in a "covariance-aware" mode where the root-to-tip regression accounts for shared ancestry and covariance between terminal nodes.
-This, however, is sometimes unstable when the temporal signal is low and can be switch off with the flag ``--no-covariance``.
-
-
-Specifying the root of the tree
-===============================
-
-By default, augur/TreeTime reroots your input tree to optimize the temporal signal in the data. This is robust when there is robust temporal signal.
-In other situations, you might want to specify the root explicitly, specify a rerooting mechanisms, or keep the root of the input tree.
-The latter can be achieved by passing the argument ``--keep-root``.
-To specify a particular strain (or the common ancestor of a group of strains), pass the name(s) of the(se) strain(s) like so:
-
-.. code-block:: bash
-
-    --root strain1 [strain2 strain3 ...]
-
-Other available rooting mechanisms are
-
-  * ``least-squares`` (default): minimize squared deviation of the root-to-tip regression
-  * ``min-dev``: essentially midpoint rooting minimizing the variance in root-to-tip distance
-  * ``oldest``: use the oldest strain as outgroup
-
-
-Polytomy resolution
-===================
-
-if the data set contains many very similar sequences, their evolutionary relationship some times remains ambiguous resulting in zero-length branches or polytomies (that is internal nodes with more than 2 children).
-Augur partially resolves those polytomies if such resolution helps the make the tree fit the temporal structure in the data.
-If this is undesired, this can be switched-off using ``--keep-polytomies``.
-
-
+See :doc:`../../faq/refine`.