Merge pull request #1285 from nextstrain/fix-distance-docs
distance: Document ignored characters and counting of gaps as indels
huddlej committed Aug 17, 2023
2 parents 17a2746 + 9d9fee5 commit 7c6b823
Showing 2 changed files with 24 additions and 0 deletions.
5 changes: 5 additions & 0 deletions CHANGES.md
@@ -2,6 +2,11 @@

## __NEXT__

### Bug fixes

* distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. [#1285][] (@huddlej)

[#1285]: https://github.com/nextstrain/augur/pull/1285

## 22.3.0 (14 August 2023)

19 changes: 19 additions & 0 deletions augur/distance.py
@@ -31,6 +31,12 @@
parameters allow users to specify a fixed time interval for pairwise
calculations, limiting the computational complexity of the comparisons.
For all distance calculations, a consecutive series of gap characters (`-`)
counts as a single difference between any pair of sequences. This behavior
reflects the assumption that there was an underlying biological process that
produced the insertion or deletion as a single event as opposed to multiple
independent insertion/deletion events.
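
The gap-handling rule above can be illustrated with a minimal sketch. This is not Augur's implementation: the function name `count_differences` is hypothetical, and the sketch assumes two pre-aligned sequences of equal length, a weight of 1 per difference, and that a gap run ends as soon as the two sequences agree at a position.

```python
def count_differences(seq_a, seq_b, gap="-"):
    """Count differences between two aligned sequences.

    A consecutive run of positions where one sequence has a gap and the
    other does not is counted once, as a single insertion/deletion event.
    Hypothetical helper for illustration only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    distance = 0
    in_gap_run = False
    for a, b in zip(seq_a, seq_b):
        if a == b:
            # Matching sites end any gap run in progress.
            in_gap_run = False
        elif a == gap or b == gap:
            # Only the first position of a gap run adds to the distance.
            if not in_gap_run:
                distance += 1
                in_gap_run = True
        else:
            # An ordinary substitution counts once and ends the gap run.
            distance += 1
            in_gap_run = False
    return distance


print(count_differences("ACGTACGT", "AC---CGT"))  # 1: one three-base deletion
print(count_differences("A-C-G", "ATCTG"))        # 2: two separate gap runs
```

Counting per run rather than per gap character keeps a long deletion from dominating the distance between two otherwise similar sequences.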

**Distance maps**

Distance maps are defined in JSON format with two required top-level keys.
@@ -47,6 +53,19 @@
"map": {}
}
To ignore specific characters such as gaps or ambiguous nucleotides from the
distance calculation, define a top-level `ignored_characters` key with a list of
characters to ignore.

.. code-block:: json

    {
        "name": "Hamming distance",
        "default": 1,
        "ignored_characters": ["-", "N"],
        "map": {}
    }

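
As a rough sketch of how such a map might be applied, the following compares two aligned sequences and skips any position where either sequence carries an ignored character. The function `hamming_distance` is a hypothetical helper, not part of Augur's API. It assumes that every remaining mismatch contributes the map's `default` weight, and it leaves out both the site-specific `map` entries and the gap-run handling described earlier.

```python
import json

# The distance map from the example above, parsed from its JSON text.
distance_map = json.loads("""
{
    "name": "Hamming distance",
    "default": 1,
    "ignored_characters": ["-", "N"],
    "map": {}
}
""")


def hamming_distance(seq_a, seq_b, distance_map):
    """Sum the default weight over mismatched sites, skipping ignored characters.

    Hypothetical helper for illustration only; site-specific weights in
    distance_map["map"] are not consulted.
    """
    # A missing "ignored_characters" key means nothing is ignored.
    ignored = set(distance_map.get("ignored_characters", []))
    default = distance_map["default"]

    distance = 0
    for a, b in zip(seq_a, seq_b):
        if a in ignored or b in ignored:
            continue  # Skip sites with a gap or ambiguous base in either sequence.
        if a != b:
            distance += default
    return distance


print(hamming_distance("ACGTN", "AC-TA", distance_map))  # 0: both mismatches are ignored
print(hamming_distance("ACGT", "ATGA", distance_map))    # 2: two ordinary substitutions
```
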
By default, distances are floating point values whose precision can be controlled with the `precision` key that defines the number of decimal places to retain for each distance.
The following example shows how to specify a precision of two decimal places in the final output: