incorporated Antonio and Daniel's suggestions
qiyunzhu committed Aug 3, 2021
1 parent 7771670 commit 634535d
Showing 2 changed files with 4 additions and 2 deletions.
4 changes: 3 additions & 1 deletion doc/collapse.md
@@ -32,6 +32,8 @@ With this tool one can achieve the following goals:

The last usage is an important complement to the main classification workflow, which currently relies on a tree structure and does not support one-to-many mapping. This can be achieved by using the profile collapsing function (although one can only move up one level per run).

+See [considerations](#considerations) below for a discussion of the potential change of statistical properties of data.
+

## Mapping file format

@@ -86,7 +88,7 @@ Once a profile is collapsed, the metadata of the source features ("Name", "Rank"
It is important to highlight that one-to-many mapping may change some of the
underlying statistical assumptions of downstream analyses.

-In the default mode, because one source may be collapsed into multiple targets, the total feature count per sample may be inflated, and the relative abundance of each feature may no longer correspond to that of the sequences assigned to it. In another word, this breaks the [compositionality](https://en.wikipedia.org/wiki/Compositional_data) of the data.
+In the default mode, because one source may be collapsed into multiple targets, the total feature count per sample may be inflated, and the relative abundance of each feature may no longer correspond to that of the sequences assigned to it. In other words, this breaks the [compositionality](https://en.wikipedia.org/wiki/Compositional_data) of the data.

How significantly this may impact an analysis depends on the relative frequency of multiple mappings found in the data, the biological relevance of the affected features, and the statistical nature of the analysis.
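
The inflation described in this hunk can be illustrated with a minimal sketch. This is not Woltka's implementation: the `collapse` function, the feature IDs, and the counts are all hypothetical, and the sketch assumes that each source's full count is added to every target it maps to.

```python
from collections import defaultdict


def collapse(profile, mapping):
    """Collapse one sample's source counts into target counts.

    Hypothetical helper for illustration only (not Woltka code).
    profile: dict of source feature -> count
    mapping: dict of source feature -> list of target features
    Assumption: the full count of a source is added to every target.
    """
    result = defaultdict(int)
    for source, count in profile.items():
        for target in mapping.get(source, []):
            result[target] += count
    return dict(result)


# Made-up sample: two source features, one of which maps to two targets.
profile = {"G1": 10, "G2": 30}
mapping = {"G1": ["K1"], "G2": ["K1", "K2"]}

collapsed = collapse(profile, mapping)
total_before = sum(profile.values())
total_after = sum(collapsed.values())
print(collapsed, total_before, total_after)
```

Under these assumptions the sample total grows from 40 to 70, because the 30 counts of `G2` are counted once for `K1` and once for `K2`, so the relative abundances of the targets no longer sum to the original sequence total.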

2 changes: 1 addition & 1 deletion doc/hierarchy.md
@@ -4,7 +4,7 @@ Woltka features a highly flexible hierarchical classification system. It is repr

The term "**rank**" (or "level") is still relevant, but it is merely a property of a feature, and does not bear information of hierarchy. For example, above a _genus_-level unit it does not have to be a _family_-level one, but could directly go to _order_, or have a _tribe_ which isn't common for the rest of the tree, or one or more nodes which do NOT have the rank assignment.

-In another word, Woltka classification is **rank-independent**. This design enables finer-grain resolution of feature relationships, in addition to flexibility. It is therefore suitable for complex systems, such as phylogenetic trees.
+In other words, Woltka classification is **rank-independent**. This design enables finer-grain resolution of feature relationships, in addition to flexibility. It is therefore suitable for complex systems, such as phylogenetic trees.

That being said, Woltka still supports ranked hierarchies and one can instruct the program to target one or more specific ranks.
