added collapsing stratified table #126

qiyunzhu · 2021-06-27T05:57:02Z

This PR adds support for collapsing a stratified table.

For example, suppose the input profile phylum_ko.tsv looks like this:

Feature ID	Sample 1	Sample 2	Sample 3
Actinobacteria|K00001	10	6	2
Actinobacteria|K00002	4	20	3
Bacteroidetes|K00001	105	0	0
Bacteroidetes|K00003	75	3	0
Firmicutes|K00002	8	0	0
Firmicutes|K00003	0	0	2
...

One wants to collapse KOs into modules using a mapping file ko-to-module.tsv, while leaving phyla the same:

K00001 <tab> M00123
K00002 <tab> M00123
K00002 <tab> M00456
...

One can do (2 means the second field in each feature ID):

woltka tools collapse -i phylum_ko.tsv -m ko-to-module.tsv -o phylum_module.tsv -f 2

The output will be like:

Feature ID	Sample 1	Sample 2	Sample 3
Actinobacteria|M00123	14	26	5
Bacteroidetes|M00123	105	0	0
Bacteroidetes|M00456	75	3	0
Firmicutes|M00123	8	0	0
Firmicutes|M00456	0	0	2
...
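
For readers who want to see the idea in code, here is a minimal sketch of collapsing one field of a stratified feature ID. It is not the implementation in this PR. The function name `collapse_stratified`, the in-memory dict layout, and the choice to drop features whose field has no mapping are all assumptions made for illustration.

```python
def collapse_stratified(table, mapping, field, sep='|'):
    """Collapse one field of stratified feature IDs (illustrative sketch).

    table:   dict of feature ID -> list of per-sample counts
    mapping: dict of source feature -> list of target features
    field:   1-based index of the field to collapse
    """
    idx = field - 1  # user-facing field numbers start from 1
    result = {}
    for feature, counts in table.items():
        parts = feature.split(sep)
        # assumption: features without a mapping are dropped
        for target in mapping.get(parts[idx], []):
            new_id = sep.join(parts[:idx] + [target] + parts[idx + 1:])
            sums = result.setdefault(new_id, [0] * len(counts))
            for i, count in enumerate(counts):
                sums[i] += count
    return result


table = {
    'Actinobacteria|K00001': [10, 6, 2],
    'Actinobacteria|K00002': [4, 20, 3],
    'Bacteroidetes|K00001': [105, 0, 0],
}
mapping = {'K00001': ['M00123'], 'K00002': ['M00123', 'M00456']}
collapsed = collapse_stratified(table, mapping, field=2)
print(collapsed['Actinobacteria|M00123'])
```

In this sketch a source mapped to several targets (K00002 here) contributes its full count to each of them, and the other fields of the feature ID are left unchanged.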

@droush

droush

This is a handy feature update. I think it will be very useful.

droush · 2021-07-01T21:11:41Z

doc/cli.md

@@ -126,10 +126,10 @@ Option | Description
 `--input`, `-i` (required) | Path to input profile.
 `--map`, `-m` (required) | Path to mapping of source features to target features.
 `--output`, `-o` (required) | Path to output profile.
-`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
+`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.


Check --frac parameter. Previously was --normalize, and is now inconsistent within the codebase.

--frac actually does sound more intuitive than --divide.

It was modified during a previous effort in replacing Travis CI with GitHub Actions (I changed some trivial code in order to fire a PR...). Now it should be consistently --divide.
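
To make the option description concrete, here is a small sketch of the 1 / _k_ counting it describes. The helper name `distribute` and its signature are hypothetical and not taken from woltka; only the behavior (divide by the number of targets, otherwise count in full) follows the description in doc/cli.md.

```python
def distribute(count, targets, divide=False):
    """Assign a source count to its targets (illustrative sketch).

    With divide=True each of the k targets receives count / k;
    otherwise each target receives the full count.
    """
    k = len(targets)
    share = count / k if divide else count
    return {target: share for target in targets}


# K00002 maps to two modules, so k = 2
print(distribute(4, ['M00123', 'M00456'], divide=True))
print(distribute(4, ['M00123', 'M00456']))
```

With division the total across targets equals the source count. Without it, a source with several targets is counted once per target.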

droush · 2021-07-01T21:19:23Z

doc/cli.md

@@ -126,10 +126,10 @@ Option | Description
 `--input`, `-i` (required) | Path to input profile.
 `--map`, `-m` (required) | Path to mapping of source features to target features.
 `--output`, `-o` (required) | Path to output profile.
-`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
+`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
+`--field`, `-f` | Index of field to be collapsed in a stratified profile. For example, use `-f 2` to collapse "gene" in "microbe\|gene".


Perhaps a future feature update where the field can be a name like 'gene' instead of an index.

Index is fine for now, but this is something to consider for ease of use for end users.

Good point! One reason this hasn't been implemented is that QIIME 2 has mandatory index headers (like #FeatureID). We need to think about where else we can store these field definitions.

droush · 2021-07-01T21:24:53Z

woltka/tools.py

@@ -223,9 +224,13 @@ def collapse_wf(input_fp:  str,
    if not mapping:
        exit(f'No source-target relationship is found in {basename(map_fp)}.')

+    # convert field index
+    if field:


What is this for?

This converts a user-entered field number (starting from 1) into a Python list index (starting from 0).
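
A sketch of that conversion, for illustration only: the hunk above is truncated, so this is not the code in woltka/tools.py. The helper name `to_list_index` and the validation of values below 1 are assumptions.

```python
def to_list_index(field):
    """Convert a 1-based field number into a 0-based list index.

    Returns None when no field is given (hypothetical helper).
    """
    if field is None:
        return None
    if field < 1:
        raise ValueError('Field number must be 1 or greater.')
    return field - 1


parts = 'Actinobacteria|K00001'.split('|')
print(parts[to_list_index(2)])
```

Rejecting 0 and negative numbers up front avoids silently picking a field from the end of the list, which is what a negative Python index would do.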

qiyunzhu added 3 commits June 26, 2021 22:48

added collapsing stratified table

7ef4670

Merge branch 'master' of github.com:qiyunzhu/woltka into upgrade

c2e1fd6

pulled

d62a250

droush reviewed Jul 1, 2021

droush merged commit 9a21a2c into master Jul 1, 2021
