added collapsing stratified table #126

Merged
merged 3 commits into from
Jul 1, 2021

qiyunzhu
Owner

This PR adds support for collapsing a stratified table.

For example, the input profile phylum_ko.tsv is like:

Feature ID             Sample 1  Sample 2  Sample 3
Actinobacteria|K00001  10        6         2
Actinobacteria|K00002  4         20        3
Bacteroidetes|K00001   105       0         0
Bacteroidetes|K00003   75        3         0
Firmicutes|K00002      8         0         0
Firmicutes|K00003      0         0         2
...

One wants to collapse KOs into modules using a mapping file ko-to-module.tsv, while leaving phyla the same:

K00001 <tab> M00123
K00002 <tab> M00123
K00002 <tab> M00456
...

One can do (2 means the second field in each feature ID):

woltka tools collapse -i phylum_ko.tsv -m ko-to-module.tsv -o phylum_module.tsv -f 2

The output will be like:

Feature ID             Sample 1  Sample 2  Sample 3
Actinobacteria|M00123  14        26        5
Bacteroidetes|M00123   105       0         0
Bacteroidetes|M00456   75        3         0
Firmicutes|M00123      8         0         0
Firmicutes|M00456      0         0         2
...
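
To illustrate the idea, here is a minimal sketch of collapsing one field of a stratified feature ID. It is not Woltka's actual implementation: the helper `collapse_field`, the dict-based table, and the choice to drop features without a mapping are all assumptions made for this example. It uses only the three mapping entries shown above, so rows that depend on the elided part of the mapping are left out.

```python
def collapse_field(table, mapping, field, sep='|'):
    """Collapse one field of stratified feature IDs (hypothetical helper).

    table:   dict of feature ID -> list of per-sample counts
    mapping: dict of source feature -> list of target features
    field:   1-based index of the field to collapse
    """
    idx = field - 1  # user-facing number -> Python list index
    result = {}
    for feature, counts in table.items():
        fields = feature.split(sep)
        # Assumption: a feature whose field has no mapping is dropped.
        for target in mapping.get(fields[idx], []):
            # Swap in the target, keep the other fields unchanged.
            new_id = sep.join(fields[:idx] + [target] + fields[idx + 1:])
            sums = result.setdefault(new_id, [0] * len(counts))
            for i, count in enumerate(counts):
                sums[i] += count
    return result


table = {
    'Actinobacteria|K00001': [10, 6, 2],
    'Actinobacteria|K00002': [4, 20, 3],
    'Bacteroidetes|K00001': [105, 0, 0],
}
mapping = {'K00001': ['M00123'], 'K00002': ['M00123', 'M00456']}

collapsed = collapse_field(table, mapping, field=2)
for feature, counts in collapsed.items():
    print(feature, *counts, sep='\t')
```

With this input, `Actinobacteria|M00123` sums the K00001 and K00002 rows, which matches the first row of the output above.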

droush
Collaborator

This is a handy feature update. I think it will be very useful.

@@ -126,10 +126,10 @@ Option | Description
`--input`, `-i` (required) | Path to input profile.
`--map`, `-m` (required) | Path to mapping of source features to target features.
`--output`, `-o` (required) | Path to output profile.
`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
Collaborator

Check --frac parameter. Previously was --normalize, and is now inconsistent within the codebase.

--frac actually does sound more intuitive than --divide.

Owner Author

It was modified during an earlier effort to replace Travis CI with GitHub Actions (I changed some trivial code in order to trigger a PR...). It should now consistently be --divide.
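
For readers unfamiliar with the option, here is a minimal sketch of the 1 / _k_ rule described in the table above. The function `distribute` is a hypothetical name for illustration, not part of Woltka.

```python
def distribute(count, targets, divide=False):
    """Assign a source count to each of its targets (hypothetical helper).

    With divide=True, each of the k targets receives count / k;
    otherwise each target receives the full count.
    """
    k = len(targets)
    share = count / k if divide else count
    return {target: share for target in targets}


# K00002 maps to two modules, so k = 2.
print(distribute(4, ['M00123', 'M00456']))
print(distribute(4, ['M00123', 'M00456'], divide=True))
```

Without division the total across targets grows with _k_; with division the source count is conserved.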

@@ -126,10 +126,10 @@ Option | Description
`--input`, `-i` (required) | Path to input profile.
`--map`, `-m` (required) | Path to mapping of source features to target features.
`--output`, `-o` (required) | Path to output profile.
`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
`--divide`, `-d` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
`--field`, `-f` | Index of field to be collapsed in a stratified profile. For example, use `-f 2` to collapse "gene" in "microbe\|gene".
Collaborator

Perhaps a future feature update where the field can be a name like 'gene' instead of an index.

Index is fine for now, but this is something to consider for ease of use for end users.

Owner Author

Good point! One reason this hasn't been implemented is that QIIME 2 has mandatory index headers (like #FeatureID). We need to think about where else we can store these field definitions.

@@ -223,9 +224,13 @@ def collapse_wf(input_fp: str,
    if not mapping:
        exit(f'No source-target relationship is found in {basename(map_fp)}.')

    # convert field index
    if field:
Collaborator

What is this for?

Owner Author

This is to convert a user-entered field number (starting from 1) into a Python list index (starting from 0).
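
A minimal sketch of that conversion, assuming a hypothetical helper `to_list_index` (the actual code in the PR is only partly visible in the diff above):

```python
def to_list_index(field):
    """Convert a 1-based field number into a 0-based list index.

    Hypothetical helper for illustration. Rejects values below 1,
    since a truthiness check such as `if field:` would silently
    skip 0 as well as None.
    """
    if field < 1:
        raise ValueError('Field number must start from 1.')
    return field - 1


fields = 'microbe|gene'.split('|')
print(fields[to_list_index(2)])  # the second field, i.e. "gene"
```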

@droush droush merged commit 9a21a2c into master Jul 1, 2021