Merge pull request #128 from qiyunzhu/test
renamed normalize as frac
qiyunzhu committed Jun 27, 2021
2 parents 46c2ff2 + 297f549 commit 244e4a0
Showing 11 changed files with 28 additions and 54 deletions.
29 changes: 14 additions & 15 deletions .github/workflows/main.yml
@@ -2,34 +2,33 @@ name: main CI

 on:
   push:
-    branches: [ $default-branch ]
+    branches: [ master ]
   pull_request:
-    branches: [ $default-branch ]
+    branches: [ master ]

 jobs:
   build:
     runs-on: ${{ matrix.os }}-latest
     strategy:
       max-parallel: 5
       matrix:
-        os: ['ubuntu', 'macos']
-        python-version: ['3.6', '3.8']
+        os: ['ubuntu']
+        python-version: ['3.6']

     steps:
     - uses: actions/checkout@v2
-    - name: Set up Python
-      uses: actions/setup-python@v2
+    - name: Set up Conda
+      uses: s-weigand/setup-conda@v1
       with:
+        update-conda: true
         python-version: ${{ matrix.python-version }}
-
-    - name: Set up Conda
-      run: echo $CONDA/bin >> $GITHUB_PATH
+        conda-channels: anaconda, conda-forge

     - name: Install dependencies
-      run: conda install -c conda-forge --file ci/conda_requirements.txt
+      run: conda install --file ci/conda_requirements.txt

     - name: Install CI packages
-      run: conda install -c conda-forge flake8 coveralls
+      run: conda install flake8 coveralls

     - name: Install program
       run: pip install -e .
@@ -40,7 +39,7 @@ jobs:
     - name: Run unit tests
       run: coverage run -m unittest

-    - name: Coveralls GitHub Action
-      uses: coverallsapp/github-action@v1.1.2
-      with:
-        github-token: ${{ secrets.GITHUB_TOKEN }}
+    - name: Coveralls
+      run: coveralls --service=github
+      env:
+        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
23 changes: 0 additions & 23 deletions .travis.yml

This file was deleted.

3 changes: 1 addition & 2 deletions README.md
@@ -1,8 +1,7 @@
 # Woltka

 [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
-[![Build Status](https://travis-ci.org/qiyunzhu/woltka.svg?branch=master)](https://travis-ci.org/qiyunzhu/woltka)
-[![Workflow](https://github.com/qiyunzhu/woltka/actions/workflows/main.yml/badge.svg)](https://github.com/qiyunzhu/woltka/actions)
+[![CI Status](https://github.com/qiyunzhu/woltka/actions/workflows/main.yml/badge.svg)](https://github.com/qiyunzhu/woltka/actions)
 [![Coverage Status](https://coveralls.io/repos/github/qiyunzhu/woltka/badge.svg?branch=master)](https://coveralls.io/github/qiyunzhu/woltka?branch=master)

 **Woltka** (Web of Life Toolkit App), is a bioinformatics package for shotgun metagenome data analysis. It takes full advantage of, and it not limited by, the [WoL](https://biocore.github.io/wol/) reference phylogeny. It bridges first-pass sequence aligners with advanced analytical platforms (such as QIIME 2). Highlights of this program include:
2 changes: 1 addition & 1 deletion doc/cli.md
@@ -126,7 +126,7 @@ Option | Description
 `--input`, `-i` (required) | Path to input profile.
 `--map`, `-m` (required) | Path to mapping of source features to target features.
 `--output`, `-o` (required) | Path to output profile.
-`--normalize`, `-z` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
+`--frac`, `-f` | Count each target feature as 1 / _k_ (_k_ is the number of targets mapped to a source). Otherwise, count as one.
 `--names`, `-n` | Path to mapping of target features to names. The names will be appended to the collapsed profile as a metadata column.
2 changes: 1 addition & 1 deletion doc/collapse.md
@@ -57,7 +57,7 @@ source4 <tab> target3

 ### Normalization

-By default, if one source feature is simultaneously mapped to _k_ targets, each target will be counted once. With the `--normalize` or `-z` flag added to the command, each target will be counted 1 / _k_ times.
+By default, if one source feature is simultaneously mapped to _k_ targets, each target will be counted once. With the `--frac` or `-f` flag added to the command, each target will be counted 1 / _k_ times.

 Whether to enable normalization depends on the nature and aim of your analysis. For example, one gene is involved in two pathways (which isn't uncommon), should each pathway be counted once, or half time?
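The counting rule described in `doc/collapse.md` can be illustrated with a minimal sketch. This is not Woltka's `collapse_table`: the function `collapse_counts`, the dictionary-based inputs, and the feature names are all assumptions made for illustration, and dropping unmapped sources is a choice of this sketch rather than documented behavior.

```python
from collections import defaultdict


def collapse_counts(counts, mapping, frac=False):
    """Collapse source counts onto targets (hypothetical helper, not Woltka's code).

    counts:  dict of source feature -> count
    mapping: dict of source feature -> list of target features
    frac:    if True, each of the k targets receives count / k;
             otherwise each target receives the full count
    """
    result = defaultdict(float)
    for source, count in counts.items():
        targets = mapping.get(source, [])
        if not targets:
            continue  # assumption of this sketch: unmapped sources are dropped
        k = len(targets)
        for target in targets:
            result[target] += count / k if frac else count
    return dict(result)


# gene1 is involved in two pathways, gene2 in one
counts = {'gene1': 6, 'gene2': 4}
mapping = {'gene1': ['pathwayA', 'pathwayB'], 'gene2': ['pathwayB']}

whole = collapse_counts(counts, mapping)
fractional = collapse_counts(counts, mapping, frac=True)
```

Here `whole` credits gene1's 6 counts to both pathways in full, while `fractional` splits them evenly, so each pathway receives 3 from gene1.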
2 changes: 1 addition & 1 deletion doc/metacyc.md
@@ -86,7 +86,7 @@ woltka tools collapse -i pathway.biom -m metacyc/pathway-to-super_pathway.txt -n
 woltka tools collapse -i super_pathway.biom -m metacyc/pathway_type.txt -n metacyc/all_class_name.txt -o pathway_type.biom
 ```

-The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain compositionality of the profile), one may consider adding the `--normalize` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).
+The collapse command supports **many-to-many** mapping. For example, if one reaction is found in three pathways, each pathway will be counted **once**. In some instances (e.g., to retain compositionality of the profile), one may consider adding the `--frac` flag, which will instruct the program to count each pathway 1 / 3 times ([see details](collapse.md)).


 ## Pathway coverage
3 changes: 1 addition & 2 deletions doc/wol.md
@@ -220,8 +220,7 @@ So on so forth. See [here](metacyc.md) for a graph of all available collapsing d

 `classify` only supports a tree structure, in which one child unit has exactly one parent unit. This is typical in taxonomic classification. If multiple parents are present, all but the first parent will be discarded. In contrast, `collapse` supports **one-to-multiple** mappings, therefore it is more suitable when this is the norm instead of exception, especially in functional classification (where one gene can be involved in multiple metabolic pathways).

-`classify` always ensures the **compositionality** of the feature table, in which the frequencies match the numbers of aligned sequences. `collapse` however does not by default. In a one-to-multiple mapping, all parents will be counted once. But one can add `--normalize` to the `collapse` command to normalize the counts by the number of parents so that the compositionality is retained.
-
+`classify` always ensures the **compositionality** of the feature table, in which the frequencies match the numbers of aligned sequences. `collapse` however does not by default. In a one-to-multiple mapping, all parents will be counted once. But one can add `--frac` to the `collapse` command to normalize the counts by the number of parents so that the compositionality is retained.

 ## Stratified taxonomic / functional classification

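The compositionality point in `doc/wol.md` can be checked numerically. The sketch below is illustrative only: `collapse` here is a hypothetical stand-in, not Woltka's implementation, and the feature names and counts are made up. It uses exact fractions so that the totals compare without rounding error, and it assumes every child has at least one parent.

```python
from fractions import Fraction


def collapse(counts, mapping, frac=False):
    """Distribute child counts to parents (hypothetical stand-in, not Woltka's code)."""
    out = {}
    for child, count in counts.items():
        parents = mapping.get(child, [])
        for parent in parents:
            # with frac, each of the k parents gets count / k; otherwise the full count
            share = Fraction(count, len(parents)) if frac else Fraction(count)
            out[parent] = out.get(parent, 0) + share
    return out


# g1 has three parents (one-to-multiple), g2 has one
counts = {'g1': 10, 'g2': 5}
mapping = {'g1': ['p1', 'p2', 'p3'], 'g2': ['p1']}

total_in = sum(counts.values())
total_whole = sum(collapse(counts, mapping).values())
total_frac = sum(collapse(counts, mapping, frac=True).values())
```

With whole counting the table total grows from 15 to 35, because g1 is counted three times. With fractional counting the total stays at 15, which is the sense in which compositionality is retained.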
2 changes: 1 addition & 1 deletion woltka/cli.py
@@ -266,7 +266,7 @@ def merge_cmd(ctx, **kwargs):
     type=click.Path(writable=True, dir_okay=False),
     help='Path to output profile.')
 @click.option(
-    '--normalize', '-z', is_flag=True,
+    '--frac', '-f', is_flag=True,
     help=('Count each target feature as 1/k (k is the number of targets '
           'mapped to a source). Otherwise, count as one.'))
 @click.option(
2 changes: 1 addition & 1 deletion woltka/tests/test_cli.py
@@ -236,7 +236,7 @@ def test_collapse_cmd(self):
                   '--map', map_fp,
                   '--output', output_fp,
                   '--names', names_fp,
-                  '--normalize']
+                  '--frac']
         res = self.runner.invoke(collapse_cmd, params)
         self.assertEqual(res.exit_code, 0)
         self.assertEqual(res.output.splitlines()[-1],
2 changes: 1 addition & 1 deletion woltka/tests/test_tools.py
@@ -189,7 +189,7 @@ def test_collapse_wf(self):
         # wrong mapping file
         map_fp = join(self.datdir, 'tree.nwk')
         with self.assertRaises(SystemExit) as ctx:
-            collapse_wf(input_fp, map_fp, output_fp, normalize=True)
+            collapse_wf(input_fp, map_fp, output_fp, frac=True)
         errmsg = 'No source-target relationship is found in tree.nwk.'
         self.assertEqual(str(ctx.exception), errmsg)

12 changes: 6 additions & 6 deletions woltka/tools.py
@@ -192,11 +192,11 @@ def _read_profile(fp):
     click.echo('Merged profile written.')


-def collapse_wf(input_fp: str,
-                map_fp: str,
-                output_fp: str,
-                normalize: bool = False,
-                names_fp: str = None):
+def collapse_wf(input_fp: str,
+                map_fp: str,
+                output_fp: str,
+                frac: bool = False,
+                names_fp: str = None):
     """Workflow for collapsing a profile based on many-to-many mapping.

     Raises
@@ -225,7 +225,7 @@ def collapse_wf(input_fp: str,

     # collapse profile by mapping
     click.echo('Collapsing profile...', nl=False)
-    table = collapse_table(table, mapping, normalize)
+    table = collapse_table(table, mapping, frac)
     click.echo(' Done.')
     n = table_shape(table)[0]
     click.echo(f'Number of features after collapsing: {n}.')
