Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: ANCOM-BC R wrapper #88

Merged
merged 62 commits into from
Oct 21, 2022
Merged
Show file tree
Hide file tree
Changes from 60 commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
dbd13df
FEAT: gutting q2-comp to re-write as R wrapper for ANCOM/ANCOMBC/SECOM
lizgehret Jun 21, 2022
556615f
lint
lizgehret Jun 21, 2022
a2f7677
revert: changing copyright back to v1 timeframe
lizgehret Jun 21, 2022
22e1401
adding placeholder files for R wrapper scripts
lizgehret Jun 22, 2022
fc3538a
whoops, lint
lizgehret Jun 22, 2022
af80999
Merge branch 'ANCOM-wrapper' of github.com:qiime2/q2-composition into…
lizgehret Jun 27, 2022
de303c9
adding back some of Jamie's functionality for ANCOMBC R wrapper with …
lizgehret Jun 27, 2022
9f8c477
remainder of pipeline for ancombc R wrapper script
lizgehret Jun 27, 2022
17f81eb
adding python script for ancom-bc
lizgehret Jun 27, 2022
1e0b36c
some variable name changes for ancom-bc 3.15 release
lizgehret Jun 28, 2022
74d947f
style preference/modification for ancombc method arguments
lizgehret Jun 28, 2022
8ec3213
refactor: re-writing args using optparse
lizgehret Jun 29, 2022
e11bfd3
removing c() for option list items
lizgehret Jun 29, 2022
6d5838e
wiring: plugin setup wiring for ancombc method
lizgehret Jun 29, 2022
d393f92
removing help text from optparse commands
lizgehret Jun 29, 2022
ec30f00
linting
lizgehret Jun 29, 2022
d568eca
wiring: adding ancombc to __init__
lizgehret Jun 29, 2022
ca9ad36
refactor: removing duplicated functionality for metadata handling in …
lizgehret Jun 29, 2022
82aa1b8
revert: debugging
lizgehret Jun 29, 2022
a57adad
some functionality de-bugging
lizgehret Jul 12, 2022
8afd8e2
functioning R script that produces FeatureData[Differential]
lizgehret Jul 13, 2022
b3f8dd8
lint
lizgehret Jul 13, 2022
0eabd06
splitting apart output & some debugging
lizgehret Jul 14, 2022
f09e7db
debug: fixing wiring issue for structural zeros
lizgehret Jul 20, 2022
d1799c3
imp: adding leveling for group data to select intercept for contrast …
lizgehret Jul 21, 2022
2d26f63
first pass at leveling order for both formula and group parameters - …
lizgehret Jul 26, 2022
702f803
lint
lizgehret Jul 27, 2022
c3e5a9f
imp: first working draft for leveling order using multiple formula ar…
lizgehret Jul 27, 2022
d55d9f0
misc code clean-up
lizgehret Aug 1, 2022
a21a121
todo notes
lizgehret Aug 1, 2022
41dd06c
adding global param
lizgehret Aug 2, 2022
1e5e95a
removing global test from params and output
lizgehret Aug 2, 2022
7cbbd92
metadata validation for parameter inputs
lizgehret Aug 2, 2022
6945cd0
refactoring data validation for ANCOM-BC
lizgehret Aug 3, 2022
69ee666
adding validation for samples present in table but not in metadata
lizgehret Aug 3, 2022
c0a50db
linty
lizgehret Aug 3, 2022
7bdd20b
removing todo comments
lizgehret Aug 3, 2022
440aed2
misc renaming & more verbose error handling for missing IDs in metadata
lizgehret Aug 4, 2022
d7cf5b7
small tweaks per @ebolyen's review comments
lizgehret Aug 4, 2022
e815e20
linty?
lizgehret Aug 4, 2022
ce1dc01
wip: starting wiring for DataLoaf output
lizgehret Aug 11, 2022
bde4af6
wip: dataloaf cont'd
lizgehret Aug 12, 2022
446a0f3
drafting up wiring for DataLoaf output functionality
lizgehret Aug 16, 2022
c9d5485
working DataLoaf output
lizgehret Aug 22, 2022
0f68854
some output refactoring and unit test skeleton
lizgehret Sep 15, 2022
0eef389
ex: adding usage examples for method testing
lizgehret Sep 19, 2022
64b0246
debug: adding traceback details when an error is thrown in R
lizgehret Sep 21, 2022
850787f
maint: segmenting out struc_zeros
lizgehret Sep 22, 2022
9f0bba8
dep: updating recipe
lizgehret Sep 22, 2022
a642dc7
ci: updating recipe cont'd
lizgehret Sep 22, 2022
fad01ab
ex: more usage examples
lizgehret Sep 22, 2022
79884ce
oops linty
lizgehret Sep 22, 2022
eaaa927
test: adding unit test list (unfinished)
lizgehret Sep 22, 2022
3138b47
test: adding unit test functionality
lizgehret Sep 22, 2022
dd2bb14
test: fixes for existing unit tests
lizgehret Sep 23, 2022
8931ced
imp: adding formula validation
lizgehret Sep 30, 2022
0bb55d0
imp: adds formula validation and modifies data slice ID names
lizgehret Oct 6, 2022
b9eecae
imp: removing group and struc_zeros
lizgehret Oct 12, 2022
841bdec
test: adding level ordering tests
lizgehret Oct 12, 2022
c186218
imp: some tweaks from @oddant1 review comments
lizgehret Oct 13, 2022
70fada0
imp: updates per @ebolyen's review comments
lizgehret Oct 21, 2022
77375ba
squash
lizgehret Oct 21, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,15 @@ target/
#Ipython Notebook
.ipynb_checkpoints

#Rhistory
.Rhistory

# vi
.*.swp

# tmp files
input.biom.tsv
input.map.txt
output.summary.txt

.DS_Store
6 changes: 6 additions & 0 deletions ci/recipe/meta.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,15 @@ requirements:
- biom-format {{ biom_format }}
- scipy
- pandas {{ pandas }}
- formulaic
- bioconductor-phyloseq
- r-tidyverse
- r-optparse
- r-frictionless
- qiime2 {{ qiime2_epoch }}.*
- q2templates {{ qiime2_epoch }}.*
- q2-types {{ qiime2_epoch }}.*
- q2-stats {{ qiime2_epoch }}.*

test:
requires:
Expand Down
3 changes: 2 additions & 1 deletion q2_composition/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,11 @@
# ----------------------------------------------------------------------------

from ._impute import add_pseudocount
from ._ancombc import ancombc
from ._version import get_versions


__version__ = get_versions()['version']
del get_versions

__all__ = ['add_pseudocount']
__all__ = ['add_pseudocount', 'ancombc']
7 changes: 0 additions & 7 deletions q2_composition/_ancom.py

This file was deleted.

159 changes: 159 additions & 0 deletions q2_composition/_ancombc.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2016-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------
import subprocess
import tempfile
import pandas as pd
import os
import formulaic

import qiime2

from q2_stats._format import DataLoafPackageDirFmt


def run_commands(cmds, verbose=True):
lizgehret marked this conversation as resolved.
Show resolved Hide resolved
if verbose:
print('Running external command line application(s). This may print'
' messages to stdout and/or stderr.')
print('The command(s) being run are below. These commands cannot be'
' manually re-run as they will depend on temporary files that no'
' longer exist.')
for cmd in cmds:
if verbose:
print('\nCommand:', end=' ')
print(' '.join(cmd), end='\n\n')
subprocess.run(cmd, check=True)


def ancombc(table: pd.DataFrame, metadata: qiime2.Metadata, formula: str,
lizgehret marked this conversation as resolved.
Show resolved Hide resolved
p_adj_method: str = 'holm', prv_cut: float = 0.1, lib_cut: int = 0,
level_ordering: str = None, neg_lb: bool = False,
tol: float = 1e-05, max_iter: int = 100, conserve: bool = False,
alpha: float = 0.05) -> DataLoafPackageDirFmt:

return _ancombc(
table=table,
metadata=metadata,
formula=formula,
p_adj_method=p_adj_method,
prv_cut=prv_cut,
lib_cut=lib_cut,
level_ordering=level_ordering,
neg_lb=neg_lb,
tol=tol,
max_iter=max_iter,
conserve=conserve,
alpha=alpha,
)


# utility functions for formula parsing and column validation
def _parse_terms(formula):
parse = formulaic.parser.parser.DefaultFormulaParser(
include_intercept=False)
terms = parse.get_ast(formula=formula).flatten()
formula_terms = _leaf_collector(terms)
return formula_terms


def _leaf_collector(term):
lizgehret marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(term, formulaic.parser.types.Token):
return [str(term)]

if type(term) is not list:
return []

return _leaf_collector(term[1]) + _leaf_collector(term[2])


def _column_validation(metadata, parameter, value):
if value not in metadata.columns:
raise ValueError('Value provided in the `%s` parameter was not found'
' in any of the metadata columns. Please make sure to'
' only include values that are present within the'
' metadata columns.'
' \n\n'
' Value that was not found as a metadata column: "%s"'
% (parameter, value))


def _ancombc(table, metadata, formula, p_adj_method, prv_cut, lib_cut,
level_ordering, neg_lb, tol, max_iter, conserve, alpha):

meta = metadata.to_dataframe()

# error on IDs found in table but not in metadata
missing_ids = table.index.difference(meta.index).values

if missing_ids.size > 0:
raise KeyError('Not all samples present within the table were found in'
' the associated metadata file. Please make sure that'
' all samples in the FeatureTable are also present in'
' the metadata.'
' Sample IDs not found in the metadata: %s'
% missing_ids)

# column validation for the formula parameter
formula_terms = _parse_terms(formula=formula)
for term in formula_terms:
_column_validation(meta, 'formula', term)

# column & level validation for the level_ordering parameter
if level_ordering is not None:
for i in level_ordering:
column = i.split('::')[0]
level_value = i.split('::')[1]

_column_validation(meta, 'level_ordering', column)

if level_value not in pd.unique(meta[column].values):
raise ValueError('Value provided in `level_ordering` parameter'
' not found in the associated column within'
' the metadata. Please make sure each'
' column::value pair is present within the'
' metadata file.'
' \n\n'
' column::value pair with a value that was'
' not found: "%s"' % i)
else:
level_ordering = ''

with tempfile.TemporaryDirectory() as temp_dir_name:
temp_dir_name = '.'
biom_fp = os.path.join(temp_dir_name, 'input.biom.tsv')
meta_fp = os.path.join(temp_dir_name, 'input.map.txt')

table.to_csv(biom_fp, sep='\t', header=True)
meta.to_csv(meta_fp, sep='\t', header=True)

output_loaf = DataLoafPackageDirFmt()

cmd = ['run_ancombc.R',
'--inp_abundances_path', biom_fp,
'--inp_metadata_path', meta_fp,
'--formula', str(formula),
'--p_adj_method', p_adj_method,
'--prv_cut', str(prv_cut),
'--lib_cut', str(lib_cut),
'--level_ordering', str(level_ordering),
'--neg_lb', str(neg_lb),
'--tol', str(tol),
'--max_iter', str(max_iter),
'--conserve', str(conserve),
'--alpha', str(alpha),
'--output_loaf', str(output_loaf)
]

try:
lizgehret marked this conversation as resolved.
Show resolved Hide resolved
run_commands([cmd])
except subprocess.CalledProcessError as e:
raise Exception('An error was encountered while running ANCOM-BC'
' in R (return code %d), please inspect stdout and'
' stderr to learn more.' % e.returncode)
lizgehret marked this conversation as resolved.
Show resolved Hide resolved

return output_loaf
66 changes: 66 additions & 0 deletions q2_composition/_examples.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2022-2022, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------

import os
import pkg_resources

import qiime2


def _get_data_from_tests(path):
return pkg_resources.resource_filename('q2_composition.tests',
os.path.join('data', path))


def ancom_md_factory():
return qiime2.Metadata.load(
_get_data_from_tests('sample-md-ancom.tsv'))


def ancom_table_factory():
return qiime2.Artifact.load(
_get_data_from_tests('table-ancom.qza'))


def ancombc_single_formula(use):
table = use.init_artifact('table', ancom_table_factory)
metadata = use.init_metadata('metadata', ancom_md_factory)

dataloaf, = use.action(
use.UsageAction('composition', 'ancombc'),
use.UsageInputs(
table=table,
metadata=metadata,
formula='bodysite'
),
use.UsageOutputNames(
differentials='dataloaf'
)
)

dataloaf.assert_output_type('FeatureData[DifferentialAbundance]')


def ancombc_multi_formula_with_level_ordering(use):
table = use.init_artifact('table', ancom_table_factory)
metadata = use.init_metadata('metadata', ancom_md_factory)

dataloaf, = use.action(
use.UsageAction('composition', 'ancombc'),
use.UsageInputs(
table=table,
metadata=metadata,
formula='bodysite + month',
level_ordering=["bodysite::tongue"]
),
use.UsageOutputNames(
differentials='dataloaf'
)
)

dataloaf.assert_output_type('FeatureData[DifferentialAbundance]')
Loading