Commit

Merge pull request #7 from gibsramen/develop
Update to 0.4.2
gibsramen committed Apr 5, 2022
2 parents c70ade3 + 02ef6d2 commit 34da092
Showing 47 changed files with 1,192 additions and 553 deletions.
17 changes: 7 additions & 10 deletions .github/workflows/main_ci.yml
@@ -4,9 +4,11 @@ on:
pull_request:
branches:
- main
- develop
push:
branches:
- main
- develop

jobs:
build:
@@ -39,19 +41,14 @@ jobs:
shell: bash -l {0}
run: pip install gemelli evident

- name: Install xebec
shell: bash -l {0}
run: pip install -e .

- name: Run tests
shell: bash -l {0}
run: make test

- name: Run Snakemake
shell: bash -l {0}
run: |
TMPDIR=$(mktemp -d)
COOKIE_DIR=$(pwd)
TABLE_FILE=$(realpath tests/data/table.biom)
MD_FILE=$(realpath tests/data/metadata.tsv)
TREE_FILE=$(realpath tests/data/tree.tre)
cd ${TMPDIR}
cookiecutter --no-input ${COOKIE_DIR} feature_table_file=${TABLE_FILE} sample_metadata_file=${MD_FILE} phylogenetic_tree_file=${TREE_FILE}
cd diversity-benchmark
snakemake --cores 1
run: make snaketest
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
*.swo
*.swp
*egg-info/
*__pycache__/
dist/
build/
3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,3 @@
include xebec/cookiecutter.json
graft xebec/{{cookiecutter.project_name}}
graft xebec/js
10 changes: 5 additions & 5 deletions Makefile
@@ -1,13 +1,13 @@
TMPDIR := $(shell mktemp -d)
COOKIE_DIR := $(shell pwd)
TABLE_FILE := $(shell realpath tests/data/table.biom)
MD_FILE := $(shell realpath tests/data/metadata.tsv)
TREE_FILE := $(shell realpath tests/data/tree.tre)
COOKIE_DIR := $(shell pwd)/xebec
TABLE_FILE := $(shell realpath xebec/tests/data/table.biom)
MD_FILE := $(shell realpath xebec/tests/data/metadata.tsv)
TREE_FILE := $(shell realpath xebec/tests/data/tree.tre)

all: test snaketest

test:
pytest
pytest --template $(COOKIE_DIR)

snaketest:
@cd $(TMPDIR); \
107 changes: 68 additions & 39 deletions README.md
@@ -1,3 +1,6 @@
[![Main CI](https://github.com/gibsramen/xebec/actions/workflows/main_ci.yml/badge.svg)](https://github.com/gibsramen/xebec/actions/workflows/main_ci.yml)
[![PyPI](https://img.shields.io/pypi/v/xebec.svg)](https://pypi.org/project/xebec)

# xebec

Snakemake pipeline for microbiome diversity effect size benchmarking
@@ -6,43 +9,60 @@ Snakemake pipeline for microbiome diversity effect size benchmarking

## Installation

To use xebec, you will need several dependencies:
To use xebec, you will need several dependencies.
We recommend using [`mamba`](https://github.com/mamba-org/mamba) to install these packages when possible.

* [snakemake](https://github.com/snakemake/snakemake)
* [cookiecutter](https://github.com/cookiecutter/cookiecutter)
* [biom-format](https://github.com/biocore/biom-format)
* [unifrac](https://github.com/biocore/unifrac)
* [iow](https://github.com/biocore/improved-octo-waddle)
* [scikit-bio](https://github.com/biocore/scikit-bio)
* [numpy](https://github.com/numpy/numpy)
* [pandas](https://github.com/pandas-dev/pandas)
* [seaborn](https://github.com/mwaskom/seaborn)
* [bokeh](https://github.com/bokeh/bokeh)
* [evident](https://github.com/gibsramen/evident)
* [gemelli](https://github.com/biocore/gemelli)
```
mamba install -c conda-forge -c bioconda biom-format h5py==3.1.0 snakemake pandas unifrac scikit-bio bokeh cookiecutter unifrac-binaries
We recommend using `conda`/`mamba` to install these packages when possible.
Note that at the time of writing, evident and gemelli are only available through PyPI.
pip install evident>=0.2.0 gemelli>=0.0.8
```

From the command line, run the following command:
To install xebec, run the following command from the command line:

```
cookiecutter https://github.com/gibsramen/xebec
pip install xebec
```

You should enter a prompt where you can input the required values to set up xebec.
## Usage

If you run `xebec --help`, you should see the following:

```
$ xebec --help
Usage: xebec [OPTIONS]
Options:
-ft, --feature-table PATH Feature table in BIOM format. [required]
-m, --metadata PATH Sample metadata in TSV format. [required]
-t, --tree PATH Phylogenetic tree in Newick format.
[required]
-o, --output PATH Output workflow directory. [required]
--max-category-levels INTEGER Max number of levels in a category.
[default: 5]
--min-level-count INTEGER Min number of samples per level per
category. [default: 3]
--rarefy-percentile FLOAT Percentile of sample depths at which to
rarefy. [default: 10]
* `project_name`: Name of the directory to create with the Snakemake pipeline files (defaults to `diversity-benchmark`).
* `feature_table_file`: *absolute* path to the feature table to be used in BIOM format.
* `sample_metadata_file`: *absolute* path to the sample metadata file to be used in TSV format.
* `phylogenetic_tree_file`: *absolute* path to the phylogenetic tree file to be used in Newick format.
* `max_category_levels`: Maximum number of levels in a category to consider. Any categories with more than this number of levels will be dropped (defaults to 5).
* `min_level_count`: Minimum number of samples in a given level to continue. If a level is represented by fewer than this many samples, this level will be set to NaN (defaults to 3).
* `rarefaction_depth_percentile`: Depth percentile at which to rarefy for diversity metrics that require it (defaults to 10th percentile).
--validate-input / --no-validate-input
Whether to validate input before creating
workflow. [default: True]
This will create the directory structure needed to run xebec under the project name you specified.
--execute / --no-execute Whether to automatically execute the
workflow. [default: False]
The directory structure should be as follows:
--help Show this message and exit.
```

To create the workflow structure, pass in the filepaths for the feature table, sample metadata, and phylogenetic tree.
You must also pass in a path to a directory in which to create the workflow.
Additionally, you can provide parameters for determining how to process your sample metadata.
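
The metadata parameters can be illustrated with a small sketch. It is not xebec's actual implementation: the function name `filter_category` is hypothetical, and the order of operations (mask rare levels first, then count the remaining levels) is an assumption. It only mirrors the documented behavior that categories with more than `--max-category-levels` levels are dropped and that levels with fewer than `--min-level-count` samples are set to a missing value.

```python
from collections import Counter


def filter_category(values, max_category_levels=5, min_level_count=3):
    """Return a cleaned copy of one metadata column, or None if it is dropped.

    Assumed order of operations (not confirmed by the xebec source):
    1. levels with fewer than `min_level_count` samples become None (NaN);
    2. the column is dropped if more than `max_category_levels` levels remain.
    """
    counts = Counter(v for v in values if v is not None)
    kept = {level for level, n in counts.items() if n >= min_level_count}
    if len(kept) > max_category_levels:
        return None
    return [v if v in kept else None for v in values]


# Level "c" has a single sample, so it is masked; "a" and "b" survive.
column = ["a"] * 4 + ["b"] * 3 + ["c"]
cleaned = filter_category(column)
print(cleaned)
```

Using `None` as the missing-value marker keeps the sketch dependency-free; a pandas-based version would more likely use `NaN`.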

After running this command, navigate to the output directory you created. The directory structure should be as follows:

```
diversity-benchmark/
@@ -54,7 +74,6 @@ diversity-benchmark/
├── rules
│   ├── alpha_diversity.smk
│   ├── beta_diversity.smk
│   ├── common.smk
│   ├── evident.smk
│   ├── preprocess_data.smk
│   └── visualization.smk
@@ -69,12 +88,8 @@ diversity-benchmark/
│   ├── rarefy.py
│   └── run_evident.py
└── Snakefile
4 directories, 19 files
```

## Usage

Navigate inside the `<project_name>` directory.
To start the pipeline, run the following command:

@@ -84,13 +99,10 @@ snakemake --cores 1

You should see the Snakemake pipeline start running the jobs.
If this pipeline runs successfully, the processed results will be located at `<project_name>/results`.
Open the `results/beta_div/effect_size_plot.html` and `results/alpha_div/effect_size_plot.html` webpages and you should be taken to an interactive visualization.
On the left are the effect sizes of diversity differences for binary categories.
On the right are the effect sizes of diversity differences for multi-class categories.
You can move around these plots, zoom in, as well as toggle the visibility of diversity metrics by clicking on the legend.
These plots are generated using [Bokeh](https://github.com/bokeh/bokeh).
Included in the results are the concatenated effect size values as well as interactive plots summarizing the effect sizes for each metadata column for each diversity metric.
These plots are generated using [Bokeh](https://github.com/bokeh/bokeh) and can be visualized in any modern web browser.

![Bokeh](imgs/bokeh.png)
![Bokeh](https://raw.githubusercontent.com/gibsramen/xebec/main/imgs/bokeh.png)

## Workflow Overview

@@ -103,4 +115,21 @@ xebec performs four main steps, some of which have substeps.

An overview of the DAG is shown below:

![xebec DAG](imgs/dag.png)
![xebec DAG](https://raw.githubusercontent.com/gibsramen/xebec/main/imgs/dag.png)

## Configuration

### Diversity Metrics

xebec allows configuration of what alpha and beta diversity metrics are included in the workflow.
To add or remove metrics, modify the `config/alpha_div_metrics.yml` and `config/beta_div_metrics.yml` files.
For alpha diversity, any metric that can be passed into `skbio.alpha_diversity` should work.
For beta diversity, any non-phylogenetic metric that can be passed into `skbio.beta_diversity` should work.
Valid phylogenetic beta diversity metrics are those that can be passed into [Striped UniFrac](https://github.com/biocore/unifrac).
Make sure that any additional diversity metrics are annotated with `phylo` or `non_phylo` so xebec knows how to process them.

### Snakemake Options

The xebec workflow can be decorated with many configuration options available in Snakemake, including resource usage and HPC scheduling.
We recommend reading through the [Snakemake documentation](https://snakemake.readthedocs.io/en/stable/index.html) for details on these options.
Note that some of these options may require creating new configuration files.
3 changes: 3 additions & 0 deletions pytest.ini
@@ -0,0 +1,3 @@
[pytest]
filterwarnings =
ignore::DeprecationWarning
67 changes: 67 additions & 0 deletions setup.py
@@ -0,0 +1,67 @@
# Much of this page is taken from the gemelli setup file
import ast
import re
from setuptools import find_packages, setup

# version parsing from __init__ pulled from Flask's setup.py
# https://github.com/mitsuhiko/flask/blob/master/setup.py
_version_re = re.compile(r"__version__\s+=\s+(.*)")

with open("xebec/__init__.py", "rb") as f:
    hit = _version_re.search(f.read().decode("utf-8")).group(1)
    version = str(ast.literal_eval(hit))

with open("README.md") as f:
    long_description = f.read()

classes = """
    Development Status :: 3 - Alpha
    License :: OSI Approved :: BSD License
    Topic :: Software Development :: Libraries
    Topic :: Scientific/Engineering
    Topic :: Scientific/Engineering :: Bio-Informatics
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3 :: Only
    Operating System :: Unix
    Operating System :: POSIX
    Operating System :: MacOS :: MacOS X
"""
classifiers = [s.strip() for s in classes.split("\n") if s]

description = (
    "Snakemake pipeline for microbiome diversity effect size "
    "benchmarking."
)

standalone = ["xebec=xebec.cli.cli:xebec"]

setup(
    name="xebec",
    author="Gibraan Rahman",
    author_email="grahman@eng.ucsd.edu",
    description=description,
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/gibsramen/xebec",
    version=version,
    license="BSD-3-Clause",
    packages=["xebec/src", "xebec/cli", "xebec"],
    install_requires=[
        "numpy",
        "h5py==3.1.0",
        "evident>=0.2.0",
        "gemelli>=0.0.8",
        "pandas>=1.0.0",
        "scikit-bio>=0.5.6",
        "unifrac",
        "snakemake",
        "cookiecutter",
        "seaborn",
        "bokeh",
        "click"
    ],
    include_package_data=True,
    package_data={"": ["xebec/js", "xebec/{{cookiecutter.project_name}}"]},
    entry_points={"console_scripts": standalone},
    classifiers=classifiers,
)
Original file line number Diff line number Diff line change
@@ -1,4 +1,9 @@
import logging
import os


__version__ = "0.4.2"
COOKIE_DIR = os.path.dirname(__file__)


def get_logger(logfile, rulename):
Empty file added xebec/cli/__init__.py
Empty file.
77 changes: 77 additions & 0 deletions xebec/cli/cli.py
@@ -0,0 +1,77 @@
import os
from pathlib import PurePath
import subprocess

import click
from cookiecutter.main import cookiecutter

from xebec import COOKIE_DIR
from xebec.src._validate import (validate_table, validate_metadata,
                                 validate_tree)


@click.command(name="xebec")
@click.option("--feature-table", "-ft", required=True, type=click.Path(),
              help="Feature table in BIOM format.")
@click.option("--metadata", "-m", required=True, type=click.Path(),
              help="Sample metadata in TSV format.")
@click.option("--tree", "-t", required=True, type=click.Path(),
              help="Phylogenetic tree in Newick format.")
@click.option("--output", "-o", required=True, type=click.Path(),
              help="Output workflow directory.")
@click.option("--max-category-levels", default=5, show_default=True,
              type=int, help="Max number of levels in a category.")
@click.option("--min-level-count", default=3, show_default=True,
              type=int, help="Min number of samples per level per category.")
@click.option("--rarefy-percentile", default=10, show_default=True,
              type=float, help="Percentile of sample depths at which to rarefy.")
@click.option("--validate-input/--no-validate-input", default=True,
              help="Whether to validate input before creating workflow.",
              show_default=True)
@click.option("--execute/--no-execute", default=False, type=bool,
              help="Whether to automatically execute the workflow.",
              show_default=True)
def xebec(
    feature_table,
    metadata,
    tree,
    output,
    max_category_levels,
    min_level_count,
    rarefy_percentile,
    validate_input,
    execute
):
    feature_table = os.path.abspath(feature_table)
    metadata = os.path.abspath(metadata)
    tree = os.path.abspath(tree)

    if validate_input:
        validate_table(feature_table)
        validate_metadata(metadata)
        validate_tree(tree)

    output = PurePath(output)
    project_dir = output.parent
    project_name = os.path.basename(output)
    os.chdir(project_dir)

    args = {
        "project_name": project_name,
        "feature_table_file": feature_table,
        "sample_metadata_file": metadata,
        "phylogenetic_tree_file": tree,
        "max_category_levels": max_category_levels,
        "min_level_count": min_level_count,
        "rarefaction_depth_percentile": rarefy_percentile,
    }

    cookiecutter(COOKIE_DIR, no_input=True, extra_context=args)

    if execute:
        os.chdir(project_name)
        subprocess.run(["snakemake", "--cores", "1"], check=True)


if __name__ == "__main__":
    xebec()
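
To make the `--output` handling in `cli.py` easier to follow, here is a minimal sketch of how the path is split into the directory that the command changes into and the cookiecutter `project_name`. The helper `split_output` is hypothetical and exists only for this illustration; it wraps the same `PurePath(...).parent` and `os.path.basename(...)` calls used above.

```python
import os
from pathlib import PurePath


def split_output(output):
    """Split an output path the way cli.py does (hypothetical helper)."""
    output = PurePath(output)
    project_dir = output.parent               # directory passed to os.chdir
    project_name = os.path.basename(output)   # cookiecutter project_name
    return str(project_dir), project_name


print(split_output("benchmarks/diversity-benchmark"))
print(split_output("diversity-benchmark"))
```

A bare directory name yields `.` as the parent, so the workflow is created in the current directory. Since `os.chdir` fails on a missing directory, the parent of `--output` has to exist beforehand. The input file paths are converted with `os.path.abspath` before the directory change, so relative inputs remain valid.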
File renamed without changes.