No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
database/HOMD/14.5
input
misc
scripts
tools
.gitignore
README.md
Snakefile
cluster.json
config.json

README.md

Snakemake Microbiome Pipeline

Download pipeline

git clone https://github.com/vitmy0000/SnaMPv2_HOMD.git

Create environment

module load python/anaconda2-4.2.0
conda create -c bioconda -m -p pyenvs/py35-snakemake python=3.5 pandas snakemake

Load environment

module load python/anaconda2-4.2.0
source activate pyenvs/py35-snakemake

Remove environment

source deactivate

Workflow

  • Closed reference OTU picking
  • BLAST agaist HOMD database

workflow_0

Usage

  1. Prepare sequencing data

Repalce the $SOURCE_FILES with the zipped sequencing result, e.g. WHI_Repo/RT530_Batch2/*.gz

cd input
ln -s $SOURCE_FILES .
cd ..
  1. Launch jobs

The pipeline will utilize CCR resource to parallel execution. OTU table and statisics about merge rate, filter rate, hit rate wiil be placed under table

snakemake -p -j 100 --cluster-config cluster.json --cluster "sbatch --partition {cluster.partition} --time {cluster.time} --nodes {cluster.nodes} --ntasks-per-node {cluster.ntasks-per-node}"
  1. Results

Three result files are placed under table directory.

  • QC_table.txt
  • raw_OTU_table_collapsed.txt
  • raw_OTU_table_uncollapsed.txt

Misc

To remove generated files:

snakemake clean

Check snakemake version:

snakemake -v
# 3.13.3

Known Issues

  • cluster time limit: if any of the jobs run of the time limit, it will be silently killed by CCR and the pipeline will keep waiting. To solve this, you need to terminate Snakemake, increase time limit specified in cluster.json accordingly and rerun the pipeline with extra option --rerun-incomplete.