moritzbuck/0053_metasssnake2

metasssnake2

A snakemake/Python-based metagenomics pipeline, better than the first.

basic usage

It requires snakemake and conda to run (well conda not necessarily but if you don't have it, it's gonna be a pain).

You will need to fill out a config file similar to the JSON file included in the sample_configs folder. The config gives the paths to a set of csv-files (sample csv-files are also in the sample_configs folder). There is one csv-file each for the libraries, the assemblies, the binnings, and the bin-sets.

Things to avoid in your config-files: the - symbol, duplicate names.
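A quick pre-check for these two pitfalls could look like the sketch below. It is only an illustration, not part of the pipeline: it assumes each csv-file has a header row and keeps the entry name in its first column, and the column names and file names in the example are made up.

```python
import csv
import io
from collections import Counter


def check_names(csv_text, name_column=0):
    """Return a list of problems found in the name column of a csv.

    Assumption (not taken from the pipeline): the first row is a header
    and the entry name sits in column `name_column`.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    names = [row[name_column] for row in rows[1:] if row]
    # Flag every name that contains the forbidden "-" symbol.
    problems = [f"'{n}' contains '-'" for n in names if "-" in n]
    # Flag every name that occurs more than once.
    counts = Counter(names)
    problems += [f"'{n}' appears {c} times" for n, c in counts.items() if c > 1]
    return problems


# Hypothetical csv content, only for demonstration.
example = (
    "name,fwd,rev\n"
    "lib_A,a_1.fastq.gz,a_2.fastq.gz\n"
    "lib-B,b_1.fastq.gz,b_2.fastq.gz\n"
    "lib_A,c_1.fastq.gz,c_2.fastq.gz\n"
)
for problem in check_names(example):
    print(problem)
```

An empty list means neither problem was found in the name column; other columns (paths, for instance) are not inspected by this sketch.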

The pipeline is then simply run as:

snakemake --use-conda --configfile YOUR_CONFIG_FILE    --cores NB_THREADS

This will run all the bin-sets.

If you want to run only all libraries, all assemblies, or all binnings, run the corresponding command:

snakemake --use-conda --configfile YOUR_CONFIG_FILE    --cores NB_THREADS --until all_libs
snakemake --use-conda --configfile YOUR_CONFIG_FILE    --cores NB_THREADS --until all_asses
snakemake --use-conda --configfile YOUR_CONFIG_FILE    --cores NB_THREADS --until all_binnings
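If you launch the pipeline from a script, the commands above can be assembled programmatically. The following is a minimal sketch: the function name, the stage keys, and the example config name are hypothetical, while the `--until` targets are the ones listed above.

```python
import shlex

# Stage keys are made up for this sketch; the rule names come from the commands above.
UNTIL_TARGETS = {
    "libraries": "all_libs",
    "assemblies": "all_asses",
    "binnings": "all_binnings",
}


def snakemake_command(config_file, cores, stage=None):
    """Build the snakemake argument list; stage=None runs all the bin-sets."""
    cmd = ["snakemake", "--use-conda", "--configfile", config_file,
           "--cores", str(cores)]
    if stage is not None:
        cmd += ["--until", UNTIL_TARGETS[stage]]
    return cmd


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(snakemake_command("my_config.json", 8, "assemblies")))
```

The returned list can be passed directly to `subprocess.run` once the printed command looks right.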

All options of snakemake are obviously available, for example:

#shows the jobs to compute
snakemake --use-conda --configfile YOUR_CONFIG_FILE  -n --quiet

#prints the job graph (DAG) in Graphviz dot format; pipe it to dot to render an image
snakemake --use-conda --configfile YOUR_CONFIG_FILE --dag | dot -Tsvg > dag.svg

#run it on your SLURM cluster
snakemake --use-conda --configfile YOUR_CONFIG_FILE -j MAX_NB_OF_SUBMITTED_JOBS --local-cores NB_OF_THREADS --cluster "sbatch -D `pwd` -A YOUR_ACCOUNT  -t '7-00:00:00' -n 20"

Specific files can also be generated by requesting them directly as targets, for example:

#make the MYLIB library
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/libraries/MYLIB/MYLIB_fwd.fastq.gz

#make the MYBINNING binning
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/binnings/MYBINNING/binned_assembly.fna

#make the MYASS assembly
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/assemblies/MYASS/assembly.fna

#make the MYBINSET binset
snakemake --use-conda --configfile YOUR_CONFIG_FILE --cores THREADS ROOT_PATH/binsets/MYBINSET/MYBINSET.fna
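The target paths in these commands follow a fixed pattern per object type, which can be written down as a small helper. This is a sketch only: the function and the dictionary keys are hypothetical, and the patterns are copied from the four examples above rather than from the workflow itself.

```python
from pathlib import PurePosixPath

# Path patterns as they appear in the example commands above.
PATTERNS = {
    "library": "libraries/{name}/{name}_fwd.fastq.gz",
    "assembly": "assemblies/{name}/assembly.fna",
    "binning": "binnings/{name}/binned_assembly.fna",
    "binset": "binsets/{name}/{name}.fna",
}


def target_path(root, kind, name):
    """Return the file to request from snakemake for one named object."""
    return str(PurePosixPath(root) / PATTERNS[kind].format(name=name))


# Example with a made-up root path.
print(target_path("/data/project", "binset", "MYBINSET"))
```

The resulting string is what goes at the end of the snakemake command in place of the ROOT_PATH/... argument.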

supplementary scripts

A few additional scripts are available for various purposes. They might need some Python libraries, so do check your error messages. They also probably only run if you are in the folder where the workflow is.

config validation

To check your config-file, run:

python $metasssnake2_path/workflow/scripts/utils.py validate_descriptor YOUR_CONFIG_FILE

generate csvs from path

If your libraries are in the right folder structure, this generates a set of csvs describing one pair of reads per library, single-sample assemblies and binnings, and one big binset containing everything, as well as a config-file.

python $metasssnake2_path/workflow/scripts/utils.py csv_generator ABSOLUTE_PATH_TO_THE_FOLDER ABSOLUTE_OUT_FILE_PREFIX

Some editing of the output JSON is necessary, though.