This is a repository to reconstruct the analysis done for a metagenomics
project. Only the rules to do the analysis are stored, since the actual data is
too big to store online. One can however see which files have been changed by
having a look at how the *_track.txt files have changed over the various
commits.
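As a minimal sketch of how that history could be inspected, the Python snippet below wraps a plain git log call restricted to the tracking files. The function names track_log_command and show_track_history are hypothetical helpers introduced here for illustration; they are not part of this repository.

```python
import subprocess


def track_log_command(pattern="*_track.txt"):
    """Build a git command listing the commits that touched matching files."""
    # "--" separates git options from the pathspec. No shell is involved,
    # so git itself interprets the wildcard in the pattern.
    return ["git", "log", "--stat", "--oneline", "--", pattern]


def show_track_history(repo_dir="."):
    """Print the history of the tracking files (requires git on the PATH)."""
    result = subprocess.run(
        track_log_command(), cwd=repo_dir, capture_output=True, text=True
    )
    # Show git's own error message if the directory is not a repository.
    print(result.stdout or result.stderr)
```

Calling show_track_history() from inside a clone of the repository should print one entry per commit that changed a *_track.txt file, together with the per-file change statistics.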
Snakemake has been used to make the analysis reproducible. Snakemake requires Python 3. A common approach for handling Python projects is to create a separate virtual environment for each of them, so that the modules installed for one project do not interfere with those of another. Here we use conda to create such an environment and then install Snakemake into it:
conda create -n py3-snakemake python=3 anaconda
source activate py3-snakemake
pip install snakemake
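To confirm that the environment is active, a small check like the following can be run with the environment's Python interpreter. It is only a sketch: the function name environment_report is made up for this example, and it merely reports whether the interpreter is Python 3 and whether a snakemake module can be found.

```python
import importlib.util
import sys


def environment_report():
    """Report the Python major version and whether snakemake is importable."""
    return {
        "python_major": sys.version_info.major,
        # find_spec returns None when the module is not installed.
        "snakemake_found": importlib.util.find_spec("snakemake") is not None,
    }


print(environment_report())
```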
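You can deactivate the environment again with the command below (conda 4.4 and later also provide conda activate and conda deactivate as equivalents of the source commands):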
source deactivate
The idea is to cluster the metagenomic assemblies of our six related samples into bins using CONCOCT. For this we follow the complete example from the CONCOCT documentation.
The assemblies have already been done, so they are listed in the config.json file. All the rules to do the analysis are in the Snakefile. Start by cutting up the contigs into chunks of 10K bases:
snakemake --cores 6 -p all_cutup_10K
This results in:
$ ls concoct/*/cutup/contigs_10K.fasta
concoct/101B/cutup/contigs_10K.fasta concoct/103/cutup/contigs_10K.fasta concoct/105/cutup/contigs_10K.fasta
concoct/102/cutup/contigs_10K.fasta concoct/104/cutup/contigs_10K.fasta concoct/106/cutup/contigs_10K.fasta
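To illustrate what cutting a contig into 10K chunks means, here is a minimal Python sketch. It is not the code that the Snakefile runs. The function name cut_up is hypothetical, and the choice to merge a short trailing piece into the preceding chunk is an assumption made for this example.

```python
def cut_up(sequence, chunk_size=10000):
    """Split a sequence into consecutive chunks of chunk_size bases.

    Assumption: a trailing piece shorter than chunk_size is appended to the
    last full chunk instead of being kept as a small chunk of its own.
    """
    # Contigs shorter than two full chunks are left in one piece.
    if len(sequence) < 2 * chunk_size:
        return [sequence]
    n_chunks = len(sequence) // chunk_size
    # All chunks except the last have exactly chunk_size bases.
    chunks = [
        sequence[i * chunk_size:(i + 1) * chunk_size]
        for i in range(n_chunks - 1)
    ]
    # The last chunk takes everything that remains.
    chunks.append(sequence[(n_chunks - 1) * chunk_size:])
    return chunks


contig = "ACGT" * 6250  # a made-up contig of 25,000 bases
print([len(chunk) for chunk in cut_up(contig)])  # prints [10000, 15000]
```

Under this assumption no chunk is shorter than 10K unless the whole contig is, which keeps very small fragments out of the binning input.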