0.0 Quick Start Guide

Rachael Storo edited this page Feb 26, 2024 · 17 revisions

Requirements

See below for details on each:

  • Dependencies
  • Reference Databases
  • Conda Environments
  • Setup

1.0 Dependencies

conda

  • see installation instructions here.

mamba

  • in your base conda environment, run the following:
conda install -c conda-forge mamba

bbtools

  • download bbtools from here and move it to your Amethyst working directory.
  • run the following:
tar -xvzf BBMap_39.06.tar.gz

snakemake

  • in your base conda environment, run the following:
mamba install -c conda-forge -c bioconda snakemake

python

  • version 3.6.0 or later
  • see installation instructions here
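
Before moving on, it can help to confirm that the tools above are actually on your PATH. The following is a minimal sketch, not part of Amethyst: `missing_tools` is a hypothetical helper name, and the commands checked simply mirror the dependencies named in this section (bbtools is left out because it is unpacked into the working directory rather than installed on the PATH).

```shell
# Print the name of every command in the argument list that is NOT on PATH.
# missing_tools is a hypothetical helper, not something shipped with Amethyst.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Check the dependencies listed above; prints nothing when all are found.
missing_tools conda mamba snakemake python3
```

Any name printed by the last line still needs to be installed or added to your PATH.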

2.0 Conda Environments

To run Amethyst, the following conda environments must be created:

NOTE:

You can either create each environment manually with the commands below, or create it from the included yaml files (currently Linux only), which are found in the Amethyst home directory. To create an environment from a yaml file, run the following, replacing "environment" with the actual file name:

conda env create -f environment.yml
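
If you take the yaml route, each file needs its own `conda env create` call. The sketch below loops over every `.yml` file in a directory. It assumes the environment files end in `.yml` and sit directly in the Amethyst home directory, as described above; `create_envs_from_yaml` and the `DRY_RUN` variable are illustrative names, not part of Amethyst or conda.

```shell
# Run "conda env create -f <file>" for every .yml file in the given directory.
# With DRY_RUN set to any non-empty value, the commands are only printed.
# Both names are hypothetical helpers for this guide.
create_envs_from_yaml() {
    dir="${1:-.}"
    for yml in "$dir"/*.yml; do
        # Skip the unexpanded pattern when the directory has no .yml files.
        [ -e "$yml" ] || continue
        if [ -n "${DRY_RUN:-}" ]; then
            echo "conda env create -f $yml"
        else
            conda env create -f "$yml"
        fi
    done
}

# Preview what would be run for the current directory.
DRY_RUN=1 create_envs_from_yaml .
```

Once the printed list looks right, calling `create_envs_from_yaml .` without `DRY_RUN` creates the environments.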

Sequence QC

  • fastqc
  • multiqc
mamba create -n mg-qc -c bioconda fastqc multiqc  

Normalization

  • bbtools
mamba create -n mg-norm -c bioconda bbmap  

Sequence Trimming

  • multitrim
git clone https://github.com/KGerhardt/multitrim
conda env create -n multitrim -f multitrim.yml

Assembly

  • megahit
  • seqkit
  • prodigal
  • prokka
mamba create -n mg-assembly -c bioconda megahit seqkit prodigal 
mamba create -n mg-assembly2 -c conda-forge -c bioconda -c defaults prokka

Binning

  • bowtie2
  • minimap2
  • maxbin2
  • metabat2
  • seqkit
  • drep
  • gtdbtk
  • checkm
mamba create -n mg-binning -c bioconda bowtie2
mamba create -n mg-binning2 -c bioconda bowtie2 minimap2 maxbin2 metabat2 seqkit
mamba create -n mg-binning3 -c bioconda drep gtdbtk checkm-genome 
mamba create -n mg-checkm -c bioconda checkm-genome

mamba create -n checkm python=3.9
conda activate checkm
mamba install -c bioconda numpy matplotlib pysam
mamba install -c bioconda hmmer prodigal pplacer
pip3 install checkm-genome
conda deactivate

Diversity

  • biobakery
  • sourmash
  • krona
  • humann
  • metaphlan
  • seqkit
mamba create -n mg-diversity -c bioconda -c biobakery sourmash krona humann metaphlan seqkit
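
With this many environments it is easy to miss one. The sketch below reads the output of `conda env list` and reports which of the environment names used in this section are absent. `missing_envs` and `AMETHYST_ENVS` are hypothetical names introduced for illustration; the list is copied from the commands above, so shorten it if you deliberately skipped an environment.

```shell
# Environment names used by the commands in this section.
AMETHYST_ENVS="mg-qc mg-norm multitrim mg-assembly mg-assembly2 mg-binning mg-binning2 mg-binning3 mg-checkm checkm mg-diversity"

# Read "conda env list" output on stdin and print each expected environment
# that does not appear in it. missing_envs is a hypothetical helper name.
missing_envs() {
    listing=$(cat)
    for name in $AMETHYST_ENVS; do
        # The environment name is the first whitespace-separated field.
        printf '%s\n' "$listing" \
            | awk -v e="$name" '$1 == e { found = 1 } END { exit !found }' \
            || printf '%s\n' "$name"
    done
    return 0
}

# Usage:
#   conda env list | missing_envs
```

No output means every environment named above was found.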

3.0 Reference Databases

All reference databases must be downloaded before Amethyst can run. Download the following into the dbs/ subfolder within Amethyst.

Sourmash

  • Proceed to the Sourmash prepared databases
  • Download the GTDB zipped file for k=31 into /Amethyst/dbs/
  • Download the accompanying taxa sheet here into /Amethyst/dbs

Checkm

  • Download the reference database here while in the /Amethyst/dbs directory.
  • Decompress the database with the following:
tar xvzf checkm_data_2015_01_16.tar.gz

GTDBtk

  • activate the mg-binning3 conda environment
  • Use the R214 release of GTDB-tk found here
  • Execute the following code, putting your path to the gtdbtk data as the variable:
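conda activate mg-binning3
conda env config vars set GTDBTK_DATA_PATH="/path/to/unarchived/gtdbtk/data"
# reactivate the environment so the new variable takes effect
conda deactivate
conda activate mg-binning3
download-db.sh
conda deactivate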
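
Variables set with `conda env config vars set` are only applied when the environment is activated, so it is worth confirming that the path is visible before running the pipeline. The following is a small sketch under that assumption; `check_gtdbtk_path` is a hypothetical helper, not part of GTDB-Tk or Amethyst.

```shell
# Return 0 when GTDBTK_DATA_PATH is set and points to an existing directory;
# otherwise print a short message to stderr and return 1.
# check_gtdbtk_path is a hypothetical helper name.
check_gtdbtk_path() {
    if [ -z "${GTDBTK_DATA_PATH:-}" ]; then
        echo "GTDBTK_DATA_PATH is not set in this environment" >&2
        return 1
    fi
    if [ ! -d "$GTDBTK_DATA_PATH" ]; then
        echo "GTDBTK_DATA_PATH is not a directory: $GTDBTK_DATA_PATH" >&2
        return 1
    fi
    echo "GTDBTK_DATA_PATH=$GTDBTK_DATA_PATH"
}

# Usage, after "conda activate mg-binning3":
#   check_gtdbtk_path
```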

4.0 Setup

  • Clone the Amethyst repository and navigate to it:
git clone https://github.com/rckarns8/amethyst.git
cd amethyst
  • Before running Amethyst, place your forward-read files in amethyst/00_data/fastq/R1 and your reverse-read files in amethyst/00_data/fastq/R2.
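
A quick sanity check on the data layout can catch a misplaced file before a long run. The sketch below counts the files in the R1 and R2 directories and compares the totals. It assumes paired-end data with one file per sample in each directory, which this guide does not state explicitly; `count_files` and `check_read_pairs` are hypothetical helper names.

```shell
# Count the regular files directly inside a directory (0 if it does not exist).
count_files() {
    if [ -d "$1" ]; then
        find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Compare the number of files in the R1 and R2 directories under a data root.
# Succeeds only when both directories hold the same non-zero number of files.
check_read_pairs() {
    root="${1:-00_data/fastq}"
    r1=$(count_files "$root/R1")
    r2=$(count_files "$root/R2")
    echo "R1: $r1 file(s), R2: $r2 file(s)"
    [ "$r1" -gt 0 ] && [ "$r1" -eq "$r2" ]
}

# Usage from the Amethyst directory:
#   check_read_pairs 00_data/fastq || echo "R1 and R2 do not match"
```

Matching counts do not prove that the files are correctly paired by name, only that nothing is obviously missing.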

5.0 Running Amethyst

  • To run Amethyst, use the following command from your Amethyst directory, with your data files in the correct place (see above).
  • If possible, use a job submission script so that jobs are not killed for exceeding memory limits.
  • Edit the --cores flag to reflect the number of cores available to you.
snakemake -s Snakefile --keep-incomplete  --use-conda --cores 10 -p --latency-wait 90 --verbose --rerun-triggers mtime
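
Because the core count changes from machine to machine, it can be convenient to wrap the command above in a small function. This is only a sketch: `run_amethyst` and the `PRINT_ONLY` variable are hypothetical names, and the flags are copied unchanged from the command above. Extra arguments are passed through to snakemake, so adding `-n` asks snakemake for a dry run that lists the jobs without executing them.

```shell
# Run the Amethyst workflow with a chosen core count.
# Arguments after the core count are passed through to snakemake.
# With PRINT_ONLY set to any non-empty value, the command is only printed.
# run_amethyst and PRINT_ONLY are illustrative names, not part of Amethyst.
run_amethyst() {
    cores="$1"
    shift
    set -- snakemake -s Snakefile --keep-incomplete --use-conda \
        --cores "$cores" -p --latency-wait 90 --verbose \
        --rerun-triggers mtime "$@"
    if [ -n "${PRINT_ONLY:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: preview the jobs first, then run for real.
#   run_amethyst 10 -n
#   run_amethyst 10
```

The same function body can be pasted into a job submission script for your scheduler.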

FAQs

I am getting an error that says "cannot read x file" or similar:

  • Try changing the permissions in the directory using chmod:
chmod -R 777 .

I am getting an error with CheckM, bbtools, GTDB-tk, or Sourmash:

  • Check to be sure you followed the instructions above for each of these programs.