Lucas Anchieri edited this page Feb 7, 2023 · 14 revisions

Below you may find an answer to your question about mapache.

Q: How can I report the versions of the software used in mapache?

Q: How to launch a dry run?

You can see a list of the jobs that will be executed with the line

snakemake  -n

and also print the command lines to be executed with

snakemake  -n -p

Q: How can I update my mappings by adding new FASTQ files?

See the information here

snakemake  -R `snakemake --list-input-changes`

Do not forget to add --cores (or --jobs and --profile), depending on how you want to execute the jobs.

Q: How can I re-run a part of the pipeline after changing one parameter in the config file?

As above, you need to add --cores (or --jobs and --profile) to the following line. The -n flag makes this a dry run; remove it to actually re-run the jobs:

snakemake -n -R `snakemake --list-params-changes`
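Both re-run recipes above share the same shape: list the affected outputs, then pass them to -R. A minimal sketch of a wrapper follows. The function name rerun_changed and the SNAKEMAKE override variable are hypothetical and not part of mapache; only the snakemake flags already shown on this page are used.

```shell
# Hypothetical helper (not part of mapache): force re-running the outputs
# whose inputs or params changed, and skip the run when nothing is listed.
rerun_changed() {
    local kind="$1"; shift                 # "input" or "params"
    local snk="${SNAKEMAKE:-snakemake}"    # override only for testing
    local changed
    changed="$("$snk" "--list-${kind}-changes")" || return 1
    if [ -z "$changed" ]; then
        echo "No outputs with changed ${kind}; nothing to re-run."
        return 0
    fi
    # Unquoted on purpose: each listed file becomes a separate -R target.
    "$snk" -R $changed "$@"
}
```

For example, rerun_changed input --cores 4 re-runs after adding FASTQ files, and rerun_changed params -n --cores 1 previews what a config change would trigger. Any remaining arguments are passed to snakemake unchanged.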

Q: I tweaked some parameters of the plots in the config file, but the plots in the report look the same as before. How can I fix that?

You first need to ask snakemake to re-create the plots, and then generate the report.

snakemake -R `snakemake --list-params-changes` --cores 1
snakemake --report

Q: Best practices

As mistakes are common in both wet and dry labs, we highly recommend using meaningful names in the sample, library and ID columns of the samples file. This helps you trace the source of an odd output (e.g., empty BAM files, high contamination, fewer reads than expected), which can stem from different kinds of errors (sample swaps, truncated input files, wrong paths).

We encourage you to make use of dry runs (-n) and to run the pipeline on a small random subsample of your reads, both to verify that the reads are mapped well and to get familiar with the pipeline.

Finally, the report contains several statistics that give a quick first impression of the quality of the mappings.

Q: GATK license

Due to license restrictions, the mapache conda package cannot distribute and install GATK 3.8 directly (please note that the GATK IndelRealigner used in the pipeline is not available in GATK 4 and later). To fully install GATK, you must download a licensed copy of GATK from the Broad Institute and call gatk3-register, which will copy GATK into your mapache conda environment:

# (download licensed copy of GATK)
gatk3-register /path/to/GenomeAnalysisTK.jar

In short, you have to:

  1. Activate the conda environment containing GATK:
conda activate mapache
  2. Type gatk3 to check whether GATK is already registered. If you get the following output, you have to register GATK:
$ gatk3
GATK jar file not found. Have you run "gatk3-register"?
  3. Download the licensed copy of GATK from the Broad Institute.
  4. Register GATK:
gatk3-register GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
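The check in the list above can also be scripted. This is a minimal sketch that assumes an unregistered install prints exactly the message quoted above. The function name gatk_needs_registration and the GATK_CMD override variable are hypothetical.

```shell
# Hypothetical helper (not part of mapache).
# Exit status: 0 = GATK still needs registering, 1 = no such message seen,
#              2 = the gatk3 command itself was not found.
gatk_needs_registration() {
    local cmd="${GATK_CMD:-gatk3}"         # override only for testing
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd not found; is the right conda environment active?" >&2
        return 2
    fi
    # Search stdout and stderr together, since which stream carries the
    # message is an assumption here.
    "$cmd" 2>&1 | grep -q 'GATK jar file not found'
}
```

A possible use, after conda activate mapache, is: gatk_needs_registration && gatk3-register /path/to/GenomeAnalysisTK.jar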

Q: Conda: using rule-specific environments

When running the pipeline with rule-based conda environments (--use-conda), snakemake will create several environments that store the different packages needed by the pipeline. To install GATK properly in this setup, you first need to build the conda environments:

snakemake --use-conda --conda-create-envs-only --cores N

where N is the number of CPU cores you want to use (1 for a single core, all for all of them).

You then need to find which of these environments contains GATK:

$ grep -i gatk /path/to/mapache/.snakemake/conda/*.yaml
/path/to/mapache/.snakemake/conda/82f2bb3b7e488f30c27f6219ba709caf.yaml:  - gatk = 3.8

and load the corresponding environment by copy-pasting its path (without the .yaml extension):

$ conda activate /path/to/mapache/.snakemake/conda/82f2bb3b7e488f30c27f6219ba709caf

Once this is done, you can register GATK with the command:

gatk3-register /path/to/GenomeAnalysisTK.jar
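The lookup step above (grep for GATK, then drop the .yaml extension) can be sketched as a small function. The name find_gatk_envs is hypothetical. The default directory assumes you run it from the mapache working directory, where snakemake stores its environments under .snakemake/conda as in the example paths above.

```shell
# Hypothetical helper (not part of mapache): print the path of every
# snakemake-created conda environment whose YAML spec mentions GATK.
find_gatk_envs() {
    local conda_dir="${1:-.snakemake/conda}"
    # grep -l prints only the names of matching files; -i ignores case.
    # sed then strips the .yaml suffix to obtain the environment path.
    grep -il 'gatk' "$conda_dir"/*.yaml 2>/dev/null | sed 's/\.yaml$//'
}
```

If it prints a single path, that path can be passed to conda activate before running gatk3-register. If it prints several, inspect the YAML files to pick the right one.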