This is the primary DNA-mapping pipeline. It can be used either alone or upstream of the ATAC-seq and ChIP-seq pipelines. It offers a wide array of options, including trimming and various QC steps (e.g., marking duplicates and plotting coverage and PCAs). In addition, basic coverage tracks are created to facilitate viewing the data in IGV.
The only requirement is a directory of gzipped FASTQ files. Files can be single-end or paired-end, and the read extensions can be changed using the keys in the ``defaults.yaml`` file below.
There is a configuration file in ``snakePipes/workflows/DNA-mapping/defaults.yaml``:

.. code:: yaml

    ## General/Snakemake parameters, only used/set by wrapper or in Snakemake cmdl, but not in Snakefile
    pipeline: dna-mapping
    outdir:
    configFile:
    clusterConfigFile:
    local: False
    maxJobs: 5
    ## directory with fastq files
    indir:
    ## preconfigured target genomes (mm9,mm10,dm3,...) , see /path/to/snakemake_workflows/shared/organisms/
    ## Value can be also path to your own genome config file!
    genome:
    ## FASTQ file extension (default: ".fastq.gz")
    ext: '.fastq.gz'
    ## paired-end read name extension (default: ['_R1', "_R2"])
    reads: [_R1, _R2]
    ## mapping mode
    mode: mapping
    aligner: Bowtie2
    ## Number of reads to downsample from each FASTQ file
    downsample:
    ## Options for trimming
    trim: False
    trimmer: cutadapt
    trimmerOptions:
    ## Bin size of output files in bigWig format
    bwBinSize: 25
    ## Run FASTQC read quality control
    fastqc: false
    ## Run computeGCBias quality control
    GCBias: false
    ## Retain only de-duplicated reads/read pairs
    dedup: false
    ## Retain only reads with at least the given mapping quality
    mapq: 0
    ## Retain only reads mapping in proper pairs
    properPairs: false
    ## Mate orientation in paired-end experiments for Bowtie2 mapping
    ## (default "--fr" is appropriate for Illumina sequencing)
    mateOrientation: --fr
    ## other Bowtie2 stuff
    insertSizeMax: 1000
    alignerOpts:
    plotFormat: png
    UMIBarcode: False
    bcPattern: NNNNCCCCCCCC #default: 4 base umi barcode, 8 base cell barcode (eg. RELACS barcode)
    UMIDedup: False
    UMIDedupSep: "_"
    UMIDedupOpts:
    ## Median/mean fragment length, only relevant for single-end data (default: 200)
    fragmentLength: 200
    qualimap: false
    verbose: false

Many of these options can be more conveniently set on the command line (e.g., ``--qualimap`` sets ``qualimap: true``). However, you may need to change the ``reads:`` setting if your paired-end files are not denoted by ``sample_R1.fastq.gz`` and ``sample_R2.fastq.gz``, but rather ``sample_1.fastq.gz`` and ``sample_2.fastq.gz``.
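To illustrate what the ``ext`` and ``reads`` keys describe, the following minimal Python sketch groups FASTQ file names into samples using those two values. The helper ``group_fastq_by_sample`` is hypothetical and is not the code snakePipes uses internally; it only demonstrates the naming convention that the two settings express.

```python
from collections import defaultdict


def group_fastq_by_sample(filenames, ext=".fastq.gz", reads=("_R1", "_R2")):
    """Group FASTQ file names by sample using the `ext` and `reads` conventions.

    Hypothetical helper for illustration only; not part of snakePipes.
    """
    samples = defaultdict(dict)
    for name in filenames:
        if not name.endswith(ext):
            continue  # skip files without the configured extension
        stem = name[: -len(ext)]
        for mate, suffix in zip(("R1", "R2"), reads):
            if stem.endswith(suffix):
                # Strip the read suffix to obtain the sample name
                samples[stem[: -len(suffix)]][mate] = name
                break
        else:
            # No read suffix matched: treat the file as a single-end sample
            samples[stem]["R1"] = name
    return dict(samples)


files = ["sample_1.fastq.gz", "sample_2.fastq.gz"]

# With the default reads=("_R1", "_R2") neither suffix matches, so each file
# is treated as its own single-end sample.
default_grouping = group_fastq_by_sample(files)

# With reads=("_1", "_2") the two files form one paired-end sample.
custom_grouping = group_fastq_by_sample(files, reads=("_1", "_2"))
```

In this sketch, leaving ``reads`` at its default would silently split a paired-end sample into two single-end ones, which is why the setting is worth checking before a run.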
The DNA mapping pipeline will generate output of the following structure:

.. code:: text

    .
    ├── bamCoverage
    ├── Bowtie2
    ├── deepTools_qc
    │   ├── bamPEFragmentSize
    │   ├── estimateReadFiltering
    │   ├── multiBamSummary
    │   ├── plotCorrelation
    │   ├── plotCoverage
    │   └── plotPCA
    ├── FASTQ
    ├── FastQC
    ├── filtered_bam
    ├── multiQC
    │   └── multiqc_data
    └── Sambamba

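As a quick sanity check after a run, a short Python sketch like the one below can report which of the top-level directories from the listing above are absent from an output directory. The function name ``missing_output_dirs`` is hypothetical, and the set of directories that actually appears may depend on the options you enabled, so a missing entry is not necessarily an error.

```python
import tempfile
from pathlib import Path

# Top-level directory names taken from the listing above
EXPECTED_DIRS = [
    "bamCoverage", "Bowtie2", "deepTools_qc", "FASTQ", "FastQC",
    "filtered_bam", "multiQC", "Sambamba",
]


def missing_output_dirs(outdir, expected=EXPECTED_DIRS):
    """Return the expected directory names that are absent from `outdir`.

    Hypothetical helper for illustration only; not part of snakePipes.
    """
    root = Path(outdir)
    return [name for name in expected if not (root / name).is_dir()]


# Self-contained demonstration on a temporary directory that stands in
# for a real pipeline output directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Bowtie2", "filtered_bam"):
        (Path(tmp) / name).mkdir()
    missing = missing_output_dirs(tmp)
```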
In addition to the FASTQ module results (see :ref:`running_snakePipes`), the workflow produces the following outputs:
- Bowtie2 : Contains the BAM files produced by mapping with Bowtie2 and indexed with Samtools.
- filtered_bam : Contains the BAM files filtered by the provided criteria, such as mapping quality (``--mapq``) or PCR duplicates (``--dedup``). These files are used for most downstream analysis in the DNA-mapping and ChIP-seq/ATAC-seq pipelines.
- bamCoverage : Contains the coverage files (bigWig format) produced from the BAM files by deepTools ``bamCoverage``. The files are either raw or 1x normalized (by sequencing depth). They are useful for plotting and inspecting the data in IGV.
- deepTools_qc : Contains various QC files and plots produced by deepTools on the filtered BAM files. These are very useful for evaluation of data quality. The folders are named after the tools. Please look at the deepTools documentation on how to interpret the outputs from each tool.
- Sambamba : Contains the alignment metrics evaluated on the BAM files by Sambamba.
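The filtering criteria behind ``filtered_bam`` (``mapq``, ``dedup`` and ``properPairs``) can be pictured with the following minimal Python sketch. It operates on plain ``(name, flag, mapq)`` tuples and uses the flag bits defined in the SAM specification. It is an illustration of what the criteria mean, under the assumption that duplicates have already been marked; it is not the way the workflow itself performs the filtering, and ``keep_alignment`` is a hypothetical name.

```python
# SAM flag bits as defined in the SAM specification
FLAG_PROPER_PAIR = 0x2
FLAG_DUPLICATE = 0x400


def keep_alignment(flag, mapq, min_mapq=0, dedup=False, proper_pairs=False):
    """Return True if an alignment passes the configured filters.

    Hypothetical helper for illustration only; not part of snakePipes.
    """
    if mapq < min_mapq:
        return False  # below the mapping-quality threshold
    if dedup and flag & FLAG_DUPLICATE:
        return False  # marked as a PCR/optical duplicate
    if proper_pairs and not flag & FLAG_PROPER_PAIR:
        return False  # not mapped in a proper pair
    return True


# (read name, SAM flag, mapping quality); values are made up for the example
alignments = [
    ("read_a", 99, 42),    # paired, proper pair, high MAPQ
    ("read_b", 1123, 42),  # same as read_a but marked duplicate (99 + 1024)
    ("read_c", 65, 3),     # paired, not a proper pair, low MAPQ
]

# Defaults from the configuration (mapq: 0, dedup: false, properPairs: false)
kept_default = [n for n, f, q in alignments if keep_alignment(f, q)]

# Stricter settings: MAPQ >= 10, duplicates removed, proper pairs only
kept_strict = [
    n for n, f, q in alignments
    if keep_alignment(f, q, min_mapq=10, dedup=True, proper_pairs=True)
]
```

With the default settings nothing is removed in this sketch, which matches the defaults shown in the configuration file (no quality threshold, no de-duplication, no proper-pair requirement).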
A number of other directories may also be present if you enabled read trimming, Qualimap, or other optional steps. These are typically self-explanatory.
The pipeline generates, or can optionally generate, a number of useful QC plots. These include correlation and PCA plots as well as the output from MultiQC.

.. argparse::
    :func: parse_args
    :filename: ../snakePipes/workflows/DNA-mapping/DNA-mapping
    :prog: DNA-mapping
    :nodefault: