MicroRunQC

Generates metrics and summary statistics for paired-end Illumina bacterial whole-genome sequencing (WGS) FASTQ data. The pipeline was originally developed on the Galaxy platform, and the Galaxy workflow is included in this repository (galaxy_workflows).

Galaxy installation and setup

Install the following tools from the Galaxy Tool Shed:

trimmomatic
microrunqc

Import the MicroRunQC workflow. The workflow expects paired collections of FASTQ files.

Dependencies for local installation

SKESA
- Strategic k-mer extension for scrupulous assemblies
mlst
- Scan contig files against traditional PubMLST typing schemes
trimmomatic
- A flexible read trimming tool for Illumina NGS data
bwa
- BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome
fastq-scan
- Reads a FASTQ file and outputs summary statistics (read lengths, per-read qualities, per-base qualities)

Create a conda environment:

% conda create --name microrunqc
% conda activate microrunqc

Install the dependencies from the conda-forge and Bioconda channels:

% conda install -c conda-forge -c bioconda -c defaults mlst skesa trimmomatic bwa fastq-scan

Install and set up from source

% cd $HOME
% git clone https://github.com/estrain/MicroRunQC.git
% export PATH=$PATH:$HOME/MicroRunQC/bin
% chmod a+x $HOME/MicroRunQC/bin/*
% microrunqc.py --help
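To confirm that the installation worked, a small check such as the following can be run. It is a minimal sketch, not part of MicroRunQC: it only uses Python's shutil.which to look up each executable on PATH and does not verify versions. The executable names (skesa, mlst, trimmomatic, bwa, fastq-scan, microrunqc.py) are assumed to match the usual Bioconda entry points.

```python
"""Check that MicroRunQC and its command-line dependencies are on PATH.

Sketch only: executable names are assumptions based on the Bioconda
packages listed in this README; versions are not checked.
"""
import shutil
import sys

# Assumed executable names for the dependencies plus the MicroRunQC script.
REQUIRED = ["skesa", "mlst", "trimmomatic", "bwa", "fastq-scan", "microrunqc.py"]


def find_missing(names):
    """Return the names that shutil.which cannot resolve on PATH."""
    return [name for name in names if shutil.which(name) is None]


def report(names=REQUIRED):
    """Print which tools are missing and return them as a list."""
    missing = find_missing(names)
    if missing:
        print("Missing from PATH: " + ", ".join(missing), file=sys.stderr)
    else:
        print("All dependencies found on PATH.")
    return missing


if __name__ == "__main__":
    report()
```

Saving it under a hypothetical name such as check_deps.py and running `python3 check_deps.py` inside the activated microrunqc environment lists any tool that still needs to be installed or added to PATH.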

Example

% microrunqc.py --forward forward.fastq.gz --reverse reverse.fastq.gz --cores 12 --output example
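To process several samples, the single-sample command above can be wrapped in a loop. The sketch below is hypothetical helper code, not part of MicroRunQC. It assumes the common Illumina naming pattern <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz and passes each sample name as the --output value; check `microrunqc.py --help` for how --output is interpreted before relying on it.

```python
"""Run microrunqc.py once per paired-end sample in a directory (sketch).

Assumptions: reads are named <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz,
and the sample name is an acceptable --output value.
"""
import subprocess
from pathlib import Path

R1_SUFFIX = "_R1_001.fastq.gz"  # assumed forward-read naming convention
R2_SUFFIX = "_R2_001.fastq.gz"  # assumed reverse-read naming convention


def find_pairs(directory):
    """Map sample name -> (forward, reverse) paths, keeping complete pairs only."""
    pairs = {}
    for r1 in sorted(Path(directory).glob("*" + R1_SUFFIX)):
        sample = r1.name[: -len(R1_SUFFIX)]
        r2 = r1.with_name(sample + R2_SUFFIX)
        if r2.exists():
            pairs[sample] = (r1, r2)
    return pairs


def build_command(forward, reverse, output, cores=12):
    """Build the same command line as the single-sample example."""
    return [
        "microrunqc.py",
        "--forward", str(forward),
        "--reverse", str(reverse),
        "--cores", str(cores),
        "--output", str(output),
    ]


def run_all(directory, cores=12):
    """Run MicroRunQC for every complete pair; stop on the first failure."""
    for sample, (r1, r2) in find_pairs(directory).items():
        cmd = build_command(r1, r2, sample, cores)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_all("reads", cores=12)` runs MicroRunQC once per complete pair found in a hypothetical reads/ directory. Samples without a matching reverse file are skipped rather than passed to the pipeline with a missing input.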

Output

The output is a tab-delimited file with the following columns:

Column	Description
File	Input filename passed to SKESA, taken from the forward read file.
Contigs	Number of contigs in the de novo SKESA assembly. Contigs smaller than 200 base pairs (bp) are not counted.
Length	Total length of all contigs > 200 bp. This should approximate the genome size of the target organism.
EstCov	Mean coverage of the contigs in the assembly, as reported by SKESA.
N50	Length of the shortest contig such that contigs of that length or longer together make up at least 50% of the total assembly length.
MedianInsert	Median insert size, i.e. the distance spanned by the forward and reverse reads of a pair. Calculated by mapping the reads back to the SKESA assembly with bwa.
MeanLength_R1	Mean length of the forward reads.
MeanLength_R2	Mean length of the reverse reads.
MeanQ_R1	Mean Q-score of the forward reads.
MeanQ_R2	Mean Q-score of the reverse reads.
Scheme	PubMLST database scheme (e.g. senterica for Salmonella enterica).
ST	Sequence type (ST) assigned by mlst.
Loci	Allele called at each locus, written as gene(allele number), for example aroC(118).
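Several of these columns follow from simple formulas. The sketch below is an illustrative re-implementation, not MicroRunQC's actual code: it computes Contigs, Length and N50 from a list of contig lengths, and MeanLength and MeanQ from read records. It assumes Phred+33 quality encoding, counts contigs of at least 200 bp (whether a contig of exactly 200 bp is counted is an assumption), and takes MeanQ as the plain mean of per-base scores, which may differ from how fastq-scan aggregates qualities.

```python
"""Illustrative versions of some MicroRunQC columns (not the tool's own code).

Assumptions: Phred+33 qualities, contigs >= 200 bp are counted, and MeanQ is
the arithmetic mean of all per-base Phred scores.
"""
from statistics import mean

MIN_CONTIG_LEN = 200  # contigs shorter than this are not counted


def assembly_metrics(contig_lengths, min_len=MIN_CONTIG_LEN):
    """Return (Contigs, Length, N50) for a list of contig lengths."""
    kept = sorted((n for n in contig_lengths if n >= min_len), reverse=True)
    total = sum(kept)
    running = 0
    n50 = 0
    for length in kept:  # walk from the longest contig downwards
        running += length
        if running * 2 >= total:  # cumulative length has reached 50% of total
            n50 = length
            break
    return len(kept), total, n50


def read_metrics(records):
    """Return (mean read length, mean Q) for (sequence, quality) pairs."""
    lengths = [len(seq) for seq, _ in records]
    # Phred+33: quality character 'I' (ASCII 73) corresponds to Q40.
    scores = [ord(ch) - 33 for _, qual in records for ch in qual]
    return mean(lengths), mean(scores)
```

For example, `assembly_metrics([100, 300, 500, 1000, 200])` returns `(4, 2000, 1000)`: the 100 bp contig is dropped, and the 1000 bp contig alone already covers half of the 2000 bp total, so it is the N50.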
