Galru performs rapid spoligotyping of Mycobacterium tuberculosis directly from long read sequencing data. It is fast and accurate. It requires only a minimal amount of data to produce a spoligotype, which allows near real-time typing when it processes sequencing data as it is produced by a Nanopore sequencer.
If you just want to try out the software quickly, please use the Docker container. This software is designed to run on Linux and OSX. It will not run on Windows.
To install Galru, first install conda with Python 3, then run:
conda install -c conda-forge -c bioconda galru
First install Docker. A Docker container is built automatically from the latest version of Galru. To pull it:
docker pull quadraminstitute/galru
To run it, use a command such as the following, substituting your own filenames and directories. This one uses the example file in this repository:
docker run --rm -it -v /path/to/example_data:/example_data quadraminstitute/galru galru /example_data/example_reads.fastq
This is the most difficult installation method and assumes you know how to install dependencies yourself. Install the dependencies, as listed below, then install Galru:
pip install galru
To install the development version:
pip install git+https://github.com/quadram-institute-bioscience/galru.git
To run the software, provide a FASTQ or FASTA file of long reads. An assembly can also be used as input. Short reads will not work.
galru example_reads.fastq
This will output a spoligotype in the format:
1001111111111111111111111111000010110001111
where 1 indicates the spacer is present and 0 indicates it is absent.
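The binary string can be post-processed with a few lines of Python. The sketch below is not part of Galru: the helper names `absent_spacers` and `to_octal` are hypothetical, and the octal conversion assumes the conventional 43-spacer panel, where the first 42 digits are read as 14 triplets and the last digit is appended unchanged.

```python
"""Decode a spoligotype binary string (illustrative helpers, not part of Galru)."""
from typing import List

SPACER_COUNT = 43  # assumed size of the conventional spoligotyping panel


def absent_spacers(pattern: str) -> List[int]:
    """Return the 1-based positions of spacers marked absent (0)."""
    if set(pattern) - {"0", "1"}:
        raise ValueError("pattern must contain only 0 and 1")
    return [i for i, flag in enumerate(pattern, start=1) if flag == "0"]


def to_octal(pattern: str) -> str:
    """Convert a 43-digit binary pattern to a 15-digit octal code."""
    if len(pattern) != SPACER_COUNT or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters, each 0 or 1")
    # First 42 digits: 14 triplets, each read as one octal digit.
    triplets = [pattern[i:i + 3] for i in range(0, 42, 3)]
    octal = "".join(str(int(triplet, 2)) for triplet in triplets)
    # The 43rd spacer is appended unchanged as a final 0 or 1.
    return octal + pattern[42]


if __name__ == "__main__":
    # Pattern with every spacer present except positions 2 and 3.
    example = "100" + "1" * 40
    print(absent_spacers(example))
    print(to_octal(example))
```

Passing the string printed by `galru` to `absent_spacers` gives the positions of the missing spacers, which is often easier to read than the raw pattern.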
usage: galru [options] uncorrected_long_reads.fastq

Spoligotyping from uncorrected long reads

positional arguments:
  input_file            Input FASTQ file of uncorrected long reads (optionally gzipped)

optional arguments:
  -h, --help            show this help message and exit
  --db_dir DB_DIR, -d DB_DIR
                        Base directory for species databases, defaults to bundled (default: None)
  --cas_fasta CAS_FASTA, -c CAS_FASTA
                        Cas gene FASTA file (optionally gzipped), defaults to bundled (default: None)
  --technology {map-ont,map-pb,ava-pb,ava-ont}, -y {map-ont,map-pb,ava-pb,ava-ont}
                        Sequencing technology (default: map-ont)
  --threads THREADS, -t THREADS
                        No. of threads to use (default: 1)
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Output filename, defaults to STDOUT (default: None)
  --extended_results, -x
                        Output extended results (default: False)
  --gene_start_offset GENE_START_OFFSET, -g GENE_START_OFFSET
                        Only count CRISPR reads which cover this base (default: 30)
  --min_mapping_quality MIN_MAPPING_QUALITY, -m MIN_MAPPING_QUALITY
                        Minimum mapping quality score (default: 10)
  --qcov_margin QCOV_MARGIN, -q QCOV_MARGIN
                        Maximum perc coverage difference between CRISPR and read (default: 100)
  --min_bitscore MIN_BITSCORE, -b MIN_BITSCORE
                        Minimum blast bitscore (default: 38)
  --min_identity MIN_IDENTITY, -i MIN_IDENTITY
                        Minimum blast identity (default: 95)
  --species SPECIES, -s SPECIES
                        Species name, use galru_species to see all available (default: Mycobacterium_tuberculosis)
  --debug               Turn on debugging and save intermediate files (default: False)
  --verbose, -v         Turn on verbose output (default: False)
  --version             show program's version number and exit
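When Galru is called from a script, for example to re-type a growing read file during a sequencing run, it can help to assemble the command line in one place. The following is a minimal sketch that uses only flags shown in the usage text. The function names `build_galru_command` and `run_galru` are hypothetical, and the sketch assumes `galru` is on the PATH and prints its result to STDOUT when no output file is given.

```python
"""Build and run a galru command line (illustrative wrapper, not part of Galru)."""
import subprocess
from typing import List, Optional

# Values accepted by --technology, taken from the usage text.
TECHNOLOGIES = {"map-ont", "map-pb", "ava-pb", "ava-ont"}


def build_galru_command(
    reads: str,
    threads: int = 1,
    technology: str = "map-ont",
    extended: bool = False,
    output_file: Optional[str] = None,
) -> List[str]:
    """Return the argument list for one galru invocation."""
    if technology not in TECHNOLOGIES:
        raise ValueError(f"unknown technology: {technology}")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    cmd = ["galru", "--threads", str(threads), "--technology", technology]
    if extended:
        cmd.append("--extended_results")
    if output_file is not None:
        cmd += ["--output_file", output_file]
    cmd.append(reads)  # the positional input file goes last
    return cmd


def run_galru(reads: str, **options) -> str:
    """Run galru and return whatever it wrote to STDOUT, stripped."""
    result = subprocess.run(
        build_galru_command(reads, **options),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `run_galru("example_reads.fastq", threads=4)` would then return the text Galru printed. Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual filenames.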
Galru is free software, licensed under GPLv3.
To report an issue or provide feedback, please go to the issues page. If you make improvements to the software, add databases or extend profiles, please send us the changes through a pull request so that the whole community may benefit from your work.
"Rapid Mycobacterium tuberculosis spoligotyping from uncorrected long reads using Galru", Andrew J Page, Nabil-Fareed Alikhan, Michael Strinden, Thanh Le Viet, Timofey Skvortsov, bioRxiv 2020.05.31.126490; doi: https://doi.org/10.1101/2020.05.31.126490
galrú (Gal-roo) is the word for infection in Irish (Gaeilge).
- grep
- xargs
- find
- gunzip
- cd-hit-est (from cd-hit)
- minced
- bedtools
- samtools (1.3 or above)
- minimap2
- ncbi-genome-download
- blast+
- mlst
- fastaq (from pyfastaq)
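For a manual installation, a quick way to see which of these tools are still missing is to look each one up on the PATH. The sketch below is an illustration only: the function name `missing_tools` is hypothetical, `blastn` is used as an assumed stand-in for the blast+ suite, and the samtools version requirement is not checked.

```python
"""Report which listed dependencies are not on PATH (illustrative, not part of Galru)."""
import shutil
from typing import Iterable, List

# Executable names taken from the dependency list.
# "blastn" is an assumption: one of the programs shipped with blast+.
REQUIRED_TOOLS = [
    "grep", "xargs", "find", "gunzip", "cd-hit-est", "minced",
    "bedtools", "samtools", "minimap2", "ncbi-genome-download",
    "blastn", "mlst", "fastaq",
]


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

An empty result only shows that the executables exist, not that their versions are compatible.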